Introduction

LINQ is the best feature in C#. Not generics. Not async/await. LINQ. It is the reason I reach for C# over Java for anything data-heavy, and it has been that way since C# 3.0 shipped it in 2007.

Everything implementing IEnumerable<T> gets the full pipeline: filter, project, group, join, aggregate, sort. Arrays, lists, dictionaries, XML, database tables through EF Core. You describe what you want, the compiler figures out how to iterate, and the resulting code reads like intent rather than mechanics. Nested foreach loops and temporary dictionaries disappear.

LINQ Basics: Query vs Method Syntax

Two syntaxes, same IL output. Query syntax looks like SQL. Method syntax chains lambdas. You need to pick one as your default, and for most teams the answer should be method syntax.

QueryVsMethod.cs
using System;
using System.Linq;
using System.Collections.Generic;
var numbers = new List<int> { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
// Query syntax -- reads like SQL
var queryResult = from n in numbers
                  where n % 2 == 0
                  select n;

// Method syntax -- uses lambdas and chaining
var methodResult = numbers.Where(n => n % 2 == 0);

// Both produce: 2, 4, 6, 8, 10
foreach (var n in queryResult)
    Console.Write($"{n} ");

Most open-source .NET code uses method syntax, most Stack Overflow answers use it, and it covers every LINQ operator. Query syntax has no equivalent for Count(), Take(), Distinct(), or a dozen other methods, so you end up mixing styles anyway. The one exception: joins. Joins genuinely read better in query syntax. But for everything else, method syntax wins by default.

Filtering and Projection

Where and Select are the bread and butter. You already know what they do. The interesting one is SelectMany -- it flattens nested collections into a single sequence, and it is criminally underused.

FilterAndProject.cs
public record Employee(string Name, decimal Salary, string Role);
public record Department(string Name, List<Employee> Staff);
var departments = new List<Department>
{
 new("Engineering", new()
 {
 new("Alice", 95000m, "Senior Dev"),
 new("Bob", 78000m, "Junior Dev"),
 new("Carol", 110000m, "Lead Dev")
 }),
 new("Marketing", new()
 {
 new("Dave", 72000m, "Analyst"),
 new("Eve", 88000m, "Manager")
 })
};
// Where: filter employees earning above 80k
var highEarners = departments
    .SelectMany(d => d.Staff)
    .Where(e => e.Salary > 80000m);

// Select: project into an anonymous type
var nameSalaryPairs = highEarners
    .Select(e => new { e.Name, AnnualPay = e.Salary });

// SelectMany: flatten all employees from all departments
var allEmployeeNames = departments
    .SelectMany(d => d.Staff)
    .Select(e => e.Name);

// Output: Alice, Carol, Eve (high earners)
foreach (var item in nameSalaryPairs)
    Console.WriteLine($"{item.Name}: {item.AnnualPay:C}");

Without SelectMany you get IEnumerable<List<Employee>> when you wanted the flat IEnumerable<Employee>. Once you learn it, you start seeing it everywhere: order items across orders, tags across blog posts, permissions across roles. So learn it now.

You can chain multiple Where calls and they compose as logical AND. Single predicate reads better for static filters. Separate Where calls work better when building filters dynamically from user input.
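To make that concrete, here is a minimal sketch of dynamic composition. The employees list and the nameFilter / minSalary inputs are invented for illustration -- picture them bound from a search form:

```csharp
using System;
using System.Linq;
using System.Collections.Generic;

var employees = new List<(string Name, decimal Salary)>
{
    ("Alice", 95000m), ("Bob", 78000m), ("Carol", 110000m)
};

string nameFilter = "a";      // pretend this came from a search box
decimal? minSalary = 80000m;  // optional filter, may be null

// Start from the full set and add one Where per active filter.
// Each call composes as logical AND; inactive filters add nothing.
IEnumerable<(string Name, decimal Salary)> query = employees;

if (!string.IsNullOrEmpty(nameFilter))
    query = query.Where(e =>
        e.Name.Contains(nameFilter, StringComparison.OrdinalIgnoreCase));

if (minSalary.HasValue)
    query = query.Where(e => e.Salary >= minSalary.Value);

foreach (var e in query)
    Console.WriteLine(e.Name); // Alice, Carol
```

Because execution is deferred, nothing runs until the final enumeration, no matter how many conditional Where calls you stack up.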

Ordering and Grouping

Sorting with OrderBy and ThenBy

OrderBy sorts ascending. OrderByDescending sorts descending. For a secondary sort, chain ThenBy or ThenByDescending. Do not use two OrderBy calls in a row -- the second one re-sorts from scratch and throws away the first ordering. I have seen this bug in production more than once.
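Here is the bug and the fix side by side, on a throwaway dataset:

```csharp
using System;
using System.Linq;

var people = new[]
{
    (Name: "Ann", Age: 30), (Name: "Bob", Age: 25), (Name: "Cy", Age: 30)
};

// WRONG: the second OrderBy re-sorts the whole sequence by Name,
// discarding the Age ordering entirely
var wrong = people.OrderBy(p => p.Age).OrderBy(p => p.Name);
// Effectively just sorted by Name: Ann, Bob, Cy

// RIGHT: ThenBy only breaks ties within the Age ordering
var right = people.OrderBy(p => p.Age).ThenBy(p => p.Name);
// Bob (25), Ann (30), Cy (30)

foreach (var p in right)
    Console.WriteLine($"{p.Name} ({p.Age})");
```

The wrong version even compiles against the same IOrderedEnumerable<T> type, so nothing warns you -- which is exactly why it slips into production.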

Grouping with GroupBy

GroupBy partitions data by a key. Each group has a Key and is itself enumerable.

OrderingAndGrouping.cs
var products = new List<(string Name, string Category, decimal Price)>
{
 ("Laptop", "Electronics", 1200m),
 ("Headphones", "Electronics", 85m),
 ("Desk Chair", "Furniture", 350m),
 ("Monitor", "Electronics", 450m),
 ("Bookshelf", "Furniture", 180m),
 ("Keyboard", "Electronics", 120m),
 ("Standing Desk", "Furniture", 600m)
};
// Sort by category, then by price descending
var sorted = products
    .OrderBy(p => p.Category)
    .ThenByDescending(p => p.Price);

// Group by category with a summary
var grouped = products
    .GroupBy(p => p.Category)
    .Select(g => new
    {
        Category = g.Key,
        Count = g.Count(),
        AvgPrice = g.Average(p => p.Price),
        MostExpensive = g.OrderByDescending(p => p.Price).First().Name
    });
foreach (var group in grouped)
{
 Console.WriteLine(
 $"{group.Category}: {group.Count} items, " +
 $"Avg {group.AvgPrice:C}, Top: {group.MostExpensive}");
}
// Electronics: 4 items, Avg $463.75, Top: Laptop
// Furniture: 3 items, Avg $376.67, Top: Standing Desk

Nothing surprising here.

GroupBy has to iterate the entire source to build the groups. It cannot produce results lazily. For very large in-memory collections, that matters.

Joining Data Sources

LINQ joins are ugly in method syntax. Join takes five arguments: the outer collection, the inner collection, an outer key selector, an inner key selector, and a result selector. You will get the argument order wrong the first time. And the second. GroupJoin is the same idea but groups all matches from the second collection under each item from the first -- conceptually a left outer join. The example below joins customers with their orders, then uses GroupJoin to calculate totals per customer. Read the key selectors carefully -- once the pattern clicks, it sticks. But getting there is genuinely painful compared to writing the same join in SQL, which is why query syntax earns its keep for exactly this use case.

JoiningData.cs
public record Customer(int Id, string Name, string City);
public record Order(int OrderId, int CustomerId, decimal Total);
var customers = new List<Customer>
{
 new(1, "Priya", "Toronto"),
 new(2, "Marco", "Milan"),
 new(3, "Yuki", "Tokyo")
};
var orders = new List<Order>
{
 new(101, 1, 250.00m), new(102, 2, 175.50m),
 new(103, 1, 89.99m), new(104, 3, 320.00m),
 new(105, 2, 410.75m)
};
// Inner Join: pair each order with its customer
var orderDetails = customers.Join(
    orders,
    c => c.Id,         // outer key
    o => o.CustomerId, // inner key
    (c, o) => new { c.Name, o.OrderId, o.Total });

// GroupJoin: each customer with ALL their orders
var customerOrders = customers.GroupJoin(
    orders,
    c => c.Id,
    o => o.CustomerId,
    (c, orderGroup) => new
    {
        c.Name,
        OrderCount = orderGroup.Count(),
        TotalSpent = orderGroup.Sum(o => o.Total)
    });
foreach (var co in customerOrders)
 Console.WriteLine($"{co.Name}: {co.OrderCount} orders, {co.TotalSpent:C}");
// Priya: 2 orders, $339.99
// Marco: 2 orders, $586.25
// Yuki: 1 orders, $320.00

And here is why I said query syntax wins for joins. Same logic, actually readable:

JoinQuerySyntax.cs
// Query syntax makes joins very readable
var result = from c in customers
             join o in orders on c.Id equals o.CustomerId
             select new { c.Name, c.City, o.OrderId, o.Total };

// Cross join: every customer paired with every order
var crossJoin = from c in customers
                from o in orders
                select new { c.Name, o.OrderId };

Cross joins give you every combination of two sets. Useful for test matrices or cartesian products. But two lists of 1,000 items each produce a million results. Memory goes fast.

Aggregation Operations

Sum, Average, Count, Min, Max -- self-explanatory. The interesting one is Aggregate. It is a general-purpose fold: you provide an accumulator function, it walks the entire sequence. String concatenation, running totals, building objects from sequences -- anything that collapses a collection into one result.

Aggregation.cs
var scores = new List<int> { 92, 87, 75, 95, 68, 88, 91 };

// Built-in aggregations
var total = scores.Sum();       // 596
var average = scores.Average(); // 85.14...
var count = scores.Count();     // 7
var highest = scores.Max();     // 95
var lowest = scores.Min();      // 68

// Count with predicate: how many passed (>= 80)?
var passed = scores.Count(s => s >= 80); // 5

// Custom Aggregate: build a space-separated string
var words = new[] { "LINQ", "is", "incredibly", "powerful" };
var sentence = words.Aggregate((current, next) =>
    $"{current} {next}");
// "LINQ is incredibly powerful"

// Aggregate with seed: running product
var nums = new[] { 2, 3, 4 };
var product = nums.Aggregate(1, (acc, n) => acc * n);
// 1 * 2 * 3 * 4 = 24

Console.WriteLine($"Total: {total}, Avg: {average:F1}, Passed: {passed}/{count}");
Console.WriteLine(sentence);

The seeded overload is clean -- separate the initial state from the accumulator logic. But watch out: Min, Max, Average, and seedless Aggregate all throw InvalidOperationException on empty sequences. Guard with Any() or prepend DefaultIfEmpty(). This will bite you in production if you forget.
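A quick sketch of the common guards -- the zero fallback is an arbitrary choice for this example:

```csharp
using System;
using System.Linq;

var empty = Array.Empty<int>();

// empty.Max() would throw InvalidOperationException here

// Guard 1: check with Any() first
var max1 = empty.Any() ? empty.Max() : 0;

// Guard 2: DefaultIfEmpty substitutes a fallback element
var max2 = empty.DefaultIfEmpty(0).Max();

// Guard 3: project to int? -- Max() over an empty sequence of
// nullables returns null instead of throwing
var max3 = empty.Select(n => (int?)n).Max() ?? 0;

Console.WriteLine($"{max1} {max2} {max3}"); // 0 0 0
```

The nullable projection is the quietest of the three; pick whichever reads best in context, but pick one.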

Deferred vs Immediate Execution

This is where people get burned.

LINQ queries do not execute when you define them. They execute when you enumerate the results -- in a foreach, a ToList() call, or any method that needs the actual values. So if the underlying data changes between query definition and enumeration, the query sees the changed data. Sounds obvious written out like this. But in a real codebase with multiple methods passing IEnumerable<T> around, the execution point gets far enough from the definition point that the behavior surprises people.

DeferredExecution.cs
var names = new List<string> { "Alice", "Bob", "Charlie" };

// Deferred: query is defined but NOT executed yet
var query = names.Where(n => n.Length > 3);

// Modify the source AFTER defining the query
names.Add("Diana");

// NOW the query executes -- and it sees Diana!
foreach (var name in query)
    Console.WriteLine(name);
// Output: Alice, Charlie, Diana

// Immediate execution: snapshot the results with ToList()
var snapshot = names.Where(n => n.Length > 3).ToList();
names.Add("Evelyn");

// snapshot does NOT contain Evelyn
Console.WriteLine($"Snapshot count: {snapshot.Count}"); // 3

// Methods that force immediate execution:
// ToList(), ToArray(), ToDictionary(), ToHashSet()
// Count(), Sum(), Average(), Min(), Max(), First(), Single()
// Any(), All(), Contains()

Returns IEnumerable<T>? Deferred. Returns a concrete value? Immediate. Where, Select, OrderBy -- deferred. ToList, Count, First, Any -- immediate.

And if you enumerate the same deferred query twice, the work runs twice. Against a database, that means two SQL queries. Materialize with ToList() when you know you will need the results more than once. Or don't, and watch your DBA file a bug report.
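You can watch the double execution happen by putting a counter in the predicate:

```csharp
using System;
using System.Linq;

int evaluations = 0;

// Side-effecting predicate, only so we can count pipeline runs
var query = Enumerable.Range(1, 5).Where(n =>
{
    evaluations++;
    return n % 2 == 0;
});

var firstPass = query.Count();  // enumerates: 5 predicate calls
var secondPass = query.Count(); // enumerates AGAIN: 5 more
Console.WriteLine(evaluations); // 10

// Materialize once, then reuse the list freely
evaluations = 0;
var cached = Enumerable.Range(1, 5).Where(n =>
{
    evaluations++;
    return n % 2 == 0;
}).ToList();

var a = cached.Count;
var b = cached.Count;
Console.WriteLine(evaluations); // 5
```

Swap the in-memory range for an EF Core DbSet and each of those extra enumerations becomes an extra round trip to the database.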

LINQ Performance Tips

LINQ is not free.

The most common mistake is enumerating the same query multiple times. Pass an IEnumerable<T> to two consumers, and the work runs twice. Materialize with ToList() or ToArray() when reuse is likely. Any() short-circuits on the first element, while Count() > 0 iterates the entire collection, so prefer Any() for emptiness checks. FirstOrDefault() stops on the first match; SingleOrDefault() keeps scanning past the first match to verify uniqueness. Use Single only when you actually need that guarantee.

Filter before you sort. Always. Where is O(n) and OrderBy is O(n log n). Filtering first means sorting a smaller collection, and the difference is dramatic when the filter eliminates most elements.
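A small illustration with made-up readings. Both pipelines return the same ten rows, but the second sorts ten elements instead of a hundred thousand:

```csharp
using System;
using System.Linq;

var readings = Enumerable.Range(0, 100_000)
    .Select(i => (Id: i, Value: i % 997))
    .ToList();

// Slow: sorts 100,000 elements, then keeps a handful
var slow = readings
    .OrderByDescending(r => r.Value)
    .Where(r => r.Id % 10_000 == 0)
    .ToList();

// Fast: filters down to 10 elements, then sorts only those
var fast = readings
    .Where(r => r.Id % 10_000 == 0)
    .OrderByDescending(r => r.Value)
    .ToList();

Console.WriteLine($"{slow.Count} rows either way");
```

Same results, wildly different work: O(n log n) on the full set versus O(n) to filter plus O(k log k) on the survivors.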

If you call Contains() inside a Where predicate against another collection, convert that collection to a HashSet<T> first. HashSet lookup is O(1); List lookup is O(n). LINQ will not make that conversion for you, and I have seen this exact pattern turn a 50ms endpoint into a 12-second one on a list of 100k items.
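A sketch of the fix, with invented orders and flaggedIds collections standing in for the real data:

```csharp
using System;
using System.Linq;
using System.Collections.Generic;

var orders = Enumerable.Range(0, 20_000)
    .Select(i => (OrderId: i, CustomerId: i % 2000))
    .ToList();

var flaggedIds = Enumerable.Range(0, 1000).ToList();

// O(n * m): List<T>.Contains scans the list once per order
var slow = orders
    .Where(o => flaggedIds.Contains(o.CustomerId))
    .ToList();

// O(n): HashSet<T>.Contains is a constant-time hash lookup
var flaggedSet = flaggedIds.ToHashSet();
var fast = orders
    .Where(o => flaggedSet.Contains(o.CustomerId))
    .ToList();

Console.WriteLine(slow.Count == fast.Count); // True
```

One ToHashSet() call before the query, identical results, and the cost of the lookup drops from linear to constant per element.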

For game engine inner loops or real-time processing, plain for loops over arrays or Span<T> will outperform LINQ. Fine. But replacing readable queries with manual loops "just in case" is the kind of premature optimization that makes codebases worse, not better. Write clear LINQ first. Profile. Optimize where the numbers say it matters.

Conclusion

Where LINQ stops helping is anything that needs to translate to efficient SQL through EF Core -- complex CTEs, window functions, database-specific hints. At that point you are fighting the abstraction instead of benefiting from it, and raw SQL or stored procedures are the honest answer.

Nisha Agarwal

.NET Developer & Technical Writer

Nisha is a .NET developer and technical writer. She covers C#, LINQ, Entity Framework, and backend development for the Codertronix community.