C# Case Expression via Extension Methods

As a veteran C# developer yourself, I’m sure you’re familiar with the `switch` statement. Because it is a statement, you cannot use this ever-so-useful construct inside an expression, such as a LINQ query. This is a shame, and it irks me greatly that I have to emulate switch behavior in LINQ with a series of chained ternary operators (a ? b : c ? d : e ? f : g …). Lucky for you, I am more susceptible to NIH than any man alive, so I felt the need to investigate options for making a functional `case` expression for C#, or at least a cheap look-alike. :)

LINQ is great at expressing basic data transformation operations like joins, groupings, and projections, but it’s not so great at conditional processing. A foreach loop with a switch statement would be a better engine for this type of task, but frankly it’s sometimes just too tempting to start by writing a LINQ query to get the job done. LINQ spares you from worrying about implementation details, and it keeps your bug surface area small because you aren’t hand-writing (and potentially botching) these basic transformations yourself.

Let’s look at a naive example query that needs to do conditional processing:

```csharp
// Query expression over `lines` (an IEnumerable<string>); File comes from System.IO.
from line in lines
where line.Length > 0
let cols = line.Split('\t')
where cols.Length == 2
let row = new {
  // 'A' = added, 'D' = deleted, 'M' = modified
  operation = cols[0].Trim(),
  data = cols[1].Trim()
}
let added = "added " + row.data + " " + File.ReadAllLines(row.data).Length.ToString()
let deleted = "deleted " + row.data + " " + File.ReadAllLines(row.data).Length.ToString()
let modified = "modified " + row.data + " " + File.ReadAllLines(row.data).Length.ToString()
select (row.operation == "A") ? added
  : (row.operation == "D") ? deleted
  : (row.operation == "M") ? modified
  : String.Empty
```

This works, but it is wasteful: every `let` clause is evaluated for every row, so we always build the `added`, `deleted`, and `modified` strings (reading the file three times per row) regardless of which operation the row specifies. The more dependent variables you introduce, the more wasted work the query does only to throw it away in the `select`.
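To make the waste concrete, the sketch below runs the same shape of query over in-memory input and counts the expensive calls. `EagerLetDemo`, `ExpensiveLineCount`, and `ExpensiveCalls` are hypothetical names, and the file read is replaced by a stand-in that just returns the path length, so no files are needed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EagerLetDemo
{
    // Counts how many times the "expensive" per-row work runs.
    public static int ExpensiveCalls;

    // Hypothetical stand-in for File.ReadAllLines(path).Length, so the sketch
    // runs without touching the file system.
    static int ExpensiveLineCount(string path)
    {
        ExpensiveCalls++;
        return path.Length;
    }

    public static List<string> Run(IEnumerable<string> lines)
    {
        ExpensiveCalls = 0;
        var query =
            from line in lines
            where line.Length > 0
            let cols = line.Split('\t')
            where cols.Length == 2
            let row = new { operation = cols[0].Trim(), data = cols[1].Trim() }
            // All three let clauses run for every row, whichever one select keeps.
            let added = "added " + row.data + " " + ExpensiveLineCount(row.data).ToString()
            let deleted = "deleted " + row.data + " " + ExpensiveLineCount(row.data).ToString()
            let modified = "modified " + row.data + " " + ExpensiveLineCount(row.data).ToString()
            select (row.operation == "A") ? added
                 : (row.operation == "D") ? deleted
                 : (row.operation == "M") ? modified
                 : String.Empty;
        return query.ToList();
    }
}
```

For the two input rows `"A\tfoo.txt"` and `"D\tbar.txt"`, `ExpensiveCalls` ends at 6 rather than 2: three evaluations per row, two of which are discarded.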

What we really want here is a switch statement, but a statement cannot appear inside an expression; expressions may only be composed of other expressions. Let’s see how this query changes when I introduce the Case() extension method that I wrote for LinqFilter:

```csharp
from line in lines
where line.Length > 0
let cols = line.Split('\t')
where cols.Length == 2
let row = new {
  // 'A' = added, 'D' = deleted, 'M' = modified
  operation = cols[0].Trim(),
  data = cols[1].Trim()
}
select row.operation.Case(
  () => String.Empty,                           // default case
  Match.Case("A", () => "added " + row.data + " " + File.ReadAllLines(row.data).Length.ToString()),
  Match.Case("D", () => "deleted " + row.data + " " + File.ReadAllLines(row.data).Length.ToString()),
  Match.Case("M", () => "modified " + row.data + " " + File.ReadAllLines(row.data).Length.ToString())
)
```

Through the use of generic extension methods and lambda expressions, we get a lot of expressiveness here. This overload of the Case() extension method first accepts a default-case lambda, which is invoked only when none of the other cases match the source value (here, the value of `row.operation`). What follows is a `params CaseMatch<T, U>[]` parameter; `params` is C# syntactic sugar that lets you list the cases as ordinary arguments instead of writing `new CaseMatch<T, U>[] { … }` at the call site.

These `CaseMatch<T, U>` objects are small containers that hold the case’s test value and the lambda to invoke to produce the result of the case expression when that case matches. We use lambdas so that the result expression is not evaluated until a match is made, which avoids unnecessary work and unwanted side effects. Think of it as passing in a function to be evaluated later, rather than an expression that is evaluated at the call site of Case(). There are two generic arguments: `T` is the type you are matching on and `U` is the result type of the Case() method. Just because you are matching on string values doesn’t mean you always want to return a string value from your case expressions. :)

A small static class named `Match` houses a single static method, `Case`, to shorten the syntax for creating `CaseMatch<T,U>` instances. Because generic methods (unlike constructors) support type inference, the compiler automagically determines your `T` and `U` generic types, which significantly shortens the code you have to write to define cases. Otherwise, you would have to write `new CaseMatch<string, string>("A", () => "added " + row.data)` each time. Which looks shorter/simpler to you?

When you call the Case() extension method, the `CaseMatch<T,U>` params array is processed in order, and each test value is compared against the source value that Case() was called on. On the first match, the method invokes that case’s lambda and returns its result. There is no check for duplicate test values, so if you repeat a case, only the first occurrence will ever match. It is a straightforward O(n) linear scan with no precomputation or table lookups.
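The LinqFilter source isn’t reproduced here, so the following is only a minimal sketch, under my own assumptions, of how `CaseMatch<T, U>`, `Match.Case`, and the two Case() overloads described in this post could fit together. The field names `Test` and `Result`, and the position of the comparer parameter in the second overload, are guesses rather than LinqFilter’s actual API.

```csharp
using System;
using System.Collections.Generic;

// Holds one case: the value to match and a deferred result.
// (Field names are hypothetical.)
public sealed class CaseMatch<T, U>
{
    public readonly T Test;          // compared against the source value
    public readonly Func<U> Result;  // invoked only if this case matches

    public CaseMatch(T test, Func<U> result)
    {
        if (result == null) throw new ArgumentNullException("result");
        Test = test;
        Result = result;
    }
}

public static class Match
{
    // A generic method lets the compiler infer T and U from the arguments,
    // which a constructor call cannot do.
    public static CaseMatch<T, U> Case<T, U>(T test, Func<U> result)
    {
        return new CaseMatch<T, U>(test, result);
    }
}

public static class CaseExtensions
{
    // Matches using the default equality comparer for T.
    public static U Case<T, U>(this T source, Func<U> defaultCase, params CaseMatch<T, U>[] cases)
    {
        return source.Case(EqualityComparer<T>.Default, defaultCase, cases);
    }

    // Assumed overload shape: linear scan in declaration order; the first
    // matching case wins, so later duplicates are never reached.
    public static U Case<T, U>(this T source, IEqualityComparer<T> comparer,
                               Func<U> defaultCase, params CaseMatch<T, U>[] cases)
    {
        if (comparer == null) throw new ArgumentNullException("comparer");
        if (defaultCase == null) throw new ArgumentNullException("defaultCase");
        foreach (CaseMatch<T, U> c in cases)
        {
            if (comparer.Equals(source, c.Test))
                return c.Result();
        }
        return defaultCase();
    }
}
```

In this sketch, the call shape used in the query above (default lambda followed by cases) resolves to the first overload, while passing a comparer such as `StringComparer.OrdinalIgnoreCase` as the first argument selects the second. Lambdas for non-matching cases are never invoked.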

Another overload of Case() lets you provide an `IEqualityComparer<T>` instance. IMO this is a big win over the switch statement, which to the best of my knowledge does not allow a custom equality comparer to perform the matching and is limited to the behavior set forth by the C# language specification.

With this ability, you could specify `StringComparer.OrdinalIgnoreCase` to do case-insensitive string matching, something not easily or safely done with the switch statement. The ability to supply a custom `IEqualityComparer<T>` also lets you match on non-primitive types, such as custom classes and structs that cannot normally be used in a switch statement.
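To match on a custom type, you would write a comparer like the hypothetical `PointComparer` below and pass an instance of it to the `IEqualityComparer<T>` overload of Case(). `Point` and `PointComparer` are illustrative names, not part of LinqFilter; the built-in `StringComparer.OrdinalIgnoreCase` plays the same role for case-insensitive strings.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical custom type; instances can't be used as switch case labels.
public sealed class Point
{
    public readonly int X;
    public readonly int Y;

    public Point(int x, int y)
    {
        X = x;
        Y = y;
    }
}

// Treats two Points as equal when their coordinates match,
// instead of using the default reference equality.
public sealed class PointComparer : IEqualityComparer<Point>
{
    public bool Equals(Point a, Point b)
    {
        if (object.ReferenceEquals(a, b)) return true;
        if (a == null || b == null) return false;
        return a.X == b.X && a.Y == b.Y;
    }

    // Must agree with Equals: equal coordinates produce equal hash codes.
    public int GetHashCode(Point p)
    {
        if (p == null) return 0;
        return unchecked(p.X * 397) ^ p.Y;
    }
}
```

Without the comparer, two `Point(1, 2)` instances are different objects and would never match each other; with it, they compare equal.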

To play with this extension method yourself, either browse my LinqFilter SVN repository and download the code from the LinqFilter.Extensions project (CaseMatch.cs and Extensions/CaseExtensions.cs), or download the LinqFilter release and use it in your LINQ queries.

LinqFilter: Run LINQ Code From The Command Line Interface!

Having recently acquired a taste for using git on Windows with msysGit, I’ve become a lot more productive with bash and other command-line tools on Windows. Shifting data around on the command line gets pretty hairy very quickly, though. Unfortunately, the basic set of Un*x text-processing utilities is just not powerful or flexible enough, and each tool usually has its own ridiculous custom syntax to learn. I already know a language powerful enough to process text efficiently, succinctly, and cleanly: LINQ! So I thought to myself, why not use LINQ to write simple one-off text-processing scripts? Creating a new console application every time for this task becomes arduous, to say the least. Enter: LinqFilter!

For the impatient, you may download the latest release of the tool here (ZIP download). If this tool ever becomes popular enough, I have no problems hosting it elsewhere.

LinqFilter is, in a nutshell, a way to dynamically compile user-supplied C# 3.0 LINQ code (targeting .NET 3.5) and execute it instantly, writing each resulting string to Console.Out delimited by newlines or custom delimiters. An input `IEnumerable<string>` named `lines` is provided so the query can read lines from Console.In. Many command-line options are available to customize how LinqFilter behaves.

Let’s take the following example LINQ query:

```
LinqFilter -q "from line in lines select line"
```

This is a simple echo query. It echoes every line read from Console.In to Console.Out, and it does so in a streaming fashion: lines are not stored as they are read or written. As each line comes in, it is run through the query and written out.
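LinqFilter’s host code isn’t shown here, but a lazy line source like the following sketch is one way to get this streaming behavior: an iterator method that reads one line from a TextReader (such as Console.In) each time the query asks for the next item. `StreamingLines`, `ReadLines`, `LinesRead`, and `Echo` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class StreamingLines
{
    // How many lines have been pulled from a reader so far (demonstration only).
    public static int LinesRead;

    // Lazily yields one line at a time; nothing beyond the current line is buffered.
    public static IEnumerable<string> ReadLines(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            LinesRead++;
            yield return line;
        }
    }

    // The echo query: each line is written as soon as the query yields it.
    public static void Echo(TextReader input, TextWriter output)
    {
        IEnumerable<string> query = from line in ReadLines(input) select line;
        foreach (string item in query)
            output.WriteLine(item);
    }
}
```

Calling `StreamingLines.Echo(Console.In, Console.Out)` would echo standard input line by line, and asking only for the first element reads only one line from the reader.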

The “-q” command-line option appends a line of code to the ((QUERY)) buffer. You can supply multiple lines of code by passing multiple “-q” options in order.

How does this work? The LinqFilter tool substitutes your code into the following abbreviated class template:

```csharp
public static class DynamicQuery {
    public static IEnumerable<string> GetQuery(IEnumerable<string> lines, string[] args) {
        ((PRE))
        IEnumerable<string> query = ((QUERY));
        ((POST))
        return query;
    }
}
```

The ((QUERY)) token is your query expression code. The ((PRE)) token is replaced with lines of C# code you supply in order to do one-time, pre-query setup and validation work. The ((POST)) token is replaced with lines of C# code you supply that takes effect after the query variable is assigned. This section is rarely used but is there for completeness.

As you can see, the query is enclosed in a simple static method that returns an IEnumerable<string>. The host console application supplies the lines from Console.In, but your query is not required to use that and can source data from somewhere else, or make up its own. :)
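As an illustration, here is roughly what the generated class might look like for a query that ignores `lines` entirely, e.g. `-q "from i in Enumerable.Range(1, 3) select i.ToString()"`. This is a hypothetical expansion: the using directives LinqFilter actually emits aren’t shown in the abbreviated template, so the ones here are assumptions.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical expansion of the template; the query never touches `lines`.
public static class DynamicQuery
{
    public static IEnumerable<string> GetQuery(IEnumerable<string> lines, string[] args)
    {
        // ((QUERY)) replaced with a query that makes up its own data.
        IEnumerable<string> query = from i in Enumerable.Range(1, 3) select i.ToString();
        return query;
    }
}
```

Whatever the host passes in as `lines`, this query yields "1", "2", and "3".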

The args parameter collects the arguments passed with the “-a <argument>” command-line option, so a query stored in a static file can still use dynamic data passed in from the command line.

Let’s look at an example with a ((PRE)) section:

```
LinqFilter -pre "if (args.Length == 0) throw new Exception(\"Need an argument!\");" -q "from line in lines where line.StartsWith(args[0]) select line.Substring(args[0].Length)" -a "Hello"
```

Here we supply a full C# statement to the ((PRE)) section via the “-pre” command-line option to validate the arguments. The query itself is a simple filter that returns only the lines starting with args[0] (here, “Hello”), with that prefix stripped off.
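Expanded into the template, this example might look like the sketch below. It assumes ((PRE)) is placed before the query assignment, as its description as pre-query setup suggests; the exact generated code is not shown in this post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical expansion of the template for the "Hello" example.
public static class DynamicQuery
{
    public static IEnumerable<string> GetQuery(IEnumerable<string> lines, string[] args)
    {
        // ((PRE)): runs once, as soon as GetQuery is called, before any line is read.
        if (args.Length == 0) throw new Exception("Need an argument!");

        // ((QUERY)): keep lines that start with args[0] and strip that prefix.
        IEnumerable<string> query =
            from line in lines
            where line.StartsWith(args[0])
            select line.Substring(args[0].Length);
        return query;
    }
}
```

Given the input lines `Hello, world`, `Goodbye`, and `Hello!` with `-a "Hello"`, the query yields `, world` and `!`; with no `-a` argument, GetQuery throws before reading any input.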

The best feature of the tool is the ability to store your queries in a separate file and import them with the “-i” option. Let’s leave that for another time.

In the meantime, I encourage you to download the tool and explore its immense usefulness. I must have written 30 or so one-off queries by now, and I find new uses for it every day, which makes it a fantastic tool in my opinion. I’m very glad I took the time to write it, and I hope you find it just as useful as I have!

P.S. – if you ever get lost, just type LinqFilter –help on the command line with no arguments and a detailed usage text will appear. :)

IEnumerable and LINQ

I cannot stress enough the importance of knowing how LINQ queries work when they are based on an IEnumerable source.

When one defines a query based on an IEnumerable source, the query variable represents just that: the query, NOT the results of enumerating the query.

Each time you enumerate the query object, you compute the results of that query on demand; nothing is cached. LINQ to Objects makes no assumption that enumerating the same query twice in a row will produce the same results, so it lets you do so without any qualms.
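A quick way to see this is to count how often a projection runs. In the sketch below, `DeferredDemo` and `Project` are hypothetical names, and the counter exists only to expose the re-evaluation.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Counts how many times the projection runs across all enumerations.
    public static int ProjectionCalls;

    public static IEnumerable<int> BuildQuery(int[] source)
    {
        // Defining the query does no work yet; Project runs per element, per enumeration.
        return from x in source select Project(x);
    }

    static int Project(int x)
    {
        ProjectionCalls++;
        return x * 10;
    }
}
```

Building the query over `{ 1, 2, 3 }` leaves `ProjectionCalls` at 0; calling `Sum()` on it twice returns 60 both times and leaves the counter at 6, because each enumeration runs the projection over all three elements again.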

If what you meant to do was to run the query once and keep the results for later operations, then storing those results in a `List<T>` is a reasonable approach. `List<T>` implements `IEnumerable<T>`, so it can serve as the source for later LINQ queries that work on the results of the first query, without recomputing that query each time.

```csharp
var query = from x in something select x;   // deferred: nothing runs yet
var results = query.ToList();               // runs the query once and stores the results
```

Use the `results` variable when you want to reference the results of `query`.
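The difference matters when the source changes. This hypothetical `SnapshotDemo` defines a query over a `List<int>`, captures its results with ToList, and then adds an element to the source:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SnapshotDemo
{
    // Returns { live query count, snapshot count } after the source grows.
    public static int[] Run()
    {
        var source = new List<int> { 1, 2, 3 };
        var query = from x in source where x > 1 select x;  // deferred
        var results = query.ToList();                         // evaluated once, now

        source.Add(4);

        // The query re-runs against the modified source; results does not change.
        return new[] { query.Count(), results.Count };
    }
}
```

`Run` returns `{ 3, 2 }`: the query sees the new element 4, while `results` still holds the two items computed when ToList ran.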

The same deferred-execution behavior applies to IQueryable LINQ queries. When you enumerate an IQueryable, it calls the underlying IQueryProvider to translate your query operations into whatever form that provider needs to execute the query against its data source.

Be *very* careful when including IQueryable variables in another IQueryable LINQ query, because you are effectively telling your query provider to combine those queries, and it is up to the provider to figure out how to do so or to throw an exception telling you that it’s unsupported or simply not possible. If you instead meant to pass the results of one query into another, you should use the AsEnumerable extension method: operators applied after it bind to the in-memory Enumerable implementations rather than to the provider, which should keep the two queries independent and feed the results of the first into the second. (Calling ToList instead materializes the first query’s results up front.)
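The sketch below uses `AsQueryable` on an array as a stand-in for a real provider (it wraps LINQ’s in-memory EnumerableQuery provider, so nothing is translated to SQL here). It only shows which implementation the next operator binds to; `AsEnumerableDemo`, `Composed`, and `Separate` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class AsEnumerableDemo
{
    // Queryable.Where: the filter is appended to first's expression tree,
    // so the provider must handle both parts together.
    public static IEnumerable<int> Composed(IQueryable<int> first)
    {
        return first.Where(x => x > 2);
    }

    // Enumerable.Where: AsEnumerable hides the IQueryable, so this filter
    // runs in memory over whatever first's provider returns.
    public static IEnumerable<int> Separate(IQueryable<int> first)
    {
        return first.AsEnumerable().Where(x => x > 2);
    }
}
```

With `first = new[] { 1, 2, 3, 4 }.AsQueryable().Where(x => x % 2 == 0)`, both methods yield just `4`, but only `Composed` is still an `IQueryable<int>`. With a database provider, that is the difference between one combined provider query and a provider query followed by in-memory filtering.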

As an aside, the ToList extension method *always* creates a new `List<T>`, regardless of the type of IEnumerable it is called on, even when that is already a `List<T>`. Be careful not to call it more often than necessary, because each call allocates a new list and copies every element into it, which can be very wasteful with memory.
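A short check confirms this; `ToListDemo` is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ToListDemo
{
    // ToList copies even when the source is already a List<T>.
    public static bool ReturnsSameInstance(List<int> list)
    {
        return object.ReferenceEquals(list, list.ToList());
    }
}
```

`ReturnsSameInstance` returns false for any list, and adding to the copy leaves the original list’s count unchanged.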

Also, when your method takes an IEnumerable as a parameter, it is wise to guarantee to the caller that it will be enumerated at most once. You can accomplish this by being careful in your implementation, or by using the ToList extension method to copy the sequence into a local List when you know you need to work on it more than once.
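A minimal sketch of the second approach: `SinglePass.Spread` (a hypothetical helper) needs two passes, one for Min and one for Max, so it copies the input into a local list first and never enumerates the caller’s sequence more than once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SinglePass
{
    // Returns max - min, enumerating `values` exactly once.
    public static int Spread(IEnumerable<int> values)
    {
        if (values == null) throw new ArgumentNullException("values");

        List<int> list = values.ToList();   // the only enumeration of `values`
        if (list.Count == 0) throw new InvalidOperationException("Sequence is empty.");

        // Both passes run over the in-memory list, not the caller's query.
        return list.Max() - list.Min();
    }
}
```

Passing `Enumerable.Range(1, 5).Select(...)` with a side-effecting selector shows the selector running five times in total, not ten, and `Spread` returns 4.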