LinqFilter Code Injection

Unless you’ve been living under a rock, you’ve probably heard of SQL Injection and XSS (cross-site scripting) attacks, to name the most popular forms. These are language-specific variants of the general “Code Injection” attack, which happens when developers concatenate unchecked user input into fully trusted source code that eventually gets executed.

SQL Injection happens when a developer concatenates user input into a SQL query, but the input contains special characters like single-quotes that change the meaning of the query. You can guard against this by properly escaping the raw user input or by passing it to SQL Server out-of-band in a parameterized query. But I’m not here to blather on about SQL Injection attacks.

Today I’d like to talk about C# Code Injection, specifically as it applies to LinqFilter.

In LinqFilter, I naively concatenate what should be untrusted user input LINQ code directly into an implicitly trusted C# code template embedded in the program. I invoke the C# compiler provider on the resulting string and execute the code in the compiled assembly. By concatenating unchecked user input into an existing code template, I’ve opened the door to C# Code Injection: the user has numerous ways of defeating the requirement that the query be a valid C# 3.5 expression.
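For context, compiling such a concatenated template with the CodeDOM compiler provider looks roughly like this (a sketch; `template` and `userQuery` are stand-in variables, and LinqFilter’s actual plumbing differs in detail):

```csharp
using System.CodeDom.Compiler;
using System.Collections.Generic;
using Microsoft.CSharp;

var provider = new CSharpCodeProvider(
    new Dictionary<string, string> { { "CompilerVersion", "v3.5" } });
var options = new CompilerParameters { GenerateInMemory = true };
options.ReferencedAssemblies.Add("System.Core.dll");

// The injection point: unchecked concatenation of user input into trusted code.
string source = template.Replace("((QUERY))", userQuery);

CompilerResults results = provider.CompileAssemblyFromSource(options, source);
// results.CompiledAssembly can then be loaded and its query method invoked via reflection.
```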

One can easily terminate the expression with a ‘;’ and go on to write regular C# statements. LinqFilter will naively concatenate your code that it assumes is an expression into the code template and compile it. The compiler doesn’t care what happened so long as the code compiles as a valid C# program. This defeats the purpose (somewhat) of the LinqFilter tool so it’s generally not wise to do so and you’d only be hurting yourself, unless of course you’re executing someone else’s malicious LinqFilter scripts.
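To make the escape concrete, consider a hypothetical malicious “expression”. Once concatenated into the code template, it compiles as perfectly ordinary statements:

```csharp
// User input (one string):
//   lines.Take(0); System.IO.File.Delete("important.txt"); query = lines
// After concatenation into the template, the compiler sees:
IEnumerable<string> query = lines.Take(0);
System.IO.File.Delete("important.txt");   // arbitrary side effect smuggled past the "expression"
query = lines;
```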

What I’d really like to do is put in a safety check to validate that the user input is indeed a C# 3.5 expression. This involves getting a parser for the C# 3.5 language, not an easy task. All I need to do is simply validate that the input parses correctly as an expression and only an expression. I don’t need to do any type checking or semantic analysis work; the compiler will handle that. I just need to know that the user can’t “escape out” of the expression boundaries.
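(For what it’s worth, this check becomes a few lines with Roslyn, which postdates this post; at the time of writing no such parser was at hand. A sketch:)

```csharp
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

static bool IsSingleExpression(string text)
{
    // ParseExpression consumes the full input text; anything that doesn't fit a
    // lone expression (e.g. a trailing "; File.Delete(...)") surfaces as an
    // error diagnostic attached to the parsed node.
    ExpressionSyntax expr = SyntaxFactory.ParseExpression(text);
    return !expr.GetDiagnostics().Any(d => d.Severity == DiagnosticSeverity.Error);
}
```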

Unfortunately, I could find no prepackaged C# parsers. This means using one of the plethora of available parser-generator toolkits or writing my own. Neither option sounds very attractive at the moment, so I’ll probably hold off on the work entirely since it provides no benefit to me. I know how to use LinqFilter effectively, and there’s clearly no community around it (yet) sharing scripts, so there’s no real security problem. If someone else is concerned enough about the problem, I gladly welcome patches :).

I think I’d also like to figure out how the CAS (code access security) feature works and incorporate it into the generated code, so that queries execute in a safe context and can’t go off destroying your files by invoking File.Delete() or something equally sinister, but this is a low priority as well.
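On the .NET 4 simplified sandboxing model, the usual recipe is a sandboxed AppDomain with an execution-only grant set; something along these lines (a sketch of the general technique, not anything LinqFilter does today):

```csharp
using System;
using System.Security;
using System.Security.Permissions;

// Grant only the right to execute code: no file I/O, no registry, no network.
var grantSet = new PermissionSet(PermissionState.None);
grantSet.AddPermission(new SecurityPermission(SecurityPermissionFlag.Execution));

var setup = new AppDomainSetup { ApplicationBase = AppDomain.CurrentDomain.BaseDirectory };
AppDomain sandbox = AppDomain.CreateDomain("LinqFilterSandbox", null, setup, grantSet);
// Loading the compiled query assembly into `sandbox` makes File.Delete() and
// friends throw SecurityException instead of succeeding.
```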

C# Case Expression via Extension Methods

As a veteran C# developer, you’re surely familiar with the `switch` statement. Since it is a statement, you cannot use this ever-so-useful construct in an expression, such as in a LINQ query. This is a shame, and it irks me greatly that I have to resort to emulating switch behavior with a series of chained ternary operators (a ? b : c ? d : e ? f : g …) in LINQ. Lucky for you, I am more susceptible to NIH than any man alive, so I felt the need to investigate options for building a functional `case` expression for C#, or at least a cheap look-alike :).

LINQ is great at expressing basic data transformation operations like joins, groupings, projections, etc., but it’s not so great at conditional processing. A foreach loop with a switch statement would be a better engine for this type of task, but frankly sometimes it’s just too tempting to start off with a LINQ query to get the job done. Using LINQ frees you from worrying about implementation details, and it shrinks the bug surface area you’d otherwise create by hand-rolling these basic transformations.

Let’s look at an example naive query that needs to do conditional processing:

from line in lines
where line.Length > 0
let cols = line.Split('\t')
where cols.Length == 2
let row = new {
  // 'A' = added, 'D' = deleted, 'M' = modified
  operation = cols[0].Trim(),
  data = cols[1].Trim()
}
let added = "added " + row.data + " " + File.ReadAllLines(row.data).Length.ToString()
let deleted = "deleted " + row.data + " " + File.ReadAllLines(row.data).Length.ToString()
let modified = "modified " + row.data + " " + File.ReadAllLines(row.data).Length.ToString()
select (row.operation == "A") ? added
  : (row.operation == "D") ? deleted
  : (row.operation == "M") ? modified
  : String.Empty

This works but is wasteful in terms of processing since we’re always generating the `added`, `deleted`, and `modified` strings regardless of the input condition. The more dependent variables you introduce, the more waste the query has to generate and select out.

What we really want here is a switch statement, but a statement cannot belong in an expression. Expressions may only be composed of other expressions. Let’s see how this query transforms when I introduce my Case() extension method that I wrote for LinqFilter:

from line in lines
where line.Length > 0
let cols = line.Split('\t')
where cols.Length == 2
let row = new {
  // 'A' = added, 'D' = deleted, 'M' = modified
  operation = cols[0].Trim(),
  data = cols[1].Trim()
}
select row.operation.Case(
  () => String.Empty,                           // default case
  Match.Case("A", () => "added " + row.data + " " + File.ReadAllLines(row.data).Length.ToString()),
  Match.Case("D", () => "deleted " + row.data + " " + File.ReadAllLines(row.data).Length.ToString()),
  Match.Case("M", () => "modified " + row.data + " " + File.ReadAllLines(row.data).Length.ToString())
)

Through the use of generic extension methods and lambda expressions, we get a lot of expressiveness here. This overload of the Case() extension method first accepts a default-case lambda expression, which is invoked only when all other cases fail to match the source value (here, `row.operation`’s value). What follows is a `params CaseMatch<T, U>[]`, which is C# syntactic sugar for writing something along the lines of `new CaseMatch<T, U>[] { … }` at the call site.

These `CaseMatch<T, U>`s are small containers that hold the case match value and the lambda expression to invoke to yield the result of the case expression if the match is made. We use lambdas so that the expression to return is not evaluated until the match is made. This prevents unnecessary work from being done or causing side effects. Think of it as passing in a function to be evaluated rather than hard-coding an expression in the parameter to be evaluated at the call site of the Case() extension method. There are two generic arguments used: `T` and `U`. `T` represents the type you are matching on and `U` represents the type you wish to define as the result of the Case() method. Just because you are matching on string values doesn’t mean you always want to return a string value from your case expressions. :)

A small static class named `Match` houses a single static method `Case` to shorten the syntax of creating `CaseMatch<T,U>` instances. Since static methods can use type inference to automagically determine your `T` and `U` generic types, this significantly shortens the amount of code you have to write to define cases. Otherwise, you would have to write `new CaseMatch<string, string>("A", () => "added " + row.data)` each time. Which looks shorter/simpler to you?

When you call the Case() extension method, the `CaseMatch<T,U>` params array is processed in order and each test value is compared against the source value on which Case() was called. If there is a match, the method returns the evaluated lambda for that case. There is no checking for non-unique test values, so if you repeat a case, only the first will ever receive the match. It is a straightforward O(n) algorithm that does no precomputation or table lookups.
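Putting the pieces together, the machinery described above can be sketched like so (reconstructed from the description; the real LinqFilter.Extensions code may differ in detail):

```csharp
using System;
using System.Collections.Generic;

public sealed class CaseMatch<T, U>
{
    public T TestValue { get; private set; }
    public Func<U> Result { get; private set; }

    public CaseMatch(T testValue, Func<U> result)
    {
        TestValue = testValue;
        Result = result;
    }
}

// Static helper class so type inference can deduce T and U at the call site.
public static class Match
{
    public static CaseMatch<T, U> Case<T, U>(T testValue, Func<U> result)
    {
        return new CaseMatch<T, U>(testValue, result);
    }
}

public static class CaseExtensions
{
    public static U Case<T, U>(this T source, Func<U> defaultCase, params CaseMatch<T, U>[] cases)
    {
        return Case(source, EqualityComparer<T>.Default, defaultCase, cases);
    }

    public static U Case<T, U>(this T source, IEqualityComparer<T> comparer,
                               Func<U> defaultCase, params CaseMatch<T, U>[] cases)
    {
        // O(n) in-order scan; the first matching test value wins.
        foreach (var c in cases)
            if (comparer.Equals(source, c.TestValue))
                return c.Result();   // lambda evaluated only on a match
        return defaultCase();
    }
}
```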

Another overload of Case() is available for you to provide an `IEqualityComparer<T>` instance. This is a big win over the switch statement IMO, which to the best of my knowledge does not allow custom equality comparers to perform the matching logic and is limited to the behavior set forth by the C# language specification.

With this ability, you could specify `StringComparer.OrdinalIgnoreCase` in order to do case-insensitive string matching, something not easily/safely done with the switch statement. The ability to supply a custom IEqualityComparer<T> also opens up the set of possibilities for doing case matches on non-primitive types, like custom classes and structs that would not normally be able to be used in a switch statement.
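For example (with hypothetical values), case-insensitive matching becomes:

```csharp
string result = row.operation.Case(
    StringComparer.OrdinalIgnoreCase,   // "a" and "A" now hit the same case
    () => String.Empty,
    Match.Case("A", () => "added " + row.data),
    Match.Case("D", () => "deleted " + row.data));
```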

In order to play with this extension method for yourselves, either browse my LinqFilter SVN repository and download the code from the LinqFilter.Extensions project (CaseMatch.cs and Extensions/CaseExtensions.cs), or download the LinqFilter release and play with it in your LINQ queries.

LinqFilter: Run LINQ Code From The Command Line Interface!

Having recently acquired a taste for using git on Windows with msysGit, I’ve been getting a lot more productive with my use of bash and other command-line tools in Windows. Shifting data around on the command line gets pretty hairy very quickly. Unfortunately, the basic set of Un*x utilities that process text data is just not powerful/flexible enough, and usually each tool has some ridiculous custom syntax to learn, all of them different. I already know a language powerful enough to process text efficiently, succinctly, and cleanly: LINQ! So I thought to myself, why not take advantage of LINQ to write simple little one-off text-processing scripts? Creating a new console application every time to handle this task becomes arduous, to say the least. Enter: LinqFilter!

For the impatient, you may download the latest release of the tool here (ZIP download). If this tool ever becomes popular enough, I have no problems hosting it elsewhere.

LinqFilter is, in a nutshell, a way to dynamically compile user-supplied C# v3.5 LINQ code and execute it instantly, sending the resulting item strings to Console.Out delimited by newlines or custom delimiters. An input IEnumerable<string> named lines is provided to allow the query to read lines from Console.In. There are many command-line options available to customize how LinqFilter behaves.

Let’s take the following example LINQ query:

LinqFilter -q "from line in lines select line"

This is a simple echo query. It will echo all lines read in from Console.In to Console.Out and it will do so in a streaming fashion. There is no storage of lines read in or written out. As a line comes in, it is run through the query and written out.
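That streaming behavior falls out of a lazy line reader over Console.In; presumably something like:

```csharp
using System.Collections.Generic;
using System.IO;

static IEnumerable<string> ReadLines(TextReader input)
{
    string line;
    while ((line = input.ReadLine()) != null)
        yield return line;   // each line flows through the query as it arrives; nothing is buffered
}
```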

The “-q” command-line option appends a line of code to the ((QUERY)) buffer. You could supply multiple lines of code by supplying multiple “-q” options in order.

How does this work? The LinqFilter tool basically concatenates your query code into the following abbreviated class template:

public static class DynamicQuery {
    public static IEnumerable<string> GetQuery(IEnumerable<string> lines, string[] args) {
        ((PRE))
        IEnumerable<string> query = ((QUERY));
        ((POST))
        return query;
    }
}

The ((QUERY)) token is your query expression code. The ((PRE)) token is replaced with lines of C# code you supply in order to do one-time, pre-query setup and validation work. The ((POST)) token is replaced with lines of C# code you supply that takes effect after the query variable is assigned. This section is rarely used but is there for completeness.

As you can see, the query is enclosed in a simple static method that returns an IEnumerable<string>. The host console application supplies the lines from Console.In, but your query is not required to use that and can source data from somewhere else, or make up its own. :)

The args parameter is used to collect command-line arguments from the “-a <argument>” command-line option so that the query may be stored in a static file yet still use dynamic data passed in from the command line.

Let’s look at an example with a ((PRE)) section:

LinqFilter -pre "if (args.Length == 0) throw new Exception(\"Need an argument!\");" -q "from line in lines where line.StartsWith(args[0]) select line.Substring(args[0].Length)" -a "Hello"

Here we put in a full C# statement in the ((PRE)) section via the “-pre” command-line option to handle validation of arguments. The query itself is a simple filter to only return lines that start with args[0], i.e. “Hello”.
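With those substitutions applied, the generated method for this invocation would look roughly like:

```csharp
public static IEnumerable<string> GetQuery(IEnumerable<string> lines, string[] args)
{
    if (args.Length == 0) throw new Exception("Need an argument!");   // ((PRE))
    IEnumerable<string> query =
        from line in lines
        where line.StartsWith(args[0])
        select line.Substring(args[0].Length);                        // ((QUERY))
    return query;
}
```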

The best feature of the tool is the ability to store your queries off into a separate file and use the “-i” parameter to import them. Let’s leave that for another time.

In the meantime, I encourage you to download the tool and explore its immense usefulness. I must have written 30 or so one-off queries by now. I find new uses for it every day, which makes it a fantastic tool in my opinion, and I’m very glad I took the time to write it. I hope you enjoy it and find it just as useful as I have!

P.S. – if you ever get lost, just type LinqFilter --help on the command line and a detailed usage text will appear. :)

Immutable DataContract generator for Visual Studio 2008 T4

At first glance, using WCF appears to limit one’s ability to work with immutable data structures, namely immutable data contracts (classes decorated with [DataContract] and [DataMember] attributes). After some thought and a little experimentation, I came to a reasonable solution implemented in T4, where one can code-generate the immutable data structure along with a related mutable builder class used to construct instances of said immutable data contract.

First, let me demonstrate and explain a bit of the basic code pattern behind immutable data contracts before we move on to the T4 solution. Hopefully, the code-generation opportunities will strike you as being as obvious as they did me.

[DataContract(Name = "DataContract1")]
public class DataContract1
{
    [DataMember(Name = "ID")]
    private int _ID;
    [DataMember(Name = "Code")]
    private string _Code;
    [DataMember(Name = "Name")]
    private string _Name;

    private DataContract1() { }

    public int ID { get { return this._ID; } }
    public string Code { get { return this._Code; } }
    public string Name { get { return this._Name; } }

    public sealed class Mutable
    {
        public int ID { get; set; }
        public string Code { get; set; }
        public string Name { get; set; }
    }

    public static implicit operator DataContract1(Mutable mutable)
    {
        var imm = new DataContract1();
        imm._ID = mutable.ID;
        imm._Code = mutable.Code;
        imm._Name = mutable.Name;
        return imm;
    }
}

This is the very basic, bare-bones code for a fully-functional immutable data contract. WCF, requiring full-trust access, will serialize and deserialize to/from the private fields marked with the [DataMember] attributes at runtime using reflection in its DataContractSerializer.

Note that public instantiation of the immutable DataContract1 class is strictly forbidden as we only have a default private constructor. No one may access our private fields either. The only exposed public properties are read-only.

Now let’s look at the nested Mutable class. This is how you would normally expect a regular mutable DataContract to be implemented, except it has no [DataContract] attribute on its class declaration nor any [DataMember] attributes on its public properties, so we cannot use it as a WCF data contract. (Strictly speaking, we could, but it wouldn’t be recommended, as it exposes virtually no metadata.)

The really interesting part is the implicit conversion operator defined in the main immutable DataContract1 class. This is a rarely used, albeit powerful, feature of C#. One may implement an implicit or explicit conversion operator to allow two classes or structs to be convertible between one another (or in one direction only), using either implicit cast-like behavior or an explicit cast at the call site. A conversion operator works much like a traditional type cast. You’ve seen these everywhere: it’s when you do `object a; string b = (string)a;`. That (string)a expression is a unary (takes one operand) cast expression. It tells the compiler that you, as the developer, know for absolute sure that the object stored in `a` is of type `string`, and to store the string reference value in the `b` variable.

An explicit conversion operator is closest in appearance to the cast expression. The difference between implicit and explicit is that implicit does not require you to write a cast expression at the conversion site where the conversion is required.

In our case with the immutable (DataContract1) and mutable (DataContract1.Mutable) classes, our implicit conversion operator allows us to “convert” the mutable builder class into a new instance of the immutable class. This will appear completely transparent to the developer, which is a nice feature.

A usage example of this pattern would be as follows:

class Program
{
    static DataContract1 CreateDC1()
    {
        return new DataContract1.Mutable()
        {
            ID = 27,
            Code = "JSD",
            Name = "James S. Dunne"
        };
        // Notice that the `DataContract1.Mutable` reference in the return statement is implicitly converted to
        // the required `DataContract1` return type via the static implicit operator we defined.
    }

    static void Main(string[] args)
    {
        var dc1 = CreateDC1();
        WriteDC1(dc1);
    }

    static void WriteDC1(DataContract1 dc1)
    {
        Console.WriteLine(dc1.ID.ToString());
        Console.WriteLine(dc1.Code);
        Console.WriteLine(dc1.Name);

        dc1.Name = "Bob Smith";              // ERROR! Cannot modify because the property is read-only.
    }
}

This small program demonstrates using the nested Mutable builder class to construct an immutable instance. Creating immutable instances of objects should be done once and only once; from there it is impossible to modify (mutate) them further, unless the types of the properties you expose are themselves mutable.

This immutability guarantee is therefore shallow, and it is up to you to guarantee deep immutability through all of your exposed types. For instance, if you expose a List<T> property via this immutable structure, other functions are free to modify your List by adding, removing, and modifying elements within it, but they cannot change the property to point to a wholly different List<T> reference. For this reason, it is recommended to make use of the .NET framework’s read-only immutable collection classes for exposing lists within your data contract.
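For example, exposing a serialized list through a read-only wrapper keeps callers from mutating it. A sketch with a hypothetical `Tags` member (ReadOnlyCollection&lt;T&gt; being the closest thing available in .NET 3.5):

```csharp
[DataMember(Name = "Tags")]
private List<string> _Tags;

// Callers can enumerate and index, but Add/Remove/set all throw NotSupportedException.
public ReadOnlyCollection<string> Tags
{
    get { return new ReadOnlyCollection<string>(this._Tags); }
}
```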

As a matter of self-discipline, I avoid embedding List<T> and other collection types within my data contracts. I also keep my data contracts free of object references, thereby reducing the potential object graph to a degenerate case of a single object.

T4 Solution

Now that we see the basic immutable data contract pattern and how to use it, let’s explore opportunities to create a T4 code generation solution to make our lives easier so that we don’t have to type up all this boilerplate for our multitudes of data contracts that we have to expose over our service layer and use in our business logic.

[Immutable, DataContract(Name = "DataContract1")]
public partial class DataContract1
{
    /// <summary>
    /// The unique identifier of the data contract.
    /// </summary>
    [DataMember(Name = "ID")]
    private int _ID;

    [DataMember(Name = "Code")]
    private string _Code;

    [DataMember(Name = "Name")]
    private string _Name;
}

This is all the metadata and implementation code that the developer should be required to write in order to help the T4 generate the rest of the boilerplate. The T4 template will generate an output C# source file looking something like this:

public partial class DataContract1
{
    #region Constructors

    private DataContract1()
    {
    }

    public DataContract1(
        int ID,
        string Code,
        string Name
    )
    {
        this._ID = ID;
        this._Code = Code;
        this._Name = Name;
    }

    #endregion

    #region Public read-only properties

    /// <summary>
    /// The unique identifier of the data contract.
    /// </summary>
    public int ID
    {
        get { return this._ID; }
    }

    public string Code
    {
        get { return this._Code; }
    }

    public string Name
    {
        get { return this._Name; }
    }

    #endregion

    #region Mutable nested builder class

    /// <summary>
    /// A mutable class used to assign values to be ultimately copied into an immutable instance of the parent class.
    /// </summary>
    public sealed class Mutable : ICloneable
    {
        #region Public read-write properties

        /// <summary>
        /// The unique identifier of the data contract.
        /// </summary>
        public int ID { get; set; }
        public string Code { get; set; }
        public string Name { get; set; }

        #endregion

        #region ICloneable Members

        /// <summary>
        /// Clones this Mutable instance to a new Mutable instance using simple property assignment.
        /// </summary>
        public object Clone()
        {
            return new Mutable()
            {
                ID = this.ID,
                Code = this.Code,
                Name = this.Name,
            };
        }

        #endregion
    }

    #endregion

    #region Conversion operators

    /// <summary>
    /// Copies the values from the Mutable instance into a new immutable instance of this class.
    /// </summary>
    public static implicit operator DataContract1(Mutable mutable)
    {
        var imm = new DataContract1();
        imm._ID = mutable.ID;
        imm._Code = mutable.Code;
        imm._Name = mutable.Name;
        return imm;
    }

    /// <summary>
    /// Copies the values from this immutable instance into a new Mutable instance.
    /// </summary>
    public static explicit operator Mutable(DataContract1 immutable)
    {
        var mutable = new Mutable();
        mutable.ID = immutable._ID;
        mutable.Code = immutable._Code;
        mutable.Name = immutable._Name;
        return mutable;
    }

    #endregion
}

Notice that this is a lot more verbose than the abridged version posted for demonstration purposes at the top of this post.
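The extra generated pieces buy a convenient copy-and-modify workflow, given some existing immutable instance (here called `existing` for illustration):

```csharp
// Copy an immutable instance back out to a Mutable, tweak one field, convert back:
var mutable = (DataContract1.Mutable)existing;   // explicit operator: cast required
mutable.Name = "New Name";
DataContract1 updated = mutable;                 // implicit operator: no cast needed
```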

How does this all work? The T4 template makes use of EnvDTE, the namespace that Visual Studio exposes for doing in-editor code refactoring work, intended primarily for use by Visual Studio add-ins. The project does not have to be compiled in order for this to work.

I decided against using the Microsoft.Cci or regular reflection APIs, since these require a compiled assembly to do their work, and the partial class we are generating code for lives in the very assembly that cannot yet be compiled, so neither is an acceptable solution. The alternative is to have the developer describe their data contracts in some other format that the T4 template could parse, generating all the code on the developer’s behalf; since that would raise the already-high learning curve for immutable data structures, it was thrown out as well.

Having the developer code up part of the immutable data contract him/herself is beneficial in that the developer has some control over the class itself and is able to add methods, fields, and properties (with some restrictions) yet also gain the benefits of code generation provided by the T4 template.

Of course, to maintain true immutability, it doesn’t make much sense to have any publicly writable properties. The T4 template imposes some limitations like this on the existing partial class code. The T4 template also heavily relies on naming conventions so those are checked and enforced before generating a single line of code. T4 will generate descriptive and helpful warnings in your Error List pane if it detects anything is awry with the existing partial class code.

Download the code

My code for the T4 template is housed within the following public SVN repository:

Get it here (svn://bittwiddlers.org/WellDunne/trunk/public/ImmutableT4). Or browse the repository over the web from here.

The file is named ImmutablePartialsGenerator.tt and lives in the project folder named ImmutableT4 under the main solution folder. Browse the latest version here.

Note that the T4 template requires Visual Studio’s EnvDTE namespace to have a complete view of the code, so if it does not work within the first moments of opening the project, do not be surprised. Give it time to parse over your code and build a complete model and then try the T4 template again. This is a limitation of Visual Studio and not of this template.

IQueryable to LINQ C# syntax formatter

Technically, the title of this article should be System.Linq.Expressions.Expression to LINQ C# syntax formatter but that’s a bit lengthy.

To get right to the point, I’ve developed a class to parse an Expression tree generated by a LINQ IQueryable query and produce C# LINQ syntax as output. I tried searching the internet for similar work and nothing was immediately available or that was in source form for me to integrate with my tool. So, I wrote my own.

Here’s some sample output from the class:

// C# LINQ query:
var query =
     from x in (
          from ea in TableA
          join et in TableB on ea.TableBID equals et.TableBID
          join es in TableC on ea.TableCID equals es.TableCID
          join st in TableD on ea.TableDID equals (int?)st.TableDID
          join sf in TableE on ea.TableEID equals (int?)sf.TableEID
          join eb in TableF on ea.TableFID equals eb.TableFID
          join psf in TableF on ea.PreviousTableFID equals (int?)psf.TableFID into temp from ps in temp.DefaultIfEmpty()
          select new { ea = ea, et = et, es = es, st = st, sf = sf, eb = eb, ps = ps }
     )
     where ((x.ea.DueDate < (DateTime?)DateTime.Now.Date) && (x.es.Code != "C"))
     select x
     .OrderBy(a => a.ea.DueDate)
     .Skip(20)
     .Take(10);

This class is intended primarily for display purposes. It should not be used in its current state for attempting to write out compilable C# LINQ syntax.

Download the code:

LINQ-to-SQL auditing web-based tool

I’ve been developing a LINQ-to-SQL auditing web-based tool for the last few days and I’d like to share some progress with you all.

This auditing tool takes an instance of our data repository implementation class, finds all public methods via reflection, and executes them one-by-one with a special auditing mode enabled. Each of our query methods is required to call one of our various audit methods depending on the query execution scenario. For instance, if enumeration over an IQueryable is needed to pull back multiple results, call AuditEnumerable(query) on that query. If a stored procedure needs to be executed, call AuditStoredProcedure(db => db.MethodToCallProcedure(prm1, prm2)). If you need a single result, either null or an instance, call AuditSingle(query).

Needless to say, all of our data repository methods have been made to follow development patterns such that no query execution is performed while in auditing mode. Conversely, if the auditing mode is turned off, the methods should behave as normal and return actual data from a connection to SQL Server.
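I haven’t shown the audit methods themselves; their shape is roughly as follows (a hypothetical reconstruction: the names match the description above, while `AuditMode`, `db`, and `RecordSql` are stand-ins I invented for illustration):

```csharp
protected IEnumerable<T> AuditEnumerable<T>(IQueryable<T> query)
{
    if (AuditMode)
    {
        // LINQ-to-SQL hands back the generated SQL without executing the query:
        RecordSql(db.GetCommand(query).CommandText);
        return Enumerable.Empty<T>();   // no round-trip to SQL Server in audit mode
    }
    return query.AsEnumerable();        // normal mode: execute as usual
}
```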

This web page is deployed as a single aspx file with no code-behind for “mobility” purposes. It is currently deployed, for better or worse, in our internal WCF ASP.NET services host project right next to our *.svc files that host the service responsible for connecting to our data repository methods. The nice thing is that this tool is zero-configuration and Just Works (TM). It is contained all in one file and can be deployed virtually anywhere in any environment.

The main interface for the tool is a paged listing of our data repository methods and the auditing output per each method. An example:

IPagedData<ActivityDetails> GetT0sByT1IDTypeID(T1ID id0, T2ID id1, T3ID id3, PagingInfo paging, T0SortingInfo sorting)

-- This is a PAGING implementation.
DECLARE @p0 int;
DECLARE @p1 int;
DECLARE @p2 int;
DECLARE @p3 int;
SET @p0 = 777;
SET @p1 = 778;
SET @p2 = 20;
SET @p3 = 10;
SELECT [t8].[ID], ...
FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY [t0].[OrderByColumn]) AS [ROW_NUMBER], [t0].[ID], ...
    FROM [dbo].[Table0] AS [t0]
    INNER JOIN [dbo].[Table1] AS [t1] ON [t0].[T1ID] = [t1].[ID]
    INNER JOIN [dbo].[Table2] AS [t2] ON [t0].[T2ID] = [t2].[ID]
    INNER JOIN [dbo].[Table3] AS [t3] ON [t0].[T3ID] = [t3].[ID]
    INNER JOIN [dbo].[Table4] AS [t4] ON [t0].[T4ID] = [t4].[ID]
    INNER JOIN [dbo].[Table5] AS [t5] ON [t0].[T5ID] = [t5].[ID]
    LEFT OUTER JOIN (
        SELECT 1 AS [test], [t6].*
        FROM [dbo].[Table6] AS [t6]
        ) AS [t7] ON [t0].[T7ID] = ([t7].[ID])
    WHERE ([t0].[T0ID] = @p0) AND ([t0].[T1ID] = @p1)
    ) AS [t8]
WHERE [t8].[ROW_NUMBER] BETWEEN @p2 + 1 AND @p2 + @p3
ORDER BY [t8].[ROW_NUMBER];
GO

This is only a part of the output of the tool so far. I’ve also trimmed some of the more sensitive information from the query. I blogged about the SQL syntax highlighter I developed in my previous post so if you’re curious how I did that go back one post from here.

The method names in the listing page are enabled as hyperlinks to go into a “details” mode. In this details page, you can actually fill in parameter values (and properties found on classes for those parameter types) used to alter the generated SQL query. You can even Execute the method with your parameters and see the resulting output formatted as XML.

We’ve enabled all of our domain models that our data repository methods work with to be [DataContract] decorated. This gets us the ability to serialize these objects into whatever format we desire at any time. I’ve chosen to display the results as XML for simplicity’s sake. I could even format the execution results as JSON using the DataContractJsonSerializer and pass that data into a jQuery-driven data table implementation for a more familiar grid-like view, not to mention a much more user-friendly interface to work with.

I’ve got numerous plans for features I’d like to implement for this tool. I will consider releasing some of the framework and tool code if I can lighten up some of the dependencies involved.

T-SQL HTML formatting code in C#

As usual, when I blog, I only blog about things I find that are unique and that haven’t been posted before. Today is no exception. I give you an HTML formatter for T-SQL, written in C# using Regular Expressions.

public static string HTMLColorizeSQL(string sql)
{
    // HTML-encode first so the SQL text itself cannot inject markup:
    string output = HttpUtility.HtmlEncode(sql);
    // Single-line comments: from -- to the end of the line.
    output = Regex.Replace(output,
        @"^--(?<comment>[^\r\n]*)(?<post>\r\n|$)",
        @"<span class=""sql_comment"">--${comment}</span>${post}",
        RegexOptions.IgnoreCase | RegexOptions.Multiline
    );
    // Keywords (not exhaustive; enough to cover LINQ-to-SQL output).
    output = Regex.Replace(output,
        @"(?<=(\[|\b))(?<keyword>(SELECT|FROM|WHERE|ORDER|INNER|JOIN|OUTER|LEFT|RIGHT|CROSS" +
            @"|DISTINCT|DECLARE|SET|EXEC|NOT|IN|IS|NULL|BETWEEN|GROUP|BY|ASC|DESC|OVER|AS|ON" +
            @"|AND|OR|TOP|GO|CASE|WHEN|ELSE|THEN|IF|BEGIN|END|LIKE))\b",
        @"<span class=""sql_keyword"">${keyword}</span>",
        RegexOptions.IgnoreCase
    );
    // Function names, recognized by the '(' that follows them.
    output = Regex.Replace(output,
        @"(\b(?<keyword>ROW_NUMBER|COUNT|CONVERT|COALESCE|CAST)(?<post>\())",
        @"<span class=""sql_function"">${keyword}</span>${post}",
        RegexOptions.IgnoreCase
    );
    // @parameters.
    output = Regex.Replace(output,
        @"(?<param>\@[\w\d_]+)",
        @"<span class=""sql_param"">${param}</span>",
        RegexOptions.IgnoreCase
    );
    return output;
}

Please allow me to express how much I HATE the use of regular expressions for parsing tasks like this. Normally I would be perfectly content to sit down and hack up a custom parser in C# for T-SQL, or at least try to generate one from existing tools. But all I’m trying to accomplish here is syntax highlighting, so what’s the harm in going with Regex?
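To see the core of the approach in isolation, here is a minimal self-contained sketch of just the keyword pass. Note the assumptions: the keyword list is deliberately trimmed down, and WebUtility.HtmlEncode stands in for HttpUtility.HtmlEncode so it runs without a System.Web reference.

```csharp
using System.Net;
using System.Text.RegularExpressions;

static class SqlHighlightDemo
{
    // Trimmed-down keyword pass: encode, then wrap whole-word keyword
    // matches in a span, case-insensitively.
    public static string HighlightKeywords(string sql)
    {
        string output = WebUtility.HtmlEncode(sql);
        return Regex.Replace(output,
            @"\b(?<keyword>SELECT|FROM|WHERE|ORDER|BY)\b",
            @"<span class=""sql_keyword"">${keyword}</span>",
            RegexOptions.IgnoreCase);
    }
}
```

Calling `SqlHighlightDemo.HighlightKeywords("SELECT Name FROM Users")` returns `<span class="sql_keyword">SELECT</span> Name <span class="sql_keyword">FROM</span> Users` — a single Replace pass wraps each keyword without re-matching inside the inserted spans.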

Note that this implementation comes with a few caveats.

  1. This is not a full keyword or function list, but seems to be enough to cover what LINQ-to-SQL would generate.
  2. The <span> tags are applied even within ‘sql_comment’ spans, so some clever CSS selector trickery is required to properly format comment lines.
  3. Things that are NOT highlighted, because LINQ-to-SQL does not generate them in its output:
    /* block comments */
    SQL Server type names
    SQL strings

The CSS that I am using is in this form:

/* SQL keyword: a span with a 'sql_keyword' class not found within an element that has a 'sql_comment' class */
:not(.sql_comment)>span.sql_keyword
{
	color: #33f;
	font-weight: bold;
}

/* SQL function: a span with a 'sql_function' class not found within an element that has a 'sql_comment' class */
:not(.sql_comment)>span.sql_function
{
	color: #3f6;
	font-weight: bold;
}

/* SQL parameter */
:not(.sql_comment)>span.sql_param
{
	color: #993;
	font-weight: bold;
}

/* SQL single-line comment */
.sql_comment
{
	color: Olive;
	font-weight: bold;
}

The key pattern to take away here is the use of the :not(x) CSS selector. Combined with the child combinator ‘>’, it says to format a keyword, parameter, or function name only when its immediate parent does not have the ‘sql_comment’ class applied — which matches the structure the formatter generates, since keyword spans inside a comment are direct children of the comment span.
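For instance, given hypothetical output fragments like the two below, the keyword rule fires only in the second one:

```html
<!-- keyword span nested inside a comment span: suppressed by :not(.sql_comment) -->
<span class="sql_comment">--<span class="sql_keyword">SELECT</span> disabled for now</span>

<!-- keyword span with no comment parent: styled normally -->
<span class="sql_keyword">SELECT</span> Name
```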

I hope you enjoy that and I hope to see future improvements coming back this way!

Resumes

What’s on your resume? Is it just a collection of what I refer to as buzzword bingo with filler words in between? How does it stack up against other resumes?

Oh, you listed C# 3.0 on your resume? Great! Just what we’re looking for. You have experience with ASP.NET? Wow! What a match! Let me put your resume next to all the other identical copies in my big whoop folder.

When I take a look at a resume I expect to see at least one thing that tells me what you are capable of. One thing that shows what you can do better than others. No, I don’t care if you have experience with the MOSS framework or that you’ve used Ektron CMS. Show me what tasks you’ve done to better the quality of the software you were involved with.

Things that are particularly impressive to me are experience with multi-threaded programming, diagnostic abilities, and having a valid opinion about some piece of technology. Of those, I would have to say that diagnostic abilities reign supreme. Good diagnostic skills let you solve just about any problem that comes across your desk. Find a clever way to describe such abilities on your resume.

I could not care less about the architecture or purpose of your previous projects unless you were primarily or secondarily responsible for designing that architecture. Why waste precious white space on information that tells me nothing about you? I’m not interested in acquiring your previous projects. I’m far more interested in what you can do for me or my company.

Don’t worry about proper resume formatting or whatever your English courses have ingrained in your brain. Nobody gives a rip about the layout of your resume. Focus on content, not form. One page, maybe two pages max. If you feel the need to stretch it out beyond that, only dedicate that space to providing valuable information as to what you have done yourself, not your prior teams.

Furthermore, don’t pigeonhole yourself into breaking up your resume into the de facto standard sections: objective, skills, experience, awards, etc. Just find some organized way to convey the information we as employers (or interviewers) need to see to set you apart. The more unique your resume looks, the better shot you have at grabbing my attention.

Remember, your resume is your advertising material. It should serve a purpose beyond the circular one of simply being a resume. You are on the market. You are selling your skills, knowledge, wisdom, and experience. Mark down some highlights which show you off. Nobody awards points for modesty on a resume.

Algorithmic Test for Senior Software Engineers

Having had enough of the tired old “ask a question, get an answer” interview tactics, I decided to put together a test bed application to test the interviewee’s real-world problem solving skills. Based on feedback from other team members, I did not actually force this upon any of our recent interviewees.

The purpose of the test is to find out if the interviewee can actually develop an algorithm to solve a problem from scratch and can fit it into an existing project. Another critical skill to test is how well the candidate can read specifications and more importantly how attentive to detail he/she is.

This test is based on an algorithm I developed long ago to convert game maps from Wolfenstein 3-D to DooM 2 (which was quite a success, but I have unfortunately since lost the code). The two games are based on very different technologies, and the ways in which they store map information are completely different. Wolf3D stores its maps as a simple 2D square tiled grid filled with block type numbers, where each number represents a specific wall texture type and sometimes an implied function like a door. DooM (and DooM 2) store their maps as a set of polygonal sectors, each with a floor and ceiling height. A sector is made up of a set of line segments with special properties applied to one or both sides of each line segment.

The algorithm that the candidate is tasked to implement must convert the incoming 2D tiled grid filled with solid/empty values into a set of line segments which represent the inner edges of the walls found between a solid and an empty tile. I chose to simplify the requirements to deal only with solid/empty tiles instead of 100-or-so different tile types.

Here is the algorithm’s method signature:

IEnumerable<LineSegment> TransformBlocksToLineSegments(RectangularMap map, Point startingPosition)

There are many ways to implement the algorithm and many detail points to consider:

  • The order of the beginning/ending points of a line segment implies the direction the segment follows. In DooM, this ordering is critical for determining which side of the line is the front and which is the back. An enclosed sector must have all of its line segments wound in clockwise order, i.e. the end of one line segment must be the start of the next, and so on until the end of the last line segment meets the beginning of the first.
  • Consider allowing/disallowing orphaned areas that the player can/cannot reach from the starting point. A recursive flood-fill implementation of the algorithm will skip over orphaned areas whereas a nested-for-loop approach will include all areas, regardless of reachability.
  • Joining line segments per each tile together into the longest-running line segment can be challenging. The simplest case to implement is to create a single line segment per tile’s edges and leave it at that, but that’s very wasteful in terms of number of line segments.
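The simplest variant described above — one segment per tile edge, nested for-loops, no joining — might look something like this sketch. The Point and LineSegment records here are stand-ins I made up for illustration, not the test bed’s actual types, and the grid is reduced to a bool[,] rather than the RectangularMap type in the real signature:

```csharp
using System.Collections.Generic;

// Hypothetical stand-ins for the test bed's actual types:
record Point(int X, int Y);
record LineSegment(Point Start, Point End);

static class WallTracer
{
    // Emit one segment per wall edge of each empty tile that borders a
    // solid tile. No joining of collinear segments, no flood fill --
    // so orphaned areas are included regardless of reachability.
    // solid[y, x] == true means the tile is a wall block; tiles outside
    // the grid count as solid.
    public static IEnumerable<LineSegment> TransformBlocksToLineSegments(bool[,] solid)
    {
        int h = solid.GetLength(0), w = solid.GetLength(1);
        bool SolidAt(int x, int y) =>
            x < 0 || y < 0 || x >= w || y >= h || solid[y, x];

        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
            {
                if (solid[y, x]) continue;
                // Wind each edge clockwise around the empty area
                // (+Y pointing down), so chained segments close a loop.
                if (SolidAt(x, y - 1)) // wall above
                    yield return new LineSegment(new Point(x, y), new Point(x + 1, y));
                if (SolidAt(x + 1, y)) // wall to the right
                    yield return new LineSegment(new Point(x + 1, y), new Point(x + 1, y + 1));
                if (SolidAt(x, y + 1)) // wall below
                    yield return new LineSegment(new Point(x + 1, y + 1), new Point(x, y + 1));
                if (SolidAt(x - 1, y)) // wall to the left
                    yield return new LineSegment(new Point(x, y + 1), new Point(x, y));
            }
    }
}
```

A 1×1 grid containing a single empty tile yields exactly four segments that chain end-to-start into a closed clockwise loop — the winding property the first bullet demands.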

There are 6 sample solutions implementing the algorithm using various combinations of these points to compare the candidate’s implementation against. The test bed application also includes a benchmark feature which compares the averaged run times of each implementation.

The test is intended as a take-home problem where the candidate can spend as much or as little time as he/she likes. The 6 sample implementations are included in binary-only assembly form, though that protection is easy to defeat with a tool like Reflector.

The fun part is the ability to draw out rectangles on the grids and to see the live results of the selected algorithm and how it reacts to the new information in the map. Left-click-dragging will create an empty rectangular space and right-click-dragging will create a solid rectangular space.

If you are planning to use this test, be aware of cheating. Have the candidate explain their implementation to you and be sure they’re not just copying someone else’s solution and taking credit for the work.

Download here:

IEnumerable and LINQ

I cannot stress enough the importance of knowing how LINQ queries work when they are based on an IEnumerable source.

When one defines a query based on an IEnumerable source, the query variable represents just that: the query, NOT the results of enumerating the query.

Each time you enumerate over the query object, you are calculating the results of that query on-demand. There is no caching of results. The LINQ IEnumerable implementation makes no assumption that enumerating the same query twice in a row will produce the same results, and so it lets you do so without any qualms.
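A quick way to see this on-demand behavior is a source that counts how many times its body is started (a contrived example of mine, not a real-world pattern):

```csharp
using System.Collections.Generic;

static class DeferredDemo
{
    public static int EnumerationCount;

    // An iterator whose body runs fresh -- and increments the counter --
    // every time the sequence is enumerated.
    public static IEnumerable<int> Source()
    {
        EnumerationCount++;
        for (int i = 1; i <= 3; i++) yield return i;
    }
}
```

Defining `var query = DeferredDemo.Source().Where(x => x > 1);` leaves EnumerationCount at 0 — nothing has run yet. Calling `query.Count()` twice leaves it at 2, because the whole pipeline, source included, is recomputed for each enumeration.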

If what you meant was to run the query once and store the results for future operations to work on, then creating a List<T> to hold those results in memory is a reasonable approach. List<T> implements IEnumerable<T>, so it is a good candidate to substitute into future LINQ queries that should work on the results of the first query rather than recomputing that query over and over.

var query = from x in something select x;
var results = query.ToList();

Use the `results` variable when you want to reference the results of `query`.

The same caveats apply to IQueryable LINQ queries. When you enumerate an IQueryable, it calls the underlying IQueryProvider to transform your query operations into whatever form is best for that provider to execute the query against its data source.

Be *very* careful when including IQueryable variables in another IQueryable LINQ query, because you will effectively be telling your query provider to combine those queries together, and it will be up to the provider to figure out how to do so or to raise an exception telling you that it’s unsupported or simply not possible. If you instead meant to pass the results of one query into another, use the AsEnumerable extension method. That guarantees the two queries are kept independent and that the results of the first are fed into the second.
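A sketch of the distinction (the in-memory array wrapped with AsQueryable stands in for a real provider-backed query such as LINQ-to-SQL, which is an assumption — LINQ-to-Objects will happily translate both forms):

```csharp
using System.Collections.Generic;
using System.Linq;

static class ComposeDemo
{
    // Composed: the Where below is appended to the provider-backed
    // expression tree, so the provider must translate both queries.
    public static IQueryable<int> Composed(IQueryable<int> inner) =>
        inner.Where(x => x % 2 == 0);

    // Decoupled: AsEnumerable() switches to LINQ-to-Objects, so the
    // provider only ever sees `inner`; the Where below runs in memory
    // on the first query's results.
    public static IEnumerable<int> Decoupled(IQueryable<int> inner) =>
        inner.AsEnumerable().Where(x => x % 2 == 0);
}
```

Both forms produce the same elements here; the difference is *where* the outer Where executes — inside the provider’s translated query, or in memory after the first query’s results come back.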

As an aside, the ToList extension method *always* creates a new List instance, regardless of the type of IEnumerable it is called on. Be careful when using this method too many times because you will create a new list each time it is called. This can be very wasteful with memory.

Also, when taking an IEnumerable as a parameter to your method, it is wise to guarantee to the caller that it will be enumerated at most once. You can accomplish this by being careful in your implementation, or by using the ToList extension method and working on a local List<T> variable when you know you need to traverse the set more than once.
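One way to provide that guarantee when the implementation needs multiple passes — sketched here with a made-up Mean helper, not a real library method:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class Stats
{
    // Needs two passes (a sum and a count), so materialize once up front:
    // the caller's sequence is enumerated exactly once, no matter what
    // kind of IEnumerable they hand us.
    public static double Mean(IEnumerable<int> values)
    {
        List<int> items = values.ToList(); // the only enumeration of `values`
        if (items.Count == 0)
            throw new InvalidOperationException("sequence contains no elements");
        return (double)items.Sum() / items.Count;
    }
}
```

If `values` were enumerated twice instead, a deferred query passed in by the caller would be recomputed — or, for a one-shot source, yield different results on the second pass.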