Integration testing SSIS ETL packages

Integration testing SSIS ETL packages can be quite a challenge! The reason is that the package is often written against several large source databases holding gigabytes (or even terabytes) of data. The only way people have of testing the SSIS package is to run it, which takes several hours, so the feedback loop is very slow. A failed run can also leave a company without an environment for a day (or more) whilst the issue is fixed.

How can we write an automated test around our SSIS package to ensure that any changes are going to work before we run it against our large production databases? The answer is to write a full end-to-end integration test that spins up a source database, inserts some data into it, spins up a destination database, runs the SSIS package and then asserts that the data is now in the destination database. Once we have this in place we can test every aspect of an SSIS package to make sure it is functioning correctly. The build can then be run on a CI server, giving us the same level of confidence about our SSIS package as we have for our production code!

A short rant before I begin explaining how the build works… It continues to amaze me how many companies do not have their database in source control, so the only version of the database is the one sitting on the live instance (and often different versions of it scattered around test environments). When you sit down and think about this it is a crazy situation. It is not that tough to put your database into source control and write a CI build around it. Having your database in source control gives you so many benefits that it is surprising to me it is neglected as an afterthought.

Rant over, let's get on to how to create an integration test around an SSIS package. I have created a proof of concept that you are free to clone and use as you see fit: see SSISIntegrationTesting on github. I have tried to make the readme on the repository as comprehensive as possible, so if that is sufficient feel free to dive over there and check out the code.

The code uses SQL Server LocalDB to create throwaway databases for testing. This is a really useful trick and one that I've used in my SqlJuxt F# database comparison project. The great thing about LocalDB is that it is available on any box where the SQL Server management tools are installed, so you do not even need the full version of SQL Server to get up and running with it. This makes it easy to get the build working on your build server.
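
Under the covers the tests create a uniquely named database on the LocalDB instance and destroy it again afterwards. Here is a minimal sketch of the idea, assuming the (localdb)\MSSQLLocalDB instance name (older installs use (localdb)\v11.0); in the repository this lives behind the _testServer helper:

using System;
using System.Data.SqlClient;

public static class ThrowawayDatabase
{
    // creates a uniquely named database on LocalDB and returns its connection string
    public static string CreateNew()
    {
        const string master = @"Data Source=(localdb)\MSSQLLocalDB;Integrated Security=True";

        // a unique name means parallel test runs cannot collide
        var databaseName = "test_" + Guid.NewGuid().ToString("N");

        using (var connection = new SqlConnection(master))
        using (var command = new SqlCommand($"CREATE DATABASE [{databaseName}]", connection))
        {
            connection.Open();
            command.ExecuteNonQuery();
        }

        return $@"Data Source=(localdb)\MSSQLLocalDB;Initial Catalog={databaseName};Integrated Security=True";
    }
}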

The other trick to making your SSIS package testable is to parameterise your connection strings. To do this go to the data flow view in Visual Studio, right click on each connection in the connection manager pane at the bottom and select "Parameterize". This allows you to pass in a parameter to override the connection string, while defaulting to the existing connection string you have set up.

If we open up SSISIntegrationTesting.sln in the repository you will see the Package1.dtsx SSIS package. This is a very simple package that copies all of the data from the products table in the source database to the products table in the destination database. Obviously in reality your SSIS job will be much more complex than this, but by solving testing for this simple base case we can build from here.

I am a big fan of writing tests using XBehave, which allows you to write meaningful descriptions in your test using Given, When, Then. On top of this I like to use builder classes to build the data, and descriptive methods for asserting the data. In my view you should be able to walk up to a test and see exactly what it is doing. Too many tests have reams and reams of code, forcing you to spend quite a while working out what is going on.

From here on I think the best way to finish this article is to go through the integration test in the project and describe it step by step. I am going to paste the code piece by piece and add a description below each one, although I do not think you will need much of a description as the code is self-describing, as previously mentioned. All of the code for the test is in the PackageScenarios.cs file in the SSISTests project inside SSISIntegrationTesting.sln in the github repository.

 "Given a source database with one row in the products table"
._(() =>
{
    sourceDatabase = _testServer.CreateNew();
    sourceDatabase.ExecuteScript("database.sql");
    var connection = Database.OpenConnection(sourceDatabase.ConnectionString);
    connection.Products.Insert(ProductCode: 1, ShippingWeight: 2f, ShippingLength: 3f,
        ShippingWidth: 4f, ShippingHeight: 5f, UnitCost: 6f, PerOrder: 2);
});

The first step in the test sets up an empty source database. It then runs in our schema, which is stored in the database.sql file (note that in a real project the schema should come from your database CI build). It then uses Simple.Data to insert a product into the products table. Simple.Data is an excellent lightweight ORM that makes it easier to write queries against our database. In this example Simple.Data takes advantage of the C# dynamic type to create the insert statement for us.

"And an empty destination database with a products table"
._(() =>
{
    destDatabase = _testServer.CreateNew();
    destDatabase.ExecuteScript("database.sql");
});

Next we create another database, this time to use as our destination database. Again we run in our schema, which is contained in the database.sql file.

"When I execute the migration package against the source and dest databases"
._(() => result = PackageRunner.Run("Package1.dtsx", new
{
    Source_ConnectionString = sourceDatabase.ConnectionString.ToSsisCompatibleConnectionString(),
    Dest_ConnectionString = destDatabase.ConnectionString.ToSsisCompatibleConnectionString(),                    
}));

Now comes the action of testing the SSIS package. Notice here that we are passing in the connection strings of our source and destination databases. These will override the connection strings stored in the package, so our two test databases will be used.

"Then the package should execute successfully"
._(() => result.Should().BeTrue());

I have built the package runner to return a bool indicating whether or not the package succeeded, which is sufficient for this proof of concept; if you wanted to extend it to return any specific errors that came back then you could do so. Here we just assert that the package ran successfully.
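
To give an idea of what the runner might look like inside, here is a hypothetical sketch using the SSIS runtime API (Microsoft.SqlServer.Dts.Runtime). The real implementation is in the repository; reflecting over the anonymous object to set the package parameters is my assumption about how the overrides are applied:

using System.Reflection;
using Microsoft.SqlServer.Dts.Runtime;

public static class PackageRunner
{
    public static bool Run(string packagePath, object parameters)
    {
        var app = new Application();
        var package = app.LoadPackage(packagePath, null);

        // copy each property of the anonymous object onto the matching package parameter
        foreach (PropertyInfo property in parameters.GetType().GetProperties())
            package.Parameters[property.Name].Value = property.GetValue(parameters);

        return package.Execute() == DTSExecResult.Success;
    }
}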

"And the products table in the destination database should contain the row from the source database"
._(() => destDatabase.AssertTable().Products.ContainsExactlyOneRowMatching(
    new
    {
        ProductCode = 1,
        ShippingWeight = 2f,
        ShippingLength = 3f,
        ShippingWidth = 4f,
        ShippingHeight = 5f,
        UnitCost = 6f,
        PerOrder = 2
    }));

Lastly we assert that the data is in fact in the destination database. This line of code looks quite magical so let me explain how it works. The AssertTable() extension method returns a dynamic, which means after the "." we can put anything we want; in this case we put "Products" as we want the products table. We then override the TryGetMember method on DynamicObject to grab the string "Products" and pass it along to the next method, ContainsExactlyOneRowMatching. Under the covers this method takes the anonymous object that you pass in and uses Simple.Data to construct a SQL query that can be run against the database. This makes the assertion very efficient, as it selects a single row from the products table with a where clause built from all of the fields on the anonymous object. I think the syntax for this is very neat as it allows you to quickly assert data in your database and it is very readable.
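
A simplified sketch of the dynamic part of that trick is shown below; the names are illustrative rather than the repository's exact code:

using System;
using System.Dynamic;

// the object returned by AssertTable(); resolving any member name
// (e.g. .Products) hands that name to a factory that builds the assertion
public class TableAssertions : DynamicObject
{
    private readonly Func<string, object> _createTableAssertion;

    public TableAssertions(Func<string, object> createTableAssertion)
    {
        _createTableAssertion = createTableAssertion;
    }

    public override bool TryGetMember(GetMemberBinder binder, out object result)
    {
        // binder.Name is "Products" when the caller writes .Products
        result = _createTableAssertion(binder.Name);
        return true;
    }
}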

Note all of the databases created by the test are destroyed in the Dispose method of the test. Go into the code if you want to see exactly how this happens.

There we have it: a full, repeatable, end-to-end SSIS integration test. I believe we have the building blocks here to create tests for more complex SSIS builds. I hope this helps you construct your own builds; feel free to use any of the code, or get in touch via the comments if you want any help or advice, or have any suggestions as to how I can make this proof of concept even better.

XBehave – compiling the test model using the Razor Engine

In the last post I left off describing how I implemented the parsing of the XUnit console runner xml in the XUnit.Reporter console app. In this post I want to talk through how you can take advantage of the excellent RazorEngine to render the model to html.

I am going to talk through the console app line by line. The first line of the app:

var reporterArgs = Args.Parse<ReporterArgs>(args);

Here we are using the excellent PowerArgs library to parse the command line arguments into a model. I love how the API for PowerArgs has been designed. It has a slew of features which I won't go into here; for example, it supports prompting for missing arguments out of the box.
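
For illustration, a plausible shape for the ReporterArgs model is sketched below. The property names mirror how the arguments are used later in this post; the PowerArgs attributes are just one way to get the prompting behaviour mentioned above:

using PowerArgs;

public class ReporterArgs
{
    [ArgRequired(PromptIfMissing = true)]
    public string Xml { get; set; }

    [ArgRequired(PromptIfMissing = true)]
    public string Html { get; set; }

    // optional title for the generated report page
    public string PageTitle { get; set; }
}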

Engine.Razor.Compile(AssemblyResource.InThisAssembly("TestView.cshtml").GetText(), "testTemplate", typeof(TestPageModel));

This line uses the RazorEngine to compile the view containing my html, giving the view the key name "testTemplate" in the RazorEngine's internal cache. What is neat about this is that we can deploy TestView.cshtml as an embedded resource so it becomes part of our assembly. We can then use the AssemblyResource class to grab the text from the embedded resource to pass to the Razor engine.
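
For illustration, a cut-down template might look something like the following; only PageTitle appears in this post, so the assemblies property name here is an assumption:

@* hypothetical cut-down TestView.cshtml *@
<html>
  <head><title>@Model.PageTitle</title></head>
  <body>
    @foreach (var assembly in Model.TestAssemblies)
    {
      <h2>@assembly.Name</h2>
    }
  </body>
</html>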

var model = TestPageModelBuilder.Create()
                                .WithPageTitle(reporterArgs.PageTitle)
                                .WithTestXmlFromPath(reporterArgs.Xml)
                                .Build();

We then create the TestPageModel using the TestPageModelBuilder. Using a builder here gives you very readable code. Inside the builder we use the XmlParser from earlier to parse the xml and generate the list of TestAssemblyModels. We also take the optional PageTitle argument to place in the page title of our view template.

var output = new StringWriter();
Engine.Razor.Run("testTemplate", output, typeof(TestPageModel), model);
File.WriteAllText(reporterArgs.Html, output.ToString());

The last three lines simply create a StringWriter, which the engine uses to write the compiled template to. Calling Engine.Razor.Run runs the template we compiled earlier using the key we set, "testTemplate". After this line fires our html will have been written to the StringWriter, so all we have to do is extract it and write it out to the html file path that was passed in.

That's all there is to it. We now have a neat way to extract the Given, When, Then gherkin syntax from our XBehave tests and export it to html in whatever shape we choose. From there you could post it to an internal wiki or email the file to someone; that could all be done automatically as part of a CI build.

If anyone has any feedback on any of the code then that is always welcomed. Please check out the XUnit.Reporter repository for all of the source code.

XBehave – Exporting your Given, When, Then to html

For a project in my day job we have been using the excellent XBehave for our integration tests. I love XBehave in that it lets you use a Given, When, Then syntax over the top of XUnit. There are a couple of issues with XBehave that I have yet to find a neat solution for (unless I am missing something, in which case please tell me). The issues are:

1) There is not a nice way to extract the Given, When, Then gherkin syntax out of the C# assembly for reporting
2) The runner treats each step in the scenario as a separate test

To solve these two problems I am writing an open source parser that takes the xml produced by the XUnit console runner and parses it into a C# model. I can then use Razor to render these models as html and spit out the resultant html.

This will mean that I can take the html and post it up to a wiki, so every time a new build runs the tests it can update the wiki with the latest set of tests that are in the code, and even say whether the tests pass or fail. This allows a business/product owner to review the tests, see which pieces of functionality are covered and which features have been completed.

To this end I have created the XUnit.Reporter github repository. This article will cover the parsing of the xml into a C# model.

A neat class that I am using inside XUnit.Reporter is the AssemblyResource class. This class allows easy access to embedded assembly resources, which means that I can run the XUnit console runner for a test, take the resultant output and add it to the test assembly as an embedded resource. I can then use the AssemblyResource class to load the text back from the xml file with the following line of code:

AssemblyResource.InAssembly(typeof(ParserScenarios).Assembly, "singlepassingscenario.xml").GetText();

To produce the test xml files for the tests I simply set up a console app, added XBehave and then created a test in the state I wanted, for example a single scenario that passes. I then ran the XUnit console runner with the -xml flag set to produce the xml output, copied that output to a test file and named it accordingly.
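
For example, a run might look like this (using the assembly that appears in the xml below):

xunit.console.exe CommandScratchpad.exe -xml singlepassingscenario.xml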

The statistics for the assembly model and the test collection model are not aligned with what I think you would want from an XBehave test. For example, if you have this single XBehave test:

public class MyTest
{
    [Scenario]
    public void MyScenario()
    {
        "Given something"
            ._(() => { });

        "When something"
            ._(() => { });

        "Then something should be true"
            ._(() => { });

        "And then another thing"
            ._(() => { });
    }
}

Then the resultant xml produced by the console runner is:

<?xml version="1.0" encoding="utf-8"?>
<assemblies>
<assembly name="C:\projects\CommandScratchpad\CommandScratchpad\bin\Debug\CommandScratchpad.EXE" environment="64-bit .NET 4.0.30319.42000 [collection-per-class, parallel (2 threads)]" test-framework="xUnit.net 2.1.0.3179" run-date="2017-01-27" run-time="17:16:00" config-file="C:\projects\CommandScratchpad\CommandScratchpad\bin\Debug\CommandScratchpad.exe.config" total="4" passed="4" failed="0" skipped="0" time="0.161" errors="0">
<errors />
<collection total="4" passed="4" failed="0" skipped="0" name="Test collection for RandomNamespace.MyTest" time="0.010">
<test name="RandomNamespace.MyTest.MyScenario() [01] Given something" type="RandomNamespace.MyTest" method="MyScenario" time="0.0023842" result="Pass" />
<test name="RandomNamespace.MyTest.MyScenario() [02] When something" type="RandomNamespace.MyTest" method="MyScenario" time="0.0000648" result="Pass" />
<test name="RandomNamespace.MyTest.MyScenario() [03] Then something should be true" type="RandomNamespace.MyTest" method="MyScenario" time="0.0000365" result="Pass" />
<test name="RandomNamespace.MyTest.MyScenario() [04] And then another thing" type="RandomNamespace.MyTest" method="MyScenario" time="0.000032" result="Pass" />
</collection>
</assembly>
</assemblies>

If you look carefully at the xml you will notice a number of things which are counter-intuitive. Firstly, look at the total in the assembly element: it says 4, when we only had a single test. This is because the runner considers each step to be a separate test. The same goes for the other totals and the totals in the collection element. The next thing you will notice is that the step names in the original test have had a load of junk added to the front of them.

In the parser I produce a model with the results that I would expect. So for the above xml I produce an assembly model with a total of 1 test, 1 passed, 0 failed, 0 skipped and 0 errors, which I think makes much more sense for XBehave tests.
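
The gist of the collapsing is easy to see with LINQ to XML. This is not the repository's exact code, just a sketch of the idea applied to the xml above:

using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

var scenarios = XDocument.Load("singlepassingscenario.xml")
    .Descendants("test")
    // group the per-step test elements back into one logical scenario
    .GroupBy(t => (string)t.Attribute("type") + "." + (string)t.Attribute("method"))
    .Select(g => new
    {
        Scenario = g.Key,
        // strip the generated "MyTest.MyScenario() [01] " prefix from each step name
        Steps = g.Select(t => Regex.Replace((string)t.Attribute("name"), @"^.*\[\d+\]\s", "")).ToList(),
        Passed = g.All(t => (string)t.Attribute("result") == "Pass")
    })
    .ToList();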

Feel free to clone the repository and look through the parsing code (warning: it is a bit ugly). Next time I will talk through the remainder of this app, which renders the test results to html using Razor.

Converting decimal numbers to Roman Numerals in C#

I decided to create a little project to implement converting decimal numbers to Roman Numerals in C#. You can solve this problem in quite a few different ways, I wanted to talk through the pattern based approach that I went for.

The Roman Numerals from 0 to 9 are as follows:

  • 0 = “” (empty string)
  • 1 = I
  • 2 = II
  • 3 = III
  • 4 = IV
  • 5 = V
  • 6 = VI
  • 7 = VII
  • 8 = VIII
  • 9 = IX

To start the project I wrote a set of tests that checked these first 10 cases. I like taking this approach as it allows you to solve for the simple base case, then you can refactor your solution to be more generic. Our first implementation that solves for the first ten numbers is:

public static string ToRomanNumeral(this int integer)
{
    var mapping = new Dictionary<int, string>
    {
        {0, ""},
        {1, "I"},
        {2, "II"},
        {3, "III"},
        {4, "IV"},
        {5, "V"},
        {6, "VI"},
        {7, "VII"},
        {8, "VIII"},
        {9, "IX"},
    };

    return mapping[integer];
}

Obviously there is no error checking etc.; we are just solving for the first ten cases. I decided to implement the method as an extension method on int as it makes the code look neat, allowing you to write:

    var romanNumeral = 9.ToRomanNumeral();

The next step is to look at the Roman Numerals up to 100 and see if we can spot any patterns. We don't want to have to manually type out a dictionary for every Roman Numeral! If we look at the Roman Numeral representations for the tens column from 0 to 90 we find they are:

  • 0 = “” (empty string)
  • 10 = X
  • 20 = XX
  • 30 = XXX
  • 40 = XL
  • 50 = L
  • 60 = LX
  • 70 = LXX
  • 80 = LXXX
  • 90 = XC

We can see straight away that the pattern is exactly the same as for the numbers from 0 to 9, except you replace I with X, V with L and X with C. So let's pull that pattern out into a method that can create the mapping dictionary given the three symbols. Doing this gives you the following method:

private static Dictionary<int, string> CreateMapping(string baseSymbol, string midSymbol, string upperSymbol)
{
    return new Dictionary<int, string>
    {
        {0, ""},
        {1, baseSymbol},
        {2, baseSymbol + baseSymbol},
        {3, baseSymbol + baseSymbol + baseSymbol},
        {4, baseSymbol + midSymbol},
        {5, midSymbol},
        {6, midSymbol + baseSymbol},
        {7, midSymbol + baseSymbol + baseSymbol},
        {8, midSymbol + baseSymbol + baseSymbol + baseSymbol},
        {9, baseSymbol + upperSymbol},
    };
}

We can now call the above method with the correct symbols for the column we want to calculate. So for the units column we would write:

    var unitMapping = CreateMapping("I", "V", "X");

Now we have this, it is straightforward to create the mapping for the hundreds column. To complete our implementation we want to add some error checks, as we are only going to support Roman Numerals from 0 to 4000. The full solution is now quite trivial: we check the input is in our valid range, then for each column we look up the symbol for that digit in the mapping dictionary generated by the function above, concatenate the symbols together using a StringBuilder and return the result.
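
Putting that together, a sketch of the full solution looks like this (it assumes the CreateMapping method from above; the exact code is in the repository):

using System;
using System.Text;

public static class RomanNumeralExtensions
{
    public static string ToRomanNumeral(this int integer)
    {
        if (integer < 0 || integer > 4000)
            throw new ArgumentOutOfRangeException(nameof(integer), "Only 0 to 4000 is supported.");

        var columnMappings = new[]
        {
            CreateMapping("I", "V", "X"),   // units
            CreateMapping("X", "L", "C"),   // tens
            CreateMapping("C", "D", "M"),   // hundreds
        };

        var result = new StringBuilder();

        // the thousands column is just a repeated M up to our 4000 limit
        result.Append(new string('M', integer / 1000));

        // append the symbol for the hundreds, tens and units columns in turn
        for (int column = 2; column >= 0; column--)
        {
            int digit = integer / (int)Math.Pow(10, column) % 10;
            result.Append(columnMappings[column][digit]);
        }

        return result.ToString();
    }
}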

The full solution with all of the tests is available on my GitHub repository: https://github.com/kevholditch/RomanNumerals.

A neat query pattern written in C#

By using nuget packages such as the brilliant Dapper it is possible to access your database in a very concise way without much code. I particularly like the query pattern I'm going to go through today; it's lightweight, simple and encourages composition!

Let's work from the outside in. Clone the query pattern github repository so you can follow along. I have written the program in the repository to work against the Northwind example database. If you haven't got the Northwind database installed you can find information on that over on msdn.

Take a look at the Program.cs file, the essence of which is captured in the code snippet below:

Console.WriteLine("Enter category to search for:");
var name = Console.ReadLine();

var categories = queryExector.Execute<GetCategoriesMatchingNameCriteria, GetCategoriesMatchingNameResult>(
    new GetCategoriesMatchingNameCriteria { Name = name }).Categories;

Console.WriteLine("categories found matching name: {0}", name);

foreach (var category in categories)
    Console.WriteLine(category);

The program above takes an input from the user and then does a like match against the categories in the Northwind database. You can see how few lines of code this has taken to achieve. Note that nowhere do we have reams of ADO.NET code cluttering up the joint.

We are modelling a query as something that takes TCriteria and returns TResult. The interface for a query is shown below:

public interface IQuery<in TCriteria, out TResult>
{
    TResult Execute(TCriteria criteria);
}
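
For completeness, plausible shapes for the criteria and result classes are shown below; these are hypothetical, but they match how the classes are used in the snippets in this post:

using System.Collections.Generic;

public class GetCategoriesMatchingNameCriteria
{
    public string Name { get; set; }
}

public class GetCategoriesMatchingNameResult
{
    public IEnumerable<CategoryEntity> Categories { get; set; }
}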

By representing a query in this way and using Dapper the implementation is very short and to the point:

public class GetCategoriesMatchingNameQuery : IQuery<GetCategoriesMatchingNameCriteria, GetCategoriesMatchingNameResult>
{
    private readonly IDbConnection _dbConnection;

    public GetCategoriesMatchingNameQuery(IDbConnection dbConnection)
    {
        _dbConnection = dbConnection;
    }

    public GetCategoriesMatchingNameResult Execute(GetCategoriesMatchingNameCriteria matchingNameCriteria)
    {                        
        // escape the like wildcards in the search term ("[" must be escaped
        // first so it does not mangle the brackets added by the later replaces)
        string term = "%" + matchingNameCriteria.Name.Replace("[", "[[]").Replace("%", "[%]").Replace("_", "[_]") + "%";

        var result = new GetCategoriesMatchingNameResult
        {
            Categories = _dbConnection.Query<CategoryEntity>(@"select CategoryID, CategoryName, Description from categories where categoryName like @term", new { term })                
        };

        return result;
    }
}

Note that in the class above the actual query is three lines of code! It could be done in one line if you wanted, but that would make the code harder to read. As the queries are created automatically using an abstract factory in Castle Windsor, we can decorate them to apply caching or logging across the board, or even both; any cross-cutting concern you can think of can be added easily. Queries can also be composed together: simply have a query take a dependency on another query (or multiple queries) and chain them together.
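
As an example of the decoration point, here is a hypothetical caching decorator. It assumes the criteria type implements value equality so it can be used as a dictionary key:

using System.Collections.Concurrent;

public class CachedQuery<TCriteria, TResult> : IQuery<TCriteria, TResult>
{
    private readonly IQuery<TCriteria, TResult> _inner;
    private readonly ConcurrentDictionary<TCriteria, TResult> _cache =
        new ConcurrentDictionary<TCriteria, TResult>();

    public CachedQuery(IQuery<TCriteria, TResult> inner)
    {
        _inner = inner;
    }

    public TResult Execute(TCriteria criteria)
    {
        // only hit the database the first time a given criteria is seen
        return _cache.GetOrAdd(criteria, c => _inner.Execute(c));
    }
}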

I really love how clean and concise this code is. By letting Dapper do the heavy lifting we aren't bogged down with lots of ADO.NET code that isn't part of the IP of your business application.

An easy way to test custom configuration sections in .Net

Due to the way the ConfigurationManager class works in the .NET framework, it doesn't lend itself very well to testing custom configuration sections. I've found a neat solution to this problem that I want to share. You can see all of the code in the EasyConfigurationTesting repository on github.

Most of the problems stem from the fact that the ConfigurationManager class is static. A good takeaway from this is that static classes are hard to test.

One approach to testing your own custom config section would be to put your config section in an app.config inside your test project. This would work to the extent that you could read the configuration section and test each value, but it would be hard, error prone and hacky to test anything other than what was in the app.config file when the test ran.

In the demo code on github the config section we are trying to test has a range element allowing you to set the min and max. We want to test that we can successfully read the min and max values from the config, and what happens in different scenarios, such as omitting the max value from the config section. Take a look at how readable the following test is:

TestAppConfigContext testContext = BuildA.NewTestAppConfig(@"<?xml version=""1.0""?>
            <configuration>
              <configSections>
                <section name=""customFilter"" type=""Config.Sections.FilterConfigurationSection, Config""/>
              </configSections>
              <customFilter>
                <range min=""1"" max=""500"" />
              </customFilter>
            </configuration>");

var configurationProvider = new ConfigurationProvider(testContext.ConfigFilePath);
var filterConfigSection = configurationProvider.Read<FilterConfigurationSection>(FilterConfigurationSection.SectionName);

filterConfigSection.Range.Min.Should().Be(1);
filterConfigSection.Range.Max.Should().Be(500);

testContext.Destroy();

Notice how we can pass in a string to represent the configuration we want to test, get the section based upon that string, assert the values from the section and then destroy the test context.

Under the covers this works by creating a temporary text file based upon the string you pass in. That temporary text file is then set as the config file to use for ConfigurationManager. The location of the temporary file is stored in the TestAppConfigContext class which also has a helpful destroy method to clean up the temporary file after the test is complete.
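
A minimal sketch of one way to do this is shown below, using the standard mapped-configuration APIs; the real ConfigurationProvider in the repository may differ in the details:

using System.Configuration;

public class ConfigurationProvider
{
    private readonly string _configFilePath;

    public ConfigurationProvider(string configFilePath)
    {
        _configFilePath = configFilePath;
    }

    public T Read<T>(string sectionName) where T : ConfigurationSection
    {
        // point ConfigurationManager at the temporary file instead of app.config
        var fileMap = new ExeConfigurationFileMap { ExeConfigFilename = _configFilePath };
        var configuration = ConfigurationManager.OpenMappedExeConfiguration(fileMap, ConfigurationUserLevel.None);
        return (T)configuration.GetSection(sectionName);
    }
}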

By using the builder pattern it makes the test very readable. If you read the test out loud from the top it reads “build a new test app config”. Code that describes itself in this way is easy to understand and more maintainable.

A cool trick that I use inside the builder is to implement the implicit operator to convert from TestAppConfigBuilder to TestAppConfigContext. This means that instead of writing:

var testContext = BuildA.NewTestAppConfig(@"...").Build();

You can write:

TestAppConfigContext testContext = BuildA.NewTestAppConfig(@"...");

Notice that in the second version you can omit the call to Build() because the implicit operator takes care of the conversion (which incidentally is implemented as a call to Build). You could argue for keeping the explicit call to Build() as it means you can use var rather than the type name, but I personally prefer the implicit conversion.
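
A sketch of the trick is shown below; the builder internals here are hypothetical, only the implicit operator mirrors what the real code does:

using System.IO;

public class TestAppConfigBuilder
{
    private string _configXml;

    public TestAppConfigBuilder WithConfigXml(string configXml)
    {
        _configXml = configXml;
        return this;
    }

    public TestAppConfigContext Build()
    {
        // write the config xml to a temp file and hand its path to the context
        var path = Path.GetTempFileName();
        File.WriteAllText(path, _configXml);
        return new TestAppConfigContext(path);
    }

    // delegating to Build() is what lets the explicit call be omitted
    public static implicit operator TestAppConfigContext(TestAppConfigBuilder builder)
    {
        return builder.Build();
    }
}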

With this pattern we can easily add more tests with different config values:

TestAppConfigContext testContext = BuildA.NewTestAppConfig(@"<?xml version=""1.0""?>
            <configuration>
              <configSections>
                <section name=""customFilter"" type=""Config.Sections.FilterConfigurationSection, Config""/>
              </configSections>
              <customFilter>
                <range min=""1"" />
              </customFilter>
            </configuration>");


var configurationProvider = new ConfigurationProvider(testContext.ConfigFilePath);
var filterConfigSection = configurationProvider.Read<FilterConfigurationSection>(FilterConfigurationSection.SectionName);

filterConfigSection.Range.Min.Should().Be(1);
filterConfigSection.Range.Max.Should().Be(100);

testContext.Destroy();

In the test above we are making sure we get a max value of 100 if one is not provided in the configuration. This gives us the safety net of a failing unit test should someone update this code.

I think this pattern is a really neat way to test custom configuration sections. Feel free to clone the github repository with the sample code and give feedback.

BuilderFramework – Dependent steps

Last time I started to detail a new open source builder framework that I was writing. Today I want to talk about dependent steps.

It is important for the builder to support dependent steps. For example, you might have one step to create an order and another step to create a customer. Obviously the step to create the customer will need to run before the step to create the order. When reversing these steps they will need to run in the opposite order.

To manage this I have written an attribute, DependsOnAttribute. This attribute takes a type as its constructor parameter and allows you to annotate a step with the steps it depends on. For example:

public class CreateCustomerStep : IBuildStep { ...

[DependsOn(typeof(CreateCustomerStep))]
public class CreateOrderStep : IBuildStep { ...

To support this, the commit method needs to sort the steps into dependency order. It also needs to check for a circular dependency and throw an exception if one is found. We are going to need a separate class for managing the sorting of the steps (remember single responsibility!). The interface is as follows:

public interface IBuildStepDependencySorter
{
    IEnumerable<IBuildStep> Sort(IEnumerable<IBuildStep> buildSteps);
}
}

Before we implement anything we need a set of tests to cover all of the use cases of the dependency sorter; that way, when all of the tests pass, we know our code is good. I always like to work in a TDD style. (The tests I have come up with can be seen in depth on the github source page or by cloning the source.)

At a high level these are the tests we need:

  • A simple case where we have 2 steps where one depends on the other
  • A simple circular reference with 3 steps throws an exception
  • A complex circular reference with 5 steps throws an exception
  • A more complex but valid 4 step dependency hierarchy gets sorted correctly
  • A multiple dependency (one step dependent on more than one other step) gets sorted correctly

It is so important to spend a decent amount of time writing meaningful tests that cover all of the use cases of your code. Once you have done this it makes it so much easier to write the code. I see so many people writing the code first and then retrofitting the tests. Some developers also claim that they haven't got time to write tests; I can't see this logic. When writing tests first your code is quicker to write, as you know when the code is working. If you write your tests last then you are caught in a horrible debugger cycle trying to work out what's going wrong and why. You should rarely, if ever, need the debugger.

To implement dependency sorting we need a topological sort, as detailed on Wikipedia. I have decided to implement the algorithm first described by Kahn (1962).

Here is the pseudocode for the algorithm:

L ← Empty list that will contain the sorted elements
S ← Set of all nodes with no incoming edges
while S is non-empty do
    remove a node n from S
    add n to tail of L
    for each node m with an edge e from n to m do
        remove edge e from the graph
        if m has no other incoming edges then
            insert m into S
if graph has edges then
    return error (graph has at least one cycle)
else
    return L (a topologically sorted order)

Here is that code in C#:

public class BuildStepDependencySorter : IBuildStepDependencySorter
{
    private class Node
    {
        public Node(IBuildStep buildStep)
        {
            BuildStep = buildStep;
            IncomingEdges = new List<Edge>();
            OutgoingEdges = new List<Edge>();
        }

        public IBuildStep BuildStep { get; private set; }
        public List<Edge> IncomingEdges { get; private set; }
        public List<Edge> OutgoingEdges { get; private set; }
    }

    private class Edge
    {
        public Edge(Node sourceNode, Node destinationNode)
        {
            SourceNode = sourceNode;
            DestinationNode = destinationNode;
        }

        public Node SourceNode { get; private set; }
        public Node DestinationNode { get; private set; }

        public void Remove()
        {
            SourceNode.OutgoingEdges.Remove(this);
            DestinationNode.IncomingEdges.Remove(this);
        }
    }

    public IEnumerable<IBuildStep> Sort(IEnumerable<IBuildStep> buildSteps)
    {
        List<Node> nodeGraph = buildSteps.Select(buildStep => new Node(buildStep)).ToList();

        foreach (var node in nodeGraph)
        {
            var depends = (DependsOnAttribute[])Attribute.GetCustomAttributes(node.BuildStep.GetType(), typeof(DependsOnAttribute));
            var dependNodes = nodeGraph.Where(n => depends.Any(d => d.DependedOnStep == n.BuildStep.GetType()));

            var edges = dependNodes.Select(n => new Edge(node, n)).ToArray();
            node.OutgoingEdges.AddRange(edges);

            foreach (var edge in edges)
                edge.DestinationNode.IncomingEdges.Add(edge);
        }

        var result = new Stack<Node>();
        var sourceNodes = new Stack<Node>(nodeGraph.Where(n => !n.IncomingEdges.Any()));
        while (sourceNodes.Count > 0)
        {
            var sourceNode = sourceNodes.Pop();
            result.Push(sourceNode);

            for (int i = sourceNode.OutgoingEdges.Count - 1; i >= 0; i--)
            {
                var edge = sourceNode.OutgoingEdges[i];
                edge.Remove();

                if (!edge.DestinationNode.IncomingEdges.Any())
                    sourceNodes.Push(edge.DestinationNode);
            }
        }

        if (nodeGraph.SelectMany(n => n.IncomingEdges).Any())
            throw new CircularDependencyException();

        return result.Select(n => n.BuildStep);
    }
}

Imagine how hard this code would've been to get right with no unit tests! When you have unit tests and NCrunch, an indicator simply goes green when it works! If you haven't seen or heard of NCrunch before, definitely check it out; it is a fantastic tool.

Now that we have the dependency sorter in place, all we need to do is add some more tests to the builder class. These tests ensure that the steps are sorted into dependency order before they are committed, and sorted in reverse dependency order when they are rolled back. With those tests in place it is quite trivial to update the builder to sort the steps for commit and rollback (see the snippet from the builder below):

private void Execute(IEnumerable<IBuildStep> buildSteps, Action<IBuildStep> action)
{
    foreach (var buildStep in buildSteps)
    {
        action(buildStep);
    }
}

public void Commit()
{
    Execute(_buildStepDependencySorter.Sort(BuildSteps),
                buildStep => buildStep.Commit());
}

public void Rollback()
{
    Execute(_buildStepDependencySorter.Sort(BuildSteps).Reverse(),
                buildStep => buildStep.Rollback());
}

I love how clean that code is. When your code is short and to the point like this it is so much easier to read, maintain and test. That is the importance of following SOLID principles.

As always I welcome your feedback so feel free to tweet or email me.