Integration testing SSIS ETL packages can be quite a challenge! The reason for this is often the package is written against several large source databases with several gigabytes (or even terabytes) of data. The only way people have of testing the SSIS package is to run it, which takes several hours so the feedback loop is very slow. This also can leave a company without an environment for a day (or more) whilst the issue is fixed.
How can we go about writing an automated test around our SSIS package to ensure that any changes are going to work before we run it against our large production databases? The answer to this is by writing a full end to end integration test that spins up the source database, inserts some data into it, spins up a destination database then runs the SSIS package and then asserts the data is now in the destination database. Once we have this in place we can test every aspect of an SSIS package to make sure it is functioning correctly. The build can then be run on a CI server so we have the same level of confidence about our SSIS package as we do for our production code!
A short rant before I begin explaining how the build works… It continues to amaze me how many companies do not have their database in source control so the only version of the database is the one sitting on the instance in live (and often different versions of it scattered around test environments). When you sit down and think about this it is a crazy situation. It is not that tough to put your database into source control and write a CI build around it. Having your database in source control gives you so many benefits its so surprising to me it is neglected as an after thought.
Rant over lets get on to how to create an integration test around an SSIS package. I have created a proof of concept that you are free to clone and use as you see fit see SSISIntegrationTesting on github. I have tried to make the readme on the repository as comprehensive as possible so if that is sufficient feel free to dive over there and check out the code.
The code uses the local sql server instance to create throw away databases for testing. This is a really useful trick and one that I’ve used in my SqlJuxt F# database comparison project. The great thing about using the local db instance on sql server is that it is available on any box where sql server management tools are installed. So you do not even need the full version of sql server installed to get up and running with it. So it makes it easy to get the build up and running on your build server.
The other trick to making your SSIS package testable is by parameterising your connection strings. To do this go to the data flow view in visual studio and click on the data flow tab. From there right click on the connection in the connection manager pane at the bottom and select “parameterize”. This allows you to pass in a parameter to override the connection string but it will default to the existing connection string you have set up.
If we open up the SSISIntegrationTesting.sln in the repository you will see the package1.dtsx SSIS package. This is a very simple package that uses ETL to copy all data from the products table in the source database to a products table in the destination database. Obviously in reality your SSIS job will be much more complex than this but by solving testing for this simple base case we can build from here.
I am a big fan of writing your tests using XBehave. This allows you to write meaningful descriptions in your test using Given, When, Then. On top of this I like to use builder classes to build the data and write descriptive methods for asserting the data. In my view the test should be readable in that you should be able to walk up to it and realise exactly what it is doing. Too many tests in my view have reams and reams of code and you have to spend quite a while working out what is going on.
From here on I think the best way to finish this article is to go through the integration test in the project and describe it step by step. I am going to paste the code line by line and then add a description below it. Although I do not think you will really need much of a description as the code is self describing as previously mentioned. All of the code for the test is in the PackageScenarios.cs file in the SSISTests project inside the SSISIntegrationTesting.sln in the github repository.
"Given a source database with one row in the products table"
sourceDatabase = _testServer.CreateNew();
var connection = Database.OpenConnection(sourceDatabase.ConnectionString);
connection.Products.Insert(ProductCode: 1, ShippingWeight: 2f, ShippingLength: 3f,
ShippingWidth: 4f, ShippingHeight: 5f, UnitCost: 6f, PerOrder: 2);
The first step in the test sets up an empty source database. It then runs in our schema which is stored in the database.sql file. Note in a real project the schema should come from your database CI build. It then uses Simple.Data to insert a product into the products table. Simple.Data is an excellent lightweight ORM that we can use to make it easier to write queries against our database. In this example Simple.Data takes advantages of the C# dynamic type to create an insert statement for us.
"And an empty destination database with a products table"
destDatabase = _testServer.CreateNew();
Next we create an another database this time to use for our destination database. Again we run in our schema which is contained in the database.sql file.
"When I execute the migration package against the source and dest databases"
._(() => result = PackageRunner.Run("Package1.dtsx", new
Source_ConnectionString = sourceDatabase.ConnectionString.ToSsisCompatibleConnectionString(),
Dest_ConnectionString = destDatabase.ConnectionString.ToSsisCompatibleConnectionString(),
Now comes the action of testing the SSIS package. Notice here that we are passing in the connection strings of our source and destination SSIS packages for use. This will override the connection strings in the package so our two test databases will be used.
"Then the package should execute successfully"
._(() => result.Should().BeTrue());
I have built the package runner to return a bool as to whether or not it succeeded. Which is sufficient for this proof of concept but if you wanted to extend this to return any specific errors that came back then you could do so. Here we just assert that the package ran successfully.
"And the products table in the destination database should contain the row from the source database"
._(() => destDatabase.AssertTable().Products.ContainsExactlyOneRowMatching(
ProductCode = 1,
Lastly we assert that the data is in fact in the destination database. This line of code looks quite magical so let me explain how it works. The AssertTable() extension method returns a dynamic which means after the “.” we can put anything we want, in this case we put “Products” as we want the products table. We then override the TryGetMember method on dynamic object to grab the string “products” and pass that along to our next method which is ContainsExactlyOneRowMatching. This method under the covers takes the anonymous C# that you pass in and uses Simple.Data to construct a sql query that can be run against the database. This means that this assertion is very efficient as it tries to select a single row from the products table with a where clause with all of the fields you pass in using the anonymous object. I think the syntax for this is very neat as it allows you to quickly assert data in your database and it is very readable.
Note all of the databases created by the test are destroyed in the Dispose method of the test. Go into the code if you want to see exactly how this happens.
There we have it, a full repeatable end to end SSIS integration test. I believe we have the building blocks here to create tests for more complex SSIS builds. I hope this helps you constructing your own builds, feel free to use any of the code or get in touch via the comments if you want any help/advice or have any suggestions as to how I can make this proof of concept even better.