Exception Caught

Routing outbound webhook calls through Nginx on AWS

At form3 I have recently solved an interesting problem to route our outbound web hook calls via Nginx. Before I dive into how I achieved this I want to set the landscape as to why you would want to do this.

For web hook calls your customer has to open up a https endpoint on the internet, so you really want to present a client certificate so the person receiving your call can verify it really is you calling them. You could add the client auth code directly inside your service (our service is written in Java). Although this becomes a pain for a number of reasons. Firstly the code for doing this in Java is pretty cumbersome. Secondly it means you now need to deploy this client certificate inside your application or load it in somehow, which shouldn’t really be the concern of the application.

The solution to all of this is to route the outbound call through Nginx and let Nginx take care of the client auth. This is much neater as it means the application doesn’t need to worry about any certificates, instead it can simply post to Nginx and let Nginx do all of the heavy TLS lifting.

The above design is what we end up with. Note we route everything using Linkerd to Linkerd. If you want to read more about how we set this up on AWS ECS you can read my blog post on it. If the web hook url for a customer was https://kevinholditch.co.uk/callback then the notification api will post this through Linkerd to Nginx using the url https://localhost:4140/notification-proxy/?address=http%3A%2F%2Fkevinholditch.co.uk%2Fcallback. Note we have to url encode the callback address. We then setup nginx to proxy the request onto the address provided in the address query string parameter. This is done using the following Nginx server config:

	server {
	server_name notification_proxy;
	listen 8117;

	location / {
	resolver 127.0.0.11 ipv6=off;

	set_by_lua_block $proxy_addr {
	return ngx.unescape_uri(ngx.var.arg_address)
	}

	proxy_pass $proxy_addr;
	proxy_ssl_server_name on;
	proxy_ssl_certificate /etc/nginx/certs/client.cert.pem;
	proxy_ssl_certificate_key /etc/nginx/certs/client.key.pem;
	}
	}

view raw

nginx proxy.conf

hosted with ❤ by GitHub

The set_by_lua block is a piece of lua code that url decodes the address query string parameter. the ngx.var.arg_address relates to the address query string parameter. Note that you can replace address with anything to read any parameter from the query string. This address is then used on line 12 as the proxy_pass parameter. Lines 13 and 14 do the client auth signing.

The last trick to making all of this work was working out that the IP address on line 6 needs to change based on where you are running this. This line is basically telling Nginx which DNS server to use to look up the proxy address. Locally on my machine I need to use 127.0.0.11 (the docker DNS server) however on AWS this address changes. The last part of the jigsaw was working out how to find this dynamically. You can do that by issuing the following command cat /etc/resolv.conf | grep nameserver | cut -d' ' -f2-. Once I had cracked that, I then updated the Nginx config file to be a template:

	server {
	server_name notification_proxy;
	listen 8117;

	location / {
	resolver $DNS_ADDR ipv6=off;

	set_by_lua_block $proxy_addr {
	return ngx.unescape_uri(ngx.var.arg_address)
	}

	proxy_pass $proxy_addr;
	proxy_ssl_server_name on;
	proxy_ssl_certificate /etc/nginx/certs/client.cert.pem;
	proxy_ssl_certificate_key /etc/nginx/certs/client.key.pem;
	}
	}

view raw

nginx.template

hosted with ❤ by GitHub

We then use something like the following startup script on our docker container that derives from openresty/openresty :

	#!/usr/local/bin/dumb-init /bin/bash
	set -e

	export DNS_ADDR=`cat /etc/resolv.conf \| grep nameserver \| cut -d' ' -f2-`
	envsubst '$DNS_ADDR' < /etc/nginx/nginx.template > /usr/local/openresty/nginx/conf/nginx.conf

	/usr/local/openresty/bin/openresty -g "daemon off;"

view raw

envsubst nginx.sh

hosted with ❤ by GitHub

The clever part here is that we are setting an Environment variable on the fly which is the IP address of the DNS server using the command we worked out above. Then we are using envsubst to substitute any environment variables in our template config file and writing our templated file out to disk. So when Nginx starts the IP address will be correct. This will all happen as the container starts so wherever the container is running (locally or on AWS) it will get the correct IP address and work.

Full stack testing in golang with docker containers

I like to practice the approach of full stack component testing where the guiding principle is that you test the entire component from as high level as possible and only stub out third party dependencies (read other APIs) or something that isn’t easily available as a docker container. I have recently started a golang project to write a client for kong and I thought this would be a good opportunity to use this testing strategy in a golang project.

I love Go but the one thing I don’t like so much about it is the approach that most people seem to be taking to testing. A lot of tests are at the method level where your tests end up being tightly coupled to your implementation. This is a bad thing. You know a test is not very good when if you have to change your test when you change your implementation. The reason this is bad is because firstly as you are changing your test the same time as you are changing your code once you have finished you have know way of knowing if the new implementation still works as the test has changed. It also restricts how much you can get in and edit the implementation as you have constantly having to update the way the tests mock out everything. By testing at the top component level the tests do not care about the implementation and the code runs with real components so it works how it will in the real world. By writing tests in this way I have seen a lot less defects and have never had to manually debug something.

Anyway back to the subject of the full stack testing approach in Go. To start I used the excellent dockertest project which gives you a great API to start and stop docker containers. I then took advantage of the fact that in a Go project there is a special test function that gets called for every test run:

	func TestMain(m *testing.M) {

	// setup

	code := m.Run()

	// teardown

	os.Exit(code)

	}

view raw

TestMain.go

hosted with ❤ by GitHub

In the above method you can do your test setup code where I have placed the //setup comment and your teardown code where I have placed the //teardown comment. The code that gets returned by m.Run() is the exit code from the test run. Go sets this to non zero if the test run fails so you need to exit with this code so your build will fail if your test run fails. Now using this method I can start the kong docker container, run the tests and then stop the kong docker container. Here is the full TestMain code at time of writing:y

	func TestMain(m *testing.M) {

	testContext := containers.StartKong(GetEnvVarOrDefault("KONG_VERSION", defaultKongVersion))

	err := os.Setenv(EnvKongAdminHostAddress, testContext.KongHostAddress)
	if err != nil {
	log.Fatalf("Could not set kong host address env variable: %v", err)
	}

	code := m.Run()

	containers.StopKong(testContext)

	os.Exit(code)

	}

view raw

TestMain.go

hosted with ❤ by GitHub

I have wrapped the starting and stopping of the kong container in a method to abstract away the detail. Notice how the StartKong method takes the Kong version as a parameter. It gets the Kong version either from the environment variable KONG_VERSION or if that environment variable is not set then it uses the default Kong version which I set to the latest version 0.11 at time of writing. The cool thing about this is that if I want to run my tests against a different version of Kong I can do that easily by changing this value. The really cool thing about this is that I can run the build against multiple versions of Kong on travis-ci by taking advantage of the env matrix feature. If you list multiple values for an environment variable in travis-ci then travis-ci will automatically run a build for each entry. This means it is really easy to run the whole test pack against multiple versions of Kong which is pretty neat. You can check out the gokong build to see this in action!

The one part you may be wondering from all of this is how do I get the url of the container that Kong is running on for use in my tests. That is done by setting an environment variable KONG_ADMIN_ADDR. The client uses that environment variable if set and if not then it defaults to localhost:8001.

With all of this in place it allows me to test the client by hitting a real running Kong in a container, no mocks in sight! How cool is that. Plus I can run against any version of Kong that is built as a docker container with a flick of a switch!

Here is an example of what a test looks like so you can get a feel:

	func Test_ApisGetById(t *testing.T) {
	apiRequest := &ApiRequest{
	Name: "test-" + uuid.NewV4().String(),
	Hosts: []string{"example.com"},
	Uris: []string{"/example"},
	Methods: []string{"GET", "POST"},
	UpstreamUrl: "http://localhost:4140/testservice",
	StripUri: true,
	PreserveHost: true,
	Retries: 3,
	UpstreamConnectTimeout: 1000,
	UpstreamSendTimeout: 2000,
	UpstreamReadTimeout: 3000,
	HttpsOnly: true,
	HttpIfTerminated: true,
	}

	apiClient := NewClient(NewDefaultConfig()).Apis()
	createdApi, err := apiClient.Create(apiRequest)

	assert.Nil(t, err)
	assert.NotNil(t, createdApi)

	result, err := apiClient.GetById(createdApi.Id)

	assert.Equal(t, createdApi, result)

	}

view raw

ExampleTest.go

hosted with ❤ by GitHub

I think that is really clean and readable. All of the code that boots up and tears down Kong is out of sight and you can just concentrate on the test. Again with no mocks around 🙂

If you want to see the rest of the code or help contribute to my gokong project that would be great. I look forward to any feedback you have on this.

Download github releases from private repos in bash and docker builds

I wanted to add a short post to describe how to automate the downloading of releases from private github repositories using a bash script or in a Docker build.

To start you need to create a Github token that has access to your repository. Once you have your token you can use the following bash script filling in the relevant details:

	#!/usr/bin/env bash
	set -e

	GITHUB_TOKEN=<my_token>
	REPO="kevholditch/demo"
	FILE="demo_0.0.1_linux_amd64.tar.gz"
	VERSION="v0.0.1"

	wget -q –auth-no-challenge –header='Accept:application/octet-stream' \
	https://$GITHUB_TOKEN:@api.github.com/repos/$REPO/releases/assets/`curl -H "Authorization: token $GITHUB_TOKEN" -H "Accept: application/vnd.github.v3.raw" -s https://api.github.com/repos/$REPO/releases \| jq ". \| map(select(.tag_name == \"$VERSION\"))[0].assets \| map(select(.name == \"$FILE\"))[0].id"` \
	-O /tmp/$FILE

view raw

download.sh

hosted with ❤ by GitHub

This script will download your release to the /tmp/ directory, from there you can untar and move it etc.

To take this a stage further if you want to download your release as part of a docker build you can use the Dockerfile snippet below to give you a starting point:

	ARG GITHUB_TOKEN
	ENV REPO "kevholditch/demo"
	ENV FILE "demo_0.0.1_linux_amd64.tar.gz"
	ENV VERSION "v0.0.1"

	wget -q –auth-no-challenge –header='Accept:application/octet-stream' \
	https://$GITHUB_TOKEN:@api.github.com/repos/$REPO/releases/assets/`curl -H "Authorization: token $GITHUB_TOKEN" -H "Accept: application/vnd.github.v3.raw" -s https://api.github.com/repos/$REPO/releases \| jq ". \| map(select(.tag_name == \"$VERSION\"))[0].assets \| map(select(.name == \"$FILE\"))[0].id"` \
	-O /tmp/$FILE

view raw

Dockerfile

hosted with ❤ by GitHub

The trick here is that we are passing in the GITHUB_TOKEN using a docker build arg. This allows you to build the container using travis by setting a secure ENV variable and then passing that into your docker build script as the docker arg parameter. For example:

	if [ "$GITHUB_TOKEN" = "" ]; then
	echo "you need to create a github token with access to kevholditch/demo to run this build see https://github.com/settings/tokens/new"
	exit -1
	fi

	docker build –build-arg "GITHUB_TOKEN=$GITHUB_TOKEN" -t kevholditch/demo .

view raw

example_docker_build.sh

hosted with ❤ by GitHub

In the script above we check that the GITHUB_TOKEN env variable is set and if it isn’t then we terminate with a non zero exit code, halting the build. This then allows developers to run the build with their own GITHUB_TOKEN and you can run this build on travis by setting a secure env variable (or the equivalent in the builder server you are using).

Running SNS & SQS locally in docker containers supporting fan out

On AWS using SNS to fan out to multiple SQS queues is a common scenario. SNS fan out means creating a SQS queue for each consumer of an SNS message and subscribing each SQS queue to the SNS topic. This means when a message is sent to the SNS topic a copy of the message arrives in each consumer’s queue. It gives you multicast messaging and the ability to consume messages at your own pace and allowing you to not be online when a notification occurs.

I wanted to use SNS fan out in one of our components and as our testing model tests at the component level this means I needed to get a SNS SQS solution working in docker. Step forward ElasticMq and SNS.

Inside the example folder inside the SNS repository was the following docker compose file as an example to get SNS and SQS containers working together in fan out mode:

	services:
	sns:
	image: s12v/sns
	ports:
	– "9911:9911"
	volumes:
	– ./config:/etc/sns
	depends_on:
	– sqs
	sqs:
	image: s12v/elasticmq
	ports:
	– "9324:9324"

view raw

dockercompose.yml

hosted with ❤ by GitHub

When started with the docker-compose up command the containers span up ok. The problem came when publishing a message to the sns topic using the following command:

aws sns publish –topic-arn arn:aws:sns:us-east-1:1465414804035:test1 –endpoint-url http://localhost:9911 –message "hello"

view raw

bashcommand

hosted with ❤ by GitHub

The error received was:

	sns_1 \| com.amazonaws.http.AmazonHttpClient executeHelper
	sns_1 \| INFO: Unable to execute HTTP request: Connection refused (Connection refused)
	sns_1 \| java.net.ConnectException: Connection refused (Connection refused)

view raw

error

hosted with ❤ by GitHub

So not a great start. For some reason the SNS container could not send the message on to the sqs container. Time to debug why….

The first step to working out why was going onto the SNS container and sending a message to the SQS container. This tells us whether or not the containers can talk to each other. When running this test the message got sent to the SQS queue successfully.

The next stage in testing was to look at the code for the SNS library to see if I could work out whether it logged out the SQS queue name it was trying to send it to. Upon inspection I realised that the SNS library was using Apache Camel to connect to SQS. I noticed that in the source code for Apache Camel it does log out a lot more information when the log level is set to trace. Going back to the SNS library there is the following logback.xml file:

	<configuration>
	<appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
	<encoder>
	<pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} – %msg%n</pattern>
	</encoder>
	</appender>

	<logger name="org.apache.camel" level="INFO"/>

	<root level="DEBUG">
	<appender-ref ref="STDOUT" />
	</root>
	</configuration>

view raw

logbackconfig.xml

hosted with ❤ by GitHub

I simply cloned the SNS repository from github, updated the level from DEBUG to TRACE and then recompiled the SNS code using the command sbt assembly. Once this finished it was simply a matter of copying the new jar into the root of the folder where I had cloned the SNS repo and updating the Dockerfile to use my newly compiled jar. The last change needed was updating the docker-compose.yml file in the example directory to:

	services:
	sns:
	build: ../
	ports:
	– "9911:9911"
	volumes:
	– ./config:/etc/sns
	depends_on:
	– sqs
	sqs:
	image: s12v/elasticmq
	ports:
	– "9324:9324"

view raw

dockercompose.yml

hosted with ❤ by GitHub

The important line being that we are now using the local SNS container not the one from Github. To build this I simply ran docker-compose build and then docker-compose up. This time the SNS container started logging with trace logging. When I sent a message to SNS I got a much more informative error message:

TRACE o.a.c.component.aws.sqs.SqsEndpoint – Queue available at 'http://localhost:9324/queue/queue1'.

view raw

basherror

hosted with ❤ by GitHub

Its clear now that the url for the queue is being set incorrectly. It should be http://sqs:9324/queue/queue1 as sqs is the name of the container, the reason we were getting connection refused before was that the messages were being sent to the host. To work out how to change this we had to dig through the Apache Camel code to work out how it configures its queue urls. We found that it queries sqs using the list queues command. Running the same list queues command on our running container revealed that the queues were being bound to localhost and not sqs.

To change this we simply had to use the following config file for elasticmq:

	include classpath("application.conf")

	node-address {
	host = sqs
	}

	queues {
	queue1 {}
	}

view raw

queueconf.yml

hosted with ❤ by GitHub

The key line being “host = sqs”. The last part to making everything work was updating the docker-compose.yml file to include the config file for elastic mq:

	services:
	sns:
	image: s12v/sns
	ports:
	– "9911:9911"
	volumes:
	– ./config/db.json:/etc/sns/db.json
	depends_on:
	– sqs
	sqs:
	image: s12v/elasticmq
	ports:
	– "9324:9324"
	volumes:
	– ./config/elasticmq.conf:/etc/elasticmq/elasticmq.conf

view raw

dockercompose.yml

hosted with ❤ by GitHub

Once I tore down the containers and started them up again I ran the list queues command, this time the queues came back bound to sqs: http://sqs:9324/queue/queue1. I then ran the command to send the a message to SNS and could see it successfully get sent to SQS by receiving it with the following command:

aws sqs receive-message –queue-url http://localhost:9324/queue/queue1 –region elasticmq –endpoint-url http://localhost:9324 –no-verify-ssl –no-sign-request –attribute-names All –message-attribute-names All

view raw

aws command

hosted with ❤ by GitHub

And there we have it, a working SNS fan out to SQS using docker containers. The author of the SNS container has accepted a PR from my colleague sam-io to update the example docker-compose.yml with the fixes described here. Meaning that you can simply clone the SNS repository from github cd into the example directory and run docker-compose up and everything should work. A big thanks to the open source community and people like Sergey Novikov for providing such great tooling. Its great to be able to give something back!

Mutual client ssl using nginx on AWS

An interesting problem I’ve recently had to solve is to integrate with a third party client who wanted to communicate with our services over the internet and use mutual client auth (ssl) to lock down the connection. The naive way to solve this would have been to put the code directly into the java service itself. This however is quite limiting and restrictive. For example if you want to update the certificates that are allowed you need to rerelease your service and your service is now tightly coupled to that customer’s way of doing things.

A neater solution is to offload this concern to a third party service (in this case nginx running on a docker container). This means that the original service can talk using normal http/s to nginx and nginx can do all of the hard work of the mutual client auth and proxying the request onto the customer.

When implementing this solution I couldn’t find a full example of how to set this up using nginx so I wanted to go through it. I want to split the explanation into two halves outgoing and incoming. First lets go through the outgoing config in nginx:

http {
  server {
    server_name outgoing_proxy;
    listen 8888;
    location / {
      proxy_pass                    http://thirdparty.com/api/;
      proxy_ssl_certificate         /etc/nginx/certs/client.cert.pem;
      proxy_ssl_certificate_key     /etc/nginx/certs/client.key.pem;
    }
  }
}

This is a pretty simple block to understand. It says we are hosting a server on port 8888 at the root. We are going to proxy all requests to http://thirdparty.com/api/ and use the client certificate specified to sign the requests. Pretty simple so far. The harder part is the configuration for the inbound:


http {
  map $ssl_client_s_dn $allowed_ssl_client_s_dn {
      default no;
      "CN=inbound.mycompany.com,OU=My Company,O=My Company Limited,L=London,C=GB" yes;
  }

  server {
    listen       443 ssl;
    server_name  inbound.mycompany.com;

    ssl_client_certificate  /etc/nginx/certs/client-ca-bundle.pem;
    ssl_certificate         /etc/nginx/certs/server.cert.pem;
    ssl_certificate_key     /etc/nginx/certs/server.key.pem;
    ssl_verify_client       on;
    ssl_verify_depth        2;

    location / {

      if ($allowed_ssl_client_s_dn = no) {
        add_header X-SSL-Client-S-DN $ssl_client_s_dn always;
        return 403;
      }

      proxy_pass localhost:4140/myservice/;
      proxy_set_header Host $host;
      proxy_set_header 'Content-Type' 'text/xml;charset=UTF-8';
      proxy_set_header X-Real-IP $remote_addr;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
  }
}

Before I explain the config block above I wanted to point out that in practice you place all of the code in the above two snippets inside the same http block.

Starting at the server block we can see that we are listening on port 443 (normal https port) using the hostname inbound.mycompany.com. This is where the ssl will terminate. We are using an AWS ELB to load balance requests to a pool of nginx servers that handle the ssl termination. The ssl_client_certificate is the ca bundle pem for all of the certificates with trust (intermediate and root authorities). The ssl_certificate and ssl_certificate_key host the server certificate (for the ssl endpoint ie a certificate with the subject name “inbound.mycompany.com”). ssl_verify_client on means to check that the client is trusted and ssl_verify_depth is how far down the certificate chain to check. The if statement says if we have verified that the presented client certificate is indeed one that we trust then lets check that the subject distinguished name is one that we are allowing explicitly. This is done by checking the map above. In my example config above the only subject distinguished name that will be allowed is “CN=inbound.mycompany.com,OU=My Company,O=My Company Limited,L=London,C=GB”. If the subject distinguished name is not allowed then nginx will return http code 403 forbidden. If it is allowed then we proxy the request onto myservice through linkerd.

By using a map to define allowed subject distinguished names we can easily generate this programmatically and it keeps all of the allowed subject distinguished names in a single place.

I really like the solution above as all of our services can talk to their local linkerd container ( see my linkerd post on running linkerd on AWS ECS), then linkerd can take care of talking using https between server boundaries and then nginx can take care of doing the ssl mutual auth to talk to the customer. The service does not need to worry about any of that. In fact as far as the service is concerned it is talking to a service running on the box (the local linkerd instance). This means it is much more flexible and if you have another service that needs to talk to that customer using mutual ssl that service can just talk through the same channel through linkerd and nginx. If you did it using the mutual ssl code directly in your service you would then have two copies (or two references) to a library to handle the mutual ssl that you would have to keep up to date with your client’s allowed certificates. That could quickly explode as you have to write more services or have more customers that want to use mutual client auth and would quickly become a nightmare to maintain. By solving this problem using a single service for just this job all of the ssl configuration is in a single place and all of the services are much simpler to write.

Running Linkerd in a docker container on AWS ECS

I recently solved an interesting problem of configuring linkerd to run on an AWS ECS cluster.

Before I explain the linkerd configuration I think it would help to go through a diagram showing our setup:

A request comes in the top it then gets routed to a Kong instance. Kong is configured to route the request to a local linkerd instance. The local linkerd instance then uses its local Consul to find out where the service is. It Then rewrites the request to call another linkerd on the destination server where the service resides (the one that was discovered in consul). The linkerd on the service box then receives the request and uses its local consul to find the service. At this point we use a filter to make sure it only uses the service located on the same box as essentially the service discovery has already happened by the calling linkerd. We then call the service and reply.

The problem to solve when running on AWS ECS is how to bind to only services on your local box. The normal way of doing this is to use the interpreter “io.l5d.localhost” (localhost). Which will then filter services in consul that are on the local host. When running in a docker container this won’t work as the local ip address of the linkerd in the docker container will not be the ip address of the server it is running on. Meaning when it queries consul it will have no matches.

To solve this problem we can use the specificHost filter (added recently). We can use this to provide the IP address of the server to filter on. Now we run into another problem where we do not know the ip address of the server until runtime. There is a neat solution to this problem. Firstly we can write our own docker container based off the official one. Next we define a templated config file like this:

interpreter:
    kind: default
    transformers:
    - kind: io.l5d.specificHost
      host: $LOCAL_IP

Notice that I have used $LOCAL_IP instead of the actual ip. This is because at runtime we can write a simple script that will set the $LOCAL_IP environment variable to the IP of the box the container is running on and then substitute all environment variables in the config and then run linkerd.

To do this we use the following code inside an entrypoint.sh file:

export LOCAL_IP=$(curl -s 169.254.169.254/latest/meta-data/local-ipv4)
envsubst < /opt/linkerd/conf/linkerd.conf.template > /opt/linkerd/conf/linkerd.conf
echo "starting linkerd on $LOCAL_IP..."
./bundle-exec /opt/linkerd/conf/linkerd.conf

The trick here is to use the AWS meta endpoint to grab the local ip of the machine. We then use the envsubst program to substitute out all environment variables in our config and build a real linkerd.conf file. From there we can simply start linkerd.

To make this work you need a Dockerfile like the following:

FROM buoyantio/linkerd:1.1.0

RUN wget -O /usr/bin/dumb-init https://github.com/Yelp/dumb-init/releases/download/v1.2.0/dumb-init_1.2.0_amd64
RUN chmod +x /usr/bin/dumb-init

RUN apt-get update && apt-get install gettext -y

ADD entrypoint.sh /entrypoint.sh

ENTRYPOINT ["/usr/bin/dumb-init", "--"]

This dockerfile uses dumb-init to manage the linkerd process. It installs the gettext package which is where the envsubst program lives. It then kicks off the entrypoint.sh script which is where we do our environment substitution and start linkerd.

The nice thing about this is we can put any environment variables we want into our linkerd config template file and then supply them at runtime in our task definition file. They will then get substituted out when the container starts. For example I am supplying the common name to use for ssl this way in my container.

Although this solution works it would be great if linkerd supported having environment variables inside its config file and swapped them out automatically. Maybe a pull request for another time…

Integration testing SSIS ETL packages

Integration testing SSIS ETL packages can be quite a challenge! The reason for this is often the package is written against several large source databases with several gigabytes (or even terabytes) of data. The only way people have of testing the SSIS package is to run it, which takes several hours so the feedback loop is very slow. This also can leave a company without an environment for a day (or more) whilst the issue is fixed.

How can we go about writing an automated test around our SSIS package to ensure that any changes are going to work before we run it against our large production databases? The answer to this is by writing a full end to end integration test that spins up the source database, inserts some data into it, spins up a destination database then runs the SSIS package and then asserts the data is now in the destination database. Once we have this in place we can test every aspect of an SSIS package to make sure it is functioning correctly. The build can then be run on a CI server so we have the same level of confidence about our SSIS package as we do for our production code!

A short rant before I begin explaining how the build works… It continues to amaze me how many companies do not have their database in source control so the only version of the database is the one sitting on the instance in live (and often different versions of it scattered around test environments). When you sit down and think about this it is a crazy situation. It is not that tough to put your database into source control and write a CI build around it. Having your database in source control gives you so many benefits its so surprising to me it is neglected as an after thought.

Rant over lets get on to how to create an integration test around an SSIS package. I have created a proof of concept that you are free to clone and use as you see fit see SSISIntegrationTesting on github. I have tried to make the readme on the repository as comprehensive as possible so if that is sufficient feel free to dive over there and check out the code.

The code uses the local sql server instance to create throw away databases for testing. This is a really useful trick and one that I’ve used in my SqlJuxt F# database comparison project. The great thing about using the local db instance on sql server is that it is available on any box where sql server management tools are installed. So you do not even need the full version of sql server installed to get up and running with it. So it makes it easy to get the build up and running on your build server.

The other trick to making your SSIS package testable is by parameterising your connection strings. To do this go to the data flow view in visual studio and click on the data flow tab. From there right click on the connection in the connection manager pane at the bottom and select “parameterize”. This allows you to pass in a parameter to override the connection string but it will default to the existing connection string you have set up.

If we open up the SSISIntegrationTesting.sln in the repository you will see the package1.dtsx SSIS package. This is a very simple package that uses ETL to copy all data from the products table in the source database to a products table in the destination database. Obviously in reality your SSIS job will be much more complex than this but by solving testing for this simple base case we can build from here.

I am a big fan of writing your tests using XBehave. This allows you to write meaningful descriptions in your test using Given, When, Then. On top of this I like to use builder classes to build the data and write descriptive methods for asserting the data. In my view the test should be readable in that you should be able to walk up to it and realise exactly what it is doing. Too many tests in my view have reams and reams of code and you have to spend quite a while working out what is going on.

From here on I think the best way to finish this article is to go through the integration test in the project and describe it step by step. I am going to paste the code line by line and then add a description below it. Although I do not think you will really need much of a description as the code is self describing as previously mentioned. All of the code for the test is in the PackageScenarios.cs file in the SSISTests project inside the SSISIntegrationTesting.sln in the github repository.

 "Given a source database with one row in the products table"
._(() =>
{
    sourceDatabase = _testServer.CreateNew();
    sourceDatabase.ExecuteScript("database.sql");
    var connection = Database.OpenConnection(sourceDatabase.ConnectionString);
    connection.Products.Insert(ProductCode: 1, ShippingWeight: 2f, ShippingLength: 3f,
        ShippingWidth: 4f, ShippingHeight: 5f, UnitCost: 6f, PerOrder: 2);
});

The first step in the test sets up an empty source database. It then runs in our schema which is stored in the database.sql file. Note in a real project the schema should come from your database CI build. It then uses Simple.Data to insert a product into the products table. Simple.Data is an excellent lightweight ORM that we can use to make it easier to write queries against our database. In this example Simple.Data takes advantages of the C# dynamic type to create an insert statement for us.

"And an empty destination database with a products table"
._(() =>
{
    destDatabase = _testServer.CreateNew();
    destDatabase.ExecuteScript("database.sql");
});

Next we create an another database this time to use for our destination database. Again we run in our schema which is contained in the database.sql file.

"When I execute the migration package against the source and dest databases"
._(() => result = PackageRunner.Run("Package1.dtsx", new
{
    Source_ConnectionString = sourceDatabase.ConnectionString.ToSsisCompatibleConnectionString(),
    Dest_ConnectionString = destDatabase.ConnectionString.ToSsisCompatibleConnectionString(),                    
}));

Now comes the action of testing the SSIS package. Notice here that we are passing in the connection strings of our source and destination SSIS packages for use. This will override the connection strings in the package so our two test databases will be used.

"Then the package should execute successfully"
._(() => result.Should().BeTrue());

I have built the package runner to return a bool as to whether or not it succeeded. Which is sufficient for this proof of concept but if you wanted to extend this to return any specific errors that came back then you could do so. Here we just assert that the package ran successfully.

"And the products table in the destination database should contain the row from the source database"
._(() => destDatabase.AssertTable().Products.ContainsExactlyOneRowMatching(
   new {
       ProductCode = 1,
       ShippingWeight= 2f,
       ShippingLength= 3f,
       ShippingWidth= 4f,
       ShippingHeight= 5f,
       UnitCost= 6f,
       PerOrder= 2
        }
   ));

Lastly we assert that the data is in fact in the destination database. This line of code looks quite magical so let me explain how it works. The AssertTable() extension method returns a dynamic which means after the “.” we can put anything we want, in this case we put “Products” as we want the products table. We then override the TryGetMember method on dynamic object to grab the string “products” and pass that along to our next method which is ContainsExactlyOneRowMatching. This method under the covers takes the anonymous C# that you pass in and uses Simple.Data to construct a sql query that can be run against the database. This means that this assertion is very efficient as it tries to select a single row from the products table with a where clause with all of the fields you pass in using the anonymous object. I think the syntax for this is very neat as it allows you to quickly assert data in your database and it is very readable.

Note all of the databases created by the test are destroyed in the Dispose method of the test. Go into the code if you want to see exactly how this happens.

There we have it, a full repeatable end to end SSIS integration test. I believe we have the building blocks here to create tests for more complex SSIS builds. I hope this helps you constructing your own builds, feel free to use any of the code or get in touch via the comments if you want any help/advice or have any suggestions as to how I can make this proof of concept even better.

SqlJuxt – Speeding up the tests

SqlJuxt is coming along nicely. I have a great test pack in the project that tests through the top level API. I have spoken before about the value of this (gives you freedom to change your implementation etc) so I won’t harp on about it again. The problem is that the tests take 2 minutes 40 seconds to run, clearly too long. Especially when you consider that number is going to keep on growing as I add more tests.

The first step to debugging why the tests were taking so long was to turn on timings inside the Resharper test runner. This gives you a break down of how long each test takes. From that dialog I could see that all of the tests that created 2 databases for comparison were taking around 6 seconds. I then put timing around each line in one of these tests and found that the whole test ran in around 0.3 seconds and the test clean up took the other 5.7 seconds. This lead me to realise that it was the dropping of the database that was causing the slowdown.

To prove this theory I commented out the drop database code (which if you remember is in the dispose method in the DisposableDatabase type). I ran the whole test pack again and it ran in 12 seconds. Wowzer! Clearly this is not a solution as each test run will leave behind lots of databases.

I noticed that after the tests have finished if you drop a database then it happens pretty much instantaneously, but it takes ages when it happens as the last step of the test. This lead me to realise that something in my test was hanging on to the database connection causing it to have to rip the connection away. I examined my test code and I definitely close down the SqlConnection so this left me stumped for a little while. Then I got on to reading about connection pooling. Then it twigged that ADO .Net turns on connection pooling by default. What this means is that when you close a sql connection the connection doesn’t get completely destroyed. Instead it gets put into a sleeping state. This is to speed up the process of getting a connection next time you want one. In nearly all applications this is what you want as you are going to be continually connecting to your database to make connections and you do not want to keep on going through the overhead of the handshake to set up a connection. However in my scenario this is not what I want. I am making a database connection, running in a script to setup a schema, comparing that schema and then dropping it again and at that point I no longer need the database. The solution to my issue was to simply turn off connection pooling. This means that when you close the connection it is completely gone so when you drop the database it drops pretty much instantly as it doesn’t have to rip away any existing connections. To turn off connection pooling you simply add “pooling=false;” to the connection string. I did that and ran the tests and all of the tests now run in 12 seconds. Quite an improvement, its like I found the turbo button!!

I just want to reiterate that I am not advocating turning off connection pooling for everyone. It is defaulted to on for a reason (its probably what you want), its just in my specialised scenario it was causing a massive performance overhead.

It is quite a win having found that as now I can run my whole integration test pack in 12 seconds it makes it much easier to refactor.

Check out the full source code at SqlJuxt GitHub repository.

SqlJuxt – Active patterns for the win

F# Active Patterns are awesome!! I wanted to start this blog post with that statement. I was not truly aware of F# active patterns until I read the article on F Sharp for Fun and Profit when I was looking at a better way to use the .Net Int32.Parse function.

Active patterns are a function that you can define that can match on an expression. For example:

let (|Integer|_|) (s:string) = 
    match Int32.TryParse(s) with
        | (true, i) -> Some i
        | _ -> None

To explain what each line is doing the | characters are called banana clips (not sure why) and here we are defining a partial active pattern. This means the pattern has to return Option of some type. So be definition the pattern may return a value or it may not so it is a partial match. This pattern takes a string and then returns Some int if the pattern matches or None if it does not. Once this pattern is defined it can be used in a normal match expression as follows:

let printNumber (str:string) =
	match str with
		| Integer i -> printfn "Integer: %d" i
		| _ -> "%s is not a number" %s

Using this technique it allows us to define a really cool active pattern for matching regular expressions and parsing out the groups that are matched:

let (|ParseRegex|_|) regex str =
    let m = Regex(regex).Match(str)
    match m with 
        | m when m.Success -> Some (List.tail [ for x in m.Groups -> x.Value] )
        | _ -> None

This regex active pattern returns all of the matched groups if the regex is a match or None if it is not a match. Note the reason List.tail is used is to skip the first element as that is the fully matched string which we don’t want, we only want the groups.

The reason why all of this came up is that the getNextAvailableName function that I wrote about in my last blog post is very long winded. For those who haven’t read my last post this function generates a unique name by taking a candidate name and a list of names that have been taken. If the name passed in has been taken then the function will add a number on to the end of the name and keep incrementing it until it finds a name that is not taken. The getNextAvailableName function was defined as:

let rec getNextAvailableName (name:string) (names: string list) =

    let getNumber (chr:char) =
        match Int32.TryParse(chr.ToString()) with
            | (true, i) -> Some i
            | _ -> None

    let grabLastChar (str:string) =
        str.[str.Length-1]

    let pruneLastChar (str:string) =
        str.Substring(0, str.Length - 1)

    let pruneNumber (str:string) i =
        str.Substring(0, str.Length - i.ToString().Length)

    let getNumberFromEndOfString (s:string)  =

        let rec getNumberFromEndOfStringInner (s1:string) (n: int option) =
            match s1 |> String.IsNullOrWhiteSpace with
                | true -> n
                | false -> match s1 |> grabLastChar |> getNumber with
                            | None -> n
                            | Some m ->  let newS = s1 |> pruneLastChar
                                         match n with 
                                            | Some n1 -> let newN = m.ToString() + n1.ToString() |> Convert.ToInt32 |> Some
                                                         getNumberFromEndOfStringInner newS newN
                                            | None -> getNumberFromEndOfStringInner newS (Some m) 
        let num = getNumberFromEndOfStringInner s None
        match num with
            | Some num' -> (s |> pruneNumber <| num', num)
            | None -> (s, num)
        

    let result = names |> List.tryFind(fun x -> x = name)
    match result with
        | Some r -> let (n, r) = getNumberFromEndOfString name
                    match r with 
                        | Some r' -> getNextAvailableName (n + (r'+1).ToString()) names
                        | None -> getNextAvailableName (n + "2") names
                    
        | None -> name

With the Integer and Regex active patterns defined as explained above the new version of the getNextAvailableName function is:

let rec getNextAvailableName (name:string) (names: string list) =

    let result = names |> List.tryFind(fun x -> x = name)
    match result with
        | Some r -> let (n, r) = match name with
                                    | ParseRegex "(.*)(\d+)$" [s; Integer i] -> (s, Some i)
                                    | _ -> (name, None)
                    match r with 
                        | Some r' -> getNextAvailableName (n + (r'+1).ToString()) names
                        | None -> getNextAvailableName (n + "2") names
                    
        | None -> name

I think it is pretty incredible how much simpler this version of the function is. It does exactly the same job (all of my tests still pass:) ). It really shows the power of active patterns and how they simplify your code. I think they also make the code more readable as even if you didn’t know the definition of the ParseRegex active pattern you could guess from the code.

Check out the full source code at SqlJuxt GitHub repository.

SqlJuxt – Making index names unique

As part of the fluent database builder for SqlJuxt I automatically generate names for objects that you create on the database such as primary keys and indexes. For primary keys this is an easy task as you can only have one primary key on a table so I can simply use the string “PK_

table>”. As a table name has to be unique I’m assured that the primary key name will be unique.

The problem comes when you want to create unique names for indexes. On Sql Server it is legal to have more than one non clustered index on a table on the same columns. So I am left with two options I either get the user to specify the index name which feels a bit clunky and unnecessary or I have to check for duplicate names and generate unique names. I chose the latter approach.

The naming convention for the indexes I have gone with is “IDX_

table>_” where column_names is a list of column names separated by underscores.

To keep the names unique I decided that if there was already an index with the name that is generated using the formula above then I would append a number on the end. If that name was taken then I would increment the number until I found a name that wasn’t taken.

To do the work of finding a unique name I thought it was best to abstract this out into its own function that could work with any name. The function signature I came up with was:

getNextAvailableName: string -> string list -> string

The function takes a string which is the name and a list of string which are the names that have been taken it then gives you back a new string which will be unique.

To write this function I needed a set of tests (test first remember) to prove out my test cases, these are:

[<Test>]
let ``should return name passed in when name is not in collection as collection is empty``() =
    getNextAvailableName "my_index" []
        |> should equal "my_index"

[<Test>]
let ``should return name passed in when name is not in collection``() =
    getNextAvailableName "my_index" ["some_index"; "some_other"]
        |> should equal "my_index"

[<Test>]
let ``should return my_index2 when my_index is in collection``() =
    getNextAvailableName "my_index" ["my_index"]
        |> should equal "my_index2"


[<Test>]
let ``should return my_index3 when my_index and my_index2 are in collection``() =
    getNextAvailableName "my_index" ["my_index"; "my_index2"]
        |> should equal "my_index3"

[<Test>]
let ``should return my_index3 when my_index and my_index2 and my_index33 are in collection``() =
    getNextAvailableName "my_index" ["my_index"; "my_index2"; "my_index33"]
        |> should equal "my_index3"

[<Test>]
let ``should return my_index2 when my_index and my_index22 are in collection``() =
    getNextAvailableName "my_index" ["my_index"; "my_index22"]
        |> should equal "my_index2"

[<Test>]
let ``should return my_index10 when my_index 2-9 are already in collection``() =
    getNextAvailableName "my_index" ["my_index"; "my_index2"; "my_index3"; "my_index4"; "my_index5"; "my_index6"; "my_index7"; "my_index8"; "my_index9"]
        |> should equal "my_index10"

I love how readable tests are in F# when you use FsUnit!! Now we have our tests defined we can go ahead an implement the function. I am sure that I didn’t do this in the most functional and efficient way. If anyone could help tidy this up then I would greatly appreciate it. My implementation is:

let rec getNextAvailableName (name:string) (names: string list) =

        let getNumber (chr:char) =
            match Int32.TryParse(chr.ToString()) with
                | (true, i) -> Some i
                | _ -> None

        let grabLastChar (str:string) =
            str.[str.Length-1]

        let pruneLastChar (str:string) =
            str.Substring(0, str.Length - 1)

        let pruneNumber (str:string) i =
            str.Substring(0, str.Length - i.ToString().Length)

        let getNumberFromEndOfString (s:string)  =

            let rec getNumberFromEndOfStringInner (s1:string) (n: int option) =
                match s1 |> String.IsNullOrWhiteSpace with
                    | true -> n
                    | false -> match s1 |> grabLastChar |> getNumber with
                                | None -> n
                                | Some m ->  let newS = s1 |> pruneLastChar
                                             match n with 
                                                | Some n1 -> let newN = m.ToString() + n1.ToString() |> Convert.ToInt32 |> Some
                                                             getNumberFromEndOfStringInner newS newN
                                                | None -> getNumberFromEndOfStringInner newS (Some m) 
            let num = getNumberFromEndOfStringInner s None
            match num with
                | Some num' -> (s |> pruneNumber <| num', num)
                | None -> (s, num)
            

        let result = names |> List.tryFind(fun x -> x = name)
        match result with
            | Some r -> let (n, r) = getNumberFromEndOfString name
                        match r with 
                            | Some r' -> getNextAvailableName (n + (r'+1).ToString()) names
                            | None -> getNextAvailableName (n + "2") names
                        
            | None -> name

I’m sure there are some tricks you can do with pattern matching to shorten this down. Now that we have this function and all of the tests pass it is trivial to plug it in to our database builder. All we have to do is generate the index name using the formula above and then call the getNextAvailableName function with the generated index name and a list of all of the index names on the table. The function will then give us back a unique name to use. This gets proved out by the following test:

[<Test>]
let ``should name indexes sequentially when there are multiple indexes defined that would generate the same name``() =
    CreateTable "MyIndexedTable"
        |> WithInt "MyKeyColumn"
        |> WithInt "SecondKeyColumn"
        |> WithNonClusteredIndex UNIQUE [("MyKeyColumn", ASC); ("SecondKeyColumn", DESC)]
        |> WithNonClusteredIndex UNIQUE [("MyKeyColumn", ASC); ("SecondKeyColumn", DESC)]
        |> WithNonClusteredIndex NONUNIQUE [("MyKeyColumn", ASC) ; ("SecondKeyColumn", DESC)]
        |> ScriptTable
        |> should equal @"CREATE TABLE [dbo].[MyIndexedTable]( [MyKeyColumn] [int] NOT NULL, [SecondKeyColumn] [int] NOT NULL )
GO

CREATE UNIQUE NONCLUSTERED INDEX IDX_MyIndexedTable_MyKeyColumn_SecondKeyColumn ON [dbo].[MyIndexedTable] ([MyKeyColumn] ASC, [SecondKeyColumn] DESC)
GO


CREATE UNIQUE NONCLUSTERED INDEX IDX_MyIndexedTable_MyKeyColumn_SecondKeyColumn2 ON [dbo].[MyIndexedTable] ([MyKeyColumn] ASC, [SecondKeyColumn] DESC)
GO


CREATE NONCLUSTERED INDEX IDX_MyIndexedTable_MyKeyColumn_SecondKeyColumn3 ON [dbo].[MyIndexedTable] ([MyKeyColumn] ASC, [SecondKeyColumn] DESC)
GO"

We can see how the generated index names are the same so they have been numbered.

Check out the full source code at SqlJuxt GitHub repository.