Routing outbound webhook calls through Nginx on AWS

At Form3 I recently solved an interesting problem: routing our outbound webhook calls via Nginx. Before I dive into how I achieved this, I want to set the scene as to why you would want to do this.

For webhook calls your customer has to open up an HTTPS endpoint on the internet, so you really want to present a client certificate so that the person receiving your call can verify it really is you calling them. You could add the client auth code directly inside your service (ours is written in Java), but this becomes a pain for a couple of reasons. Firstly, the code for doing this in Java is pretty cumbersome. Secondly, it means you now need to deploy the client certificate inside your application or load it in somehow, which shouldn’t really be the concern of the application.

The solution to all of this is to route the outbound call through Nginx and let Nginx take care of the client auth. This is much neater, as the application doesn’t need to worry about any certificates; it can simply post to Nginx and let Nginx do all of the heavy TLS lifting.

The design above is what we ended up with. Note that we route everything Linkerd to Linkerd; if you want to read more about how we set this up on AWS ECS you can read my blog post on it. If the webhook URL for a customer was, say, https://customer.example.com/callback, then the notification API posts through Linkerd to Nginx using a URL of the form https://localhost:4140/notification-proxy/?address=https%3A%2F%2Fcustomer.example.com%2Fcallback (note that we have to URL-encode the callback address). We then set up Nginx to proxy the request on to the address provided in the address query string parameter. This is done with an Nginx server config along the following lines:
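A sketch of what that server block looks like (the listen port, Lua variable name and certificate paths are illustrative):

server {
    listen 8080;

    location /notification-proxy {
        # which DNS server Nginx should use to resolve the proxied hostname
        resolver 127.0.0.11;

        # url-decode the ?address= query string parameter
        set_by_lua_block $proxy_address {
            return ngx.unescape_uri(ngx.var.arg_address)
        }

        # proxy the request on to the decoded callback url
        proxy_pass $proxy_address;

        # present our client certificate so the receiver can verify it is us
        proxy_ssl_certificate /etc/nginx/certs/client.pem;
        proxy_ssl_certificate_key /etc/nginx/certs/client.key;
    }
}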

The set_by_lua block is a piece of Lua code that URL-decodes the address query string parameter; ngx.var.arg_address refers to the address query string parameter (you can replace address with anything to read any other parameter from the query string). The decoded address is then used as the proxy_pass target, and the proxy_ssl_certificate and proxy_ssl_certificate_key directives take care of the client auth.

The last trick to making all of this work was realising that the IP address given to the resolver directive needs to change based on where you are running. This directive tells Nginx which DNS server to use to look up the proxied address. Locally on my machine I need to use 127.0.0.11 (the Docker DNS server), however on AWS this address changes. The last piece of the jigsaw was working out how to find it dynamically. You can do that by issuing the following command: cat /etc/resolv.conf | grep nameserver | cut -d' ' -f2-. Once I had cracked that, I updated the Nginx config file to be a template:
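The only line that changes in the template is the resolver, which becomes a placeholder that gets substituted at container startup ($DNS_SERVER is an illustrative variable name):

resolver $DNS_SERVER;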

We then use something like the following startup script in our Docker container, which derives from openresty/openresty:
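A minimal sketch of such a script, assuming the template and rendered config live in the standard OpenResty locations (file names and paths are illustrative):

#!/bin/sh
# work out which DNS server this container has been given
export DNS_SERVER=$(cat /etc/resolv.conf | grep nameserver | cut -d' ' -f2-)

# substitute the placeholder to produce the real Nginx config
envsubst '$DNS_SERVER' < /etc/nginx/conf.d/default.conf.template > /etc/nginx/conf.d/default.conf

# start openresty in the foreground
exec /usr/local/openresty/bin/openresty -g 'daemon off;'

Restricting envsubst to $DNS_SERVER matters here, because the Nginx config is full of its own $ variables that must not be touched.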

The clever part here is that we are setting an environment variable on the fly, the IP address of the DNS server, using the command we worked out above. We then use envsubst to substitute any environment variables in our template config file and write the templated file out to disk, so that when Nginx starts the IP address will be correct. This all happens as the container starts, so wherever the container is running (locally or on AWS) it will pick up the correct IP address and just work.





Running SNS & SQS locally in Docker containers supporting fan-out

On AWS, using SNS to fan out to multiple SQS queues is a common scenario. SNS fan-out means creating an SQS queue for each consumer of an SNS message and subscribing each SQS queue to the SNS topic. When a message is sent to the SNS topic, a copy of the message arrives in each consumer’s queue. It gives you multicast messaging, the ability to consume messages at your own pace, and means consumers do not have to be online when a notification occurs.

I wanted to use SNS fan-out in one of our components and, as our testing model tests at the component level, this meant I needed to get an SNS/SQS solution working in Docker. Step forward ElasticMQ and SNS.

Inside the example folder of the SNS repository was the following docker-compose file, provided as an example of getting the SNS and SQS containers working together in fan-out mode:
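It looked roughly like this (image names, ports and the SNS topic/subscription config path are indicative rather than an exact copy of the example file):

version: '2'
services:
  sns:
    image: s12v/sns
    ports:
      - "9911:9911"
    volumes:
      # db.json defines the topic and its SQS subscriptions
      - ./config/db.json:/etc/sns/db.json
    depends_on:
      - sqs
  sqs:
    image: s12v/elasticmq
    ports:
      - "9324:9324"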

When started with the docker-compose up command the containers spun up OK. The problem came when publishing a message to the SNS topic using the following command:
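The publish was done with the AWS CLI pointed at the local SNS endpoint, along these lines (the topic ARN, region and credentials are illustrative):

aws sns publish \
  --endpoint-url http://localhost:9911 \
  --topic-arn arn:aws:sns:us-east-1:123456789012:test1 \
  --message "hello world"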

The error received was a connection refused error.

So not a great start. For some reason the SNS container could not send the message on to the SQS container. Time to debug why…

The first step in working out why was to go onto the SNS container and send a message to the SQS container directly; this tells us whether or not the containers can talk to each other. When running this test the message got sent to the SQS queue successfully.
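One way to run that check is to exec into the SNS container and hit the ElasticMQ query API directly (this assumes curl is available in the image; the queue name is illustrative):

docker-compose exec sns curl -s "http://sqs:9324/queue/queue1?Action=SendMessage&MessageBody=connectivity-test"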

The next stage in testing was to look at the code for the SNS library to see whether it logged the SQS queue name it was trying to send to. Upon inspection I realised that the SNS library uses Apache Camel to connect to SQS, and that Apache Camel logs a lot more information when the log level is set to trace. Going back to the SNS library, there is the following logback.xml file:
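It is a standard logback configuration, roughly like this (the appender and pattern shown are typical; the relevant part is the root log level):

<configuration>
    <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
            <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
        </encoder>
    </appender>
    <root level="DEBUG">
        <appender-ref ref="STDOUT"/>
    </root>
</configuration>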

I simply cloned the SNS repository from GitHub, updated the level from DEBUG to TRACE and then recompiled the SNS code using the command sbt assembly. Once this finished it was simply a matter of copying the new jar into the root of the folder where I had cloned the SNS repo and updating the Dockerfile to use my newly compiled jar. The last change needed was updating the docker-compose.yml file in the example directory to:
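The updated compose file differs only in building the sns service from the local checkout instead of pulling the published image (again a sketch):

version: '2'
services:
  sns:
    # build from the cloned repo root rather than using the published image
    build: ..
    ports:
      - "9911:9911"
    volumes:
      - ./config/db.json:/etc/sns/db.json
    depends_on:
      - sqs
  sqs:
    image: s12v/elasticmq
    ports:
      - "9324:9324"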

The important change being the build line, meaning we now build and use the local SNS container rather than the published image. To build this I simply ran docker-compose build and then docker-compose up. This time the SNS container started up with trace logging enabled, and when I sent a message to SNS I got a much more informative error message.

It was clear from the trace output that the URL for the queue was being set incorrectly: it should be http://sqs:9324/queue/queue1, as sqs is the name of the container. The reason we were getting connection refused before was that the messages were being sent to the host. To work out how to change this we had to dig through the Apache Camel code to see how it configures its queue URLs. It turns out that Camel queries SQS using the list queues command, and running the same list queues command against our running container revealed that the queues were being bound to localhost and not sqs.
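The same check can be run from the host with the AWS CLI against the mapped port; at this point it came back with localhost-bound URLs (output abbreviated and illustrative):

aws sqs list-queues --endpoint-url http://localhost:9324

{
    "QueueUrls": [
        "http://localhost:9324/queue/queue1"
    ]
}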

To change this we simply had to use the following config file for ElasticMQ:
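A minimal ElasticMQ config along these lines does the trick (the queue name is illustrative; the important setting is the node-address host):

include classpath("application.conf")

node-address {
    protocol = http
    host = sqs
    port = 9324
    context-path = ""
}

rest-sqs {
    enabled = true
    bind-port = 9324
    bind-hostname = "0.0.0.0"
}

queues {
    queue1 {}
}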

The key line being “host = sqs”. The last part of making everything work was updating the docker-compose.yml file to include the config file for ElasticMQ:
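That just means mounting the config file into the ElasticMQ container; the container-side path depends on the ElasticMQ image being used, so treat it as illustrative:

version: '2'
services:
  sns:
    build: ..
    ports:
      - "9911:9911"
    volumes:
      - ./config/db.json:/etc/sns/db.json
    depends_on:
      - sqs
  sqs:
    image: s12v/elasticmq
    ports:
      - "9324:9324"
    volumes:
      # custom ElasticMQ config with host = sqs
      - ./elasticmq.conf:/etc/elasticmq/elasticmq.conf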

Once I had torn down the containers and started them up again, I ran the list queues command and this time the queues came back bound to sqs: http://sqs:9324/queue/queue1. I then ran the command to send a message to SNS and could see it successfully arrive in SQS by receiving it with the following command:
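Receiving the message from the host looks something like this (run against the mapped port; the queue name is illustrative):

aws sqs receive-message \
  --endpoint-url http://localhost:9324 \
  --queue-url http://localhost:9324/queue/queue1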

And there we have it, a working SNS fan-out to SQS using Docker containers. The author of the SNS container has accepted a PR from my colleague sam-io to update the example docker-compose.yml with the fixes described here, meaning that you can simply clone the SNS repository from GitHub, cd into the example directory, run docker-compose up and everything should work. A big thanks to the open source community and people like Sergey Novikov for providing such great tooling. It’s great to be able to give something back!

Running Linkerd in a Docker container on AWS ECS

I recently solved an interesting problem of configuring linkerd to run on an AWS ECS cluster.

Before I explain the linkerd configuration I think it would help to go through a diagram showing our setup:

A request comes in at the top and gets routed to a Kong instance. Kong is configured to route the request to a local linkerd instance. The local linkerd instance then uses its local Consul to find out where the service is. It then rewrites the request to call another linkerd on the destination server where the service resides (the one that was discovered in Consul). The linkerd on the service box receives the request and uses its local Consul to find the service. At this point we use a filter to make sure it only uses the service located on the same box, as the service discovery has essentially already been done by the calling linkerd. We then call the service and reply.

The problem to solve when running on AWS ECS is how to bind to only the services on your local box. The normal way of doing this is to use the io.l5d.localhost (localhost) transformer, which filters the services in Consul down to those on the local host. When running in a Docker container this won’t work, because the local IP address of the linkerd inside the container is not the IP address of the server it is running on, meaning that when it queries Consul it will have no matches.

To solve this problem we can use the io.l5d.specificHost transformer (added recently), which lets us provide the IP address of the server to filter on. That leads to another problem: we do not know the IP address of the server until runtime. There is a neat solution to this. Firstly we write our own Docker container based off the official one. Next we define a templated config file like this:

interpreter:
  kind: default
  transformers:
  - kind: io.l5d.specificHost
    host: $LOCAL_IP

Notice that I have used $LOCAL_IP instead of the actual IP. This is because at runtime we can run a simple script that sets the $LOCAL_IP environment variable to the IP of the box the container is running on, substitutes all environment variables in the config, and then runs linkerd.

To do this we use the following code inside a startup script:

#!/bin/sh
# grab the host's private IP from the AWS instance metadata endpoint
export LOCAL_IP=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)
# substitute environment variables into the config template and start linkerd
envsubst < /opt/linkerd/conf/linkerd.conf.template > /opt/linkerd/conf/linkerd.conf
echo "starting linkerd on $LOCAL_IP..."
./bundle-exec /opt/linkerd/conf/linkerd.conf

The trick here is to use the AWS instance metadata endpoint to grab the local IP of the machine. We then use the envsubst program to substitute out all environment variables in our config and build a real linkerd.conf file. From there we can simply start linkerd.

To make this work you need a Dockerfile like the following:

FROM buoyantio/linkerd:1.1.0

# install dumb-init to act as PID 1 (the release version/URL shown here is an example)
RUN wget -O /usr/bin/dumb-init https://github.com/Yelp/dumb-init/releases/download/v1.2.0/dumb-init_1.2.0_amd64
RUN chmod +x /usr/bin/dumb-init

# gettext provides the envsubst program used by the startup script
RUN apt-get update && apt-get install gettext -y

# copy in the startup script (named entrypoint.sh here for illustration)
ADD entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh

ENTRYPOINT ["/usr/bin/dumb-init", "--"]
CMD ["/entrypoint.sh"]

This Dockerfile uses dumb-init to manage the linkerd process. It installs the gettext package, which is where the envsubst program lives. It then kicks off the startup script, which is where we do our environment substitution and start linkerd.

The nice thing about this is that we can put any environment variables we want into our linkerd config template file and supply them at runtime in our task definition file; they will get substituted when the container starts. For example, I am supplying the common name to use for SSL this way in my container.
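In the ECS task definition these are just ordinary environment entries, for example (the variable name and value are illustrative):

"environment": [
    { "name": "SSL_COMMON_NAME", "value": "*.example.com" }
]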

Although this solution works it would be great if linkerd supported having environment variables inside its config file and swapped them out automatically. Maybe a pull request for another time…