Mutual client SSL using nginx on AWS

An interesting problem I’ve recently had to solve is integrating with a third-party client who wanted to communicate with our services over the internet, using mutual client auth (SSL) to lock down the connection. The naive way to solve this would have been to put the code directly into the Java service itself. This, however, is quite limiting and restrictive: for example, if you want to update the set of allowed certificates you need to re-release your service, and your service becomes tightly coupled to that customer’s way of doing things.

A neater solution is to offload this concern to a dedicated service (in this case nginx running in a Docker container). The original service can then talk normal HTTP(S) to nginx, and nginx does all of the hard work of the mutual client auth and of proxying the request on to the customer.

When implementing this solution I couldn’t find a full example of how to set it up in nginx, so I wanted to go through it here. I’ll split the explanation into two halves: outgoing and incoming. First, let’s go through the outgoing config in nginx:

http {
  server {
    server_name outgoing_proxy;
    listen 8888;
    location / {
      # The upstream must be https:// for the proxy_ssl_* directives to apply.
      proxy_pass                    https://thirdparty.com/api/;
      proxy_ssl_certificate         /etc/nginx/certs/client.cert.pem;
      proxy_ssl_certificate_key     /etc/nginx/certs/client.key.pem;
    }
  }
}

This is a pretty simple block to understand. It says we are hosting a server on port 8888 and proxying all requests to https://thirdparty.com/api/, presenting the specified client certificate during the TLS handshake. Note that the proxy_pass URL must use https: the proxy_ssl_* directives only take effect when proxying to an HTTPS upstream.
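To sanity-check the outgoing side, a service on the same box can call the proxy with plain HTTP (the endpoint path here is just a placeholder):

# Plain HTTP from the service to the local nginx proxy; nginx handles
# the upstream TLS handshake and presents the client certificate.
curl -v http://localhost:8888/some/endpoint

Pretty simple so far. The harder part is the configuration for the inbound side: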


http {
  map $ssl_client_s_dn $allowed_ssl_client_s_dn {
      default no;
      "CN=inbound.mycompany.com,OU=My Company,O=My Company Limited,L=London,C=GB" yes;
  }

  server {
    listen       443 ssl;
    server_name  inbound.mycompany.com;

    ssl_client_certificate  /etc/nginx/certs/client-ca-bundle.pem;
    ssl_certificate         /etc/nginx/certs/server.cert.pem;
    ssl_certificate_key     /etc/nginx/certs/server.key.pem;
    ssl_verify_client       on;
    ssl_verify_depth        2;

    location / {

      if ($allowed_ssl_client_s_dn = no) {
        add_header X-SSL-Client-S-DN $ssl_client_s_dn always;
        return 403;
      }

      proxy_pass http://localhost:4140/myservice/;
      proxy_set_header Host $host;
      proxy_set_header 'Content-Type' 'text/xml;charset=UTF-8';
      proxy_set_header X-Real-IP $remote_addr;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
  }
}

Before I explain the config block above, I wanted to point out that in practice you would place both of the above snippets inside the same http block.
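As a rough sketch, the combined file has this shape (details elided):

http {
  map $ssl_client_s_dn $allowed_ssl_client_s_dn { ... }

  server {
    # outgoing proxy on port 8888
  }

  server {
    # inbound mutual auth endpoint on port 443
  }
}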

Starting at the server block, we can see that we are listening on port 443 (the normal HTTPS port) with the hostname inbound.mycompany.com. This is where the SSL terminates; we use an AWS ELB to load balance requests across a pool of nginx servers that handle the SSL termination.

The ssl_client_certificate is the CA bundle PEM for all of the certificates we trust (intermediate and root authorities). The ssl_certificate and ssl_certificate_key specify the server certificate for the SSL endpoint, i.e. a certificate with the subject name “inbound.mycompany.com”. ssl_verify_client on means nginx checks that the client is trusted, and ssl_verify_depth is how far down the certificate chain to check.

The if statement says: given that we have verified the presented client certificate is one we trust, also check that its subject distinguished name is one we are explicitly allowing, by looking it up in the map above. In my example config the only subject distinguished name that will be allowed is “CN=inbound.mycompany.com,OU=My Company,O=My Company Limited,L=London,C=GB”. If the subject distinguished name is not allowed then nginx returns HTTP 403 Forbidden. If it is allowed then we proxy the request on to myservice through linkerd.
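A quick way to test the inbound side is with curl, presenting a client certificate and key (the file names below are placeholders for the customer’s client certificate pair):

# With a trusted certificate whose subject DN is in the map this proxies
# through; an untrusted certificate fails verification, and a trusted one
# with an unlisted subject DN gets a 403 back.
curl -v https://inbound.mycompany.com/ --cert client.cert.pem --key client.key.pem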

By using a map to define the allowed subject distinguished names we can easily generate the list programmatically, and it keeps all of the allowed subject distinguished names in a single place, as in the sketch below.
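For example, a small shell script along these lines could regenerate an nginx include file from a plain-text list of DNs (allowed_dns.txt and the output path are assumptions for illustration):

#!/bin/sh
# Build the nginx map block from a file with one allowed subject DN per line.
{
  echo 'map $ssl_client_s_dn $allowed_ssl_client_s_dn {'
  echo '    default no;'
  while IFS= read -r dn; do
    printf '    "%s" yes;\n' "$dn"
  done < allowed_dns.txt
  echo '}'
} > /etc/nginx/conf.d/allowed-dns.conf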

I really like this solution. All of our services can talk to their local linkerd container (see my post on running linkerd on AWS ECS below), linkerd takes care of talking HTTPS across server boundaries, and nginx takes care of the mutual SSL auth with the customer. The service does not need to worry about any of that; as far as it is concerned it is talking to a service running on the same box (the local linkerd instance).

This makes everything much more flexible. If another service needs to talk to that customer using mutual SSL, it can talk through the same channel of linkerd and nginx. If instead you put the mutual SSL code directly into each service, you would have two copies of (or two references to) a library handling the mutual SSL, both of which you would have to keep up to date with your client’s allowed certificates. That quickly explodes as you write more services or take on more customers who want mutual client auth, and it would become a nightmare to maintain. By solving the problem with a single service dedicated to this job, all of the SSL configuration lives in one place and all of the services are much simpler to write.

Running Linkerd in a docker container on AWS ECS

I recently solved an interesting problem of configuring linkerd to run on an AWS ECS cluster.

Before I explain the linkerd configuration I think it would help to walk through how a request flows through our setup:

A request comes in at the top and gets routed to a Kong instance. Kong is configured to route the request to its local linkerd instance. That linkerd uses its local Consul agent to find out where the service lives, then rewrites the request to call another linkerd on the destination server where the service resides (the one discovered in Consul). The linkerd on the service box receives the request and uses its own local Consul to find the service. At this point we use a filter to make sure it only picks the service instance located on the same box, since the service discovery has essentially already been done by the calling linkerd. We then call the service and reply.

The problem to solve when running on AWS ECS is how to bind only to the services on your local box. The normal way of doing this is the io.l5d.localhost transformer, which filters the instances found in Consul down to those on the local host. When running in a Docker container this won’t work: the local IP address of linkerd inside the container is not the IP address of the server it is running on, so when it queries Consul it gets no matches.
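You can see the mismatch from inside a container on an ECS host (both commands are just for illustration):

# The container's own address is a Docker bridge IP such as 172.17.0.2...
hostname -i
# ...whereas Consul has registered the service under the host's VPC IP:
curl -s http://169.254.169.254/latest/meta-data/local-ipv4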

To solve this we can use the recently added io.l5d.specificHost transformer, which lets us provide the IP address of the server to filter on. That runs into another problem: we do not know the IP address of the server until runtime. There is a neat solution. First, we build our own Docker image based on the official one. Next, we define a templated config file like this:

interpreter:
  kind: default
  transformers:
  - kind: io.l5d.specificHost
    host: $LOCAL_IP
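
For context, here is a minimal sketch of where that interpreter block sits in the full linkerd.conf.template. The Consul namer, dtab, and server port are assumptions for illustration rather than the exact production config:

namers:
- kind: io.l5d.consul

routers:
- protocol: http
  dtab: |
    /svc => /#/io.l5d.consul/.local;
  interpreter:
    kind: default
    transformers:
    - kind: io.l5d.specificHost
      host: $LOCAL_IP
  servers:
  - port: 4140
    ip: 0.0.0.0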

Notice that I have used $LOCAL_IP instead of an actual IP address. At runtime a simple script sets the LOCAL_IP environment variable to the IP of the box the container is running on, substitutes all environment variables in the template, and then runs linkerd.

To do this we use the following code inside an entrypoint.sh file:

#!/bin/sh
# Grab this instance's private IP from the AWS metadata endpoint.
export LOCAL_IP=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)
# Substitute environment variables into the config template.
envsubst < /opt/linkerd/conf/linkerd.conf.template > /opt/linkerd/conf/linkerd.conf
echo "starting linkerd on $LOCAL_IP..."
./bundle-exec /opt/linkerd/conf/linkerd.conf

The trick here is to use the AWS metadata endpoint to grab the local IP of the machine. We then use the envsubst program to substitute all environment variables in our template and produce the real linkerd.conf file. From there we simply start linkerd.

To make this work you need a Dockerfile like the following:

FROM buoyantio/linkerd:1.1.0

RUN wget -O /usr/bin/dumb-init https://github.com/Yelp/dumb-init/releases/download/v1.2.0/dumb-init_1.2.0_amd64
RUN chmod +x /usr/bin/dumb-init

# gettext provides the envsubst program used in entrypoint.sh.
RUN apt-get update && apt-get install gettext -y

COPY entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh

ENTRYPOINT ["/usr/bin/dumb-init", "--"]
CMD ["/entrypoint.sh"]

This Dockerfile uses dumb-init to manage the linkerd process. It installs the gettext package, which is where the envsubst program lives, and sets entrypoint.sh as the command so that the environment substitution runs and linkerd starts when the container comes up.

The nice thing about this is that we can put any environment variables we want into our linkerd config template file and then supply them at runtime in our task definition file; they get substituted when the container starts. For example, I am supplying the common name to use for SSL this way in my container.
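In the ECS task definition that looks something like this fragment (the variable name is a hypothetical example):

"environment": [
  { "name": "SSL_COMMON_NAME", "value": "inbound.mycompany.com" }
]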

Although this solution works it would be great if linkerd supported having environment variables inside its config file and swapped them out automatically. Maybe a pull request for another time…