How To Run a Vault HA Cluster on a Home Kubernetes Cluster

I have blogged about how I set up a bare-metal Kubernetes cluster on VMware. In this post I’m going to explain how to run an HA Vault cluster on that cluster.

Vault is a cloud-native secrets manager that protects your secrets and allows applications to retrieve them, giving you a central place to store them. This video by Armon Dadgar, HashiCorp CTO, gives a really good overview of Vault.

Running an HA Vault cluster in Kubernetes requires quite a few steps, so I wanted to document how to approach it. Disclaimer: this is far from a guide on how to deploy a production-grade Vault cluster. It is aimed at giving you a way to deploy an HA cluster as a starting point, or if you are like me and want to deploy a Vault cluster on your home Kubernetes cluster for fun!

Step 1 – Create Vault Namespace

The official Vault Helm chart, which we are going to use to install Vault, sets Vault up in a vault namespace. To get our cluster ready we need to create the vault namespace using the following command:

kubectl create namespace vault

Step 2 – Generate Certificates

To run a Vault cluster you need a TLS certificate for Vault to use to expose its HTTPS endpoints. There are a few approaches we can take, but the easiest is to generate our own root CA and then use that root CA to sign a server certificate. We can then add the root CA to the trust store of the machine we want to use to access the cluster so that it will trust the Vault endpoints.

To generate the root CA first we need to create a new private key:

openssl genrsa -out rootCAKey.pem 4096

Next, create the self-signed root CA certificate using this key:

openssl req -x509 -sha256 -new -nodes -key rootCAKey.pem -days 10000 -out rootCACert.pem

Note I’m using an expiry of 10,000 days (roughly 27 years) for the root CA.

If you were doing this for real you would probably want to look at setting up an intermediate CA, but as this is just for fun we are going to sign the server certificate directly with the root CA. Let’s generate a private key for the server:

openssl genrsa -out vault.key.pem 4096

For the Vault server certificate, we want to configure a few subject alternative names in order for everything to work correctly. Firstly, we can come up with a domain that we will later alias to point to our Vault cluster; I’m going to use vault.local for that. Next, we need to add n subject alternative names in the format vault-X.vault-internal, where n is the number of Vault nodes we want to run and X is the node number. For example, if we run 3 nodes then we would need to create a certificate with the 3 addresses vault-0.vault-internal, vault-1.vault-internal and vault-2.vault-internal. These internal endpoints are used by the Vault nodes to talk to one another.

In order to create the subject alternative names we need to create a config file like the one below:

[req]
req_extensions = v3_req
distinguished_name = dn
prompt = no

[dn]
CN = vault.local

[v3_req]
keyUsage = keyEncipherment, dataEncipherment
extendedKeyUsage = serverAuth
subjectAltName = @alt_names

[alt_names]
DNS.1 = vault.svc.cluster.local
DNS.2 = vault-0.vault-internal
DNS.3 = vault-1.vault-internal
DNS.4 = vault-2.vault-internal
DNS.5 = vault.local

With this config file in place we can now create the CSR for our vault server certificate:

openssl req -new -key vault.key.pem -sha256 -out vault.csr -config cert.config

To generate the server certificate we just need to sign the CSR we just generated with the root CA:

openssl x509 -req -sha256 -in vault.csr -CA rootCACert.pem -CAkey rootCAKey.pem -CAcreateserial -out vault.cert.pem -days 365 -extfile cert.config -extensions 'v3_req'

We have now created our two certificates: one for the root CA and one for the Vault server itself.
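
Before uploading anything it is worth a quick sanity check that the server certificate chains back to our root CA and that the subject alternative names made it in (file names as used above):

openssl verify -CAfile rootCACert.pem vault.cert.pem
openssl x509 -in vault.cert.pem -noout -text | grep -A1 "Subject Alternative Name"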

Step 3 – Upload Certificates

For Vault to be able to use the certificates we have generated, we need to load them into the cluster as Kubernetes secrets. For the root CA we only need to upload the certificate, so that we can configure Vault to trust any certificate signed by this root. This is needed so that the Vault instances trust each other and can talk to each other over TLS.

To upload the root CA cert we can use a generic (Opaque) secret:

kubectl --namespace='vault' create secret generic root-ca --from-file=./rootCACert.pem

For the server certificate we need to upload both the certificate and the private key in order for our Vault container to use that certificate to host its TLS endpoint. To do this we can use the command:

kubectl --namespace='vault' create secret tls tls-server --cert ./vault.cert.pem --key ./vault.key.pem

This command uploads the server certificate using the Kubernetes built-in tls secret type, which creates a secret with the certificate and key stored under the tls.crt and tls.key data keys.
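
As a quick check that both secrets landed in the vault namespace with the expected keys, you can describe them (names as created above):

kubectl --namespace='vault' get secrets
kubectl --namespace='vault' describe secret tls-server
kubectl --namespace='vault' describe secret root-ca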

Step 4 – Set up Local Storage Volumes

To run a Vault cluster we need to choose a storage backend. The solution with the least moving parts is to configure Vault to use its built-in Raft storage backend. To make this work we have to create a persistent volume for each Vault container to use. We are going to create a persistent volume on local disk on each node and use node affinity to pin each volume so that it is always on the same node.

To set this up we first have to configure local storage by creating the local storage class:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: local-storage
reclaimPolicy: Delete
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer

With the file above created, we can create the storage class using the kubectl apply -f storageclass.yaml command.

Next, we have to create the volumes themselves. I have 3 worker nodes in my cluster and am going to run a 3-node Vault cluster, so I am going to create 3 persistent volumes using the following manifest:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: volworker01
  namespace: vault
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: local-storage
  local:
    path: /srv/cluster/storage/vault
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - worker01

In the above manifest you will need to change worker01 to the name of your worker node; this is what makes the volume only available on that node. You will have to copy this file for as many nodes as you are going to run, altering the name and the node affinity values each time, then use kubectl apply -f <filename> to create the volumes on the cluster.

Before the volumes will work you have to make sure the path you set physically exists on disk, so be sure to create that on each of your worker nodes.
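
As a rough sketch (the worker names here are just my examples), creating the directory on each worker and then checking the volumes have registered looks like this; the volumes will show as Available until Vault claims them:

# run on each worker node (worker01, worker02, worker03 in my case)
sudo mkdir -p /srv/cluster/storage/vault

# back on the machine with kubectl access
kubectl get pv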

Step 5 – Install Vault

With all of the above in place we are finally ready to install Vault! We are going to use helm to install Vault, so the first thing we need to do is add the hashicorp repo to helm:

helm repo add hashicorp https://helm.releases.hashicorp.com

Next we need to create the following values.yaml:

# Vault Helm Chart Value Overrides
global:
  enabled: true
  tlsDisable: false

injector:
  enabled: true
  # Use the Vault K8s Image https://github.com/hashicorp/vault-k8s/
  image:
    repository: "hashicorp/vault-k8s"
    tag: "latest"
  resources:
    requests:
      memory: 256Mi
      cpu: 250m
    limits:
      memory: 256Mi
      cpu: 250m

server:
  # These Resource Limits are in line with node requirements in the
  # Vault Reference Architecture for a Small Cluster
  resources:
    requests:
      memory: 256Mi
      cpu: 500m
    limits:
      memory: 256Mi
      cpu: 500m

  # For HA configuration and because we need to manually init the vault,
  # we need to define custom readiness/liveness Probe settings
  readinessProbe:
    enabled: true
    path: "/v1/sys/health?standbyok=true&sealedcode=204&uninitcode=204"
  livenessProbe:
    enabled: true
    path: "/v1/sys/health?standbyok=true"
    initialDelaySeconds: 60

  # extraEnvironmentVars is a list of extra environment variables to set with the stateful set. These could be
  # used to include variables required for auto-unseal.
  extraEnvironmentVars:
    VAULT_CACERT: /vault/userconfig/root-ca/rootCACert.pem

  # extraVolumes is a list of extra volumes to mount. These will be exposed
  # to Vault in the path `/vault/userconfig/<name>/`.
  extraVolumes:
    - type: secret
      name: tls-server
    - type: secret
      name: root-ca

  # This configures the Vault Statefulset to create a PVC for audit logs.
  # See https://www.vaultproject.io/docs/audit/index.html to know more
  auditStorage:
    enabled: false

  dataStorage:
    enabled: true
    storageClass: local-storage

  standalone:
    enabled: false

  # Run Vault in "HA" mode.
  ha:
    enabled: true
    replicas: 3
    raft:
      enabled: true
      setNodeId: true

      config: |
        ui = true

        listener "tcp" {
          address = "[::]:8200"
          cluster_address = "[::]:8201"
          tls_cert_file = "/vault/userconfig/tls-server/tls.crt"
          tls_key_file = "/vault/userconfig/tls-server/tls.key"
          tls_ca_cert_file = "/vault/userconfig/root-ca/rootCACert.pem"
        }

        storage "raft" {
          path = "/vault/data"

          retry_join {
            leader_api_addr = "https://vault-0.vault-internal:8200"
            leader_ca_cert_file = "/vault/userconfig/root-ca/rootCACert.pem"
            leader_client_cert_file = "/vault/userconfig/tls-server/tls.crt"
            leader_client_key_file = "/vault/userconfig/tls-server/tls.key"
          }

          retry_join {
            leader_api_addr = "https://vault-1.vault-internal:8200"
            leader_ca_cert_file = "/vault/userconfig/root-ca/rootCACert.pem"
            leader_client_cert_file = "/vault/userconfig/tls-server/tls.crt"
            leader_client_key_file = "/vault/userconfig/tls-server/tls.key"
          }

          retry_join {
            leader_api_addr = "https://vault-2.vault-internal:8200"
            leader_ca_cert_file = "/vault/userconfig/root-ca/rootCACert.pem"
            leader_client_cert_file = "/vault/userconfig/tls-server/tls.crt"
            leader_client_key_file = "/vault/userconfig/tls-server/tls.key"
          }

          autopilot {
            cleanup_dead_servers = "true"
            last_contact_threshold = "200ms"
            last_contact_failure_threshold = "10m"
            max_trailing_logs = 250000
            min_quorum = 3
            server_stabilization_time = "10s"
          }
        }

        service_registration "kubernetes" {}

# Vault UI
ui:
  enabled: true
  serviceType: "LoadBalancer"
  serviceNodePort: null
  externalPort: 8200

This values.yaml configures 3 vault nodes and mounts the certificates using the secrets we created earlier. I have tuned down the memory requirements of each node as the VMs I’m running are pretty small.

With all of this in place we can get Vault installed:

helm install vault hashicorp/vault --namespace vault -f values.yaml

Once Vault is installed we can check it’s running by listing the pods:

kubectl get pods -n vault

This should print out the 3 Vault pods: vault-0, vault-1 and vault-2. To get the cluster up and running we need to initialise one of the Vault pods, save the unseal keys and root token, and then use the unseal keys to unseal the other pods.

To initialise one of the vaults first open a terminal on the container using:

kubectl exec -n vault -it vault-0 -- /bin/ash

Now we need to turn off TLS verification as we are accessing Vault on localhost. Do this by setting the VAULT_SKIP_VERIFY environment variable:

export VAULT_SKIP_VERIFY=1

Now we can initialise vault using:

vault operator init

Make sure you save the unseal keys and root token somewhere safe, then unseal this Vault by running vault operator unseal 3 times, providing a different unseal key each time. Then exec onto the other two Vault containers using the kubectl exec command given above but changing the pod name to vault-1 and vault-2, export the VAULT_SKIP_VERIFY variable, and run vault operator unseal 3 times on each to unseal both of the other Vaults.
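
To make that concrete, here is a sketch of unsealing vault-1 (vault-2 is the same); the unseal keys are the ones printed by vault operator init:

kubectl exec -n vault -it vault-1 -- /bin/ash
export VAULT_SKIP_VERIFY=1
vault operator unseal   # paste unseal key 1
vault operator unseal   # paste unseal key 2
vault operator unseal   # paste unseal key 3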

Step 6 – Configuring Your Machine To Use Vault

To finish our setup we need to configure our machine to be able to trust our new Vault cluster. To do that first load your root CA certificate into your machine’s trust store. How to do this will vary based on which operating system you are on. For Mac you can follow these instructions.
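
For example, one way to do it on a Mac from the terminal (assuming rootCACert.pem is in your current directory) is:

sudo security add-trusted-cert -d -r trustRoot -k /Library/Keychains/System.keychain rootCACert.pem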

I have MetalLB set up as a load balancer on my cluster, as explained in this post. MetalLB creates a virtual IP address that load balances across the cluster and routes to Vault. I have added an entry to my /etc/hosts file pointing vault.local at that virtual IP.
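
As a sketch, you can find the external IP that MetalLB assigned to the Vault UI service and add the hosts entry like this (192.168.1.60 is just an example address):

kubectl -n vault get svc   # note the EXTERNAL-IP of the LoadBalancer service
echo "192.168.1.60  vault.local" | sudo tee -a /etc/hosts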

Next, install the Vault CLI on your machine; instructions can be found on the Vault install page. With all of that configured we can set the Vault address for the CLI to use by running:

export VAULT_ADDR=https://vault.local:8200

Now we can use our new Vault HA cluster. Test it out by running vault status and you should see the status of the cluster.
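
As a final sanity check you can also log in with the root token saved earlier and confirm that all three nodes have joined the Raft cluster, something like:

vault status
vault login                      # paste the root token when prompted
vault operator raft list-peers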

Running Unifi Controller On Home K8s Cluster With MetalLB

In the last two posts (part 1 and part 2), I covered how I turned my desktop machine into a hypervisor using ESXi to serve 6 VMs for a Kubernetes cluster. In this post we are going to cover how to set up the Unifi Controller software on the cluster.

I run Unifi (Ubiquiti) networking kit throughout my house. It is industrial-grade networking equipment. To run the equipment you have to run the Unifi Controller software. The controller software gives you a UI that you use to configure your switches, access points etc. Ubiquiti sell a Cloud Key that runs the software, but ~£130 feels like a lot of money to me for something that just runs a configuration UI. I used to run the controller software in Docker on my [Synology DiskStation 713+](https://global.download.synology.com/download/Document/Hardware/DataSheet/DiskStation/13-year/DS713+/enu/Synology_DS713_Plus_Data_Sheet_enu.pdf), but since I started using the Synology to run my home CCTV setup, the 8-year-old hardware was really struggling to run both the Video Surveillance and Unifi Controller software. So I thought it would be great to move the Unifi Controller over to my newly created Kubernetes cluster.

To make this work we are going to have to solve 2 problems:

  1. We are going to have to set up persistence in our cluster, as the Unifi Controller software saves the network config and that needs to survive pod restarts
  2. We need a way to expose the controller software to our network on a static IP. Just using one of the node IPs is not great: if that node goes down then all of our network kit can no longer talk to the controller. What we want is a virtual IP that always points to a node that is running in our cluster

With the preamble out of the way, let’s get into solving problem one: how are we going to set up persistence in our Kubernetes cluster? To do this we need to set up a persistent volume; we can then claim this volume and attach it to a pod. For the persistent volume I thought the easiest thing to do was to set up an NFS server. That way the pod could launch on any node and simply attach to the NFS server to mount the volume.

To create the NFS server, I simply made another clone of a VM (as covered in part 1). Once cloned, I changed the IP to the next available sequential IP, 192.168.1.206. To set up the NFS share I ran the following commands:

sudo apt update
sudo apt install nfs-kernel-server

sudo mkdir /var/nfs -p
sudo chown nobody:nogroup /var/nfs
sudo chmod 777 -R /var/nfs

This installs the NFS server onto Ubuntu and sets up a folder for the share (/var/nfs). I set the permissions on this share wide open so that anyone can write to this share, which is good enough for a home cluster.

The last part is to expose this share out by editing the file /etc/exports and adding the following line to allow any machine on my network to read or write to the share:

/var/nfs	192.168.1.0/24(rw,sync,no_root_squash,no_subtree_check)

To make those changes take effect we need to restart the NFS server with sudo systemctl restart nfs-kernel-server.
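
A quick way to confirm the export is active (run on the NFS server, assuming the standard NFS tooling is installed) is:

sudo exportfs -v
showmount -e localhost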

With that in place we need to set up the persistent volume. A persistent volume is a way for you, as the administrator of the cluster, to make space available for people using the cluster to mount in their pods. To set up the NFS share we just created as a persistent volume we can use the following yaml, applying it with kubectl apply -f volume.yaml:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nas
spec:
  capacity:
    storage: 20Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Recycle
  storageClassName: slow
  mountOptions:
    - hard
    - nfsvers=4.1
  nfs:
    path: /var/nfs
    server: 192.168.1.206

This makes 20Gi available on the share (/var/nfs) that we just set up on our NFS server (192.168.1.206). To use the persistent volume with the Unifi software we need to claim it. To claim it we create a persistent volume claim with the following yaml, applying it with kubectl apply -f claim.yaml:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: unifi-claim
spec:
  storageClassName: slow
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

We can check that the persistent volume has been bound to our claim by running:

kubectl get persistentvolumes
NAME   CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                     STORAGECLASS   REASON   AGE
nas    20Gi       RWO            Recycle          Bound    kube-system/unifi-claim   slow                    23h

We can see that the persistent volume’s status is Bound and the claim is unifi-claim which was the claim we just created.

To set up the Unifi Controller software, I followed the instructions on this Helm chart. To begin we need to add the Helm repo:

helm repo add k8s-at-home https://k8s-at-home.com/charts/
helm repo update

Before we install the chart we need to create a custom values.yaml in order to pass our persistent volume claim in for the unifi controller pod to use:

persistence:
  config:
    enabled: true
    type: pvc
    existingClaim: unifi-claim

With that in place we can install Unifi using helm install unifi k8s-at-home/unifi -f values.yaml. Once installed I checked the status of the unifi pod and noticed that it hadn’t started. The error from the pod was bad option; for several filesystems (e.g. nfs, cifs) you might need a /sbin/mount.<type> helper program, which I realised is because we haven’t set up the NFS client on our worker nodes. This is straightforward to fix by logging onto each of the worker nodes and running:

sudo apt update
sudo apt install nfs-common

With this run on every node our pod now starts. But we have another issue: I checked the NFS share and no files were being written. From running the Unifi software in Docker on my Synology I knew that you needed to mount a volume to /var/lib/unifi on the container. From checking the pod definition I could see the volume mount was missing, and I could not see a way to provide it via the chart. As an aside, this is one of the things I dislike about Kubernetes. It can feel like death by configuration some of the time! Anyway, I got the unifi deployment into a yaml file using kubectl get deployment unifi -o yaml > unifi.yaml and then added another volume mount to it:


...[snip]...

      volumeMounts:
        - mountPath: /var/lib/unifi
          name: config
        - mountPath: /config
          name: config
...[snip]...

      volumes:
      - name: config
        persistentVolumeClaim:
          claimName: unifi-claim

With the extra volume mount in place at the /var/lib/unifi path I applied the deployment to my cluster and voila, files started appearing in the NFS share. Internally the Unifi Controller uses a Mongo database for state and it puts those files in the share.

The second problem to solve is how to make the controller available on a static virtual IP. We want to do this for two reasons. Firstly, if we pick a random node’s IP and use that for our controller, then if that node goes down for any reason our controller will be offline. Secondly, if we use a service of type NodePort then this gives us high port numbers in the range 30000-32767, and there is no (easy) way to use the real port numbers of the controller. This is important as the Unifi network equipment talks to the controller on a set of predefined ports and there is no way we can change that.

To solve our problem, enter MetalLB. MetalLB is an awesome piece of software that allows you to set up a Kubernetes load balancer and point it at a virtual IP. MetalLB takes care of broadcasting this virtual IP and routing it to an active node on your cluster. If a node goes down then no problem, as MetalLB will repoint the virtual IP at a different node. This nicely solves both of our problems above.

To install Metal LB:

kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.10.2/manifests/namespace.yaml
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.10.2/manifests/metallb.yaml

Once installed, to set up a virtual IP for MetalLB to use we can simply create the following ConfigMap using kubectl apply -f map.yaml:

apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - 192.168.1.61-192.168.1.61

Above I’m making the address pool the single IP 192.168.1.61. I’m doing that because that was the IP my old Unifi controller was on, so it lets me swap the controller out in place.

To make Unifi use this IP we just have to change the service to a Kubernetes load balancer by updating our values.yaml file to:

service:
  main:
    type: LoadBalancer
persistence:
  config:
    enabled: true
    type: pvc
    existingClaim: unifi-claim

Then we can apply it using helm upgrade --install unifi k8s-at-home/unifi -f values.yaml. Once we have done that we can see that the unifi service is now being exposed on external IP 192.168.1.61 by using kubectl get service unifi:

NAME    TYPE           CLUSTER-IP       EXTERNAL-IP                                                                                  AGE
unifi   LoadBalancer   10.106.167.144   192.168.1.61

The load balancer exposes all of the container’s ports on 192.168.1.61; we can verify this by hitting the UI on https://192.168.1.61:8443. It works! To complete the setup, I backed up my old Unifi controller and then restored it onto the new one.

All of the open source software that makes the above possible is so awesome. I’m seriously impressed with MetalLB, how easy it was to set up and how cool it is. I now have a highly available Unifi Controller running in Kubernetes, and it performs like a dream!

Terraform Beginner to Master Book

A while ago I produced a Terraform course for INE. I really enjoyed the experience but there were a few things that left me frustrated. For example, the course was quite expensive (I couldn’t set the price), which set a high barrier to entry for most people. It was also quite hard to update: because of the way the course was structured, a small change would have meant a lot of rework.

With the release of Terraform 0.12, where HashiCorp made a number of breaking changes, I wanted to release more content to help people get started with Terraform and learn why they should use it. I started researching writing my own book and that’s when I found Leanpub.com. Leanpub is an amazing site where you can self-publish your own book from markdown. It allows you to automate the flow of publishing a book from GitHub (great for developers) and the best part is that it allows you to release a book before it is finished and start getting feedback. Then as you write more and more of the book you can keep releasing new versions. The model is kind of like beta software. You can get feedback from your readers early and then take that feedback into the book to fix typos, improve chapters and even shape upcoming content.

My book Terraform: From Beginner to Master has been in the works for a couple of months and is available now on Leanpub. You can choose to pay what you want (down to a minimum) for the book, plus you are protected by Leanpub’s refund policy, where Leanpub will give you your money back, no arguments, within the first 45 days. This means you can try books out on Leanpub virtually risk-free. You can also download them in a format compatible with your favorite eReader.

The book takes you from not even knowing what Terraform is, through the business case for why you should use it, all the way to a solid understanding of how complex Terraform projects work, using AWS for real-world examples. If you purchase the book and have any feedback it would be greatly welcomed.

Learn Terraform with AWS

A few months ago I did a Terraform course for ine.com. The course guides you from knowing nothing about Terraform to using it to manage your AWS infrastructure. The course focuses entirely on using Terraform with AWS to give it some focus, but you can apply the same techniques to use Terraform with any provider.

If you are interested in learning Terraform then please check out my course.  If you have bought the course and have any questions then please feel free to contact me.

Terraform provider kong – fully tested using docker containers

A couple of posts ago I talked about how you could achieve full stack testing in Go using Docker containers. I have just finished the first version of terraform provider kong, a terraform provider for Kong that is built on top of gokong.

At the moment at Form3 we configure Kong using a custom Ruby script that runs on our Kong container and configures Kong upon boot up. Whilst this works there are a number of problems with this approach:

  • It is hard to know what the end state of Kong is as you end up with a whole bunch of Ruby code that is hard to follow
  • If you want to migrate the container over time it can be quite tricky, as it’s hard to clean up APIs, consumers and other Kong resources as you delete them (the database Kong uses is long-lived)

By writing a custom terraform provider for Kong all of these problems go away.  As terraform uses a declarative language, you simply declare the state that you want the thing you are configuring to be in (in this case Kong) and terraform looks at the current state and automatically works out how to get there.  This makes everything so much simpler.  A good example: in our current world, if you remove an API you have to remember to write a line of Ruby to delete the old API.  When using terraform you simply remove the API from your configuration and terraform deletes it for you.

Building upon the way that I wrote gokong I wrote terraform provider kong in the same vein, ie full stack testing using real components and no mocks! This means that the acceptance tests in terraform provider kong spin up a real Kong container and apply terraform plans to it, check it is in the expected state and then destroy it again. Testing in this way gives you ultimate confidence that the code works!

To automate releases of new versions of terraform provider kong I have used the excellent goreleaser. Goreleaser automates building your code for multiple platforms, zipping/tarballing it up and uploading it to GitHub. It is so easy to use, kudos to the authors. To set it up you simply need to create a goreleaser.yml file:


builds:
  - binary: terraform-provider-kong
    goos:
      - darwin
      - linux
      - windows
    goarch:
      - amd64
archive:
  format: zip


My file specifies the binary name and builds for Mac (darwin), Linux and Windows.  The last part is to define a task to run goreleaser as part of your build.  Goreleaser will only publish a release when the build is running against a tag, therefore you can run the task on every build and it will only create a release for tagged builds, which is pretty smart.  My build task to create a release looks like:


release:
	go get github.com/goreleaser/goreleaser; \
	goreleaser


I run the release task as part of every Travis build.  You also need to create a GitHub token, which is used by goreleaser to upload the release to your GitHub account.  To generate a GitHub token go to https://github.com/settings/tokens/new.  Then simply set it as a secure environment variable in your .travis.yml and goreleaser will pick it up for you.

Then when you want to create a release you can simply do:


git tag -a v0.1.0 -m "My release"
git push origin v0.1.0


This will build a new v0.1.0 release, automatically create the binaries and upload the release to GitHub.  Pretty smart!

The last thing I wanted to go through is that I’m taking advantage of the build matrix feature of Travis to run the tests against multiple versions of Kong, using the following section of my .travis.yml file:


env:
  matrix:
    - KONG_VERSION=0.11 TF_ACC=1
    - KONG_VERSION=0.11.1 TF_ACC=1
    - KONG_VERSION=0.11.2 TF_ACC=1


Travis will automatically run a build for each line under the matrix definition, each with the values set on that line.  I pass the KONG_VERSION parameter all of the way through to the code that pulls the Kong container for Docker, so it will pull the version that is specified here.  Hence I can easily update this build to run against many versions of Kong.  As new versions are released I can simply add a line to the build matrix and it will automatically be tested on Travis.  This is a really powerful feature; being able to run your tests against multiple versions of a target piece of software is immense!

If you are interested in terraform provider kong or gokong I would love to hear your feedback and get your ideas and contributions.


Full stack testing in golang with docker containers

I like to practice the approach of full stack component testing, where the guiding principle is that you test the entire component from as high a level as possible and only stub out third party dependencies (read: other APIs) or something that isn’t easily available as a docker container.  I have recently started a golang project to write a client for Kong and I thought this would be a good opportunity to use this testing strategy in a golang project.

I love Go, but the one thing I don’t like so much about it is the approach most people seem to take to testing. A lot of tests are written at the method level, where your tests end up tightly coupled to your implementation. This is a bad thing. You know a test is not very good when you have to change it every time you change your implementation. The reason this is bad is because, firstly, as you are changing your test at the same time as you are changing your code, once you have finished you have no way of knowing whether the new implementation still works, as the test has changed too. It also restricts how freely you can edit the implementation, as you constantly have to update the way the tests mock everything out. By testing at the top component level the tests do not care about the implementation, and the code runs with real components so it works how it will in the real world. By writing tests in this way I have seen a lot fewer defects and have never had to manually debug something.

Anyway back to the subject of the full stack testing approach in Go. To start I used the excellent dockertest project which gives you a great API to start and stop docker containers. I then took advantage of the fact that in a Go project there is a special test function that gets called for every test run:


func TestMain(m *testing.M) {
	// setup
	code := m.Run()
	// teardown
	os.Exit(code)
}


In the above method you can do your test setup code where I have placed the // setup comment and your teardown code where I have placed the // teardown comment.  The code that gets returned by m.Run() is the exit code from the test run.  Go sets this to non-zero if the test run fails, so you need to exit with this code so your build will fail if your test run fails.  Now using this method I can start the Kong docker container, run the tests and then stop the Kong docker container.  Here is the full TestMain code at the time of writing:


func TestMain(m *testing.M) {
	testContext := containers.StartKong(GetEnvVarOrDefault("KONG_VERSION", defaultKongVersion))
	err := os.Setenv(EnvKongAdminHostAddress, testContext.KongHostAddress)
	if err != nil {
		log.Fatalf("Could not set kong host address env variable: %v", err)
	}
	code := m.Run()
	containers.StopKong(testContext)
	os.Exit(code)
}


I have wrapped the starting and stopping of the kong container in a method to abstract away the detail.  Notice how the StartKong method takes the Kong version as a parameter.  It gets the Kong version either from the environment variable KONG_VERSION or if that environment variable is not set then it uses the default Kong version which I set to the latest version 0.11 at time of writing.  The cool thing about this is that if I want to run my tests against a different version of Kong I can do that easily by changing this value.  The really cool thing about this is that I can run the build against multiple versions of Kong on travis-ci by taking advantage of the env matrix feature.  If you list multiple values for an environment variable in travis-ci then travis-ci will automatically run a build for each entry.  This means it is really easy to run the whole test pack against multiple versions of Kong which is pretty neat.  You can check out the gokong build to see this in action!

The one part you may be wondering about from all of this is how I get the url of the container that Kong is running on for use in my tests.  That is done by setting the environment variable KONG_ADMIN_ADDR.  The client uses that environment variable if set, and if not then it defaults to localhost:8001.
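
Putting that together, a local run of the test pack against a particular Kong version looks roughly like this (assuming Docker is running and you are in the root of the gokong repo):

# TestMain starts the Kong container, runs the tests and tears the container down again
KONG_VERSION=0.11.2 go test ./...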

With all of this in place it allows me to test the client by hitting a real running Kong in a container, no mocks in sight!  How cool is that.  Plus I can run against any version of Kong that is built as a docker container with a flick of a switch!

Here is an example of what a test looks like so you can get a feel:


func Test_ApisGetById(t *testing.T) {
	apiRequest := &ApiRequest{
		Name:                   "test-" + uuid.NewV4().String(),
		Hosts:                  []string{"example.com"},
		Uris:                   []string{"/example"},
		Methods:                []string{"GET", "POST"},
		UpstreamUrl:            "http://localhost:4140/testservice",
		StripUri:               true,
		PreserveHost:           true,
		Retries:                3,
		UpstreamConnectTimeout: 1000,
		UpstreamSendTimeout:    2000,
		UpstreamReadTimeout:    3000,
		HttpsOnly:              true,
		HttpIfTerminated:       true,
	}

	apiClient := NewClient(NewDefaultConfig()).Apis()

	createdApi, err := apiClient.Create(apiRequest)
	assert.Nil(t, err)
	assert.NotNil(t, createdApi)

	result, err := apiClient.GetById(createdApi.Id)
	assert.Equal(t, createdApi, result)
}


I think that is really clean and readable.  All of the code that boots up and tears down Kong is out of sight and you can just concentrate on the test.  Again with no mocks around 🙂

If you want to see the rest of the code or help contribute to my gokong project that would be great.  I look forward to any feedback you have on this.

Download github releases from private repos in bash and docker builds

I wanted to add a short post to describe how to automate the downloading of releases from private github repositories using a bash script or in a Docker build.

To start you need to create a GitHub token that has access to your repository. Once you have your token you can use the following bash script, filling in the relevant details:


#!/usr/bin/env bash
set -e
GITHUB_TOKEN=<my_token>
REPO="kevholditch/demo"
FILE="demo_0.0.1_linux_amd64.tar.gz"
VERSION="v0.0.1"
wget -q --auth-no-challenge --header='Accept:application/octet-stream' \
https://$GITHUB_TOKEN:@api.github.com/repos/$REPO/releases/assets/`curl -H "Authorization: token $GITHUB_TOKEN" -H "Accept: application/vnd.github.v3.raw" -s https://api.github.com/repos/$REPO/releases | jq ". | map(select(.tag_name == \"$VERSION\"))[0].assets | map(select(.name == \"$FILE\"))[0].id"` \
-O /tmp/$FILE


This script will download your release to the /tmp/ directory, from there you can untar and move it etc.
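
For example, to unpack it and put the binary somewhere on your path you could follow the download with something like this (the binary name here is just an illustration):

tar -xzf /tmp/$FILE -C /tmp
sudo mv /tmp/demo /usr/local/bin/demo   # "demo" is a hypothetical binary name inside the tarball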

To take this a stage further if you want to download your release as part of a docker build you can use the Dockerfile snippet below to give you a starting point:


ARG GITHUB_TOKEN
ENV REPO "kevholditch/demo"
ENV FILE "demo_0.0.1_linux_amd64.tar.gz"
ENV VERSION "v0.0.1"
RUN wget -q --auth-no-challenge --header='Accept:application/octet-stream' \
  https://$GITHUB_TOKEN:@api.github.com/repos/$REPO/releases/assets/`curl -H "Authorization: token $GITHUB_TOKEN" -H "Accept: application/vnd.github.v3.raw" -s https://api.github.com/repos/$REPO/releases | jq ". | map(select(.tag_name == \"$VERSION\"))[0].assets | map(select(.name == \"$FILE\"))[0].id"` \
  -O /tmp/$FILE


The trick here is that we are passing in the GITHUB_TOKEN using a docker build arg.  This allows you to build the container using travis by setting a secure ENV variable and then passing that into your docker build script as the docker arg parameter.  For example:


if [ "$GITHUB_TOKEN" = "" ]; then
echo "you need to create a github token with access to kevholditch/demo to run this build see https://github.com/settings/tokens/new"
exit -1
fi
docker build –build-arg "GITHUB_TOKEN=$GITHUB_TOKEN" -t kevholditch/demo .

In the script above we check that the GITHUB_TOKEN env variable is set, and if it isn’t then we terminate with a non-zero exit code, halting the build.  This allows developers to run the build with their own GITHUB_TOKEN, and you can run this build on Travis by setting a secure env variable (or the equivalent in the build server you are using).

Running SNS & SQS locally in docker containers supporting fan out

On AWS, using SNS to fan out to multiple SQS queues is a common scenario. SNS fan out means creating an SQS queue for each consumer of an SNS message and subscribing each SQS queue to the SNS topic. This means when a message is sent to the SNS topic a copy of the message arrives in each consumer’s queue. It gives you multicast messaging and the ability to consume messages at your own pace, without needing to be online when the notification occurs.

I wanted to use SNS fan out in one of our components, and as our testing model tests at the component level, I needed to get an SNS/SQS solution working in Docker. Step forward ElasticMQ and SNS.

Inside the example folder of the SNS repository is the following docker compose file, given as an example to get the SNS and SQS containers working together in fan out mode:


services:
  sns:
    image: s12v/sns
    ports:
      - "9911:9911"
    volumes:
      - ./config:/etc/sns
    depends_on:
      - sqs
  sqs:
    image: s12v/elasticmq
    ports:
      - "9324:9324"

When started with the docker-compose up command the containers spun up OK. The problem came when publishing a message to the SNS topic using the following command:


aws sns publish --topic-arn arn:aws:sns:us-east-1:1465414804035:test1 --endpoint-url http://localhost:9911 --message "hello"


The error received was:


sns_1 | com.amazonaws.http.AmazonHttpClient executeHelper
sns_1 | INFO: Unable to execute HTTP request: Connection refused (Connection refused)
sns_1 | java.net.ConnectException: Connection refused (Connection refused)


So not a great start. For some reason the SNS container could not send the message on to the sqs container. Time to debug why….

The first step to working out why was going onto the SNS container and sending a message to the SQS container. This tells us whether or not the containers can talk to each other. When running this test the message got sent to the SQS queue successfully.
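
A quick way to approximate that check, assuming curl is available inside the SNS image, is to exec into the sns container and hit ElasticMQ's SQS endpoint directly; a non-error XML response shows the two containers can talk:

docker-compose exec sns curl -s "http://sqs:9324/?Action=ListQueues"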

The next stage in testing was to look at the code for the SNS library to see whether it logged out the SQS queue it was trying to send to. Upon inspection I realised that the SNS library was using Apache Camel to connect to SQS. I noticed that in the source code for Apache Camel it does log out a lot more information when the log level is set to trace. Going back to the SNS library, there is the following logback.xml file:


<configuration>
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>
  <logger name="org.apache.camel" level="INFO"/>
  <root level="DEBUG">
    <appender-ref ref="STDOUT" />
  </root>
</configuration>

I simply cloned the SNS repository from github, updated the level from DEBUG to TRACE and then recompiled the SNS code using the command sbt assembly. Once this finished it was simply a matter of copying the new jar into the root of the folder where I had cloned the SNS repo and updating the Dockerfile to use my newly compiled jar. The last change needed was updating the docker-compose.yml file in the example directory to:


services:
  sns:
    build: ../
    ports:
      - "9911:9911"
    volumes:
      - ./config:/etc/sns
    depends_on:
      - sqs
  sqs:
    image: s12v/elasticmq
    ports:
      - "9324:9324"

The important line is build: ../, meaning we are now building the SNS container from the local clone rather than using the published image. To build this I simply ran docker-compose build and then docker-compose up. This time the SNS container started up with trace logging. When I sent a message to SNS I got a much more informative error message:


TRACE o.a.c.component.aws.sqs.SqsEndpoint - Queue available at 'http://localhost:9324/queue/queue1'.


It’s clear now that the url for the queue is being set incorrectly. It should be http://sqs:9324/queue/queue1, as sqs is the name of the container; the reason we were getting connection refused before was that the messages were being sent to localhost rather than to the sqs container. To work out how to change this we had to dig through the Apache Camel code to see how it configures its queue urls. We found that it queries SQS using the list queues command. Running the same list queues command against our running container revealed that the queues were being bound to localhost and not sqs.
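
The list queues call from the host looked something like the following (using the same region/endpoint trick as the receive command later in this post), and the queue urls it returned contained localhost rather than sqs:

aws sqs list-queues --endpoint-url http://localhost:9324 --region elasticmq --no-sign-request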

To change this we simply had to use the following config file for elasticmq:


include classpath("application.conf")

node-address {
  host = sqs
}

queues {
  queue1 {}
}


The key line is “host = sqs”. The last part to making everything work was updating the docker-compose.yml file to mount the config file for ElasticMQ:


services:
  sns:
    image: s12v/sns
    ports:
      - "9911:9911"
    volumes:
      - ./config/db.json:/etc/sns/db.json
    depends_on:
      - sqs
  sqs:
    image: s12v/elasticmq
    ports:
      - "9324:9324"
    volumes:
      - ./config/elasticmq.conf:/etc/elasticmq/elasticmq.conf

Once I tore down the containers and started them up again I ran the list queues command; this time the queues came back bound to sqs: http://sqs:9324/queue/queue1. I then ran the command to send a message to SNS and could see it successfully get sent to SQS by receiving it with the following command:


aws sqs receive-message --queue-url http://localhost:9324/queue/queue1 --region elasticmq --endpoint-url http://localhost:9324 --no-verify-ssl --no-sign-request --attribute-names All --message-attribute-names All


And there we have it, a working SNS fan out to SQS using docker containers. The author of the SNS container has accepted a PR from my colleague sam-io to update the example docker-compose.yml with the fixes described here. This means you can simply clone the SNS repository from GitHub, cd into the example directory, run docker-compose up and everything should work. A big thanks to the open source community and people like Sergey Novikov for providing such great tooling. It’s great to be able to give something back!

Mutual client ssl using nginx on AWS

An interesting problem I’ve recently had to solve is integrating with a third-party client who wanted to communicate with our services over the internet and use mutual client auth (ssl) to lock down the connection. The naive way to solve this would have been to put the code directly into the Java service itself. This however is quite limiting and restrictive. For example, if you want to update the certificates that are allowed you need to re-release your service, and your service is now tightly coupled to that customer’s way of doing things.

A neater solution is to offload this concern to a separate service (in this case nginx running in a docker container). This means that the original service can talk plain http/s to nginx, and nginx can do all of the hard work of the mutual client auth and proxying the request onto the customer.

When implementing this solution I couldn’t find a full example of how to set this up using nginx, so I wanted to go through it. I want to split the explanation into two halves: outgoing and incoming. First let’s go through the outgoing config in nginx:

http {
  server {
    server_name outgoing_proxy;
    listen 8888;
    location / {
      proxy_pass                    https://thirdparty.com/api/;
      proxy_ssl_certificate         /etc/nginx/certs/client.cert.pem;
      proxy_ssl_certificate_key     /etc/nginx/certs/client.key.pem;
    }
  }
}

This is a pretty simple block to understand. It says we are hosting a server on port 8888 at the root. We are going to proxy all requests to https://thirdparty.com/api/ and present the specified client certificate when making those requests. Pretty simple so far. The harder part is the configuration for the inbound:


http {
  map $ssl_client_s_dn $allowed_ssl_client_s_dn {
      default no;
      "CN=inbound.mycompany.com,OU=My Company,O=My Company Limited,L=London,C=GB" yes;
  }

  server {
    listen       443 ssl;
    server_name  inbound.mycompany.com;

    ssl_client_certificate  /etc/nginx/certs/client-ca-bundle.pem;
    ssl_certificate         /etc/nginx/certs/server.cert.pem;
    ssl_certificate_key     /etc/nginx/certs/server.key.pem;
    ssl_verify_client       on;
    ssl_verify_depth        2;

    location / {

      if ($allowed_ssl_client_s_dn = no) {
        add_header X-SSL-Client-S-DN $ssl_client_s_dn always;
        return 403;
      }

      proxy_pass http://localhost:4140/myservice/;
      proxy_set_header Host $host;
      proxy_set_header 'Content-Type' 'text/xml;charset=UTF-8';
      proxy_set_header X-Real-IP $remote_addr;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
  }
}

Before I explain the config block above I wanted to point out that in practice you place all of the code in the above two snippets inside the same http block.

Starting at the server block we can see that we are listening on port 443 (the normal https port) using the hostname inbound.mycompany.com. This is where the ssl will terminate. We are using an AWS ELB to load balance requests to a pool of nginx servers that handle the ssl termination. The ssl_client_certificate is the CA bundle PEM for all of the certificates we trust (intermediate and root authorities). The ssl_certificate and ssl_certificate_key host the server certificate (for the ssl endpoint, i.e. a certificate with the subject name “inbound.mycompany.com”). ssl_verify_client on means to check that the client is trusted, and ssl_verify_depth is how far down the certificate chain to check. The if statement says: if we have verified that the presented client certificate is indeed one that we trust, then check that the subject distinguished name is one that we are allowing explicitly. This is done by checking the map above. In my example config above the only subject distinguished name that will be allowed is “CN=inbound.mycompany.com,OU=My Company,O=My Company Limited,L=London,C=GB”. If the subject distinguished name is not allowed then nginx will return HTTP 403 Forbidden. If it is allowed then we proxy the request onto myservice through linkerd.

By using a map to define allowed subject distinguished names we can easily generate this programmatically and it keeps all of the allowed subject distinguished names in a single place.
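
When adding a new customer certificate to the map, a handy way to print its subject distinguished name in the same comma-separated form that recent nginx versions expose in $ssl_client_s_dn is:

openssl x509 -in client.cert.pem -noout -subject -nameopt RFC2253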

I really like the solution above as all of our services can talk to their local linkerd container (see my linkerd post on running linkerd on AWS ECS), then linkerd can take care of talking over https between server boundaries, and nginx can take care of doing the ssl mutual auth to talk to the customer. The service does not need to worry about any of that. In fact, as far as the service is concerned it is talking to a service running on the box (the local linkerd instance). This makes it much more flexible: if you have another service that needs to talk to that customer using mutual ssl, it can just talk through the same channel through linkerd and nginx. If you did it using the mutual ssl code directly in your service you would then have two copies of (or two references to) a library to handle the mutual ssl that you would have to keep up to date with your client’s allowed certificates. That could quickly explode as you write more services or have more customers that want to use mutual client auth, and would quickly become a nightmare to maintain. By solving this problem with a single service dedicated to this job, all of the ssl configuration is in one place and all of the services are much simpler to write.

Running Linkerd in a docker container on AWS ECS

I recently solved an interesting problem of configuring linkerd to run on an AWS ECS cluster.

Before I explain the linkerd configuration I think it would help to go through a diagram showing our setup:

A request comes in at the top and gets routed to a Kong instance.  Kong is configured to route the request to a local linkerd instance.  The local linkerd instance then uses its local Consul to find out where the service is.  It then rewrites the request to call another linkerd on the destination server where the service resides (the one that was discovered in Consul).  The linkerd on the service box then receives the request and uses its local Consul to find the service.  At this point we use a filter to make sure it only uses the service located on the same box, as the service discovery has essentially already been done by the calling linkerd.  We then call the service and reply.

The problem to solve when running on AWS ECS is how to bind only to services on your local box.  The normal way of doing this is to use the localhost transformer “io.l5d.localhost”, which filters the services in Consul down to those on the local host.  When running in a docker container this won’t work, as the local IP address of the linkerd inside the docker container will not be the IP address of the server it is running on, meaning that when it queries Consul it will get no matches.

To solve this problem we can use the specificHost transformer (added recently). We can use this to provide the IP address of the server to filter on. Now we run into another problem: we do not know the IP address of the server until runtime. There is a neat solution to this. Firstly, we can write our own docker container based off the official one. Next, we define a templated config file like this:

interpreter:
    kind: default
    transformers:
    - kind: io.l5d.specificHost
      host: $LOCAL_IP

Notice that I have used $LOCAL_IP instead of the actual IP. This is because at runtime we can run a simple script that sets the LOCAL_IP environment variable to the IP of the box the container is running on, substitutes all environment variables in the config, and then runs linkerd.

To do this we use the following code inside an entrypoint.sh file:

export LOCAL_IP=$(curl -s 169.254.169.254/latest/meta-data/local-ipv4)
envsubst < /opt/linkerd/conf/linkerd.conf.template > /opt/linkerd/conf/linkerd.conf
echo "starting linkerd on $LOCAL_IP..."
./bundle-exec /opt/linkerd/conf/linkerd.conf

The trick here is to use the AWS meta endpoint to grab the local ip of the machine. We then use the envsubst program to substitute out all environment variables in our config and build a real linkerd.conf file. From there we can simply start linkerd.

To make this work you need a Dockerfile like the following:

FROM buoyantio/linkerd:1.1.0

RUN wget -O /usr/bin/dumb-init https://github.com/Yelp/dumb-init/releases/download/v1.2.0/dumb-init_1.2.0_amd64
RUN chmod +x /usr/bin/dumb-init

RUN apt-get update && apt-get install gettext -y

ADD entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh

ENTRYPOINT ["/usr/bin/dumb-init", "--", "/entrypoint.sh"]

This dockerfile uses dumb-init to manage the linkerd process. It installs the gettext package which is where the envsubst program lives. It then kicks off the entrypoint.sh script which is where we do our environment substitution and start linkerd.

The nice thing about this is we can put any environment variables we want into our linkerd config template file and then supply them at runtime in our task definition file. They will then get substituted out when the container starts. For example I am supplying the common name to use for ssl this way in my container.

Although this solution works it would be great if linkerd supported having environment variables inside its config file and swapped them out automatically. Maybe a pull request for another time…