Creating a virtual 6 node HA Kubernetes cluster with Cilium using VMware – part 2

In part 1 we set up 6 Ubuntu Server VMs (using VMware ESXi), all available on the host network. In this part we are going to take those 6 VMs and install Kubernetes with the Cilium CNI. Our Kubernetes setup will have 3 master nodes (control plane) and 3 worker nodes, making it a highly available cluster. The astute reader will point out that all of the VMs are running on a single host, so making this cluster HA is kind of pointless. To a point I would agree with you, but making the cluster HA is still worth it for two reasons: firstly, it enables zero downtime upgrades to the control plane, and secondly, it gives us experience in building an HA cluster, which is what we would want for production use cases.

To set up Kubernetes I have decided to go with kubeadm. I know this is cheating a tiny bit versus installing all of the components ourselves, but even though kubeadm does a bit of heavy lifting for us, we will still need to understand a fair bit about what is going on to get the cluster working.

The instructions I’m going to be following can be found on the kubeadm site. I’m going to talk through the salient points here as I had to add a few workarounds to get the cluster running. The first step is to install kubeadm on every node (it would’ve been better to have included this in the first VM that we cloned as it would’ve saved some time, but you live and learn). To install kubeadm:

sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl
sudo curl -fsSLo /usr/share/keyrings/kubernetes-archive-keyring.gpg https://packages.cloud.google.com/apt/doc/apt-key.gpg
echo "deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl

Once kubeadm is installed we can run the following command on master01 to initialise the master node:

kubeadm init --control-plane-endpoint=skycluster --pod-network-cidr=10.10.0.0/16 --skip-phases=addon/kube-proxy

The above command sets the control-plane endpoint, as is recommended by kubeadm if you want to set up an HA cluster. For now skycluster is just a hosts entry that points to master01‘s IP; I have added this entry to every machine in our setup (the entry is shown below). We are setting the --skip-phases=addon/kube-proxy flag because I am planning on using Cilium as the CNI. Cilium uses eBPF to super power your cluster and provides the kube-proxy functionality using eBPF instead of iptables. I recently interviewed Dan Wendlandt, the CEO of Isovalent (the company behind Cilium), on the Form3 .tech podcast, an episode well worth listening to!
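
For reference, the hosts entry is just a single line (this assumes master01 is on 192.168.1.200, the address we give it in part 1; adjust to suit your network):

# /etc/hosts on every node (and on your workstation)
192.168.1.200 skycluster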

I got a warning saying that swap is enabled and kubeadm should be run with swap off. To turn off swap run sudo swapoff -a. To make sure that swap stays off even after a reboot we need to edit /etc/fstab and comment out the line /swap.img none swap sw 0 0 by adding a # to the start of it.
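
The swap change boils down to something like the following (the sed is just a convenience for commenting out the swap line; check /etc/fstab afterwards to make sure it matched):

sudo swapoff -a
# comment out the /swap.img line so swap stays off after a reboot
sudo sed -i '/^\/swap.img/ s/^/#/' /etc/fstab

With swap off, I ran the kubeadm init command again.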

This time the command timed out, with an error stating that the kubelet could not be contacted. The kubelet is a service that runs on each node in our cluster. Running systemctl status kubelet revealed that the kubelet service was not running. For more information as to why the service wasn’t running I ran journalctl -u kubelet, then pressed Shift+g to get to the bottom of the logs, then the right arrow to see the full error message, which was: "Failed to run kubelet" err="failed to run Kubelet: misconfiguration: kubelet cgroup driver: \"systemd\" is different from docker cgroup driver: \"cgroupfs\"". To fix this we have to switch either the kubelet or docker to use the same cgroup driver. From doing some reading it is recommended to use systemd as the cgroup driver, so I updated docker to use systemd by running:

sudo -i
echo '{"exec-opts": ["native.cgroupdriver=systemd"]}' >> /etc/docker/daemon.json
systemctl restart docker

With that change in place let's run kubeadm init again. This time we get: Your Kubernetes control-plane has initialized successfully!

Wahoo!

To be able to contact the cluster from our machine we need to add the cluster config to our local kube config file. To do this we can copy the file /etc/kubernetes/admin.conf from the master01 node onto our machine, grab the entry for the new cluster and merge it into our kube config file located in $HOME/.kube/config.
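
A rough sketch of that copy and merge (this assumes SSH access as the kevin user and that you want to merge into, rather than overwrite, your existing config):

ssh kevin@master01 'sudo cat /etc/kubernetes/admin.conf' > ~/.kube/skycluster.conf
KUBECONFIG=~/.kube/config:~/.kube/skycluster.conf kubectl config view --flatten > ~/.kube/merged
mv ~/.kube/merged ~/.kube/config

Once the new cluster entry is in place, running kubectl get nodes shows: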

NAME       STATUS     ROLES                  AGE   VERSION
master01   NotReady   control-plane,master   13m   v1.22.0

The node is not ready! To find out why we can describe the node using kubectl describe node, and if we look at the events we see the error: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized. When you install a new Kubernetes cluster it does not come with a CNI. The CNI (container network interface) plugin is responsible for handing out IP addresses to the pods that get created (amongst other things). As stated earlier we are going to use Cilium as our CNI of choice so we can super power our cluster.

To install cilium we can use the following command:

helm install cilium cilium/cilium --version 1.9.9 \
    --namespace kube-system \
    --set kubeProxyReplacement=strict \
    --set k8sServiceHost=192.168.1.200 \
    --set k8sServicePort=6443

This command comes from the cilium documentation on how to install cilium without kube-proxy. I got the port of the kube api server by describing the kube-apiserver pod with kubectl describe pod kube-apiserver-master01 (it lives in the kube-system namespace).
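
One note: the cilium chart lives in Cilium's own Helm repository, so if you haven't added it already you will need something along these lines first:

helm repo add cilium https://helm.cilium.io/
helm repo update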

Now that cilium is installed, when we check the status of the nodes again we see:

NAME       STATUS   ROLES                  AGE   VERSION
master01   Ready    control-plane,master   15m   v1.22.0

Awesome! The first control plane node is now fully working!

Let's not rest on our laurels; the next job is to set up the two other control plane nodes. This would’ve been easier if I’d had the foresight to pass the --upload-certs flag to the initial kubeadm init command. This flag stores the control plane certificates inside a Kubernetes secret, meaning joining control plane nodes can simply download them. Unfortunately, I did not do that, so I had to copy the certificates manually. Luckily this helper script is available in the kubeadm documentation:

USER=kevin # customizable
CONTROL_PLANE_IPS="master02 master03"
for host in ${CONTROL_PLANE_IPS}; do
  scp /etc/kubernetes/pki/ca.crt "${USER}"@$host:
  scp /etc/kubernetes/pki/ca.key "${USER}"@$host:
  scp /etc/kubernetes/pki/sa.key "${USER}"@$host:
  scp /etc/kubernetes/pki/sa.pub "${USER}"@$host:
  scp /etc/kubernetes/pki/front-proxy-ca.crt "${USER}"@$host:
  scp /etc/kubernetes/pki/front-proxy-ca.key "${USER}"@$host:
  scp /etc/kubernetes/pki/etcd/ca.crt "${USER}"@$host:etcd-ca.crt
  # Quote this line if you are using external etcd
  scp /etc/kubernetes/pki/etcd/ca.key "${USER}"@$host:etcd-ca.key
done

This script copies all of the certificates we need onto each of the other control plane nodes (master02 and master03). Once it has run, we have to go onto each node and copy all of the certs into /etc/kubernetes/pki (this folder won’t exist yet so we need to create it), then copy etcd-ca.key and etcd-ca.crt to /etc/kubernetes/pki/etcd/ca.key and /etc/kubernetes/pki/etcd/ca.crt respectively (a sketch of these copy commands follows the next block). After that, on each node I had to switch docker to use systemd and turn off swap (the steps we saw earlier):

sudo -i
swapoff -a
systemctl restart docker
systemctl daemon-reload
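
For reference, moving the copied certificates into place on master02 and master03 looks roughly like this (run as root on each joining node, assuming the scp script above dropped the files in the kevin user's home directory):

mkdir -p /etc/kubernetes/pki/etcd
mv /home/kevin/ca.crt /home/kevin/ca.key /home/kevin/sa.key /home/kevin/sa.pub /etc/kubernetes/pki/
mv /home/kevin/front-proxy-ca.crt /home/kevin/front-proxy-ca.key /etc/kubernetes/pki/
mv /home/kevin/etcd-ca.crt /etc/kubernetes/pki/etcd/ca.crt
mv /home/kevin/etcd-ca.key /etc/kubernetes/pki/etcd/ca.key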

With those changes in place we can now run the kubeadm join command that we were given when we successfully ran kubeadm init:

sudo kubeadm join skycluster:6443 --token xxx --discovery-token-ca-cert-hash xxx --control-plane
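
As an aside, had --upload-certs been passed to kubeadm init, the manual certificate copying above wouldn't have been needed. You can also upload the certificates after the fact and join using a certificate key; roughly, and using the same token and hash placeholders as above:

# on master01: upload the control plane certs and print a certificate key
sudo kubeadm init phase upload-certs --upload-certs
# on the joining node: pass the printed key along with the join command
sudo kubeadm join skycluster:6443 --token xxx --discovery-token-ca-cert-hash xxx --control-plane --certificate-key <key printed above>

I didn't take this route, but it is worth knowing about.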

After running this on both nodes we see a message saying that the node joined successfully. Now when we check the status of the nodes we see:

master01   Ready   control-plane,master   24h     v1.22.0
master02   Ready   control-plane,master   7m48s   v1.22.0
master03   Ready   control-plane,master   23m     v1.22.0

All of our control plane nodes are now ready to go. To join the worker nodes we just have to run the following set of commands:

sudo -i
swapoff -a
echo '{"exec-opts": ["native.cgroupdriver=systemd"]}' >> /etc/docker/daemon.json
systemctl restart docker
systemctl daemon-reload
kubeadm join skycluster:6443 --token xxx --discovery-token-ca-cert-hash xxx

These commands turn off swap and switch docker over to systemd on each of the worker nodes (the same steps we ran on the masters), then kubeadm joins the node to the cluster. With these commands run on each worker node we now have a fully working cluster, which we can verify with kubectl get nodes:

NAME       STATUS   ROLES                  AGE     VERSION
master01   Ready    control-plane,master   24h     v1.22.0
master02   Ready    control-plane,master   16m     v1.22.0
master03   Ready    control-plane,master   32m     v1.22.0
worker01   Ready    <none>                 4m12s   v1.22.0
worker02   Ready    <none>                 2m8s    v1.22.0
worker03   Ready    <none>                 101s    v1.22.0

Let's check the health of cilium using the awesome cilium cli. To install it, simply download the release for your OS and copy the binary onto your path; a rough sketch of that is below. With the CLI in place we can run cilium status to check on the installation.
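
This sketch assumes a Linux amd64 workstation; pick the right asset for your OS from the cilium-cli releases page:

curl -L -o cilium-linux-amd64.tar.gz https://github.com/cilium/cilium-cli/releases/latest/download/cilium-linux-amd64.tar.gz
sudo tar -xzf cilium-linux-amd64.tar.gz -C /usr/local/bin
rm cilium-linux-amd64.tar.gz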

cilium status shows that cilium is healthy and that we have 6/6 cilium containers running (one on each node). We now have a fully working Kubernetes cluster ready for action.

Creating a virtual 6 node HA Kubernetes cluster with Cilium using VMware – part 1

I decided to recommission a decent spec desktop (16 core Threadripper 1950x, 32GB RAM) into a server for VMs, with the first project being to build out an HA Kubernetes cluster. Why, when you can get a Kubernetes cluster in a click from a cloud vendor? For fun and to learn! This blog series will take you through the journey of building out the cluster.

To start the project I decided to use VMware’s ESXi to host the VMs. For those who don’t know, ESXi is an OS that exists solely to serve VMs. The full blown product is very expensive, but luckily you can get ESXi for free from VMware, simply by registering for a free license and adding it once installed. There are a couple of limitations on the free license, but none that will get in the way of most hobbyists: you can only use 2 physical CPUs, which won’t matter as most people are going to put this on a single machine, and backups of the whole fleet of VMs are restricted, which again is not a concern for me. So for free this is a pretty good deal.

To install, I simply downloaded the x86 64-bit ISO and put it on a USB drive using BalenaEtcher. Then I booted from the USB drive and hit next a few times through the ESXi install. Once installed, a web URL is displayed showing the IP address of the machine; that is how you interact with ESXi. The UI is a bit 1990s, but remember we are talking about a free license here!

For the cluster VMs I decided to use Ubuntu 20.04 LTS server edition. For my use case I don’t need the added bloat that comes with desktop Ubuntu. I created a new VM, attached the Ubuntu ISO and ran through the installer. What is very cool is that by default ESXi makes the VMs available on the main network. So for example my home network uses 192.168.1.0/24 and the VMs get an IP address in that CIDR. That means that when you SSH into a VM from another machine you don’t even know that it’s a VM; you can treat it like any other machine on the network. ESXi sets up a virtual switch and routes traffic through the host machine to the VM automatically.

I named my first machine master01, as for my HA Kubernetes cluster I’m going to build out 3 master nodes to run the control plane and 3 worker nodes. Once the first machine is up and running there is a really easy way to create the other 5: simply shut down the VM and then clone it. This YouTube video explains how to clone a VM so I’m not going to repeat the instructions here.

Once all 6 machines have been created, there are a few things we need to do to tidy them up. Firstly, I wanted them all to have sequential IPs in my network so I would remember where they were. To do this we simply need to edit the file /etc/netplan/00-installer-config.yaml to something like the following:

network:
  ethernets:
    ens160:
      dhcp4: false
      addresses: [192.168.1.202/24]
      gateway4: 192.168.1.1
      nameservers:
              addresses: [8.8.8.8,8.8.4.4]
  version: 2

In the above configuration we are setting the static IP of the machine to 192.168.1.202 with a default gateway of 192.168.1.1. To apply the config run sudo netplan apply. The next thing we need to do is rename the machines, as currently they all have the same hostname as the VM they were cloned from. To do this, first update the hostname in /etc/hostname and then update any reference to the old machine name in /etc/hosts to the new one. With those changes in place we need to reboot to make them apply.
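
On each clone the rename boils down to something like this (using master02 as an example; hostnamectl is an alternative to editing /etc/hostname by hand, and the clones start out with master01's hostname):

sudo hostnamectl set-hostname master02
sudo sed -i 's/master01/master02/g' /etc/hosts
sudo reboot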

Lastly, to make the machines easier to work with, I set up the /etc/hosts file on my laptop as follows:

192.168.1.200 master01
192.168.1.201 master02
192.168.1.202 master03
192.168.1.203 worker01
192.168.1.204 worker02
192.168.1.205 worker03

This means that we can SSH into each machine using a friendly name rather than the IP address.

And with that we now have our 6 machines ready to go, in part 2 we will get on to installing Kubernetes…

Terraform Beginner to Master Book

A while ago I produced a Terraform course for INE. I really enjoyed the experience but there were a few things that left me frustrated, such as the course being quite expensive (I couldn’t set the price), which set a high barrier to entry for most people. It was also quite hard to update: due to the way the course was structured, a small change would have meant quite a lot of rework.

With the release of Terraform 0.12, where Hashicorp made a number of breaking changes, I wanted to release more content to help people get started with Terraform and learn why they should use it. I started researching writing my own book and that’s when I found Leanpub.com. Leanpub is an amazing site where you can self publish your own book from markdown. It allows you to automate the flow of publishing a book from GitHub (great for developers) and the best part is that it allows you to release a book before it is finished and start getting feedback. Then as you write more and more of the book you can keep releasing new versions. The model is kind of like beta software: you can get feedback from your readers early and then take that feedback into the book to fix typos, improve chapters and even shape upcoming content.

My book Terraform: From Beginner to Master has been in the works for a couple of months and is available now on Leanpub. You can choose to pay what you want (down to a minimum) for the book, plus you are protected by Leanpub’s refund policy, where Leanpub will give you your money back with no arguments within the first 45 days. This means you can try books out on Leanpub virtually risk-free. You can also download them into a format compatible with your favourite eReader.

The book takes you from not even knowing what Terraform is, through the business case for why you should use it, all the way to a solid understanding of how complex Terraform projects work, using AWS for real-world examples. If you purchase the book and have any feedback it would be greatly welcomed.

Learn Terraform with AWS

A few months ago I did a Terraform course for ine.com. The course guides you from knowing nothing about Terraform to using it to manage your AWS infrastructure. It uses AWS throughout to give it some focus, but you can apply the same techniques to use Terraform with any provider.

If you are interested in learning Terraform then please check out my course.  If you have bought the course and have any questions then please feel free to contact me.

Custom JSON serialisation in golang to fix a Kong bug

I wrote an open source terraform provider for kong towards the end of last year.  The provider gave me a chance to write some go code and also the chance to do an open source project.  It seems to be pretty popular which is cool as it is nice to have written something that people are finding useful.

Recently someone logged a really detailed (thanks) github issue against a bug they found with the terraform provider.  It turned out the cause of this bug was the way that Kong was returning JSON objects when one of the fields was an empty list.  If a list property on the kong API object is set to [] then instead of returning [] Kong returns {}.

I wanted to fix this in a generic way for all list fields on the API object and also fix it for every operation on the API (GET, POST, PUT etc), i.e. the ideal fix would see the code only being needed in a single place. So to fix this I first wrote a test (test first of course). As an aside, a nice thing about the gokong project (and the kong terraform provider) is that they are tested using docker containers, so the tests run against the real Kong (no mocks anywhere to be seen!!!).

I quickly realised that I could write my own deserialise JSON method by simply implementing the UnmarshalJSON interface.  So my first stab at this was:

func (a *Api) UnmarshalJSON(data []byte) error {
    fixedJson := strings.Replace(string(data), `"hosts":{}`, `"hosts":[]`, 1)
    fixedJson = strings.Replace(fixedJson, `"uris":{}`, `"uris":[]`, 1)
    fixedJson = strings.Replace(fixedJson, `"methods":{}`, `"methods":[]`, 1)
    return json.Unmarshal([]byte(fixedJson), &a)
}


The idea behind this code was to simply replace the erroneous {} with [] in the JSON and then call the normal json.Unmarshal func. Unfortunately this code goes into an infinite loop, as it keeps calling itself. Which was a shame, as this code was nice and concise and would’ve done everything I wanted, delegating the hard work of actually doing the deserialisation over to the normal json.Unmarshal func.

Then I found out that I can do what I want to do if I use a type alias.  Here is the corrected code:

func (a *Api) UnmarshalJSON(data []byte) error {
    fixedJson := strings.Replace(string(data), `"hosts":{}`, `"hosts":[]`, 1)
    fixedJson = strings.Replace(fixedJson, `"uris":{}`, `"uris":[]`, 1)
    fixedJson = strings.Replace(fixedJson, `"methods":{}`, `"methods":[]`, 1)
    type Alias Api
    aux := &struct {
        *Alias
    }{
        Alias: (*Alias)(a),
    }
    return json.Unmarshal([]byte(fixedJson), &aux)
}


The trick is to use a type alias to alias my Api type over to Alias (note this is a private declaration so it is only scoped to the method). Then I use an anonymous struct with an embedded field to inherit all of the fields from Api. I then call json.Unmarshal on my anonymous struct with the fixed JSON. Because my Alias type has a single embedded type, which is Api, it will deserialise all of the Api fields from the fixed JSON.

I think this is a pretty neat solution as it means that anywhere in my whole library that deserialises the Api type will automatically use this code just by virtue of the fact that it implements the UnmarshalJSON interface.

This is a handy thing to know for your toolbox when you need to get involved in the JSON deserialisation pipeline in go.  Happy coding!

Terraform provider kong – fully tested using docker containers

A couple of posts ago I talked about how you could achieve full stack testing in Go using Docker containers. I have just finished the first version of terraform provider kong, a terraform provider for Kong that is built on top of gokong.

At the moment at Form3 we configure Kong using a custom Ruby script that runs on our Kong container and configures Kong upon boot up. Whilst this works there are a number of problems with this approach:

  • It is hard to know what the end state of Kong is as you end up with a whole bunch of Ruby code that is hard to follow
  • If you want to migrate the container over time it can be quite tricky, as it’s hard to clean up APIs, consumers and other Kong resources as you delete them (the database Kong uses is long lasting)

By writing a custom terraform provider for Kong all of these problems go away.  As terraform uses a declarative language you simply declare the state that you want the thing you are configuring to be in (in this case Kong) and terraform looks at the current state and automatically works out how to get there.  This makes everything so much simpler.  A good example is if you remove an API in our current world you have to remember to write a line of Ruby to delete the old API.  When using terraform you simply remove the API from your configuration and terraform deletes it for you.

Building upon the way that I wrote gokong I wrote terraform provider kong in the same vein, ie full stack testing using real components and no mocks! This means that the acceptance tests in terraform provider kong spin up a real Kong container and apply terraform plans to it, check it is in the expected state and then destroy it again. Testing in this way gives you ultimate confidence that the code works!

To automate releases of new versions of terraform provider kong I have used the excellent goreleaser. Goreleaser automates building your code for multiple platforms, zipping/tarballing it up and uploading it to GitHub. It is so easy to use, kudos to the authors. To set it up you simply need to create a goreleaser.yml file:

builds:
  - binary: terraform-provider-kong
    goos:
      - darwin
      - linux
      - windows
    goarch:
      - amd64
archive:
  format: zip


My file specifies the binary and builds for mac (Darwin), Linux and Windows.  The last part is to define a task to run goreleaser as part of your build.  Goreleaser will only run when the build is running as a tag build, therefore you can create a task and run it on every build and goreleaser will only build a release for a tagged build, pretty smart.  My build task to create a release looks like:

release:
	go get github.com/goreleaser/goreleaser; \
	goreleaser; \


I run the release task as part of every Travis build. You also need to create a GitHub token, which is used by goreleaser to upload the release to your GitHub account. To generate a GitHub token go to https://github.com/settings/tokens/new. Then simply set it as a secure environment variable in your .travis.yml and goreleaser will pick it up for you.

Then when you want to create a release you can simply do:

git tag -a v0.1.0 -m "My release"
git push origin v0.1.0


This will build a new v0.1.0 release, automatically create the binaries and upload the release to GitHub. Pretty smart!

The last thing I wanted to go through was that I’m taking advantage of the build matrix feature of Travis to run the tests against multiple versions of Kong, using the following section of my .travis.yml file:

env:
  matrix:
    - KONG_VERSION=0.11 TF_ACC=1
    - KONG_VERSION=0.11.1 TF_ACC=1
    - KONG_VERSION=0.11.2 TF_ACC=1


Travis will automatically run a build for each line under the matrix definition, each with the values on that line. I pass the KONG_VERSION parameter all the way through to the code that pulls the Kong docker container, so it will pull the version specified here. Hence I can easily update this build to run against many versions of Kong. As new versions are released I can simply add a line to the build matrix and they will automatically be tested on Travis. This is a really powerful feature; being able to run your tests against multiple versions of a target piece of software is immense!

If you are interested in terraform provider kong or gokong I would love to hear your feedback and get your ideas and contributions.

Routing outbound webhook calls through Nginx on AWS

At Form3 I recently solved an interesting problem: routing our outbound web hook calls via Nginx. Before I dive into how I achieved this I want to set the scene as to why you would want to do this.

For web hook calls your customer has to open up an https endpoint on the internet, so you really want to present a client certificate so the person receiving your call can verify it really is you calling them. You could add the client auth code directly inside your service (our service is written in Java), although this becomes a pain for a couple of reasons. Firstly, the code for doing this in Java is pretty cumbersome. Secondly, it means you now need to deploy the client certificate inside your application or load it in somehow, which shouldn’t really be the concern of the application.

The solution to all of this is to route the outbound call through Nginx and let Nginx take care of the client auth.  This is much neater as it means the application doesn’t need to worry about any certificates, instead it can simply post to Nginx and let Nginx do all of the heavy TLS lifting.

The design we end up with has the notification api posting through Linkerd to Nginx, with Nginx making the outbound call to the customer. Note we route everything Linkerd to Linkerd; if you want to read more about how we set this up on AWS ECS you can read my blog post on it. If the web hook url for a customer was https://kevinholditch.co.uk/callback then the notification api posts through Linkerd to Nginx using the url https://localhost:4140/notification-proxy/?address=http%3A%2F%2Fkevinholditch.co.uk%2Fcallback (note we have to url encode the callback address). We then set up Nginx to proxy the request on to the address provided in the address query string parameter. This is done using the following Nginx server config:

server {
  server_name notification_proxy;
  listen 8117;

  location / {
    resolver 127.0.0.11 ipv6=off;

    set_by_lua_block $proxy_addr {
      return ngx.unescape_uri(ngx.var.arg_address)
    }

    proxy_pass $proxy_addr;
    proxy_ssl_server_name on;
    proxy_ssl_certificate /etc/nginx/certs/client.cert.pem;
    proxy_ssl_certificate_key /etc/nginx/certs/client.key.pem;
  }
}


The set_by_lua_block is a piece of lua code that url decodes the address query string parameter; ngx.var.arg_address refers to the address query string parameter (you can replace address with anything to read any other parameter from the query string). The decoded address is then used as the proxy_pass target, and the proxy_ssl_certificate and proxy_ssl_certificate_key directives do the client auth signing.
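
On the calling side, url encoding the callback address before appending it as the address parameter can be sketched with jq (shown purely to illustrate the encoding; our Java service builds the url itself):

CALLBACK="https://kevinholditch.co.uk/callback"
ENCODED=$(jq -rn --arg u "$CALLBACK" '$u|@uri')
curl "https://localhost:4140/notification-proxy/?address=$ENCODED"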

The last trick to making all of this work was that the resolver address needs to change based on where you are running. The resolver directive tells Nginx which DNS server to use to look up the proxy address. Locally on my machine I need to use 127.0.0.11 (the docker DNS server), however on AWS this address changes. The last part of the jigsaw was working out how to find the address dynamically. You can do that by issuing the following command: cat /etc/resolv.conf | grep nameserver | cut -d' ' -f2-. Once I had cracked that, I updated the Nginx config file to be a template:

server {
  server_name notification_proxy;
  listen 8117;

  location / {
    resolver $DNS_ADDR ipv6=off;

    set_by_lua_block $proxy_addr {
      return ngx.unescape_uri(ngx.var.arg_address)
    }

    proxy_pass $proxy_addr;
    proxy_ssl_server_name on;
    proxy_ssl_certificate /etc/nginx/certs/client.cert.pem;
    proxy_ssl_certificate_key /etc/nginx/certs/client.key.pem;
  }
}


We then use something like the following startup script on our docker container, which derives from openresty/openresty:

#!/usr/local/bin/dumb-init /bin/bash
set -e
export DNS_ADDR=`cat /etc/resolv.conf | grep nameserver | cut -d' ' -f2-`
envsubst '$DNS_ADDR' < /etc/nginx/nginx.template > /usr/local/openresty/nginx/conf/nginx.conf
/usr/local/openresty/bin/openresty -g "daemon off;"


The clever part here is that we are setting an environment variable on the fly, which is the IP address of the DNS server, using the command we worked out above. Then we use envsubst to substitute any environment variables in our template config file and write the templated file out to disk, so when Nginx starts the IP address will be correct. This all happens as the container starts, so wherever the container is running (locally or on AWS) it will get the correct IP address and work.

Full stack testing in golang with docker containers

I like to practice the approach of full stack component testing, where the guiding principle is that you test the entire component from as high a level as possible and only stub out third party dependencies (read other APIs) or anything that isn’t easily available as a docker container. I recently started a golang project to write a client for kong and I thought this would be a good opportunity to use this testing strategy in a golang project.

I love Go, but the one thing I don’t like so much about it is the approach most people seem to take to testing. A lot of tests are written at the method level, where your tests end up being tightly coupled to your implementation. This is a bad thing: you know a test is not very good when you have to change it every time you change your implementation. This is bad firstly because, as you are changing the test at the same time as the code, once you have finished you have no way of knowing whether the new implementation still works, since the test has changed too. It also restricts how much you can get in and edit the implementation, as you are constantly having to update the way the tests mock everything out. By testing at the top component level the tests do not care about the implementation, and the code runs with real components so it works how it will in the real world. By writing tests in this way I have seen a lot fewer defects and have never had to manually debug something.

Anyway back to the subject of the full stack testing approach in Go. To start I used the excellent dockertest project which gives you a great API to start and stop docker containers. I then took advantage of the fact that in a Go project there is a special test function that gets called for every test run:

func TestMain(m *testing.M) {
    // setup
    code := m.Run()
    // teardown
    os.Exit(code)
}


In the above method you can do your test setup where I have placed the //setup comment and your teardown where I have placed the //teardown comment. The code returned by m.Run() is the exit code from the test run. Go sets this to non-zero if the test run fails, so you need to exit with this code so your build will fail if your test run fails. Now using this method I can start the kong docker container, run the tests and then stop the kong docker container. Here is the full TestMain code at the time of writing:

func TestMain(m *testing.M) {
    testContext := containers.StartKong(GetEnvVarOrDefault("KONG_VERSION", defaultKongVersion))
    err := os.Setenv(EnvKongAdminHostAddress, testContext.KongHostAddress)
    if err != nil {
        log.Fatalf("Could not set kong host address env variable: %v", err)
    }
    code := m.Run()
    containers.StopKong(testContext)
    os.Exit(code)
}


I have wrapped the starting and stopping of the kong container in a method to abstract away the detail.  Notice how the StartKong method takes the Kong version as a parameter.  It gets the Kong version either from the environment variable KONG_VERSION or if that environment variable is not set then it uses the default Kong version which I set to the latest version 0.11 at time of writing.  The cool thing about this is that if I want to run my tests against a different version of Kong I can do that easily by changing this value.  The really cool thing about this is that I can run the build against multiple versions of Kong on travis-ci by taking advantage of the env matrix feature.  If you list multiple values for an environment variable in travis-ci then travis-ci will automatically run a build for each entry.  This means it is really easy to run the whole test pack against multiple versions of Kong which is pretty neat.  You can check out the gokong build to see this in action!

The one part you may be wondering about is how I get the url of the container that Kong is running on for use in my tests. That is done by setting an environment variable KONG_ADMIN_ADDR. The client uses that environment variable if set, and if not it defaults to localhost:8001.

With all of this in place it allows me to test the client by hitting a real running Kong in a container, no mocks in sight!  How cool is that.  Plus I can run against any version of Kong that is built as a docker container with a flick of a switch!

Here is an example of what a test looks like so you can get a feel:

func Test_ApisGetById(t *testing.T) {
    apiRequest := &ApiRequest{
        Name: "test-" + uuid.NewV4().String(),
        Hosts: []string{"example.com"},
        Uris: []string{"/example"},
        Methods: []string{"GET", "POST"},
        UpstreamUrl: "http://localhost:4140/testservice",
        StripUri: true,
        PreserveHost: true,
        Retries: 3,
        UpstreamConnectTimeout: 1000,
        UpstreamSendTimeout: 2000,
        UpstreamReadTimeout: 3000,
        HttpsOnly: true,
        HttpIfTerminated: true,
    }

    apiClient := NewClient(NewDefaultConfig()).Apis()

    createdApi, err := apiClient.Create(apiRequest)

    assert.Nil(t, err)
    assert.NotNil(t, createdApi)

    result, err := apiClient.GetById(createdApi.Id)

    assert.Equal(t, createdApi, result)
}


I think that is really clean and readable.  All of the code that boots up and tears down Kong is out of sight and you can just concentrate on the test.  Again with no mocks around 🙂

If you want to see the rest of the code or help contribute to my gokong project that would be great.  I look forward to any feedback you have on this.

Download github releases from private repos in bash and docker builds

I wanted to add a short post to describe how to automate the downloading of releases from private github repositories using a bash script or in a Docker build.

To start you need to create a Github token that has access to your repository. Once you have your token you can use the following bash script filling in the relevant details:

#!/usr/bin/env bash
set -e
GITHUB_TOKEN=<my_token>
REPO="kevholditch/demo"
FILE="demo_0.0.1_linux_amd64.tar.gz"
VERSION="v0.0.1"
wget -q --auth-no-challenge --header='Accept:application/octet-stream' \
https://$GITHUB_TOKEN:@api.github.com/repos/$REPO/releases/assets/`curl -H "Authorization: token $GITHUB_TOKEN" -H "Accept: application/vnd.github.v3.raw" -s https://api.github.com/repos/$REPO/releases | jq ". | map(select(.tag_name == \"$VERSION\"))[0].assets | map(select(.name == \"$FILE\"))[0].id"` \
-O /tmp/$FILE


This script will download your release to the /tmp/ directory, from there you can untar and move it etc.
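
For example, unpacking the release and putting the binary on the path might look like this (the binary name inside the tarball is just a guess based on the demo naming above):

tar -xzf /tmp/$FILE -C /tmp
sudo mv /tmp/demo /usr/local/bin/demo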

To take this a stage further if you want to download your release as part of a docker build you can use the Dockerfile snippet below to give you a starting point:

ARG GITHUB_TOKEN
ENV REPO "kevholditch/demo"
ENV FILE "demo_0.0.1_linux_amd64.tar.gz"
ENV VERSION "v0.0.1"
RUN wget -q --auth-no-challenge --header='Accept:application/octet-stream' \
https://$GITHUB_TOKEN:@api.github.com/repos/$REPO/releases/assets/`curl -H "Authorization: token $GITHUB_TOKEN" -H "Accept: application/vnd.github.v3.raw" -s https://api.github.com/repos/$REPO/releases | jq ". | map(select(.tag_name == \"$VERSION\"))[0].assets | map(select(.name == \"$FILE\"))[0].id"` \
-O /tmp/$FILE


The trick here is that we are passing in the GITHUB_TOKEN using a docker build arg.  This allows you to build the container using travis by setting a secure ENV variable and then passing that into your docker build script as the docker arg parameter.  For example:

if [ "$GITHUB_TOKEN" = "" ]; then
echo "you need to create a github token with access to kevholditch/demo to run this build see https://github.com/settings/tokens/new"
exit -1
fi
docker build --build-arg "GITHUB_TOKEN=$GITHUB_TOKEN" -t kevholditch/demo .

In the script above we check that the GITHUB_TOKEN env variable is set, and if it isn’t we terminate with a non-zero exit code, halting the build. This allows developers to run the build with their own GITHUB_TOKEN, and you can run this build on Travis by setting a secure env variable (or the equivalent in the build server you are using).

Running SNS & SQS locally in docker containers supporting fan out

On AWS using SNS to fan out to multiple SQS queues is a common scenario. SNS fan out means creating an SQS queue for each consumer of an SNS message and subscribing each SQS queue to the SNS topic. This means when a message is sent to the SNS topic a copy of the message arrives in each consumer’s queue. It gives you multicast messaging, the ability to consume messages at your own pace, and the ability to be offline when a notification occurs.

I wanted to use SNS fan out in one of our components, and as our testing model tests at the component level this meant I needed to get an SNS/SQS solution working in docker. Step forward ElasticMQ and SNS.

Inside the example folder inside the SNS repository was the following docker compose file as an example to get SNS and SQS containers working together in fan out mode:

services:
  sns:
    image: s12v/sns
    ports:
      - "9911:9911"
    volumes:
      - ./config:/etc/sns
    depends_on:
      - sqs
  sqs:
    image: s12v/elasticmq
    ports:
      - "9324:9324"


When started with the docker-compose up command the containers spun up ok. The problem came when publishing a message to the SNS topic using the following command:

aws sns publish --topic-arn arn:aws:sns:us-east-1:1465414804035:test1 --endpoint-url http://localhost:9911 --message "hello"


The error received was:

sns_1 | com.amazonaws.http.AmazonHttpClient executeHelper
sns_1 | INFO: Unable to execute HTTP request: Connection refused (Connection refused)
sns_1 | java.net.ConnectException: Connection refused (Connection refused)


So not a great start. For some reason the SNS container could not send the message on to the sqs container. Time to debug why….

The first step to working out why was going onto the SNS container and sending a message to the SQS container. This tells us whether or not the containers can talk to each other. When running this test the message got sent to the SQS queue successfully.

The next stage in testing was to look at the code for the SNS library to see if I could work out whether it logged out the SQS queue name it was trying to send it to. Upon inspection I realised that the SNS library was using Apache Camel to connect to SQS. I noticed that in the source code for Apache Camel it does log out a lot more information when the log level is set to trace. Going back to the SNS library there is the following logback.xml file:

<configuration>
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>
  <logger name="org.apache.camel" level="INFO"/>
  <root level="DEBUG">
    <appender-ref ref="STDOUT" />
  </root>
</configuration>


I simply cloned the SNS repository from github, updated the level from DEBUG to TRACE and then recompiled the SNS code using the command sbt assembly. Once this finished it was simply a matter of copying the new jar into the root of the folder where I had cloned the SNS repo and updating the Dockerfile to use my newly compiled jar. The last change needed was updating the docker-compose.yml file in the example directory to:

services:
  sns:
    build: ../
    ports:
      - "9911:9911"
    volumes:
      - ./config:/etc/sns
    depends_on:
      - sqs
  sqs:
    image: s12v/elasticmq
    ports:
      - "9324:9324"


The important line is build: ../, which means we are now using the locally built SNS container rather than the published image. To build this I simply ran docker-compose build and then docker-compose up. This time the SNS container started logging with trace logging. When I sent a message to SNS I got a much more informative error message:

TRACE o.a.c.component.aws.sqs.SqsEndpoint - Queue available at 'http://localhost:9324/queue/queue1'.


It’s clear now that the url for the queue is being set incorrectly. It should be http://sqs:9324/queue/queue1, as sqs is the name of the container; the reason we were getting connection refused before was that the messages were being sent to localhost. To work out how to change this we had to dig through the Apache Camel code to see how it configures its queue urls. We found that it queries SQS using the list queues command. Running the same list queues command against our running container revealed that the queues were being bound to localhost and not sqs.
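
The list queues check was along these lines (the region and signing flags mirror the receive-message command shown later; ElasticMQ doesn't validate credentials):

aws sqs list-queues --endpoint-url http://localhost:9324 --region elasticmq --no-sign-request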

To change this we simply had to use the following config file for elasticmq:

include classpath("application.conf")

node-address {
  host = sqs
}

queues {
  queue1 {}
}


The key line is “host = sqs”. The last part to making everything work was updating the docker-compose.yml file to mount the config file for ElasticMQ:

services:
  sns:
    image: s12v/sns
    ports:
      - "9911:9911"
    volumes:
      - ./config/db.json:/etc/sns/db.json
    depends_on:
      - sqs
  sqs:
    image: s12v/elasticmq
    ports:
      - "9324:9324"
    volumes:
      - ./config/elasticmq.conf:/etc/elasticmq/elasticmq.conf


Once I tore down the containers and started them up again I ran the list queues command; this time the queues came back bound to sqs: http://sqs:9324/queue/queue1. I then ran the command to send a message to SNS and could see it successfully get sent to SQS by receiving it with the following command:

aws sqs receive-message --queue-url http://localhost:9324/queue/queue1 --region elasticmq --endpoint-url http://localhost:9324 --no-verify-ssl --no-sign-request --attribute-names All --message-attribute-names All


And there we have it, a working SNS fan out to SQS using docker containers. The author of the SNS container has accepted a PR from my colleague sam-io to update the example docker-compose.yml with the fixes described here, meaning that you can simply clone the SNS repository from GitHub, cd into the example directory, run docker-compose up and everything should work. A big thanks to the open source community and people like Sergey Novikov for providing such great tooling. It’s great to be able to give something back!