The GOV.UK Infrastructure team maintain and improve the systems that create GOV.UK. This includes the GOV.UK website as well as the tools used by publishers and developers.
Earlier this year we ran a discovery and alpha into hosting and deploying GOV.UK applications using a modern containerised infrastructure.
Our team comprised 4 WebOps Engineers, Laura, Dean, Ana, and Sam, along with product manager Rob, technical architect Stephen, and delivery manager Paul (that’s me).
Our job is to make sure that GOV.UK is stable, resilient, and secure. This includes:
- performing essential upgrades
- having comprehensive monitoring
- being alerted if anything isn’t working
- investigating issues as they arise
The infrastructure team is not responsible for supporting GOV.UK applications.
We work to make it easier for GOV.UK product teams to host, support, and improve their applications.
Where we are now
Our current infrastructure is based upon many cloud hosted virtual machines (VMs).
Our integration, staging and production environments each comprise around 130 VMs. Some of these VMs run more than one application.
Each VM requires a separate instance of an operating system (typically Ubuntu). Having so many instances leads to large maintenance costs.
We cannot automatically increase or decrease our VM capacity to cope with demand, which is also wasteful.
When we deploy a new version of an application we have to pull and compile code from an external code repository for each environment. Instead, we want to build once into a single ‘artifact’ that can be pushed to every environment.
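The 'build once, push everywhere' idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of the GOV.UK pipeline: the `Artifact` class and the `build` and `promote` functions are hypothetical names, and a SHA-256 digest of the source stands in for a real build step.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    """An immutable build output, identified by a digest of its contents."""
    name: str
    digest: str


def build(name: str, source: bytes) -> Artifact:
    # Hypothetical build step: runs exactly once, and the digest pins the contents.
    return Artifact(name=name, digest=hashlib.sha256(source).hexdigest())


def promote(artifact: Artifact, environments: list[str]) -> dict[str, Artifact]:
    # Push the same artifact to every environment instead of rebuilding per environment.
    return {env: artifact for env in environments}


artifact = build("release", b"application source at one commit")
deployed = promote(artifact, ["integration", "staging", "production"])

# Every environment holds the same digest, so the code is identical everywhere.
identical = len({a.digest for a in deployed.values()}) == 1
```

Because nothing is recompiled per environment, what was tested in integration and staging is exactly what reaches production.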
Containerisation is based on a metaphor. Shipping containers are a standard size. This makes them easier to handle than if each were different. The contents may vary, but one container is like another.
For web operations, the contents are the applications we want to run, and the cloud servers are like ships. If we put our applications in containers they can be added, arranged, and removed more easily.
Technology has advanced since GOV.UK’s launch in 2012. It should now be easier for us to build and manage containers than lots of VMs.
Our hypothesis was that containerisation technology would help:
- speed up code deployment
- reduce maintenance costs
- balance supply and demand
- ensure code is identical on each environment
What we did
The first step was to get one of our basic applications running in a container on a VM within our current infrastructure.
We started with an application with few dependencies so that we could get up and running quickly. We chose the Release app, which provides useful information about the currently deployed versions of our applications.
We used Docker as the container technology. It’s well supported and widely used in the industry. We also updated our Jenkins build pipeline so that the Release application could be pushed to a Docker container as well as our traditional VM.
This was a good start, but we knew it wouldn’t guarantee zero-downtime deployments, because there would still be a brief gap while the new container spun up.
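A small model shows where that gap comes from. The sketch below is a simplified illustration, not how any particular platform schedules a rollout: `rolling_replace` and its parameters are hypothetical names, and it assumes a replacement only starts serving after the container it replaces has stopped.

```python
def rolling_replace(replicas: int, max_unavailable: int = 1) -> list[int]:
    """Return how many containers are serving after each step of a replacement."""
    if replicas < 1 or max_unavailable < 1:
        raise ValueError("replicas and max_unavailable must be at least 1")

    serving_history = [replicas]
    remaining_old = replicas
    while remaining_old > 0:
        batch = min(max_unavailable, remaining_old)
        # Stop a batch of old containers: capacity dips while replacements start.
        serving_history.append(replicas - batch)
        remaining_old -= batch
        # The new containers become ready: capacity is restored.
        serving_history.append(replicas)
    return serving_history


single_container = rolling_replace(1)   # one container swapped in place
three_containers = rolling_replace(3)   # three containers replaced one at a time

gap_with_one = min(single_container)    # lowest number serving at any point
gap_with_three = min(three_containers)
```

With a single container the number serving drops to zero during the swap. With several copies replaced one at a time, some are always available to take traffic.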
Our next step was to try a platform designed to simplify the deployment and management of containerised applications. We chose Kubernetes.
Kubernetes can bring up ‘clusters’ of containers. A cluster is a group of containers working together, where individual containers can be spun up or down as needed. This lets us scale capacity to match demand, which is more efficient and has the potential to reduce hosting costs.
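As a rough illustration of scaling to match demand, Kubernetes documents a rule for its Horizontal Pod Autoscaler: the desired number of copies is the current number multiplied by the ratio of the observed metric to the target metric, rounded up. The sketch below applies that rule in Python. It is an assumption that an autoscaler like this would be used at all, and the function name, the CPU figures, and the minimum and maximum limits are made up for the example.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
) -> int:
    """Scale the replica count in proportion to load, within fixed limits."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    # Documented rule: ceil(currentReplicas * currentMetric / targetMetric).
    raw = math.ceil(current_replicas * current_metric / target_metric)
    # Clamp to the configured limits (illustrative defaults).
    return max(min_replicas, min(max_replicas, raw))


# Four containers averaging 90% CPU against a 60% target: scale up.
busy = desired_replicas(4, current_metric=90.0, target_metric=60.0)

# Four containers averaging 15% CPU against a 60% target: scale down.
quiet = desired_replicas(4, current_metric=15.0, target_metric=60.0)
```

Here `busy` comes out as 6 and `quiet` as 1, so capacity follows demand rather than being fixed at its peak.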
To continue our experimentation we adapted three more GOV.UK applications to run in containers so that Kubernetes could orchestrate them: Government Frontend, Content Store, and Router. They form a cross-section of GOV.UK applications.
This gave us experience of the benefits and drawbacks we can expect if we decide to go all-in on containerisation in future.
It also showed some of the changes we would need to make to our code so that our applications work better independently and can scale as needed.
We wrote Terraform scripts to help run the Kubernetes clusters. The ‘infrastructure as code’ principle makes it quick and reliable to bring up clusters of containers with minimal manual intervention. The configuration for the containers themselves is handled within Kubernetes.
When we finished the alpha test we removed the clusters but we can easily bring them back.
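The reason clusters are easy to bring back is that 'infrastructure as code' tools compare a declared, desired state with what actually exists and work out the difference. The toy sketch below shows that comparison in Python. It is not Terraform's API or its real planning logic: the `plan` function, the cluster names, and the `nodes` setting are all hypothetical.

```python
def plan(desired: dict[str, dict], actual: dict[str, dict]) -> dict[str, list[str]]:
    """Compare declared resources with what exists and list the actions needed."""
    return {
        "create": sorted(name for name in desired if name not in actual),
        "update": sorted(
            name for name in desired
            if name in actual and desired[name] != actual[name]
        ),
        "delete": sorted(name for name in actual if name not in desired),
    }


# The declared state lives in version control (hypothetical resources).
desired = {"cluster-a": {"nodes": 3}, "cluster-b": {"nodes": 2}}

# A partly built environment that has drifted from the declaration.
drifted = {"cluster-a": {"nodes": 2}, "old-cluster": {"nodes": 1}}
changes = plan(desired, drifted)

# After everything has been removed, the same declaration recreates it all.
rebuild = plan(desired, {})
```

Because the declaration is kept even after the resources are removed, running the plan against an empty environment lists every cluster for creation again.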
We have created a GitHub repository that contains technical notes on what we did, decisions we made, and what we learned along the way.
For the remainder of 2017/18, we’re beginning work on moving much of GOV.UK to the GOV.UK Platform as a Service (PaaS) product. GOV.UK PaaS is designed to deploy and host applications from across government. It’s already used by the Digital Marketplace, GOV.UK Notify, and other services.
The experiments we’ve run helped us learn about how GOV.UK applications could benefit from containerisation technology in future.
GOV.UK PaaS already provides support for Docker, so that’s an option we could explore further. GOV.UK PaaS is also able to build and manage containers itself, so we may only need to use Docker or Terraform in special cases.
Our ways of working give us the freedom to test things out while keeping GOV.UK up and running. We favour incremental improvements over ‘big bang’ migrations, as our priority is always to ensure that GOV.UK is available to the public and our publishing applications are available for content editors across government.
Paul Heron is a delivery manager on GOV.UK. You can follow him on Twitter.