This post outlines 2 recent production issues on GOV.UK and how they were resolved. We’ve blogged in the past about what happens when things go wrong on GOV.UK and how we classify and prioritise incidents.
What happened
From around 2:50pm until 5:30pm on Wednesday 30 June, and from 10:10am until 12:30pm on Wednesday 12 July, users were not able to submit or process any licences. In both cases the root cause was an outage at UKCloud, one of our hosting providers, which among other things brought down the servers that host the licensing application. Both were classed as severity 1 incidents.
What users saw
Users attempting to apply for a new licence via GOV.UK would have seen a warning message telling them that the licence in question was not available from the website.
Users attempting to sign in to the licensing application would have seen an error message.
How we responded
We contacted our hosting provider to let them know about the problems, sent a message via Basecamp to the government content community and updated our GOV.UK status page. We also stayed in close contact with the GDS Verify team, who were affected by the same outage and were handling their own incident independently.
Once UKCloud had identified the root cause on their side, we stayed in contact with them until our problems were resolved, because not all of our hosts recovered immediately.
What we’re doing to prevent this from happening again
We will work to identify the relevant stakeholders and channels so that we can communicate more clearly with affected parties in future.
We’ll also look into making UKCloud’s portal easier for our tech support team to access, so they can contact UKCloud more quickly if there’s another outage.
In addition, we’ll document how to build and deploy the licensing application on the current UKCloud infrastructure.
David Basalla is a developer on GOV.UK.