We’ve posted before about what happens when things go wrong on GOV.UK, and how we classify and prioritise incidents.
What happened
Email alerts for a foreign travel advice update published on 13 December 2018 reached some subscribers late.
What the users saw
4,300 of the 16,654 subscribers to travel advice alerts for the Philippines did not receive an email when the content was changed on 13 December 2018 at 11:45am. Those subscribers would have received the email later, but before 1:43pm.
Cause of the problem
GOV.UK uses GOV.UK Notify to send email alerts. Each request to GOV.UK Notify must include a current timestamp in its authorisation header.
Errors returned by GOV.UK Notify showed that some of our requests contained an incorrect timestamp. The system clock on one of the email alert machines was out of sync, so GOV.UK Notify rejected the requests sent from that machine.
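To illustrate why a drifting clock causes rejections, here is a minimal sketch of timestamp-based request authentication. It assumes a JSON Web Token (JWT) signed with HMAC-SHA256 that carries an `iss` (issuer) claim and an `iat` (issued at) claim taken from the sender's local clock. The 30-second tolerance and the names `make_token`, `token_is_fresh` and `MAX_CLOCK_SKEW_SECONDS` are assumptions for illustration only. In practice you would use GOV.UK Notify's official client libraries rather than building tokens by hand.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

# Assumed tolerance: how far a token's timestamp may differ from the
# receiving server's clock before the request is rejected (illustrative value).
MAX_CLOCK_SKEW_SECONDS = 30


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring any stripped padding."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def make_token(service_id: str, secret: str, now: Optional[float] = None) -> str:
    """Hypothetical client side: sign a JWT whose 'iat' comes from the local clock."""
    issued_at = int(time.time() if now is None else now)
    header = {"typ": "JWT", "alg": "HS256"}
    claims = {"iss": service_id, "iat": issued_at}
    signing_input = (
        _b64url(json.dumps(header, separators=(",", ":")).encode())
        + "."
        + _b64url(json.dumps(claims, separators=(",", ":")).encode())
    )
    signature = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + _b64url(signature)


def token_is_fresh(token: str, secret: str, server_now: float) -> bool:
    """Hypothetical server side: check the signature, then that 'iat' is within the skew."""
    signing_input, _, signature = token.rpartition(".")
    expected = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature)):
        return False
    claims = json.loads(_b64url_decode(signing_input.split(".")[1]))
    return abs(server_now - claims["iat"]) <= MAX_CLOCK_SKEW_SECONDS


if __name__ == "__main__":
    server_time = 1_544_701_500  # an arbitrary fixed instant
    good = make_token("service-id", "secret", now=server_time)
    skewed = make_token("service-id", "secret", now=server_time - 300)  # clock 5 minutes slow
    print(token_is_fresh(good, "secret", server_time))    # True
    print(token_is_fresh(skewed, "secret", server_time))  # False
```

The signature on the skewed token is perfectly valid. It is rejected only because its timestamp falls outside the allowed window, which matches the symptom described above: requests from the machine with the out-of-sync clock failed while those from other machines succeeded.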
How we responded
Before the travel advice was published, we had already seen a large number of alerts about errors when sending other email alerts to subscribers, and we were investigating. Approximately one third of email alerts were not being sent. We continued investigating and identified the cause.
Once the clock had been manually brought back into sync, the unsent travel advice emails were re-sent at 1:12pm.
Steps taken to prevent this happening again
We are reviewing how we monitor and alert on clock drift on our email alert machines. We have also improved the logging of email alert send errors, to speed up the investigation of any future email sending issues.
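The post does not say how clock drift monitoring will work, so the following is only a minimal sketch of one common approach. It uses the standard NTP offset estimate from a single request/response exchange with a reference clock and raises an alert when the offset exceeds a threshold. The `TimeSample` type, the function names and the 5-second threshold are hypothetical. A real deployment would more likely read the offset reported by the machine's time daemon (such as chrony or ntpd) and feed it into existing monitoring.

```python
from dataclasses import dataclass

# Hypothetical alert threshold, chosen to sit well inside any tolerance a
# downstream service (such as a timestamp check on API tokens) might enforce.
DRIFT_ALERT_THRESHOLD_SECONDS = 5.0


@dataclass
class TimeSample:
    """Four timestamps from one exchange with a reference clock."""
    t0: float  # local clock when the request left this machine
    t1: float  # reference clock when it received the request
    t2: float  # reference clock when it sent the reply
    t3: float  # local clock when the reply arrived


def clock_offset(sample: TimeSample) -> float:
    """NTP offset estimate: reference time minus local time.

    A positive value means the local clock is behind the reference.
    """
    return ((sample.t1 - sample.t0) + (sample.t2 - sample.t3)) / 2


def drift_alert(sample: TimeSample, threshold: float = DRIFT_ALERT_THRESHOLD_SECONDS) -> bool:
    """Return True when the estimated drift is large enough to alert on."""
    return abs(clock_offset(sample)) > threshold


if __name__ == "__main__":
    # Local clock running 120 seconds slow, 0.5 seconds of network delay each way.
    slow = TimeSample(t0=1000.0, t1=1120.5, t2=1120.5, t3=1001.0)
    # Local clock in sync with the reference.
    healthy = TimeSample(t0=1000.0, t1=1000.25, t2=1000.25, t3=1000.5)
    print(clock_offset(slow))     # 120.0
    print(drift_alert(slow))      # True
    print(drift_alert(healthy))   # False
```

Averaging the two one-way differences cancels symmetric network delay, which is why the estimate for the slow machine comes out at exactly 120 seconds despite the half-second delay in each direction.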