https://insidegovuk.blog.gov.uk/2019/02/22/data-incident-roundup-february-2018/

Data incident roundup - February 2018

We’ve posted before about what happens when things go wrong on GOV.UK, and how we classify and prioritise incidents.

Every incident teaches us something new about our technology or the way we communicate with each other. It also gives us the opportunity to improve our incident management process so we can minimise disruption to users in the future.

In May 2016 we committed to blogging about every severity 1 or severity 2 incident, as well as severity 3 incidents if they’re particularly interesting.

This post is a roundup of 2 incidents that GOV.UK’s data analytics encountered in February 2018. These were severity 3 incidents - we’re blogging about them to show how we responded to cases affecting our analytics data.

12 February 2018 - error in Google Analytics page tracking

What happened

Between 12 February and 20 February 2018 the data on Google Analytics that identifies pages viewed on GOV.UK contained full URLs, including the host name. The data should just contain the short form URI. For example, instead of /vehicle-tax, the data contained https://www.gov.uk/vehicle-tax.

What users saw

External users of the site were not affected.

Internal government users of Google Analytics data saw the long version of the ‘page’ dimension in their reports, which would have been confusing. Some users might have seen a drop in views for the pages they were reporting on if their report was configured in a way that excluded the long version of the page address.
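To illustrate how a report could lose views in this situation, here is a minimal sketch. It assumes a report that filters on the start of the ‘page’ value; the `ReportRow` shape, the `pageviewsForPath` function and the figures are all hypothetical and are not taken from any real GOV.UK report.

```typescript
// One row of a hypothetical exported report: a 'page' value and its view count.
interface ReportRow {
  page: string;
  pageviews: number;
}

// Hypothetical report filter: sum views for rows whose 'page' value begins
// with the given path, the way a "begins with /vehicle-tax" filter would.
function pageviewsForPath(rows: ReportRow[], pathPrefix: string): number {
  let total = 0;
  for (let i = 0; i < rows.length; i++) {
    if (rows[i].page.indexOf(pathPrefix) === 0) {
      total += rows[i].pageviews;
    }
  }
  return total;
}

// Illustrative figures only, not real GOV.UK data.
const exampleRows: ReportRow[] = [
  { page: "/vehicle-tax", pageviews: 100 },
  { page: "https://www.gov.uk/vehicle-tax", pageviews: 40 },
];

// Only the short-form row matches the filter, so the views recorded under
// the full URL are left out of the total without any warning.
const reportedViews = pageviewsForPath(exampleRows, "/vehicle-tax");
```

A filter that matched the path anywhere in the value would have kept counting, which is why only some report configurations showed a drop.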

What caused this

Work to clean the page information sent to Google Analytics (by processing the URI to remove irrelevant extra data) accidentally resulted in the processed data including the host name at the start.
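The actual cleaning code is not shown in this post. As a rough sketch of the intended outcome, the hypothetical `toPagePath` helper below reduces a value to the short form, whether or not it arrives with a scheme and host name attached.

```typescript
// Hypothetical helper (not the GOV.UK implementation): reduce a page value
// to its short form, for example "/vehicle-tax".
function toPagePath(page: string): string {
  // Strip a leading scheme and host such as "https://www.gov.uk" if present.
  const withoutOrigin = page.replace(/^[a-z][a-z0-9+.-]*:\/\/[^/?#]*/i, "");

  // A bare origin ("https://www.gov.uk") leaves nothing behind: treat it as
  // the site root.
  if (withoutOrigin === "") {
    return "/";
  }

  // Make sure the result always begins with a slash.
  return withoutOrigin.charAt(0) === "/" ? withoutOrigin : "/" + withoutOrigin;
}
```

Applying a normalisation like this as the last step before a value is sent means that an earlier processing step which reintroduces the host name does not change what is recorded.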

How we responded

The issue only caused an obvious problem in certain configurations of analytics reports. Because of this, it was 3 days before someone spotted the issue. The problem was fixed the next day. Internal and departmental users were notified.

Steps taken to prevent this from happening again

The process for checking changes to tracking is being improved, including liaison with the analysts.

We are making changes to the way we process data from our integration and staging servers to make it easier to monitor changes.

We are developing automated processes for checking the quality of our analytics data.
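One possible shape for such a check, as a sketch only: given exported rows of ‘page’ values and view counts, flag every value that is not in the short form and report what share of views it accounts for. The `PageRow` and `PageCheck` types and the `checkPageValues` function are assumptions made for this illustration.

```typescript
// A row of a hypothetical analytics export.
interface PageRow {
  page: string;
  pageviews: number;
}

// Result of the check: the offending rows and their share of all views.
interface PageCheck {
  malformed: PageRow[];
  malformedShare: number; // between 0 and 1
}

// Flag 'page' values that do not begin with "/", such as full URLs.
function checkPageValues(rows: PageRow[]): PageCheck {
  const malformed: PageRow[] = [];
  let totalViews = 0;
  let malformedViews = 0;

  for (let i = 0; i < rows.length; i++) {
    const row = rows[i];
    totalViews += row.pageviews;
    if (row.page.charAt(0) !== "/") {
      malformed.push(row);
      malformedViews += row.pageviews;
    }
  }

  return {
    malformed: malformed,
    // Avoid dividing by zero when the export is empty.
    malformedShare: totalViews === 0 ? 0 : malformedViews / totalViews,
  };
}
```

Run daily against production data, or against integration and staging data after each tracking change, a check like this would raise the problem on the first day instead of relying on someone noticing an odd report.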

20 February 2018 - error in Google Analytics metadata tracking

What happened

Between 20 February and 8 March 2018, the values for a subset of the metadata about page views and events sent to Google Analytics and stored as ‘custom dimensions’ were set as [object Object] instead of the correct value. Custom dimensions are attributes of page visits that have been defined by the Google Analytics user, who is often a data analyst.

What users saw

External users of the site were not affected.

The metrics of departmental Google Analytics users were affected for 17 days. For a subset of pageviews and events, all custom dimensions were recorded by Google Analytics as ‘[object Object]’.

In particular, departmental users rely on the ‘Organisations’ custom dimension to filter their reports to their own organisation’s content. From 20 February to 7 March, 4,388,684 page views were wrongly tracked as ‘[object Object]’ instead of their correct organisations - 3% of the total 143,302,095 pageviews that had an organisation custom dimension.

What caused this

Due to the way Google Analytics code is injected into GOV.UK pages, it’s often not included within our development environment. In this instance, that caused code that relies on access to Google Analytics libraries to fail.

We currently include Google Analytics code in 2 separate places within GOV.UK’s codebase, and there were subtle differences between the 2 entries. When attempting to make these consistent, a bug was introduced which incorrectly set the values of custom dimensions to ‘[object Object]’ in one of the locations.
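This post does not show the faulty code. One common way for this exact value to appear in JavaScript is converting an object to a string, which by default produces ‘[object Object]’. The sketch below shows that behaviour and a defensive alternative; the `Organisation` shape and both function names are hypothetical.

```typescript
// Hypothetical shape for the data behind a custom dimension.
interface Organisation {
  name: string;
}

// Buggy pattern: converting a plain object to a string uses the default
// Object.prototype.toString, which returns "[object Object]".
function dimensionValueBuggy(org: Organisation): string {
  return String(org);
}

// Defensive alternative: accept only primitive values and fail loudly on
// anything else, so a mistake surfaces in testing and not in reports.
function toDimensionString(value: unknown): string {
  if (typeof value === "string") {
    return value;
  }
  if (typeof value === "number" || typeof value === "boolean") {
    return String(value);
  }
  throw new TypeError(
    "custom dimension value must be a primitive, got " + typeof value
  );
}
```

With a guard like `toDimensionString`, passing the whole object where its `name` was intended raises an error at the point of the mistake.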

How we responded

The issue was noticed by internal analysts within 3 days, but was believed to affect only one part of the site and a partial fix was deployed. On 8 March a departmental user pointed out the issue was affecting other areas and the issue was fixed the same day.

It took so long to notice the extent of the problem because there are no automated tests on Google Analytics code changes and no formal process for checking changes. Changes to the tracking have become more frequent since 2017, but no process had been established because such changes used to be very rare.

The issue was not obvious without formal checking because it was intermittent (affecting about 3% of hits). Because correct data was mostly being sent, the issue did not stand out in reports at first.
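A share this small is easy to miss by eye but simple to catch with a threshold check. The sketch below reuses the pageview figures quoted earlier in this post; the `DimensionCount` type, the function names, the ‘(all other values)’ grouping and the 0.1% threshold are assumptions chosen for illustration.

```typescript
// Pageviews recorded against one value of a custom dimension.
interface DimensionCount {
  value: string;
  pageviews: number;
}

// Share of all pageviews (0 to 1) recorded against the target value.
function shareOfValue(rows: DimensionCount[], target: string): number {
  let total = 0;
  let matching = 0;
  for (let i = 0; i < rows.length; i++) {
    total += rows[i].pageviews;
    if (rows[i].value === target) {
      matching += rows[i].pageviews;
    }
  }
  return total === 0 ? 0 : matching / total;
}

// Hypothetical alert rule: a placeholder value should essentially never
// appear, so even a small share is worth flagging.
function shouldAlert(share: number, threshold: number): boolean {
  return share > threshold;
}

// Figures from this post: 4,388,684 of 143,302,095 pageviews were affected.
const organisationCounts: DimensionCount[] = [
  { value: "[object Object]", pageviews: 4388684 },
  { value: "(all other values)", pageviews: 143302095 - 4388684 },
];

const badShare = shareOfValue(organisationCounts, "[object Object]");
const alert = shouldAlert(badShare, 0.001);
```

Here `badShare` comes out at roughly 3%, well above the illustrative threshold, even though about 97% of the data looks correct.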

Steps taken to prevent this from happening again

We are adding automated testing on analytics code changes.
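Because the Google Analytics libraries are often absent outside production, one way to test tracking code is to pass in a stand-in that records what would have been sent. The sketch below assumes a hypothetical `trackPageview` function that receives its send function as an argument; none of these names come from the GOV.UK codebase.

```typescript
type Fields = { [name: string]: string };
type SendFn = (hitType: string, fields: Fields) => void;

interface RecordedHit {
  hitType: string;
  fields: Fields;
}

interface Recorder {
  calls: RecordedHit[];
  send: SendFn;
}

// Hypothetical tracking function under test: builds the fields for a
// pageview and hands them to whatever send function it is given.
function trackPageview(send: SendFn, page: string, dimensions: Fields): void {
  const fields: Fields = { page: page };
  Object.keys(dimensions).forEach((name) => {
    fields[name] = dimensions[name];
  });
  send("pageview", fields);
}

// Test double standing in for the analytics library: it records each hit
// and sends nothing over the network.
function createRecorder(): Recorder {
  const calls: RecordedHit[] = [];
  const send: SendFn = (hitType, fields) => {
    calls.push({ hitType: hitType, fields: fields });
  };
  return { calls: calls, send: send };
}

// Describe every recorded field whose value is the "[object Object]"
// placeholder.
function findBadFields(hits: RecordedHit[]): string[] {
  const problems: string[] = [];
  hits.forEach((hit, index) => {
    Object.keys(hit.fields).forEach((name) => {
      if (hit.fields[name] === "[object Object]") {
        problems.push("hit " + index + ": " + name + " is [object Object]");
      }
    });
  });
  return problems;
}
```

A test would call `trackPageview` with the recorder's `send`, then assert that `findBadFields(recorder.calls)` is empty. The same test can run in a development environment where the real library is not loaded.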

We’re recommending that analysts report tracking issues to our second line support via established channels.

The most important lesson from this incident, and the associated one from 12 February 2018, is that we must recognise changes to the analytics code are now much more common than they were before 2017, and so we need to establish better processes.

2 comments

  1. Comment by Alan Cooke posted on

    Is it 'Data incident roundup - February 2018' or 'Data incident roundup - February 2019'?

    • Replies to Alan Cooke

      Comment by James Butler posted on

      Hi Alan. It was indeed February 2018 - the decision to write about these incidents was taken later.