As a performance analyst on GOV.UK, my job is to assess how well the site is performing, and track the effects of changes we put in place. Sometimes people think this mostly involves monitoring trends. That's important, but the interesting part is using data to respond to new problems, and creating evidence to help my team make better decisions.
Behind the trends and numbers we analyse are the people who are using GOV.UK’s services. Minute by minute, hour by hour, they are trying to get things done - things that matter to them. So how do we use data to figure out what’s important to users? How do we know when we’ve made things better?
At GOV.UK, optimising content and services is especially challenging. Unlike other industries, where revenue and conversions are a pretty good marker of success, it is often hard to decide on the metrics that strongly signify success or failure. To help the teams I work with get the most out of data, I like to frame the process with 3 clear steps: choosing what to measure, writing a strong hypothesis and testing for statistical significance.
1. Choosing what to measure
Your choice of metrics forms the most critical part of any optimisation plan. It is also deceptively difficult, and if you realise later on that a chosen metric is the wrong one, it can invalidate all your work. I recommend devoting some serious brain time (both as a team and individually) to picking the metrics that are the closest signifiers of success.
Getting the most from a metric is all about context: it should be tied to a problem you’ve observed and a solution you’re proposing. A metric can be a KPI, but it doesn’t have to be. KPIs are linked to your core objectives; metrics don’t have to be, which means they can capture improvements that aren’t necessarily linked to a core business objective.
To give an example, feedback from a user research session may show that people find a landing page confusing. Here, your metric might be the percentage of people using your site search function from the landing page - we hypothesise that people use site search when they can’t find a route to what they’re looking for.
Another way of thinking about a good metric is to define a bad one. Bad metrics include those that are:
- too vague - a metric so broad that your changes can’t register an impact, or that any movement can’t be attributed to the change you made
- too granular - a metric so small and specific that it doesn’t help your goals, or that you can’t gather enough data to make your findings statistically significant
- based on vanity - metrics used to impress your manager rather than please your users
- just plain incorrect - for example, you’ve mixed up a metric’s scope, your tracking hasn’t been implemented properly, or your assumptions about what a metric means for a user’s experience are wrong
When working on improving the family visa guides, we undertook a wide-ranging analysis of user journeys, search keywords (from on-site searches and through third parties like Google Trends) and user research to refine our metrics into 3 signifiers of success. These were:
- a reduction of circular journeys, where users repeatedly view the same content (there’s a rough sketch of how this can be counted after this list)
- successful conversions - the percentage of users who reach the service from the ‘apply’ section of the guide
- fewer searches for ‘spouse visa’ - the guide covers this, so searches for this topic indicate confusion around the content
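To make the circular-journey metric concrete, here’s a minimal sketch of how it could be counted from raw page-view data. The column names and figures are made up for illustration - they aren’t taken from GOV.UK’s actual analytics.

```python
import pandas as pd

# Hypothetical page-view export: one row per view, with a session ID and page path
views = pd.DataFrame({
    "session_id": ["a", "a", "a", "b", "b", "c"],
    "page_path":  ["/guide-1", "/guide-2", "/guide-1", "/guide-1", "/apply", "/guide-2"],
})

# Count how many times each session viewed each page
repeat_views = views.groupby(["session_id", "page_path"]).size()

# A session is 'circular' if it viewed any single page more than once
circular_sessions = (
    repeat_views[repeat_views > 1].index.get_level_values("session_id").unique()
)

rate = len(circular_sessions) / views["session_id"].nunique()
print(f"Circular-journey rate: {rate:.0%}")  # session 'a' repeats /guide-1, so 1 of 3 = 33%
```

However you count it, the point is to pin the metric down precisely enough that two analysts looking at the same data would get the same number.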
There are some useful resources available to help you choose good metrics.
2. Write your hypothesis
Once you’ve got some good metrics in place, you need to write a hypothesis.
People underestimate the importance of having a hypothesis. Without one, at best, your data will be inconsistent. At worst, it will be meaningless. Without a hypothesis we can never be sure that our changes have made the impact we wanted, because we didn’t decide on the impact we expected to see before we made the change.
A hypothesis lets you answer a question that is meaningful to your users, and forces you to take a risk. You stake your success on something which you feel will have an impact. If your change goes as expected, brilliant. If it doesn’t, never mind - you’ve learnt something new.
You can write a good hypothesis by using the following formula:
Changing w
To x
Will lead to y
Because z
When optimising family visa guides on GOV.UK our hypothesis was:
Changing the visa guide titles from ‘remain in the UK’ and ‘join family in the UK’
To ‘apply to extend your stay in the UK’ and ‘apply from outside the UK’
Will lead to fewer users going round in circles between the guides
Because the difference between the guides is more explicit - one guide is for applicants who are outside the UK and the other is for those who are already here.
Two things to remember about hypotheses:
- Try to restrict the number of metrics you use. Hypotheses should be tight - the more metrics you choose, the harder it is to get a meaningful result.
- Base your hypothesis on data - something you’ve observed.
3. The big reveal
Once you’ve made a change and gathered the results, you need to make sure that your results are statistically significant. This is one of the most important elements in our role as digital data analysts.
Without testing for significance we can make the wrong decisions, resulting in lost time, effort and morale. Some basic but popular statistical significance tests include the chi-squared test, the t-test and the Z-test.
The results of changing the family visa guide titles included a drop in circular journeys of between 35% and 41%, at a 95% confidence level. By testing how robust our findings are, we can be confident that the improvement is real rather than down to chance.
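To show what such a check can look like in practice, here’s a minimal sketch in Python using scipy, with made-up before-and-after figures rather than the real family visa data. It runs a chi-squared test on a 2x2 table of sessions and reports a 95% confidence interval for the change in the circular-journey rate.

```python
from math import sqrt
from scipy.stats import chi2_contingency, norm

# Made-up figures: sessions and circular journeys before and after the change
before_sessions, before_circular = 20_000, 3_000   # 15.0% circular
after_sessions,  after_circular  = 20_000, 1_900   #  9.5% circular

# Chi-squared test of independence on the 2x2 contingency table
table = [
    [before_circular, before_sessions - before_circular],
    [after_circular,  after_sessions  - after_circular],
]
chi2, p_value, dof, _ = chi2_contingency(table)
print(f"chi-squared = {chi2:.1f}, p = {p_value:.2g}")

# 95% confidence interval for the drop in the circular-journey rate
# (normal approximation for the difference between two proportions)
p1 = before_circular / before_sessions
p2 = after_circular / after_sessions
se = sqrt(p1 * (1 - p1) / before_sessions + p2 * (1 - p2) / after_sessions)
z = norm.ppf(0.975)
low, high = (p1 - p2) - z * se, (p1 - p2) + z * se
print(f"Drop in circular-journey rate: {low*100:.1f} to {high*100:.1f} percentage points (95% CI)")
```

Online calculators or a spreadsheet can do the same job - what matters is checking that the change you’re seeing is unlikely to be down to chance before acting on it.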
Paying attention to the right metrics, and ensuring statistical significance, allows us to make truly evidence-based decisions - or, to put it another way, better decisions. You can find out more about how we use data at GDS on our Data in government blog.
Ivan is a performance analyst on GOV.UK. You can follow him on Twitter.