https://insidegovuk.blog.gov.uk/2017/11/14/using-ab-testing-to-measurably-improve-common-user-journeys/

Using A/B testing to measurably improve common user journeys

Posted by: Mark, Posted on: 14 November 2017 - Categories: How we work

We’ve been using A/B testing across 2 teams to make measurable improvements to common user journeys on GOV.UK.

What is A/B testing?

A/B testing is a method to compare two versions of a page against each other to determine which one performs better. This removes some of the guesswork so we can make data-informed decisions to improve GOV.UK.
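
In practice, visitors are split at random: an ‘A’ group sees the existing page, a ‘B’ group sees the proposed change, and the same metric is measured for both. The sketch below shows one common way to do the split, bucketing visitors by hashing a session identifier – it’s an illustration only, not a description of how GOV.UK assigns variants.

    import hashlib

    def assign_variant(session_id, experiment):
        """Deterministically assign a session to the 'A' or 'B' group.

        Hashing the session id together with the experiment name means a
        visitor keeps seeing the same variant for the whole experiment.
        """
        digest = hashlib.sha256(f"{experiment}:{session_id}".encode()).hexdigest()
        bucket = int(digest, 16) % 100        # 0..99
        return "B" if bucket < 50 else "A"    # 50/50 split

    # e.g. decide which version of the page this visitor should see
    print(assign_variant("visitor-123", "search-format-weighting"))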

Search team

The search team’s mission in quarter 1 was to improve search on GOV.UK. We ran 2 A/B tests to find out if the changes we planned would actually result in a measurable improvement for users or not.

Search format weighting A/B test

GOV.UK has different formats designed to meet different user needs. One of the ways we can affect search results is to weight formats differently.

After analysing the data, we knew there was a weighting problem with ‘FOI releases’ and ‘service assessment reports’ in search. We think this is because they often have similar titles to services and guidance content on GOV.UK. For example, when a user searched for ‘NHS pension’ on GOV.UK, the top result was the assessment report for a service called ‘Your NHS pension’, distracting people from the content they were actually looking for.

In previous user research, we identified that ‘guidance’ formats are more helpful for users than other formats but aren’t always weighted highly in search.

Our hypothesis was that changing the weighting of some formats would result in more users seeing relevant results. So we made a change to format weighting and prioritised guidance, detailed guides and start pages, then showed this version to the B group.
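
The post doesn’t describe the search stack itself, so the sketch below just illustrates the idea of format weighting: each result’s base relevance score is multiplied by a boost that depends on its format. The boost values and titles here are invented for the example.

    # Illustrative only: the boost values and format names are made up.
    FORMAT_BOOSTS = {
        "guidance": 1.5,
        "detailed_guide": 1.4,
        "start_page": 1.4,
        "foi_release": 0.6,
        "service_assessment_report": 0.6,
    }

    def rerank(results):
        """Order results by base relevance score multiplied by a format boost."""
        def weighted_score(result):
            return result["score"] * FORMAT_BOOSTS.get(result["format"], 1.0)
        return sorted(results, key=weighted_score, reverse=True)

    results = [
        {"title": "Your NHS pension - service assessment",
         "format": "service_assessment_report", "score": 9.0},
        {"title": "NHS Pension Scheme guidance", "format": "guidance", "score": 7.5},
    ]
    print([r["title"] for r in rerank(results)])  # guidance now ranks first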

As a result:

  • clicks on ‘guidance’-type formats in search results increased
  • clicks on ‘FOI releases’ and ‘service assessment reports’ decreased

Going back to our previous example of searching for ‘NHS pension’ on GOV.UK, the service assessment report was no longer the top result. It was actually not on the first page of results.

We also had an impact on the exit rate, click-through rate and refinement rate. We’ve estimated there will be:

  • 68,000 fewer exits per year
  • 280,000 more clicks on search results per year
  • 142,000 fewer search refinements per year
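
The working behind these figures isn’t shown in the post, but projections like this are typically made by taking the difference in a rate between the A and B groups during the test and scaling it up to a year of traffic. A rough sketch with hypothetical numbers:

    # Hypothetical figures - the real test volumes and rates are not in the post.
    annual_search_sessions = 20_000_000

    exit_rate_a = 0.0311   # share of searches ending in an exit (control group)
    exit_rate_b = 0.0277   # same metric for the new weighting (B group)

    fewer_exits = (exit_rate_a - exit_rate_b) * annual_search_sessions
    print(f"Estimated {fewer_exits:,.0f} fewer exits per year")  # 68,000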

Benchmarking team

Every 6 months to a year, the user researchers test whether we are making GOV.UK better by looking at certain tasks and measuring whether users can complete them and how long it takes. We call this benchmarking research. In quarter 1, we set up a team to respond to some of the things we’ve learnt about the tasks and ran 7 A/B tests.

Estimate child maintenance payments A/B tests

One of the benchmarking tasks is to estimate child maintenance payments based on certain circumstances. We’d seen in the benchmarking research that people often got lost when they clicked the link to check child maintenance rates.

Users often tried to calculate child maintenance payments themselves rather than using the calculator, even though the calculator is there to do exactly that. Digging into the data showed that:

  • users who clicked the ‘child maintenance rates’ link had a completion rate of only 62%
  • users who didn’t click that link and clicked the ‘calculate your child maintenance’ button instead had a completion rate of 85%

So we ran an A/B test where we removed the distracting link.

We found that click-through rates on the ‘calculate your child maintenance’ green button increased by 6%. This is good because the content after clicking the green button is where users can find the answer to the task.

Contact DVLA A/B test

Another benchmarking task is to find the number to contact DVLA about your driving licence. The correct answer is at the end of the smart answer, after clicking the green button on this page.

We found in user research that the words ‘Start now’ really threw people off in this scenario and made them think they were starting some kind of application rather than finding a contact number. So we changed the text in the green button from ‘Start now’ to ‘Find contact details’ and ran an A/B test.

We found the B group showed substantial improvements – there was a 30% increase in clicks on the green button on mobile, and a 14% increase in clicks on the green button on desktop. We’ve estimated there will be 15,000 more user sessions every week that get to the page with the correct answer – to contact DVLA about driving licences. This means around 780,000 more successful user sessions per year just from changing the text on one button.

What did we learn about A/B testing?

We learnt a lot about A/B testing this quarter.

Running A/B tests gave us confidence that the changes we were making were an improvement for users.
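
The post doesn’t say how the results were checked for statistical significance, but one common way to get that confidence is a two-proportion z-test on the click counts for each group. A minimal sketch with hypothetical counts:

    import math

    def two_proportion_z_test(clicks_a, n_a, clicks_b, n_b):
        """Two-sided z-test for a difference between two click-through rates."""
        p_a, p_b = clicks_a / n_a, clicks_b / n_b
        pooled = (clicks_a + clicks_b) / (n_a + n_b)
        se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
        z = (p_b - p_a) / se
        p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
        return z, p_value

    # Hypothetical counts, not the real experiment data
    z, p = two_proportion_z_test(clicks_a=4200, n_a=50000, clicks_b=4550, n_b=50000)
    print(f"z = {z:.2f}, p = {p:.4f}")  # a small p suggests B genuinely performed better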

Not all of the A/B tests showed that the change we were testing performed better, but that’s ok – they were still valuable because we learnt quickly and abandoned the changes that didn’t work.

We also learnt that A/B testing isn’t a silver bullet and it isn’t a replacement for user research. We still value user research and we’re still doing user research across the GOV.UK programme. It’s more that A/B testing is an extra method that teams can use in conjunction with user research to make improvements for users.

Mark is a Product Manager on GOV.UK. You can follow him on Twitter.

1 comment

  1. Comment by Tom Adams

    Good to read this. With the traffic we get through some of the government services, there's no excuse not to do AB testing.

    I'm a user researcher at DWP, and I previously worked in an agency running a conversion optimisation team doing a lot of AB testing, so I find the last paragraph in particular really interesting.

    My view is that there is never an either/or between user research and data-led research. They're part of the same cycle of activity. UR is a critical part of creating strong hypotheses for AB tests, and AB testing is the only certain way to get a quantitative confirmation of the effectiveness of a design change.

    AB testing also gives really powerful data to feed back into user research, particularly when you start segmenting down. I find there is sometimes a false distinction made between quantitative research like AB tests, and qualitative research like user interviews - but they're just two sides of the same coin. They both help us understand more about our users and what works for them.