This month we’re beginning an experimental pilot of GOV.UK Chat, our AI-powered chatbot that allows users, for the first time, to get quick, personalised answers to their questions based on GOV.UK guidance.
Earlier this year we explained how we think generative AI could make it easier for people to find answers from the 700,000+ pages of GOV.UK, and shared our early findings. This test represents our next step in understanding how we can use AI to provide simple, helpful tools that save people time. This work is being led by the GOV.UK AI team, a multidisciplinary group of civil servants that includes data scientists alongside developers, user researchers and designers.
The team is testing and scaling GOV.UK Chat in stages. When we first tested the tool with hundreds of invited users in late 2023, our results showed nearly 70% of users found the chatbot’s responses were useful, and just under 65% were satisfied with the experience. These findings demonstrated the positive impact this technology could have for users looking to access government information and services.
The 2023 test also found answers did not reach the highest level of accuracy demanded for a site like GOV.UK. We’ve made a number of changes since then, with our latest evaluations showing a consistent improvement in accuracy and other key performance metrics.
For this pilot we’re now looking to test with users who rely on GOV.UK as part of running a business. Our aim is to get a better understanding of where GOV.UK Chat provides value and also identify areas for further improvement.
Here’s what we’ve been up to since we last blogged about GOV.UK Chat, and what’s happening now.
Making improvements and addressing risks
The team has improved GOV.UK Chat significantly since our last update.
Some of the new changes include:
- improving the user experience by introducing an onboarding process, answer checking and better accessibility
- improving the accuracy and completeness of the answers generated
- adding “guardrails” (filters and rules) that help GOV.UK Chat determine which questions it should answer
While we’ve been able to steadily improve accuracy, there are still risks we need to manage when working with generative AI at this relatively early stage. Industry-wide, no one has been able to reach 100% accuracy when generating answers. This means managing the risk of ‘hallucinations’ – where the system generates responses containing incorrect information presented as fact – is part of working with this technology right now.
To reduce the risks of inaccurate answers, we:
- work collaboratively with subject matter experts at HMRC to score the accuracy of the LLM answers as we iterate
- assess AI answers against example answers written by content designers
- monitor for inaccurate or inappropriate answers and investigate when they occur
- explain the risk of inaccurate answers to users as part of the onboarding process
- provide a link underneath every answer so users can check the source guidance
In addition to reducing inaccuracies, we’ve also focused on the risk that someone deliberately compromises a generative AI system’s safety mechanisms to make it generate potentially harmful responses.
While we accept that no technological system can be made 100% secure against this type of activity, we’ve taken extensive measures to minimise the risk of harmful outputs. For example, throughout the development of GOV.UK Chat we’ve carried out “red teaming” exercises. This is where colleagues from across government have tried their hardest to “jailbreak” the system or make it behave in ways it was not intended to.
These exercises have allowed us to identify and address potential risks to help ensure a safe and secure experience for users interacting with GOV.UK Chat for its intended purpose of supporting citizens with business queries. Unfortunately, we can’t eliminate all possibility of an individual generating an inappropriate response; you can read more about how we’re understanding and addressing the risk of jailbreaking AI in our blog post on the subject.
What we’re doing now
For this pilot – known as a private beta test – a link to try GOV.UK Chat will appear on selected business pages on GOV.UK, and we’ll be rolling out access using a waiting list as capacity allows. We’re looking to gather a strong dataset about how GOV.UK Chat is used in real-world scenarios, which we estimate will take 4 weeks.
We will be monitoring the pilot closely, reviewing usage data and collecting feedback in a survey. In parallel, we’ll be running in-depth user research to understand what works for people and what doesn’t. Together these findings will help us make further improvements to GOV.UK Chat and inform what we do next.
How GOV.UK Chat works
GOV.UK Chat allows users to have short conversations with an AI chatbot. It can provide an answer to any business-related question that can be answered using pages published on GOV.UK.
GOV.UK Chat uses a method called Retrieval-Augmented Generation (RAG) to provide answers, which works like this:
- A user asks a question.
- We look for information on GOV.UK relevant to the user’s question.
- We generate an answer based on relevant GOV.UK pages.
- We make sure the answer is safe and appropriate to the question.
- The user is presented with an answer along with ‘Check this answer’ links to verify it.
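To make the steps above more concrete, here is a minimal sketch of a RAG pipeline in Python. It is an illustration only, not GOV.UK Chat's actual code. The page list, the example.org URLs, the word-overlap retrieval, the `generate` placeholder (which stands in for a call to a language model) and the `BLOCKED_TERMS` guardrail are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    text: str


# Hypothetical stand-in for an index of published guidance pages.
PAGES = [
    Page("https://example.org/register-for-vat", "Guidance on how to register for VAT"),
    Page("https://example.org/limited-company", "Guidance on how to set up a limited company"),
]

# Hypothetical guardrail list; a real system would use far richer checks.
BLOCKED_TERMS = {"password"}


def tokenize(text: str) -> set[str]:
    """Lower-case the text and split it into a set of words."""
    return {word.strip(".,?!").lower() for word in text.split()}


def retrieve(question: str, pages: list[Page], top_k: int = 1) -> list[Page]:
    """Step 2: rank pages by how many words they share with the question."""
    question_words = tokenize(question)
    scored = [(len(question_words & tokenize(page.text)), page) for page in pages]
    scored = [(score, page) for score, page in scored if score > 0]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [page for _, page in scored[:top_k]]


def generate(question: str, sources: list[Page]) -> str:
    """Step 3: placeholder for the language model call (assumption)."""
    return "Based on the guidance found: " + " ".join(page.text for page in sources)


def is_safe(answer: str) -> bool:
    """Step 4: a toy guardrail that rejects answers containing blocked terms."""
    return not (tokenize(answer) & BLOCKED_TERMS)


def answer_question(question: str, pages: list[Page]) -> dict:
    """Steps 1 to 5: return an answer with links the user can check."""
    sources = retrieve(question, pages)
    if not sources:
        return {"answer": None, "check_links": []}
    answer = generate(question, sources)
    if not is_safe(answer):
        return {"answer": None, "check_links": []}
    return {"answer": answer, "check_links": [page.url for page in sources]}
```

Calling `answer_question("How do I register for VAT?", PAGES)` returns a dictionary holding the generated text and the URL of the page it was based on. When no relevant page is found, the sketch declines to answer rather than guess.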
GOV.UK Chat uses some of the same technology behind ChatGPT (OpenAI’s GPT-4o and GPT-4o mini), but unlike ChatGPT, GOV.UK Chat is designed to draw on GOV.UK as the source of its answers. This means we can ensure it’s always using the most up-to-date guidance, and users can trust the answer comes from government.
We’ve designed GOV.UK Chat so that no personal data is required to use it, and the service makes it clear that users shouldn’t provide personal data in their questions. Just in case, GOV.UK Chat detects common forms of personal data, such as phone numbers, and prevents them from being entered as part of users’ questions.
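As a rough illustration of this kind of check, the sketch below uses regular expressions to spot phone numbers and email addresses before a question is submitted. The patterns, the function names and the choice to include email addresses are assumptions made for this example. They do not describe how GOV.UK Chat itself detects personal data, and simple patterns like these will miss some formats.

```python
import re

# Simplified, hypothetical patterns; real-world formats vary more than this.
PERSONAL_DATA_PATTERNS = {
    # UK-style numbers starting with 0 or +44, allowing spaces or hyphens.
    "phone number": re.compile(r"(?:\+44\s?|0)\d(?:[\s-]?\d){8,9}"),
    "email address": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
}


def find_personal_data(question: str) -> list[str]:
    """Return the kinds of personal data that appear in the question."""
    return [
        kind
        for kind, pattern in PERSONAL_DATA_PATTERNS.items()
        if pattern.search(question)
    ]


def can_submit(question: str) -> bool:
    """Only allow questions in which no personal data was detected."""
    return not find_personal_data(question)
```

With this sketch, `can_submit("How do I register for VAT?")` is true, while a question containing a number such as 07700 900123 would be stopped so the user can remove it and try again.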
What’s next
We’ll be sharing the outcomes of the test on this blog once we’ve had time to analyse them. You can subscribe using the link below to stay up to date and read about our findings as soon as they’re published.
Subscribe to Inside GOV.UK to get the latest updates about our work.