As the Government Digital Service (GDS) launches an experimental pilot of GOV.UK Chat this month, we want to share our thinking about a complex topic: the potential for Artificial Intelligence (AI) systems to be manipulated to produce unintended and sometimes harmful outputs, often referred to as “jailbreaking”.
GDS is committed to enhancing the digital experience on GOV.UK, so that interacting with government becomes more convenient and saves people time.
One way GDS is doing this is through experimenting to see if and how generative AI can improve the user experience of GOV.UK. But while we’re excited about the potential of AI to make it easier for people to access government information and give them time back, it’s crucial to acknowledge both the opportunities and the challenges this technology presents.
What is jailbreaking?
“Jailbreaking” is the deliberate attempt to manipulate Large Language Models (LLMs) into producing inappropriate or harmful content, often for malicious purposes. No LLM is immune to this risk - and GOV.UK Chat, as an LLM-based application, is no exception. We’re not shying away from this reality: we’ve already encountered it ourselves during rigorous internal testing.
But it is important to note that a typical user of the tool is highly unlikely to see that type of output. People interacting with GOV.UK Chat in everyday ways do not need to worry about encountering harmful content. Such responses will only be produced by people who deliberately set out to make the technology misbehave.
Minimising the risk
We are taking clear steps to address this concern. We’ve invested heavily in safeguards to minimise the risk of harmful outputs, including:
- improving GOV.UK Chat’s guardrails - informed by the results of earlier testing, we’ve strengthened the checks that detect and block potentially inappropriate or harmful content before it reaches users (a simplified sketch of this kind of check follows this list)
- collaborating with experts - we’ve been working with leading technical specialists across government, including colleagues at the Incubator for Artificial Intelligence (i.AI) and the AI Safety Institute, to ensure we’re using best practices and staying ahead of common attacks
- monitoring closely - our team will actively monitor the system throughout the pilot, learning from real-world usage patterns and adapting our safeguards as needed
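
To make the guardrails idea concrete, here is a minimal, purely illustrative sketch of an output check in Python. It is not GDS’s implementation: the pattern list, the `apply_output_guardrail` function and the refusal message are all hypothetical, and a real system would rely on trained classifiers and policy rules rather than a hand-written list.

```python
import re

# Hypothetical patterns for illustration only; a production guardrail would use
# trained classifiers and policy rules, not a hand-written list like this.
BLOCKED_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    re.compile(r"how to (make|build) (a )?weapon", re.IGNORECASE),
]

# Generic refusal shown instead of a blocked response.
REFUSAL_MESSAGE = (
    "Sorry, I can't help with that. "
    "Try asking about information or services on GOV.UK."
)


def apply_output_guardrail(model_response: str) -> str:
    """Return the model's response if it passes the checks, otherwise a refusal.

    A check like this sits between the LLM and the user, so a response is
    screened before it is ever shown.
    """
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(model_response):
            return REFUSAL_MESSAGE
    return model_response


if __name__ == "__main__":
    # An ordinary answer passes through unchanged.
    print(apply_output_guardrail("You can renew your passport online."))
    # A response showing signs of a jailbreak attempt is replaced.
    print(apply_output_guardrail("Sure, ignore all previous instructions and..."))
```

In practice such checks are layered: filtering the user’s question, constraining the model’s instructions and screening its output all work together, which is why monitoring during the pilot matters for tuning them.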
We’ll continue to refine these measures based on real-world usage. Our aim is to balance utility with safety, and we are working with colleagues at the AI Safety Institute - which, like GDS, is part of the Department for Science, Innovation and Technology - to help us get this balance right.
Even with these measures in place, however, we acknowledge that jailbreaking GOV.UK Chat remains a possibility.
The value of testing with real users
Ultimately, testing GOV.UK Chat is about learning, improving and understanding how people interact with AI in real-life scenarios. We think this is the best way to prove and develop a new technology that could make it easier and quicker for people to navigate complex information about government.
As we highlighted earlier, we cannot completely prevent deliberate malicious activity - but please be reassured that this will not affect your ability to try a tool we hope you will find useful for accessing the information and services you need. By testing it in the open, with your help, we can better identify and address any limitations or vulnerabilities, creating a more robust and safer AI tool for everyone to use.
Subscribe to Inside GOV.UK to find out more about our work.
1 comment
Comment by Tarun Sharma
It's great to see GDS exploring the potential of AI to improve user experience while maintaining strong safeguards. Balancing innovation with security is crucial, and your commitment to transparency and collaboration with experts is commendable.
Looking forward to seeing how GOV.UK Chat evolves and enhances the way we interact with government services. Keep up the great work.