The information in this blogpost may now be out of date. See the current GOV.UK content and publishing guidance.
We’re often asked, ‘how does search work?’ Site search is quite a big topic, but what people usually want to know is: why are the search results in this order, or why isn’t something appearing, and what can we do about it?
GOV.UK uses an open source search engine called Elasticsearch, which we customise to suit our needs. The basics are the same as for most search engines: it makes an ‘index’ of our content, and when you enter a search query, it lists results that match those words, in order of relevance. (See Google’s guide to how search works.)
Matching a query
‘Match’ simply means that some of the words you searched for were found in the content. To reduce irrelevant results, more than half of the words need to match (for example, 2 out of 3 words).
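As a rough illustration of the 'more than half' rule, here is a minimal Python sketch. It assumes simple whitespace tokenisation and exact word comparison, and the function name `matches` is made up for this example. It is not the actual GOV.UK or Elasticsearch configuration.

```python
def matches(query: str, content: str) -> bool:
    """Return True if more than half of the query words occur in the content.

    Illustrative only: real search engines tokenise and normalise text
    far more carefully than a lowercase whitespace split.
    """
    query_words = query.lower().split()
    content_words = set(content.lower().split())
    if not query_words:
        return False
    found = sum(1 for word in query_words if word in content_words)
    # "More than half" is a strict comparison: 1 out of 2 is not enough.
    return found > len(query_words) / 2
```

With this rule, the query 'renew passport online' matches a page containing 'renew' and 'passport' (2 out of 3 words), but not a page containing only 'passport'.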
If a word isn’t used in the content at all, then that content won’t appear in the results when people search for that word. There are ways around this, which I’ll cover further down, but it’s best to include likely search terms on the page where possible (which will also help external search engines such as Google).
Words in the search index are ‘stemmed’ to their root, so similar words can be interchangeable. For example, all of these are treated as ‘travel’: travel, travels, travelling, travelled, traveller and travellers. Search queries with any of these words will return results matching any of these words.
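To show what stemming does, here is a deliberately tiny Python sketch that handles the 'travel' example. The suffix list and the `toy_stem` function are assumptions made for illustration. Real stemmers, including whichever one Elasticsearch is configured with, use much more careful rules, and this toy version will mangle plenty of other words.

```python
# Suffixes are checked in order, so longer ones come first.
SUFFIXES = ("ers", "er", "ing", "ed", "s")


def toy_stem(word: str) -> str:
    """Reduce a word to a crude root by stripping one common suffix."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three letters so very short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    # Collapse a doubled final consonant, e.g. "travell" -> "travel".
    if len(word) >= 2 and word[-1] == word[-2] and word[-1] not in "aeiou":
        word = word[:-1]
    return word


variants = ["travel", "travels", "travelling", "travelled", "traveller", "travellers"]
stems = {toy_stem(v) for v in variants}
```

Because both the indexed content and the search query go through the same stemming step, any of these variants in a query can match any of them in the content.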
If the search engine thinks you’ve misspelled a word, it offers a ‘Did you mean’ suggestion that you can click to update your query. At the moment these suggestions are based on a standard dictionary, so some terms and names aren’t recognised and you might see some odd suggestions. We’re planning to improve this so that the search engine knows the words used in the content and offers more useful recommendations.
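The planned improvement, suggesting words that are known to appear in the content, can be sketched in Python with the standard library's `difflib.get_close_matches`. The vocabulary list, the `did_you_mean` function name and the 0.8 similarity cutoff are all assumptions for illustration. The real feature may work quite differently.

```python
from difflib import get_close_matches

# Hypothetical vocabulary: in practice this would be built from indexed content.
CONTENT_VOCABULARY = ["passport", "pension", "licence", "dvla"]


def did_you_mean(word, vocabulary=CONTENT_VOCABULARY, cutoff=0.8):
    """Suggest the closest known word, or None if no suggestion is needed."""
    word = word.lower()
    if word in vocabulary:
        return None  # the word is already known, so nothing to suggest
    candidates = get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
    return candidates[0] if candidates else None
```

For example, `did_you_mean("pasport")` suggests 'passport', while a word with no close neighbour in the vocabulary gets no suggestion at all.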
Relevance of results
To put the results in the most useful order, the search engine makes its best guess at what you’re looking for. It doesn’t always get it right, but we’re using our analytics data to keep improving the configuration.
For distinctive content that uses its own terms, search already works quite well. But when dozens of pages cover different aspects of the same topic, it's harder for the search algorithms to tell them apart and pick the most relevant order.
The algorithms take into account things like how often the words appear as a proportion of the total word count, and which part of the page the words appear in - if a word is in the title then it’s more likely to be what you wanted.
When you search for several words, content where they’re together as a phrase appears higher than content where the individual words are spread out.
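The signals described above (how often a word appears relative to the word count, whether it's in the title, and whether the words appear together as a phrase) can be combined in a toy scoring function. This Python sketch is only an illustration: the `score` function and the boost values are invented, and the real ranking formula is more sophisticated.

```python
def score(query, title, body, title_boost=2.0, phrase_boost=1.5):
    """Toy relevance score. The boost values are invented for illustration."""
    terms = query.lower().split()
    title_words = title.lower().split()
    body_words = body.lower().split()
    if not terms or not body_words:
        return 0.0

    total = 0.0
    for term in terms:
        # How often the word appears, as a proportion of the word count.
        total += body_words.count(term) / len(body_words)
        # A word in the title is more likely to be what the user wanted.
        if term in title_words:
            total += title_boost

    # Words found together as a phrase rank higher than scattered words.
    # Padding with spaces avoids matching part of a longer word.
    phrase = " " + " ".join(terms) + " "
    text = " " + " ".join(body_words) + " "
    if len(terms) > 1 and phrase in text:
        total *= phrase_boost
    return total
```

With these made-up numbers, a page titled 'Claim Child Benefit' that also uses the phrase in its body scores well above a page that only mentions 'child' and 'benefit' separately.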
We also use some customised settings specific to GOV.UK’s content and structure. Results from the ‘mainstream’ services and information have a higher weighting than ‘departments and policy’ content, to make sure that the general public can easily find what they need without being confused by professional or specialist information.
Some types of content are also given higher or lower priority, based on what’s generally most helpful for most users. And the latest news stories are weighted higher than older news.
Recently we’ve started weighting results by popularity as well, using analytics data on how often the content is viewed. This helps the more common user needs to appear near the top of the results, above the wider pool of related content that matches the same words. It’s not perfect though, so sometimes very popular content outweighs more specific results.
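Here is a minimal Python sketch of how section and popularity weightings might be layered on top of a text-match score. Every number, field name and formula in it is an assumption made for illustration, not the real GOV.UK weighting. A recency weighting for news could be added as a further multiplier in the same way.

```python
import math
from dataclasses import dataclass


@dataclass
class Result:
    title: str
    text_score: float   # how well the words match the query
    section: str        # "mainstream" or "departments and policy"
    monthly_views: int  # taken from analytics data


# Invented weights: mainstream content gets a higher weighting.
SECTION_WEIGHT = {"mainstream": 1.5, "departments and policy": 1.0}


def weighted_score(result: Result) -> float:
    """Combine the text score with section and popularity weightings."""
    section = SECTION_WEIGHT.get(result.section, 1.0)
    # A logarithm stops very popular pages from dominating completely.
    popularity = 1.0 + math.log10(1 + result.monthly_views) / 10
    return result.text_score * section * popularity


specific = Result("Specialist guidance", 1.2, "departments and policy", 100)
popular = Result("Popular service", 1.0, "mainstream", 1_000_000)
ranked = sorted([specific, popular], key=weighted_score, reverse=True)
```

In this example the popular mainstream page ends up above the more specific result even though its text score is lower, which is exactly the trade-off described above.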
There's more we'd like to experiment with, so the accuracy of results should keep getting better.
We regularly review our search data and feedback to see how people are searching, what's not working as well as it should, and what we can do to make it better.
We also look at search data from some of the websites transitioning to GOV.UK, so that we can start getting to know their content and anticipate potential search problems.
We look for patterns and recurring problems to help us plan improvements to our algorithms, and sometimes we make specific manual changes to deal with individual search terms.
The main method we use for this is ‘synonyms’. We can tell the search engine that one word or phrase is equivalent to another, to cater for things like:
- different spellings (eg adviser, advisor)
- same or similar meanings (eg expired, out of date)
- informal terms (eg bedroom tax)
- previous names (eg Inland Revenue)
- acronyms (eg NI for National Insurance)
- abbreviations (eg reg for registration)
- form and leaflet numbers (eg EX50 for court fees)
- common mistakes (eg DLVA for DVLA)
These synonyms apply to all matching content in the search index, so we use them with care to prevent unwanted side-effects. At the moment you can’t add hidden keywords to individual items, but that’s on our list for potential development.
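A simplified way to picture synonyms is as a lookup that rewrites query words before matching. This Python sketch uses a few of the examples listed above. The direction of each mapping and the `expand_query` function are assumptions, and it only handles single-word terms, not phrases like 'out of date'.

```python
# Illustrative mappings based on the examples above. All keys are lowercase.
SYNONYMS = {
    "advisor": "adviser",
    "dlva": "dvla",
    "ni": "national insurance",
    "reg": "registration",
}


def expand_query(query: str) -> str:
    """Replace each query word with its synonym, if it has one."""
    words = query.lower().split()
    return " ".join(SYNONYMS.get(word, word) for word in words)
```

Here `expand_query("NI number")` becomes 'national insurance number'. It also shows why care is needed: every query containing 'ni' would be rewritten, whatever the user meant by it.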
We’ve also just released a long-awaited ‘best bets’ tool. This lets us specify results that will always be at the top for a certain search query. So if the analytics data shows that most users are clicking on a particular result further down the list, we can make their search easier by moving it up.
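In outline, a best bet pins chosen results to the top of the list for one exact query. The Python sketch below shows the idea. The query, the page paths and the `apply_best_bets` function are hypothetical, and the real tool is likely to have more options than this.

```python
# Hypothetical best bets: query -> pages that should always come first.
BEST_BETS = {
    "example query": ["/example-best-bet-page"],
}


def apply_best_bets(query, results):
    """Move any best bets for this query to the top of the results."""
    pinned = BEST_BETS.get(query.strip().lower(), [])
    # Keep the remaining results in their original order, without duplicates.
    return pinned + [page for page in results if page not in pinned]


organic = ["/page-a", "/example-best-bet-page", "/page-b"]
reordered = apply_best_bets("Example Query", organic)
```

Queries without a best bet pass through unchanged, so the normal relevance ordering still applies everywhere else.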
Finally, we maintain a set of external links to help people searching for common, government-related user needs that aren’t covered on GOV.UK.
The search results page
Last month, we combined the separate tabs into a single list of search results, so that users don’t need to understand which section to look at.
Only the first 50 results are displayed, but we’re planning to add pagination of results in future. Our data shows that two thirds of users click on one of the top 3 results, and 9 out of 10 users choose one of the first 10 results, so not many people need to go beyond the first page. In the meantime, you can find more results by using the filter or adapting your search query.
You can still tell the content types apart by their metadata: at the moment the mainstream results show their section, while departments and policy results show their date, format and organisation.
One of the concerns from transitioning organisations and their users is that GOV.UK search mixes together content from the whole of government, whereas on separate websites you’re searching just their content. To complicate things further, acronyms and abbreviations can have different meanings in different organisations and industries, and some programmes and publications share the same name - for instance, 'green book' could mean immunisation information or HM Treasury guidance.
To narrow down the results, you can filter by organisation (in site search or in the specific Policies, Publications and Announcements searches). We'll be looking at what other filters may be useful, particularly for specialist content.
Updating the index
Whenever a piece of content is published, the publishing tool sends a signal to the search engine, so it’s added or updated in the index straight away.
However, search results are cached for up to 30 minutes, so if you don’t see the new content immediately, try again after a while.
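The effect of the cache can be sketched in Python as a lookup that reuses stored results until they are 30 minutes old. The `CachedSearch` class and its parameters are made up for illustration, and the real caching layer may sit somewhere else entirely (for example in front of the web application).

```python
import time

CACHE_TTL_SECONDS = 30 * 60  # results are reused for up to 30 minutes


class CachedSearch:
    """Wrap a search function so repeated queries reuse recent results."""

    def __init__(self, search_fn, ttl=CACHE_TTL_SECONDS, clock=time.monotonic):
        self._search_fn = search_fn
        self._ttl = ttl
        self._clock = clock  # injectable so the behaviour can be tested
        self._cache = {}     # query -> (time stored, results)

    def search(self, query):
        now = self._clock()
        entry = self._cache.get(query)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: the index is not consulted
        results = self._search_fn(query)
        self._cache[query] = (now, results)
        return results
```

This is why newly published content can be in the index straight away but still missing from a results page: the page is showing results that were stored before the content was published.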
Right now some types of content aren’t being indexed, for various reasons. These include browse and specialist sector pages, campaigns, business support pages, finders and specialist documents, and uploaded files (such as PDFs). We’re aiming to add most of these in future.
Some other content types are already in the search index, but don’t always appear because they don’t have much content or metadata. These include policy supporting pages and groups. They often appear in search results as just a title without any information or description. We’re looking into what’s needed to make these easier to find.
Sometimes you can solve or avoid a problem simply by editing the content (for example by including other words), but wider issues will need algorithm or configuration changes.
We know that site search is still a big challenge as more and more content moves on to GOV.UK, serving different audiences. We want to keep it simple enough for inexperienced users, while providing enough functionality for advanced users to narrow down exactly what they need.
Following our initial search improvements project last year, we now have a permanent team and we’ve begun some significant changes with the release of ‘unified search’, with more to come. We've got lots of ideas, so we'll keep on discussing and prioritising them to decide which features to work on next.
If you have any feedback about search, do let us know.