https://insidegovuk.blog.gov.uk/2014/06/13/how-gov-uk-site-search-works/

How GOV.UK site search works

The information in this blogpost may now be out of date. See the current GOV.UK content and publishing guidance.

We’re often asked, ‘how does search work?’ Site search is quite a big topic, but what people usually want to know is: why are the search results in this order, or why isn’t something appearing, and what can we do about it?

GOV.UK uses an open source search engine called Elasticsearch, which we customise to suit our needs. The basics are the same as for most search engines: it makes an ‘index’ of our content, and when you enter a search query, it lists results that match those words, in order of relevance. (See Google’s guide to how search works.)

Search results for 'jobs'
Example of a 'matched' query

Matching a query

‘Match’ simply means that some of the words you searched for were found in the content. To reduce irrelevant results, more than half of the words need to match (for example, 2 out of 3 words).
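As a rough illustration of the 'more than half' rule, here is a minimal Python sketch. The function name `matches` and the naive whitespace tokenising are assumptions made for this example; it is not how Elasticsearch implements matching.

```python
def matches(query: str, content: str) -> bool:
    """Return True if more than half of the query words occur in the content."""
    query_words = query.lower().split()
    content_words = set(content.lower().split())
    if not query_words:
        return False
    found = sum(1 for word in query_words if word in content_words)
    # "More than half": 2 of 3 words passes, 1 of 2 does not.
    return found * 2 > len(query_words)


print(matches("renew car tax", "how to tax your car"))  # 2 of 3 words found
print(matches("renew car tax", "car parking rules"))    # 1 of 3 words found
```

A real search engine would also stem words and strip punctuation before comparing them.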

If a word isn’t used in the content at all, then that content won’t appear in the results when people search for that word. There are ways around this, which I’ll cover further down, but it’s best to include likely search terms on the page where possible (which will also help external search engines such as Google).

Words in the search index are ‘stemmed’ to their root, so similar words can be interchangeable. For example, all of these are treated as ‘travel’: travel, travels, travelling, travelled, traveller and travellers. Search queries with any of these words will return results matching any of these words.
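To show the idea of stemming, here is a deliberately simple Python sketch that strips a few common endings. The suffix list and the `toy_stem` function are invented for this example; production stemmers are far more careful, and this is not the algorithm GOV.UK search uses.

```python
# Longer suffixes are listed first so "ers" is tried before "er" and "s".
SUFFIXES = ("ers", "ing", "ed", "er", "s")


def toy_stem(word: str) -> str:
    """Reduce a word to a rough root by stripping one common suffix."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            # Undo consonant doubling, e.g. "travell" -> "travel".
            if stem[-1] == stem[-2] and stem[-1] not in "aeiou":
                stem = stem[:-1]
            return stem
    return word


for w in ["travel", "travels", "travelling", "travelled", "traveller", "travellers"]:
    print(w, "->", toy_stem(w))
```

Because both the indexed content and the search query go through the same function, any of these forms will match any other.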

If the search engine thinks you’ve misspelled a word, it offers a ‘Did you mean’ suggestion that you can click to update your query. At the moment these suggestions are based on a standard dictionary, so some terms and names aren’t recognised and you might see some odd suggestions. We’re planning to improve this so that the search engine knows the words used in the content and offers more useful recommendations.

Example of a 'Did you mean?' suggestion

Relevance of results

To put the results in the most useful order, the search engine makes its best guess at what you’re looking for. It doesn’t always get it right, but we’re using our analytics data to keep improving the configuration.

For distinctive content that uses unusual terms, search already works quite well. But when there are dozens of pages about different aspects of a topic, it's harder for the search algorithms to tell them apart and pick the most relevant order.

The algorithms take into account things like how often the words appear as a proportion of the total word count, and which part of the page the words appear in - if a word is in the title then it’s more likely to be what you wanted.
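The two signals described above can be sketched in Python as follows. The `title_weight` value of 3.0 and both function names are assumptions for illustration only; the real scoring formula is more involved.

```python
def field_score(query_words: set, text: str) -> float:
    """Share of the words in `text` that are query words."""
    words = text.lower().split()
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in query_words)
    return hits / len(words)


def relevance(query: str, title: str, body: str, title_weight: float = 3.0) -> float:
    """Combine title and body scores, counting title matches more heavily."""
    query_words = set(query.lower().split())
    return title_weight * field_score(query_words, title) + field_score(query_words, body)


in_title = relevance("passport", "Renew your passport", "Apply online today")
in_body = relevance("passport", "Travel abroad", "Take your passport")
print(in_title > in_body)  # the page with the word in its title ranks higher
```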

When you search for several words, content where they’re together as a phrase appears higher than content where the individual words are spread out.
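A small Python sketch of a phrase boost, assuming a base relevance score has already been calculated. The `boost` factor of 2.0 and the function names are hypothetical.

```python
def contains_phrase(query: str, text: str) -> bool:
    """Return True if the query words appear together, in order, in the text."""
    query_words = query.lower().split()
    text_words = text.lower().split()
    n = len(query_words)
    return n > 0 and any(
        text_words[i : i + n] == query_words
        for i in range(len(text_words) - n + 1)
    )


def phrase_boosted(base_score: float, query: str, text: str, boost: float = 2.0) -> float:
    """Multiply the score when the whole query appears as a phrase."""
    return base_score * boost if contains_phrase(query, text) else base_score


print(phrase_boosted(1.0, "council tax", "pay your council tax online"))
print(phrase_boosted(1.0, "council tax", "tax advice from your council"))
```

Both pages contain both words, but only the first has them together as a phrase, so only the first is boosted.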

We also use some customised settings specific to GOV.UK’s content and structure. Results from the ‘mainstream’ services and information have a higher weighting than ‘departments and policy’ content, to make sure that the general public can easily find what they need without being confused by professional or specialist information.

Some types of content are also given higher or lower priority, based on what’s generally most helpful for most users. And the latest news stories are weighted higher than older news.

Recently we’ve started weighting results by popularity as well, using analytics data on how often the content is viewed. This helps the more common user needs to appear near the top of the results, above the wider pool of related content that matches the same words. It’s not perfect though, so sometimes very popular content outweighs more specific results.
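One way to picture popularity weighting is as a multiplier on the relevance score. This Python sketch is an assumption about the general shape only: the logarithm, the `weight` of 0.1 and the example numbers are all invented, not taken from the GOV.UK configuration.

```python
import math


def popularity_weighted(relevance: float, page_views: int, weight: float = 0.1) -> float:
    """Scale a relevance score up gently as page views grow."""
    # log1p dampens the effect so a page with 10x the views is not 10x the score.
    return relevance * (1.0 + weight * math.log1p(page_views))


specific_page = popularity_weighted(relevance=1.0, page_views=10)
popular_page = popularity_weighted(relevance=0.7, page_views=1_000_000)
print(popular_page > specific_page)
```

With these made-up numbers the very popular page ends up above the more specific one, which is the side effect described above.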

There's more we'd like to experiment with, so the accuracy of results should keep getting better.

Example of 'fee' results with weighted popularity

Manual intervention

We regularly review our search data and feedback to see how people are searching, what's not working as well as it should, and what we can do to make it better.

We also look at search data from some of the websites transitioning to GOV.UK, so that we can start getting to know their content and anticipate potential search problems.

We look for patterns and recurring problems to help us plan improvements to our algorithms, and sometimes we make specific manual changes to deal with individual search terms.

The main method we use for this is ‘synonyms’. We can tell the search engine that one word or phrase is equivalent to another, to cater for things like:

  • different spellings (eg adviser, advisor)
  • same or similar meanings (eg expired, out of date)
  • informal terms (eg bedroom tax)
  • previous names (eg Inland Revenue)
  • acronyms (eg NI for National Insurance)
  • abbreviations (eg reg for registration)
  • form and leaflet numbers (eg EX50 for court fees)
  • common mistakes (eg DLVA for DVLA)
Example of synonyms to manually influence search results

These synonyms apply to all matching content in the search index, so we use them with care to prevent unwanted side-effects. At the moment you can’t add hidden keywords to individual items, but that’s on our list for potential development.
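Here is a minimal Python sketch of synonym expansion at query time, using a few of the examples from the list above. The `SYNONYMS` table and `expand_query` function are hypothetical; in practice synonyms are handled inside the search engine's own configuration rather than in application code.

```python
# Each key is rewritten to its value before the query is run.
SYNONYMS = {
    "advisor": "adviser",
    "dlva": "dvla",
    "ni": "national insurance",
    "reg": "registration",
}


def expand_query(query: str) -> str:
    """Replace each query word with its synonym, if one is defined."""
    return " ".join(SYNONYMS.get(word, word) for word in query.lower().split())


print(expand_query("DLVA reg"))
print(expand_query("NI number"))
```

Note that the table applies to every query, so a short entry such as "ni" would be rewritten wherever it appears. That is the kind of side-effect that calls for care.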

We’ve also just released a long-awaited ‘best bets’ tool. This lets us specify results that will always be at the top for a certain search query. So if the analytics data shows that most users are clicking on a particular result further down the list, we can make their search easier by moving it up.
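Conceptually, a best bet pins chosen results above whatever the algorithm returns. A minimal Python sketch, assuming results are lists of paths and that best bets are stored in a plain dictionary keyed by query (both assumptions made for this example, as are the paths):

```python
def apply_best_bets(query: str, results: list, best_bets: dict) -> list:
    """Move any best-bet results for this query to the top of the list."""
    pinned = best_bets.get(query.strip().lower(), [])
    rest = [result for result in results if result not in pinned]
    return pinned + rest


best_bets = {"jobs": ["/jobsearch"]}  # hypothetical best bet
results = ["/a", "/jobsearch", "/b"]  # hypothetical algorithmic order
print(apply_best_bets("Jobs", results, best_bets))
```

Queries without a best bet come back in their original order.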

Finally, we maintain a set of external links to help people searching for common, government-related user needs that aren’t covered on GOV.UK.

The search results page

Last month, we combined the separate tabs into a single list of search results, so that users don’t need to understand which section to look at.

Only the first 50 results are displayed, but we’re planning to add pagination of results in future. Our data shows that two thirds of users click on one of the top 3 results, and 9 out of 10 users choose one of the first 10 results, so not many people need to go beyond the first page. In the meantime, you can find more results by using the filter or adapting your search query.

You can still tell the content types apart by their metadata: at the moment the mainstream results show their section, while departments and policy results show their date, format and organisation.

One of the concerns from transitioning organisations and their users is that GOV.UK search mixes everything together from the whole of the government, whereas on separate websites you’re searching just their content. To complicate things further, acronyms and abbreviations can have different meanings in different organisations and industries, and some programmes and publications share the same name - for instance, 'green book' could mean immunisation information or HM Treasury guidance.

To narrow down the results, you can filter by organisation (in site search or in the specific Policies, Publications and Announcements searches). We'll be looking at what other filters may be useful, particularly for specialist content.

Filtering search results by organisation

Updating the index

Whenever a piece of content is published, the publishing tool sends a signal to the search engine, so it’s added or updated in the index straight away.

However, search results are cached for up to 30 minutes, so if you don’t see the new content immediately, try again after a while.
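The caching behaviour can be pictured with a small time-based cache. This Python sketch is an assumption about the general technique, not the actual GOV.UK caching layer; the `ResultCache` class and its parameters are invented for illustration.

```python
import time


class ResultCache:
    """Keep search results for a fixed time before asking the index again."""

    def __init__(self, ttl_seconds: float = 30 * 60, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}  # query -> (time stored, results)

    def get(self, query: str, search):
        now = self.clock()
        cached = self.entries.get(query)
        if cached is not None and now - cached[0] < self.ttl:
            return cached[1]  # still fresh: serve the stored results
        results = search(query)  # expired or missing: query the index
        self.entries[query] = (now, results)
        return results
```

If a page is published one minute after someone searches for 'tax', the next person searching for 'tax' gets the stored results until the 30 minutes are up, even though the index itself is already up to date.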

Missing content

Right now some types of content aren’t being indexed for various reasons, including browse and specialist sector pages, campaigns, business support pages, finders and specialist documents, and uploaded files (such as PDFs). We’re aiming to add most of these in future.

Some other content types are already in the search index, but don’t always appear because they don’t have much content or metadata. These include policy supporting pages and groups. They often appear in search results as just a title without any information or description. We’re looking into what’s needed to make these easier to find.

The Government Service Design Manual and GOV.UK blogs have their own search functions, though we might include them in the overall GOV.UK search in future.

Search improvements

Sometimes you can solve or avoid a problem simply by editing the content (for example by including other words), but wider issues will need algorithm or configuration changes.

We know that site search is still a big challenge as more and more content moves on to GOV.UK, serving different audiences. We want to keep it simple enough for inexperienced users, while providing enough functionality for advanced users to narrow down exactly what they need.

Following our initial search improvements project last year, we now have a permanent team and we’ve begun some significant changes with the release of ‘unified search’, with more to come. We've got lots of ideas, so we'll keep on discussing and prioritising them to decide which features to work on next.

If you have any feedback about search, do let us know.

18 comments

  1. Comment by John Ploughman posted on

    Hi Tara

    Great to see this detailed information about how search is working at the moment.

    In DVSA we're seeing searches from our homepage (www.gov.uk/dvsa) for our blogs, which are hosted on blog.gov.uk. At the moment these don't seem to be indexed. We've 'hacked' this by creating a publication page for 'Promotional material' that is set as a publication held on another website to point to the blogs.

    This isn't ideal though - it's creating an extra step for users to go through in order to get to the content they're searching for.

    Are there any plans to start indexing GOV.UK blogs, or is this something that would need to be handled through 'best bets'?

    Thanks!
    John

  2. Comment by Graham Francis posted on

    Hi John

    Interesting point. It'd be good to see specific details of the terms being searched for.

    But in general, our next step on the blog platform is to allow blog posts to be indexed and searchable within the blog platform itself. We should hopefully get there fairly soon.

    We might link this up to the main GOV.UK platform, but we're not yet totally convinced of the need and benefits (for example, would users searching for 'budget 2014' be expecting to see guidance from this blog on the mechanics of budget publishing to GOV.UK?)

    • Replies to Graham Francis>

      Comment by John Ploughman posted on

      Hi Graham

      The one we're seeing most searches for is 'Matters of Testing' and variants, eg in uppercase, lowercase, all one word etc. The blog is at https://mattersoftesting.blog.gov.uk/

      In the last 30 days there have been 113 of these searches of the 1,652 unique searches from our organisation homepage.

      We're already using the new signposting design to get users to our most searched for information and services, and using the featured slots to promote 'Departments and policy' content that users are searching for most. So we've lowered search volumes overall.

      But I know we can do better to help these users searching for the blog so they don't have to go via a publication splash page to get what they want.

      I think just getting the blog names in as search best bets would be more helpful, rather than surfacing actual blog posts in the main GOV.UK.

      • Replies to John Ploughman>

        Comment by Graham Francis posted on

        Awesome. In which case, "Matters of Testing" might be a candidate for a 'best bet' search - but I'll let Tara comment on this when she's around.

      • Replies to John Ploughman>

        Comment by Tara Stockford posted on

        Hi John

        Thanks for this. At the moment there's a link to the overall blogs page, but adding links for the individual blogs is on our to-do list. That will give people a direct route if they search for a particular blog by name. (These are 'external links' rather than best bets, which are for content that's already in the search index.)

  3. Comment by Peter posted on

    Nice blog. One question on this piece:

    ------
    We also use some customised settings specific to GOV.UK’s content and structure. Results from the ‘mainstream’ services and information have a higher weighting than ‘departments and policy’ content, to make sure that the general public can easily find what they need without being confused by professional or specialist information.
    ------

    I understand the intent but could this be turned into a switchable option which a user can control as per their needs - e.g. acting in a professional capacity or acting as someone who wants to transact with a service?

    For example compare this:

    https://duckduckgo.com/?q=jobs+data+site%3Awww.gov.uk

    to

    https://www.gov.uk/search?q=jobs+data

    Because of the prioritisation on the Gov.uk search engine I get the same top 3 as your original example when actually I'm looking for some data rather than to transact with a 'mainstream' service.

    • Replies to Peter>

      Comment by Tara Stockford posted on

      Hi Peter

      Thanks for your feedback. We'll be looking at ways of sorting and filtering results in future, to see how we can best meet the needs of different types of users.

  4. Comment by Paulette Hill posted on

    Good, clear, detailed tips on searching and nice easy to read typeface. There is a lot of blank space on the right hand side of the screen. Need to insert links, graphics etc.

  5. Comment by Tim Blackwell posted on

    Hi. Thanks for this article: it answers quite a few of the questions I've had about how search works.

    Indexing PDFs could be very useful, but maximizing the benefit of this is not necessarily straightforward, as PDFs can be very long compared to most of the content on .GOV.UK. Furthermore, PDFs can be parts of collections where the linkages are non-obvious if you go direct to the PDF. Context, currently rather weak on .GOV.UK, is critical (I am not suggesting that you do the worst UX thing ever: the PDF in a frame, still seen on some e-procurement portals).

    • Replies to Tim Blackwell>

      Comment by Tara Stockford posted on

      Hi Tim - thanks.

      For files like PDFs we're likely to show the page they're attached to, rather than taking you straight to a file out of context. We'll have to see how well it works in practice.

  6. Comment by Dave Williamson posted on

    Hi Tara

    Which search engine are you using? And is the code for it available anywhere?

    Would be interested in seeing how we might use your development work

  7. Comment by Andrew Robertson posted on

    Thanks for a really useful post.

    The 'stemmed' words you mentioned don't always work: there are very different results for "environmental permit" and "environmental permitting". It would be great if you could tweak this to make the 'permitting' results the same as the 'permit' ones.

    Colleagues have noted that some results that seem unrelated appear high, notably for travel advice. Although what they want is usually in the results, I sense that people don't trust the results when the titles appear (to non-search engine experts) as irrelevant.

    Examples: Search for…. Get….

    Odour guidance…. Bosnia and Herzegovina travel advice

    River pollution ….. Thailand travel advice first result (‘river’ and ‘pollution’ not included on the page). Malaysia travel advice high for 'report river pollution'.

    Permit charges... Thailand travel advice and national insurance higher than publication 'Environmental permitting (EP) charges scheme: April 2014 to March 2015'.

    Environmental permit charges... not in the first few results unless you go via a mainstream page (The publication is called 'Environmental permitting (EP) charges scheme: April 2014 to March 2015' which seems to include all the key words searched for, unlike Thailand travel advice that is much higher above.)

    Exemptions ….. renew a tax disc top result (this doesn’t have the word ‘exempt’ or ‘exemption’ on the page. Admittedly ‘exemption’ is too vague and needs ‘waste’ or ‘agriculture’ adding to the phrase or filtering by organisation to be meaningful, but waste exemptions used to be findable in that way on our old website and I think customers are expecting similar results from our home page).

    landfill engineering guidance.... tier 4 student visa

    I realise search is tricky, and hope these are useful examples. I'm working with colleagues to get specific details that they and the customers they talk to struggle with. It seems that people are getting more distracted with the results that may appear less relevant, even though what they want is usually high in results.

    And just to clarify, if we want to suggest a 'best bet' for search, is Zendesk the best route and what info will help you decide to accept or reject the request? Do you want requests grouped on one ticket or separate if very different searches? Can we suggest synonyms too?

    Thanks!

    • Replies to Andrew Robertson>

      Comment by Tara Stockford posted on

      Hi Andrew

      Thanks, that’s very helpful. We’ve seen other examples of irrelevant results - usually they do contain the search terms but they’re just mentioned in passing, rather than what the page is about. In travel advice they’re often on the subpages so it’s less obvious. We’re also aware of a couple of pages appearing even when they don’t seem to match the words at all, so we’re looking into that too.

      Interestingly I’d made ‘permitting’ an exception back in April, so that the ‘environmental permitting’ content wasn’t buried by the ‘environmental permit’ content. If that’s not helpful I can remove it so they’re interchangeable, and we’ll see if the results are better with the current algorithms.

      All feedback is welcome, and yes, Zendesk is best for requests or suggestions. We review them based on a mixture of your expert knowledge and our data on what users are searching for and which results they’re clicking on most, so feel free to include any useful evidence. Synonyms are a bit more complicated but even if we can’t add them right now, we can keep them on file for future improvements. So do keep them coming!

  8. Comment by Ricky Leach - Global Contact Centres, FCO posted on

    Hi Tara

    Thanks for the explanation. Perhaps you can give me a steer on a particular problem we have in consular....

    On gov.uk, if you search ‘embassy madrid’ you get a reasonable result. The same goes for all named sovereign posts with embassy names (or ‘high commission + sovereign city’).

    The same cannot be said for the organisation plus the country name ‘embassy madrid’ or ‘commission uganda’ or ‘high commission uganda’ which gives the result much further down the list. In the case of ‘embassy uganda’ you get no correct result at all, which means we are relying on our customers realising that we offer high commissions in countries with a commonwealth connection and embassies in the rest of the world – not a good basis for a search strategy.

    We already have identified issues matching the search for a consulate +city on gov.uk.

    Is there a way of getting the search to realise that embassy & high commission & consulate are all interchangeable and that when searched with a country or sovereign city name, that the result shows up the consular organisation for that country?

  9. Comment by Anita posted on

    You need to explain 'algorithms' a bit better perhaps.

  10. Comment by Andrew Robertson posted on

    Hi Tara, I'm asking this on the blog rather than via Zendesk in case the answer helps others understand search.

    Are the sub-pages of mainstream guides hidden from search? For example, if I search for "waste duty of care" I get page 'Business and commercial waste' (https://www.gov.uk/managing-your-waste-an-overview) near the top. That makes sense. But what might be even better is if I could go straight to section 2 'duty of care' in the mainstream content at https://www.gov.uk/managing-your-waste-an-overview/duty-of-care. Perhaps the heading in search could be something like 'Business and commercial waste - duty of care'?

    Thanks for your help on this and other search tweaks you've been helping us with.

  11. Comment by Keith Prust posted on

    Has the page about "Thailand travel advice" been artificially optimised to appear at the top of search results? It regularly comes top or near the top for seemingly unrelated searches. The latest example is a search for "employers' guide to access to work". Despite this exact phrase appearing in the summary of the page I need, that page comes second to Thailand travel advice. As navigation on GOV.UK relies on search, unhelpful results such as this won't provide much confidence for users.