
https://insidegovuk.blog.gov.uk/2013/11/21/link-validation-and-cleanup/

Link validation and cleanup

The majority of problems reported by users on the departments and policy section of GOV.UK are about bad links. In the last couple of days, we've added validation checks to make sure all links created using markdown in the departments and policy publishing tool are correctly formed. If bad links are detected in a document, you will be prevented from publishing it until they are fixed.

What's a good link?

There are three different kinds of good links in our publisher (this guidance is also found on the sidebar of the edit page):

  • All documents created in the publisher - policies, publications, news, speeches, detailed guides etc - should be linked to using absolute paths from within the publisher: [link text](/government/admin/policies/3373)
  • All content created under an organisation tab - collection pages, topics, organisations, people, roles etc - should be linked to using the full, public URLs: [link text](https://www.gov.uk/government/topics/climate-change)
  • For external websites, use the full URL including http://: [link text](http://www.example.com). The link will display with an external link symbol.

What the validation checks do

When you save a draft edition of a document, the publishing software will now check every inline link written in markdown and identify whether it has been correctly formed.

If the link is a path, we'll perform checks to confirm that it is:

  • A path to admin on GOV.UK (for example, /government/admin/policies/3373)
  • Absolute instead of relative (i.e. starts with a /)

If the link is a URL, we'll perform checks to confirm that it:

  • Starts with http:// or https:// or mailto:
  • Does not contain 'whitehall-admin'
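
To make the checks above concrete, here is a minimal sketch in Python. It is not the publisher's actual code. The `/government/admin/` prefix is assumed from the example path in this post, and the function names and the regular expression are ours, purely for illustration. The sketch does not try to handle cases this post doesn't describe, such as in-page anchor links.

```python
import re

# Matches markdown inline links of the form [link text](target).
INLINE_LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")

# Assumption: admin paths share this prefix, as in the post's example.
ADMIN_PREFIX = "/government/admin/"


def link_problem(target):
    """Return a description of what is wrong with a link target, or None if it passes."""
    if "whitehall-admin" in target:
        return "contains 'whitehall-admin'"
    if target.startswith(("http://", "https://", "mailto:")):
        return None
    if target.startswith(ADMIN_PREFIX):
        return None
    if target.startswith("/"):
        return "absolute path, but not to admin"
    return "not an absolute admin path or a full URL"


def bad_links(markdown):
    """Return (target, problem) pairs for every inline link that fails a check."""
    results = []
    for target in INLINE_LINK.findall(markdown):
        problem = link_problem(target)
        if problem:
            results.append((target, problem))
    return results
```

Under these assumptions, `bad_links("[a](/government/admin/policies/3373) and [b](www.example.com)")` would report only the second link, because it is neither an absolute admin path nor a full URL.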

If bad links are found

If bad links are detected, it will not be possible to publish or force-publish the document until you have fixed them.

When you look at the view page of any document, there will be a warning if the document contains bad links. When you edit one of these documents, on the right-hand side you will see a list of all the links detected as invalid and a suggestion of how they could be fixed.

Automatically fixing links

We will shortly be running an automatic fix for all content currently on the site, to correct the following common issues:

  • Links which refer to 'whitehall-admin', either as full admin URLs or any preview URL
  • Admin paths which are relative instead of absolute
  • URLs which begin 'www' rather than 'http://'
  • Links which contain '@' rather than 'mailto:'
  • URLs which start 'http;' rather than 'http:'

This should remove the need for a lot of the manual clean-up by publishers.
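
As a rough illustration of the fixes listed above, here is a minimal sketch in Python. It is not the script we will run. The post doesn't say exactly what each bad link is rewritten to, so the rewrites below are assumptions: full admin URLs become absolute admin paths, and preview URLs become public `https://www.gov.uk` URLs. The host pattern and the function name are also ours.

```python
import re

# Assumption: the admin app is served from a host starting with 'whitehall-admin.'.
ADMIN_HOST = re.compile(r"^https?://whitehall-admin\.[^/]+")


def fix_link(target):
    """Apply the common fixes to a single link target and return the result."""
    # 'http;' typed instead of 'http:' (this also catches 'https;').
    target = re.sub(r"^(https?);", r"\1:", target)

    if ADMIN_HOST.match(target):
        path = ADMIN_HOST.sub("", target)
        if path.startswith("/government/admin/"):
            return path  # full admin URL -> absolute admin path
        return "https://www.gov.uk" + path  # preview URL -> public URL (assumption)

    if target.startswith("government/admin/"):
        return "/" + target  # relative admin path -> absolute admin path

    if target.startswith("www."):
        return "http://" + target  # missing scheme

    if "@" in target and not target.startswith(("mailto:", "http://", "https://")):
        return "mailto:" + target  # bare email address

    return target  # nothing to fix
```

For example, under these assumptions `fix_link("www.example.com")` gives `http://www.example.com`, and `fix_link("government/admin/policies/3373")` gives `/government/admin/policies/3373`.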

Future iteration

Creating inline links in markdown is admittedly confusing. We have plans to make that easier in future, by building a tool in publisher that helps you to write good links. It might work like the current contacts tool, which suggests and autocompletes contacts for you as you begin typing.

And we are working right now on a more substantial link checker, which will scan the site and find all links returning 4xx and 5xx errors.
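
The core idea of such a checker can be sketched in a few lines of Python. This is only an illustration, not the tool we are building: the function names are ours, it uses HEAD requests (which some servers reject), and it does not handle links that fail to connect at all.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code for a URL (urllib follows redirects by default)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # urllib raises 4xx and 5xx responses as HTTPError


def broken_links(urls, get_status=fetch_status):
    """Return (url, status) pairs for links that respond with a 4xx or 5xx status."""
    broken = []
    for url in urls:
        status = get_status(url)
        if 400 <= status <= 599:
            broken.append((url, status))
    return broken
```

Passing the status lookup in as `get_status` keeps the 4xx/5xx logic separate from the network call, so it can be tried out with a stand-in function before pointing it at real URLs.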

Sharing and comments

10 comments

  1. Comment by E.A. Brown posted on

    Hi Alice and Neil,
    Your message is welcome, as this feature appeared to us 2 days ago. We logged a bug because we now can't publish updates to our collections and our authors are getting anxious, and we've had no answer from Zendesk.

    It would be a great help if changes to the Publisher interface were announced before they happened, not afterward.

    Your instructions say: ' All documents created in the publisher – policies, publications, news, speeches, detailed guides etc – should be linked to using absolute paths from within the publisher: [link text](/government/admin/policies/3373)'

    We have many documents that we create in Publisher, that do not include 'admin' in the path, including publications and press releases/gov't responses. Example:

    https://www.gov.uk/government/news/vaccines-and-gelatine-phe-response

    Yet when we add this to a collection in the introductory content:

    [Vaccines and gelatine: PHE response](/government/news/vaccines-and-gelatine-phe-response)

    We get the warning 'You will not be able to publish this document until these are fixed'.

    As far as any of us trained on GOV.UK can tell, we have followed your instructions to the letter: the paths we're using are to items 'created within Publisher' and yet they're flagged up.

    Our content does not appear to have a path pattern like your example
    /government/admin/policies/3373 - most of ours look like

    /government/organisations/public-health-england/series/rotavirus-vaccination-progarmme-for-infants
    or
    /government/publications/the-complete-routine-immunisation-schedule-201314

    Your how-to instructions on github refer to the 'Publisher URL': how is this different from any other URL? and how do we get it?

    If I preview one of our collections I get

    https://whitehall-admin.production.alphagov.co.uk/government/collections/annual-flu-programme

    The live version is
    https://www.gov.uk/government/collections/annual-flu-programme

    I cannot see a difference after the /government part of the path.

    • Replies to E.A. Brown

      Comment by Alice Newton posted on

      Hi Elizabeth,

      We discussed these changes at show and tell, but I appreciate that it's not possible for everyone to attend this, so we'll try to blog about them in advance.

      Your links are not working because you are using the front-end path rather than the admin path. To get the admin path, you should search for these documents from within the publisher rather than on the front-end. So for example, having searched in publisher for your news story: https://www.gov.uk/government/news/vaccines-and-gelatine-phe-response
      I see that the admin path is: /government/admin/news/246359.

      It is more robust to link to documents in this way. However, to fix broken links like the ones you outline above quickly, it would be acceptable to replace the front-end paths with full front-end URLs, which will then pass the validation. That is, change:
      /government/news/vaccines-and-gelatine-phe-response
      to: https://www.gov.uk/government/news/vaccines-and-gelatine-phe-response.

      I hope that helps - we can follow up via email if you'd like.

      Best wishes,
      Alice

  2. Comment by Tom Ripley posted on

    This automatic checking will be a useful feature.

    Your post includes the same guidance about links that's available on the sidebar of the publisher, but the lists of documents that should be linked by the different methods both end unhelpfully with 'etc'; can these lists be expanded to actually show which document types should be linked by each method, please?

    I'm pleased to see that mailto: paths are supported. Given the increasing number of users accessing the site from mobile devices, is there any plan to support the equivalent tel: path for phone numbers so that users can make a call with a single tap? Attempts to apply this useful feature currently result in an error message.

  3. Comment by Neil Williams posted on

    Hi Tom

    The list of document types already exists; it's the list you get under the "New document" tab in the publisher.

    Good suggestion about tel: thanks, I will discuss with the technical team here.

    Neil

  4. Comment by Ale del Cueto posted on

    Hi Alice, Neil

    Thanks for the update.

    On the whole, this seems like a useful tool, but we have a couple of questions:

    1. Why can’t we use the full live GOV.UK URL for all internal links? The blog post states we are going to have to find the admin URL in publisher for most internal links, even though using the full live URL works just as well. If the new system can validate normal http links, why do we need to use Whitehall links for internal content?

    The reason we’re bringing this up is that finding the publisher URL of every internal link will be potentially very time consuming, especially because editors who wish to add internal links normally have the live pages they are linking to open already. Also, we are concerned that this may end up discouraging busy editors from adding the more optional internal links that would make their copy a bit more useful.

    2. Won’t adding the full live GOV.UK URL actually help us avoid broken links? When, say, a publication is superseded by a more recent version, we normally submit a ticket for the old version to be unpublished and redirected to the new version. That means that any item that may have had an internal link to that publication just redirects to the new version. Our question is: will these redirects also work if we use the admin URL instead of the full live one in a piece of content?

    Thanks
    Ale

  5. Comment by Michael Williams posted on

    Your explanation of absolute and relative links seems to reverse the generally accepted definition – which is that absolute links use full URLs (usually beginning with http[s]:// ) and relative links begin with /. Relative links are so called because the specified path is relative to a common, base URL.
    See http://en.wikipedia.org/wiki/Uniform_resource_identifier#Examples_of_absolute_URIs

    • Replies to Michael Williams

      Comment by Alice Newton posted on

      Hi Michael,

      We're talking about a few different concepts here:
      - URLs, which are absolute.
      - Absolute paths, which start with a /, eg /government/admin/policies/3373
      - Relative paths, which don't start with a /, eg government/admin/policies/3373

      After some internal discussion it seems that there are debates around terminology in this area. We're going to update our guidance to be really clear and use more examples.

      Thanks,
      Alice

  6. Comment by Pete Tilley posted on

    When I use the link checker it flags up all the internal links as follows
    Broken links

    We’ve found links in this document that may be broken:
    • #considerations
    • #gstpu
    • #motorfuel

    Will these stop the publish also?

    • Replies to Pete Tilley

      Comment by Lisa Scott posted on

      Hello Pete - thanks for bringing this up.

      The Broken Link Checker warns of potential broken links only. It won't prevent you from publishing. We're aware that it flags up anchor links, and this is on our list to fix.