https://insidegovuk.blog.gov.uk/2013/12/06/making-transparency-data-more-transparent-with-a-csv-preview/

Making transparency data more transparent with a CSV preview

Last week we introduced a new feature that allows users to preview any CSV file on GOV.UK.

This means users can now quickly look up a specific figure or check if the file contains what they want without downloading and opening it in desktop software. It works automatically for any government data published to GOV.UK in CSV format – like this data about spending over £25,000 from DCMS or this report on British behaviour abroad.

We currently have 7,422 CSV files and 17,062 Excel files on GOV.UK. That ratio is probably the wrong way around – CSV is a great format for open data because it doesn’t require users to buy any proprietary software and, more importantly, it is machine-readable.

The Open Knowledge Foundation calls CSV “probably the most widely supported data format in the world” because it’s supported by spreadsheets like Excel, OpenOffice and Google Docs, complex databases and almost all programming languages.

We hope that providing a user-friendly preview will encourage departments to publish more of their data in CSV format.

Releasing this new feature also highlighted instances of poorly formatted CSV files that aren’t really CSVs, because they don’t contain tabular data. So here’s a quick guide for editors on how to create a well-formatted CSV:

1. Make sure it’s tabular.

CSV is a very simple format that represents tabular data. This will consist of an (optional) header row, followed by data rows. For example:

 Name, Role, Address
 David Cameron, Prime Minister, 10 Downing Street
 George Osborne, Chancellor of the Exchequer, 11 Downing Street

If you are trying to export an Excel spreadsheet with complex formatting as CSV, make sure you convert it to a simple table (header row, followed by data rows) first.

2. Make sure there aren’t any unnecessary blank lines or empty rows on the file.

3. Try and export your CSV with UTF-8 encoding.

All computer text files have a character encoding which defines how the bytes in the file maps to the characters that you see ultimately see on screen. For example, the ASCII character encoding set uses the numbers 0-127 to represent English characters. The most versatile and widely compatible character encoding is UTF-8, and where possible you should generate your CSV files with UTF-8 encoding.  This will ensure that it renders properly in the preview. If you are using Microsoft Excel, you can choose the encoding of your CSV at the point at which you export it (this article explains how in more detail).

You can read more about the rules for generating good CSV files here.

Analytics show that users are already making the most of the new CSV preview feature, so we’ll follow up soon with a detailed blog post about that. As usual we’d love your feedback.

Stay in touch. Sign up now for email updates from this blog.

5 other GDS blogposts we think you might find interesting

GOV.UK one year on
Improving browse and navigation
Government as a data model: what I learned in Estonia (GDS blog)
How many people are missing out on JavaScript enhancement? (GDS blog)
Do we need British Sign Language on GOV.UK? (accessibility blog)

7 comments

  1. Mark Scott

    A couple of useful clarifications required to your CSV hints:

    How do you handle data elements that contain commas?

    If you ask for them to be in quotes, how do you handle data elements that contain quotes?

    Reply
    • Tekin Suleyman

      Hi Mark, thanks for the question. If you are using a spreadsheet program such as Excel to export your CSV, you can generally rely on it to escape the content for you. If you are manually generating your CSV, then you will want to wrap fields that contain commas with double-quotes, e.g.

      David Cameron, Prime Minister, "10 Downing St, London".

      If your quoted data also contains a double-quote, then you can represent it with a pair of double-qoutes, e.g.

      David Cameron, Prime Minister, "10 Downing St, London, aka ""Number 10""".

      I've updated the article to link to the Wikipedia entry on the basic rules for escaping CSV – http://en.wikipedia.org/wiki/Comma-separated_values#Basic_rules_and_examples.

      Reply
  2. James Stewart

    This looks great.

    It's probably worth noting that as well as being the sensible format to use, UTF-8 was one of the first open standards to be adopted as part of the government open standards process so everyone should be using it. http://standards.data.gov.uk/challenge/cross-platform-character-encoding

    Reply
  3. David Read

    And of course all the data has been previewable on data.gov.uk for a couple of years now e.g.

    http://data.gov.uk/dataset/spend-over-25k-dcms/resource/f639b408-d038-4c77-bbe2-fdd1c177369e

    I guess that's the joy of us all using open source – you can just reuse the same code that we did.

    Readers might be interested to hear that the thousands of spend data CSVs are automatically aggregated from across all public bodies by the OpenSpending project:
    http://data.gov.uk/data/openspending-report/index
    This is the sort of task which requires good clean CSV files – mashing up thousands of CSVs at once. Consistent column headings (this sheet has a typo…), dates in a consistent format, avoiding putting totals at the bottom, etc. Government put out plenty of CSVs in Windows-1252 encoding etc and one useful thing that gov.uk might do is to add an automatic encoding checker, now that UTF-8 is set to become a government open standard.

    Reply
  4. Max Froumentin

    Work is starting within W3C and the ODI to link CSV and semantic web formats (see http://www.w3.org/2013/05/lcsv-charter.html and http://jenit.github.io/linked-csv/), primarily by offering a way to define machine-readable CSV labels, which enables the data to be linked to and mapped to RDF, making it 5-star linked data. Something to follow closely as the work progresses.

    Reply
  5. Bruce MacKay

    Can you clarify please, does the preview tool have a column width limit?
    For example table 1 on this page
    https://www.gov.uk/government/publications/statistics-on-international-development-2013-tables

    When the csv file is downloaded figures appear on one line however the same figure in the preview version is on two lines. Can you suggest a solution.

    Reply
    • Alice Newton

      Hi Bruce,

      The reason this isn't displaying properly is that your data is not tabular. To be a real CSV file, it can only have one heading row (which must be the first row) and then data rows. The post above gives some advice about how to do this.

      Thanks,
      Alice

      Reply

Leave a comment