Last week we introduced a new feature that allows users to preview any CSV file on GOV.UK.
This means users can now quickly look up a specific figure or check if the file contains what they want without downloading and opening it in desktop software. It works automatically for any government data published to GOV.UK in CSV format – like this data about spending over £25,000 from DCMS or this report on British behaviour abroad.
We currently have 7,422 CSV files and 17,062 Excel files on GOV.UK. That ratio is probably the wrong way around – CSV is a great format for open data because it doesn’t require users to buy any proprietary software and, more importantly, it is machine-readable.
The Open Knowledge Foundation calls CSV “probably the most widely supported data format in the world” because it’s supported by spreadsheets like Excel, OpenOffice and Google Docs, complex databases and almost all programming languages.
We hope that providing a user-friendly preview will encourage departments to publish more of their data in CSV format.
Releasing this new feature also highlighted instances of poorly formatted CSV files that aren’t really CSVs, because they don’t contain tabular data. So here’s a quick guide for editors on how to create a well-formatted CSV:
1. Make sure it’s tabular.
CSV is a very simple format that represents tabular data. A CSV file consists of an (optional) header row, followed by data rows. For example:
Name, Role, Address
David Cameron, Prime Minister, 10 Downing Street
George Osborne, Chancellor of the Exchequer, 11 Downing Street
If you are trying to export an Excel spreadsheet with complex formatting as CSV, make sure you convert it to a simple table (header row, followed by data rows) first.
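If you want to check a file before publishing it, a short script can confirm that every row has the same number of fields as the header. This is only a minimal sketch using Python's standard csv module; the function name find_ragged_rows and the sample data are made up for illustration, and it is not how the GOV.UK preview itself validates files.

```python
import csv
import io


def find_ragged_rows(csv_text):
    """Return (line_number, field_count) for each row whose width differs from the header.

    Hypothetical helper for illustration. A blank line counts as a row
    with zero fields, so it is reported too.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        return []  # an empty file has no rows to compare
    expected = len(header)
    return [(reader.line_num, len(row)) for row in reader if len(row) != expected]


sample = (
    "Name,Role,Address\n"
    "David Cameron,Prime Minister,10 Downing Street\n"
    "George Osborne,Chancellor of the Exchequer\n"  # address is missing
)
print(find_ragged_rows(sample))  # [(3, 2)]
```

An empty result suggests the file is a simple table. Any entries point to the lines worth fixing in the spreadsheet before exporting again.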
2. Make sure there aren’t any unnecessary blank lines or empty rows in the file.
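Blank lines and rows made up only of empty cells can also be stripped out automatically. The following is a rough sketch using Python's standard csv module; drop_empty_rows is a hypothetical helper name, and treating whitespace-only cells as empty is an assumption you may want to change.

```python
import csv
import io


def drop_empty_rows(csv_text):
    """Return the CSV text without blank lines or rows whose cells are all empty.

    Hypothetical helper for illustration. Cells containing only
    whitespace are treated as empty (an assumption).
    """
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        # Keep the row only if at least one cell has visible content.
        if any(field.strip() for field in row):
            writer.writerow(row)
    return out.getvalue()


messy = (
    "Name,Role\n"
    "\n"
    "David Cameron,Prime Minister\n"
    ",\n"
    "George Osborne,Chancellor of the Exchequer\n"
)
print(drop_empty_rows(messy))
```

The example prints the header followed by the two data rows, with the blank line and the row of empty cells removed.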
3. Try to export your CSV with UTF-8 encoding.
All computer text files have a character encoding, which defines how the bytes in the file map to the characters that you ultimately see on screen. For example, the ASCII character encoding uses the numbers 0-127 to represent English characters. The most versatile and widely compatible character encoding is UTF-8, and where possible you should generate your CSV files with UTF-8 encoding. This will ensure that the file renders properly in the preview. If you are using Microsoft Excel, you can choose the encoding of your CSV at the point at which you export it (this article explains how in more detail).
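If you already have a CSV in another encoding, it can be converted to UTF-8 after the fact. Here is a minimal sketch in Python. It assumes the file was saved as Windows-1252 (cp1252), which is only a guess at a common legacy encoding, so check what your own software actually produced. The function name reencode_to_utf8 and the sample data are invented for illustration.

```python
def reencode_to_utf8(raw_bytes, source_encoding="cp1252"):
    """Decode bytes from a known source encoding and re-encode them as UTF-8.

    Hypothetical helper for illustration. The default source encoding
    is an assumption, not something detected from the file.
    """
    text = raw_bytes.decode(source_encoding)
    return text.encode("utf-8")


# Simulate a legacy export: the pound sign is the single byte 0xA3 in cp1252.
legacy = "Item,Cost\nCafé visit,£25000\n".encode("cp1252")
utf8 = reencode_to_utf8(legacy)

print(b"\xa3" in legacy)      # True: one byte for the pound sign in cp1252
print(b"\xc2\xa3" in utf8)    # True: two bytes for the pound sign in UTF-8
print(utf8.decode("utf-8"))   # the text reads correctly once decoded as UTF-8
```

Plain English letters and digits use the same bytes in both encodings, which is why encoding problems usually only show up on characters such as £ or é.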
You can read more about the rules for generating good CSV files here.
Analytics show that users are already making the most of the new CSV preview feature, so we’ll follow up soon with a detailed blog post about that. As usual we’d love your feedback.