At the ODI, and especially in the tech team, we love GitHub. We know it’s great for collaborating on code (millions of people and thousands of teams use it every day), but can it be used for data too?
When we collaborate on code, we use the GitHub flow model: we work on some code, write tests, then open a pull request to merge the code into the main codebase. We use Travis to run our tests and, if the tests pass, the code gets merged.
This works really well for us, but translating it to data is a different matter. Generally, when we collaborate on data, we don’t have an automated way of checking that the data is in the correct format before any pull requests are merged.
This got me thinking. After our work on CSVlint, we now have a way of not only checking the formatting of data, but also checking whether it matches a schema. Could we reuse the code we’ve already written, get Travis to test the data, and follow a flow similar to the one we use for code?
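To give a flavour of what "checking data against a schema" means in practice, here is a minimal Python sketch. It is not CSVlint's code or API; the `SCHEMA` columns (`region`, `party`, `votes`) and the `validate_csv` helper are assumptions made up purely for illustration.

```python
import csv
import io

# Hypothetical schema: the expected columns, in order, with a type for each.
SCHEMA = [
    {"name": "region", "required": True, "type": str},
    {"name": "party", "required": True, "type": str},
    {"name": "votes", "required": True, "type": int},
]


def validate_csv(text, schema=SCHEMA):
    """Return a list of error messages; an empty list means the data is valid."""
    errors = []
    reader = csv.reader(io.StringIO(text))
    try:
        header = next(reader)
    except StopIteration:
        return ["file is empty"]

    expected = [field["name"] for field in schema]
    if header != expected:
        # No point checking rows if the columns themselves are wrong.
        return [f"header {header} does not match schema {expected}"]

    for row_no, row in enumerate(reader, start=2):
        if len(row) != len(schema):
            errors.append(
                f"row {row_no}: expected {len(schema)} columns, got {len(row)}"
            )
            continue
        for field, value in zip(schema, row):
            if value == "":
                if field["required"]:
                    errors.append(f"row {row_no}: '{field['name']}' is required")
                continue
            try:
                field["type"](value)  # e.g. int("1234") succeeds, int("lots") fails
            except ValueError:
                errors.append(
                    f"row {row_no}: '{field['name']}' has invalid value {value!r}"
                )
    return errors
```

A check like this could run on every pull request, with the build failing whenever the returned list is not empty.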
I decided to use my innovation time to find out. The result is Cid, a Ruby script which validates data in a GitHub repo, and can generate a new Data Package and push it to GitHub automatically if the validation is successful. If you’re interested in the nuts and bolts of it, you can read the README, but I’d like to use this blog post to ask for your help to test it out.
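The "only publish if everything validates" idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how Cid itself is implemented: `build_datapackage` and `publish_if_valid` are hypothetical helpers, the descriptor contains only a `name` and a list of `resources`, and the step that pushes the result to GitHub is left out.

```python
import json


def build_datapackage(name, csv_paths):
    """Build a minimal Data Package descriptor listing each CSV as a resource."""
    return {
        "name": name,
        "resources": [
            {"path": path, "format": "csv"} for path in sorted(csv_paths)
        ],
    }


def publish_if_valid(name, results):
    """Return the descriptor as JSON text, or None if any file has errors.

    `results` maps each CSV path to its list of validation errors
    (an empty list means that file passed).
    """
    failed = {path: errs for path, errs in results.items() if errs}
    if failed:
        # Validation failed somewhere, so nothing gets generated or published.
        return None
    return json.dumps(build_datapackage(name, results.keys()), indent=2)
```

For example, `publish_if_valid("euro-elections", {"london.csv": []})` returns a JSON descriptor (the package and file names here are made up), while a single failing file makes it return `None`.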
As you may know, voting in the UK European Elections took place yesterday (at the time of writing). The count won’t take place until Sunday night, so I thought this would be a perfect opportunity to try out Cid, and see what collaborating on data in real time looks like.
I’ve set up a repo on GitHub which will contain the data. There are detailed instructions on how to submit your data in the README (it assumes at least some knowledge of GitHub, but you don’t need to be a developer to submit some data).
If you fancy helping, I’ve listed the regions on GitHub here, and added a link to where the results will be published (according to the Electoral Commission’s website). Please comment on the region you’re planning to submit data for. Once the results are announced, check that council’s website for the results, add them to a spreadsheet and submit a pull request.
Although the European Commission are planning to publish open data on the election results themselves, I’m hoping this will be a nice proof of concept for wider publication of collaborative open data. It could also be a starting point for something around next year’s general election, an event which has so far never had a single point of reference for open data.
If you have any questions, please feel free to open an issue on GitHub or comment below. Happy collaborating!