Shades Of Grey In Open Data

Jeni Tennison

While working on Open Addresses, a free and open database of UK addresses, we came across a problem: it isn’t always easy to work out who owns or can use data.

[Read the report: Open Addresses - the story so far]

Processing data can be a risky business.

Legal grey areas can create risks for organisations that rely on data-based products and services from third parties. These grey areas affect reusers of government data, large and small, as well as organisations relying on commercial products. If they don’t know exactly what they can do with the data they have, they may end up either exposing themselves to legal action or unnecessarily restricting what they do with that data.

If we are serious about building fit-for-purpose data infrastructures, or simply managing the data-related risks we carry within our organisations, we need to make law, contracts, and T&Cs clearer when it comes to data ownership and licensing.

We’ve recently found three areas where this ambiguity causes problems:

When developing Open Addresses, we ran into the question: who owns addresses in data that has been checked or validated using an existing address database? It isn’t clear if people can use addresses in data published under the Open Government Licence.
The new service from Companies House and its free bulk downloads of company data provide no guidance on how that data can be used. Without a clear licence, reusers have to rely on a judgement: that because Companies House have made this data freely accessible, they are unlikely to object when third parties extract or reuse the data they have collated.
We were pleased to learn that it would become possible to use UPRNs (Unique Property Reference Numbers) in open data. It’s only when you dig into Ordnance Survey’s policy guidance that it becomes clear that the use of UPRNs is still restricted based on how data is matched and what other data is provided with them.

Technology is built on ones and zeros, truths and falsehoods. UK law isn’t like that. Instead, each case is argued on its merits, based on legislation and the results of previous cases. There are many shades of grey.

New legislation necessarily brings uncertainty: because every case is different, good laws allow for the application of judgement. Legislation about new technologies can be particularly uncertain. For legislators, it can be hard to predict how the technology might evolve and how the law will keep up. The meanings of terms such as “reasonable” or “significant” are deliberately fuzzy, with a common understanding of their bounds forming gradually through experience in court.

Despite having been first made in 1997, the UK legislation defining database rights has hardly been tested in court. There is uncertainty over the limits of the law. So the contracts, terms and conditions that we use need to be especially clear.

To help mitigate risks for users of data, organisations need to have clarity on their ownership of data through the contracts, service agreements and licences that they agree to. They have to avoid making accessible data that they should not make accessible, for example where it contains third-party data that they don’t have permission to sell or publish. As well as protecting the organisations themselves, clarity on data ownership helps to ensure that any business or organisation that uses their data isn’t running the risk of the data owner withdrawing access or pursuing them in court.

If you use a third-party service or product, your contract with the third party should tell you whether they own the resulting data, or you do.

If these third-party services:

correct invalid data that you pass to them — do you own the corrected data?
create derived data, such as the results of statistical analyses, based on what you provide to them — do you own the derived data?
annotate your data with additional fields, such as standard identifiers or geolocations — do you own the annotated data?
generate visualisations, such as a map or graph, based on your data — do you own the resulting images?
store your data and make it available in a structured form via an API — do you own the data that the API emits?

Good data providers will have standard terms and conditions which should clarify these questions of data ownership. Then you can be confident about how you can use the results yourself, and that if you make it available for others, you’re providing access to data that is safe for them to use.

Having a licence to use data created as a result of other data does not mean you own it. Ownership gives you complete flexibility to do what you want with data, whereas a licence may be revocable and is likely to come with conditions… conditions which may themselves be unclear or ambiguous.

Systemic impacts from sharing and opening data will primarily come from commercial products and services. In a competitive marketplace, ambiguous and unclear terms and conditions for the reuse of data lead to fear, uncertainty and doubt. We need to ensure all data is clearly owned and licensed, so we can all make the best of it, innovate and prosper.