What Is Data? Exploring The Question That No One Asks


There is evidence that people fundamentally differ in their understanding of data. Three (implicit) philosophies speak of data as objective facts (measurements), as subjective observations (records), and as communications (signs). An example explains what this means in practice. The case study below is adapted from Brian Ballsun-Stanton’s PhD thesis Asking about data. He is an experimental philosopher and works as a data architect. I thank him for his permission to use it.

I am a practitioner. Why, then, engage in philosophical questions? It sharpens your mind. You can use fun words like geist and epistemological. And it can bring clarity to discussions about data and may help to resolve “silos”.

What is data?

Brian Ballsun-Stanton writes in his PhD thesis Asking about data: “I noticed that practitioners use and understand the term ‘data’ differently than the people they are helping.” [1] He describes three philosophies of data. Before we explore them, here is a recent case study.

Imagine you are a tech team building a new online platform from scratch. Your umwelt (environment/surroundings) is data. You have great financial and intellectual freedom to build something that reflects the needs of users.

A main feature of the desired architecture is often vocalised as “everything is data” and “we want to employ all the data”. But what do the parties involved mean by this? Would different concepts of data lead to different platforms?

“Everything is data”

For the purposes of this case study, consider three team members. Mr O, Ms S and Ms C have different backgrounds and differ in their inherent concept of data.

Mr O is part of the executive team and understands data as “facts”. The platform, for example, should deliver objective numbers about page impressions. He admits we could collect and analyse all sorts of data, but is least comfortable with unstructured text. Mr O would say: “Data are measurable, small, recorded descriptions of the world.”

Ms S is part of the commercial team and acknowledges the value of relevant metrics. However, for her, the intangibles of user satisfaction are equally important. She is interested in how to decode human communications and would be happy to work with purely qualitative data. Ms S would say: “Live music is also data.”

Ms C is part of the tech team and recognises all digital records to be data. She considers data to be encoded human communications or encoded information. The online platform exists as a digital product, so Ms C understands everything is data in the most literal and exhaustive sense. Ms C would say: “Data is a bucket for information and knowledge.”

How people expect data to work

For Mr O, data are objective measurements. He trusts statistics to extract meaning from data and would feel least at ease with a database of unstructured text that lacks the benefit of a semantic analysis. Mr O may be aware of meta-data, but he sees them as records separate from the actual measurements, and thus secondary. Mr O wants data streams that can feed directly into dashboards.

Ms S, as someone who holds data to be subjective observations, is nominally “okay” with “anything” in the database. Ms S especially dislikes a database that is not internally consistent. If one blog post has a second title, there is no point in recording that as a separate field. Only if reality changes, that is, if most blog posts have two titles, should the database structure be updated. Ms S wants a flexible commenting system like Juvia, where visitor comments are recorded as data.

Ms C considers the database a repository of human interactions that is curated and made discoverable by other humans. Any design that interferes with that curation or hobbles the maintainer is discouraged. Because for Ms C the database carries value in itself, she is comfortable with whatever best supports human communication. Ms C wants every aspect of the platform itself to be accessible as data, for example, as machine-readable JSON.
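To make the three expectations a little more concrete, here is a minimal sketch in Python of how one and the same blog post record might be viewed by each team member. The record, its field names and the three helper functions are illustrative assumptions of mine; they are not taken from the thesis or from the actual platform.

```python
import json

# Hypothetical blog post record; every field name here is an assumption.
post = {
    "title": "What is data?",
    # A rare second title lives in a free-form bucket, not in its own schema field.
    "extra": {"second_title": "The question that no one asks"},
    "page_impressions": 1280,
    "comments": ["Live music is also data."],
}


def is_number(value):
    """True for numeric measurements (booleans are deliberately excluded)."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def metrics_view(record):
    """Mr O: keep only numeric measurements that could feed a dashboard."""
    return {key: value for key, value in record.items() if is_number(value)}


def observations_view(record):
    """Ms S: keep the qualitative records, such as titles and visitor comments."""
    return {key: value for key, value in record.items() if not is_number(value)}


def communications_view(record):
    """Ms C: expose the whole record as machine-readable JSON."""
    return json.dumps(record, sort_keys=True)
```

The same record yields a dashboard number for Mr O, titles and comments for Ms S, and a complete JSON document for Ms C. None of the views is wrong; they simply answer different questions about what counts as data.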

Three philosophies of data

Ballsun-Stanton’s explorations identify three philosophies of data. In his studies people describe data

  1. as objective facts, measurements revealing the relationships of reality (Mr O);

  2. as subjective observations, sense-impressions filtered by knowledge (Ms S); and

  3. as communications, a container for meaning (Ms C).

While individuals may not “have philosophies”, they express their understanding of data through language and actions. For example, those who treat data as objective facts will treat a spreadsheet full of temperature measurements as data, but would not accept subjective descriptions such as “warm” or “hot”. Rather than being distinct categories, these concepts lie on a spectrum. I encourage the interested reader to dive into the short, readable paper Asking about Data for more.

Quirkafleeg

If you haven’t guessed it by now: in the case study we are, of course, talking about the Open Data Institute and the new platform, codename quirkafleeg. Our internal communications took into account all three philosophies. Being aware of different understandings of data avoids silos and serves our motto Knowledge for everyone. We can admire a platform that looks great and is built in an open way.

We can also answer the question “what is data?” not by personal philosophies but with a pragmatic claim.

Data is the raw material of the information age.

[1] Ballsun-Stanton, B. (2012). Asking About Data: Exploring Different Realities of Data via the Social Data Flow Network Methodology. Doctoral thesis, University of New South Wales.