The pull of Open Data

How easy is it to extract value from open data and open government data?

If we take open data as the centre of the ecosystem (which in itself may be a challenging idea) then we could imagine it has a certain gravity: a level of attraction that pulls people in. Trying to understand how ecosystems build up around open data is about the natural order that falls into place as a result of that pull.

The strength of the pull is based on a number of factors that could be best summed up as the affordances of open data. In other words, what do you think you can do with that data? The extent to which you are pulled in by the data also depends on proximity, and how close you get depends on two factors:

  • You know the data is there
  • You have the skills and resources to do something with it.


Thinking about that in broad (and visual) terms has resulted in a lot of circles. But the challenge has been thinking about the relationship between that pull and the extent of the ecosystem. The result is the two versions of the diagram at the start of the post.

In both versions of the diagram we’ve been thinking about three main groups:

  • Data Users: Data journalists, data advocates, analysts and others who have the resources and access to use data as part of their work
  • Data Intermediaries: Commercial and non-governmental organisations that aggregate data as a product or service
  • Citizens*: People who might feel the impact or consume the outputs of intermediaries and users, but don’t have the resources or skills to source and use open data.


These are by no means exclusive and the borders between them are not fixed. Data intermediaries are also data users, and data users can, through their work, become intermediaries (think of data journalists and the Guardian data store, for example). Citizens can also become data users: a specific need might push them closer to the data. The challenge has been thinking about where these groups sit in terms of the pull of open (government) data.

In one version of the diagram, we’ve been thinking about the data users as those most likely to be pulled closest to the data source. In terms of a value chain, the data intermediaries come at a point between users and citizens, often offering a degree of interpretation and generic tools that allow non-users to extract value from the data.



In the second version, we’ve put data intermediaries closest to the source. It reflects the use of open data portals but also the growing open data ecosystem (think OpenCorporates, for example). It suggests that the mix of skills and resources amongst data users is broader than in the first model. Our research (and the work of others) suggests that the skills base of journalists and community news providers, for example, is at best variable but generally low outside of small groups within organisations. Data users are also more proximal to citizens (a journalist and their audience), which suggests a smaller step for citizens to become data users.

In practice, maybe both of these models hold true, especially when thinking about how value gets extracted from open data. For the Media Mill project, the idea of intermediaries is really important. The project focusses on data from open data portals (Leeds Data Mill and York Open Data), and the key output of the project is an open source data dashboard, Solomon. Both the portals and the dashboard could be said to function as data intermediaries.


Models and analogies

Beyond the model, there are lots of analogies that could be played with here. Maybe we can begin to think of open data a little like a super-dense object (rich in potential value) and the actions of data users and intermediaries as contributing to an accretion disc of open data content and services. In that analogy, the journey to the centre becomes more perilous and difficult the further out you are. We could also think about the groups as sitting on ripples from a rock dropped in a pond. Those closest feel it most, but as the ripples spread they get weaker.

What do you think?

Placing the emphasis on one group over another is something that people may agree or disagree with; both models suggest hierarchies. The terms users, intermediaries and citizens are also loaded and may not be right. But it feels useful to have a broad description of open data ecosystems that can reflect the relationships and interactions between each group and the data, and between the groups themselves.

Perhaps the last analogy to draw here is a visual one: it looks like a target. Feel free to take shots at it; we’d love your feedback.

*The use of citizen here means a) we don’t fall foul of the semantics of the term ‘normal’ people, and b) we reflect the civic and democratic value of open data.