So we decided, we were playing with the idea of a data to narrative site. My background is in local newspapers and I started my career in local journalism, but spent most of it working in television news at the BBC and ITN. My partner Alan Renick came from the more business and strategy side of the news industry and had worked for Trinity Mirror and most recently had been the publisher of the Bristol Post.
We met during consultancy work which I did for ten years, helping people to re-organise the set-up of these operations. We met on a project in Romania doing exactly that. And the conversations over the years developed into conversations about the future of local media and what could be done in the local media space, how local media would survive in difficult financial circumstances.
We came up with a challenge to ourselves at the start of the year. Could we try to re-invent city media? And if so, how would we try to do that?
It seemed to us from observing what’s happened to the media industry, that a lot of focus has been on technological change. A lot of the focus has been on distribution change. Very little focus has been on any changes to the source of news or the production process of news.
Despite what many people in this room are doing, and what we’re now trying to do, a lot of local journalism still involves individuals going out and chasing ambulances and trying to find stories from select small numbers of sources. Or perhaps reading old press releases and information that they are told in a more constrained circumstance as financial pressures push down reporter numbers.
That seemed to us to place local news operations in a difficult position for being able to actually report. So was there a solution to that which didn’t just mean ploughing more money into it? Or was there another approach to the source and to the production cycle of generating news?
So that developed into a separate question, which was, could we base a newsroom just on data? Instead of sending reporters to talk to people, could we do something just based around data, which would stand as a news product, would that work? Could we replace conventional sources? Could we find another way to get information which was newsworthy and therefore usable?
Could we find a method that was easy for us to do? There were only two of us, so could we find a method of trying to generate news out of a data store? Could we find a method that was cheaper for a news operation to do? Mostly around the cost of employing people and the throughput of work they had, so could we find a cheaper way to do that?
And importantly, which was slightly ambitious at the start was, could we make a scalable model? So I am not here as the [hyper lonely 0:03:50] person in a way, I am here as a data person I suppose, but only in the last six months am I a data person, I am here as a journalism person.
We were interested in scalability as well, as to how you can invent something at a low level and then grow it widely across much larger geographies. Because one of the interesting problems of local media is that it stays local; it’s difficult to export what it does without duplicating what it does. So how can you do it without duplicating it?
So these were the conversations we were having about whether you could do that. We were both interested at the time in the growth of data, in the sort of things that David was doing at the Mirror. I mean the growth of the smart cities movement where more and more open data was being made widely available, at least to people who wanted to use it.
So we decided that we needed a geography to start it. I’m in London and Alan’s in Bristol, so it was going to be one of those two cities we sort of decided to start in. In the end we chose London, we both knew London. I still live there, Alan had lived there. The London data store, which is the GLA’s main portal for all its published information, has in excess of 600 datasets and growing.
Bristol had I think it’s about 150 or something like that, so we thought, “Let’s start somewhere where there’s more information available and potentially a bigger audience.” So London was the place to start.
So essentially the task we had was to take this, which is full of Excel spreadsheets and that was a key factor for us, that the data is quite well organised and quite clean, it’s been sorted, and turn it into a narrative product, turn it into a story that you can read on a tablet or on a mobile phone. So how would we go about that process?
It seemed quite a simple idea, but we also thought that quite a lot of local authorities are doing this sort of thing. We talked a bit about open data throughout the day. About how more and more local government entities are making data available, and they’re using taxpayers’ money to do that.
So the data they are producing is very useful to help them run the local authority. It’s quite useful if you want to develop an app about the local bus times. But outside of that small group of people, how many people are getting a value out of that?
So was there a democratising thing that needed to be done here, which is, if this is how the local authority says it is doing its job, how many people can see that and can we make it more transparent? Can we have the data not just open but accessible, through writing stories, which is what we do as journalists? So we will try and do that.
We were also interested in the idea that what we were trying to do with this was to create a more systematic product. That the nature of news is [for random 0:07:08], some things are news because it’s odd. Some things are news because it’s out of the ordinary, generally. The front page story is not common place, the front page story is exceptional, and that’s a definition of news.
But we thought there was something more about how to tell the story of a city, which was based not on a single random event or one interview, but on trying to quantify and sort large numbers of occurrences that told you a slightly different story about how a city was. And that’s all in the data, if you can find it, so could we find it?
We drew up a simple line here, but we thought, “Oh my God, there’s a whole load of pain along that line in being able to do it, because we’re not data scientists.” So Mark has been indicated as the data scientist in the room, it seems during the course of the day, we all need to build something like the Leeds Data Mill.
But we didn’t have those skills, we had some basic Excel management skills and basic journalism skills. So could we do this? But then we thought, if we do that, is it still the news? So we immediately thought, are we running the risk that what we are interested in particularly is a lot of information which people aren’t using. Well are they not using it because it is too mundane to be the news? Or are they not using it because maybe they haven’t noticed it, maybe they’ve not got the wherewithal to go into it? Would we run a risk here of producing a “not the news” site? So that was a concern of how we thought about this.
So risk one, we have got to have a skill base, risk two, what would the product look like? Will it be newsy, because we’re news people and we are trying to do a news site? So we agonised quite a lot about that, the two of us, about how to make this work. We did various models and various try out days and we thought about it and experimented.
We got nowhere frankly. So in the end we thought, “Well we better just try to do it.” So quite simply built a WordPress site and started writing and started publishing. And doing it out in the open.
We did think of putting Beta on our site, but if you go to Urbs.London you won’t find me, so we thought it would just be cowardly really. Either you’re on or you’re off. We thought, “We’ll just do it, we will just show our drawers a little bit here. And we will just go out and try and do this.” In a sort of Kevin Costner, if we build it they will come, sort of ‘Field of Dreams’ idea, we thought we would try and build a news product based on data. Largely the London data store but obviously using lots of other London datasets in national and regional breakdowns that are available.
But this was going to be the bedrock from the start in our systematic telling the story of London.
So we then thought, “Well what sort of content will we do?” Where do you start? There are 600 datasets alone in there, do we just do the newest one? Or what shall we do?
So we decided that we would try to focus on some of the bigger things around living in a big city. So housing, jobs, health, transport, sometimes they’re sexy, sometimes not. So what could we do with this and how do we focus around the bigger things?
So we just basically bailed in and decided, “Well let’s start with housing, let’s now do something on transport.” And we dodged around quite a lot, because this from the outset has been experimental. We wanted to try out different depths of data, so different types of datasets and different types of content and topics that we could get out of them.
Some of them were very well organised and easy to handle, some of them far more complex and taking lots and lots of time, which was very frustrating. Because once we started we felt a little bit on the news treadmill as it were, that we were a news site, so we had to publish. And with two of us working not full time on it, it was quite hard to sustain it, but we were trying to get a broad sweep of stories across the big topics of living in a city.
We tried to think of a methodology around this, but I did find it sounded a bit fancy, because we were journalists, we just needed to take some numbers and tell a story out of them, it’s quite a simple idea.
But we wanted to try and build a production methodology around this to give us some focus about, “Well there are only two of you, you are going to have to stop and think of it, what is it we do again?”
So we tried to build a methodology and the basic idea was, single data, multiple story. If we’re going to spend more time digging into something we want to get a reasonable amount of value out of that in terms of content.
So take something like lots and lots of crime data and what can you do with it? Well you will get some overview stories about big trends within crime across the capital. Then you will get more intricate, something to do with about the theft of dogs. And you could do something where you drill down into individual segments of the audience.
Then you could get into smaller geographies and with crime data it was very easy to do borough level pictures. There are 32 London boroughs and one around the city, so suddenly you’re giving yourself quite a task if you are writing not one but thirty-three stories. But we tried to have a go at that. And in fact crime data is done in London down to ward level. So you could drill down to a very, very localised level on crime data if you wanted to.
So we decided to give [this sweep of 0:13:05] stories that getting into this, putting effort into understanding this would give us a cascade of content that worked at various levels. So that any reader in London might be interested to know what the murder pattern of London looked like, and whether you’re more likely to get bumped off in Enfield than you are in Harrow. A small number of dog owners might care about that or bicycle owners, but then everyone in Tooting might want to read that. So it’s about trying to find that segmentation of audience. And that method, taking us through in a way to do a number of other things.
So everybody complains that their tube line is the worst in London obviously, and we wanted to find out whether that was true or not. So we got all the data on delays across 12 months on the London Underground and tried to map that into what were the reasons behind those delays? Was there a big engineering project on the line? Was it staff not turning up? What’s causing the delays on your line?
The idea behind that again is, it sort of works in a sort of geography around the map of London but again it works in a sort of tribal segment of, if you live on the Northern Line you live and breathe the Northern Line and you know how bad it can be.
So trying to find a way to connect with an audience again, and trying to find those groups and segments, right drilled down.
The good thing about this is again, it came out of production, there was almost a template for story as to how to do it. So there’s this point, there’s this point, there’s this point and what does that bit of data for this line mean?
Although there were lots of individual pieces of content to do, once you’ve done two you were almost production lining them and you were able to sweep through across London to do that.
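As a rough illustration of the "template for story" idea, here is a minimal Python sketch of how one template might be filled in for each tube line. It is not how the site was actually produced, which the talk describes as basic Excel work. The records, the cause labels, the template wording and the function names are all assumptions made up for this example.

```python
from collections import defaultdict

# Hypothetical delay records: (line, cause, minutes lost).
# These figures are invented for illustration, not real Underground data.
DELAYS = [
    ("Northern", "signal failure", 120),
    ("Northern", "staff absence", 45),
    ("Northern", "engineering works", 200),
    ("Victoria", "signal failure", 30),
    ("Victoria", "staff absence", 60),
]

# One story template, reused for every line.
TEMPLATE = (
    "The {line} line lost {total} minutes to delays over the period. "
    "The biggest single cause was {top_cause}, "
    "accounting for {share:.0f}% of the time lost."
)


def summarise(records):
    """Total the minutes lost per cause, grouped by line."""
    by_line = defaultdict(lambda: defaultdict(int))
    for line, cause, minutes in records:
        by_line[line][cause] += minutes
    return by_line


def write_story(line, causes):
    """Fill the template with the headline numbers for one line."""
    total = sum(causes.values())
    top_cause = max(causes, key=causes.get)
    share = 100 * causes[top_cause] / total
    return TEMPLATE.format(line=line, total=total, top_cause=top_cause, share=share)


def write_all(records):
    """Produce one short story per line from a single dataset."""
    return {line: write_story(line, causes) for line, causes in summarise(records).items()}


if __name__ == "__main__":
    for story in write_all(DELAYS).values():
        print(story)
```

The point of the sketch is the shape of the work: the analysis is done once, and each additional line costs almost nothing to write up.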
We then did a similar thing with some of the census data about who lives in London. A large percentage of London, 38% of the population of London, were not born in the UK. So we looked at the census data as to just the range of nationalities that exist in London. London seems to have more nationalities than the United Nations has nationalities, which is very comforting. Including one person who was born on an aeroplane, and I am quite keen to find that person. We were then able to look at it at a borough level again, and to do these maps of where Londoners are. Again that worked on a geographical locality that happily was about these sort of people. But it also worked again in a sort of, it’s my tribe sort of thing. Some of these became quite viral, amongst people who come here to live from the United States, sending it back, “I’ve ended up with all the other Americans.” Some nationalities are far more tribal than others. Scots are spread all over the place in London, Irish less so, and Jamaicans are quite concentrated and Indians live in the other part of it. Generally, you can see these patterns.
Most of what we do is based on words. We are trying to tell stories. We use visualisation a little bit. We saw this map and fell in love with it. It was designed by some people called After the Flood, who are a design company. They got some funding from I think Innovate UK and Future Cities Catapult and are developing a new map of London based on the periodic table.
So we liked it so much that we decided we wanted to have it, so we came to an arrangement with them to use their map, which can be fed with data and colourised, so it works very well as what David referred to as, does it instantly tell a story with one quick visual.
What we didn’t want to do was get involved in complex infographics which take time to interpret, because I think one of the problems in visualising data stories is that sometimes those visualisations themselves require levels of interpretation. That seems like a barrier to understanding, so we were back to the idea that the simplest way is to tell a story and to write the words.
We get this sort of, “This is London and this is what London is about.” But sometimes it felt like, “Well was it newsy enough?” And so, within the mix we can still sometimes try to look at a singular story and try and find something which worked more with a news culture bit.
So we were interested in discrepancies between wealth and poverty in London. So we looked at pensioners and we were quite surprised with the statistic, that 75,000 Londoners don’t even register for their state pension that they are entitled to, which is 7% of the pensioner population in London. Elsewhere in the country pension uptake is near universal, it’s 99% or 100%, everywhere else apart from London. In fact in Kensington and Chelsea only 75% of pensioners claim their pension.
And yet in another part of London huge numbers of pensioners are on benefit credit of some sort to supplement their income. So you have this contrast within London, which everyone is quite familiar with anecdotally, and trying to look at it in terms of the wealth and benefit levels at pension level was quite instructive.
Question from the floor: Did that come from the Department for Pension, the data?
It’s the Working Pension Credit, yes, so it was just a localised borough level breakdown on total numbers. So it’s quite easy to work out a percentage and then get this number, which was a very surprising number. Then we didn’t know we were going to happen on the story.
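The arithmetic behind a figure like that is simple enough to sketch. The Python below assumes a borough-level table of two totals per borough, the number of people entitled and the number actually claiming. The borough names, the numbers and the function names are placeholders invented for illustration, not the figures from the dataset discussed in the talk.

```python
# Hypothetical borough totals: name -> (people entitled, people claiming).
# Placeholder values only; not taken from the real dataset.
BOROUGHS = {
    "Borough A": (20000, 15000),
    "Borough B": (30000, 29400),
}


def non_claim_rate(entitled, claiming):
    """Percentage of entitled people who are not claiming."""
    if entitled <= 0:
        raise ValueError("entitled must be a positive number")
    return 100 * (entitled - claiming) / entitled


def rank_by_non_claim(boroughs):
    """Return (borough, rate) pairs, highest non-claim rate first."""
    rates = {name: non_claim_rate(e, c) for name, (e, c) in boroughs.items()}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, rate in rank_by_non_claim(BOROUGHS):
        print(f"{name}: {rate:.1f}% not claiming")
```

Sorting by the rate rather than the raw count is what surfaces the surprising borough, since a large borough can have many non-claimants and still a low rate.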
Quite a lot of what we do has been, you get this dataset and think, “I wonder what’s in there?” It’s almost like we just have a drill around, or we try developing a hypothesis around what might be in there, just to try and direct us. And have the flexibility to change tack quite quickly when we see that’s not the case.
But to try to focus on what we’re doing, to answer a simple question, which is what’s the story here? What is the story I am trying to tell? Not to push the data to deliver the story, but just to try and focus how you interrogate it a little bit. Otherwise it’s difficult to know when to start, it would take longer and that’s the sort of process of understanding the mid-point data analysis and having just a journalistic nose I think, for sniffing something and thinking, “That feels a bit odd to me, like we should investigate that a little bit more.”
I don’t know if you’re interested in housing and affordability in London. One of the risks we thought we had was, if we’re doing a news site, how topical should our news site always be?
Coming from a daily news background, for years I got quite anxious about this and wanted to spend a lot of time just trying to flow with the agenda, and that lasted for a few weeks until I was exhausted and realised I wasn’t going to have them with two of us on an experimental site.
So we had a slightly different attitude towards topicality, which is that our role is not to try to do the story of the day. Our role might be to provide some enlightenment around the story of the day. So that we can pick a big topic like housing and affordability and rents and property price, and we can dig into that to provide a bedrock of information about what’s really going on behind the scenes there, which on any day when the story is running becomes very useful.
So we drilled down into looking at affordability in the rented sector comparing London to national averages and regional averages. With the notion of what is the level of premium you pay in London for a certain type of property above the UK average?
And in fact we found quite quickly that it wasn’t single people or young couples who were really getting it in the neck about it, it was families. It was if you needed three or four bedrooms you were really going to be paying a huge premium on top of the UK average. So a big story developed but also some useful more localised stories developed, which told you what the average price of a two bedroomed property in any borough in London was, that was mapped as well.
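The premium comparison can be expressed as a small calculation. This Python sketch assumes a table of average monthly rents by number of bedrooms for London and for the UK as a whole. The rents shown are invented round numbers chosen to make the arithmetic easy to follow, and the function names are hypothetical; the real analysis used published figures that are not reproduced here.

```python
# Hypothetical average monthly rents: bedrooms -> (London, UK average).
# Invented round numbers for illustration only.
RENTS = {
    1: (1000, 500),
    2: (1300, 600),
    3: (1800, 700),
    4: (2700, 900),
}


def premium_pct(london_rent, uk_rent):
    """How much more London costs than the UK average, as a percentage."""
    if uk_rent <= 0:
        raise ValueError("uk_rent must be a positive number")
    return 100 * (london_rent - uk_rent) / uk_rent


def premium_by_bedrooms(rents):
    """Map each property size to its London premium over the UK average."""
    return {beds: premium_pct(london, uk) for beds, (london, uk) in rents.items()}


if __name__ == "__main__":
    premiums = premium_by_bedrooms(RENTS)
    for beds, pct in sorted(premiums.items()):
        print(f"{beds} bedroom(s): {pct:.0f}% above the UK average")
    worst = max(premiums, key=premiums.get)
    print(f"Largest premium: {worst}-bedroom properties")
```

Comparing percentages rather than pounds is what lets the different property sizes be ranked against each other, which is how a finding about family-sized homes would show up.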
So there was a resource about rental affordability there, which could be quite easily written and informed the on the day story, whenever it happened and allows us to re-spread it as it were.
One of the things we also had to consider when we started doing this, was what’s the house style? How are we going to do this? We thought that the best thing to do would be to write short. That if we were trying to get to an idea that we want to take a lot of complex things and make them simple. We want to try and make them simple in a quite small number of sentences and not write a grand magnum opus about something and not do every…
So editing and selection is quite important. We roughly aim for about 300 to 400 words for a story on the notion that the audience for it is probably best on mobile, so we wanted to write quite short.
Again some visualisation but limited, and the test for the visualisation, “Would it work on my phone?” Well just about, and if the visualisation doesn’t work, does the story still stand perfectly well without the visualisation?
Looking at the test of that is that every story we’ve written you could just do on the basis of text, you wouldn’t need the visualisation to give you another layer.
Also because of the shortness of our stories, was there a sense that you weren’t getting the whole picture? What we tried to do with that was make sure that we linked very thoroughly within the stories, because in doing the news this way as it were, what we were trying to build was an inter-locking picture of little stories, which gave you the bigger whole.
That one story wouldn’t do it, but if you read four of them you can get a much bigger picture. So that linking and connecting all through, and sometimes across themes.
How a story about transport might affect a story about housing, trying to connect those things in the way that London as a city, the way work is inter-connected and trying to connect our stories to tell a bigger story of a city. Which sounds quite ambitious but if you just start with small building blocks the Lego does seem to click together and you begin to get some insights from looking at one thing from the other.
So linkage and trying to make those connections was very important in what we were trying to do.
The other thing we thought in the style of what we were trying to do, was that we would like to keep it quite short, it might be that people wanted more on something. This was not the end word on the subject, so always linking directly to the source out of the story was hugely important. I hope that everything we’ve written, that you can find the spreadsheet we worked from in every story. There might be broken links in there, but I am hoping that they all click through to the relevant spreadsheet or the place we got that information, so that anyone who wants to know more can go and verify for themselves.
Because it seems to be that we now live in a media age where authenticity and verification are becoming more and more important as people select what news they trust. They either get that from a brand value, and we have no brand value, because it’s just me and Alan, we’ve only just started. So we have to build that trust by, I think David used the phrase, ‘show your workings’. And professional journalists are not very keen on that. I wasn’t very keen on that, but it seemed like an obvious thing to do in this sort of project.
But if I was saying it was all based on verifiable data, I didn’t want people having to guess what it was I used. Here it was, and that was still the case six months on.
If I read a story in a national publication I am hugely frustrated when they haven’t linked to the data source for their story. Or have not even been explicit about what it is. They haven’t described it very well.
So we thought that stylistically we should always try and do that to enable someone to get further with the story if they wanted to.
As I said, it’s been an experiment really, which has been going for six months. So audience was not really something we cared about measuring, but we sort of did. We wanted to find out if anyone was reading what we were doing, because it is quite soul destroying to write and to post into the ether, and to think no one ever saw it. (Laughter)
And part of the learning curve was doing lots of it anyway, it was the learning process of how to do this and what we could do. But we sort of found an audience through Facebook. We haven’t marketed it at all, it was just my social network and Alan’s social network and we now have, I think it’s 22,500 likes on Facebook over six months, which we thought was reasonably healthy.
Some of our stories have got quite a lot of page views. We found that doing the more serious core topics of the city has worked far more successfully for us than trying to do something light and frivolous. It’s almost like people expect us now to be that sort of data journalism, serious stuff, this is where I get my information. Not something trite or something much lighter.
I have written something on cat lovers. I mean I’ve written something on the cost of football across London. They got a little bit of traffic but nothing compared to doing something using national insurance registration numbers on migrant workers across the capital. Which to date is our most successful story.
So the Facebook thing that has been important to us, but we’re not too worried about it because we haven’t pushed to try to get this somewhere yet, we are trying to work out where it should go.
So as I said, we have been going now for six months. We published our first story at the end of April. So we have had quite a learning curve, as we came to data. So I thought, “Well what have we learned over the first six months? What am I taking away from this as an experiment? Is this experiment going to lead anywhere?”
The first thing was, I think David touched on this with the ONS data. That there’s an enormous amount of information which seems to be largely untapped, that people just aren’t looking at it and using it. I don’t know whether that’s because of a failing by journalists. Whether collectively as a profession you’ve got used to being spoon fed and not going back to the source. Or whether the idea of a digital source seems so difficult for us to cope with that we can’t tackle it.
But there are lots of things that are easily available and worthy of inquiry we found, that other people weren’t touching. So it has given us some opportunity to take our time about things sometimes because we know no one else is looking. It’s incredibly frustrating when suddenly they did look or that a press release went out about something we had been working on for a couple of days.
The other test I suppose was that starting point, was this worthy of a news site? Was it too mundane? Did it register as news? And I think we found that there’s a lot of things in stuff that was being ignored, that actually do tell you quite a lot about how a city works.
It may stretch the definition of news into news and information or news and useful information. But in a way useful information is one of the definitions of news anyway. So I think I’m pretty happy that there is enough interesting content to be dug out of open source data, even just made available by local authorities.
Blissfully the data was easier to handle than we feared it would be. Now that’s probably because we’re just skimming the top at the moment in the first few months, and we can see there’s a lot more that we want to do, that will require some level of expertise.
But the more positive thing about the early experience was, with just a basic modicum of Excel skill, we could excavate quite a few stories to start with. Occasionally we had to bring someone in to try to clean up some of our CSV files for us and to date when we needed to do that, we’ve used People per Hour, and found someone with a skillset who can do a job for us in half a day, and that’s worked fine for what we are trying to do at the moment.
Now the last part was, that I don’t think what we do would ever replace conventional media or not totally, because we are not seeking to do that for a job. So in trying to do something which is based on take data and translate it into narrative, we decided that we would try to be as objective as possible and not to pick up the phone and chase comment, because that wasn’t part of our experiment of doing things. That was just journalism we’d done for years in our career. It’s one of the tests of whether we could get a quote out of somebody.
So we didn’t need to do this, we could get something which was sufficiently branded to publish, but maybe the starting point for someone else to jump off and chase the reaction to that. It’s maybe something that Urbs will build to, but it’s not something that a two person operation needs to do at the outset.
In fact it’s been slightly liberating and more purist, that we just take the data and translate it into an as objective as possible narrative. And as a journalist it’s never completely objective, because what you leave out is what decides how objective it is. And you’re always selecting and it’s a written form, so we are always trying to put a nose on a story. But we’re trying to do so as neutrally as possible, and I don’t know whether we have succeeded, I hope we have succeeded in doing that.
So that’s what we’ve learnt. The next question will be, what do we do next with it? Six months in, with you know 22,500 people who like to see us post something and the idea that you could build a data driven news site in a city like London.
When we got under way, obviously world domination was the aim, in that if we could do it in London we could do it in any city in the world that had a reasonable supply of open data. Going back to that idea of scalability, we didn’t need to be in the city to do that.
So Alan and I could both do Bristol, because he lives in Bristol and he’ll know the difference between one area and another and he will know the nuance of the city. But potentially we had this vision of a room about this size, with a group of data analysts in the middle who crunched the spreadsheets and translated them a little bit, and around them little groups of hacks from various cities around the globe, like the United Nations.
And from such a room we could run an international Urbs operation providing city websites for numerous cities in the world.
Part of me thinks there’s still an ambition around that, whether the financial model would work on an advertising funding model or potentially in London where you’ve got eight and a half million people, in a smaller city, well maybe not, I don’t know. That’s what we’re trying to bottom out at the moment, to see whether the direct consumer approach is a viable option for us.
The other thing we’ve looked at is, whether Urbs is not a consumer product as it were, or a service product, is what we’re doing more of an agency supply for people who don’t have the capacity to do it. Can what we do provide a baseline bedrock for other people to pursue stories? Can we offer leads? Can we offer resource and background to understanding an issue and to look at the potential model of that? We’re investigating that.
One of the other options we’ve looked at goes back to that idea that a local authority publishes a data sheet and thinks it’s in the open, but there’s a democratic deficit around that. Is there even a service to be provided for local authorities, which is to turn their data into a readable format for people who pay their taxes in that area?
We tried an experiment with one local authority around doing that for them. So there may be things to do with that.
We have a number of different avenues, verticals as they sometimes say in business, which we are investigating, as to where what we’ve learnt with Urbs as an open city data to news narrative can take us. We’re hopeful that one of those will mean that a six month experiment can turn into some form of sustainable business.