In less than three years the Trinity Mirror Data unit has grown from two people to a team of seven, providing data-driven content for the whole TM group. In this presentation, the head of the data unit, David Ottewell, talks about some of the work they do and the tools they use.

You can find out more about how the data unit approaches local data journalism in our case study.[/vc_column_text][/vc_tab][vc_tab title=”Transcript” tab_id=”236d7dfd-5dac-10″][vc_column_text]So this is repeating things. Like the way we said before, please cheat in the spirit of abandoning a, sort of, rosy-tinted outlook. It’s the greatest hits. So I’ve just realised the slides and spreadsheets aren’t so great, is it? This is like…

So we started as a bit of an experiment two and a half years ago. ‘Data journalism’ was certainly a buzzword then. So the, sort of, fashionable thing. Not really being done here. My background is very much a, sort of – I was political editor of the Manchester Evening News and then the chief reporter. So it was a hard news background, and then I was on the newsletter.

But I was always interested in data journalism and I agreed to set up the data unit for Trinity Mirror as a whole. It was certainly the regional thing. So more the Trinity papers. And that’s, sort of, Manchester local, Birmingham, and Newcastle. A lot of smaller titles. Two we also worked for with America. Originally two people – me and the deputy – which is a very, sort of, top-heavy structure. But we’d arranged to begin the case quite quickly.

We’ve expanded quite rapidly. We’re up to seven at the moment, including a coder and a graphic designer who have massively changed the scale of what we can do. Anyway, this is, sort of like, how we work and what we do when we get data. And I think all of this is stuff that you don’t need to work for a large media company to do. It’s still – I think you’ll find we’re still quite a small team.

So when I set it up, or when we set it up, right from the start we wanted to split it into work streams really. One is news. One is that you can have, sort of, data coming out, which is newsworthy in an old-fashioned sense, if you like. So news is this perpetual thing where it’s interesting on a given day and then the next day – what’s the old saying about ‘today’s news is tomorrow’s chips’? But there’s some truth in that. The, sort of, traditional news, it’s interesting for a bit and it’s not interesting anymore. And it’s kind of aiming to set the agenda. You’re kind of aiming to say, “This is an important thing. We want you to talk about that.”

But then we also thought the data was an opportunity to do something we call, sort of, ‘resources’, which is where you’re giving people the power to understand how they explore data themselves. You’re letting the user set the agenda. And you’re, kind of, trying to give them access to data sets. So you’re, if you like, giving them access to interesting things. So that might be stripping down a really complicated data set and making some sort of interactive, which allows them to explore maybe in a way which is useful for them. And that is obviously not as useful on a particular day. That can be useful whenever that person wants, depending on the data set.

I want to talk about the sorts of news. This is, sort of, tapping into my opening remarks. So the government clearly – the government puts out now, on any given day – well, it’s government. But government and governmental organisations put out anything between a dozen and, say, three dozen data sets every single day. All of which appear on the statistics portal.

And it never fails to astonish me. It’s really good for me, because it means we get, sort of, free rein at it. But it never fails to astonish me how few of these data sets are actually picked up and analysed, because they’re absolutely full of really, really important public interest news. It’s a major source of where we get our news from.

And part of the way we work is that, obviously, we work as an internal wire service. So we don’t publish direct. We, sort of, find news stories and we send them to various titles, tailored to those titles where they’re relevant to those titles. So there’s often a bit of a lag between us sending them out and them actually appearing on all of our websites. So for example, it might need a lot more investigative, sort of, work and requires a real hindsight, or case studies, videos, etc., etc.

And the upshot of that for me is that I wanted to focus on the, sort of, explosive stuff that people aren’t doing. I don’t want to be part of a race. If it was a race, we’d lose. But almost always, it’s not a race with ONS stuff, because unless the Press Association pick it up, unless on some rare occasions the ONS goes to the press office themselves, we don’t really lose the race. Nobody else is looking at this stuff.

Just an example – I just wanted to go through a few things. These all came out on the 15th of October. We’re looking at the 15th of October. These are some that we didn’t do. This is stuff we could do, and nobody else did. We were too busy doing other stuff. This is not, like, stuff we did or had time to do. This is like an F class, if you like, that nobody was doing. A couple of examples: There were some figures out on museum visits by ethnicity, which had this huge gap between white and BME people. I think this was London-ish. There was this, kind of, gap. It actually got bigger as well, on that site, which is quite an interesting thing.

There was data on wellbeing in Northern Ireland, which had health and life expectancy statistics. They’re not how long you will live, but how long you will live before you have a limiting condition. How long you will have a healthy life for. And again, there was a, sort of, substantial gap – a six-year gap in fact. Or maybe a six-year gap for men and a four-year gap for women, in terms of healthy life expectancy. And again, that gap had actually been growing. Now obviously that’s not very interesting to people in Northern Ireland. There wasn’t any good reporting on it. It might as well never have happened.

But stuff on the crime rates comes in, in England and Wales. From which, you could quite easily extract the number of cases in which not only was nobody found guilty, but the police just basically closed the case without ever identifying the suspect. And this was for a particular force. I can’t remember which one now. And it was in 80% of burglary cases they give you a number and say, “But I really don’t think you’re involved with that,” basically. But to be fair, 86.5%. That’s a story. Surely that’s a story? I mean, I know the police are very hard pressed at the moment. But that can’t not be a story.

And there’s something, which would have taken a bit more time, but would have, sort of, been very much worth doing. Some things came out on that day about the number of patients registered at a GP practice. This is just the rise. This is the percentage rise.

From the floor:  Can I just confirm that one to three-dozen data sets a day, you said?

On the statistics portal. If you go to the – I can’t show you actually. But it’s easy. Just Google ‘ statistics portal’. I’ve broken it down in different ways. So some of this is probably nationally. These I thought were the ones that were broken down regionally. So I did it further than that, so to either commissioning group level or council level or, sort of, societal level. We don’t tend to touch stuff that hasn’t been broken down, because primarily we’re at regional numbers. Although that – yes. There’s probably a tonne of stories there that we’re not even uncovering to be honest with you.

So this is just the percentage rise in the number of people registered at the GP, in one year. And in some parts of South London, it’s up very – 10%. Sort of, about 8-9%, 9%, or something like that. And in a lot of places in London, it’s a whole 5%. And the thing about that was that in itself might not be a story as such. But if you cross over that with another data set, which came out a couple of weeks ago, which was the number of GPs…

… the number of GPs serving and B, the actual number of doctors working in those areas. Quite quickly, you can construct an, “NHS under pressure in these areas” kind of story. And significantly under pressure. I mean, that is – the number of GPs is going down in those areas, while the number of patients is going up by 5%. You can’t tell me that’s not something that people should be aware of, and maybe talking about as an issue as part of the debate on the funding of health services.
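The cross-referencing described here can be sketched in a few lines of Python. This is only an illustration: the area names, the figures and the 5% threshold are invented assumptions, not values from the published data sets.

```python
# Hypothetical figures: (last year, this year) for each area.
patients = {
    "Area A": (100_000, 109_000),
    "Area B": (80_000, 81_000),
    "Area C": (120_000, 126_500),
}
gps = {
    "Area A": (60, 57),
    "Area B": (50, 52),
    "Area C": (70, 70),
}


def pct_change(before, after):
    """Percentage change from the earlier figure to the later one."""
    return (after - before) / before * 100


def under_pressure(patients, gps, min_patient_rise=5.0):
    """Areas where patient numbers rose by the threshold or more while GP numbers fell."""
    flagged = []
    # Only compare areas that appear in both data sets.
    for area in sorted(patients.keys() & gps.keys()):
        patient_rise = pct_change(*patients[area])
        gp_change = pct_change(*gps[area])
        if patient_rise >= min_patient_rise and gp_change < 0:
            flagged.append((area, round(patient_rise, 1), round(gp_change, 1)))
    return flagged


flagged_areas = under_pressure(patients, gps)
print(flagged_areas)
```

Joining on the area name only works if both releases use the same geography, which is one reason to prefer official area codes over names where they are available.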

The ones that we did do, for example, that day: The provisional GCSE results came out for every single school in the country. Again, it’s one of the spreadsheets where you open it and it just looks terrifying. But if you know what you’re doing with a spreadsheet – and that is a skill which can be acquired pretty quickly if you want it – you can do great things with it.

So you’re not just doing things like – and this is the point about making data relevant to people. It’s not just a matter of saying, “Okay. What we did. And here are the ten top, sort of, areas – starting schools in your area profiled.” That always does well, that kind of story. But a bit of skill with programming and spreadsheets allows you to fiddle with the gadget. A pretty straightforward gadget where people enter the name of their school. It’s their resource. Immediately for all those people whose school isn’t in the top ten or bottom ten, you’re giving them a reason to be interested in that story, that they wouldn’t otherwise have. You’re engaging them with that data.

Just some – we would expect to get between four and five splashes a week, of which half would be from stories which nobody else is picking up on. Those are some examples of those.

The thing which information, which the gentleman was mentioning earlier… It’s a hugely important source for us as well, because while – as I said, my primary concern is that there is an enormous amount of data, which is being put out, and nobody’s picking up. Nobody’s accepting. That bothers me more than anything, to be honest, with data.

There is also information that you can only get through Freedom of Information requests at the moment. And that’s – those of you who have used it will know, there is an art to submitting a Freedom of Information request. It requires – there’s an art to knowing where. And making sure you don’t fall foul of exemptions. And often, there is an argument to be had where you do fall foul of an exemption that you don’t really think is legitimate.

Nonetheless, they’re really important. And one of the things that we have to do, which I think is worth anybody doing who is intending to use Freedom of Information requests to gather large-scale data, statistical data for example, but for a range of organisations… It’s to – well, two things. One is have… I mean, we for example, have lots of different mailing lists of different organisations. You know, every health body in the country, regional health body in the country, every university in the country, every police force, etc., every council.

Where if I normally send an FOI out to every council in the country and, sort of, immediately get 360 – not answers. But, “Yes, we got your email. But we’ll forget, because you’re not in those places.” The ultimate spam… But we do that.

And the other thing – the only point I’d make about Freedom of Information, is that as journalists, it’s really important that if you’re looking for that, sort of, large scale data and to make comparisons, to be incredibly precise about the data that you’re asking for. And often to… For example, if you’re asking for a numerical data set, to give an example almost in a table. You know, “I want it as a spreadsheet. I want it as a spreadsheet in exactly this format, please.” And then you’re able to make comparisons between different authorities at a time, etc., etc.

You don’t want to get 350 different data sets back that aren’t quite directly comparable. Like you’ve got all these financial years and everyone uses calendar years, and suddenly you find you can’t make, you know, a comparison between those areas. So it’s a hugely important source of stories first, and data for us. And I hope anybody who is following the latest developments, is doing everything they can to have their voice back to, sort of, preserve Freedom of Information. Because it’s incredibly important to us, and I think it should be incredibly important to anybody working in journalism. Yes.

So just some of the examples of things that we’ve done with FOI. This is quite an interesting one. This is pointed out if you ask lots of – if you ask the same things, lots of things, you can get lots of stories for different areas. This one here, this was one that we did while – it was just before some welfare reforms. There was a lot of mood music around from the government that was basically doing this, “Well, we all know that we’re on benefits, most of us. We’re scrounging. Nobody wants to work.” The government was making a lot of those, sorts of, dog whistle type, you know, noises that were trying to give that impression, before they cut those benefits.

We thought it would be quite interesting to find out if that were the case, or whether the case, as we thought it might be… In a lot of areas that the job seeking world was unfavourable. Yes. The job market is unfavourable. It’s very hard to get a job in lots of places.

So we FOI’d all the councils in the country, asking them, from their point of view, what were the five jobs that they advertised that got the most applicants in the previous year. How many applicants did they get? What was the job? What was the pay? And lo and behold, you know, jobs where hundreds of people were applying for it, it was a minimum wage job. Now you can’t say that those 408 people weren’t looking for a job. They applied for it. They just had no realistic chance of actually getting that job.

And the other point – a quick point I’ll make about FOIs. It doesn’t just have to be – a lot of the time, you know, reporters will come to me and they’ll say, “I’m going to ask for this data.” And they’ll be asking for one number, or something like that. You know, “I’m going to ask the council how many times this happened, blah, blah, blah.” But of course, FOI is a lot more professional now. And one of the best uses of it – a lot of the best uses of it, I don’t see in asking for the documents, but asking for reports. And then the things like that, correspondence. Especially from government ministers. It’s always, always denied when you ask of them.

But you can also use it for much more large-scale data sets and data, which isn’t routinely published. So for example, we – every year, we will ask schools for their – or ask the councils to give us, for each school, the number of people who applied for – where the parents had applied for a place at every school in that area. How many people – how many of those got a place? And what was their first preference of the place? And that is an interesting news story, yes.

But we also use it to publish supplements and things like that, which list every school. It tells you how likely you are to get in at that school, depending on where you live. And that’s really, really interesting, valuable data. And that’s not just asking for one number. That’s asking for quite a lot of – a big data set. But what it yields is something much more interesting than a one-off news story.

And then scraping. I don’t know how familiar you are with scraping. But this is a really important skill to learn as a data journalist. Or, increasingly, I would say, as a journalist. Scraping is essentially using tools, which… Was scraping mentioned? Do I need to?

Chair:   No. It wasn’t mentioned.


There was no data scraping, kind of, thing? Okay. So scraping is essentially – there are tools available, which… If you have a website, which has a lot of pages, which are structured in a similar way, the scraper can automatically extract a lot of that data. So an example of something we scraped yesterday. We do quite a lot on the MPs’ register of interests and MPs’ expenses, because in spite of the big battles to get them public they’re, kind of, not sexy anymore. So people seem to have stopped monitoring them or being interested in what sort of stuff goes on.

All of those pages are available on the same website. Or essentially, as I say, register of interests. Name of MP, blah, blah, blah. And they’re structured in the same way. So there will be one dot, outside earnings. This is all the outside earnings. Two dot, travel paid for by, you know, a third party, blah, blah, blah. If you want to know how much money each of those MPs has got from outside earnings from media companies for example, paying them to write stuff, or from, you know, other companies paying them to do work, or for second jobs and things like that, you could open all those pages independently, copy, paste, and blah, blah, blah. And that would be fine. You would get the information. It would take you absolutely forever.

Or if you use a scraper, you can point at all of those pages and you can say, “Scrape everything between one dot, and two dot” – just the outside earnings. And split it in a certain way after account type or something like that. And you can set it to work, and within ten minutes you will have a spreadsheet, which will have the name of the MP, how much they earn, and what they earned it for. And that takes no time at all.

And suddenly it means quite complicated sets of data available in different parts, or where they’re on different pages or whatever, that would be very time consuming normally to go through to compile, you can do it in a matter of minutes. Incredibly powerful tool, which as I say, data journalists, but increasingly I would say all journalists, should be aware of. They should be able to use and see the possibilities of.
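As a rough sketch of the "scrape everything between one dot and two dot" idea, the Python below pulls one numbered section out of a page's text and lists the payments in it. The page text, the MP and the payers are made up for illustration, and fetching the pages from the website is left out. The talk does not say which scraping tool the team used, so the regular-expression approach here is an assumption.

```python
import re

# Hypothetical page text, shaped like the numbered sections described above.
PAGE = """Example MP (Example Constituency)
1. Outside earnings
Payment of £500 from Example Media Ltd for an article.
Payment of £1,200 from Example Consulting for a speech.
2. Travel paid for by a third party
Flight to Brussels, value £300.
"""


def section_between(text, start_marker, end_marker):
    """Return the lines between two section headings, or '' if either is missing."""
    # Skip the rest of the start heading line, then capture lazily up to
    # the line that begins with the end marker.
    pattern = rf"^{re.escape(start_marker)}[^\n]*\n(.*?)(?=^{re.escape(end_marker)})"
    match = re.search(pattern, text, flags=re.DOTALL | re.MULTILINE)
    return match.group(1).strip() if match else ""


def payments(section):
    """Return (amount in pounds, full line) for every line that mentions a £ amount."""
    rows = []
    for line in section.splitlines():
        match = re.search(r"£([\d,]+(?:\.\d+)?)", line)
        if match:
            amount = float(match.group(1).replace(",", ""))
            rows.append((amount, line.strip()))
    return rows


section = section_between(PAGE, "1.", "2.")
earnings = payments(section)
total = sum(amount for amount, _ in earnings)
print(total)
```

Run over every MP's page, the rows from `payments` could be written out as one spreadsheet with a name, an amount and a description per line.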

So this is just really fairly straightforward. But it’s actually – because, you know, missing persons, you’re actually… It’s some incredibly interesting cases of people whose bodies were found and who were never identified. There’s loads of really interesting cases. So we just scraped – and we set this data to group them essentially by geography. Geography, and time, date, and things like that. And so the outputs of the spreadsheet were – instantly we have for every area, all the cases of missing people who had never been identified.

And it made the front page for, you know – essentially, that’s all the information you need. There were really sad human-interest stories for people, often in the same place. They were often, you know… So for example, in Huddersfield, there are two missing cases who were found almost at the exact same spot. In fact, a year apart or something like that. In the same spot, in the same river basin. And that was – obviously scraped data. But you would not necessarily have known that was going to be the case, or seen that was the case, unless you were, sort of, systematically going to go through it. So that’s scraping.

And then, kind of, just like the value of ideas. It’s another really important thing. For some reason, I think people think of data journalism, and think of reactive analysis type stuff. But actually, I think having really good ideas and testing them – having… Well, data journalism gives us – it’s often the tool to test out hypotheses that you wouldn’t have been able to do before.

If we know where to find information on the web, if we know how to explore data sets, we can often answer questions that we might never have thought of asking. So having the ability to come up with those questions in the first place. Interesting questions. I would say it’s more important than ever with data journalism.

So this data doesn’t always have to be serious. Just a few examples of things that we’ve done along those lines. So we scraped the Met Office data going back 100 years to test the hypothesis, “Is Manchester really the rainiest city?” Every month they monitor rainfall at monitoring sites around the country, and have done for hundreds of years. And you can get the data. And you can go back and you can make some comparisons, and whatever.

And so we did this stuff – I think about 100 years worth of data. And lo and behold, no, Manchester is far from the rainiest city. Preston is. (Laughter) But above that work, Cardiff was the – where you have the Western Mail. You kind of said, “What a fun night for them.” It’s like, one of the wettest. Which people then, kind of, thought, “Yes, of course we’re the wettest city.” In Manchester, no one did this line as well, which was, “Shock. Manchester not rainiest city,” basically.


Floor: It’s lies. Damn lies and statistics. (Laughter)


It’s 100 years’ worth of statistics, my friend!

But it just did phenomenally well. And just a really simple thing. But I think that most people probably wouldn’t think of doing it. It wasn’t like anybody else had done it. We just thought – we were having all of our regular ideas meetings, and this was an idea that somebody had. And it just – it was just nice, sort of, story.

One that was done at my behest, because it’s something that annoyed me as a former political reporter: Council reports. I don’t know if anybody’s tried to read a council report. It’s the thing they put in public that you’re supposed to read to inform the decision that they’re going to make. Utterly impenetrable, most of the time. Utterly full of jargon. And the worst of all is the council budget reports, which should be the clearest of all. It’s the most important of all. But they are the worst of all.

And so what we did, was we downloaded a number of those from large councils across the country. And then we ran them through a text analyser, basically, which worked out from the sentences, the sentence structure, the languages, how many years of education you would need to understand them. And then we compared them to long extracts from various very complicated texts.

And lo and behold, council budgets, on average, required at least a year or half a year’s worth of full education. The Wikipedia entry about integrals was a year more than A Brief History of Time, and so on and so forth. And I think one of them was so complex, it technically would have required 18 years. It might have been Middlesbrough. I’m not sure. But it would have required, like, 18 years of education, which is a slightly, sort of, funny point.
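The talk does not name the text analyser that was used. One widely used measure of "years of education needed" is the Flesch-Kincaid grade level, so the sketch below assumes that formula; the syllable counter is a crude vowel-group heuristic and both sample texts are invented.

```python
import re


def count_syllables(word):
    """Rough syllable estimate: count groups of consecutive vowels, minimum one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def fk_grade(text):
    """Flesch-Kincaid grade level: approximate years of schooling needed to read the text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


# Invented sample texts: one plain, one in the style of a budget report.
plain = "The council will spend more on roads. It will spend less on parks."
jargon = (
    "The authority will operationalise a reprioritisation of discretionary "
    "expenditure allocations, notwithstanding consequential infrastructural "
    "implications."
)

for label, text in [("plain", plain), ("jargon", jargon)]:
    print(label, round(fk_grade(text), 1))
```

Long sentences and long words both push the score up, which is why a report can score far higher than a popular science book on the same scale.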

But it’s a serious point as well, isn’t it? In that, these things should be really easy to understand. If you go to a council meeting and knock around, there’s never anybody in the public gallery. Well, it’s no surprise. It’s no surprise people are disengaging from this. These things need to be clear. They need to be accessible. And it’s, kind of, making a serious point.

This is another one. We downloaded a load of sheet music for various singers and entertainers. And from that, you can deduce the vocal ranges they’re using. The maximum vocal ranges they’re using. And so we did this for a large number of songs for a large number of singers. And we were surprised to find out that The Editors’ Tom Smith had the largest used vocal range. But we did it as a, sort of, socially shareable graphic. So, you know, we revealed one after another and so forth.
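A minimal sketch of how a "used vocal range" could be worked out once the lowest and highest sung notes have been read off the sheet music. It converts note names to MIDI note numbers (where C4 is 60) and measures the gap in semitones. The singers and notes are hypothetical, and the talk does not describe the team's actual method.

```python
# Semitone offset of each natural note above C within one octave.
NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def midi_number(note):
    """Convert a note name such as 'C4', 'F#3' or 'Bb5' to a MIDI note number (C4 = 60)."""
    letter, rest = note[0].upper(), note[1:]
    accidental = 0
    if rest.startswith("#"):
        accidental, rest = 1, rest[1:]
    elif rest.startswith("b"):
        accidental, rest = -1, rest[1:]
    return 12 * (int(rest) + 1) + NOTE_OFFSETS[letter] + accidental


def vocal_range(notes):
    """Semitones between the lowest and highest note in the list."""
    numbers = [midi_number(n) for n in notes]
    return max(numbers) - min(numbers)


# Hypothetical singers and notes, not taken from any real sheet music.
songs = {
    "Singer A": ["A2", "E3", "C4", "G4"],
    "Singer B": ["C3", "G3", "E4", "C6"],
}
ranges = {name: vocal_range(notes) for name, notes in songs.items()}
widest = max(ranges, key=ranges.get)
print(ranges, widest)
```

Twelve semitones make an octave, so dividing the result by 12 gives the range in octaves for a graphic.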

And again, you know, it’s a very, sort of, fun frivolous thing, if you like. But it’s something people really engaged with and we did incredibly well with that. Not least, because I think his wife or girlfriend is Edith Bowman, and she kept retweeting it. It was brilliant for us. You know, send it round the world with Bowman.

I mentioned resources at the start. And this is just one of the examples of what I would consider a resources piece. So this stuff isn’t used on the day. It can be used more generally. So yes, every year we do, kind of, a schools rating project. And this was based on the insight that the schools data – there is an enormous amount of open data. But the one you always tend to see is GCSE results. Obviously, the GCSE results. And that’s, sort of, that.

That’s not really a fair way of ranking the schools. Obviously, because there are – schools in posh areas, schools in affluent areas, will always do better. There are many other ways of rating a school. There are many other factors, which are interesting in rating a school. One of which, for example, is value added. Value added is really important. That shows the measure of progress that each pupil has made relative to where they started, which is important.

But there are also loads of other really interesting data, which are all relevant to having a school list. Which is pupil-teacher ratios, certain financial indicators are quite useful as well, and various things. None of them – you can get all that data. That’s all, sort of, scattered around the web basically. It’s really hard for parents to get all that and understand that really easily.

So what we did essentially, is take 24 data sets that we thought were important. We talked to academics about how we might weight them, and which might be more important, and formulas we could use to, sort of, come to some provisional overall rating type thing.
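One common way to combine several indicators into a single rating is to rescale each one to a 0 to 1 range and take a weighted average. The sketch below assumes that approach; the talk does not give the actual formula, and the schools, the three indicators and the weights here are all invented.

```python
def min_max(values):
    """Rescale numbers to the range 0..1 (all 0.5 if every value is the same)."""
    low, high = min(values), max(values)
    if high == low:
        return [0.5 for _ in values]
    return [(v - low) / (high - low) for v in values]


def rate_schools(schools, weights, lower_is_better=()):
    """Weighted average of rescaled indicators, expressed as a score out of 100."""
    names = list(schools)
    scores = dict.fromkeys(names, 0.0)
    for indicator, weight in weights.items():
        scaled = min_max([schools[name][indicator] for name in names])
        for name, value in zip(names, scaled):
            if indicator in lower_is_better:
                value = 1 - value  # flip so that a higher score is always better
            scores[name] += weight * value
    total_weight = sum(weights.values())
    return {name: round(100 * score / total_weight, 1) for name, score in scores.items()}


# Hypothetical schools and indicators.
schools = {
    "School A": {"gcse_pass_rate": 80, "value_added": 990, "pupils_per_teacher": 18},
    "School B": {"gcse_pass_rate": 60, "value_added": 1010, "pupils_per_teacher": 14},
    "School C": {"gcse_pass_rate": 70, "value_added": 1000, "pupils_per_teacher": 16},
}
# Assumed weights: value added counts double.
weights = {"gcse_pass_rate": 1, "value_added": 2, "pupils_per_teacher": 1}

ratings = rate_schools(schools, weights, lower_is_better={"pupils_per_teacher"})
print(ratings)
```

In this made-up example the school with the best raw exam results comes last, because the heavier weight on value added rewards progress rather than intake.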

And with all of this – all of our, sort of, maths, if you like, has been completely in public. So we’ve, sort of, had an on-going dialogue with head teachers about, “Are we rating things correctly? Are the things correct as you go up?” That’s a really odd thing for journalists. The old school journalist ___[0:23:26] was, “Don’t show your working. Never show your working.” But it’s very important to me as a data journalist that we show all of that. I think it’s really important.

So we did this, and it’s been phenomenally successful every year. Essentially we do it as an online thing, but it’s a print thing as well. And it always does really well. And if you Google pretty much any secondary school in one of our areas now, this page for that school will be one of the top results. I think that’s a really, sort of, straightforward thing.

But then also, it allowed people to explore all of the data. So this was exactly how we were doing the maths. And they got all those individual data sets in one place. So if they were interested in a school, they get an understanding. But then they’ll also be able to drill down into the data, then to say, “And…”

Because this is a quantitative thing, not a qualitative thing, and we have Ofsted for the reporting, we always link through to the Ofsted reports as well. So we could make the point, “This is just what the data says. But if you also want to know what the inspectors say, or you want it in lots of detail, here you are. Complete your standard Ofsted thing.” This isn’t another way then to supersede that. It’s just another way of looking at this. So that was one.

The other big one that I usually talk about is – well, the one project we did where we worked with the Commonwealth War Graves Commission, who gave us their record of more than 1 million war dead. It’s an amazing resource. They were compiled over, you know, decades by incredibly dedicated people. But it was never accessible in a really user friendly way.

Another thing that had never been done was – it’s essentially just a, sort of, dump of all these incredible records. So you have to, sort of, navigate your way through, usually from a… Their searches are, sort of – you can search by name. But it’s all tied up with historians searching by century, or badge number, for military stories and things like that.

And we thought, “Wouldn’t it be fantastic if for the 100th anniversary of the war – could we strip it down, make it simple, and maybe get tailored data for individual communities, and individual houses, individual cities?” Which was a lot easier said than done, basically, because it wasn’t – they hadn’t stored data in a format where you could easily extract the address of where the person was or something. They had the data. But it was all, sort of, dumped in a cell in a spreadsheet. And it took quite a lot of complex spreadsheet formula work in order to reliably extract the places where those people who died were from – what town, what city? Who were the survivors?
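The team did this with spreadsheet formulas. As a stand-in, here is a rough Python sketch of the same kind of extraction, assuming the free-text cell looks something like "Son of ..., of 12, Church St., Hartlepool." That layout and the sample records are assumptions made for illustration, not the real format of the records.

```python
import re


def extract_town(info):
    """Guess the home town: the last comma-separated part after ', of '."""
    match = re.search(r",\s+of\s+(.+?)\.?\s*$", info)
    if not match:
        return None  # no recognisable address, so leave for a manual check
    return match.group(1).split(",")[-1].strip()


# Hypothetical free-text cells.
cells = [
    "Son of John and Mary Smith, of 12, Church St., Hartlepool.",
    "Husband of Ann Jones, of Middlesbrough.",
    "Native of Leeds",
]
towns = [extract_town(cell) for cell in cells]
print(towns)
```

A rule this simple will get some records wrong, for example when a county follows the town, so in practice the output would need spot checks and extra rules for the awkward cases.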

Nonetheless, once we’d done that, and we had all that data in a spreadsheet, our graphic designer was able to do, like, a page template. And then at the press of a button, we could generate pages for all that site, or do it in every town or city in the UK, actually, because we’d got them all. We could extract the relevant data from the spreadsheet, as per this page.

So it wouldn’t just tell you how many people died in Hartlepool in the First World War. But it also extracted the oldest person who died. A 67 year old who died in the war was from Hartlepool. The youngest was a 14 year old from Hartlepool, who died. And it gave you a bit of information about where he was born, his parents.

We’ve got these days. So these are the days when the most people from Hartlepool died, and the relevant battles that took place on those days. The most common surnames in Hartlepool. The first and last…
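The "press of a button" step can be sketched as a summary per town fed into a text template. The records below are hypothetical (only the ages 67 and 14 echo the Hartlepool example), and the template wording is an assumption, not the designer's actual page.

```python
from collections import Counter
from datetime import date

# Hypothetical records: (surname, age at death, date of death, home town).
records = [
    ("Smith", 19, date(1916, 7, 1), "Hartlepool"),
    ("Jones", 67, date(1917, 4, 9), "Hartlepool"),
    ("Smith", 14, date(1916, 7, 1), "Hartlepool"),
    ("Brown", 23, date(1916, 7, 1), "Huddersfield"),
]


def town_summary(records, town):
    """Headline figures for one town, or None if the town has no records."""
    rows = [r for r in records if r[3] == town]
    if not rows:
        return None
    worst_day, deaths_that_day = Counter(r[2] for r in rows).most_common(1)[0]
    return {
        "town": town,
        "total": len(rows),
        "oldest": max(r[1] for r in rows),
        "youngest": min(r[1] for r in rows),
        "worst_day": worst_day.isoformat(),
        "deaths_on_worst_day": deaths_that_day,
        "most_common_surname": Counter(r[0] for r in rows).most_common(1)[0][0],
    }


TEMPLATE = (
    "{town}: {total} recorded deaths. Oldest {oldest}, youngest {youngest}. "
    "Worst day {worst_day} ({deaths_on_worst_day} deaths)."
)

# One page of text per town, generated in a single pass.
all_towns = sorted({r[3] for r in records})
pages = {town: TEMPLATE.format(**town_summary(records, town)) for town in all_towns}
print(pages["Hartlepool"])
```

The same loop scales from two towns to every town in the data set, which is what makes localised pages cheap once the cleaning is done.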

There were a lot of really, really good human interest stories in that. I mean, a 67 year old dying in the war, part of that has got to be an interesting story. Although the journalists working for Trinity Mirror when we gave them this… Yes, they published this. But they also tracked down a lot of these cases, and it made really great coverage for the war centenary. It took work. But it wasn’t impossible though. It was an enormous data set. But if you know what you’re doing with data, if you acquire skills with spreadsheets and databases, it’s not impossible. And a lot of it isn’t even difficult if you acquire those skills.

And we also did a really simple search. So it was, like, a stripped down version of theirs. But based on the intuition, a lot of readers would just want to search by surname or their street, or their town, or a combination of those. So you could find out really a lot of – for example, who died in the war. Or you could find all the people who died in your street or your road, during the war. And it did phenomenally well. Was it 1 million times? A million times now? Something like that, since we set it up, which is good.

So just, sort of, lessons we’ve learned about things. Well, I think making data personally relevant is incredibly important. And it allows us to connect people to issues that they might not, certainly from our point of view, from, sort of, tabloids and, kind of, a major audience point of view, they might not automatically connect with.

An example of that is deprivation. So the deprivation figures come out every year. They’re really awkward, because they’re broken up by difficult to lie out areas. Small neighbourhoods, which were identified by a number, and then by local authority. Manchester 004, something like that. Which don’t easily fit names. They’re usually, “Oh, you mean the area around such-and-such?” Well, yes. But it’s actually more specific than that.

So you can’t actually name these places, other than Manchester 004, which means nothing to anybody. We show people on a map. But people are quite bad at small-scale maps. So if you show somebody a map of their, sort of, area and say, “This is Manchester 004,” they’re, sort of, like – they don’t know about it, or they’re not entirely sure. So it’s not really very handy data.

And also with deprivation, unless you’re one of the most deprived areas, how interested are you? I mean, we may be. But I can tell you the answer is, from a general point of view, “A little bit, maybe. I’d like to be. But I’m too busy reading about Kim Kardashian, or something like that.” So one thing you can do is to make that data not just about the most deprived places. You make it about every single place. And the data set, or the spreadsheet, if you’ve sorted your spreadsheet, you’ve picked out the top ten most deprived areas.

What do you do with the other, you know, thousands and thousands of areas? Well, why don’t you build a gadget? Why don’t you build a gadget? I thought there was a hand up there. There wasn’t. Why don’t you just build a gadget? You can put all that in a gadget, where they can type in their postcode and there is stuff available to match every single person in the country with a singled out area. You can just download it all, and put it into the gadget. Why not do something like that? So people can type in their own postcode, and find out how deprived they are or aren’t.
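As a rough illustration of the kind of postcode gadget described here, the sketch below matches a typed postcode to a small area and returns that area's deprivation rank. It is only a minimal sketch, not the data unit's actual tool: the lookup tables, the postcodes, the ranks, the total number of areas and the function names are all made-up placeholders.

```python
# Hypothetical lookup tables. The postcodes, area codes and ranks are
# made-up placeholders, not real deprivation figures.
POSTCODE_TO_AREA = {
    "M1 1AA": "Manchester 004",
    "M2 2BB": "Manchester 011",
}
AREA_RANK = {
    "Manchester 004": 120,     # assumed convention: 1 = most deprived
    "Manchester 011": 18500,
}
TOTAL_AREAS = 32000  # assumed total number of small areas


def normalise(postcode: str) -> str:
    """Upper-case and re-space a postcode so 'm11aa' matches 'M1 1AA'."""
    compact = "".join(postcode.split()).upper()
    # The last three characters of a UK postcode form the inward code.
    return f"{compact[:-3]} {compact[-3:]}"


def lookup(postcode: str):
    """Return (area, rank, share of areas at or above this rank), or None."""
    area = POSTCODE_TO_AREA.get(normalise(postcode))
    if area is None:
        return None
    rank = AREA_RANK[area]
    return area, rank, rank / TOTAL_AREAS


if __name__ == "__main__":
    print(lookup("m11aa"))
    print(lookup("ZZ9 9ZZ"))
```

Normalising the input first means readers can type their postcode with or without the space, in upper or lower case, and still get a match.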

So this was – was that the sixth..? I think that was the town hall or something. They manage this. So if you view the figures that they’ve used to calculate how deprived it was, often in that overall scale are the mostly deprived. But you can also give them – they also calculate deprivation based on individual categories. So how many people have a job? How many people have health, education, housing, etc., etc.? And you can give them all of that data.

Now you can’t write – I don’t know how many simply out of the area as well. Thousands and thousands. You can’t write an individual story for each of those places. But that’s actually really easy to make. And a lot of people – this story then became, like – this was briefly, like, the most shared thing on the Mirror website. Now to me, that is the moment where I punch the air, because I think I have made people on the Mirror website stop thinking about videos of people punching each other at an airport, and celebrity stuff, and you know, who Manchester United are going to buy. And they’re reading about deprivation. And that has to be a win.

Now you could argue how interested are they in the issue of deprivation? Are they just logging on to reassure themselves that their area is actually all right? But they’re still reading about it. And they’re reading this story. And that’s got to be a win. We did… Yes?


Question: In terms – slightly before. How many – how long would you ask – would it have been on, say, a certain area within the – you’re writing about?


David:   Like, for an area like Manchester?

Question: Yes.


David: We were just relying on the insight. It’s a bit like… The story that we were giving to Manchester would be, “These are the most deprived places in Manchester.” Or it might have been – I can’t remember – “This is an area, which has slipped into bottom.” It might have been a story about just the most deprived areas in that – in Manchester. Or if there was a better line… You know, say for example, Manchester used to have one that was rock bottom and it’s climbed off – a good news story. Whatever. We’d probably have focused on that. But, you know, 400 words or something.

But the way we marketed those stories with this is, “This is the most deprived in Manchester. Find out where your…” You know, as part of that, “Find out where your…” You’re selling them on the interactivity. And then they get there and they’ll read about where the most deprived place is or something. And then they get to the interactive, which is embedded in the story somewhere, and they have a play with that. And then maybe they carry on reading. I hope they do. And then they’ve learned something, and they’ve got something they can share. And they’ve got something where they can send to other people and say, “Have you seen this?” And then more people are reading about it. Yes.

There is talk about, like, a build your own data. That’s also a data source. And then surveys, and things like that. You have to be careful with – make clear that, you know, a lot of the time that we’re not demographically weighting them in a way, so that they don’t carry… You have to be careful about what they do, and what statistical validity they do and don’t have.

But one of the things that we would do in elections – and there is a lot of talk with election coverage generally that… And I think it’s generally true. That the politicians now – everything they do on the election campaign is driven by the [grit 0:33:08] on any single day. They’ll go to a place. They’ll have a message that they are going to deliver. They will take maybe one question from a local journalist. And the question will be about, “This local, you know, bridge is falling down, blah, blah, blah.” And they’ll go, “Yes. That sounds very interesting.” But then they’ll talk about the thing that they want to talk about.

And then before the journalist will say, “Well, what the hell was that? You didn’t answer my question.” They’ll be off. And it’s no bother, it seems to me, that the journalists are just taking sound bites and just sticking to the agenda, and don’t ask any questions at all. Looking of course, like, this is major politics. So what we thought was, “Is there any way we can, sort of, get some of that in the agenda?” So we set up a, sort of, ‘My Manifesto’ project, where we polled readers.

I mean, I would say our poll was really, really well promoted in the push, which was asking people – this was mobile friendly, people on the go. Essentially a poll, where people would say ‘male / female’, where they’re from, the age they were. And then we’d ask them certain questions. Some local and some national. But also lots of local stuff. And from that we did, sort of, deduce the things that they felt the most strongly about. The things where there was the strongest – you know, “This should be a really important election issue.” And from that, we were able to draw up a, sort of, local manifesto for each of our areas.

And these are the things that we say… And a huge amount of people actually did those. They did have quite a bit of, sort of, weight behind them. We were able to draw up, like, a manifesto. And then when the politicians came around, it wasn’t – we were able… Some things to put in their face to, sort of, say, “This is definitively what local people care about. It’s not what you’re here to talk about. Local people want you to do something about this.” And it was quite successful.

And we teased slightly – it was very simple. And it was just to say… I mean, any manifesto where you have, “This is an amazing manifesto. This is really…” And I think it really is a brilliant manifesto that nobody could disagree with, or maybe they should have done. But it did allow us to set the agenda. It did allow us to highlight those issues and force the politicians to engage with it. And say what they’re going to do about it. So to some extent, it allowed us to mess about with the agenda and get people re-engaged with that process.

I’m not going to talk about this one. I’ll keep on talking just a tiny bit about visualisation with a purpose. So none of these are, like, the graphically best thing we’ve done. These are just things that we’ve done. But the point I wanted to make about visualisation with a purpose is that a lot of the time our, sort of, visualisations, which are very beautiful and some of them are designed to be interactive and let people – so that’s absolutely fine. But the most successful visualisations, kind of, tell a bit of a story at a glance. And I think it’s worth being mindful of that.

I think the purpose of your visualisation is to engage people in a story, as opposed to giving them a way of exploring data. It’s really important to be mindful of that. We’ve put out some things that I consider very, sort of, useful and very intricate. People haven’t got them. Like, perhaps they haven’t understood them. They perhaps haven’t understood what we’re trying to say.

And there is a little bit of me that thinks that it’s, kind of like, a story that isn’t really a story. The best stories are – there is a really important thing happening. You can tell it really succinctly in the intro and then you can – you can tell them whatever. But a really good visualisation I think tells a story as well, in fact.

So this is one we did on council cuts. The data is actually again comparable data for 2010, the start of the coalition, and now. The comparable data on how much money the councils have either gained or lost. It’s surprisingly difficult. The government has some really very good spreadsheets with a hidden column. They put important data in actually – literally hidden columns. You have to unhide the columns so you can see them. It’s really weird and quite – I’m not a conspiracy theorist. But if I was, that wouldn’t make it any better. But you can do it. And we did it.

And we essentially did just a really simple Google map, where a council in green has gained money. And this is in an area where you’ve probably seen all sorts of headlines, which are like “The council now has no money left at all.” These have gained money since 2010. Yellow has gone down a little bit. Up to 10% of their budget gone. And the red ones have lost up to and including 20%. So 10-20%. Now you look at that. And I think anybody who has any – regardless of the geography of the UK, there’s obviously a north or south thing. These are also all the major – they’re not… And it’s impossible to look at that and not think, “Those places have been basically…”
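The colour banding described for the council cuts map can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the unit's actual code: the function names are hypothetical, the treatment of an exactly unchanged budget and of an exactly 10% loss is a guess, and losses of more than 20% are returned as "unclassified" because the talk does not say how they were coloured.

```python
def percent_change(budget_2010: float, budget_now: float) -> float:
    """Percentage change in a council's budget between 2010 and now."""
    return (budget_now - budget_2010) * 100 / budget_2010


def colour_band(change_pct: float) -> str:
    """Map a percentage change onto the map colours described in the talk."""
    if change_pct > 0:
        return "green"         # council has gained money since 2010
    if change_pct >= -10:
        return "yellow"        # lost up to 10% (assumed to include no change)
    if change_pct >= -20:
        return "red"           # lost between 10% and 20%
    return "unclassified"      # bigger losses are not covered in the talk


if __name__ == "__main__":
    # Placeholder budgets in pounds, purely for illustration.
    for name, then, now in [("Council A", 100.0, 105.0),
                            ("Council B", 100.0, 95.0),
                            ("Council C", 100.0, 85.0)]:
        print(name, colour_band(percent_change(then, now)))
```

Keeping the banding in one small function makes it easy to adjust the thresholds if the comparable figures are revised.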

Now that’s creative political decision to do that. But it raises a really important question. So that – I mean, that’s created from my personal account. That might be in my own space, from my personal account. Let alone from the group accounts that we also source it from. It was interactive actually that one. You could drill through, and put in the areas. Again, individual data. But that wasn’t why people were sharing it. They were sharing it, because it might reveal something, and because it was a really important story.

That reminds us very quickly of what happens if you skew – change the area of the different parts of Europe in order to reflect the number of refugees living in each of those countries, per 1000 of the population. And I bet if you asked a load of our readers about the UK, it would be something huge. And the opposite happens. Sweden, roughly 16 times the size of the UK. I don’t know why they all stay in Portugal. But it, kind of… Sorry. Say again? You’re translating it, yes?

It kind of tells the story. It shows that actually in terms of refugees, it’s Scandinavia and it’s Eastern and Central Europe, which is taking all of the burden. It isn’t the UK. It’s certainly not true. It’s, kind of, just a myth that we think, you know, “Oh now, it’s…” We think, you know… Maybe it’s the media’s fault! But it tells the story, in fact.
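The arithmetic behind a resized map like this can be sketched in a few lines. This is only an illustration under assumptions: it supposes each country's drawn area is made proportional to its refugees per 1000 population, the function names are hypothetical, and the numbers in the example are placeholders rather than real refugee figures.

```python
import math


def per_thousand(refugees: int, population: int) -> float:
    """Refugees per 1000 of the population."""
    return refugees * 1000 / population


def relative_area(rate: float, reference_rate: float) -> float:
    """How many times larger a country is drawn than the reference country."""
    return rate / reference_rate


def linear_scale(area_ratio: float) -> float:
    """Factor to stretch each side of the shape by to achieve the area ratio."""
    return math.sqrt(area_ratio)


if __name__ == "__main__":
    # Placeholder figures, chosen only to give a 16:1 ratio.
    country_rate = per_thousand(160_000, 10_000_000)      # 16 per 1000
    reference_rate = per_thousand(60_000, 60_000_000)     # 1 per 1000
    ratio = relative_area(country_rate, reference_rate)
    print(ratio, linear_scale(ratio))
```

The square root matters because doubling the width and height of a shape quadruples its area, so a country drawn at 16 times the area is only 4 times as wide.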

This is what happens if everybody at the last election who didn’t vote, didn’t vote at all, had voted for, what we’re calling the ‘I’m not voting party’. The ‘I’m not voting party’ would rule all those cities, which are white. And it just shows the power of non-voters in every election at a glance. Weird that actually white… It’s mainly areas which have the highest degree of – call it apathy if you will, or call it deliberate non-voting. I suspect a lot of it is just ‘can’t be bothered to vote’ or ‘don’t see any point voting’. But nonetheless, it shows the huge amount of power non-voters potentially have.
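The calculation behind a map like this one can be sketched as below. It is a minimal sketch rather than the unit's method: the function name and party labels are hypothetical, the vote counts are placeholders, and it assumes non-voters are simply the electorate minus all votes cast, which ignores spoiled ballots.

```python
def winner_with_non_voters(votes: dict[str, int], electorate: int) -> str:
    """Return the winning 'party' once non-voters are counted as one bloc."""
    non_voters = electorate - sum(votes.values())
    tallies = dict(votes)                    # copy so the input is untouched
    tallies["I'm not voting"] = non_voters
    return max(tallies, key=tallies.get)


if __name__ == "__main__":
    # Placeholder results for one constituency.
    result = {"Party A": 20_000, "Party B": 15_000}
    print(winner_with_non_voters(result, electorate=70_000))
    print(winner_with_non_voters(result, electorate=50_000))
```

Running this for every constituency and colouring the ones where the non-voting bloc comes first would give a map of the kind described.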

This is one we did when Labour were talking about maybe £2 million – a tax on £2 million properties. And there was a lot of talk, “That’s just a volume tax, isn’t it?” And we thought, “Oh, we’ll find out if it really is.” And it kind of would be. Because these are house sales of £2 million or more, over – I think it was a five year period, within the whole of the northeast. A tiny bit of Hyde Park. And we tweeted that out. And again, that just went absolutely mad, because it told the story at a glance. Whether you supported the tax or not, it kind of put a number on the tax.

And that one on the right, finally, is – that shows the percentage voting against Labour at the 2015 election, for each constituency, compared to the previous election in 2010. So in the red areas, Labour’s vote actually went up, in terms of vote share. Grey areas went down a bit. Black areas went down. I think it was 10 percentage points, or something like that. And to me, the correspondence said, “It sort of looks like you had just won on the Labour front.” And that, kind of, is what leaps out.

I mean, it shows you that whatever happened in Scotland with Labour, it wasn’t a minimal thing. That – the bottom picture, that’s what happens to a party which is getting its message, kind of, right in some areas, but not others. Up, down a little bit. The black, that’s an absolute – what happened in Scotland was utterly unique. And that’s a map, which tells that story at a glance. And that’s a map that you’d want to share. And that’s a map that can be used to push you into the analysis of the story.

I’m not going to talk about these different resources, because I’m out of time. But I will take any questions. Yes, just shout out, anybody.

Question: Can you tell us a little bit more about how the unit works in terms of using content for the group? So when you do something like the spread for the missing persons thing, it gets into the news story across several papers. Are you producing the content as a finished article? Or are you producing a, sort of, backbone of information for the individual titles?

David:  We work as a, kind of, insider wire service. So we’ll put out copy. And tailor it towards each title, where there is a good story for that title. And if there are any, sort of, interactive graphics we’ll put them out too. There is, kind of – the way it works is with some of the things we do… The weather one, what needs to be done with that? We’ve got the things that we do. So like, for instance, there are things, which are intrinsically controversial anyway. But we’ve done something, which shows that – I guess, that’s just double in Manchester, or something along those lines really.

We aren’t going to contact the council. Somebody needs to contact the council. The understanding is they will do that and that work, because they have the local contacts. So they won’t – so they’ll do that. We don’t explicitly state, “Oh, why can’t they..?” And it needs this work, on this one. We, kind of – I mean, it’s going to be… You’d expect those news editors to appreciate that. No journalist worth their salt is going to put out a story like that, without giving right to reply. So there is an element of trust, from my part. But I trust that they’re experienced journalists and that’s fine.

And it’s interesting that sometimes they do more than others. So some people might just give the right to reply. The interesting one on Manchester with that, is they also sent somebody out on the streets to talk to those people, to talk about their experience, and they did video interviews with them. They talked to other charities who said, “Yes, this is massively understated on this thing.” And that’s, you know…

I mean, it’s really interesting, to me, to see the difference, because I think that a lot of this – some of the broader dailies actually did quite an in-depth data investigation. And it really benefits from that, kind of, additional work. Some of this, the local agents want to speak to the editors. “What do you need?” That’s a story, isn’t it? Unless you’ve got to try and track it down.


Question: I was just wondering – could you just tell us what are the tools that you use for data scraping?

David: Data scraping? We tend to use OutWit.

Question: OutWit?

David: OutWit Hub. So O-U-T-W-I-T. Of which, there is a free version and a pro version, which is the one we use. Which is, like I said, not particularly expensive and fantastically powerful. It’s also worth – if you’re interested in scraping, there is a really simple, easy-to-use, and completely free thing, called ImportIO, which is worth exploring as well.

Question: What was it called again? Harvest..?

David: The second one?

Question: Yes.

David: ImportIO, as in, like, import-export, dot, and then I-O.

Question: Thank you.

Question:    I’m interested to know what criteria you actually use for, like, homelessness. Some years ago – quite a few years ago, they’d instruct you, if you were going along counting the homeless people. And if the homeless person got up out of the gutter as you went along with your clipboard, you couldn’t count them, which was absolutely ludicrous.

David: A lot of the stuff that we’ve done on that – a lot of the stuff – the stories that we’ve done about homeless stuff… And I should say ‘rough sleeping’, because homelessness is a different thing. But in terms of what you’re talking about, which is rough sleeping, that’s still happening with it. They have a count once a year and exactly that. It’s when… It doesn’t. And I think with those stories, in the past, which were, “Are there only really three rough sleepers in the whole of Birmingham?” And that’s, kind of, the story. Where the story is that the data just cannot possibly be really true. But again, that is a really good example of one where that is going to be a big level of work.

Question: But do you publish the criteria that you use when you put these figures up?

David:  Pretty much. I mean, it would depend on the… I mean, that’s a particularly controversial one. I think there are others about GCSEs, or – what context is there? And they’ve got, sort of, all the results and averaged it out. But with things like that – crime statistics is another one. They have to be quite heavily caveated, as I think we all know. It doesn’t mean they’re useless. It does mean that we should make it clear where there are caveats in this. So we do relay it.

Question: Cool.

Question: I’ve got one actually.

David:  What’s that? Yes.

Question: With one – with three-dozen sets of data coming out a day, do you ever feel like you’re on the verge of a nervous breakdown?

David: Well, we don’t look at them all. I mean, this is the thing. I have to news edit all of the deputy news edits in the same way. No newspaper contains all the news that’s happened in one day. There is always interaction. So all we can do is look at what’s coming up, use the knowledge we have on previous submissions of several things, and then make a judgement based on how interesting / reporting news this is. And to me, it’s a combination.

I really don’t tend to – it’s a combination of things where we’re trying to set up interesting things and things always knowing that we have to get people to read them. So interesting things – probably the deprivation things, where it’s important. Things like that are really important to me. Where we can show that we’re taking something, which is heavily complex and people might say, “That’s worthy. Certainly our audience, the tabloid audience, isn’t interested.” So I’d like to do a lot more. That’s true. But yes, there is a selection process, purely in terms of what we’re going to do.

Question: I was wondering how you might respond to the question that I’d like to have. Like, confrontational type media from, like, an editor’s point of view. Sort of, for example, a lot of the splashes there are about the shedding of the police and the council, and whatever. Very, you know, to the jugular.

David: I think that’s just – I mean, that’s just how newspapers have always been, in terms of headlines. But what was the question? What was the question you wanted to..?

Question: Just about, like, engaging with – about, like, negative and positive, and being, sort of, full-on, and…

David: I mean, to be honest, in terms of how it’s presented in the papers, that’s kind of a matter for local editors. So it’s not something I have to directly address. But my view is if I think something is important – you know, if I think something is… I mean, I know the process from the political point of view. And I don’t claim it’s – I make no claims for any… In terms of objectivity and things like that, you know, I don’t want to make grand claims about that sort of thing. But I think if I recognise something that I genuinely think is important, and I think I’m being fair about it…

I mean, I think you said something like, “These are really – these are genuine stories. You’re not making the stories up.” And I completely agree with that. I think, you know, let’s say for example with the council cuts. That’s, like, one council that’s at 50% now, and it’s a Labour council. And that’s happening under the Conservative government. And no other council is at anything like the figure, in terms of that.

That has to be a story. Do you know what I mean? Regardless of the politics, you know, it can’t not be a story that. And so with something like that – I think you go in hard with something like that. And I think that’s quite legitimate. I mean, I just do. And that’s, kind of, insane, for me. A good test is, for me, if you could say that your Labour council has done that to a Conservative council, I’d feel exactly the same way. That’s a story.

Question: Right. (Applause) Certainly my story about the school submissions – the expert at the beginning. It makes a little bit more sense. But again, you can see that bounding decision, can’t you, in that, I think? But I think there is something that is definitely worth stressing, in terms of – particularly the way the data unit operate, is even though it operates at scale… Don’t confuse that scale with bags and pots of money and resources. I think, in terms of what it does across the range of scale it’s got, it’s a – in a very positive way, it’s a small mind-set, isn’t it?

David: I knew I should have done the free tools side of it.

Question: (Laughter) Just picking up on that point, particularly the OutWit – there is a very similar application called Tableau, which gets a lot of, kind of, traffic in a similar vein. Less of the scraping. More for the presentation. Tomorrow, as I say, one of the things that we’ve got tomorrow is Adam, who is talking about – a bit more of a workshop on the, kind of, investigative process. We’ll also – I’ve got the guy called Matt Burgess, who is going to come in and talk about FOI. About that process, and about how you can, kind of, start to engage with that. And picking up on some points David made about where you ask for data, and who you ask, and that kind of thing.

But there is space tomorrow that if we want to put a little bit of time aside to talk about basic scratching and scraping, I will happily do that as well. That’s part of the aim tomorrow. It’s to try and make it as functional as well, for people. Because I appreciate there is a, kind of, skills thing. So that’s tomorrow. Anyway, for today, five minutes. We’ll get Gary set up, and we’ll be on with our next bit of the presentation. So I’ll give you a shout back, if you want to ___[0:51:14].

