Doing Journalism with Data: First Steps, Skills and Tools
I don't think I ever really got started in data journalism one day,
but I guess I saw examples of data journalism
and wanted to work out how those techniques
could help me and other journalists save time or do deeper journalism
or cover stories and issues that weren't being covered.
And probably around 2005 when I saw Adrian Holovaty's work
on Chicago crime,
that was really when I particularly started getting interested
and looking at automation,
not necessarily spreadsheets, but automating things
so I didn't have to do them over and over again.
What is next for you in your work? What are you looking forward to?
The one area where I'm spending a lot of time at the moment is
web security, the ability for journalists to protect their sources,
to protect their information.
Now that might include data security.
But it's particularly if you're dealing with leaks from people.
It's becoming harder and harder to make sure that those people
aren't identified in some way,
for example by publishing the data or through your communications.
And I think most journalists, the vast majority of journalists
are very, very careless and very ignorant
of just how public their communications are.
And obviously we know a lot now about surveillance
and the collection of information about journalists as well as everyone else.
So web security, I think, is the number one issue right now.
And that's something I'm very interested in.
The other area, one that I'm excited about, I guess, rather than pessimistic,
is the ability to tell stories effectively.
So it's one thing to have lots of numbers,
but the narrative, the telling of the story and doing that well,
I think is the next challenge once you've got the data.
How do you see the future of data journalism?
I have absolutely no idea what the future of data journalism holds.
Certainly we can expect more and more data
and as a result, I guess we can expect more and more data tools
and we can expect computers to get more powerful in doing things with that.
Beyond that, it's very difficult to tell what might happen
in terms of what those tools can do,
or what commercial environment we're going to operate in as journalists.
I think there are trends in both directions.
Freedom of information laws are spreading to more and more countries.
But also there's a reaction against them politically.
So in some cases, there's an attempt to narrow the scope of those laws.
And there's also an attempt to broaden it in other areas.
Organizations and governments are becoming better at avoiding
being accountable under those acts.
There's more hiding, I think, of information.
So legally, I think the landscape is going to continue to change,
both for better and for worse.
Scraping becomes easier, but again, I think organizations will get better
at making it harder for us to scrape that information.
And I think that the connection between data is particularly powerful.
I think the ability, you see it in a few examples,
like ProPublica's use of a Facebook login to tell you stories about
how your school performs on a particular issue.
I think that ability to personalize data could be incredibly powerful in the next
10 or 20 years, where by logging in through an account that has information
about us, we can find out more about how a story or an issue affects us.
That also brings up new challenges for making sure we are connected
with the wider social issues and not just people like me.
What is your advice for junior data journalists?
The first thing I would say to a junior journalist who is interested in
data journalism is don't focus on tools.
Don't look at whether you should learn Fusion Tables or Excel.
Focus on stories.
What is the story you want to tell?
What is the issue you're interested in?
What data is available in that area?
And what challenges does that data present?
So is it the case that you need to learn spreadsheets in order to
work out an average or compare figures,
subtract one figure from another?
Or is it because the data actually is a little bit ugly
and is a little bit incomplete
and maybe you need to do some cleaning?
Or is it because the data is very clear,
but you need to visualize it in some way to tell the story?
So different stories and different issues will present different problems.
The best way to learn data journalism is to be guided by each story.
Start with very simple stories that don't present a lot of problems.
And then get progressively more ambitious as you want to tell bigger
and harder stories.
Don't feel you have to tell a big story to begin with.
Tell a very small part of it first.
And then tell another small part.
And then bit by bit you can start to build up the jigsaw of the big picture.
Often that's how big stories evolve.
They don't come out all in one piece.
They come out bit by bit and then something happens.
A threshold is passed and we get the big story.