Ghost Factories: Behind the Project

This is a cross-post of a recent item I wrote for Investigative Reporters and Editors’ On the Road blog. “Ghost Factories” was perhaps the most fun, interesting and well-executed project I’ve done at USA TODAY, largely because the people and process worked so well. This covers all the moving parts:

*  *  *

In April, after USA TODAY published its Ghost Factories investigation into forgotten lead smelters, we heard from several people who wanted to know more about how the project came together — particularly the online package that included details on more than 230 of the former factories.

The following is an expanded version of a post originally sent to IRE’s NICAR-L mailing list:

Alison Young was the lead reporter who conceived the idea for the project. In late 2010, she came to me with a couple of PDFs showing a list of suspected lead smelter sites, which I parsed into a spreadsheet and plotted on a Google map for her to research. Then she started digging, as one of our editors put it, “armed only with faded photographs, tattered phone directories, obscure zoning records, archival maps, fuzzy memories of residents and shockingly incomplete EPA studies.”

In December 2010, she began filing the first of more than 140 FOIA requests. The requests produced thousands of pages of government documents related to the sites, and to catalog them she created a project inside DocumentCloud. The tool was extremely helpful both for organizing documents and for presentation. Brad Heath of our investigative team would later use the DocumentCloud API to integrate metadata from the documents — particularly their titles — into our database so we could present them online. He also used the API to batch-publish all 372 documents that were included in the project. (He did most of the work using python-documentcloud, a Python wrapper by the Los Angeles Times’ Ben Welsh that makes it easy to interact with the API programmatically.)
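To make the metadata step a little more concrete, here is a minimal Python sketch of what loading document titles into a project database might look like. It is an illustration under stated assumptions, not the code our team used: it does not call the DocumentCloud API or python-documentcloud. Instead it assumes the metadata has already been fetched into a list of dictionaries with hypothetical `id`, `site_id` and `title` fields, and it stores those titles in a SQLite table (also hypothetical) so each factory site's documents can be looked up by title.

```python
import sqlite3


def load_document_titles(conn, records):
    """Upsert document IDs, titles and the site each document belongs to.

    `records` is assumed (hypothetically) to be a list of dicts with
    "id", "site_id" and "title" keys, already fetched from a document API.
    Returns the number of rows written.
    """
    conn.execute(
        """CREATE TABLE IF NOT EXISTS documents (
               doc_id  TEXT PRIMARY KEY,
               site_id INTEGER NOT NULL,
               title   TEXT NOT NULL
           )"""
    )
    # Strip stray whitespace from titles before storing them.
    rows = [(r["id"], r["site_id"], r["title"].strip()) for r in records]
    # INSERT OR REPLACE lets a re-run update titles instead of duplicating rows.
    conn.executemany(
        "INSERT OR REPLACE INTO documents (doc_id, site_id, title) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)


def titles_for_site(conn, site_id):
    """Return the stored document titles for one site, sorted alphabetically."""
    cur = conn.execute(
        "SELECT title FROM documents WHERE site_id = ? ORDER BY title", (site_id,)
    )
    return [row[0] for row in cur.fetchall()]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Made-up sample records, for illustration only.
    sample = [
        {"id": "1001-inspection-report", "site_id": 7, "title": " EPA inspection report "},
        {"id": "1002-zoning-map", "site_id": 7, "title": "Zoning map, 1952"},
        {"id": "1003-soil-tests", "site_id": 12, "title": "Soil test results"},
    ]
    load_document_titles(conn, sample)
    print(titles_for_site(conn, 7))  # prints ['EPA inspection report', 'Zoning map, 1952']
```

Keying the table on the document ID means the load can be rerun whenever titles are edited, which suits a project where documents keep arriving from FOIA requests.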

NICAR 2012: Words and Nerds

Briefly, some recaps from my week at the 2012 National Institute for Computer-Assisted Reporting conference, held in late February in St. Louis:

The basics: 2012 marked my 10th NICAR conference, an annual gathering of journalists who work with data and, increasingly, with code to find and tell stories. It’s sponsored by Investigative Reporters and Editors, a nonprofit devoted to improving investigative journalism. Panels ranged from data transparency to regular expressions.

Catch up: The best way to review what you learned (or find out what you missed) is to read Chrys Wu’s excellent collection of presentation links and IRE’s conference blog.

Busy times: Our USA TODAY data journalism team served on a half-dozen panels and demos. With Ron Nixon of The New York Times and Ben Welsh of the Los Angeles Times, I led “Making Sure You Tell a Story,” a reminder to elevate our reporting, graphics and news apps. (Here are the slides from me and Ben.) I also joined Christopher Groskopf for a demo of his super-utility csvkit, which I’ve written about. And, finally, I spoke about USA TODAY’s public APIs and how building them helps newsrooms push content anywhere.

Award!: Our team was excited to pick up the second-place prize in the 2011 Philip Meyer Awards for the Testing the System series by Jack Gillum, Jodi Upton, Marisol Bello and Greg Toppo. Truly an honor.

Surprise Award!: At the Friday evening reception, I received an IRE Service Award for contributing 2010 Census data to IRE, which shared it with members on deadline and eventually used it in its census.ire.org site. Colleague and master of all things Census Paul Overberg also was honored, along with the NYT’s Aron Pilhofer, the Chicago Tribune’s Brian Boyer and others. Out of the blue and humbling.

On the Radar: I ran into O’Reilly Radar’s Alex Howard at the conference — the side conversations are always a bonus of these things — and he later emailed me some questions about data journalism. My responses ended up in two pieces he wrote: “In the age of big data, data journalism has profound importance for society” and “Profile of the data journalist: the storyteller and the teacher.”

And In Local News … Editor’s Acquitted

So, you’re the 67-year-old editor of a small-town newspaper who also happens to do the books for a local businessman.

The local businessman’s not just your boss. He’s also the owner/landlord of your newspaper’s office, your residence, your son’s residence and your daughter’s business. You live in one of those ingrown places that dot America, a place where everyone whispers everyone’s business.

One day, you’re arrested. The charge: embezzling $9,000 from this businessman-boss-landlord.

The arrest happens in the middle of the day. Somehow, the local police chief decides to give you a perp walk in handcuffs down a main street of your little town, where everyone knows you and you know everyone. And, somehow, a freelance photographer just happens to be there, takes photos of you perp-walking, and sells them to a rival weekly newspaper, which of course publishes them.

You, the newspaper editor, say it’s all a mistake. Of course you didn’t steal anything … it was an accident!

The town’s in an uproar. Scandal! And on top of it a perp walk right in town for a 67-year-old lady!

A Price That Minimizes Risk

Do pricing trends in music and books have any resonance for news and, in particular, investigative journalists?

When Amazon.com recently made a new album by Explosions in the Sky available for $2.99 for 24 hours, it caught my attention.

Until then, I hadn’t bought any of the band’s albums. I’d been mildly interested in EitS since it played an episode of Austin City Limits, but given my limited music-purchase budget, I hadn’t prioritized one of its albums over buying new releases by my favorite artists.

But $2.99 made it too easy. I clicked “buy.”

Later, I thought about the psychology of the buy. Why did $2.99 win me when $4.99 or $5.99 might not have? As I type, the price is back up to $7.99 for a download. Had I stumbled on that title today at that price, I would have passed.

But $2.99 hooked me. Why?

Lessons From a Census Factory

After two months of processing Census data and writing about it here, I’m ready for a nice break. But before I go off to explore other topics, I thought I’d wrap this episode of Census 2010 with a look at how my teammates and I processed the data. My deepest thanks to my colleagues for doing such a great job. And many thanks to the journalists across the U.S. who offered encouragement as we shared our work with the journalism community.

*   *   *   *

On a Thursday afternoon in the first week of February, three of us from our newsroom’s database team gathered at my computer and tried our best to subdue the butterflies swarming in our stomachs. What we were about to do, we hoped, would not only help us cover the year’s biggest demographic story but also help journalists across the country do the same.

That’s because weeks earlier, somewhere in the midst of poring through Census technical manuals and writing a few thousand lines of SAS code, we’d had a bright idea:

Let’s share this.

Really?

Free Software and APIs: NICAR 2011 slides

I had the privilege this week of speaking on two panels at the 2011 Investigative Reporters and Editors Computer-Assisted Reporting* conference in Raleigh, N.C. Here are the slides my co-presenters and I put together:

– “Free Software: From Spreadsheets to GIS” with Jacob Fenton of the Investigative Reporting Workshop. Here is part 1, and here’s part 2.

– “APIs: Making the Web a Data Medium” with Derek Willis of The New York Times.

* Those of us with a few miles on the tires remember that the conference used to go by the name NICAR — for National Institute for Computer-Assisted Reporting. People still call it that.

Data Journalism and the Big Picture

The web-o-sphere this week brought forth a collection of opinions on the value of data journalism and the skills that go with it. To wit:

  • Tim Berners-Lee, he who invented the World Wide Web, told the Guardian that “journalists need to be data-savvy” and that “data-driven journalism is the future.” The story then goes on to question whether data analysis could ever replace traditional reporting.
  • The blog 10,000 Words declared that one of the “5 Myths about digital journalism” is that “journalists must have database development skills” and suggested that most journalists should leave high-level hacking to the experts.
  • Another site, FleetStreetBlues, opined that “amidst all this hype, earnestness and spreadsheet-geekery, here’s the truth about so-called ‘data journalism’. It’s still about the story, stupid.”

There’s been a bunch of reaction to these posts, including a few people pointing out a 1986 Time story that sounds similar to the one this week from the Guardian. And therein lies the problem with all three pieces: None of them benefits from a big-picture, historical perspective on data journalism — not where it came from, not how it’s changed and especially not the massive amount of ground the label covers these days.

We used to call it CAR

Back when software came on 5.25-inch floppy disks, or maybe before then, the idea of using a PC to “crunch numbers” was christened “computer-assisted reporting.” These days, we call it data journalism because, along the way, it became obvious the old name was anachronistic. As Phil Meyer once said, we don’t talk about telephone-assisted reporting, do we?

When I got into the game — when Paradox was the desktop database manager of choice — our newsroom had a personal computer designated as the “CAR station.” While others worked on dumb terminals connected to a mainframe, I was surfing the web with Netscape and ringing up Paul Overberg for advice on Census data. I was the newsroom data expert — the guy reporters called when they had a spreadsheet on a disk or an idea to get data from city hall.

In that era — with database-driven web startups like Amazon.com spreading cultural revolution — it was easy to foresee a time when reporters wouldn’t just get the occasional spreadsheet but find themselves inundated with data. Thus was born (at least in my sphere) the drive to evangelize CAR in the newsroom. We taught Excel, we sent people to IRE boot camps, we set up presentations showing the kinds of stories journalists were landing with these skills. The message of CAR was about finding stories and using simple tools to do it: spreadsheets, databases, maps, stats.

Now we call it hacking

Soon enough, though, the craft began to change and so did the talk at IRE CAR conferences — especially in the hands-on classes and demos. In Philadelphia in 2002, the hands-on classes mostly covered Access, Excel, SPSS and, for the adventurous, SQL Server. Just a few years later, in Cleveland and Houston, the offerings included sessions on web scraping, Perl, Python, MySQL and Django.

It’s Journalism, Yes. But Is It Art?

With journalism in the midst of a reinvention, there’s no shortage of opinions as to which content or practitioners will carry the flag forward. We’ve read enough about whether data is journalism, and we can fill a book with opinions on bloggers and whether what they do is journalism or not.

But here’s another question: Regardless of what you’re doing — writing, coding, designing — is it worthy of being called art?

On a recent trip to New York, we stopped in Mountainville to tour the Storm King Art Center. It’s a 500-acre sculpture museum with works by Maya Lin, Andy Goldsworthy and others who take simple elements and arrange them in fresh, surprising ways. We toured the fields and saw stone, glass, metal and earth crafted into unexpected shapes. The place is massive and so are the works. For example:

Beethoven’s Quartet (front) and Pyramidian by Mark di Suvero:

Frogs Legs, also by di Suvero:

Storm King Wall by Andy Goldsworthy, snaking through a stand of trees:

Write Better: Seven Tips For Journalists

Concise, clear writing is one of the journalist’s best assets. No matter which platform you’re feeding — print, web, mobile or a technology to be named later — good writing separates the amateurs from the pros.

Here are seven ways to improve your word skills. And if these whet your appetite for more, try Roy Peter Clark’s excellent Writing Tools: 50 Essential Strategies for Every Writer or William Strunk Jr. and E.B. White’s classic The Elements of Style. Also helpful are the sections on writing mechanics and grammar from the Purdue Online Writing Lab.


1. Put commas in their place.

You can solve half of the world’s comma problems by remembering this rule:

Add a comma between two independent clauses linked by a coordinating conjunction — and, or, nor, but, yet, for, so. An independent clause has a subject and a verb. Don’t throw a comma before a coordinating conjunction unless what follows is an independent clause.

Right:
The thief stole a television and a laptop, but he left behind a bag with $1,000.

Wrong:
The thief stole a television and laptop, but left behind a bag with $1,000.


2. Conquer its/it’s confusion.

Not knowing the difference between its and it’s says “amateur” the way Chuck E. Cheese says “stimulation overload.”

For the record:

Its = possessive; “belongs to it”
It’s = “it is”

Right:
The team lost its game by one goal.

Right:
It's a beautiful day in the neighborhood.


3. Keep sentences short.

You’re not writing the great American novel. You’re conveying information to readers. Stick to one or two thoughts per sentence. If you have more than two commas in a sentence, try to split it.

Cringe-worthy:
The Burkett County legislature voted Monday to add six new police officers to the county force, adding staff at a time when the county budget is already 5 percent ahead of last year's spending, a level that some activists say will add to a deficit, which at $250 million is already on pace to bankrupt the county by 2012.

Better:
The Burkett County legislature voted Monday to add six new police officers to the county force. The move adds staff while the county budget is already 5 percent ahead of last year's. The level, some activists say, will add to a $250 million deficit that's already on pace to bankrupt the county by 2012.


4. Be active.

Active-verb construction — sentences in subject-verb-object order — carries more punch. You needn’t write every sentence that way, but avoiding passive constructions will make your prose stronger.

Limp:
The mayor was struck by the protester's sign.

Stronger:
A protester's sign hit the mayor.

Notice, also, the substitution of “hit” for “struck.” “Struck” is a word often found in police press releases; others are “perpetrator,” “brandished” and “apprehended.” You don’t use those in conversation. You say “man,” “waved” and “caught.” Write the way you speak — you’ll sound less phony.

The New Journalism Mosaic

Last week’s launch of The Bay Citizen, a San Francisco journalism non-profit that will, among other things, feed The New York Times’ Bay Area report, adds one more piece to a journalism mosaic that’s increasingly experimental, entrepreneurial and, dare I say, hopeful.

It’s pretty amazing, really, to see what’s emerged over the last several years. It’s the antithesis of journalism pre-1995.

Back then, news reporting mostly came in five basic flavors: newspaper, radio, TV, magazine, book. Now, enabled by technology, forced by the economy and in recognition of declines in the core business, journalism’s finding a way forward in smaller, independent ways:

  • Non-profits spinning up to handle investigative or regional journalism. ProPublica, with its recent Pulitzer, is one of the most prominent. But there’s also California Watch (from the Center for Investigative Reporting), Voice of San Diego, The Texas Tribune, Texas Watchdog and many others.
  • Educators breathing life into investigative journalism, such as those at American University’s Investigative Reporting Workshop.
  • For-profit, web-only journalism startups. Washington, D.C., has the soon-to-launch TBD.com from Allbritton Communications. The Faster Times bills itself as “a new type of newspaper for a new type of world.”
  • Data, maps and stories targeted to the block where you live (Adrian Holovaty’s Everyblock).
  • “Community-powered reporting,” where the public suggests and funds stories (Spot.us).
  • Hundreds or thousands of bloggers and citizen journalists who are writing about their town or street — and organizations that aggregate or network them.

I call it a hopeful sign, even if some of it’s not brand new. While legacy journalism battles to refashion itself, and lays off thousands of skilled journalists in the process, from the wreckage emerges a hint of a rebirth.

Particularly encouraging: These ventures often focus on investigative journalism or local coverage that’s been the victim of cuts at legacy institutions, and they’re making smart use of data and analytic journalism.

Whether these efforts thrive or fizzle will, I believe, be determined largely by the quality of the content they produce. But their emergence is good news, whether you’re a journalist fresh out of college or one who needs to reinvent yourself 20 years into a career. Or a reader.

Update, June 1, 2010: Check out this list of promising local news sites from Michele McLellan. There’s even more to this mosaic than you might realize.

Save Journalism? It’s The Content, Kids

Out of the 9,000 words in James Fallows’ recent Atlantic piece “How to Save The News,” this quote from Google News creator Krishna Bharat resonated most with me:

“Usually, you see essentially the same approach taken by a thousand publications at the same time,” [Bharat] told me. “Once something has been observed, nearly everyone says approximately the same thing. … I believe the news industry is finding that it will not be able to sustain producing highly similar articles.”

During my undergrad journalism studies at Marist, then-professor David McCraw assigned us Timothy Crouse’s The Boys on the Bus — a chronicle of the 1972 presidential election from the view of the reporters covering it. Aside from coming away thinking that R.W. Apple was quite the character, the book introduced me to pack journalism — the tendency for news media to follow one another to the point where they all say mostly the same thing.

It’s this tendency that Bharat — in a world where search engines reveal and aggregate everything written on a topic — finds unsustainable. I agree. It seems to me that:

  • Unique content is a journalism organization’s most valuable currency.
  • Breadth, depth and quality of coverage on a topic build uniqueness.
  • Uniqueness breeds reader loyalty.
  • No one will pay you for something they can get free elsewhere.
  • Trying to match “the pack” on stories that wire services and others already have covered pulls you away from achieving the first two points. So, either find something unique to say or don’t bother.

These hold true whether you’re a blogger or a worldwide brand, whether you’re doing stories, photos, news apps, graphics or databases. Why? Because, as Fallows’ story says, Google assumes (smartly, it seems) that people actually are willing to pay for news. But not just anything:

… People inside the press still wage bitter, first-principles debates about whether, in theory, customers will ever be willing to pay for online news, and therefore whether “paywalls” for online news can ever succeed. But at Google, I could hardly interest anyone in the question. The reaction was: Of course people will end up paying in some form—why even talk about it? … The deeper differences [between news orgs and Google] involve Google’s assumptions about what the news business will have to do to “engage” readers again—that is, make them willing to spend time with its printed, online, or on-air products, however much they cost.

If this is true, and I suspect it is, news organizations need to answer a basic question:

What do we have that readers can’t get anywhere else?

IRE’s CAR Conferences: What I’ve Valued

In two and a half weeks, Investigative Reporters and Editors will host the 2010 CAR Conference — the annual gathering of journalists who crunch data for stories and visuals. This year’s conference is in sunny Phoenix, a welcome change of pace for those who’ve endured a few blizzards this winter.

If you’ve never attended and are wondering whether to go, here are five things I’ve found valuable:

– You’ll be challenged to up your game. Every year, I am reminded that if I stand still in developing my skills, I am actually losing ground. The Web has forced journalism to become nimble, and the people and talks here will challenge you to be the same.
– There’s lots of opportunity to learn. Training is a huge component of the conference. People are genuinely open and willing to share data, code and skills.
– You won’t leave empty-handed. Every year, I go home with plenty of tips on new software or programming techniques, sources of data and story ideas.
– Beginners are encouraged. There’s a really good mix of super-technical subjects and sessions for those just starting in data analysis, programming and visualization.
– You’ll meet some smart cookies. The speakers’ list includes Pulitzer winners, folks working in the emerging area of non-profit journalism, expert coders and statisticians, and a load of really, really good journalists all around. Their stories and ideas will inspire you.

Spreading data journalism in the newsroom

A reporter called recently for tips on setting up “a CAR desk” in the newsroom of a decent-sized community newspaper. The editor had watched the reporter’s success at gathering and analyzing data and, as typically happens, now wanted the reporter to train the rest of the newsroom.

Here was my advice:

Focus on a few: Instead of holding building-wide Excel classes or database journalism seminars, start with just one or two reporters who show a combination of interest and decent technical smarts. That lets you go deep on a couple of beats rather than spread yourself thin. Also, success breeds success. Watching a few reporters land great stories may spur interest from others.

Have the right goals: Goals like “publish one CAR story a week” miss the point. Better objectives are to have data-thinking ever present in the reporter’s mind, have the reporter well-versed in her beat’s data sources, and have the reporter develop basic data skills. From that, stories will flow.

Inventory data: Speaking of data sources, have each reporter you work with find out the sets of data local governments keep. File FOIA requests for table layouts and database schemas. Get the data, then study it. That will spur story ideas.

Crawl first, run later: All the hot talk in data journalism these days is on Web frameworks and visualizations, but there’s plenty of work for the beginner in the land of Excel and Access. Build those skills as a starting point.


The danger of thinking like it’s 1985

For a devout music fan weaned on what’s now called classic rock, the ’80s were miserable. Sure, we had U2 — they alone helped ease the pain of hair metal and synthpop. But from an audiophile’s perspective, for someone who thinks sound is as important as structure, the era made for painful listening.

Why? Because most music recorded in the ’80s — for all its supposed ambition and technical innovation — sounds more dated, more processed and more fake today than the music of the ’60s and ’70s, including disco. Line up Abbey Road or Dark Side of the Moon next to anything by Duran Duran or Human League and the point is made.

What hurt ’80s music most was the rush to digital sounds. Musicians grabbed every gizmo they could find — synthesizers, drum machines, vocal effects, digital guitar processors — and abandoned their lovely analog gear. When Phil Collins’ engineer figured out how to use a noise gate to make his drums sound as big as a 747, everyone copied. Songs now revolved not around good lyrics or melodies but the sounds of these machines. It all had a big wow factor, but it lacked one important quality:

None of it was timeless.

Oh, people thought it was. That’s what it feels like in the midst of every movement. “This will last forever.” Well …
