NICAR 2012: Words and Nerds

Briefly, some recaps from my week at the 2012 National Institute for Computer-Assisted Reporting conference, held in late February in St. Louis:

The basics: 2012 marked my 10th NICAR conference, an annual gathering of journalists who work with data and, increasingly, with code to find and tell stories. It’s sponsored by Investigative Reporters and Editors, a nonprofit devoted to improving investigative journalism. Panels ranged from data transparency to regular expressions.

Catch up: The best way to review what you learned (or find out what you missed) is to read Chrys Wu’s excellent collection of presentation links and IRE’s conference blog.

Busy times: Our USA TODAY data journalism team served on a half-dozen panels and demos. With Ron Nixon of The New York Times and Ben Welsh of the Los Angeles Times, I led “Making Sure You Tell a Story,” a reminder to elevate our reporting, graphics and news apps. (Here are the slides from me and Ben.) I also joined Christopher Groskopf for a demo of his super-utility csvkit, which I’ve written about. And, finally, I spoke about USA TODAY’s public APIs and how building them helps newsrooms push content anywhere.

Award!: Our team was excited to pick up the second-place prize in the 2011 Philip Meyer Awards for the Testing the System series by Jack Gillum, Jodi Upton, Marisol Bello and Greg Toppo. Truly an honor.

Surprise Award!: At the Friday evening reception, I received an IRE Service Award for my work contributing 2010 Census data to IRE, both for sharing with members on deadline and, eventually, for use in IRE’s census.ire.org site. Colleague and master of all things Census Paul Overberg also was honored, along with the NYT’s Aron Pilhofer, the Chicago Tribune’s Brian Boyer and others. Out of the blue and humbling.

On the Radar: I ran into O’Reilly Radar’s Alex Howard at the conference — the side conversations are always a bonus of these things — and he later emailed me some questions about data journalism. My responses ended up in two pieces he wrote: “In the age of big data, data journalism has profound importance for society” and “Profile of the data journalist: the storyteller and the teacher.”

The 2011 Best-Selling Books

In 2011, a year when consumers unboxed millions of e-readers, fiction dominated even more of USA TODAY’s Best-Selling Books list. Colleague Carol Memmott and I reported today that 78% of the titles in the weekly book lists last year were fiction, up from 67% in 2007. The finding is one of several covered in our annual look at trends off the book list:

“People are interested in escape,” says Carol Fitzgerald of the Book Report Network, websites for book discussions. “In a number of pages, the story will open, evolve and close, and a lot of what’s going on in the world today is not like that. You’ve got this encapsulated escape that you can enjoy.”

We’ve posted the 100 top-selling titles of 2011 in a handy data table that includes the annual lists back to 2007.

Again Towards The Analog

The feeling came a few weeks ago as I drove along a back road near the Potomac River. I was in the lowlands, about to cross from Virginia to Maryland, driving alone during a day in which I’d purposely disconnected from email, Twitter and most things digital.

I think we see things differently on those days.

My car rounded a bend, and through the trees I could see the river. The scene was perfection: bare trees arrayed on a grassy plain, standing watch next to the Potomac. If I’d shot a photo, it would have brushed up against Ansel Adams in intent if not quality. It took my breath, and I gave thanks.

Soon I was on a bridge crossing the river and then into Maryland. But the scene stayed in mind as I drove toward my destination, the road now winding through rustic small towns that seemed to take me even farther from the office.

I’ve thought back on those minutes often as 2011 disappeared into time past. I’ve thought how I need many more of those minutes.

And In Local News … Editor’s Acquitted

So, you’re the 67-year-old editor of a small-town newspaper who also happens to do the books for a local businessman.

The local businessman’s not just your boss. He’s also the owner/landlord of your newspaper’s office, your residence, your son’s residence and your daughter’s business. You live in one of those in-grown places that dot America, a place where everyone whispers everyone’s business.

One day, you’re arrested. The charge: embezzling $9,000 from this businessman-boss-landlord.

The arrest happens in the middle of the day. Somehow, the local police chief decides to give you a perp walk in handcuffs down a main street of your little town, where everyone knows you and you know everyone. And, somehow, a freelance photographer just happens to be there, takes photos of you perp-walking, and sells them to a rival weekly newspaper, which of course publishes them.

You, the newspaper editor, say it’s all a mistake. Of course she didn’t steal anything … it was an accident!

The town’s in an uproar. Scandal! And on top of it a perp walk right in town for a 67-year-old lady!

‘Goshen’ WordPress Theme on Github

At the start of 2011, I simplified the first WordPress theme I’d built for this site and turned it into something far more minimalistic. I went from two sidebars to one, lost the bulky header and turned from color to black and white. Part of this was a desire for simplicity; part was my reaction to my lack of design sense. Color is not my strong suit, and I shouldn’t be caught trying to pretend.

Since then, I’ve made a few tweaks, but one thing I hadn’t done all year was post the theme — which I call Goshen — for anyone to use. Today I fixed that and pushed the files up to their own repository on Github. You can download the files and hack away. (In your WordPress install, under /wp-content/themes/, create a folder called Goshen and unzip the files there; then you can activate the theme via the dashboard.)

I’ll continue to tweak when I have time. I can’t say enough about how much WordPress theme hacking has taught me about HTML, CSS, templates and web design. If you want to start from scratch, I recommend this excellent tutorial. You’ll discover that WordPress themes have only a few moving parts. Mastering them will let you make your site exactly what you want it to be.

 

Scraping CDC flu data with Python

Getting my flu shot this week reminded me about weekly surveillance data the Centers for Disease Control and Prevention provides on flu prevalence across the nation. I’d been planning to do some Python training for my team at work, so it seemed like a natural to write a quick Python scraper that grabs the main table on the site and turns it into a delimited text file.

So I did, and I’m sharing. You can grab the code for the CDC-flu-scraper on Github.

The code uses the Mechanize and BeautifulSoup modules for web browsing and HTML parsing, respectively. Much of what I demonstrate here I started learning via Ben Welsh’s fine tutorial on web scraping.
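For anyone curious what the "grab a table, write a delimited file" step boils down to, here is a minimal sketch. It is not the code in the repository. To run without extra installs it swaps in the standard library's html.parser for Mechanize and BeautifulSoup, it targets current Python 3 instead of the 2.x series, and it parses a made-up sample table instead of fetching the CDC page. The names TableExtractor, table_to_delimited and SAMPLE_HTML are all hypothetical.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect cell text from the first <table> in a page (no nested tables)."""

    def __init__(self):
        super().__init__()
        self.rows = []         # finished rows, each a list of cell strings
        self._row = None       # row currently being built
        self._cell = None      # text fragments of the current cell
        self._in_table = False
        self._done = False     # True once the first table has closed

    def handle_starttag(self, tag, attrs):
        if self._done:
            return
        if tag == "table":
            self._in_table = True
        elif self._in_table and tag == "tr":
            self._row = []
        elif self._row is not None and tag in ("td", "th"):
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if self._done:
            return
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace inside the cell to single spaces
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
        elif tag == "table" and self._in_table:
            self._in_table = False
            self._done = True


def table_to_delimited(html, delimiter="\t"):
    """Return the first table in `html` as delimited text."""
    parser = TableExtractor()
    parser.feed(html)
    out = io.StringIO()
    csv.writer(out, delimiter=delimiter, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Made-up sample markup; the real CDC table's columns and values will differ.
SAMPLE_HTML = """
<table>
  <tr><th>State</th><th>Activity level</th></tr>
  <tr><td>Virginia</td><td>Sporadic</td></tr>
  <tr><td>Maryland</td><td>No activity</td></tr>
</table>
"""

if __name__ == "__main__":
    print(table_to_delimited(SAMPLE_HTML), end="")
```

In a real scraper the HTML string would come from a fetched page, and the output would be written to a file instead of printed.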

We’re still early in flu season, but if you watch this data each week you’ll see the activity pick up quickly.

Update 10/22/2011: Ben Welsh has lent some contributions to this scraper, adding JSON output and turning it into a function. Benefits of social coding 101 …

Setting up Python in Windows 7

An all-wise journalist once told me that “everything is easier in Linux,” and after working with it for a few years I’d have to agree — especially when it comes to software setup for data journalism. But …

Many newsroom types spend the day in Windows without the option of Ubuntu or another Linux OS. I’ve been planning some training around Python soon, so I compiled this quick setup guide as a reference. I hope you find it helpful.

Set up Python on Windows 7

Get started:

1. Visit the official Python download page and grab the Windows installer. Choose the 32-bit version. A 64-bit version is available, but there are compatibility issues with some modules you may want to install later. (Thanks to commenters for pointing this out.)

Note: Python currently exists in two versions, the older 2.x series and newer 3.x series (for a discussion of the differences, see this). This tutorial focuses on the 2.x series.

2. Run the installer and accept all the default settings, including the “C:\Python27” directory it creates.
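Once the installer finishes, one quick way to confirm which interpreter you ended up with (version, and 32-bit versus 64-bit) is a short script like the sketch below. It relies only on standard-library modules and should run unchanged on both the 2.x and 3.x series. The function name describe_interpreter is my own invention, not part of Python.

```python
import platform
import struct
import sys


def describe_interpreter():
    """Return the version and bitness of the Python running this script."""
    return {
        "version": "%d.%d.%d" % sys.version_info[:3],
        "major": sys.version_info[0],
        # A C pointer is 4 bytes on a 32-bit build and 8 bytes on a 64-bit build
        "bits": struct.calcsize("P") * 8,
        "implementation": platform.python_implementation(),
    }


if __name__ == "__main__":
    info = describe_interpreter()
    print("Python %(version)s (%(implementation)s), %(bits)d-bit build" % info)
```

If you followed the steps above, the script should report a 2.7.x version and a 32-bit build.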


csvkit: A Swiss Army Knife for Comma-Delimited Files

If you’ve ever stared into the abyss of a big, uncooperative comma-delimited text file, it won’t take long to appreciate the value and potential of csvkit.

csvkit is a Python-based Swiss Army knife of utilities for dealing with, as its documentation says, “the king of tabular file formats.” It lets you examine, fix, slice, transform and otherwise master text-based data files (and not only the comma-delimited variety, as its name implies, but tab-delimited and fixed-width as well). Christopher Groskopf, lead developer on the Knight News Challenge-winning Panda project and recently a member of the Chicago Tribune’s news apps team, is the primary coder and architect, but the code’s hosted on Github and has a growing list of contributors.
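To give a feel for the kind of chore these utilities take off your hands, here is a rough sketch of column slicing written with nothing but Python's built-in csv module. It is not csvkit's code and does not use its API. It is a hand-rolled approximation of what a column-cutting tool does, and the function name cut_columns and the sample data are made up for illustration.

```python
import csv
import io


def cut_columns(text, columns, delimiter=","):
    """Keep only the named columns from delimited text, in the order given."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=columns,
        delimiter=delimiter,
        extrasaction="ignore",   # silently drop columns we did not ask for
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    # Made-up sample rows, for illustration only
    sample = "title,author,year\nBook A,Writer B,2011\n"
    print(cut_columns(sample, ["title", "year"]), end="")
```

Writing this by hand for every new file gets old quickly, which is the appeal of a ready-made toolkit.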

As of version 0.3.0, csvkit comprises 11 utilities. The documentation describes them well, so rather than rehash it, here are highlights of three of the utilities I found interesting during a recent test drive:

My First Earthquake

I was looking at my watch because the meeting was scheduled for an hour, and the hour was nearly over.

We were in a second-floor conference room in the USA TODAY building in McLean, Va. That side of our glass-enclosed HQ faces the intersection of the Dulles Toll Road and the Capital Beltway, and for the last few years we’ve been front-row-center to the construction of new HOT lanes for the Beltway and the work going on for the new Metro Silver Line.

Loud noises are not uncommon.

At 1:50 p.m. I checked the time. I have a bad habit of frequently and obviously looking at my watch, which implies that I am bored or impatient. I’m not; I just like to know what time it is. I’ve always been a clock-watcher. I’m always on time. So, I looked, mentally noting that I had a free hour until my next meeting at 3.

A moment later, the floor began to vibrate. There was a sound, rumbling, like the bulldozers and cranes that had been outside for months, but somehow different.

“Is that a crane coming toward the building?”

I stood to push back the shade and look out the window. I never got that far. The room began shaking from side to side, and people in the next room started exclaiming.

Earthquake, I thought. I dove under the conference table and lay on my side while the room pulsed.

Part of me was in disbelief. They always said earthquakes don’t happen here.

And then it was over, and someone said, “Let’s get out of here!” And then we were outside, everyone trying to make a call on a cell phone and no one getting through.

Some Favorite WordPress Plugins

With the 100-degree heat broiling the East Coast this weekend, I decided to stay inside and make some design and performance tweaks to my site. I added Google +1 buttons to posts and the index page, and I also tweaked some of the settings in my plugins.

Speaking of those, here’s what I’ve been using to make life easier:

Akismet: Gets rid of a ton of comment spam for various Russian “services” so I can spend my time doing other things. You’ll need to sign up for an API key, but otherwise it’s simple and effective.

Contact Form 7: After trying a few contact plugins, I settled on Contact Form 7 and have had great results. It powers my Contacts page, which I prefer to use instead of posting an email address. For spam filtering, I implemented the quiz feature, but the plugin also supports CAPTCHA. I rarely get spam.

Google XML Sitemaps: Generates a sitemap.xml file that Google and other search engines use to index the site. Lets me include or exclude content and control how often to update the file.

A Facelift for a Book List

The USA TODAY Best-Selling Books list has a new look and added interactivity, part of a relaunch of books coverage. It’s been a fun project that has been on my front burner for about three months.

I get to work with all kinds of data at USA TODAY, but the book list has been a constant. When I arrived at USAT in 1997, one of the first projects I took on was to build and analyze an archive of the list to mark its fifth anniversary. Since then, as that archive grew to hold nearly 18 years of data, we’ve used it to anchor stories about authors and trends in publishing. We’re awfully proud of the list, and people in the publishing industry tell us it’s one of the most accurate accounts of Americans’ weekly reading habits.

Last year, we opened the archives up to developers via a Best-Selling Books API. This year, giving the list itself a facelift was the next logical step.

We were fortunate to assemble a crack team of designers, developers and product managers who, in a short time, conceptualized, designed, redesigned, and coded an entirely new collection of book-related pages for our site. What’s new:

A Price That Minimizes Risk

Do pricing trends in music and books have any resonance for news and, in particular, investigative journalists?

When Amazon.com recently made a new album by Explosions in the Sky available for $2.99 for 24 hours, it caught my attention.

Until then, I hadn’t bought any of the band’s albums. I’d been mildly interested in EitS since it played an episode of Austin City Limits, but given my limited music-purchase budget, I hadn’t prioritized one of its albums over buying new releases by my favorite artists.

But $2.99 made it too easy. I clicked “buy.”

Later, I thought about the psychology of the buy. Why did $2.99 win me when $4.99 or $5.99 might not have? As I type, the price is back up to $7.99 for a download. Had I stumbled on that title today at that price, I would have passed.

But $2.99 hooked me. Why?

Persistence

For the last many years, I’ve had an idea for a project. At work, in meetings and casual conversations, if an opening came up for me to tout my vision, I’d take it. Launch the pitch, follow up with an email.

“I’ve said it before, but we really should …”

Sometimes, I wondered whether people were thinking not about my grand idea but rather, “How can I get away from this man?” Mostly, they encouraged me — even though at the end of our talk it would be clear that other priorities held sway, and my pet idea had to go back to the shelf.

And so it did. Until about two weeks ago.

That’s when a spark out of nowhere set fire to the pile of kindling I’d been setting up all that time. Suddenly I found myself giving my pitch and hearing, “Let’s do this.”

And so for the last two weeks I’ve found myself in a room with the very people I’ve been bugging — some of the smartest, most creative people in my company — each one focused on turning this idea into something you’ll be able to see.

And the best part is that the end product is going to be way better than I ever imagined. Because now it won’t be my idea, but OUR idea.

A pile of kindling. A random spark.

Never give up.

Lessons From a Census Factory

After two months of processing Census data and writing about it here, I’m ready for a nice break. But before I go off to explore other topics, I thought I’d wrap this episode of Census 2010 with a look at how my teammates and I processed the data. My deepest thanks to my colleagues for doing such a great job. And many thanks to the journalists across the U.S. who offered encouragement as we shared our work with the journalism community.

*   *   *   *

On a Thursday afternoon in the first week of February, three of us from our newsroom’s database team gathered at my computer and tried our best to subdue the butterflies swarming in our stomachs. What we were about to do, we hoped, would not only help us cover the year’s biggest demographic story but also help journalists across the country do the same.

That’s because weeks earlier, somewhere in the midst of poring through Census technical manuals and writing a few thousand lines of SAS code, we’d had a bright idea:

Let’s share this.

Really?

Census 2010 State Stories: Week 8

The eighth and final (phew!) week of Census 2010 P.L. 94 redistricting data releases brought data nerds back to East Coast states, including one of the largest, New York. Here’s my final roundup of interesting stories and data applications made by journalists for this round of the Census:

District of Columbia: With 39,000 fewer black people since 2000, the nation’s capital is on the verge of seeing blacks lose majority status there, The Washington Post wrote. Its story explained:

The demographic change is the result of almost 15 years of gentrification that has transformed large swaths of Washington, especially downtown. As housing prices soared, white professionals priced out of neighborhoods such as Dupont Circle began migrating to predominantly black areas such as Petworth and Brookland.

The Post offered a ward-by-ward graphic explaining the city’s population changes, and its interactive map was updated to include D.C. along with Maryland and Virginia.

Maine: The state, which is 94% white, lost population in its northern and eastern counties, The Bangor Daily News reported. On that page, note the BDN’s use of a Census Bureau-provided interactive map, one of many cases where news orgs picked up a government-issued graphic.

Census 2010 State Stories: Week 7

This week’s release of nine states’ worth of Census data took us from corner to corner of the U.S. — from Alaska to Florida — with a bunch of upper Midwest states thrown in. Only eight states plus Washington, D.C., are left.

My USA TODAY colleague Paul Overberg and I continued pulling each state’s data for our interactive map and state profile pages, and our shop continued to write at least one story about each state. This week, reporter Dennis Cauchon’s story on North Dakota’s population boom was picked up by the Drudge Report and became our site’s top story for a day and a half. Who’d have thought?

Here’s a rundown of interesting stories and interactives:

Smart story: Rob Chaney of Montana’s The Missoulian wrote about Huson, one of 85 new “places” designated by the Census Bureau in the 2010 count. Shows what you can do if you can think non-numbers about a numbers story. Don’t miss the final quote.

Which web browsers do journalists favor?

After I started playing with Internet Explorer 9 tonight — and knowing that most developers, including Microsoft, want to wean the world from IE6 as soon as possible — I grew curious about the browsers favored by my site’s visitors. A quick dig into Google Analytics gave me the data for the last few months, and the Google Charts API let me build a quick pie:

Site visits by browser, November 2010-March 2011

I can’t know for sure, but I suspect that most people who read my site are journalists or developers. Most traffic comes from links I post on Twitter or via search keywords that tend toward journalism, data, math and, lately, the Census.

Generally, you’re not an IE-centric crowd: just 12%. That’s lower than overall metrics, which tend to place Internet Explorer at 40% or more of the overall market.

Oh, and the percent using IE6? Less than 0.4%.

Census 2010 State Stories: Week 6

Week 6 in the Census 2010 redistricting data rollout included some of the nation’s most populous states — California, Ohio and Pennsylvania among them — and one of the deepest selections of stories and news apps yet.

Highlights:

Arizona: The state’s 46% increase in Hispanic residents in the last decade was a prime mover in its growth, The Arizona Republic reported. The New York Times’ story says that Arizona’s Hispanic growth was slower than expected, however, and some activists suspect an undercount.

Census 2010 State Stories: Week 5

This week’s release of Census 2010 redistricting data for Delaware, Kansas, Nebraska, North Carolina and Wyoming brought the number of states out so far to 26. Next week, biggies California, Arizona, Ohio and Pennsylvania are among seven states due. So, if you’re looking for national stories, you’ll soon have more than enough of a national data set to mine.

On to this week’s highlights. USA TODAY added stories on each state released in Week 5, and we updated our interactive map and data profile pages. A quick take on our stories:

Delaware: Mike Chalmers of The News Journal in Wilmington wrote that the state’s two smaller southern counties grew much faster than its more-populous northern county. Asians, he wrote, were the state’s fastest growing racial group, up 75.6%. (Also see Chalmers’ lengthier analysis at DelawareOnline.)

Free Software and APIs: NICAR 2011 slides

I had the privilege this week of speaking on two panels at the 2011 Investigative Reporters and Editors Computer-Assisted Reporting* conference in Raleigh, N.C. Here are the slides my co-presenters and I put together:

“Free Software: From Spreadsheets to GIS” with Jacob Fenton of the Investigative Reporting Workshop. Here is part 1, and here’s part 2.

“APIs: Making the Web a Data Medium” with Derek Willis of The New York Times.

* Those of us with a few miles on the tires remember that the conference used to go by the name NICAR — for National Institute for Computer-Assisted Reporting. People still call it that.