Setting up Python in Windows 7

An all-wise journalist once told me that “everything is easier in Linux,” and after working with it for a few years I’d have to agree — especially when it comes to software setup for data journalism. But …

Many newsroom types spend the day in Windows without the option of Ubuntu or another Linux OS. I’m planning some Python training soon, so I compiled this quick setup guide as a reference. I hope you find it helpful.

Set up Python

Get started:

1. Visit the official Python download page and grab the Windows installer. Choose the 32-bit or 64-bit version, depending on your version of Windows 7 (right-click the Computer icon on your desktop and select Properties to find out which one you have). Note: Python currently exists in two versions, the older 2.x series and newer 3.x series (for a discussion of the differences, see this). This tutorial focuses on the 2.x series.

2. Run the installer and accept all the default settings, including the “C:\Python27” directory it creates.
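Once the installer finishes, a quick way to confirm which Python you ended up with is to ask the interpreter itself. The following is only a minimal sketch, not part of the official setup: the function name describe_python is made up for illustration, and the script uses nothing beyond the standard library. It reports the version number and whether the interpreter is a 32-bit or 64-bit build.

```python
from __future__ import print_function  # keeps print() working on both 2.x and 3.x

import struct
import sys


def describe_python():
    """Return (version string, bitness) for the running interpreter.

    describe_python is a hypothetical helper name used for illustration.
    """
    version = "%d.%d.%d" % sys.version_info[:3]
    # The size of a C pointer in bytes, times 8, gives 32 or 64.
    bits = struct.calcsize("P") * 8
    return version, bits


version, bits = describe_python()
print("Python %s, %d-bit build" % (version, bits))
```

Save it as a .py file and run it with the interpreter you just installed. If the version printed does not start with 2.7, a different Python is being picked up.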

(more…)

csvkit: A Swiss Army Knife for Comma-Delimited Files

If you’ve ever stared into the abyss of a big, uncooperative comma-delimited text file, it won’t take long to appreciate the value and potential of csvkit.

csvkit is a Python-based Swiss Army knife of utilities for dealing with, as its documentation says, “the king of tabular file formats.” It lets you examine, fix, slice, transform and otherwise master text-based data files (and not only the comma-delimited variety, as its name implies, but tab-delimited and fixed-width as well). Christopher Groskopf, lead developer on the Knight News Challenge-winning Panda project and recently a member of the Chicago Tribune’s news apps team, is the primary coder and architect, but the code is hosted on GitHub and has a growing list of contributors.
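To give a feel for what “slicing” a delimited file means, here is a rough sketch that uses only Python’s built-in csv module. It is not csvkit’s code or its interface; the function name select_columns and the sample rows are invented for illustration. It keeps just the named columns from a delimited file.

```python
import csv


def select_columns(lines, wanted, delimiter=","):
    """Yield rows holding only the named columns, header row first.

    select_columns is a hypothetical helper, not a csvkit function.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader)
    # Look up where each wanted column sits in the header row.
    positions = [header.index(name) for name in wanted]
    yield [header[i] for i in positions]
    for row in reader:
        yield [row[i] for i in positions]


# Made-up sample data, for illustration only.
sample = [
    "name,city,amount",
    "Ann,Poughkeepsie,10",
    "Bob,Raleigh,20",
]

rows = list(select_columns(sample, ["name", "amount"]))
for row in rows:
    print(row)
```

Passing delimiter="\t" handles a tab-delimited file the same way. Doing this by hand for every file gets old quickly, which is the gap a dedicated toolkit fills.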

As of version 0.3.0, csvkit comprises 11 utilities. The documentation describes them well, so rather than rehash it, here are highlights of three of the utilities I found interesting during a recent test drive:
(more…)

Free Software and APIs: NICAR 2011 slides

I had the privilege this week of speaking on two panels at the 2011 Investigative Reporters and Editors Computer-Assisted Reporting* conference in Raleigh, N.C. Here are the slides my co-presenters and I put together:

– “Free Software: From Spreadsheets to GIS” with Jacob Fenton of the Investigative Reporting Workshop. Here is part 1, and here’s part 2.

– “APIs: Making the Web a Data Medium” with Derek Willis of The New York Times.

* Those of us with a few miles on the tires remember that the conference used to go by the name NICAR — for National Institute for Computer-Assisted Reporting. People still call it that.

Test Drive: Freebase Gridworks 1.1

Update, 11/10/2010: Since I originally reviewed Freebase Gridworks, it has been acquired by Google. It’s now called Google Refine, and version 2.0 has been released. Original post follows:

——–

Data journalists spend lots of time wrestling dirty data, so when I heard the News Applications team at the Chicago Tribune raving about the data-handling abilities of Freebase Gridworks, my interest was piqued. Anything that can lessen the pain of cleaning data is worth a closer look!

Freebase Gridworks is a Java-based app that runs locally in your web browser. The makers’ pitch describes it best:

… A power tool that allows you to load data, understand it, clean it up, reconcile it internally, augment it with data coming from Freebase, and optionally contribute your data to Freebase for others to use. All in the comfort and privacy of your own computer.

Installation is simple. I chose to load Gridworks on my Windows XP-based work laptop, although you can download Mac and Linux versions from the code page. I was up and running in about five minutes, which included loading a new version of Java. Once it is running, Gridworks presents an opening screen.

You can open an existing project or create a new one by importing a data file — and Gridworks hints at its utility by providing options to parse delimited or non-delimited files, limit the import to specific rows, etc. For testing, I grabbed the Academic Libraries: 2008 Public Use Data file from the National Center for Education Statistics — a tab-delimited text file of about 4,100 rows.
(more…)

Minkoff, Data Delvers and Yours Truly

Michelle Minkoff, perhaps the hardest-working journalism student I’ve ever encountered, has spent the last few months writing up a series of interviews with hacker-journalists and newsroom data nerds at her web site. Her subjects include designers, coders and data lovers of all stripes. Among them are Pulitzer winner Matt Waite of PolitiFact fame, my Gannett colleagues Gregory Korte and Matt Wynn, and the St. Paul Pioneer Press’s Mary Jo Webster, whom I worked with for several years at USA TODAY.

Now add me to the list. Michelle interviewed me right after one of this winter’s east coast blizzards, and my cabin fever shows in the sheer verbosity of my responses. But it was fun reliving my early days — when I discovered the power of merging data and reporting. Here’s one quote:

A reporter in the newsroom came to me and said, “Hey, it would be really good if we could figure out what the most valuable properties are in the city of Poughkeepsie.” And I thought to myself, “You know, this might be a good opportunity for me to go and make friends with the IT guy over in City Hall.” I went over and visited him, he was down in the basement of City Hall, in the computer room. Back in those days, they all had big mainframe computers in an air-conditioned room.

Actually, what I first did was I went to the tax assessor’s office, and I said, “I want a list of all the properties in the city of Poughkeepsie and how much they’ve been assessed for.” And they pointed me over to the corner where there were these big books filled with computer printouts, and they said, “Well, all the numbers are there, and you can just start copying them down.” And I thought to myself, “If they were printed on this piece of paper that looks like computer paper, then certainly they are in a computer somewhere in this building. And I can get that data on a disk that I can bring over and put into my computer.” And that’s how I really started figuring out that we can do computer-assisted reporting by going to the government and getting data.

That’s what I did. I went to visit that guy in City Hall, and I said, “Look, I know you’ve got a file on your computer. I’d love to have you put it on this floppy disk for me.” And he had to check with the local attorneys, and get their permission, and I called up a sunshine advocate in New York state and got him to weigh in, and they agreed that, “Yeah, the law says we can do this.” The next thing I know, I had that data on the computer and was going through it in Paradox. We wound up writing a couple of stories about different properties.

A hat tip to Michelle for a smart way to gain insight into our slice of journalism.

The danger of thinking like it’s 1985

For a devout music fan weaned on what’s now called classic rock, the ’80s were miserable. Sure, we had U2 — they alone helped ease the pain of hair metal and synthpop. But from an audiophile’s perspective, for someone who thinks sound is as important as structure, the era made for painful listening.

Why? Because most music recorded in the ’80s — for all its supposed ambition and technical innovation — sounds more dated, more processed and more fake today than the music of the ’60s and ’70s, including disco. Line up Abbey Road or Dark Side of the Moon next to anything by Duran Duran or Human League and the point is made.

What hurt ’80s music most was the rush to digital sounds. Musicians grabbed every gizmo they could find — synthesizers, drum machines, vocal effects, digital guitar processors — and abandoned their lovely analog gear. When Phil Collins’ engineer figured out how to use a noise gate to make his drums sound as big as a 747, everyone copied. Songs now revolved not around good lyrics or melodies but the sounds of these machines. It all had a big wow factor, but it lacked one important quality:

None of it was timeless.

Oh, people thought it was. That’s what it feels like in the midst of every movement. “This will last forever.” Well …

(more…)