Last week, I deployed my first live Django app. Time from start to finish: three years.
Cue the sound of snickers and a thousand eye-rolls. Go ahead. But I confess: From the moment I said, “I want to build something using Django” to the moment I restarted Apache on my WebFaction server and watched the site load for real in my browser, 36 months passed through the hourglass of time.
You see, I got diverted along the way. I’ll tell you why. But first, two things:
1. Learning is wonderful, thrilling, maddening and rewarding. If you’re a journalist and want to see new worlds, let me encourage you to take a journey into code.
2. The site is right here and the code is here. It falls way short in the Awesome Dept., and it will not save journalism. But that’s not why I built it, really.
* * *
The tale began in March 2009 in Indianapolis at the Investigative Reporters and Editors Computer-Assisted Reporting conference. That’s the annual data journalism hoedown that draws investigative journalists, app coders and academics for a couple of days of nerdish talk about finding and telling stories with data.
Let’s say you want to generate a few hundred — or even a thousand — flat JSON files from a SQL database. Maybe you want to power an interactive graphic but have neither the time nor the desire to spin up a server to dynamically generate the data. Or you think a server adds one more piece of unnecessary complexity and administrative headache. So, you want flat files, each one small for quick loading. And a lot of them.
A few lines of Python are all you need.
I’ve gone this route lately for a few data-driven interactives at USA TODAY, creating JSON files out of large data sets living in SQL Server. Python works well for this, with its JSON encoder/decoder offering a flexible set of tools for converting Python objects to JSON.
Here’s a brief tutorial:
1. If you haven’t already, install Python. Here’s my guide to setup on Windows 7; if you’re on Linux or Mac you should have it already.
2. In your Python script, import a database connector. This example uses pyodbc, which supports connections to SQL Server, MySQL, Microsoft Access and other databases. If you’re using PostgreSQL, try psycopg2.
3. Create a table or tables to query in your SQL database and write and test your query. In this example, I have a table called Students that has a few fields for each student. The query is simple:
SELECT ID, FirstName, LastName, Street, City, ST, Zip
FROM Students
4. Here’s an example script that generates two JSON files from that query. One file contains JSON row arrays, and the other JSON key-value objects. Below, we’ll walk through it step-by-step.
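As a rough sketch of the approach described in the steps above (not the original script), the example below builds both output formats from one query. It makes several assumptions: Python's built-in sqlite3 module stands in for pyodbc so the code runs without a database server, the two student rows are invented sample data, and the file names and temporary output folder are hypothetical.

```python
import json
import os
import sqlite3
import tempfile

# Assumption: sqlite3 stands in for pyodbc so this sketch runs anywhere.
# The table layout follows the Students example; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students (ID INTEGER, FirstName TEXT, LastName TEXT, "
    "Street TEXT, City TEXT, ST TEXT, Zip TEXT)"
)
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "Samantha", "Baker", "9 Main St.", "Hyde Park", "NY", "12538"),
        (2, "Mark", "Salomon", "12 Destination Ave.", "Highland", "NY", "12528"),
    ],
)

cursor = conn.execute(
    "SELECT ID, FirstName, LastName, Street, City, ST, Zip "
    "FROM Students ORDER BY ID"
)
# cursor.description holds one tuple per column; item 0 is the column name.
columns = [col[0] for col in cursor.description]
rows = cursor.fetchall()
conn.close()

# Format 1: each row becomes a JSON array of values.
row_arrays = [list(row) for row in rows]

# Format 2: each row becomes a JSON object keyed by column name.
row_objects = [dict(zip(columns, row)) for row in rows]

# Hypothetical output location and file names.
out_dir = tempfile.mkdtemp()
arrays_path = os.path.join(out_dir, "student_rowarrays.json")
objects_path = os.path.join(out_dir, "student_objects.json")

with open(arrays_path, "w") as f:
    json.dump(row_arrays, f)
with open(objects_path, "w") as f:
    json.dump(row_objects, f)

# For many small flat files, write one file per record instead.
for obj in row_objects:
    record_path = os.path.join(out_dir, "student_%d.json" % obj["ID"])
    with open(record_path, "w") as f:
        json.dump(obj, f)
```

Row arrays make smaller files because the column names are not repeated in every record. Key-value objects are easier to read and use in JavaScript because each value carries its label.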
Briefly, some recaps from my week at the 2012 National Institute for Computer-Assisted Reporting conference, held in late February in St. Louis:
The basics: 2012 marked my 10th NICAR conference, an annual gathering of journalists who work with data and, increasingly, with code to find and tell stories. It’s sponsored by Investigative Reporters and Editors, a nonprofit devoted to improving investigative journalism. Panels ranged from data transparency to regular expressions.
Catch up: The best way to review what you learned (or find out what you missed) is to read Chrys Wu’s excellent collection of presentation links and IRE’s conference blog.
Busy times: Our USA TODAY data journalism team served on a half-dozen panels and demos. With Ron Nixon of The New York Times and Ben Welsh of the Los Angeles Times, I led “Making Sure You Tell a Story,” a reminder to elevate our reporting, graphics and news apps. (Here are the slides from me and Ben.) I also joined Christopher Groskopf for a demo of his super-utility csvkit, which I’ve written about. And, finally, I spoke about USA TODAY’s public APIs and how building them helps newsrooms push content anywhere.
Award!: Our team was excited to pick up the second-place prize in the 2011 Philip Meyer Awards for the Testing the System series by Jack Gillum, Jodi Upton, Marisol Bello and Greg Toppo. Truly an honor.
Surprise Award!: At the Friday evening reception, I received an IRE Service Award for contributing 2010 Census data to IRE, first for sharing with members on deadline and eventually for use in IRE’s census.ire.org site. Colleague and master of all things Census Paul Overberg also was honored, along with the NYT’s Aron Pilhofer, the Chicago Tribune’s Brian Boyer and others. Out of the blue and humbling.
On the Radar: I ran into O’Reilly Radar’s Alex Howard at the conference — the side conversations are always a bonus of these things — and he later emailed me some questions about data journalism. My responses ended up in two pieces he wrote: “In the age of big data, data journalism has profound importance for society” and “Profile of the data journalist: the storyteller and the teacher.”
In 2011, a year when consumers unboxed millions of e-readers, fiction dominated even more of USA TODAY’s Best-Selling Books list. Colleague Carol Memmott and I reported today that 78% of the titles in the weekly book lists last year were fiction, up from 67% in 2007. The finding is one of several covered in our annual look at trends off the book list:
“People are interested in escape,” says Carol Fitzgerald of the Book Report Network, websites for book discussions. “In a number of pages, the story will open, evolve and close, and a lot of what’s going on in the world today is not like that. You’ve got this encapsulated escape that you can enjoy.”
We’ve posted the 100 top-selling titles of 2011 in a handy data table that includes the annual lists back to 2007.
The feeling came a few weeks ago as I drove along a back road near the Potomac River. I was in the lowlands, about to cross from Virginia to Maryland, driving alone during a day in which I’d purposely disconnected from email, Twitter and most things digital.
I think we see things differently on those days.
My car rounded a bend, and through the trees I could see the river. The scene was perfection: bare trees arrayed on a grassy plain, standing watch next to the Potomac. If I’d shot a photo, it would have brushed up against Ansel Adams in intent if not quality. It took my breath, and I gave thanks.
Soon I was on a bridge crossing the river and then into Maryland. But the scene stayed in mind as I drove toward my destination, the road now winding through rustic small towns that seemed to take me even farther from the office.
I’ve thought back on those minutes often as 2011 disappeared into time past. I’ve thought how I need many more of those minutes.
So, you’re the 67-year-old editor of a small-town newspaper who also happens to do the books for a local businessman.
The local businessman’s not just your boss. He’s also the owner/landlord of your newspaper’s office, your residence, your son’s residence and your daughter’s business. You live in one of those in-grown places that dot America, a place where everyone whispers everyone’s business.
One day, you’re arrested. The charge: embezzling $9,000 from this businessman-boss-landlord.
The arrest happens in the middle of the day. Somehow, the local police chief decides to give you a perp walk in handcuffs down a main street of your little town, where everyone knows you and you know everyone. And, somehow, a freelance photographer just happens to be there, takes photos of you perp-walking, and sells them to a rival weekly newspaper, which of course publishes them.
You, the newspaper editor, say it’s all a mistake. Of course she didn’t steal anything … it was an accident!
The town’s in an uproar. Scandal! And on top of it a perp walk right in town for a 67-year-old lady!
At the start of 2011, I simplified the first WordPress theme I’d built for this site and turned it into something far more minimalistic. I went from two sidebars to one, lost the bulky header and turned from color to black and white. Part of this was a desire for simplicity; part was my reaction to my lack of design sense. Color is not my strong suit, and I shouldn’t be caught trying to pretend.
Since then, I’ve made a few tweaks, but one thing I hadn’t done all year was post the theme — which I call Goshen — for anyone to use. Today I fixed that and pushed the files up to their own repository on Github. You can download the files and hack away. (In your WordPress install, under /wp-content/themes/, create a folder called Goshen and unzip the files there; then you can activate the theme via the dashboard.)
I’ll continue to tweak when I have time. I can’t say enough about how much WordPress theme hacking has taught me about HTML, CSS, templates and web design. If you want to start from scratch, I recommend this excellent tutorial. You’ll discover that WordPress themes have only a few moving parts. Mastering them will let you make your site exactly what you want it to be.
Getting my flu shot this week reminded me about weekly surveillance data the Centers for Disease Control and Prevention provides on flu prevalence across the nation. I’d been planning to do some Python training for my team at work, so it seemed like a natural to write a quick Python scraper that grabs the main table on the site and turns it into a delimited text file.
So I did, and I’m sharing. You can grab the code for the CDC-flu-scraper on Github.
The code uses the Mechanize and BeautifulSoup modules for web browsing and HTML parsing, respectively. Much of what I demonstrate here I started learning via Ben Welsh’s fine tutorial on web scraping.
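To show the general idea of turning an HTML table into delimited text, here is a minimal sketch. It is not the scraper from the repository. It swaps in Python's standard-library html.parser for BeautifulSoup so it runs without extra installs, it skips the page fetch that Mechanize would handle, and the sample table, its column names and its values are made up for illustration rather than taken from the CDC site. It is written for Python 3.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Keep text only while inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


# Hypothetical stand-in for a fetched page; the values are invented.
SAMPLE_HTML = """
<table>
  <tr><th>Week</th><th>Region</th><th>Percent positive</th></tr>
  <tr><td>40</td><td>Region 1</td><td>0.5</td></tr>
  <tr><td>40</td><td>Region 2</td><td>0.8</td></tr>
</table>
"""


def table_to_delimited(html, delimiter="|"):
    """Return the table rows found in html as delimited text."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerows(parser.rows)
    return out.getvalue()


text = table_to_delimited(SAMPLE_HTML)
print(text)
```

Writing through the csv module rather than joining strings by hand means any cell that happens to contain the delimiter gets quoted properly.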
We’re still early in flu season, but if you watch this data each week you’ll see the activity pick up quickly.
Update 10/22/2011: Ben Welsh has lent some contributions to this scraper, adding JSON output and turning it into a function. Benefits of social coding 101 …
An all-wise journalist once told me that “everything is easier in Linux,” and after working with it for a few years I’d have to agree — especially when it comes to software setup for data journalism. But …
Many newsroom types spend the day in Windows without the option of Ubuntu or another Linux OS. I’ve been planning some training around Python soon, so I compiled this quick setup guide as a reference. I hope you find it helpful.
Set up Python
1. Visit the official Python download page and grab the Windows installer. Choose the 32-bit version. A 64-bit version is available, but there are compatibility issues with some modules you may want to install later. (Thanks to commenters for pointing this out.)
Note: Python currently exists in two versions, the older 2.x series and newer 3.x series (for a discussion of the differences, see this). This tutorial focuses on the 2.x series.
2. Run the installer and accept all the default settings, including the “C:\Python27” directory it creates.
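One quick way to confirm what the installer set up is to ask the interpreter itself. The short script below is a sketch, not part of the original guide. It reports the version, the install path, and whether the build is 32-bit or 64-bit, which matters for the module compatibility issue mentioned in step 1. It should run unchanged on both the 2.x and 3.x series.

```python
import platform
import struct
import sys

# The size of a C pointer in bytes, times 8, gives the interpreter's
# bitness: 32 for a 32-bit build, 64 for a 64-bit build.
bits = struct.calcsize("P") * 8

print("Python version: %s" % platform.python_version())
print("Interpreter path: %s" % sys.executable)
print("Build: %d-bit" % bits)
```

Save it to a file (the name check_python.py is just an example) and run it from the command prompt with the python command followed by the file name.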
If you’ve ever stared into the abyss of a big, uncooperative comma-delimited text file, it won’t take long to appreciate the value and potential of csvkit.
csvkit is a Python-based Swiss Army knife of utilities for dealing with, as its documentation says, “the king of tabular file formats.” It lets you examine, fix, slice, transform and otherwise master text-based data files (and not only the comma-delimited variety, as its name implies, but tab-delimited and fixed-width as well). Christopher Groskopf, lead developer on the Knight News Challenge-winning Panda project and recently a member of the Chicago Tribune’s news apps team, is the primary coder and architect, but the code’s hosted on Github and has a growing list of contributors.
As of version 0.3.0, csvkit comprises 11 utilities. The documentation describes them well, so rather than rehash it, here are highlights of three of the utilities I found interesting during a recent test drive: