Updates from the Lands of Life & Work

Apologies for the lengthy radio silence. It’s been a busy and complicated couple of months — so busy that I never did write the 2013 year-end wrap I’d planned. Life and work served up some departures from the predictable, and writing fell off the table. A dose of reality.

In the past, each of these nuggets might have been a post of its own, but to get caught up, here’s a mix of work and life highlights in the old USA TODAY Newsline format:

Mass killings interactive: After a year-long effort, last December we published an immersive data viz called Behind the Bloodshed: The Untold Story of America’s Mass Killings. Inspired by the events surrounding the Newtown, Conn., school shooting, it lays out the facts about mass killings over the last 8+ years: They happen often and are most often the result of family issues. My team at Gannett Digital collaborated with USA TODAY’s database team, and a post I wrote for Knight-Mozilla OpenNews’ Source blog explains our tech and process. We and our readers were super-happy with the results. We won the journalistic innovation category of the National Headliner Awards and made the short-lists for the Data Journalism Awards and the Online Media Awards.

NICAR 2014: The annual IRE data journalism conference, held in Baltimore this year, was great. About 1,000 attendees made for the largest turnout ever, and a “Getting Started With Python” session I taught was packed (here’s the Github repo). Highlights always include catching up with friends and colleagues, and as usual I focused on sessions with practical takeaways, such as learning more about d3.js and Twitter bots. Chrys Wu, as always, rounds up everything at her site. Next, I’m hoping to catch the IRE conference in San Francisco in June.

Relaunching our platform: For the last four months at work, I’ve taken a detour away from interactives to help our team that’s extending our publishing platform across all our community news and TV station properties. In short, versions of the complete makeover USA TODAY got in 2012 are now appearing on sites ranging from the Indianapolis Star to Denver’s KUSA-TV. It’s more than cosmetic, though, as Gannett Digital’s also moving all the sites to a shared CMS and Django- and Backbone-powered UX. In addition to desktop, there are all-new mobile web, Android, iPad and iPhone apps. It’s been tiring but rewarding. In the process of personally launching the Wilmington News Journal, Springfield News-Leader, Montgomery Advertiser and several other sites, I’ve gained a better view of the breadth of Gannett’s journalism and found some great opportunities for collaboration.

Other cool work things: While I was relaunching websites, the rest of our interactives team collaborated with USA TODAY’s Brad Heath on his project exploring how felons can escape justice by crossing state lines. I’ve started refactoring the scraper behind our tropical storm tracker to get it ready for the upcoming season. We’ve been bringing Mapbox training to our newsrooms, which has given me the chance to finally dig deeper into TileMill and the Mapbox API. And you might have heard we have some big elections coming up in November. Finally, I recently tried both Google Glass and the Oculus Rift. Check back in five years on whether they’ve changed/saved journalism, but overall the experience reminded me of how I felt when I began using a web browser. Clunky but filled with potential.

Family life: The biggest event of the last while was another detour. At the end of 2013, my wife was hit by a devastating illness that required a lengthy hospital stay and convalescence. In the interests of privacy, I’m not going to post details. But it’s true that these events are life-changers — sitting in the hospital ICU with a first-row seat to a life-and-death drama changes perspectives and priorities quickly. I am sure the lessons we learned from and about people and life will play a big role in how the rest of 2014 plays out for us. (Oh, and before you ask: she’s doing better now.)

Goals for the rest of the year: Between illness and detours, it feels like the year is just getting started. I hope to post more often with Python, data and tech tips. I’ve bought Two Scoops of Django and JavaScript: The Definitive Guide for light summer reading, and I continue to plug away on a writing project that I hope to finish soon. And that’s in addition to lots of family and fun stuff we have in sight.

Thanks for hanging in, and please stay in touch!

NICAR ’14: Getting Started With Python

For a hands-on intro to Python at IRE’s 2014 NICAR conference, I put together a Github repo with code snippets just for beginners.

Find it here: https://github.com/anthonydb/python-get-started

For more Python snippets I’ve found useful, see:
https://github.com/anthonydb/python-snippets

Finally, if you’d like an even deeper dive, check out journalist-coder Tom Meagher’s repository for the Python mini bootcamp held at this year’s conference.

Thanks to everyone who showed up!

Setting up Python in Windows 8.1

One of my family members recently acquired a Windows 8.1 laptop, and I was curious whether Python setup was as easy as when I wrote about installing it on Windows 7. Turns out, it is — and not much different. Which could spawn a whole conversation about Windows OS development, but that’s for another day …

Here’s your quick guide, modified from my earlier Win 7 post:

Set up Python on Windows 8.1

1. Visit the official Python download page and grab the Windows installer. Choose the 32-bit version. A 64-bit version is available, but there are compatibility issues with some modules you may want to install later.

Note: Python currently exists in two versions, the older 2.x series and newer 3.x series (for a discussion of the differences, see this). This tutorial focuses on the 2.x series.

2. Run the installer and accept all the default settings, including the “C:\Python27” directory it creates.
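
Once the installer finishes, a quick sanity check from a Command Prompt confirms everything landed where expected. This assumes you kept the default C:\Python27 location from the step above:

C:\Python27\python.exe --version

If you want to type python from any directory, add C:\Python27 to your PATH, either through Control Panel’s environment-variable settings or, just for the current session, with set PATH=%PATH%;C:\Python27.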

Continue…

FTP with Python: Quick file stashing

Recently at work, our team[1] built an app to track Atlantic and Pacific tropical storms during hurricane season. To keep the map current — and avoid manual processing — we run a Python script at regular intervals to check shapefiles at the National Hurricane Center and convert them to GeoJSON. After conversion, we push the GeoJSON to our cached storage environment. Using flat files keeps things fast and tidy.
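
The conversion step itself isn’t the point of this post, but as a rough sketch of the idea (not our production code), you can hand the shapefile-to-GeoJSON job to GDAL’s ogr2ogr utility from Python. The file paths here are hypothetical, and it assumes ogr2ogr is installed:

# A minimal conversion sketch: shell out to GDAL's ogr2ogr
# to turn a downloaded shapefile into GeoJSON
import subprocess

subprocess.check_call([
    'ogr2ogr', '-f', 'GeoJSON',
    'my-files/forecast.geojson',   # hypothetical output file
    'downloads/forecast.shp'       # hypothetical input shapefile
])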

To publish the files, we ship via FTP. A colleague told me about Python’s built-in FTP support, and I was really happy with how easy it was to write a script to handle the task.

Here’s a sample script that logs into an FTP server and uploads all files in a local directory. Grab it, along with other useful bits, at my python-snippets repository on GitHub.

# Store all files in a local directory to a server
# using FTP
 
from ftplib import FTP
import os
 
# ftp settings
settings = {
    'ftp': {
        'url': 'ftp.some-server.com',
        'username': 'your-account-name',
        'password': 'your-password',
        'remote-directory': '/path/to/files'
    }
}
 
# local paths
paths = {
   'local-directory': 'my-files/'
}
 
# list of local files
files = os.listdir(paths['local-directory'])
 
# connect once, switch to the remote directory, then upload each file
ftp = FTP(settings['ftp']['url'])
ftp.login(settings['ftp']['username'], settings['ftp']['password'])
ftp.cwd(settings['ftp']['remote-directory'])

for f in files:
    with open(os.path.join(paths['local-directory'], f), 'rb') as local_file:
        ftp.storbinary('STOR ' + f, local_file)

ftp.quit()

[1] Python hackery by Andrea Fuller, mapping by Juan Thomassie, design/production direction by Kristin DeRamus and Maureen Linke. I rode devops/producer this time around.

Diary Entry, May 6, 1971

This is a page from my elementary school diary. In some ways, life hasn’t changed much. Also: Thank you, teachers, for letting me run the Bell & Howell projector. That was huge.

Favorite Albums List, 1963-?

I bought my first record sometime in the late 1960s/early 1970s. It was a 7-inch, 45 RPM disc featuring “I Like Science” b/w “We Need the Rain.” Artist unknown.

Nerd from the start is what I say. I also ran the projector in elementary school.

Since then, some of the music I bought has endured — as in my beloved progressive rock — but a lot hasn’t. I regret telling my boss at WPDH-FM, where I played music in the ’80s, that I would always listen to Judas Priest.

Today, here’s what has lasted — for me. To make this list, the album in general or the song specifically still has to raise the heart rate:

A Map of the World — Pat Metheny
The Fire Theft — The Fire Theft
Seconds Out — Genesis
Continue…

Python: Pull Twitter, Facebook User Data

Updated July 4, 2013: The Twitter portion of this post has been revised to reflect Twitter’s retirement of v1 of its API and its move to v1.1. Since writing the original, I also discovered the excellent python-twitter library, which is extremely useful for handling the now-required API authentication and for providing a wrapper around the API itself.

Original post (updated):

The APIs from the two social media giants, Twitter and Facebook, offer plenty of possibilities for data gathering and analysis. From tweets and status messages to numbers of followers and friends, photos, locations and more, there’s a lot of information waiting.

Given my nascent interest in Python, I decided to explore the APIs via some simple scripts, fetching Twitter profile and Facebook page data and writing the values to a SQLite database.

These examples are simple but offer a framework for you (and me) to build upon. SQLite support is built into Python, but for the Facebook script you must install the Requests library if you don’t have it.
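
If Requests isn’t on your machine, installing it is usually a one-liner at the command prompt (assuming you have pip set up):

pip install requests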

Facebook Page Data

This script (available on Github) pulls the number of “likes” and “talking about this” for each Facebook page specified in the list called names_list. It creates a SQLite database called social_data.db if none exists and also creates a table to hold the data.

# Fetch Facebook page metrics via Social Graph API into a SQLite DB
# Grabs the number of likes and "talking about" numbers
 
import requests
import sqlite3
import os
from datetime import datetime
 
# These are the accounts for which you will fetch data
names_list = [
    'fallingskies',
    'usatoday'
]
 
# API base URL
base_url = 'https://graph.facebook.com/'
 
# Function to add row to accounts table
def insert_db(handle, likes, talking):
    conn = sqlite3.connect('social_data.db')
    cur = conn.cursor()
    cur.execute('''
        INSERT INTO fbaccounts VALUES (?,?,?,?);
        ''', (datetime.now(), handle, likes, talking))
    conn.commit()
    conn.close()
 
# Create the database file if it doesn't exist
# (connecting with sqlite3 creates the file)
if not os.path.exists('social_data.db'):
    conn = sqlite3.connect('social_data.db')
    conn.close()
 
# Create the table if it's not in the db
conn = sqlite3.connect('social_data.db')
cur = conn.cursor()
cur.execute('''CREATE TABLE IF NOT EXISTS fbaccounts 
    (FetchDate Date, Handle Text, Likes Integer, Talking Integer)
    ''')
conn.commit()
conn.close()
 
# Iterate over handles and hit the API with each
for user in names_list:
    url = base_url + user 
    print 'Fetching ' + user
    response = requests.get(url)
    profile = response.json()
    handle = profile['name']
    likes = profile['likes']
    talking = profile['talking_about_count']
    insert_db(handle, likes, talking)

Twitter profile data

This script (also on Github) uses the python-twitter library to fetch some basic profile data — screen name, followers and description — into a SQLite DB. To get the keys and access tokens now required by version 1.1 of the Twitter API, you’ll need to register an application under your profile. Start at https://dev.twitter.com/
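
As a rough sketch of the python-twitter piece (the keys below are placeholders you’d swap for the ones from your registered app, and the SQLite half follows the same pattern as the Facebook script above):

import twitter

# Placeholder credentials from your registered Twitter app
api = twitter.Api(consumer_key='YOUR_CONSUMER_KEY',
                  consumer_secret='YOUR_CONSUMER_SECRET',
                  access_token_key='YOUR_ACCESS_TOKEN',
                  access_token_secret='YOUR_ACCESS_TOKEN_SECRET')

# Fetch one profile and print the basics
user = api.GetUser(screen_name='usatoday')
print user.screen_name, user.followers_count
print user.description
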
Continue…

Packing The Stands at Nationals Park

Bryce Harper’s two home runs and Stephen Strasburg’s seven scoreless innings rightly earned the headlines in Monday’s opening-day win for the Washington Nationals. But the data journalist in me couldn’t help but want to apply a little percent change action to the proceedings.

So, I did, especially after I noticed in the boxscore that Nationals Park, on a Monday, was jammed to 108% capacity. A few minutes of research and Excel later, I had some findings:

  • The Nats’ opening day attendance of 45,274 was 11% higher than the team drew at last year’s home opener, when they beat the Reds 3-2 in 10 innings.
  • The attendance wasn’t a record for Nationals Park, but it was close — about 700 below the record set on the last, heartbreaking (if you’re a Nats fan) game of the 2012 NL division series against the St. Louis Cardinals.
  • The day was, however, a regular-season record for the park, which opened in 2008.

After I figured this out, I went a-Googling to see if anyone else had the same scoop. Didn’t find the percent change, but I did see a mention of the attendance record in a post on We Love DC and a mention on a MASN Sports blog that didn’t qualify it against the post-season record. Nothing from major sports media (please comment below if I missed some).

I’m not about to play a baseball writer — it’s one of the few jobs I haven’t had in journalism — but the basics of ballpark attendance get too little attention, I think. And yet the money flowing through the turnstiles means a lot for a team and a city, as does the mental boost for the players who hear the cheers.

Can the Nats keep it up? I’ll be watching this chart at baseball-reference.com.

Get JSON from Excel using Python, xlrd

Powering interactive news applications off flat files rather than a call to a database server is an option worth considering. Cutting a production database and data access layer out of the mix eliminates a whole slice of complexity and trims development time. Flat files aren’t right for every situation, but for small apps they’re often all you need.

These days, most of the apps I help build at Gannett Digital consume JSON. Simpler apps — such as the table/modal displays we deployed in February for our Oscar Scorecard and Princeton Review Best Value Colleges — run off one or two JSON files. The more complex — e.g., last year’s Ghost Factories: Poison in the Ground — run off hundreds of them. Updating content is as easy as generating new JSON and uploading it to our file server.

I wrote last year how to use Python to generate JSON files from a SQL database. Lately, I’ve been using Python to make JSON out of Excel spreadsheets.

The key ingredient is the Python library xlrd. It’s part of a suite of Excel-related tools available from www.python-excel.org, including the xlwt library to create Excel files.

(Another choice is openpyxl, which has similar features and works with newer .xlsx formatted Excel files. I’ve used both with equal success. Take your pick.)
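
For reference, a minimal openpyxl version of the same kind of reads might look like this, assuming a hypothetical sample.xlsx with the same layout (note that openpyxl counts rows and columns from 1, not 0):

from openpyxl import load_workbook

# Open a hypothetical .xlsx version of the workbook
wb = load_workbook('excel-openpyxl-sample.xlsx')
print wb.sheetnames

# Grab the first worksheet and read cell C4
ws = wb.active
print ws.cell(row=4, column=3).value

# Iterate through rows, printing each as a list of values
for row in ws.iter_rows():
    print [cell.value for cell in row]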

Basic xlrd operations

Let’s say we have an Excel workbook containing a small table repeated over three worksheets. The table in each case looks like this:

[Sample worksheet: a small table with columns for car-id, make, model and miles]

Here are some snippets of code — just scratching the surface — to interact with it programmatically:

# Demonstrates basic xlrd functions for working with Excel files
# (Excel 97-2003)
 
import xlrd
 
# Open the workbook
wb = xlrd.open_workbook('excel-xlrd-sample.xls')
 
# Print the sheet names
print wb.sheet_names()
 
# Get the first sheet either by index or by name
sh = wb.sheet_by_index(0)
 
# Iterate through rows, returning each as a list that you can index:
for rownum in range(sh.nrows):
    print sh.row_values(rownum)
 
# If you just want the first column:
first_column = sh.col_values(0)
print first_column
 
# Index individual cells:
cell_c4 = sh.cell(3, 2).value
# Or you can use:
#cell_c4 = sh.cell(rowx=3, colx=2).value
print cell_c4
 
# Let's say you want the same cell from x identical sheets in a workbook:
x = 2
while x >= 0:
    sh = wb.sheet_by_index(x)
    cell_x = sh.cell(2, 3).value
    print cell_x
    x = x - 1

From Excel to JSON

Pretty cool stuff. Now, let’s convert our sample spreadsheet to JSON. I’ll borrow some of the techniques I discussed when outlining how to use Python to build JSON from a SQL database:

import xlrd
from collections import OrderedDict
import simplejson as json
 
# Open the workbook and select the first worksheet
wb = xlrd.open_workbook('excel-xlrd-sample.xls')
sh = wb.sheet_by_index(0)
 
# List to hold dictionaries
cars_list = []
 
# Iterate through each row in worksheet and fetch values into dict
for rownum in range(1, sh.nrows):
    cars = OrderedDict()
    row_values = sh.row_values(rownum)
    cars['car-id'] = row_values[0]
    cars['make'] = row_values[1]
    cars['model'] = row_values[2]
    cars['miles'] = row_values[3]
 
    cars_list.append(cars)
 
# Serialize the list of dicts to JSON
j = json.dumps(cars_list)
 
# Write to file
with open('data.json', 'w') as f:
    f.write(j)

Here’s the breakdown: We open the workbook, select the sheet and iterate through the available rows (which xlrd conveniently counts in the sheet’s nrows attribute).

Add each cell to a key/value pair in a dictionary, then add each dictionary to a list. Dump the list to JSON and write to a file.

Of course, a spreadsheet this simple doesn’t need a Python script to make its way to JSON. Just use Mr. Data Converter for something like this. But as soon as your JSON requirements gain complexity, the ability to use Python to nest nodes, build strings and transform data on the fly makes this approach very appealing.
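
For example, here’s a quick sketch of the kind of nesting that’s easy in Python but tedious by hand: grouping the rows by make. It builds on the cars_list from the script above, and the output file name is just an example:

from collections import OrderedDict
import simplejson as json

# Group the flat list of car dicts by make
# (assumes cars_list from the script above)
makes = OrderedDict()
for car in cars_list:
    makes.setdefault(car['make'], []).append({
        'model': car['model'],
        'miles': car['miles']
    })

nested = [{'make': make, 'cars': cars} for make, cars in makes.items()]

with open('data-nested.json', 'w') as f:
    f.write(json.dumps(nested, indent=4))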

The 2012 Best-Selling Books

Watching book sales has been an endlessly fascinating piece of my journalistic life, and the recent analysis I published of 2012’s top-selling titles from USA TODAY’s Best-Selling Books list is no exception.

The top 10 (posted via a nifty datatables.js embedded table) reflects the year’s most prominent trend — two series from two authors accounted for seven of the top 10 slots and about 25% of the sales the list tracked.

The table also includes annual top 100 lists back to 2007.

One tidbit that didn’t get a lot of notice: constant favorite “To Kill a Mockingbird” passed 800 weeks on the USA TODAY list. From my view, the most welcome book news of the year.

Tracking Hurricane Sandy

Update 7:30 p.m. Tuesday, Oct. 30: At my place, Sandy brought a sleepless night in the basement, with winds howling like a freight train and a battery-powered backup sump pump the only thing standing between us and a flooded house. We are thankful that the battery held out all night, even as the water poured in.

All day Monday and Tuesday, my colleagues at USA TODAY did an incredible job with Sandy coverage: Check it out.
Continue…

Excel: Extract text with FIND and MID

Data analysis begins with usable data, and that means every piece organized nicely into its own field where we can count, sort and otherwise test it out.

What if you get a spreadsheet where the pieces of data are all packed in one field? Say, something like this (which I cobbled together from Major League Baseball data in honor of the Nationals’ first playoff appearance):

NAME: Sean Burnett POS: RP AGE: 30 WT: 200 BORN: Dunedin, FL SALARY: 2350000
NAME: Tyler Clippard POS: RP AGE: 27 WT: 200 BORN: Lexington, KY SALARY: 1650000
NAME: Ross Detwiler POS: SP AGE: 26 WT: 174 BORN: St. Louis, MO SALARY: 485000
NAME: Christian Garcia POS: RP AGE: 27 WT: 215 BORN: Miami, FL SALARY: N/A
NAME: Gio Gonzalez POS: SP AGE: 27 WT: 205 BORN: Hialeah, FL SALARY: 3335000
NAME: Mike Gonzalez POS: RP AGE: 34 WT: 215 BORN: Robstown, TX SALARY: N/A
NAME: Ryan Mattheus POS: RP AGE: 28 WT: 215 BORN: Sacramento, CA SALARY: 481000
NAME: Craig Stammen POS: RP AGE: 28 WT: 200 BORN: Coldwater, OH SALARY: 485000
NAME: Drew Storen POS: RP AGE: 25 WT: 180 BORN: Indianapolis, IN SALARY: 498750
NAME: Jordan Zimmermann POS: SP AGE: 26 WT: 218 BORN: Auburndale, WI SALARY: 2300000

Let’s say you want to extract the city of each player’s birth into a separate field. The varying length of each player’s name means the birth place isn’t always in the same position in the string, so a typical text-to-columns operation won’t work. So, how to do it?

The answer lies in two very handy Excel functions: FIND and MID.

FIND locates the characters you specify and returns their numeric position in the string.

MID returns a specified number of characters from a string, beginning at a position you specify.

For example, we can locate the position where each city name begins by using FIND to locate the string “BORN:” in each cell. The city name itself always starts six characters after the position of that string, so we add six to the result:

=FIND("BORN:",A2)+6

In the first row above, the function returns 50. In the second row, 52. We’ll feed that value to the MID function as the starting point for our extraction.

MID takes three arguments: Text or cell, position to start, number of characters to return. So, we use the above FIND function as the second argument and, for now, extract 10 characters:

=MID(A2,FIND("BORN:",A2)+6,10)

That gets us part of the way there. We’re starting in the right spot, but 10 characters isn’t always the length of the city and state, so it leads to choppy results:

Dunedin, F
Lexington,
St. Louis,
Miami, FL 
Hialeah, F
Robstown, 
Sacramento
Coldwater,
Indianapol
Auburndale

What we need to do is tell MID the exact number of characters to return each time even though the length of the city varies. We can figure this out using FIND again.
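
One way to get there (a sketch, assuming every row has a “SALARY:” label right after the birthplace): use a second FIND to locate “SALARY:”, subtract the starting position of the city, then subtract one more for the space between them. That count becomes MID’s third argument:

=MID(A2,FIND("BORN:",A2)+6,FIND("SALARY:",A2)-(FIND("BORN:",A2)+6)-1)

In the first row, that works out to 11 characters, exactly the length of “Dunedin, FL”.
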
Continue…

Catch the Cat: A Simple Python Game

New cat in the house? Check.

Son interested in programming? Check.

Dad eager to learn more Python? Check.

Add it up and you get a little project the two of us worked up over a couple of nights this summer — a Python-based text game called Catch the Cat. You can download the code from its repository on Github. To use it, from a command prompt run:
 

python catchthecat.py

The game’s simple: The cat’s on the loose in the house, and you have to catch it. Every time you move to a new room, the cat moves too — at random. To catch the cat, you both have to arrive in the same place.
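
The heart of the logic is small. Here’s a rough, simplified sketch of the idea (not the actual game code; that’s in the repo), with the cat as an object that hops to a random room each turn:

import random

# Sample rooms for the sketch, not the game's real list
ROOMS = ['kitchen', 'bedroom', 'den', 'bathroom']

class Cat(object):
    def __init__(self):
        self.room = random.choice(ROOMS)

    def move(self):
        # The cat relocates at random after every player move
        self.room = random.choice(ROOMS)

cat = Cat()
player_room = 'den'    # in the game, this comes from the player's input
cat.move()

if player_room == cat.room:
    print 'You caught the cat!'
else:
    print 'The cat is hiding somewhere else ...'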

OK, so we’re not going to unseat EA Sports here, but we met our goals: we had fun and learned some tricks. While I was using the exercise to help my son work through the logic choices required in game play, I got to add a few things to my Python toolkit:

– Using Python’s SQLite integration to build a leaderboard database.

– Writing classes for the cat and player objects.

Have you written Python-based games? Suggestions for improvement here?

Ghost Factories: Behind the Project

This is a cross-post of a recent item I wrote for Investigative Reporters and Editors’ On the Road blog. “Ghost Factories” was perhaps the most fun, interesting and well-executed project I’ve done at USA TODAY, largely because the people and process worked so well. This covers all the moving parts:

*  *  *

In April, after USA TODAY published its Ghost Factories investigation into forgotten lead smelters, we heard from several people who wanted to know more about how the project came together — particularly the online package that included details on more than 230 of the former factories.

The following is an expanded version of a post originally sent to IRE’s NICAR-L mailing list:

Alison Young was the lead reporter who conceived the idea for the project. In late 2010, she came to me with a couple of PDFs showing a list of suspected lead smelter sites, which I parsed into a spreadsheet and plotted on a Google map for her to research. Then she started digging, as one of our editors said, “Armed only with faded photographs, tattered phone directories, obscure zoning records, archival maps, fuzzy memories of residents and shockingly incomplete EPA studies.”

In December 2010, she began filing the first of more than 140 FOIA requests. The requests produced thousands of pages of government documents related to the sites, and to catalog them she created a project inside DocumentCloud. The product was extremely helpful both for organizing documents and for presentation. Brad Heath of our investigative team would later use the DocumentCloud API to integrate metadata from the documents — particularly their titles —  into our database so we could present them online. He also used the API to batch-publish all 372 documents that were included in the project. (He did most of the work using python-documentcloud, a Python wrapper by the Los Angeles Times’ Ben Welsh that makes it easy to interact with the API programmatically.)
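
To give a flavor of the library, here’s a minimal sketch with placeholder credentials and a placeholder search term (not the project’s actual code):

from documentcloud import DocumentCloud

# Placeholder credentials
client = DocumentCloud('you@example.com', 'your-password')

# Pull back matching documents and read their metadata
for doc in client.documents.search('lead smelter'):
    print doc.id, doc.title
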
Continue…

12 Tangents Later, I Publish a Django Site

Last week, I deployed my first live Django app. Time from start to finish: three years.

Cue the sound of snickers and a thousand eye-rolls. Go ahead. But I confess: From the moment I said, “I want to build something using Django” to the moment I restarted Apache on my WebFaction server and watched the site load for real in my browser, 36 months passed through the hourglass of time.

You see, I got diverted along the way. I’ll tell you why. But first, two things:

1. Learning is wonderful, thrilling, maddening and rewarding. If you’re a journalist and want to see new worlds, let me encourage you to take a journey into code.

2. The site is right here and the code is here. It falls way short in the Awesome Dept., and it will not save journalism. But that’s not why I built it, really.

* * *

The tale began March 2009 in Indianapolis at the Investigative Reporters and Editors Computer-Assisted Reporting conference. That’s the annual data journalism hoedown that draws investigative journalists, app coders and academics for a couple of days of nerdish talk about finding and telling stories with data.
Continue…

Generate JSON From SQL Using Python

Let’s say you want to generate a few hundred — or even a thousand — flat JSON files from a SQL database. Maybe you want to power an interactive graphic but have neither the time nor the desire to spin up a server to dynamically generate the data. Or you think a server adds one more piece of unnecessary complexity and administrative headache. So, you want flat files, each one small for quick loading. And a lot of them.

A few lines of Python is all you need.

I’ve gone this route lately for a few data-driven interactives at USA TODAY, creating JSON files out of large data sets living in SQL Server. Python works well for this, with its JSON encoder/decoder offering a flexible set of tools for converting Python objects to JSON.

Here’s a brief tutorial:

1. If you haven’t already, install Python. Here’s my guide to setup on Windows 7; if you’re on Linux or Mac you should have it already.

2. In your Python script, import a database connector. This example uses pyodbc, which supports connections to SQL Server, MySQL, Microsoft Access and other databases. If you’re using PostgreSQL, try psycopg2.

3. Create a table or tables to query in your SQL database and write and test your query. In this example, I have a table called Students that has a few fields for each student. The query is simple:
 

SELECT ID, FirstName, LastName, Street, City, ST, Zip
FROM Students
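
As a quick sketch of steps 2 and 3 together (connecting with pyodbc and running the query), here’s roughly what that looks like. The connection string is hypothetical; adjust it for your own server and database:

import pyodbc

# Hypothetical SQL Server connection string
conn = pyodbc.connect('DRIVER={SQL Server};SERVER=localhost;'
                      'DATABASE=school;Trusted_Connection=yes')
cursor = conn.cursor()

# Run the query and grab the rows plus the column names
cursor.execute('''
    SELECT ID, FirstName, LastName, Street, City, ST, Zip
    FROM Students
''')
rows = cursor.fetchall()
columns = [column[0] for column in cursor.description]

print columns
print len(rows), 'rows fetched'

conn.close()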

4. Here’s an example script that generates two JSON files from that query. One file contains JSON row arrays, and the other JSON key-value objects. Below, we’ll walk through it step-by-step.
Continue…

NICAR 2012: Words and Nerds

Briefly, some recaps from my week at the 2012 National Institute for Computer-Assisted Reporting conference, held in late February in St. Louis:

The basics: 2012 marked my 10th NICAR conference, an annual gathering of journalists who work with data and, increasingly, with code to find and tell stories. It’s sponsored by Investigative Reporters and Editors, a nonprofit devoted to improving investigative journalism. Panels ranged from data transparency to regular expressions.

Catch up: Best way to review what you learned (or find out what you missed) is by reading Chrys Wu’s excellent collection of presentation links and via IRE’s conference blog.

Busy times: Our USA TODAY data journalism team served on a half-dozen panels and demos. With Ron Nixon of The New York Times and Ben Welsh of the Los Angeles Times, I led “Making Sure You Tell a Story,” a reminder to elevate our reporting, graphics and news apps. (Here are the slides from me and Ben.) I also joined Christopher Groskopf for a demo of his super-utility csvkit, which I’ve written about. And, finally, I spoke about USA TODAY’s public APIs and how building them helps newsrooms push content anywhere.

Award!: Our team was excited to pick up the second-place prize in the 2011 Philip Meyer Awards for the Testing the System series by Jack Gillum, Jodi Upton, Marisol Bello and Greg Toppo. Truly an honor.

Surprise Award!: At the Friday evening reception, I received an IRE Service Award for my work contributing 2010 Census data, which IRE shared with members on deadline and eventually used in its census.ire.org site. Colleague and master of all things Census Paul Overberg also was honored, along with the NYT’s Aron Pilhofer, the Chicago Tribune’s Brian Boyer and others. Out of the blue and humbling.

On the Radar: I ran into O’Reilly Radar’s Alex Howard at the conference — the side conversations are always a bonus of these things — and he later emailed me some questions about data journalism. My responses ended up in two pieces he wrote: “In the age of big data, data journalism has profound importance for society” and “Profile of the data journalist: the storyteller and the teacher.”
Continue…

The 2011 Best-Selling Books

In 2011, a year when consumers unboxed millions of e-readers, fiction dominated even more of USA TODAY’s Best-Selling Books list. Colleague Carol Memmott and I reported today that 78% of the titles in the weekly book lists last year were fiction, up from 67% in 2007. The finding is one of several covered in our annual look at trends off the book list:

“People are interested in escape,” says Carol Fitzgerald of the Book Report Network, websites for book discussions. “In a number of pages, the story will open, evolve and close, and a lot of what’s going on in the world today is not like that. You’ve got this encapsulated escape that you can enjoy.”

We’ve posted the 100 top-selling titles of 2011 in a handy data table that includes the annual lists back to 2007.