NoVa-Py Talk: Building a Python Package

One of the most popular uses of the API for DocumentCloud, the document research/publishing platform where I work, is to bulk-upload hundreds or thousands of documents. People usually hack their own code together to do this, sometimes using the Python or Ruby wrappers for the API.

After talking with users and hearing their thoughts about the workflow — a desire to have a record of each file’s URL once uploaded, for example — I saw an opportunity to add some luxury to the process. A couple of months, a lot of research, and a few bruises later, I had my first Python package: pneumatic.

pneumatic does a few things to make life easier. It grabs information about each uploaded file and saves it in a SQLite database, which you can dump to csv. It uses Python’s multiprocessing module to try to add some speed (recognizing that this is a network-bound task). And it scans all subfolders for files, which is handy when you obtain a collection of files organized that way.

Learning about Python packaging was as much a part of the project as creating the library itself. The folks at the Northern Virginia Python Users Group were kind enough to invite me to share what I learned recently. Click through the title card to view the slides.



Today’s weather in my inbox, via Python

In the category of “potentially useful but mostly just a learning exercise,” here’s a Python script that emails me the local weather report twice a day. I loaded it on a Raspberry Pi my family gave me as a gift last year, set up a cron task, and now each day when I wake up I have a forecast waiting in my inbox. Makes me feel special!

The script — compatible with Python 3.4 and Python 2.7 — uses the awesome Requests library to fetch two endpoints from the Weather Underground API. One provides a forecast, and the other offers a summary of yesterday’s weather. For emailing, it uses the standard Python smtplib.

The code’s available on Github, so fork it and make it your own. You’ll need to have the Requests and simplejson libraries installed. Contributions are welcome!

Here’s a quick overview on how to set it up:

First, you’ll need to sign up for a Weather Underground API key. The free developer level has more than enough calls per day for this app, so choose that unless you plan to obsess about the weather in an oversized manner.

The API key and your email parameters go into a file:

mail_settings = {
    'address': '',
    'pw': 'your-email-password',
    'smtp': '',
    'from': 'Mr. Weather Robot'
send_to_addresses = ['', '']
api_key = 'your-wunderground-api-key'


Setting up Python in Windows 10

Installing Python under Windows 10 follows a similar script to installs under older versions of the operating system. In fact, this post follows closely on my previous entries about installing Python under Windows 7 and under Windows 8.1. The biggest difference here is that we’re going to work with Python 3 instead of Python 2.

Ready? Here’s your quick guide:

Set up Python on Windows 10

1. Visit the official Python download page and grab the Windows installer for the latest version of Python 3. A couple of notes:

  • Python is currently available in two versions — Python 2 and Python 3. For beginners, that can be confusing. In short, Python 3 is where the language is going; Python 2 has a large base of existing users but isn’t developing beyond bug fixes. Read this for more.
  • By default, the installer provides the 32-bit version. There’s also a 64-bit version available. I’ve generally stuck with 32-bit for compatibility issues with some older packages, but installing is so easy you can experiment with either.

2. Run the installer. You’ll have two options — choose “Customize Installation.”

3. On the next screen, check all boxes under “Optional Features.” Click next.

App launch: 2014 elections forecast

Election Forecast


With more than 1,300 candidates, 507 races, top-line campaign finance data and poll averages for select races, the 2014 midterm elections forecast app we launched in early September is probably the most complex mash-up of data, APIs and home-grown content built yet by our Interactive Applications team at Gannett Digital.

We’re happy with the results — even more because the app is live not only at USA TODAY’s desktop and mobile websites but across Gannett. With the rollout of a company-wide web framework this year, we’re able to publish simultaneously to sites ranging from the Indianapolis Star to my alma mater, The Poughkeepsie Journal.

What’s in the forecast? Every U.S. House and Senate race plus the 36 gubernatorial races up in November with bios, photos, total receipts and current poll averages. For each race, USA TODAY’s politics team weighed in on a forecast for how it will likely swing in November. Check out the Iowa Senate for an example of a race detail page.

Finally, depending on whether you open the app with a desktop, tablet or phone, you’ll get a version specifically designed for that device. Mobile-first was our guiding principle.

Building the backend

This was a complex project with heavy lifts both on design/development and data/backend coding. As usual, I handled the data/server side for our team with assists from Sarah Frostenson.

As source data, I used three APIs plus home-grown content:

— The Project Vote Smart API supplies all the candidate names, party affiliations and professional, educational and political experience. Most of the photos are via Vote Smart, though we supplemented where missing.

— The Sunlight Foundation’s Realtime Influence Explorer API supplies total receipts for House and Senate candidates via the Federal Election Commission.

— From Real Clear Politics, we’re fetching polling averages and projections for the House (USAT’s politics team is providing governor and Senate projections).

The route from APIs to the JSON files that fuel the Backbone.js-powered app goes something like this:

  1. Python scrapers fetch data into Postgres, running on an Amazon EC2 Linux box.
  2. A basic Django app lets the USAT politics team write race summaries, projections and other text. Postgres is the DB here also.
  3. Python scripts query Postgres and spits out the JSON files, combining all the data for various views.
  4. We upload those files to a cached file server, so we’re never dynamically hitting a database.

Meanwhile, at the front

Front-end work was a mix of data-viz and app framework lifting. For the maps and balance-of-power bars, Maureen Linke (now at AP) and Amanda Kirby used D3.js. Getting data viz to work well across mobile and desktop is a chore, and Amanda in particular spent a chunk of time getting the polling and campaign finance bar charts to flex well across platforms.

For the app itself, Jon Dang and Rob Berthold — working from a design by Kristin DeRamus — used Backbone.js for URL routing and views. Rob also wrote a custom search tool to let readers quickly find candidates. Everything then was loaded into a basic template in our company CMS.

This one featured a lot of moving parts, and anyone who’s done elections knows there always are the edge cases that make life interesting. In the end, though, I’m proud of what we pulled off — and really happy to serve readers valuable info to help them decide at the polls in November.

NICAR ’14: Getting Started With Python

For a hands-on intro to Python at IRE’s 2014 NICAR conference, I put together a Github repo with code snippets just for beginners.

Find it here:

For more Python snippets I’ve found useful, see:

Finally, if you’d like an even deeper dive, check out journalist-coder Tom Meagher’s repository for the Python mini bootcamp held at this year’s conference.

Thanks to everyone who showed up!