Getting my flu shot this week reminded me about weekly surveillance data the Centers for Disease Control and Prevention provides on flu prevalence across the nation. I’d been planning to do some Python training for my team at work, so it seemed like a natural to write a quick Python scraper that grabs the main table on the site and turns it into a delimited text file.
So I did, and I’m sharing. You can grab the code for the CDC-flu-scraper on Github.
The code uses the Mechanize and BeautifulSoup modules for web browsing and html parsing, respectively. Much of what I demonstrate here I started learning via Ben Welsh’s fine tutorial on web scraping.
We’re still early in flu season, but if you watch this data each week you’ll see the activity pick up quickly.
Update 10/22/2011: Ben Welsh has lent some contributions to this scraper, adding JSON output and turning it into a function. Benefits of social coding 101 …