Let’s say you want to generate a few hundred — or even a thousand — flat JSON files from a SQL database. Maybe you want to power an interactive graphic but have neither the time nor the desire to spin up a server to dynamically generate the data. Or you think a server adds one more piece of unnecessary complexity and administrative headache. So, you want flat files, each one small for quick loading. And a lot of them.
A few lines of Python is all you need.
I’ve gone this route lately for a few data-driven interactives at USA TODAY, creating JSON files out of large data sets living in SQL Server. Python works well for this, with its JSON encoder/decoder offering a flexible set of tools for converting Python objects to JSON.
Here’s a brief tutorial:
1. If you haven’t already, install Python. Here’s my guide to setup on Windows 7; if you’re on Linux or Mac you should have it already.
2. In your Python script, import a database connector. This example uses pyodbc, which supports connections to SQL Server, MySQL, Microsoft Access and other databases. If you’re using PostgreSQL, try psycopg2.
3. Create a table or tables to query in your SQL database and write and test your query. In this example, I have a table called Students that has a few fields for each student. The query is simple:
SELECT ID, FirstName, LastName, Street, City, ST, Zip
4. Here’s an example script that generates two JSON files from that query. One file contains JSON row arrays, and the other JSON key-value objects. Below, we’ll walk through it step-by-step.
Given that median is such a valuable statistical measure, it’s baffling that Microsoft’s SQL Server and other relational databases (MySQL, PostgreSQL) don’t have a built-in MEDIAN function. Well, this week, after working through a data set in SQL Server — and deciding I didn’t want to push the data into SPSS to find medians — I hit the web to find a T-SQL workaround.
I found a ton of solutions (some from people with no clue about the difference between median and average), but the one below — adapted from a post by Adam Machanic at sqlblog.com — was the best. It produces accurate results and is fairly speedy to boot.
Here’s an example. Consider this table with student grades from two courses:
We’d like to find the median grade in each class. Here’s the script:
For a session of five-minute “lightning talks” at this week’s 2010 Investigative Reporters and Editors conference in Phoenix, I contributed “Five Essential Queries for SQL Server.” Aside from the basic SELECT statement, these are five techniques that, at least for me, either solved a tricky problem or made coding life more efficient. They came to me after some trial and error or from using the coder’s best friend, Google.
I realize that many journalists prefer the open source (free) MySQL to Microsoft’s product, so I’ve replicated the five queries below in MySQL syntax. You can download script files for either syntax here:
– Five essential queries (MS SQL Server)
– Five essential queries (MySQL)
Feedback and your ideas are welcome. Here they are:
1. Create a temporary table with identity column.
Temp tables are handy for storing and manipulating data when you need a table but don’t want to make it part of your actual schema. In SQL Server, the table variable is held in memory and disappears once the query finishes executing.
DECLARE @tmp TABLE (
id INT IDENTITY(1,1),
INSERT INTO @tmp (FirstName) VALUES ('Bob')
INSERT INTO @tmp (FirstName) VALUES ('Joe')
INSERT INTO @tmp (FirstName) VALUES ('Sally')
SELECT * FROM @tmp