Get JSON from Excel using Python, xlrd

Powering interactive news applications off flat files rather than a call to a database server is an option worth considering. Cutting a production database and data access layer out of the mix eliminates a whole slice of complexity and trims development time. Flat files aren’t right for every situation, but for small apps they’re often all you need.

These days, most of the apps I help build at Gannett Digital consume JSON. Simpler apps — such as the table/modal displays we deployed in February for our Oscar Scorecard and Princeton Review Best Value Colleges — run off one or two JSON files. The more complex — e.g., last year’s Ghost Factories: Poison in the Ground — run off hundreds of them. Updating content is as easy as generating new JSON and uploading it to our file server.

I wrote last year how to use Python to generate JSON files from a SQL database. Lately, I’ve been using Python to make JSON out of Excel spreadsheets.

The key ingredient is the Python library xlrd. It’s part of a suite of Excel-related tools available from www.python-excel.org, including the xlwt library to create Excel files.

(Another choice is openpyxl, which has similar features and works with newer .xlsx formatted Excel files. I’ve used both with equal success. Take your pick.)

Basic xlrd operations

Let’s say we have an Excel workbook containing a small table repeated over three worksheets. The table in each case looks like this:

table

Here are some snippets of code — just scratching the surface — to interact with it programmatically:

# Demonstrates basic xlrd functions for working with Excel files
# (Excel 97-2003)
 
import xlrd
 
# Open the workbook
wb = xlrd.open_workbook('excel-xlrd-sample.xls')
 
# Print the sheet names
print wb.sheet_names()
 
# Get the first sheet either by index or by name
sh = wb.sheet_by_index(0)
 
# Iterate through rows, returning each as a list that you can index:
for rownum in range(sh.nrows):
    print sh.row_values(rownum)
 
# If you just want the first column:
first_column = sh.col_values(0)
print first_column
 
# Index individual cells:
cell_c4 = sh.cell(3, 2).value
# Or you can use:
#cell_c4 = sh.cell(rowx=3, colx=2).value
print cell_c4
 
# Let's say you want the same cell from x identical sheets in a workbook:
x = 2
while x >= 0:
    sh = wb.sheet_by_index(x)
    cell_x = sh.cell(2, 3).value
    print cell_x
    x = x - 1

From Excel to JSON

Pretty cool stuff. Now, let’s convert our sample spreadsheet to JSON. I’ll borrow some of the techniques I discussed when outlining how to use Python to build JSON from a SQL database:

import xlrd
from collections import OrderedDict
import simplejson as json
 
# Open the workbook and select the first worksheet
wb = xlrd.open_workbook('excel-xlrd-sample.xls')
sh = wb.sheet_by_index(0)
 
# List to hold dictionaries
cars_list = []
 
# Iterate through each row in worksheet and fetch values into dict
for rownum in range(1, sh.nrows):
    cars = OrderedDict()
    row_values = sh.row_values(rownum)
    cars['car-id'] = row_values[0]
    cars['make'] = row_values[1]
    cars['model'] = row_values[2]
    cars['miles'] = row_values[3]
 
    cars_list.append(cars)
 
# Serialize the list of dicts to JSON
j = json.dumps(cars_list)
 
# Write to file
with open('data.json', 'w') as f:
    f.write(j)

Here’s the breakdown: We open the workbook, select the sheet and iterate through the available rows (which xlrd conveniently counts using its nrows method).

Add each cell to a key/value pair in a dictionary, then add each dictionary to a list. Dump the list to JSON and write to a file.

Of course, a spreadsheet this simple doesn’t need a Python script to make its way to JSON. Just use Mr. Data Converter for something like this. But as soon as your JSON requirements gain complexity, the ability to use Python to nest nodes, build strings and transform data on the fly make this approach very appealing.

One Response to “Get JSON from Excel using Python, xlrd”

  1. [...] I am able to import the xlsx with ´xlrd´, but not sure how to setup import when the numbers of rows and columns are not fixed. (I am working on the basis of this page [...]

Leave a Reply