Sorting Data in Excel: Simple Analysis

Sorting a data set helps answer a basic question journalists like to ask: “Which ____ has the highest (or lowest) ______?”

Excel (and other spreadsheets such as the open source Calc) make sorting data easy. In fact, I often make sorting my first step when “interviewing” data because it quickly reveals high and low values and often highlights some that may seem questionable.

Let’s work through a simple sort in Excel. I’ll be using Excel 2007, but older versions have similar functions. Start by downloading the file “sorting.xls” and saving it to your computer. Open it and follow along:

1. We have a table of Census data from the 2006-2008 American Community Survey. It shows the median age of the population for each of 79 school districts in Virginia plus the state itself.

We want to know which district has the oldest and youngest populations. Let’s sort it!

2. Click once on one cell anywhere in the table. This will help Excel auto-discover your table in the next step.

3. On the Excel ribbon, select the “Data” tab and click “Sort.”

4. Two things happened. One, your entire table was selected (or highlighted). Two, a dialog box popped up to offer sorting options. Check off “My data has headers.” That will prevent your header row from getting sorted with the data, and it will add the three column names under the “Sort by” drop down.

5. Under “Sort by,” select “Median.” Under “Order,” select “Largest to Smallest.”

6. Click “OK.” Excel sorts your table, ranking the districts by median age — from highest to lowest. Your first few rows should look like this:

Now, we can do a quick scan and look for patterns. For example, several of the “oldest” counties are in southern Virginia, far away from the Northern Virginia economic engine. Meanwhile, the district with the lowest age is Harrisonburg City Public Schools — with a median age of a barely-legal 22.8. Could the fact that the city hosts two universities have something to do with that?

Good fodder for reporting, all made possible by a simple Excel sort.

A couple of tips and cautions:

– A good general practice is to work on a copy of your original data. Because things happen.

– Excel does best at sorting when your table has a header row and is not contiguous to any unrelated data, such as footnotes. Insert blank rows and columns between the data you want to sort and any information you want to keep separate.

– I recommend selecting only one cell in your table before selecting the “Sort” button. If you grab more than one, Excel may attempt to sort only those cells rather than the whole table. The 2007 version asks if you want to expand the selection, but older versions sometimes do not. This creates the possibility that only some of your data would get sorted, which is a nightmare. Always make sure your entire table gets selected!

– You can sort by more than one field. In Excel 2007, click “Add level” in the sort dialog.

Questions? Tips of your own? Add them below …

3 Responses to “Sorting Data in Excel: Simple Analysis”

  1. [...] This post was mentioned on Twitter by Anthony DeBarros. Anthony DeBarros said: Tell your reporter friends that they must, must, must learn how to sort data in Excel. No excuses. Here's a tutorial: [...]

  2. When I speak to reporters about database journalism, I often get the hackneyed excuses about journalists not being good at math. But as your example illustrates, often some pretty good reporting can come out of an operation no more sophisticated than ranking things from largest to smallest.

    The only other tip I’d add is to insert a new first column of any spreadsheet you’re working with and label it “ID.” Then put a “1″ in the first row, a “2″ in the second, and drag the lower right corner of the cell down to autofill the rest of the column with a unique number for each row.

    This has two purposes: (1) If you’re sorting and you ever want to get the spreadsheet back to the original order the data came in, you can just sort by the ID row. And (2) if you ever graduate on to doing more sophisticated computer-assisted reporting, you’ll be glad you got in the habit. Trust me.

  3. Anthony says:

    Good point, Gregory. It’s always good to leave yourself a breadcrumb trail to get back to where you started, and an ID field works wonders for that.

Leave a Reply