<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Anthony DeBarros &#187; News technology</title>
	<atom:link href="http://www.anthonydebarros.com/category/news-technology/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.anthonydebarros.com</link>
	<description>Data, journalism, code &#38; life</description>
	<lastBuildDate>Sat, 21 Aug 2010 22:05:47 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Test Drive: Freebase Gridworks 1.1</title>
		<link>http://www.anthonydebarros.com/2010/06/06/freebase-gridworks-1-1/</link>
		<comments>http://www.anthonydebarros.com/2010/06/06/freebase-gridworks-1-1/#comments</comments>
		<pubDate>Sun, 06 Jun 2010 20:44:26 +0000</pubDate>
		<dc:creator>Anthony</dc:creator>
				<category><![CDATA[News technology]]></category>

		<guid isPermaLink="false">http://www.anthonydebarros.com/?p=660</guid>
		<description><![CDATA[Data journalists spend lots of time wrestling dirty data, so when I heard the News Applications team at the Chicago Tribune raving about the data-handling abilities of Freebase Gridworks, my interest was piqued. Anything that can lessen the pain of cleaning data is worth a closer look! Freebase Gridworks is a Java-based app that runs [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Data journalists</strong> spend lots of time wrestling dirty data, so when I heard the News Applications team at the <em>Chicago Tribune</em> <a href="http://blog.apps.chicagotribune.com/2010/05/17/the-gift-of-freebase-gridworks/" target="_blank">raving</a> about the data-handling abilities of <a href="http://code.google.com/p/freebase-gridworks/" target="_blank">Freebase Gridworks</a>, my interest was piqued. Anything that can lessen the pain of cleaning data is worth a closer look!</p>
<p>Freebase Gridworks is a Java-based app that runs locally in your web browser. The makers&#8217; pitch describes it best:</p>
<blockquote><p>&#8230; A power tool that allows you to load data, understand it, clean it up, reconcile it internally, augment it with data coming from <a href="http://www.freebase.com/">Freebase</a>, and optionally contribute your data to Freebase for others to use. All in the comfort and privacy of your own computer.</p></blockquote>
<p>Installation is simple. I chose to load Gridworks on my Windows XP-based work laptop, although you can download Mac and Linux versions from the <a href="http://code.google.com/p/freebase-gridworks/wiki/Downloads?tm=2" target="_blank">code page</a>. I was up and running in about five minutes, which included loading a new version of Java. Once running, the opening screen looks like so (click for larger version):</p>
<p><a href="http://www.anthonydebarros.com/wp-content/uploads/2010/06/openscreen.jpg"><img class="alignnone size-medium wp-image-661" style="border: 0pt none;" title="Open Screen" src="http://www.anthonydebarros.com/wp-content/uploads/2010/06/openscreen-300x177.jpg" alt="" width="300" height="177" /></a></p>
<p>You can open an existing project or create a new one by importing a data file &#8212; and Gridworks hints at its utility by providing options to parse delimited or non-delimited files, limit the import to specific rows, etc. For testing, I grabbed the <a href="http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2010310" target="_blank">Academic Libraries: 2008  Public Use Data file</a> from the National Center for Education Statistics &#8212; a tab-delimited text file of about 4,100 rows.<br />
<span id="more-660"></span></p>
<p>Import was a cinch. Gridworks guessed correctly at the file format and split the columns perfectly:</p>
<p><a href="http://www.anthonydebarros.com/wp-content/uploads/2010/06/initload.jpg"><img class="alignnone size-medium wp-image-664" style="border: 0pt none;" title="Initial load" src="http://www.anthonydebarros.com/wp-content/uploads/2010/06/initload-300x176.jpg" alt="" width="300" height="176" /></a></p>
<p>First thing I tried was data cleanup. Some of the cities in the &#8220;CITY_M&#8221; field were in uppercase and some were capitalized normally. Each column header has a menu of manipulation options, so I chose Edit Cells &gt; Common Transforms &gt; To Titlecase:</p>
<p><a href="http://www.anthonydebarros.com/wp-content/uploads/2010/06/citytrans.jpg"><img class="alignnone size-medium wp-image-670" style="border: 0pt none;" title="City Transform" src="http://www.anthonydebarros.com/wp-content/uploads/2010/06/citytrans-300x177.jpg" alt="" width="300" height="177" /></a></p>
<p>Gridworks chugged along for a few seconds (a progress bar might be handy), but soon enough it returned all the cities in the correct case. Nice!</p>
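<p>If you script your cleanup instead, the same transform is easy to approximate. Here is a rough Python sketch. It is not how Gridworks does it internally, and the function name and sample cities are my own assumptions for illustration:</p>

```python
def to_titlecase(value):
    # Capitalize the first letter of each whitespace-separated word
    # and lowercase the rest, e.g. "NEW YORK" becomes "New York".
    return " ".join(word.capitalize() for word in value.split())


# Hypothetical sample values standing in for the CITY_M column
cities = ["MONTGOMERY", "Birmingham", "NEW YORK"]
cleaned = [to_titlecase(city) for city in cities]
print(cleaned)
```

<p>A simple rule like this will mangle names such as &#8220;McAllen,&#8221; so spot-check the results.</p>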
<p>Next, the ZIP_M field (and ZIP also) had a mix of five-digit zips and some with the &#8220;plus 4&#8221; extension. To separate the plus 4&#8217;s into their own field, I chose Edit Column &gt; Split Into Several Columns. It produced this dialog:</p>
<p><a href="http://www.anthonydebarros.com/wp-content/uploads/2010/06/splitzip.jpg"><img class="alignnone size-medium wp-image-668" title="Split Zip" src="http://www.anthonydebarros.com/wp-content/uploads/2010/06/splitzip-300x176.jpg" alt="" width="300" height="176" /></a></p>
<p>I opted to split the column by field length and typed the values &#8220;5,4&#8221; for the string lengths. To preserve the leading zeros in the zips and extensions, I unchecked the box &#8220;guess cell type&#8221; to keep the fields as text. Gridworks chugged along again, then produced the result, automatically renaming the fields in the process:</p>
<p><a href="http://www.anthonydebarros.com/wp-content/uploads/2010/06/splitzip2.jpg"><img class="alignnone size-medium wp-image-669" style="border: 0pt none;" title="Split Zip 2" src="http://www.anthonydebarros.com/wp-content/uploads/2010/06/splitzip2-300x178.jpg" alt="" width="300" height="178" /></a></p>
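<p>The split-by-field-length idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name is hypothetical, and I am assuming the zips are stored as plain digit strings with no hyphen. Keeping the values as strings is what preserves the leading zeros:</p>

```python
def split_by_lengths(value, lengths):
    # Cut a string into consecutive pieces of the given lengths.
    # Values stay as text, so leading zeros are preserved.
    parts = []
    start = 0
    for length in lengths:
        parts.append(value[start:start + length])
        start += length
    return parts


# Hypothetical sample values: one nine-digit zip, one five-digit zip
print(split_by_lengths("021381234", [5, 4]))
print(split_by_lengths("02138", [5, 4]))
```

<p>A five-digit zip simply yields an empty second piece.</p>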
<p>Another handy feature of Gridworks is its ability to edit field values en masse. If you hover your mouse over a cell, an &#8220;edit&#8221; button appears:</p>
<p><a href="http://www.anthonydebarros.com/wp-content/uploads/2010/06/editcell.jpg"><img class="alignnone size-medium wp-image-675" style="border: 0pt none;" title="Edit Cell" src="http://www.anthonydebarros.com/wp-content/uploads/2010/06/editcell-300x176.jpg" alt="" width="300" height="176" /></a></p>
<p>Clicking it brings up a dialog box where you can change the cell&#8217;s value &#8212; and apply that change to all other cells with the same content. Handy! Here&#8217;s how you could change all the state names of &#8220;AL&#8221; to &#8220;Alabama&#8221;:</p>
<p><a href="http://www.anthonydebarros.com/wp-content/uploads/2010/06/editcell2.jpg"><img class="alignnone size-medium wp-image-676" style="border: 0pt none;" title="Edit Cell 2" src="http://www.anthonydebarros.com/wp-content/uploads/2010/06/editcell2-300x177.jpg" alt="" width="300" height="177" /></a></p>
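<p>In code, that kind of mass edit amounts to replacing every cell in a column that exactly matches one value. A minimal Python sketch follows; the column name STABBR and the sample rows are assumptions of mine, not taken from the NCES file:</p>

```python
def mass_edit(rows, column, old, new):
    # Replace every cell in the column that exactly matches the old value
    # and report how many cells were changed.
    changed = 0
    for row in rows:
        if row[column] == old:
            row[column] = new
            changed += 1
    return changed


# Hypothetical rows; STABBR is an assumed column name
rows = [{"STABBR": "AL"}, {"STABBR": "AK"}, {"STABBR": "AL"}]
changed = mass_edit(rows, "STABBR", "AL", "Alabama")
print(changed, rows)
```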
<p>Data cleanup is clearly a strength, but Gridworks also offers plenty of ways to explore data by creating <a href="http://code.google.com/p/freebase-gridworks/wiki/Faceting" target="_blank">facets</a>, or summaries of data (think using COUNT and GROUP BY in SQL). It produces summary tables that let you quickly find all the unique values in a column &#8212; and edit them if you need to create consistency (e.g., company names spelled several ways).</p>
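<p>To make the COUNT and GROUP BY comparison concrete, here is a small Python sketch of a text facet. It only mimics the idea and is not Gridworks code; the function name and the company names are made up for illustration:</p>

```python
from collections import Counter


def facet(values):
    # Count each distinct value, most frequent first,
    # much like SELECT value, COUNT(*) ... GROUP BY value in SQL.
    return Counter(values).most_common()


# Hypothetical column with one company spelled three ways
names = ["Acme Inc.", "ACME Inc", "Acme Inc.", "Acme, Inc."]
summary = facet(names)
for value, count in summary:
    print(value, count)
```

<p>The stray spellings show up as separate rows in the summary, which is exactly what makes them easy to find and fix.</p>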
<p>Finally, Gridworks lets you export your revised data back to Excel or tab/comma-delimited text files, among other options. Very, very useful.</p>
<p>Judging by its <a href="http://code.google.com/p/freebase-gridworks/wiki/WhatsNew" target="_blank">revision history</a>, Freebase Gridworks is very much an evolving tool but one worth keeping tabs on. This little test drive has probably just scratched the surface of the ways you can use it to standardize your data, but you can get more ideas via the demo videos on the product&#8217;s <a href="http://code.google.com/p/freebase-gridworks/" target="_blank">home page</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.anthonydebarros.com/2010/06/06/freebase-gridworks-1-1/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Minkoff, Data Delvers and Yours Truly</title>
		<link>http://www.anthonydebarros.com/2010/03/08/minkoff-data-delvers-yours-truly/</link>
		<comments>http://www.anthonydebarros.com/2010/03/08/minkoff-data-delvers-yours-truly/#comments</comments>
		<pubDate>Tue, 09 Mar 2010 04:11:46 +0000</pubDate>
		<dc:creator>Anthony</dc:creator>
				<category><![CDATA[News technology]]></category>
		<category><![CDATA[Workflow]]></category>

		<guid isPermaLink="false">http://www.anthonydebarros.com/?p=419</guid>
		<description><![CDATA[Michelle Minkoff, perhaps the hardest-working journalism student I&#8217;ve ever encountered, for the last few months has been writing up a series of interviews with hacker-journalists and newsroom data nerds at her web site. Her subjects include designers, coders and data lovers of all stripes. Among them are Pulitzer winner Matt Waite of PolitiFact fame, [...]]]></description>
			<content:encoded><![CDATA[<p>Michelle Minkoff, perhaps the hardest-working journalism student I&#8217;ve ever encountered, for the last few months has been writing up a <a href="http://michelleminkoff.com/category/data-delvers/" target="_blank">series of interviews</a> with hacker-journalists and newsroom data nerds at her <a href="http://michelleminkoff.com/" target="_blank">web site</a>. Her subjects include designers, coders and data lovers of all stripes. Among them are Pulitzer winner <a href="http://www.mattwaite.com/" target="_blank">Matt Waite</a> of <a href="http://www.politifact.com/" target="_blank">PolitiFact</a> fame, my Gannett colleagues <a href="http://gregorykorte.com/" target="_blank">Gregory Korte</a> and <a href="http://www.tubotu.com/" target="_blank">Matt Wynn</a>, and the St. Paul Pioneer Press&#8217;s <a href="http://michelleminkoff.com/2010/02/20/data-delver-maryjo-webster-pioneer-press/" target="_blank">Mary Jo Webster</a>, whom I worked with for several years at USA TODAY.</p>
<p><a href="http://michelleminkoff.com/2010/03/08/data-delver-tony-debarros-usa-today/" target="_blank">Now add me to the list</a>. Michelle interviewed me right after one of this winter&#8217;s east coast blizzards, and my cabin fever shows in the sheer verbosity of my responses. But it was fun reliving my early days &#8212; when I discovered the power of merging data and reporting. Here&#8217;s one quote:</p>
<blockquote><p>A reporter in the newsroom came to me and said, “Hey, it would be really good if we could figure out what the most valuable properties are in the city of Poughkeepsie.” And I thought to myself, “You know, this might be a good opportunity for me to go and make friends with the IT guy over in City Hall.” I went over and visited him, he was down in the basement of City Hall, in the computer room. Back in those days, they all had big mainframe computers in an air-conditioned room.</p>
<p>Actually,  what I first did was I went to the tax assessor’s office, and I said, “I  want a list of all the properties in the city of Poughkeepsie and how  much they’ve been assessed for.”  And they pointed me over to the  corner where there were these big books filled with computer printouts,  and they said, “Well, all the numbers are there, and you can just start  copying them down.”  And I thought to myself, “If they were printed on  this piece of paper that looks like computer paper, then certainly they  are in a computer somewhere in this building.  And I can get that data  on a disk that I can bring over and put into my computer.” And that’s  how I really started figuring out that we can do computer-assisted  reporting by going to the government and getting data.</p>
<p>That’s what I did.  I went to visit that guy in City Hall, and I  said, “Look, I know you’ve got a file on your computer.  I’d love to  have you put it on this floppy disk for me.”  And he had to check with  the local attorneys, and get their permission, and I called up a  sunshine advocate in New York state and got him to weigh in, and they  agreed that, “Yeah, the law says we can do this.”  The next thing I  know, I had that data on the computer and was going through it in  Paradox.  We wound up writing a couple of stories about different  properties.</p></blockquote>
<p>A hat tip to Michelle for a smart way to gain insight into our slice of journalism.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.anthonydebarros.com/2010/03/08/minkoff-data-delvers-yours-truly/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The danger of thinking like it&#8217;s 1985</title>
		<link>http://www.anthonydebarros.com/2009/11/23/thinking-like-its-1985/</link>
		<comments>http://www.anthonydebarros.com/2009/11/23/thinking-like-its-1985/#comments</comments>
		<pubDate>Mon, 23 Nov 2009 19:49:10 +0000</pubDate>
		<dc:creator>Anthony</dc:creator>
				<category><![CDATA[Journalism]]></category>
		<category><![CDATA[News technology]]></category>

		<guid isPermaLink="false">http://www.anthonydebarros.com/?p=209</guid>
		<description><![CDATA[For a devout music fan weaned on what&#8217;s now called classic rock, the &#8217;80s were miserable. Sure, we had U2 &#8212; they alone helped ease the pain of hair metal and synthpop. But from an audiophile&#8217;s perspective, for someone who thinks sound is as important as structure, the era made for painful listening. Why? Because [...]]]></description>
			<content:encoded><![CDATA[<p><strong>For a devout music fan</strong> weaned on what&#8217;s now called classic rock, the &#8217;80s were miserable. Sure, we had U2 &#8212; they alone helped ease the pain of hair metal and synthpop. But from an audiophile&#8217;s perspective, for someone who thinks sound is as important as structure, the era made for painful listening.</p>
<p>Why? Because most music recorded in the &#8217;80s &#8212; for all its supposed ambition and technical innovation &#8212; sounds more dated, more processed and more fake today than the music of the &#8217;60s and &#8217;70s, including disco. Line up <em>Abbey Road</em> or <em>Dark Side of the Moon</em> next to anything by <a href="http://www.youtube.com/watch?v=B5m24ST7rSw" target="_blank">Duran Duran</a> or <a href="http://www.youtube.com/watch?v=9EHpozHn-QA" target="_blank">Human League</a> and the point is made.</p>
<p>What hurt &#8217;80s music most was the rush to digital sounds. Musicians grabbed every gizmo they could find &#8212; synthesizers, drum machines, vocal effects, digital guitar processors &#8212; and abandoned their lovely analog gear. When <a href="http://en.wikipedia.org/wiki/Hugh_Padgham#The_.22gated_drum.22_sound" target="_blank">Phil Collins&#8217; engineer figured out how to use a noise gate to make his drums sound as big as a 747</a>, everyone copied. Songs now revolved not around good lyrics or melodies but the sounds of these machines. It all had a big wow factor, but it lacked one important quality:</p>
<p>None of it was <em>timeless</em>.</p>
<p>Oh, people thought it was. That&#8217;s what it feels like in the midst of every movement. &#8220;This will last forever.&#8221; Well &#8230;</p>
<p><span id="more-209"></span>In 1991, the music of the 1980s officially died. That&#8217;s when Nirvana released <em>Nevermind </em>and Pearl Jam exploded with <em>Ten</em>, both featuring a sound that was entirely a return to all that the &#8217;80s had abandoned &#8212; authentic instruments without a lot of gimmicks. Video may have killed the radio star, but it couldn&#8217;t kill what was timeless.</p>
<p>As a journalist who loves technology, I wonder whether we&#8217;ll look back in 20 years and have a similar take on this first decade of the 2000s. The Twittersphere is filled daily with reports of new apps, new sites, new data visualizations. People Tweet every other sentence from conferences where digital gurus explain where the news business is heading, maybe. Having gorged on print profits far too long, we news types are running towards all things digital hoping for a cure for our indigestion.</p>
<p>A lot of it is interesting, some certainly carries the wow-factor, and some of it is going to be <a href="http://www.documentcloud.org/" target="_blank">truly useful</a>. But how much of it will last? How much is timeless? How can we even tell?</p>
<p>In the &#8217;80s, pop music became all about the technology and very little about the song. In journalism, we don&#8217;t have songs; we have stories.</p>
<p>In music, a great song is timeless. In journalism, a great story is.</p>
<p>In music, the song transcends the instrument &#8212; it sounds great on guitar or piano or both. In journalism, the story transcends the medium &#8212; you can tell it with photo, graphic, app, text or all.</p>
<p>But an instrument without a song is nothing. So is a medium without a story.</p>
<p>I love apps. I love data. I love visualizations. But unless these toys of ours deliver a great story &#8212; one that moves me like the best, most authentic music &#8212; they&#8217;ll have all the lasting impact of <a href="http://www.youtube.com/watch?v=JrBoOd7JQtk" target="_blank">Wang Chung</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.anthonydebarros.com/2009/11/23/thinking-like-its-1985/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>
