Alfredo Covaleda,
Bogota, Colombia
Stephen Guerin,
Santa Fe, New Mexico, USA
James A. Trostle,
Trinity College, Hartford, Connecticut, USA
Regular readers know that the IAJ has long been interested in the quality of the data in public records databases. The NY Times of 12 July 2006 carries a front-page story by Eric Lipton on just how bad the data is in the “National Asset Database.” As Lipton's story points out:
“The National Asset Database, as it is known, is so flawed, the inspector general found, that as of January, Indiana, with 8,591 potential terrorist targets, had 50 percent more listed sites than New York (5,687) and more than twice as many as California (3,212), ranking the state the most target-rich place in the nation….
“But the audit says that lower-level department officials agreed that some older information in the inventory ‘was of low quality and that they had little faith in it.’
“‘The presence of large numbers of out-of-place assets taints the credibility of the data,’ the report says.”
Sigh. This is not a new problem, or even one that we can hang on the Bush Administration. It started with the Clinton Administration in 1998. “In 1998, President Clinton issued Presidential Decision Directive No. 63 (PDD-63), Critical Infrastructure Protection, which set forth principles for protecting the nation by minimizing the threat of smaller-scale terrorist attacks against information technology and geographically-distributed supply chains that could cascade and disrupt entire sectors of the economy.” [Source here.]
Link to the PDF of the Inspector General's Report at http://www.nytimes.com/packages/pdf/politics/20060711_DHS.pdf
“The (Ongoing) Vitality of Mythical Numbers” <http://www.slate.com/id/2144508/> “This article serves as a valuable reminder that we should view all statistics, no matter how frequently they are used in public arguments, with skepticism until we know who produced them and how they were derived.” From: Neat New Stuff I Found This Week <http://marylaine.com/neatnew.html> Copyright, Marylaine Block, 1999-2006.
Steve Bass, a PC World columnist, had an item this week that reminds us that a good analytic journalist is always thinking about what is NOT in the data. He writes:
Risky Business: Stealth Surfing at Work
Not long after I told my buddy about Anonymizer, I heard from another friend, an IT director for a fairly large company, who warned that it may not be such a good idea to surf anonymously at the office:
“I recently had an employee, an MIS employee at that, fired. He was using Anonymizer at work. We have a tracking system (Web Inspector) and I kept noticing that he was leaving no tracks.
“I consulted with my supervisor and he decided that I should analyze the employee's system. I found footprints, hacking, and a batch file he used to delete all Internet traces. So I sent the system off to forensics and they found all the bits, each and every one. We're now in legal limbo. The employee is being fired, not for the hacking or the batch file, but for using the Anonymizer.
“Thought maybe you'd be interested in hearing about the dangers of using the Anonymizer in the workplace. They claim the Anonymizer hides your tracks at work, but I guess not all of them.”
–Name Withheld, Network and Computer Systems Administrator
I asked George Siegel, my network guru, what he thought. Here's what he said: “It's interesting to note how the user was initially discovered — by the absence of anything incriminating. Network professionals have logs showing just about everything that goes on, and they look for any deviation from the norm. I can always tell who is up to no good… their computers are scrupulously clean.”
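Siegel's point suggests a simple screen a newsroom (or an IT shop) could run itself: instead of hunting for bad sites in the logs, look for users who generate suspiciously little logged traffic. Below is a minimal Python sketch of that idea. The log records, field layout and z-score cutoff are our own illustrative assumptions, not features of Web Inspector or any other real product.

    from collections import Counter
    from statistics import mean, stdev

    # Hypothetical parsed proxy-log records: (username, requested_url).
    # In practice these would come from your proxy or monitoring tool.
    log_records = [
        ("alice", "http://example.com/news"),
        ("alice", "http://example.com/weather"),
        ("bob", "http://example.org/mail"),
        # ... thousands more lines in a real log ...
    ]

    def flag_quiet_users(records, z_cutoff=-2.0):
        """Flag users whose request volume sits far below the office norm.

        The logic mirrors the anecdote: a machine that leaves almost
        no tracks is itself a deviation from the norm worth a look.
        """
        counts = Counter(user for user, _url in records)
        volumes = list(counts.values())
        if len(volumes) < 2:
            return []          # not enough users to define a norm
        mu, sigma = mean(volumes), stdev(volumes)
        if sigma == 0:
            return []          # everyone identical; nothing stands out
        return [user for user, n in counts.items()
                if (n - mu) / sigma < z_cutoff]

    print(flag_quiet_users(log_records))

A real screen would also join the results against the staff directory, since a user who leaves no tracks at all never shows up in the log in the first place.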
We're pulling together the final pieces following the Ver 1.0 workshop in Santa Fe last week. Twenty journalists, social scientists, computer scientists, educators, public administrators and GIS specialists met in Santa Fe April 9-12 to consider the question, “How can we verify data in public records databases?”
The papers, PowerPoint slides and some initial results of three breakout groups are now posted for the public on the Ver1point0 group site at Yahoo. Check it out.
Late this afternoon, the 20 participants in Ver 1.0 will gather at the Inn of the Governors in Santa Fe, NM, for the first session of the workshop. Setting the tone will be George Duncan, professor of statistics at Carnegie Mellon University, speaking on “Statistical Confidentiality: What Does It Mean for Journalists’ Use of Public Databases?”
We will post George's address as soon as possible, along with those of other participants in coming days.
We are very pleased with the high-powered thinkers who are in, or coming to, Santa Fe to address the major problem of how to verify the data in public records databases. The proceedings of the workshop will, we hope, be published by the end of the month and also be available online.
Some new online resources for understanding, and engaging in, analytic journalism. See the BusinessJournalism.org site for:
A good learning opportunity in the Land of Lakes this summer….
Dear IPUMS Users,
I am pleased to announce the first annual IPUMS Summer Workshop, to be held in Minneapolis on July 19th-21st. This training session will cover four major databases: IPUMS-USA, IPUMS-International, IPUMS-CPS, and the North Atlantic Population Project (NAPP).
For more information, please visit http://www.pop.umn.edu/training/summer.shtml.
I hope to see some of you in Minneapolis this summer.
Sincerely,
Steven Ruggles
Principal Investigator
IPUMS Projects
We should have caught this on Friday, but….
Patrick Radden Keefe (The Century Foundation) offers up a good overview of the pros and cons of Social Network Analysis in last Friday's (12 March 2006) edition of The New York Times. In “Can Network Theory Thwart Terrorists?” he says that “the N.S.A. intercepts some 650 million communications worldwide every day.” Well, that's a nice round number, but one so large that we wonder how, for example, it accounts for basic variables such as the length of a call. (You don't suppose the good folks at the N.S.A. have to wait while the “Please wait. A service technician will be with you shortly” messages are being replayed for 18 minutes, do you?)
We think Social Network Analysis is another of those tools in its infancy, but one with (a) great potential and (b) an equally great development curve.
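For readers who want to poke at what SNA actually computes, here is a toy Python sketch using the open-source networkx library (our choice of tool, not anything Keefe or the N.S.A. describes). The call records are invented; the point is just the basic move of the method: build a graph from who-called-whom records, then rank the people who sit between the most conversations.

    import networkx as nx

    # Invented who-contacted-whom records; each pair is one observed link.
    calls = [
        ("A", "B"), ("A", "C"), ("B", "C"),
        ("C", "D"), ("D", "E"), ("D", "F"),
    ]

    G = nx.Graph()
    G.add_edges_from(calls)

    # Betweenness centrality: how often a node sits on the shortest
    # path between two others -- a standard "broker" measure in SNA.
    for person, score in sorted(nx.betweenness_centrality(G).items(),
                                key=lambda item: -item[1]):
        print(f"{person}: {score:.2f}")

Variables like call length, the very thing we wonder about above, would enter such a model as edge weights; at 650 million intercepts a day, simply building and updating the graph becomes the hard part.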
Friend-of-IAJ Griff Palmer alerts us to an impressive series this week that examines the conduct of the DA's office in Santa Clara County, California. If nothing else, the series illustrates why good, vital-to-the-community journalism takes time and is expensive. Rick Tulsky, Griff and other colleagues spent three years (not three days, but YEARS) on the story. Griff writes:
From Complexity Digest:
Excerpts: You like a certain song and want to hear other tracks like it, but don't know how to find them? Ending the needle-in-a-haystack problem of searching for music on the Internet or even on your own hard drive is a new audio-based music information retrieval system. Currently under development by the SIMAC project, it is a major leap forward in the application of semantics to audio content, allowing songs to be described not just by artist, title and genre but by their actual musical properties such as rhythm, timbre, harmony, structure and instrumentation. This allows comparisons between songs to be made (…). Source: Semantic Descriptors To Help
Should this come to fruition, might there be stories in patterns (regional patterns) in music? How could we map this? And when?
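We can't run SIMAC itself, but the underlying idea, reducing a song to measured musical properties and then comparing those measurements, is easy to sketch. Below is an illustrative Python fragment using the open-source librosa audio library (our substitution; SIMAC's actual descriptors are richer). It collapses each track to a tempo estimate plus average timbre coefficients and reports how far apart two tracks are; the file names are placeholders.

    import numpy as np
    import librosa

    def describe(path):
        """Reduce a track to a crude descriptor: tempo + mean timbre."""
        y, sr = librosa.load(path)
        tempo, _beats = librosa.beat.beat_track(y=y, sr=sr)   # rhythm
        mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # timbre
        return np.concatenate((np.atleast_1d(tempo).astype(float),
                               mfccs.mean(axis=1)))

    def distance(path_a, path_b):
        """Euclidean distance between descriptors; smaller = more alike."""
        return float(np.linalg.norm(describe(path_a) - describe(path_b)))

    # Placeholder file names; substitute real audio files.
    print(distance("song_a.mp3", "song_b.mp3"))

Mapping regional patterns would then be a matter of attaching such descriptors to where the music is made or played, a classic GIS join.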