Alfredo Covaleda,
Bogota, Colombia
Stephen Guerin,
Santa Fe, New Mexico, USA
James A. Trostle,
Trinity College, Hartford, Connecticut, USA
Nathan, the chap who curates the valuable blog Flowing Data, offers up a bit of hope for journalists who are worried about their employment futures and yet have invested in learning methods of data analysis. When thinking about re-inventing ourselves, consider the phrase “data scientist.”
As we've all read by now, Google's chief economist Hal Varian commented in January that the next sexy job in the next 10 years would be statisticians. Obviously, I whole-heartedly agree. Heck, I'd go a step further and say they're sexy now – mentally and physically.
However, if you went on to read the rest of Varian's interview, you'd know that by statisticians, he actually meant it as a general title for someone who is able to extract information from large datasets and then present something of use to non-data experts.
As a follow-up to Varian's now-popular quote among data fans, Michael Driscoll of Dataspora discusses the three sexy skills of data geeks. I won't rehash the post, but here are the three skills that Michael highlights:
These skills actually fit tightly with Ben Fry's dissertation on Computational Information Design (2004). However, Fry takes it a step further and argues for an entirely new field that combines the skills and talents from often disjoint areas of expertise:
And after two years of highlighting visualization on FlowingData, it seems collaborations between the fields are growing more common, but more importantly, computational information design edges closer to reality. We're seeing data scientists – people who can do it all – emerge from the rest of the pack.
Think about all the visualization stuff you've been most impressed with or the groups that always seem to put out the best work. Martin Wattenberg. Stamen Design. Jonathan Harris. Golan Levin. Sep Kamvar. Why is their work always of such high quality? Because they're not just students of computer science, math, statistics, or graphic design.
They have a combination of skills that not just makes independent work easier and quicker; it makes collaboration more exciting and opens up possibilities in what can be done. Oftentimes, visualization projects are disjoint processes and involve a lot of waiting. Maybe a statistician is waiting for data from a computer scientist; or a graphic designer is waiting for results from an analyst; or an HCI specialist is waiting for layouts from a graphic designer.
Let's say you have several data scientists working together though. There's going to be less waiting and the communication gaps between the fields are tightened.
How often have we seen a visualization tool that held an excellent concept and looked great on paper but lacked the touch of HCI, which made it hard to use and in turn no one gave it a chance? How many important (and interesting) analyses have we missed because certain ideas could not be communicated clearly? The data scientist can solve your troubles.
This need for data scientists is quite evident in business applications where educated decisions need to be made swiftly. A delayed decision could mean lost opportunity and profit. Terabytes of data are coming in whether it be from websites or from sales across the country, but in an area where Excel is the tool of choice (or force), there are limitations, hence all the tools, applications, and consultancies to help out. This of course applies to areas outside of business as well.
Even if you're not into visualization, you're going to need at least a subset of the skills that Fry highlights if you want to seriously mess with data. Statisticians should know APIs, databases, and how to scrape data; designers should learn to do things programmatically; and computer scientists should know how to analyze and find meaning in data.
Basically, the more you learn, the more you can do, and the higher in demand you will be as the amount of data grows and the more people want to make use of it.
Good magazine often produces informative and innovative infographics. Eighty of those works are now on Flickr. Be sure to drill down into the thumbnails to see the work in detail. Go to: http://www.flickr.com/photos/goodmagazine/sets/72157618896371005/
An archive of Transparencies that have run in past issues of GOOD and on our blog.
We post a new Transparency every Tuesday on www.good.is/
items are from between 04 Apr 2008 & 02 Jun 2009.
Since the first of the year, we've been noticing the results of some very creative and sometimes brilliant aggregation sites. (Do we need a new phrase for this format?) These sites are richer than Google mash-ups in that they allow far more control by the user. Some, like Mint.com or TripIt.com, also require various degrees of data entry by the user, sometimes with a surprising degree of detail, both personal and specific. Mapumental, below, pushes the limits of this evolution.
By Tom Steinberg on Monday, June 1st, 2009
We’ve been hinting for a while about a secret project that we’re working on, and today I’m pleased to be able to take the wraps off Mapumental. It’s currently in Private Beta but invites are starting to flow out.
Built with support from Channel 4’s 4IP programme, Mapumental is the culmination of an ambition mySociety has had for some time – to take the nation’s bus, train, tram, tube and boat timetables and turn them into a service that does vastly more than imagined by traditional journey planners.
In its first iteration it’s specially tuned to help you work out where else you might live if you want an easy commute to work.
Francis Irving, the genius who made it all work, will post on the immense technical challenge overcome, soon. My thanks go massively to him; to Stamen, for their lovely UI, and to Matthew, for being brilliant as always.
Words don’t really do Mapumental justice, so please just watch the video 🙂 Update: Now available here in HD too
Also new: We’ve just set up a TheyWorkForYou Patrons pledge to help support the growth and improvement of that site. I can neither confirm nor deny that pledgees might get invites more quickly than otherwise 😉
This entry was posted on Monday, June 1st, 2009 at 9:20am and is filed under Launches, News, Travel Time Maps.
From O'Reilly Radar….
Amazon Hosts TIGER Mapping Data
Posted: 29 May 2009 09:18 AM PDT
Last week at Ignite Where, Eric Gundersen of Development Seed made a significant announcement for geohackers looking for easy access to open geodata: Amazon will be hosting a copy of TIGER data on EC2 as an EBS (Elastic Block Storage) volume. Eric stated that this happened during the Apps For America contest in 2008, when they needed open geo data for their entry Stumble Safely (which maps crime against bars).
Amazon is now hosting all United States TIGER Census data in its cloud. We just finished moving 140 gigs of shapefiles of U.S. states, counties, districts, parcels, military areas, and more over to Amazon. This means that you can now load all of this data directly onto one of Amazon’s virtual machines, use the power of the cloud to work with these large data sets, generate output that you can then save on Amazon’s storage, and even use Amazon’s cloud to distribute what you make.
Let me explain how this works. The TIGER data is available as an EBS store (EBS, or Elastic Block Storage, is essentially a virtual hard drive). Unlike S3, there isn’t a separate API for EBS stores and there are no special limitations. Instead an EBS store appears just like an external hard drive when it’s mounted to an EC2 instance, which is a virtual machine at Amazon. You can hook up this public virtual disk to your virtual machine and work with the data as if it’s local to your virtual machine – it’s that fast.
The TIGER Data is one of the first Public Data Sets to be moved off of S3 and switched to an EBS. Because it runs as an EBS volume, users can mount it as a drive on an EC2 instance and easily run their processes (like rendering tiles with Mapnik) directly against the data. If you're a geo-hacker, this makes a rich set of geo data readily available to you without consuming your own storage resources or dealing with the normally slow download process.
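For readers who want to see what "hooking up" such a disk might look like in code, here is a minimal Python sketch. It assumes the data set is shared as a public EBS snapshot (the usual arrangement for Amazon's Public Data Sets, though the post does not say so) and that `ec2` behaves like a boto3 EC2 client. The function name, the device name and every ID passed to it are hypothetical placeholders, not values from the announcement.

```python
def attach_public_dataset(ec2, snapshot_id, instance_id, zone, device="/dev/sdf"):
    """Create an EBS volume from a shared snapshot and attach it to an instance.

    `ec2` is assumed to behave like a boto3 EC2 client, e.g. boto3.client("ec2").
    All IDs are supplied by the caller; none are taken from the original post.
    """
    # Assumption: the public data set is published as a snapshot, so each
    # user first creates a private volume from it.
    volume = ec2.create_volume(SnapshotId=snapshot_id, AvailabilityZone=zone)
    volume_id = volume["VolumeId"]

    # The new volume has to finish creating before it can be attached.
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])

    # The volume and the instance must be in the same availability zone.
    ec2.attach_volume(VolumeId=volume_id, InstanceId=instance_id, Device=device)
    return volume_id
```

Once attached, the volume still has to be mounted from inside the virtual machine with the operating system's usual mount command before tools such as Mapnik can read the shapefiles.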
I love the idea of Amazon's Public Data Sets. It's an obvious win-win scenario. The public is able to get access to rich data stores at a relatively cheap price and Amazon is able to lure said public onto their service. Smart.
“Where 09” is a fine conference put on by O'Reilly Publishing. At this year's conference, Jack Dangermond, honcho at ESRI, talked about “Realizing Spatial Intelligence on the GeoWeb.” Take note of how he and a colleague use a command in Google Maps – “Greeley: mapservers” — to call up a bunch of map servers and their files for, in this case, Greeley, Colo.
That's a neat search tool that may give you quite mixed results depending on how GIS-hip your local governments are. It seems to work for many non-U.S. cities, too. For example, “Amsterdam: mapserver” returned good results, but nothing for Mexico City or Berlin. Still, we think the search tool, while young, has a lot of promise, especially if you can find the time to drill down into the metadata for individual maps.
For the Dangermond presentation (15 min) go to: http://where.blip.tv/file/2151502/
Great opportunity for learning if one is in the New Orleans area.
The Tenth Crime Mapping Research Conference
http://guest.cvent.com/EVENTS/Info/Summary.aspx?e=c9e87fa2-759d-4bb3-a42d-91841ca7dfa2
Advancements in geographic-based technologies have brought a better understanding of crime, more efficient deployment of public safety resources and more critical examination of criminal justice policies. This is due to the reciprocation that occurs between research and practice, often resulting in better technology. Research provides a foundation of theories. Practice operationalizes the theories through technology. Policy decisions are then enacted with a more precise focus based on research and practical demonstration. Geography has been the constant in the expansion of each of these areas, and technology has been the facilitator.
The Crime Mapping Research Conference is not just about presenting where crime is. The conference is about understanding crime and public safety and their effect on community. It represents a range of research findings, practical applications, technology demonstrations and policy results.
Yet again, Nathan at FlowingData corrals a good application tool/tip….
Axiis, an open source data visualization framework in Flex, was released a few days ago under an MIT license. I haven't done much in Flex, but from what I hear, it's relatively easy to pick up. You get a lot of bang out of a few lines of code. Axiis makes things even easier, and provides visualization outside the built-in Flex graph packages.
Axiis gives developers the ability to expressively define their data visualizations through concise and intuitive markup. Axiis has been designed with a specific focus on elegant code, where your code can be just as beautiful as your visual output.
Above is the wedge stack graph. Here's your standard area graphs:
See what other visualizations you can create with Axiis here.
Clipped from FlowingData….
Indieprojector Makes it Easy to Map Your Geographical Data
Posted: 21 May 2009 12:37 AM PDT
Axis Maps recently released indieprojector, a new component to indiemapper, their in-development mapping project to “bring traditional cartography into the 21st century.” Indieprojector lets you import KML and shapefiles and easily reproject your data into a selection of popular map projections. No longer do you have to live within the bounds of a map that makes Greenland look the same size as Africa.
Indieprojector was built by Axis Maps as the smarter, easier, more elegant way to reproject geographical data. It's platform independent, location independent and huge-software-budget independent. Indiemapper closes the gap between data and map by taking a visual approach to projections. See your data. Make your map. For the first time ever, it's just that simple.
Not only can you map your data; more importantly, you can also export your map in SVG format, which you can in turn edit in Adobe Illustrator or some other tool.
For those who frequently deal with geographical data and want something simpler than the big GIS packages, Axis Maps' indiemapper is a project to keep an eye on.
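The Greenland-versus-Africa remark above can be made concrete with a little arithmetic. The sketch below assumes the map in question is the familiar Mercator projection (the post does not name it) and uses the standard spherical Mercator formulas. The function names and the sample latitudes are illustrative choices, with 72 degrees picked as a rough latitude for central Greenland.

```python
import math


def mercator_y(lat_deg):
    """Vertical map coordinate of a latitude on a unit-radius Mercator map."""
    phi = math.radians(lat_deg)
    return math.log(math.tan(math.pi / 4 + phi / 2))


def mercator_area_scale(lat_deg):
    """Factor by which Mercator inflates areas at a given latitude.

    Lengths are stretched by 1/cos(latitude) in both directions,
    so areas are stretched by the square of that.
    """
    return 1.0 / math.cos(math.radians(lat_deg)) ** 2


# Illustrative latitudes: the equator, mid-latitudes, and (assumed) central Greenland.
for lat in (0, 30, 60, 72):
    print(f"{lat:>2} deg: areas drawn {mercator_area_scale(lat):.1f}x too large")
```

Reprojecting into an equal-area projection removes this latitude-dependent inflation, which is the kind of correction a tool like indieprojector is meant to make easy.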
Google Launches Maps Data API
Posted by O'Reilly Radar : 20 May 2009 11:27 AM PDT
The crowd at Where 2.0 was expecting an API announcement and Google delivered one. Lior Ron and Steve Lee announced their Maps Data API, a service for hosting geodata. As they describe it on the site:
What is it?
The Google Maps Data API allows client applications to view, store and update map data in the form of Google Data API feeds using a data model of features (placemarks, lines and shapes) and maps (collections of features).
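To picture the data model described here, consider this small Python sketch of features grouped into maps. It is only an illustration of the idea: the class names, fields and sample coordinates are hypothetical and are not taken from Google's API or its feed format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Coordinate = Tuple[float, float]  # (longitude, latitude) in decimal degrees


@dataclass
class Feature:
    """One item on a map: a placemark, a line or a shape."""
    name: str
    kind: str  # "placemark", "line" or "shape"
    coordinates: List[Coordinate]


@dataclass
class Map:
    """A named collection of features."""
    title: str
    features: List[Feature] = field(default_factory=list)

    def add(self, feature: Feature) -> None:
        self.features.append(feature)


# Illustrative data only; the coordinates are made up for the example.
demo = Map("Demo map")
demo.add(Feature("Meeting point", "placemark", [(-105.0, 40.0)]))
demo.add(Feature("Walking route", "line", [(-105.0, 40.0), (-105.1, 40.1)]))
print(len(demo.features), "features on", demo.title)
```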
Why Use the Google Maps Data API?
Google is launching with some sample apps:
Geo data can get very large very quickly. Serving it can get expensive. This Data API will help NGOs, non-profits and developers make their data available without breaking the bank. Google's goals for doing this are obvious. If the data is on their servers they can index it more easily and make it readily available to their users. There will be concern that Google will have too much of their data, but as long as Google does not block other search engines and allows developers to remove their data I think that this will be a non-issue.
The crowd was hoping for a formal Latitude API to be announced (knowing that they launched the hint of one at the beginning of May). When I asked Lior and Steve about it we got some smiles. I think we'll see some more movement in this area, but not *just* yet.
Here is another example of why journos need some training in basic math and statistics.
Thanks to Chris Feola for tipping us to this:
Swine flu alert! News/Death ratio: 8176
During the last 13 days, up to May 6, WHO has confirmed that 25 countries are affected by the Swine flu and 31 persons have died from Swine flu. WHO data indicates that about 60,000 persons died from TB during the same period. By a rough comparison with the number of news reports found by Google news search, Hans Rosling calculates a News/Death ratio and issues an alert for media hype on Swine flu and a neglect of tuberculosis.
WHO TB data available at http://apps.who.int/globalatlas/dataQuery/default.asp
WHO Swine Flu data available at http://www.who.int/csr/disease/swineflu/updates/en/index.html
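The arithmetic behind the News/Death ratio is simple enough to check by hand. The Python sketch below assumes the ratio is just the number of news reports divided by the number of deaths, and works backward from the figures quoted above (a ratio of 8176 and 31 deaths) to the approximate number of news reports they imply. The news counts themselves are not given in the text, so nothing here should be read as Rosling's actual inputs.

```python
def news_death_ratio(news_reports, deaths):
    """News reports per death, assuming the ratio is a plain quotient."""
    return news_reports / deaths


# Figures quoted in the post.
SWINE_FLU_DEATHS = 31
SWINE_FLU_RATIO = 8176
TB_DEATHS = 60_000  # "about 60,000" over the same 13 days

# Back-calculated: roughly how many news reports the quoted ratio implies.
implied_reports = SWINE_FLU_RATIO * SWINE_FLU_DEATHS

# How many TB news reports it would take to match the same ratio.
tb_reports_for_parity = SWINE_FLU_RATIO * TB_DEATHS

print("Implied swine flu news reports:", implied_reports)
print("TB reports needed for the same ratio:", tb_reports_for_parity)
```

Because the published ratio is rounded, the implied report count is approximate.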