Monday, March 28, 2011

Pete Warden's Data Science Toolkit offers cool geo capabilities

I just had an interesting chat with Pete Warden, a fellow Brit who lived in Boulder for a while and is now out in San Francisco. He has worked on various interesting development projects, including quite a bit of geo stuff, and is most famous for his cool map of Facebook users, which led to Facebook threatening to sue him :(

He has just launched a new project called the Data Science Toolkit at the GigaOM Structure Big Data Conference.

It's an open source project containing a variety of tools for analyzing data, including several geospatial ones. Everything is nicely packaged up as an Amazon Machine Image (AMI), including some large databases, so you can just fire up one or more Amazon machines and use the functionality (and you can do some basic testing here).
  • Geocoding (currently US only) - this uses the open source geocoder developed by geoIQ using TIGER data. The nice thing about this is that there are no transaction limits or restrictive terms associated with it. And you can run it offline if you like. Lack of good geocoding is currently a weakness of OpenStreetMap, so this is a nice complement to that. It found my home address very accurately - not an extensive test but a good start :). While this is all open source, it takes a good bit of effort to download the full TIGER database and get this all set up, so having a packaged version is a good thing.
  • Reverse geocoding - takes a point and gives you information about where it is - country, city, district, etc. Again this is not unique, but has the same advantages as the geocoding functionality in terms of setup and lack of restrictions.
  • GeoDict is a tool that emulates Yahoo's Placemaker and pulls location data out of unstructured English text - its API is identical to Placemaker.
  • IP address to location
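
To give a feel for how the tools listed above might be called from a script, here is a minimal Python sketch that builds request URLs for three of them. None of this comes from the conversation with Pete: the endpoint names (`street2coordinates`, `coordinates2politics`, `ip2coordinates`) are my assumption about the toolkit's REST paths, `BASE_URL` is a hypothetical address for a server you have started from the AMI, and the helper function names are made up for illustration. Check the toolkit's own documentation before relying on any of it.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Hypothetical address of a toolkit server you are running yourself.
BASE_URL = "http://localhost:8080"


def street2coordinates_url(address, base=BASE_URL):
    """Build a geocoding URL (endpoint name is an assumption)."""
    return base + "/street2coordinates/" + quote(address, safe="")


def coordinates2politics_url(lat, lon, base=BASE_URL):
    """Build a reverse geocoding URL (endpoint name is an assumption)."""
    point = "{},{}".format(lat, lon)
    return base + "/coordinates2politics/" + quote(point, safe="")


def ip2coordinates_url(ip, base=BASE_URL):
    """Build an IP-to-location URL (endpoint name is an assumption)."""
    return base + "/ip2coordinates/" + quote(ip, safe="")


def fetch_json(url):
    """Fetch a URL and decode the body as JSON (assumes a JSON response)."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the URLs; call fetch_json(url) once a server is running.
    print(street2coordinates_url("1 Main St"))
    print(coordinates2politics_url(40.0, -105.27))
    print(ip2coordinates_url("8.8.8.8"))
```

Because you would be pointing this at your own machine, there is no API key or rate limit to handle, which is really the point of having everything packaged up.
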
It's a nicely packaged collection of free and open source tools to add to your kitbag!