Wednesday, August 13, 2008

Google Local Search better than Google Geocoding?

Google offers two (apparently) unrelated solutions for doing geocoding (converting a text string into a latitude-longitude and a structured address): the Google Geocoding API (which is part of the Google Maps API) and the Google Local Search API (which has JavaScript and non-JavaScript variants). This post discusses our experiences with both and the results of recent testing we have been doing, which has led us to a decision to switch from our current hybrid approach to using only local search.

Initially we were attracted to using Google Local Search for whereyougonnabe, as it lets you search for places of interest like "Wynkoop Brewing Company" or "Intergraph Huntsville", rather than having to know an address in all cases - whereas the geocoding API only works with addresses.

However, in our initial testing with local search we found a number of cases where it returned a location and address string successfully, but did not correctly parse the address into its constituents (for example, returning a blank city). For our application it was important to be able to separate out the city from the rest of the address. In the same testing, the geocoding API seemed to do a better job of parsing out the address. In addition, it returned a number indicating how accurate the geocoding was (for example street level or city level). So we ended up implementing a rather ugly hybrid solution: we first used local search, so that we could search for places of interest in addition to addresses, and then passed the address string it returned to the geocoding API to try to structure it more consistently. In most cases this was transparent to the user, but in a number of cases we hit problems where the second call would not be able to find the address returned by the first call, or where the address would get mysteriously "moved". With so much to do and so little time :) we elected to go with this rather unsatisfactory solution to start with, but I have recently been revisiting this area and doing some more detailed testing of the two options.
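A minimal Python sketch of that hybrid flow might look like the following. It is an illustration only: `local_search` and `geocode` are hypothetical placeholders for the calls to the two services, the `Place` fields are assumed names rather than the actual response formats, and falling back to the first result when the second lookup fails is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Place:
    address: str      # full address string as returned by the service
    city: str = ""    # may be blank if the service failed to parse it
    lat: float = 0.0
    lng: float = 0.0


# Hypothetical signature: takes a query string, returns a Place or None.
Lookup = Callable[[str], Optional[Place]]


def hybrid_geocode(query: str, local_search: Lookup, geocode: Lookup) -> Optional[Place]:
    """Find a place with local search, then re-parse its address with the geocoder."""
    found = local_search(query)
    if found is None:
        return None
    structured = geocode(found.address)
    if structured is None:
        # The second call could not find the address returned by the first
        # call; this sketch simply keeps the first result (an assumption).
        return found
    return structured
```

Both failure modes mentioned above arise at the second step: either `geocode` returns nothing, or it returns a `Place` whose coordinates differ from those of `found`.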

Before I get into more details of the testing, I should just comment on a couple of other things. One is that a key reason we are revisiting our geocoding in general is that we recently introduced the ability to import activities from external calendar systems, which means we need to call geocoding functionality from a periodic background process on our server, whereas before that we just called it from the browser on each user's client machine. This is significant because the Google geocoding API restricts you to 15,000 calls per IP address per day, which is not an issue at all if you do all your geocoding on the client but quickly becomes an issue if you need to do server based geocoding. Interestingly, Google Local Search has a different approach, with no specified transaction limits (which is not to say that they could not introduce some of course, but it's a much better starting point than the hard limit on the geocoding API).
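For a server-side background process, one way to stay under a per-day cap like this is to count calls and stop when the budget is spent. The Python sketch below is illustrative only: the `DailyQuota` class and its method names are hypothetical, the 15,000 default is the limit quoted above, and resetting the count when the calendar date changes is an assumption of the sketch rather than a documented rule.

```python
import datetime
from typing import Callable, Optional


class DailyQuota:
    """Counts calls per calendar day and refuses calls once the limit is reached."""

    def __init__(self, limit: int = 15000,
                 today: Callable[[], datetime.date] = datetime.date.today):
        self.limit = limit
        self._today = today                       # injectable clock, handy for testing
        self._day: Optional[datetime.date] = None
        self._used = 0

    def try_acquire(self) -> bool:
        """Return True and count the call if budget remains today, else False."""
        day = self._today()
        if day != self._day:
            # New day: reset the counter (assumed reset rule).
            self._day = day
            self._used = 0
        if self._used >= self.limit:
            return False
        self._used += 1
        return True
```

An import job could call `try_acquire()` before each geocoding request and defer the remaining activities to its next run once it returns `False`.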

Secondly, a natural question to ask is whether we have looked at other solutions beyond these two, and the answer is yes, though not in huge detail. Most of the other solutions out there do not have the ability to handle points of interest in addition to addresses, which is a big issue for us. Microsoft Virtual Earth looks like the strongest competitor in this regard, but it seems that we would need to pay in order to use that, and we need to talk to Microsoft to figure out how much, which we haven't done yet - and obviously if we can get a free solution which works well we would prefer that. Several solutions suffer from lack of global coverage and/or even lower transaction limits than Google. We are using the open source geonames database for an increasing number of things, which I'll talk about more in a future post, but that won't do address level geocoding and points of interest are more limited than in Google (currently at least).

Anyway, on to the main point of the post :) ! I tested quite a variety of addresses and points of interest on both services. Some of these were fairly random, a number were addresses which specific users had reported to us as causing problems in their use of whereyougonnabe. In almost all the specific problem cases, we found that the issue was with the geocoding API rather than local search. The following output shows a sample of our test cases (mainly those where one or other had some sort of problem):

I=Input, GC=geocoding API result, LS=local search API result, C=comments

I: 1792 Wynkoop St, Denver
GC: 1792 Wynkoop St, Denver, CO 80202, USA
LS: 1792 Wynkoop St, Denver, CO 80202
C: LS did not include country in the address string returned, but it was included separately in the country field. The postcode field was not set in LS, even though it appeared in the address string. Other fields including the city and state (called "region") were broken out correctly. This was typical with other US addresses - LS did not set the zip code / postal code, but otherwise generally broke out the address components correctly.

I: 42 Latimer Road, Cropston
GC: 42 Latimer Rd, Cropston, Leicestershire, LE7 7, UK
LS: null
C: The main case where GC fared better was with relatively incomplete addresses, like this one with the country and nearest large town omitted (Cropston is a very small village)

I: 72 Latimer Road, Cropston, England
GC: 72 Latimer Rd, Cropston, Leicestershire, LE7 7, UK
LS: 72 Latimer Rd, Cropston, Leicester, UK
C: Both worked in this case, with slight variations. GC included the postal code (a low accuracy version) and LS did not. LS included the local larger town Leicester which is technically part of the mailing address.

I: 8225 Sage Hill Rd, St Francisville, LA 70775
GC: null
LS: 8225 Sage Hill Rd, St Francisville, LA 70775
C: One of several examples from our users which didn't work in GC but did in LS.

I: Wynkoop Brewing Company Denver
GC: null
LS: 1634 18th St, Denver, CO
C: As expected, points of interest like this do not get found by GC

I: Kismet, NY
GC: Kismet Ct, Ridge, NY 11961, USA
LS: Kismet, Islip, NY
C: Another real example from a user, where GC returned an incorrect location about 45 miles away from the correct location, which was found by LS.

I: 1111 West Georgia Street, Vancouver, BC, Canada
GC: 1111 E Georgia St, Vancouver, BC, Canada
LS: 1111 W Georgia St, Vancouver, BC, Canada
C: Another real example from a user which GC gets wrong - strangely it switches the street from W Georgia St to E Georgia St, which moves the location about 2 miles from where it should be.

I: London E18
GC: London, Alser Straße 63a, 1080 Josefstadt, Wien, Austria
LS: London E18, UK
C: Another real user example. London E18 is a common way of denoting an area of London (the E18 is the high level portion of the postcode). GC gets it completely wrong, relocating the user to Austria, but LS gets it right.
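The sample above can also be written down as data and tallied. The Python sketch below encodes only the eight cases listed; the "ok" / "none" / "wrong" labels are a reading of the comment on each case, and the `CASES` and `tally` names are hypothetical.

```python
from collections import Counter

# (input, GC outcome, LS outcome)
# Outcomes: "ok" = located, "none" = no result, "wrong" = wrong location.
CASES = [
    ("1792 Wynkoop St, Denver", "ok", "ok"),
    ("42 Latimer Road, Cropston", "ok", "none"),
    ("72 Latimer Road, Cropston, England", "ok", "ok"),
    ("8225 Sage Hill Rd, St Francisville, LA 70775", "none", "ok"),
    ("Wynkoop Brewing Company Denver", "none", "ok"),
    ("Kismet, NY", "wrong", "ok"),
    ("1111 West Georgia Street, Vancouver, BC, Canada", "wrong", "ok"),
    ("London E18", "wrong", "ok"),
]


def tally(cases, column):
    """Count outcomes for one service; column 1 is GC, column 2 is LS."""
    return Counter(case[column] for case in cases)


gc_counts = tally(CASES, 1)
ls_counts = tally(CASES, 2)
print("GC:", dict(gc_counts))
print("LS:", dict(ls_counts))
```

On this sample the tally gives three correct results, two misses and three wrong locations for GC, against seven correct results and one miss for LS. The sample was chosen mainly from cases where one service or the other had a problem, so the counts say nothing about overall hit rates.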

So in summary, in our test cases we found a lot more addresses that could not be found or were incorrectly located by the Google Geocoding API, but were correctly located by Google Local Search, than vice versa. It is hard to draw firm conclusions without doing much larger scale tests, and it is possible that there is a bias in our problem cases as our existing application may have tended to make more problems visible from the Geocoding API than from Local Search (it is hard to tell whether this is the case or not). But nevertheless, based on these tests we feel much more comfortable going with Local Search rather than the Geocoding API, especially given its compelling benefits in locating points of interest by name rather than address. These point of interest searches can also take a search point, another very useful feature for us, which I will discuss in a future post.

Advantages of the geocoding API are that it returns an accuracy indicator and generally returns the zip/postal code also, neither of which is true for Local Search. Neither of these was a critical issue for us, though we would really like to have the accuracy indicator in local search also. For our application we are not too concerned about precisely how the addresses align with the base maps - one reason I have heard given in passing for Google having these two separate datasets is that they want a geocoding dataset which uses the same base data as Google Maps uses, to try to ensure that locations returned by geocoding align as well as possible with the maps. If this is a high priority for your application, you might want to test this aspect in more detail. It also appears that the transaction limits are more flexible for server based geocoding with Local Search.
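As noted in the first test case, Local Search left the postcode field unset for US addresses even when the ZIP code appeared in the address string. If the ZIP code were needed, one possible workaround would be to pull it out of the string. The Python sketch below is an illustration under stated assumptions, not a description of what whereyougonnabe does: the `extract_us_zip` helper is hypothetical, and it only handles US-style strings that end in a two-letter state and a five-digit ZIP (optionally ZIP+4), with or without a trailing ", USA".

```python
import re
from typing import Optional

# Two-letter state, then a 5-digit ZIP with optional +4 suffix, at the end of
# the string or just before a trailing ", USA".
_US_ZIP = re.compile(r"\b[A-Z]{2} (\d{5}(?:-\d{4})?)(?:, USA)?$")


def extract_us_zip(address: str) -> Optional[str]:
    """Return the ZIP code at the end of a US address string, or None."""
    match = _US_ZIP.search(address.strip())
    return match.group(1) if match else None


print(extract_us_zip("1792 Wynkoop St, Denver, CO 80202"))  # 80202
print(extract_us_zip("1634 18th St, Denver, CO"))           # None
```

Addresses from other countries, such as the UK examples above, simply return `None` here and would need their own handling.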

So we'll be switching to an approach which uses only Google Local Search in an upcoming release of whereyougonnabe. I'll report back if anything happens to change our approach, and I'll also talk more about what we're doing with geonames relating to geocoding in a future post.


Peter Barnes said...

I wonder if the results are in some way being affected by what the APIs understand to be the context of your query? ie. IP address (location), map extents, and perhaps history... Certainly context becomes more important for disambiguation of POIs vs nice structured addresses! If so it may have some bearing on how you implement calendar import - you'd want the context for a query to relate to the previous/next engagement on an individual user's calendar.

Peter Batty said...

Hi Peter, good question. The Local Search API does let you specify a search location (as a latitude-longitude) which influences the results, and we do take advantage of this in our calendar import. The results I discuss in this post are not using this feature, but I did do quite a bit of testing of this. I'll do another post with more about that shortly.

Incidentally, I didn't mention that the search on the main Google Maps site appears to use the Local Search API (or some variant of it) rather than the Geocoding API, based on trying out the same test cases there. I believe that there they do use IP address to influence the search results: I have noticed in the past getting different results in some cases if I am in the UK versus the US.

skwash said...

Hello Mr Batty,
I too have come across this recently and brought the question up with Google's enterprise support. The answer has to do with licensing.

The maps API (including geocoding) uses TeleAtlas as its data source. Since the Local Search API is content based rather than strictly Geo base (ie: you aren't dealing directly with raw Geo data), they can be a little more flexible with which data source they use. Local Search appears to be backed by NAVTEQ data. Since NAVTEQ seems to be far superior to TeleAtlas in detail and accuracy, you will tend to see a lot of inconsistencies between the two APIs.

Gerd Kamp said...

I also took a closer look at geocoding places and addresses found in news agency news.

First, you can get rid of the 15,000 limit by signing up to Google Maps Premier (we had to do it because we use the geocoder in the intranet).

Secondly wrt. geocoding places, you should have a look at Yahoo GeoPlanet (formerly known as internet location platform) which tries to maintain a taxonomy of interesting places on earth (partially based on Flickr tags, for identifying new named places).

OpenStreetMap is definitely also a place to look for place and address data as well as an API to use (especially the name finder is interesting). They also include the geonames data as well as some other sources.

Peter Batty said...

Thanks all for the interesting comments.

@skwash, that's interesting but it doesn't especially explain why Google has two inconsistent ways of doing essentially the same thing (one being a subset of the other). If the underlying reason is to better align with the data in Google Maps as I theorized, it doesn't seem as though that should be a straight NAVTEQ / Tele Atlas split since the data in Google maps comes from both (a quick pan around Denver says (c) NAVTEQ while in London it says (c) Tele Atlas). It would really be nice if they focused on consolidating the two.

@gerd, I didn't know that Google Maps Premier removed the limit so that's useful to know (I didn't find that documented anywhere). I did do a bit of testing with Yahoo and their "place structure" is interesting but it seemed to have a pretty low hit rate on the places of interest I tried. Maybe I'll include Yahoo in the next round of tests and post about that too. And I talked to Steve and Nick at the ESRI conference last week about OpenStreetMap geocoding and think that is an interesting option for the future, but not to the point where it can compete directly with Google Local Search just yet (for what we need).

Pamela Fox said...

FYI: The Local Search API will soon use Teleatlas data, so don't rely on using Local Search just because the data there may be better for some of your geocodes.

- pamela

Peter Batty said...

Pamela, thanks for the heads up. The big driver for us to use Local Search is really the ability to search for points of interest in addition to addresses, so as long as geocoding of addresses is not significantly worse that would be our preferred approach. The fact that we happened to find more issues in our test cases with one versus the other wasn't the primary factor.

The fact that you are moving to use the same data in both places seems like a good move. One might deduce from this that you are moving towards combining the two services into one ... :) ? This would really be a good step forward from my perspective, as I said in my post.

GIS said...

Google no longer uses NAVTEQ data. All services use only TELEATLAS.

LS: Might include some context based on your IP/location or even browsing history/pattern.

GC: Search against raw data.