Wednesday, April 27, 2011

A couple of discussions on location privacy and the iPhone

So my recent posts analyzing the iPhone location data log have gained a lot of traffic and attention over the past few days, from places including the Toronto Star, the Wall Street Journal, TUAW, PC World, MacDailyNews, Apfeltalk (German), Cisco, Pete Warden at O'Reilly, Business Insider, and more.

This led to me being invited to participate in a discussion on the Brian Lehrer Show yesterday on radio WNYC, the NPR affiliate in New York, together with Jennifer Valentino-Devries of the Wall Street Journal. We had a good, sensible discussion, in contrast with a lot of the hysterical reporting that has been going on.

Today I will be taking part in a longer discussion on the KQED Forum discussion show in San Francisco. Also participating will be Congresswoman Jackie Speier, Jim Dempsey from the Center for Democracy and Technology, and Kara Swisher of All Things Digital. This will be at 10am Mountain Time, and there should be a recording here sometime after that.

Apple issues Q&A on "Locationgate", and addresses key issues

Apple has, rather belatedly, issued a Q&A on the whole "LocationGate" saga. It confirms what I said about the data being a cache of cell tower and WiFi locations, and says that keeping this data for up to a year was a bug. Within the next few weeks Apple will reduce the cache to 7 days, stop backing it up, and turn it off when you turn location services off, which addresses the issue reported by the Wall Street Journal and widely re-reported. In my opinion these are all good actions that address the key issues. The episode does reinforce the importance of developers being careful about location security: Apple was slack in this case, even though the potential risks were much less dire than widely reported.
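To make the announced 7-day limit a little more concrete, here is a minimal sketch of age-based pruning of a location cache. The cache layout (an ID mapped to a download time and coordinates) and the IDs and coordinates are hypothetical assumptions for illustration; Apple hasn't published how its cache actually works.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical cache layout: tower or access point ID -> (download time, lat, lon).
# This is an illustration, not Apple's actual data structure.
Cache = dict[str, tuple[datetime, float, float]]

MAX_AGE = timedelta(days=7)  # the retention period Apple announced


def prune(cache: Cache, now: datetime, max_age: timedelta = MAX_AGE) -> Cache:
    """Return a copy of the cache keeping only entries at most max_age old."""
    return {key: entry for key, entry in cache.items() if now - entry[0] <= max_age}


now = datetime(2011, 4, 27, tzinfo=timezone.utc)
cache = {
    "tower-A": (now - timedelta(days=2), 39.75, -104.99),   # recent: kept
    "tower-B": (now - timedelta(days=300), 52.69, -1.17),   # months old: dropped
}
print(sorted(prune(cache, now)))  # ['tower-A']
```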

Note that in the short term, if you are concerned, you can encrypt your iPhone backup just by checking a box on the front page in iTunes (after plugging in your iPhone). If you do this, the current location log cannot be read by someone who hacks into your computer.

Sunday, April 24, 2011

The scoop: Apple's iPhone is NOT storing your accurate location, and NOT storing history

The Summary
So in my previous two posts I discussed how the data I was seeing in my iPhone location logs was actually not very accurate, and certainly didn't reveal where I lived or worked or had stayed on my travels - beyond showing the cities I had been to, including general areas I had visited, as well as some I hadn't. There had been some discussion that the data appeared to be, in a number of cases, the location of cell towers you had been in communication with, although in some cases locations were a long way from where you had been.

The quick summary: I believe I have confirmed that Apple is not storing your location, but the (actual or estimated) location of cell towers (and WiFi access points) that are close to you, to help locate you as you move (these are not necessarily towers that you have been in communication with). In the data I have examined there is nothing that is based on the accurate location of the iPhone. For a good example, see my previous post showing the location of cell equipment in Coors Field baseball stadium, and not revealing the location of my home which is very close to there. In my opinion, if Apple was storing this data in order to know where you had been, they would be storing different, more accurate location data that they have access to.

And, importantly, they are not storing history - the only thing that can be found from the files is when you last visited a general area, not if you made repeat visits. This is especially important as it means that many of the concerns expressed about this data are simply not valid: it cannot be used to determine where you live, or work, or go to school, or who your doctor is.
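Why would repeat visits leave no trace? That follows naturally if the data is a cache: each entry is keyed by the tower or access point, so a later download simply overwrites the earlier timestamp. Here is a toy model of that behavior, assuming a simple ID-to-timestamp mapping. The IDs and times are hypothetical, and this is not Apple's code.

```python
# Toy model of a location cache keyed by cell tower / WiFi access point ID.
# The IDs and timestamps below are hypothetical, for illustration only.
cache: dict[str, float] = {}


def record_download(tower_ids: list[str], timestamp: float) -> None:
    """Store the download time for each nearby tower, replacing any older value."""
    for tower_id in tower_ids:
        cache[tower_id] = timestamp


# Three visits to the same neighborhood on different days...
record_download(["tower-1", "tower-2"], 100.0)
record_download(["tower-1", "tower-2"], 200.0)
record_download(["tower-1", "tower-2"], 300.0)

# ...leave only the last one behind.
print(cache)  # {'tower-1': 300.0, 'tower-2': 300.0}
```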

Here is a report of what Al Franken said:
Sen. Al Franken, a Minnesota Democrat, said it raises “serious privacy concerns,” especially for children using the devices, because “anyone who gains access to this single file could likely determine the location of a user’s home, the businesses he frequents, the doctors he visits, the schools his children attend and the trips he has taken — over the past months or even a year.”
The only part of this that is correct is that the data will show what cities you've visited, with some indication of which parts of a city you may have visited, though nothing definite - there will be records in areas you didn't visit. And it doesn't show repeated visits to the same location, only the last one.

Update: see below for a very interesting comment from "Anonymous", who includes a link to a document submitted by Apple to Congress in July 2010. This includes the following:
"When a customer requests current location information ... Apple will retrieve known locations for nearby cell towers and Wi-Fi access points from its proprietary database and transmit the data back to the device" ... "The device uses the information, along with GPS coordinates (if available), to determine its actual location. Information about the device's location is not transmitted to Apple, Skyhook or Google. Nor is it transmitted to any third-party application provider, unless the customer expressly consents". 
The data under discussion in this whole debate is clearly (in my opinion) a cache of the data mentioned here of nearby cell towers and Wi-Fi access points. I guess the remaining valid concern is that this cache is not stored as securely as it could be, and a fairly large amount of data is stored in the cache. But still this data provides only relatively coarse information as discussed here, and is stored only on the user's own computer, so the risks are relatively minor compared to many of the more dramatic scenarios that have been raised.

Update April 27: Apple has issued a Q&A document about all this, which confirms the conclusions I had drawn, and talks about changes they will make. See my thoughts here.

Read on to find out how I reached these conclusions.

The details
Last night someone called Jude commented on my last post, saying:
My Guess?

It's not a list of cell phone locations that you've been to, but the opposite, a list of cell phone locations near you downloaded to the iPhone from Apple in case you move into range of one of them. i.e. At a guess what is happening is location services identifies a cell tower and asks for its location, and is replied to with the list of locations that contains that cell tower, that list is then cached so that it does not need to be requested again.

Of course, this is only a guess based on the wide range of addresses people are seeing and how it's near to, but not exactly where, the people have traveled.
Good thinking Jude! I thought this could explain a lot, so I investigated further. First I looked at some data from my fairly recent New York trip. I looked at the timestamps on some locations and did a query to display all the locations with the same timestamp. I found out that in general, quite a number of records shared the same timestamp, and they would be clustered in the same area. For example, this screen shot shows a set of records that were all loaded at exactly the same time:
[Screen shot: set of records sharing a single timestamp]
This cluster of points is some way above the route I actually drove: I was driving east along the Long Island Expressway from LaGuardia Airport. The timestamp appears to be in seconds with 7 decimal places, so this set of data must have been downloaded in a single transaction; it was not obtained by communicating with cell towers at each of these locations independently. It seems reasonable to assume that this data was downloaded to help locate me in case I drove into this area (which I didn't). You can observe similar clusters by clicking a dot at random, copying its timestamp, and running a filter in Google Fusion Tables to display all dots with the same timestamp.
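The same check can be done across the whole table rather than one dot at a time, using the same idea as a SQL `GROUP BY` on the timestamp column. Below is a minimal Python sketch. The rows are hypothetical (timestamp, latitude, longitude) tuples standing in for the exported data, and `clusters_by_timestamp` is an illustrative helper, not part of any existing tool.

```python
from collections import defaultdict

# Hypothetical rows in the form (timestamp, latitude, longitude), standing in
# for records exported from the location table.
rows = [
    (325380000.1234567, 40.760, -73.830),
    (325380000.1234567, 40.765, -73.820),
    (325380000.1234567, 40.770, -73.810),
    (325390000.7654321, 40.740, -73.700),
]


def clusters_by_timestamp(rows, min_size=2):
    """Group records that share an identical timestamp (i.e. one download)."""
    groups = defaultdict(list)
    for timestamp, lat, lon in rows:
        groups[timestamp].append((lat, lon))
    return {ts: pts for ts, pts in groups.items() if len(pts) >= min_size}


for ts, points in clusters_by_timestamp(rows).items():
    print(ts, len(points))
```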

What I really wanted to do now was to animate my data, to visualize more easily what was happening. I couldn't figure out an easy way to do this in Google Fusion Tables: although it has some capability for this, it wasn't recognizing the timestamp field as a date-time. So I went to look at the data that Sean Gorman had posted of his logs at GeoCommons (my original file had been too large to visualize there without me doing a little more work). GeoCommons has a cool animation capability, which you can try out on Sean's map by dragging the sliders at the bottom left.
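Part of the difficulty is that the raw timestamps are plain numbers rather than dates. Here is a minimal conversion sketch. It assumes (this isn't established above) that the numbers count seconds since 1 January 2001 UTC, the reference date used by Apple's Cocoa frameworks. If the epoch turns out to be different, only the constants change.

```python
from datetime import datetime, timedelta, timezone

# Assumed epoch: the Cocoa reference date, 2001-01-01 00:00:00 UTC.
APPLE_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)
# The same instant expressed as Unix time (seconds since 1970-01-01 UTC).
APPLE_TO_UNIX_OFFSET = 978307200


def to_datetime(apple_seconds: float) -> datetime:
    """Convert a raw timestamp to a timezone-aware UTC datetime."""
    return APPLE_EPOCH + timedelta(seconds=apple_seconds)


def to_unix(apple_seconds: float) -> float:
    """Convert a raw timestamp to Unix time, which most mapping tools accept."""
    return apple_seconds + APPLE_TO_UNIX_OFFSET


print(to_datetime(0).isoformat())  # 2001-01-01T00:00:00+00:00
```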

I found something really interesting when I zoomed in around the GeoIQ office in Arlington, where Sean works. This screen shot shows that between November 11, 2010 and April 20, 2011, there is no record of Sean being at his office.
[Screen shot: area around the GeoIQ office, November 11, 2010 to April 20, 2011]
Now I know that Sean likes to escape for a spot of skiing in Colorado now and then, but that's a pretty long absence for a company President :) ! And I know I have met with him in the office during that time period.

If you drag the time slider a little further, about 20 more locations appear at the same instant, covering a general area around the office roughly half a mile square:
[Screen shot: about 20 locations appearing around the office on April 20]
So from this data I can tell that Sean was somewhere in the general area of this half mile square (not necessarily inside it) on April 20. I know nothing about whether he was there before that, and I don't know anything about exactly where he went.

So, this data stored in the iPhone logs is much less revealing than it may initially seem. At a quick glance it does look like it is recording your location history, and I think that Pete Warden and Alasdair Allan were quite right to raise the concerns that they did. It takes some digging in the data to realize that the concerns are not nearly as bad as they appeared at first sight. By publicizing it as they did, and providing their tools and documentation on how to examine the data, they made it easy for others like myself, Sean Gorman and Will Clarke to analyze the data and figure out more about what is going on.

It's still not clear exactly what the data is for, but my guess, as Jude suggested, is that it is there to aid in fast location determination. Once the iPhone figures out that you're in an area, it downloads data for surrounding cell towers (and WiFi hotspots, a detail I haven't gone into here, but the data is available for those too, as discussed in my previous post), so it can quickly locate you as you move around that area. (Update: see the first comment below, and my addition to the initial summary, both of which reference a document from Apple confirming that this is the case.)

So to summarize again, there are still some concerns with this data - it does give an approximate indication of places you've been, but not good enough to identify specific buildings or businesses. It doesn't record history - there is no way to tell if you've visited a location multiple times, you can just tell the last time you visited a general area (though there might be clues about multiple visits - for example data showing you visited a neighboring area on a different date, but nothing definitive or detailed about repeat visits). But it definitely doesn't reveal the sort of detailed information that many people have been concerned about.

Saturday, April 23, 2011

More on Apple recording your iPhone location history

In my previous post I discussed how the location data being recorded from my iPhone actually wasn't very accurate, and certainly not accurate enough to tell where I live or work (based on the data I've examined so far, which is in a table called CellLocation in the iPhone backup, and is the data discussed by Pete Warden and displayed by his iPhoneTracker app, which is what I used for the visualizations in my previous post). Pete's app aggregated data to a regular grid, partly to provide additional security.

However, I was sufficiently intrigued to follow Pete's instructions to get at the raw data. My investigations reinforced the conclusion of the previous post, that the data does not accurately represent your location, but they did show up some interesting new patterns. I loaded the data into Google Fusion Tables and have made it public; you can view it here (and feel free to play around with it).

Here is an interesting map of downtown Denver, where I live.

This shows all the raw point data, with no aggregation or changes. There are actually no dots at all in the block where I live. However, there is a noticeable cluster in Coors Field, the Colorado Rockies baseball stadium which is 3 blocks away from where I live. I haven't been to the Rockies stadium over the time period that this data was recorded. There's also a strong cluster in Mile High Stadium, home of the Denver Broncos.

I would assume that there is additional cell phone infrastructure in these stadiums, to help cope with the heavy concentration of people. A quick Google search found this article about AT&T infrastructure at Coors Field.
This reinforces the notion that at least some of these locations are the locations of cell equipment that your phone is communicating with. But I'm not sure that's the whole story.

Here's a map of Cropston, where I spent most of my time on my last two visits to England. It's a small village in a fairly rural area.

Here there are no locations shown in the village itself where I spent most of my time. A lot of the locations are clustered in towns or along streets, but some seem to be more in the middle of nowhere. Hard to draw any definite conclusions.

I just received a suggestion from Jonathan Barnes, via Pete Warden, that the HorizontalAccuracy field may be significant, with lower values indicating accurate locations obtained via GPS. However, a quick test didn't bear this out. For example, this map filters to show only records with this field set to 500.0, the minimum value I found from a quick skim (Fusion Tables seems to treat this field as a string rather than a number). While this reduces the number of records, it doesn't offer any noticeable change in accuracy: it still includes all the readings from Coors Field and Mile High Stadium, where I haven't actually been.
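Treating HorizontalAccuracy as a string doesn't affect an exact match like 500.0, but it does matter for range filters or for finding a minimum. Text compares character by character, so "1000.0" sorts before "500.0". The values below are made up, purely to illustrate the pitfall.

```python
# Hypothetical HorizontalAccuracy values as they might arrive from a CSV export.
values = ["500.0", "1000.0", "2500.0", "750.0"]

# Comparing as strings is lexicographic: "1000.0" < "600.0" because "1" < "6".
as_strings = [v for v in values if v < "600.0"]

# Converting to numbers first gives the intended filter.
as_numbers = [v for v in values if float(v) < 600.0]

print(as_strings)               # ['500.0', '1000.0', '2500.0']
print(as_numbers)               # ['500.0']
print(min(values))              # 1000.0  (string minimum)
print(min(values, key=float))   # 500.0   (numeric minimum)
```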

Pete also pointed me at another table in the backup called WifiLocation, which in my case was about 5 times larger than CellLocation. I have loaded this to Fusion Tables here. One interesting thing about this table is that data only shows up in North America (with one random exception in Munich, where I haven't been recently). It seems a little more focused on areas I've been to, but no more revealing in terms of showing specific locations where I've spent time.

As I said, feel free to play with the tables I uploaded, and let me know if you find anything interesting! But my conclusion remains that this data doesn't reveal where you've been with any degree of accuracy.

Update: see my latest post where I conclude that this is data being downloaded from Apple, rather than uploaded, and that detailed history is not stored - thanks to Jude in the comment below for his suggestion about this.

So actually, Apple isn't recording your (accurate) iPhone location

So over the past couple of days there has been mass hysteria, questions in Congress, etc., over the fact that Apple is apparently recording all the locations you've been to with your iPhone without telling you, and storing them without encryption. The news was broken by my friend Pete Warden at Where 2.0 last week and has escalated rapidly since then. As someone who publishes their location anyway (you can see where I am right now by checking the right hand panel on my blog) I was less concerned about this than many, though I agree that Apple should make it clear that they are recording this information and give you the option to turn it off, plus it should be stored more securely.

However, yesterday Sean Gorman posted that he had analyzed his data, and the interesting thing is that it wasn't accurate - it showed the general areas he'd been to, but didn't reveal where he lived or where he worked. And then I also found this post by Will Clarke, followed by this one, which also conclude that whatever the data is, it isn't your accurate location (though I think Will prematurely concludes that it is cell tower locations - Sean's analysis suggests that isn't the case, though it seems it may well be related to this).

I just had a good chat on the phone with Pete about these posts, and about my own findings (which I'll get onto in a moment), which similarly conclude that whatever is being tracked, it isn't your accurate location. Pete said that their conclusions were similar, but also that he didn't think it was simply cell towers. I know that my iPhone knows my location much more accurately than the locations that I see in the data I've looked at. For me, as for Sean, there was no cluster of points either at my home or my office. Pete asked me if I'm on WiFi rather than 3G at home and at work, and the answer is yes, so there may be some clue there.

But the main point of these posts, and mine, is that this data does NOT indicate where you live, where you work or any exact locations you've been to. This is not reflected in most of the reporting you see about the topic.

I thought I'd share some screen shots of maps that I got, which I actually thought were cool :). Since I travel quite a bit, I have a few interesting examples which might give some clues as to what this location data actually represents. The detailed (larger scale) maps here show a grid of dots, which is an artifact of Pete's map display tool rather than a feature of the underlying data. I will try to play around a bit more to get at the raw data, but thought I would share these initial findings first.
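Snapping points to a regular grid like this is a simple form of aggregation. Every point falling in the same cell collapses into one dot, which hides exactly where within the cell the points were. The sketch below illustrates the general technique only. The cell size and sample coordinates are arbitrary assumptions, and iPhoneTracker's actual parameters aren't given here.

```python
import math
from collections import Counter

CELL_DEGREES = 0.01  # hypothetical grid spacing in degrees


def grid_cell(lat: float, lon: float, size: float = CELL_DEGREES) -> tuple[int, int]:
    """Index of the grid cell containing a point (floor handles negative values)."""
    return (math.floor(lat / size), math.floor(lon / size))


def aggregate(points, size: float = CELL_DEGREES) -> Counter:
    """Count points per grid cell; each cell would be drawn as a single dot."""
    return Counter(grid_cell(lat, lon, size) for lat, lon in points)


# Hypothetical points: the first two fall in the same cell and merge into one dot.
points = [(39.7531, -105.0002), (39.7549, -105.0011), (39.7601, -104.9850)]
print(aggregate(points))
```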

So to start with, here's an overview of my world travels of the past few months, which seems pretty accurate, and goes back to at least September:
[Map: World]

Here's a view zoomed in on the US. The interesting thing here is that New York has the largest bubble over it, but I only spent two days there on a recent trip (1 day in Manhattan, 1 day on Long Island). Denver, where I live, has a much smaller blob.
[Map: North America]

Here's a map of Colorado - there seem to be quite a few outliers here on the south side of the map - I think that the closest I've been to these in recent months is Keystone, where you see a cluster of dots. Some of these dots are probably 50 miles away from where I was.
[Map: Colorado]

Zooming in on Denver, you see a lot of activity. I'm sure I haven't covered Denver quite as comprehensively as the dots here suggest.
[Map: Denver]

In this map of downtown Denver you see the gridding which somewhat obscures the underlying data. However, the largest dot is some way away from my home (which as I think everyone knows is above the famous Wynkoop Brewing Company), and the dots are fairly evenly spread - these certainly do not indicate where I spend most of my time downtown.
[Map: downtown Denver]

Similarly, my office (where I usually work a couple of days a week, the other days I work at home) does not jump out on this map of the Denver Tech Center (as you can find out from our web site, the Ubisense office is at 5445 DTC Parkway).
[Map: Denver Tech Center]

On to my UK travels - I think the data includes two trips there. There seem to be quite a few outliers here also, and some fairly large clusters in places I just passed through on the train. I spent most of my time on these trips at my mother's house in Cropston, just north of Leicester, which isn't reflected in the data. I spent some time in London, but it has a disproportionate representation on the map (as New York did in the US map).
[Map: UK]

Zooming in to the Leicester area, you can see Cropston just to the north of the city, which is where I spent nearly all my time, and this has no readings. I didn't travel around Leicester nearly as much as the dots would suggest. So this map is very misleading in terms of where I spent my time in this area.
[Map: Leicester]

This map of Zurich is interesting: I connected through Zurich airport in November, en route to Denmark. I spent maybe 3 hours in the airport and didn't leave it, but you can see lots of outliers, which are up to about 20 miles away.
[Map: Zurich Airport]
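Distances like "up to about 20 miles away" can be checked directly from the coordinates. Below is a sketch using the standard haversine great-circle formula, flagging points beyond a chosen radius. The airport coordinates are approximate, the 2-mile radius is arbitrary, and the sample points are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


AIRPORT = (47.46, 8.55)  # approximate location of Zurich airport


def outliers(points, center=AIRPORT, radius_miles=2.0):
    """Return points farther than radius_miles from center, with their distances."""
    result = []
    for lat, lon in points:
        d = haversine_miles(center[0], center[1], lat, lon)
        if d > radius_miles:
            result.append(((lat, lon), round(d, 1)))
    return result


# Hypothetical readings: one near the airport, one in the city to the south.
print(outliers([(47.46, 8.56), (47.37, 8.54)]))
```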

Here's a map of my trip to Denmark, where I spent time in Copenhagen, Aarhus and Naestved. The interesting thing on this one is that I just drove straight across the island of Funen (Fyn) in the middle of the map, but you can see quite a scatter of readings on either side of the road, especially to the south on the east side of the island.
[Map: Denmark]

Almost at the end ... I included this map of Paris as I thought it was interesting that we traveled from London to Paris and back on the Eurostar train, but no points show up along the route. There's an odd horizontal line of locations to the north of Paris, but nothing apart from that between Paris and London.
[Map: Paris]

And finally an example from Sydney. This shows a disproportionate number of readings at the airport in the south, where I just arrived and left but didn't spend any time. It doesn't show that I spent a good amount of time downtown, and I gave a talk in Parramatta where there is just one isolated dot. I stayed with friends north of Sydney but again you can't tell where.
[Map: Sydney]

While I don't want to be an apologist for Apple, and what they are doing here is careless at best, my general conclusion is that this is likely something unintentional, similar to the Google Street View WiFi data fiasco. If Apple wanted to track your location history, why wouldn't they use your accurate location? I know my phone knows where I am much more accurately than the data in these files shows.

The interesting question for us geo-geeks is exactly what the location data is; something related to cell towers seems plausible. I will try to poke around in the raw data a little more. Since I have a few interesting example cases, I am happy to share my data if anyone wants to look at it. Pete just tweeted that there is another table with WiFi locations, which would be an interesting thing to explore.

Update: I've done a new post which includes maps with the raw data, using Google Fusion Tables. It doesn't change the conclusion that the data doesn't accurately represent your actual location, but it does show some interesting new patterns.

Wednesday, April 13, 2011

So long to the GITA "annual conference" and thanks for the memories

I just wrapped up the closing panel at the 34th and last GITA "annual conference" (officially known as the geospatial solutions conference these days), which was quite a sad moment for me. I attended my first GITA (then AM/FM, Automated Mapping and Facilities Management) conference in 1992, and have only missed one since then. Especially back in the 1990s, and into the early 2000s, it was always the highlight of the year in the part of the geospatial industry that I worked in, focused on organizations managing infrastructure like gas and electric utilities, telcos, water and waste water and local government. I should say before going too far that this does not mean that GITA the organization is going away. They already do a lot more than just the annual conference, and are actively looking at various new opportunities - I'll come back to this below.

Above is a portion of the Smallworld team that attended the 1993 conference, which is where we launched Smallworld in the US, and I moved over here from the UK later that year. From left to right you see Sean Newell, Jay Cadman, Ali Newell and me. Somehow we retrieved various signs at the end of the conference, in order to illustrate our predictions for the GIS industry :). In the late 90s attendance would be over 3000 people I think. It was the place to go for organizations selecting new GIS products. I remember sometime in the late 90s we upped the ante at Smallworld by giving away a car at the show (a new Volkswagen beetle) ... will have to try to dig out some of the (pre-digital) pics of that! And I gave many presentations at GITA over the years.

But since 2005 the GITA "annual conference" (under a couple of different names) has had declining attendance, and so GITA Executive Director Bob Samborski and the GITA Board, which I rejoined this year after a few years off it, reluctantly concluded that the right thing to do was to make this the last annual conference in its current form (a decision I agree with). Here's a picture of me and Bob with then Denver Mayor, now Colorado Governor, John Hickenlooper at the GIS in the Rockies conference in 2004...

[Photo: GIS in the Rockies, 2004]

GITA has always been associated with the annual conference which was historically its biggest event, but for quite some time now this has just been a part of its activities. It also runs the GIS for Oil and Gas conference, now in its 20th year, which remains strong and will continue. And international affiliates such as GITA Australia and New Zealand continue to be strong. This year it is also playing a leading role in organizing the FOSS4G conference, for which I am conference chair. It organizes a successful program called GECCo, Geospatially Enabling Community Collaboration, to encourage collaboration between infrastructure organizations, with a particular focus on emergency response. GITA recently received a grant from the Department of Homeland Security to expand the GECCo program and organize quarterly workshops over the next 3 years. GITA has an active network of regional chapters that organize local events, and those will likely receive more focus now. Also I think there will be a stronger focus on delivery of educational material via various online channels. And there are a number of other ideas under consideration for new events and alliances with other industry organizations.

So GITA will continue, but with a somewhat modified focus. Things are changing rapidly in the geospatial world and I agree it's the right decision, but it's still a bit sad for those of us who have had such a long association with the annual conference. Everyone involved can be proud of what the GITA conference has contributed to the growth of the industry over the last 34 years! (Yes, probably longer than a lot of you "neo" readers have been alive!).