Sunday, April 24, 2011

The scoop: Apple's iPhone is NOT storing your accurate location, and NOT storing history

The Summary
So in my previous two posts I discussed how the data I was seeing in my iPhone location logs was actually not very accurate, and certainly didn't reveal where I lived or worked or had stayed on my travels - beyond showing the cities I had been to, including general areas I had visited, as well as some I hadn't. There had been some discussion that the data appeared to be, in a number of cases, the location of cell towers you had been in communication with, although in some cases locations were a long way from where you had been.

The quick summary: I believe I have confirmed that Apple is not storing your location, but the (actual or estimated) location of cell towers (and WiFi access points) that are close to you, to help locate you as you move (these are not necessarily towers that you have been in communication with). In the data I have examined there is nothing that is based on the accurate location of the iPhone. For a good example, see my previous post, which shows the location of cell equipment in Coors Field baseball stadium but does not reveal the location of my home, which is very close by. In my opinion, if Apple was storing this data in order to know where you had been, they would be storing different, more accurate location data that they have access to.

And, importantly, they are not storing history - the only thing that can be found from the files is when you last visited a general area, not if you made repeat visits. This is especially important as it means that many of the concerns expressed about this data are simply not valid: it cannot be used to determine where you live, or work, or go to school, or who your doctor is.

Here is a report of what Al Franken said:
Sen. Al Franken, a Minnesota Democrat, said it raises “serious privacy concerns,” especially for children using the devices, because “anyone who gains access to this single file could likely determine the location of a user’s home, the businesses he frequents, the doctors he visits, the schools his children attend and the trips he has taken — over the past months or even a year.”
The only part of this that is correct is that the data will show what cities you've visited, with some indication of which parts of a city you may have visited, though nothing definite - there will be records in areas you didn't visit. And it doesn't show repeated visits to the same location, only the last one.

Update: see below for a very interesting comment from "Anonymous", who includes a link to a document submitted by Apple to Congress in July 2010. This includes the following:
"When a customer requests current location information ... Apple will retrieve known locations for nearby cell towers and Wi-Fi access points from its proprietary database and transmit the data back to the device" ... "The device uses the information, along with GPS coordinates (if available), to determine its actual location. Information about the device's location is not transmitted to Apple, Skyhook or Google. Nor is it transmitted to any third-party application provider, unless the customer expressly consents". 
The data under discussion in this whole debate is clearly (in my opinion) a cache of the data mentioned here of nearby cell towers and Wi-Fi access points. I guess the remaining valid concern is that this cache is not stored as securely as it could be, and a fairly large amount of data is stored in the cache. But still this data provides only relatively coarse information as discussed here, and is stored only on the user's own computer, so the risks are relatively minor compared to many of the more dramatic scenarios that have been raised.

Update April 27: Apple has issued a Q&A document about all this, which confirms the conclusions I had drawn, and talks about changes they will make. See my thoughts here.

Read on to find out how I reached these conclusions.

The details
Last night someone called Jude commented on my last post, saying:
My Guess?

It's not a list of cell phone locations that you've been to, but the opposite, a list of cell phone locations near you downloaded to the iPhone from Apple in case you move into range of one of them. i.e. At a guess what is happening is location services identifies a cell tower and asks for its location, and is replied to with the list of locations that contains that cell tower, that list is then cached so that it does not need to be requested again.

Of course, this is only a guess based on the wide range of addresses people are seeing and how it's near to, but not exactly where, the people have traveled.
Good thinking Jude! I thought this could explain a lot, so I investigated further. First I looked at some data from my fairly recent New York trip. I looked at the timestamps on some locations and did a query to display all the locations with the same timestamp. I found out that in general, quite a number of records shared the same timestamp, and they would be clustered in the same area. For example, this screen shot shows a set of records that were all loaded at exactly the same time:
Screen shot 2011-04-24 at 7.25.30 AM
This cluster of points is some way above where I drove: I was driving east along the Long Island Expressway from LaGuardia Airport. The timestamp appears to be in seconds and has 7 decimal places, so this set of data must have been downloaded in a single transaction; it was not obtained by communicating with cell towers at each of these locations independently. It seems reasonable to assume that this data was downloaded to help locate me in the event that I drove into this area (which I didn't). You can observe similar clusters by clicking a dot at random, copying the timestamp, and running a filter in Google Fusion Tables to display all dots with the same timestamp.
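The timestamp filter can also be run directly against the SQLite file. The sketch below is a minimal illustration, assuming a simplified table named `CellLocation` with `Timestamp`, `Latitude`, and `Longitude` columns (loosely modelled on `consolidated.db`; the real schema has more columns and a different key), filled with made-up rows rather than real log data. It counts how many records share each timestamp and then pulls out the largest batch.

```python
import sqlite3

# Hypothetical, simplified stand-in for the iPhone's location database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CellLocation (CellID INTEGER PRIMARY KEY, "
    "Timestamp REAL, Latitude REAL, Longitude REAL)"
)
# Made-up sample rows: three share one timestamp (one batch), one does not.
rows = [
    (1, 325230000.1234567, 40.760, -73.800),
    (2, 325230000.1234567, 40.765, -73.790),
    (3, 325230000.1234567, 40.770, -73.785),
    (4, 325240000.7654321, 40.700, -73.900),
]
conn.executemany("INSERT INTO CellLocation VALUES (?, ?, ?, ?)", rows)

# Count records per timestamp; a large group suggests a single download
# rather than many independent observations.
batches = conn.execute(
    "SELECT Timestamp, COUNT(*) FROM CellLocation "
    "GROUP BY Timestamp ORDER BY COUNT(*) DESC"
).fetchall()

# Retrieve every point in the largest batch, like filtering on one
# timestamp in Fusion Tables.
largest_ts = batches[0][0]
cluster = conn.execute(
    "SELECT Latitude, Longitude FROM CellLocation WHERE Timestamp = ?",
    (largest_ts,),
).fetchall()
print(len(cluster), "records share timestamp", largest_ts)
```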

What I really wanted to do now was to animate my data, to more easily visualize what was happening. I couldn't figure out an easy way to do this in Google Fusion Tables: although it has some capability for this, it wasn't recognizing the timestamp field as a date-time. So I went to look at the data that Sean Gorman had posted of his logs at GeoCommons (my original file had been too large to visualize there without me doing a little more work). GeoCommons has a cool animation capability, which you can try out on Sean's map by dragging the sliders at the bottom left.

I found something really interesting when I zoomed in around the GeoIQ office in Arlington, where Sean works. This screen shot shows that between November 11, 2010 and April 20, 2011, there is no record of Sean being at his office.
Screen shot 2011-04-24 at 8.12.15 AM
Now I know that Sean likes to escape for a spot of skiing in Colorado now and then, but that's a pretty long absence for a company President :) ! And I know I have met with him in the office during that time period.

If you drag the time slider a little further, then at the same instant, about 20 more locations appear on the map, covering a general area around the office, roughly half a mile square:
Screen shot 2011-04-24 at 8.12.31 AM
So from this data I can tell that Sean was somewhere in the general area of this half mile square (not necessarily inside it) on April 20. I know nothing about whether he was there before that, and I don't know anything about exactly where he went.
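To put a number on the size of a cluster like this, you can take the bounding box of the points that share a timestamp and convert degrees to miles. This is a minimal sketch using a flat-Earth (equirectangular) approximation, which is adequate over a mile or two; the coordinates below are hypothetical, not Sean's actual records.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def cluster_extent_miles(points):
    """Approximate (width, height) in miles of the bounding box around a list
    of (lat, lon) points. One degree of latitude is treated as a fixed
    distance; a degree of longitude shrinks by cos(latitude)."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    miles_per_degree = math.radians(1) * EARTH_RADIUS_MILES  # about 69 miles
    mid_lat = math.radians((min(lats) + max(lats)) / 2)
    height = (max(lats) - min(lats)) * miles_per_degree
    width = (max(lons) - min(lons)) * miles_per_degree * math.cos(mid_lat)
    return width, height


# Hypothetical batch of points that share one timestamp.
batch = [(38.880, -77.100), (38.887, -77.091), (38.883, -77.095)]
width, height = cluster_extent_miles(batch)
print(f"about {width:.2f} x {height:.2f} miles")
```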

So, this data stored in the iPhone logs is much less revealing than it may initially seem. At a quick glance it does look like it is recording your location history, and I think that Pete Warden and Alasdair Allan were quite right to raise the concerns that they did. It takes some digging in the data to realize that the concerns are not nearly as bad as they appeared at first sight. By publicizing it as they did, and providing their tools and documentation on how to examine the data, they made it easy for others like myself, Sean Gorman and Will Clarke to analyze the data and figure out more about what is going on.

It's still not clear exactly what the data is for, but my guess, as Jude suggested, is that it aids fast location determination. Once the iPhone figures out that you're in an area, it downloads data for the surrounding cell towers (and WiFi hotspots, a detail I haven't gone into here, though that data is available too, as discussed in my previous post), so it can quickly locate you as you move around that area. (Update: see the first comment below, and my addition to the initial summary; both reference a document from Apple which confirms that this is the case.)
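The caching behavior Jude suggested can be modelled in a few lines. This is a hypothetical sketch, not Apple's implementation: `fetch_nearby` stands in for the request to Apple's database, and the tower IDs and coordinates are invented. It shows why a whole neighborhood of records ends up sharing one timestamp, and why later lookups in the same area need no new request.

```python
import time


class TowerCache:
    """Hypothetical model of a prefetching location cache (not Apple's code)."""

    def __init__(self, fetch_nearby):
        # fetch_nearby(cell_id) -> {cell_id: (lat, lon)} stands in for the
        # network request to Apple's database; this interface is assumed.
        self.fetch_nearby = fetch_nearby
        self.entries = {}  # cell_id -> (lat, lon, timestamp)
        self.requests = 0  # number of simulated network requests

    def locate(self, cell_id):
        """Return the cached (lat, lon) for a tower, downloading its whole
        neighborhood on a cache miss. Returns None if the tower is unknown."""
        if cell_id not in self.entries:
            self.requests += 1
            stamp = time.time()  # one timestamp shared by the whole batch
            for cid, (lat, lon) in self.fetch_nearby(cell_id).items():
                self.entries[cid] = (lat, lon, stamp)
        entry = self.entries.get(cell_id)
        return None if entry is None else entry[:2]


# Invented neighborhood of three towers returned by a fake server.
NEIGHBORHOOD = {
    101: (39.756, -104.994),
    102: (39.758, -104.990),
    103: (39.752, -104.998),
}
cache = TowerCache(lambda cell_id: NEIGHBORHOOD)
cache.locate(101)  # miss: downloads all three towers in one request
cache.locate(103)  # hit: answered from the cache
print("network requests:", cache.requests)
```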

So to summarize again, there are still some concerns with this data - it does give an approximate indication of places you've been, but not good enough to identify specific buildings or businesses. It doesn't record history - there is no way to tell if you've visited a location multiple times, you can just tell the last time you visited a general area (though there might be clues about multiple visits - for example data showing you visited a neighboring area on a different date, but nothing definitive or detailed about repeat visits). But it definitely doesn't reveal the sort of detailed information that many people have been concerned about.

19 comments:

Anonymous said...

As has been lost in all the excitement, this isn't new information nor is it secret. Apple provided complete details/limitations of the Core Location data collection and transfer activities in July last year http://www.wired.com/images_blogs/gadgetlab/2011/04/applemarkeybarton7-12-10.pdf

This cache was also detailed in the WWDC 2010 session entitled “Using Core Location in iOS”.

Peter Batty said...

@Anonymous you're right, that information has been lost in all the fuss; this is the first time I'd seen it. Thanks for this document; it is very helpful.

The main items relating to this discussion are on p7, including the following:

"When a customer requests current location information ... Apple will retrieve known locations for nearby cell towers and Wi-Fi access points from its proprietary database and transmit the data back to the device" ... "The device uses the information, along with GPS coordinates (if available), to determine its actual location. Information about the device's location is not transmitted to Apple, Skyhook or Google. Nor is it transmitted to any third-party application provider, unless the customer expressly consents".

The data under discussion in this whole debate is clearly (in my opinion) a cache of the data mentioned here of nearby cell towers and Wi-Fi access points. I guess the remaining valid concern is that this cache is not stored as securely as it could be, and a fairly large amount of data is stored in the cache. But to me that is a relatively minor concern compared to many of the more dramatic ones that have been raised.

Harry Wood said...

Did you see Ollie O'Brien's blog post on the matter? He managed to mix in the wifi data (also in the file) to give a more accurate view of where he's been around London, but maybe still not enough to figure out where his office and home are.

Peter Batty said...

@Harry thanks for the link. Yes I've looked at the WiFi data and didn't find it gave away anything more - if you can tell where I live from this map of the wifi locations in my log in Denver, I'll buy you a beer :)

Sebbi said...

Came to the same conclusion after investigating my own iPhone "location log".

If only bloggers and news writers had done that before posting what others told them to. It happens every time something more technical comes up ... nobody takes the time to properly investigate what they are writing about :(

Hal Hildebrand said...

Um, you do know the app purposefully adds distortion to the data by mapping to a grid, right? This is a distortion in the app's code not in the actual data.

Get the source and remove that distortion and you'll see how much more accurate the data actually is.
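Aggregating points onto a grid does hide detail finer than the cell size, which is why working from the raw data matters. The sketch below is a generic illustration of grid snapping, not the code of Pete Warden's app; the cell size and coordinates are hypothetical.

```python
import math


def snap_to_grid(lat, lon, cell_deg):
    """Map a point to the south-west corner of the grid cell containing it.
    Every point inside the same cell_deg x cell_deg cell gets the same output,
    so any detail finer than the cell size is lost."""
    return (math.floor(lat / cell_deg) * cell_deg,
            math.floor(lon / cell_deg) * cell_deg)


# Hypothetical points; 0.01 degrees is roughly 0.7 miles of latitude.
a = snap_to_grid(39.756, -104.994, 0.01)
b = snap_to_grid(39.751, -104.999, 0.01)  # different point, same cell as a
c = snap_to_grid(39.766, -104.994, 0.01)  # one cell further north
print(a == b, a == c)
```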

Xion said...

Nice post.

For a step-by-step guide to jailbreaking your iPhone, visit

http://www.xionsms.com/2011/04/jailbreak-iphone-4-and-iphone-3gs-using.html

Peter Batty said...

@Hal read my previous posts - in the first one I used Pete Warden's app which does some aggregation and hides some detail. In the second one I extracted the raw data and uploaded to Google Fusion Tables, so there is no loss of info there. I show multiple examples of how the data does not indicate anything about individual addresses you've been to, just general areas.
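Extracting the raw data for upload, as described here, amounts to dumping a table from the SQLite file to CSV with a header row. A minimal sketch, assuming you already have a copy of the database file; the file and table names in the usage comment are illustrative assumptions.

```python
import csv
import sqlite3


def export_table_to_csv(db_path, table, csv_path):
    """Write every row of `table` in the SQLite file at db_path to a CSV file
    with a header row, ready to upload to a mapping tool."""
    conn = sqlite3.connect(db_path)
    try:
        # The table name is interpolated directly, so it must be trusted input.
        cursor = conn.execute(f"SELECT * FROM {table}")
        headers = [col[0] for col in cursor.description]
        with open(csv_path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(headers)
            writer.writerows(cursor)
    finally:
        conn.close()


# Example (hypothetical paths):
# export_table_to_csv("consolidated.db", "CellLocation", "cells.csv")
```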

Leslie said...

How does a non-Mac person get this info off their phone? I saw the app that was created but I am a PC person. I would love to see what I have been up to.

Peter Batty said...

@Leslie Pete Warden mentions here that there are some Windows ports of his app. I did a quick search and found this - haven't tried it as I'm a Mac guy :). There may be other options around too.

Anonymous said...

All this fuss about revealing your home address... why not just look it up in the white pages? For most people, it's still available there and on search engines, and nowadays people's business addresses are available on LinkedIn, etc.

Talk about making a mountain out of a molehill.

Anonymous said...

All this paranoia about the iPhone storing locations, who the hell cares? In some ways it could actually be a good thing: if criminals who have iPhones are caught, police could easily see if they were at the location of the crime at the exact time the crime was carried out. BUSTED

In fact, a few years ago in South Florida, police were able to solve a murder because the scumbag took the victim's iPhone, and through cell tower records they were able to determine where the iPhone was: yep, right in the suspect's home.

Tim Beidel said...

GPS used to initialize faster if the chip had some idea of where you were when it began searching for satellites and began triangulating. Is it possible this cache enables faster map initialization? I've been amazed at how fast the iPhone picks up where I am - it's nearly instant.

Contrast that with my Garmin car GPS, which often has to ask me if I have moved a significant distance from the last time I was using it when it can't initialize.

Duncan said...

Good work Peter (and Sean).

This was my immediate conclusion after cracking my data open on Wednesday morning http://twitter.com/#!/dunkmac/status/60815593326133248, but I was far too lazy to write any sensible post on it like you and Sean!

Watching this thing explode (especially since I was at Where 2.0) made me surprisingly angry. Seeing some smart big data guys cross to the dark side and pump a story that, in reality, everyone in the industry knew wasn't really a story was quite upsetting, not to mention potentially damaging for those working in the big data and location field.

@edparsons said it well, I thought: "a failure in transparency perhaps, but this is how LBS works!" Most of the guys at Where 2.0 should have had the sense to know there was a non-story here, not to mention that this info had been in the public domain for a long time…

Clearly the momentum of the media hysteria snowball outweighed any desire to be thorough in their research or opinions… That said, it was one hell of a snowball, and it was probably very difficult to stop once they got caught up in it!

MichiKami said...

Thanks Peter, this is the best discussion thread I've come across regarding the iPhone location issue.

A couple of your posts mentioned the Timestamp. These are SQLite files, and time is in 'seconds since 1/1/2001'. Excel stores time in 'days since 1/1/1900'. Here's an explanation of the data extraction and mapping process for Windows users:
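The conversions MichiKami describes can be sketched as follows. The 2001-01-01 UTC epoch is the one quoted in the comment; 36892 is Excel's serial number for 2001-01-01 in its default 1900 date system, so adding the elapsed days gives a value Excel will display as a date (in UTC, with no time-zone adjustment).

```python
from datetime import datetime, timedelta, timezone

# Epoch quoted in the comment: seconds since 2001-01-01 00:00:00 UTC.
APPLE_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)
# Excel's serial day number for 2001-01-01 (default 1900 date system).
EXCEL_SERIAL_2001 = 36892
SECONDS_PER_DAY = 86400


def apple_to_datetime(seconds):
    """Convert a seconds-since-2001 timestamp to a timezone-aware UTC datetime."""
    return APPLE_EPOCH + timedelta(seconds=seconds)


def apple_to_excel_serial(seconds):
    """Convert a seconds-since-2001 timestamp to an Excel serial date (UTC)."""
    return EXCEL_SERIAL_2001 + seconds / SECONDS_PER_DAY


print(apple_to_datetime(325230000))
```

In a spreadsheet the same conversion is `=A1/86400+36892`, with the cell formatted as a date.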
http://michikami.blogspot.com/2011/04/ispyphone-your-iphone-knows-where-youve.html

You said the iPhone isn't storing history ... but the Google Fusion map shows you were at LAX on 12/3/10 at 8:26pm, right? Or did I misunderstand ... that even if you've been back to LAX, these points wouldn't be changed?

Peter Batty said...

@MichiKami Thanks!

By not storing history I mean that if I go back to LAX, that record would be over-written and you would just see the most recent time I had been there. This is important as it reveals much less about your behavior patterns. See the example I gave about Sean Gorman's office location - there was only one timestamp for records in that area, although he goes there every working day. So there is no way for me to tell that this is somewhere he visits regularly from this data.
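The overwrite behavior described here is what you get when records are keyed by tower rather than by visit. A minimal sketch, assuming a simplified hypothetical table with one row per tower ID: repeated downloads for the same tower replace the old timestamp instead of adding a new row.

```python
import sqlite3

# Hypothetical, simplified table: one row per tower, keyed by tower ID.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CellLocation (CellID INTEGER PRIMARY KEY, Timestamp REAL)")


def record_download(cell_id, timestamp):
    # INSERT OR REPLACE overwrites any earlier row for the same tower,
    # so only the most recent timestamp survives.
    conn.execute(
        "INSERT OR REPLACE INTO CellLocation VALUES (?, ?)", (cell_id, timestamp)
    )


for visit in (100.0, 200.0, 300.0):  # three visits to the same area
    record_download(42, visit)

print(conn.execute("SELECT * FROM CellLocation").fetchall())
```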

It could be possible in some cases that a slightly different set of cell tower data gets downloaded on a subsequent visit, so you might be able to find more than one date that I was in a very general area, but you won't get clear information about repeated visits (again as in the example with Sean).

Anonymous said...

The WWDC 2010 session mentioned in an earlier comment explains that iOS pre-fetches "nearby" cell and Wi-Fi location data, so that it can still locate you if you move and also lose internet connectivity. Maybe this is why people are saying the location data in consolidated.db is "inaccurate".

Anonymous said...

Part of my comments have already been e-mailed to Mr. Batty.

Why would Apple be interested in even the approximate location? In my area, which has a high population density, transmission (and, I assume, receiving) towers are going up all over the place. There is such a thing as signal strength and triangulation. As far as I'm concerned, one's very good approximate location can be determined easily.

Peter Batty said...

@Anonymous I'm not sure I understand your point. The iPhone is downloading information about cell towers and wifi access points close to you to assist in quickly calculating your location when applications request this.