Saturday, April 23, 2011

More on Apple recording your iPhone location history

In my previous post I discussed how the location data being recorded by my iPhone actually wasn't very accurate, and certainly not accurate enough to tell where I live or work. That conclusion is based on the data I've examined so far, which is in a table called CellLocation in the iPhone backup. This is the data discussed by Pete Warden and displayed by his iPhoneTracker app, which I used for the visualizations in my previous post. Pete's app aggregated the data to a regular grid, partly to provide additional security.

However, I was intrigued enough to follow Pete's instructions for getting at the raw data. My investigation of it reinforced the conclusion of the previous post: the data does not accurately represent your location. But it did reveal some interesting new patterns. I loaded the data into Google Fusion Tables and made it public; you can view it here (and feel free to play around with it).
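For anyone who wants to repeat this, here is a minimal Python sketch of dumping one of these tables to a CSV file that Fusion Tables or a GIS package can import. It assumes you have already found and copied the SQLite database out of the backup (Pete's instructions cover that; the file name consolidated.db in the comments is an assumption), and that the table has Timestamp, Latitude, Longitude and HorizontalAccuracy columns. Check the schema of your own copy before relying on it.

```python
import csv
import io
import sqlite3

# Assumed column names for the CellLocation/WifiLocation tables; check the
# schema of your own copy (e.g. with ".schema" in the sqlite3 shell) first.
COLUMNS = ("Timestamp", "Latitude", "Longitude", "HorizontalAccuracy")
KNOWN_TABLES = ("CellLocation", "WifiLocation")


def export_table_csv(conn: sqlite3.Connection, table: str, out) -> int:
    """Write the assumed columns of `table` to the file-like `out` as CSV.

    Returns the number of data rows written (the header is not counted).
    """
    # Table names cannot be bound as SQL parameters, so whitelist them.
    if table not in KNOWN_TABLES:
        raise ValueError(f"unexpected table: {table}")
    writer = csv.writer(out)
    writer.writerow(COLUMNS)
    count = 0
    for row in conn.execute(f"SELECT {', '.join(COLUMNS)} FROM {table}"):
        writer.writerow(row)
        count += 1
    return count


if __name__ == "__main__":
    # Demo against an in-memory stand-in database with made-up rows.
    # For real data you would connect to the copied file instead, e.g.
    # sqlite3.connect("consolidated.db") (file name is an assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE CellLocation (Timestamp REAL, Latitude REAL, "
        "Longitude REAL, HorizontalAccuracy REAL)"
    )
    conn.executemany(
        "INSERT INTO CellLocation VALUES (?, ?, ?, ?)",
        [(1.0, 39.75, -104.99, 500.0), (2.0, 39.74, -105.02, 1000.0)],
    )
    buf = io.StringIO()
    print(export_table_csv(conn, "CellLocation", buf))
    print(buf.getvalue())
```

To write a real file, pass open("CellLocation.csv", "w", newline="") as `out`; the csv module documentation recommends newline="" for files it writes.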

Here is an interesting map of downtown Denver, where I live.

This shows all the raw point data, with no aggregation or changes. There are actually no dots at all in the block where I live. However, there is a noticeable cluster in Coors Field, the Colorado Rockies' baseball stadium, which is three blocks from where I live. I haven't been to Coors Field during the period covered by this data. There's also a strong cluster in Mile High Stadium, home of the Denver Broncos.

I would assume that there is additional cell phone infrastructure in these stadiums to cope with the heavy concentration of people. A quick Google search turned up this article about AT&T infrastructure at Coors Field.
This reinforces the notion that at least some of these locations are the locations of cell equipment that your phone is communicating with. But I'm not sure that's the whole story.

Here's a map of Cropston, where I spent most of my time on my last two visits to England. It's a small village in a fairly rural area.

Here, no locations appear in the village itself, where I spent most of my time. A lot of the locations are clustered in towns or along streets, but some seem to be in the middle of nowhere. It's hard to draw any definite conclusions.

I just received a suggestion from Jonathan Barnes, via Pete Warden, that the HorizontalAccuracy field may be significant, with lower values indicating accurate locations obtained via GPS. However, a quick test didn't bear this out. For example, this map filters to show only records with this field set to 500.0, the minimum value I found in a quick skim (Fusion Tables seems to treat the field as a string rather than a number). While this reduces the number of records, it doesn't noticeably improve accuracy: it still includes all the readings from Coors Field and Mile High Stadium, where I haven't actually been.
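Whether a field is treated as text or as a number matters when you look for a minimum or filter on a threshold, because text is compared character by character. This small Python sketch, using made-up sample values rather than my data, shows the difference and a numeric filter; the one-dictionary-per-row layout is an assumption for illustration.

```python
from typing import Dict, Iterable, List


def min_accuracy(values: Iterable[str]) -> float:
    """Smallest HorizontalAccuracy value after converting the text to float."""
    return min(float(v) for v in values)


def filter_by_accuracy(
    rows: Iterable[Dict[str, str]], max_accuracy: float
) -> List[Dict[str, str]]:
    """Keep rows whose HorizontalAccuracy (stored as text) is <= max_accuracy."""
    return [r for r in rows if float(r["HorizontalAccuracy"]) <= max_accuracy]


if __name__ == "__main__":
    # Made-up sample values, not taken from the real tables.
    values = ["1000.0", "500.0", "2000.0"]
    print(min(values))           # text comparison: prints 1000.0
    print(min_accuracy(values))  # numeric comparison: prints 500.0
```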

Pete also pointed me to another table in the backup called WifiLocation, which in my case was about 5 times larger than CellLocation. I have loaded this into Fusion Tables here. One interesting thing about this table is that the data shows up only in North America (with one random exception in Munich, where I haven't been recently). It seems a little more focused on areas I've been to, but it is no more revealing in terms of showing specific places where I've spent time.
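Both observations, the relative table sizes and the lone point outside North America, are quick to check with SQL. A minimal Python sketch, assuming the same Latitude/Longitude column names as above; the bounding box is a deliberately rough assumption, good only for spotting outliers like a point in Munich.

```python
import sqlite3
from typing import List, Tuple

KNOWN_TABLES = ("CellLocation", "WifiLocation")
# A deliberately rough North America bounding box (assumption, illustration only).
NA_LAT = (15.0, 72.0)
NA_LON = (-170.0, -50.0)


def _check(table: str) -> None:
    # Table names cannot be bound as SQL parameters, so whitelist them.
    if table not in KNOWN_TABLES:
        raise ValueError(f"unexpected table: {table}")


def row_count(conn: sqlite3.Connection, table: str) -> int:
    """Number of rows in one of the known location tables."""
    _check(table)
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


def outside_box(conn: sqlite3.Connection, table: str) -> List[Tuple[float, float]]:
    """(Latitude, Longitude) rows that fall outside the rough box."""
    _check(table)
    sql = (
        f"SELECT Latitude, Longitude FROM {table} "
        "WHERE NOT (Latitude BETWEEN ? AND ? AND Longitude BETWEEN ? AND ?)"
    )
    return conn.execute(sql, (*NA_LAT, *NA_LON)).fetchall()


if __name__ == "__main__":
    # In-memory stand-in with made-up rows; the schema is an assumption.
    conn = sqlite3.connect(":memory:")
    for t in KNOWN_TABLES:
        conn.execute(f"CREATE TABLE {t} (Latitude REAL, Longitude REAL)")
    conn.execute("INSERT INTO CellLocation VALUES (39.75, -104.99)")
    conn.executemany(
        "INSERT INTO WifiLocation VALUES (?, ?)",
        [(39.75, -104.99), (40.01, -105.27), (48.14, 11.58)],
    )
    cell = row_count(conn, "CellLocation")
    wifi = row_count(conn, "WifiLocation")
    print(f"WifiLocation/CellLocation row ratio: {wifi / cell:.1f}")
    print(outside_box(conn, "WifiLocation"))
```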

As I said, feel free to play with the tables I uploaded, and let me know if you find anything interesting! But my conclusion remains that this data doesn't reveal where you've been with any degree of accuracy.

Update: see my latest post, where I conclude that this is data being downloaded from Apple rather than uploaded, and that a detailed history is not stored. Thanks to Jude in the comment below for his suggestion about this.

9 comments:

Jude said...

My Guess?

It's not a list of cell phone locations that you've been to, but the opposite: a list of cell phone locations near you, downloaded to the iPhone from Apple in case you move into range of one of them.
I.e., at a guess, what is happening is that location services identifies a cell tower and asks for its location, and the reply is the list of locations that contains that cell tower. That list is then cached so that it does not need to be requested again.

Of course, this is only a guess based on the wide range of addresses people are seeing and how they are near to, but not exactly where, people have traveled.

Peter Batty said...

Thanks Jude, great suggestion! See my latest post for more analysis of this theory (I agree with you).

peterburk said...

Thanks to seeing this on the news, I've written an AppleScript called iPhone Geotag. It uses the location data to tag Places for your pictures in iPhoto. Brings a happy ending to this scandal, eh? Check it out on: http://goo.gl/OQzfB

Don Cooke said...

I wish I knew more about how cell systems work, but here are a couple of possibly useful factoids:

1) Back when they were designing ALI (Automatic Location Identification) Phase 2, someone ran some tests showing that this accuracy standard should be achievable using the "network solution" (trilaterating/triangulating using signal strength from several cell towers, rather than a GPS in the device):

"For network-based solutions: 100 meters for 67 percent of calls, 300 meters for 95 percent of calls"

http://www.fcc.gov/pshs/services/911-services/enhanced911/archives/factsheet_requirements_012001.pdf

Now, 95% of calls located within 300 meters gives you a "statistical guarantee" that up to 5% of calls will be located more than 300 meters (about 1,000 feet) from where they originated.
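To make that concrete, here is a small Python sketch that checks a set of location errors against the network-based figures quoted above. The error values are invented for illustration. The example set passes even though one fix is 2 km off, because the quoted figures constrain the 67th and 95th percentiles, not the worst case.

```python
from typing import Iterable, Sequence

# Network-based figures as quoted above: (limit in metres, required fraction).
NETWORK_BASED_LIMITS = ((100.0, 0.67), (300.0, 0.95))


def fraction_within(errors_m: Sequence[float], limit_m: float) -> float:
    """Fraction of location errors (metres) that are <= limit_m."""
    if not errors_m:
        raise ValueError("need at least one error value")
    return sum(1 for e in errors_m if e <= limit_m) / len(errors_m)


def meets_network_based_standard(errors_m: Iterable[float]) -> bool:
    """True if every (limit, fraction) pair in NETWORK_BASED_LIMITS is met."""
    errors = list(errors_m)
    return all(
        fraction_within(errors, limit) >= frac
        for limit, frac in NETWORK_BASED_LIMITS
    )


if __name__ == "__main__":
    # Invented errors for 20 calls: 14 at 50 m, 5 at 200 m, 1 at 2 km.
    errors = [50.0] * 14 + [200.0] * 5 + [2000.0]
    print(fraction_within(errors, 100.0))        # 0.7
    print(fraction_within(errors, 300.0))        # 0.95
    print(meets_network_based_standard(errors))  # True, despite the 2 km miss
```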

The second factoid is that individual cell towers are very smart about how they use the limited power they're permitted: each one divides its power among the phones using it. If only a few phones are using a tower, they each get a lot of power, so call quality is better and they can use the tower from farther away. This, plus the vagaries of radio transmission, may lead to your phone having occasional contact with cell towers 5 to 25 miles away, and at those ranges the system's estimate of your location is going to be pretty bad.

From Wikipedia: "In cities, each cell site may have a range of up to approximately ½ mile, while in rural areas, the range could be as much as 5 miles. It is possible that in clear open areas, a user may receive signals from a cell site 25 miles away."

http://en.wikipedia.org/wiki/Cellular_network

I'll send you an image from ArcGIS displaying your points and cell towers. Your points clearly are not cell tower locations.

Peter Batty said...

Hi Don, thanks for your thoughts. My guess on these points is that some of them are cell towers with known and accurate locations (for example the dots inside Coors Field). However, some of the points are at locations where there clearly is no cell tower (for example Sean Gorman had one in the Pacific Ocean). My guess on these is that they are estimated locations based on some sort of calculation from signals received from iPhones in the field (in the Apple document I reference in my post following this, Apple specifically said it uploads this type of info). It would be interesting to get hold of any available data on cell tower locations and overlay it - anything you have on this front would be appreciated!
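If tower location data does turn up, one simple way to overlay it is to measure, for each recorded point, the distance to the nearest known tower. Below is a brute-force Python sketch using the haversine formula on a spherical Earth. The tower list, the coordinates in the demo, and the 50 m threshold for "this point sits on a tower" are all assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple

LatLon = Tuple[float, float]
EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    # Clamp to guard against tiny floating-point overshoot above 1.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def nearest_tower_distances(
    points: Sequence[LatLon], towers: Sequence[LatLon]
) -> List[float]:
    """For each point, distance in metres to the closest tower (brute force)."""
    if not towers:
        raise ValueError("need at least one tower")
    return [
        min(haversine_m(lat, lon, tlat, tlon) for tlat, tlon in towers)
        for lat, lon in points
    ]


def points_near_towers(
    points: Sequence[LatLon], towers: Sequence[LatLon], threshold_m: float = 50.0
) -> List[LatLon]:
    """Points within threshold_m of some tower (the threshold is arbitrary)."""
    dists = nearest_tower_distances(points, towers)
    return [p for p, d in zip(points, dists) if d <= threshold_m]


if __name__ == "__main__":
    # Hypothetical tower and point coordinates, for illustration only.
    towers = [(39.7559, -104.9942), (39.7439, -105.0201)]
    points = [(39.7559, -104.9942), (39.7500, -105.0000)]
    for p, d in zip(points, nearest_tower_distances(points, towers)):
        print(p, f"{d:.0f} m to nearest tower")
    print(points_near_towers(points, towers))
```

The brute-force loop is fine for a few thousand points; for much larger tables a spatial index would be the usual choice.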

Peter Batty said...

P.S. Don, I assume the cell tower data is used as a last resort for location calculation, after GPS and WiFi. It may also be used for "assisted GPS" to help calculate a GPS location quickly.

Don Cooke said...

Actually, I think location arrives in the opposite order: cell trilateration or wifi first then GPS, as GPS can be slow to come up. GPS should always be the most accurate, and sometimes you can see the blue circle on the iPhone app shrink a couple of times as it refines the location -- probably by using a more precise source.
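The shrinking blue circle can be modelled very simply: show a new fix only if its uncertainty radius is smaller than that of the fix currently displayed. This is a hypothetical Python sketch of that idea, not a description of how iOS actually decides; the Fix type, the source labels, and the accuracy numbers are all made up.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Fix:
    source: str        # e.g. "cell", "wifi", "gps" (illustrative labels)
    lat: float
    lon: float
    accuracy_m: float  # radius of the uncertainty circle in metres


def refine(fixes: Iterable[Fix]) -> List[Fix]:
    """Return the fixes a display would show if it only ever replaced the
    current fix with a strictly more precise one."""
    shown: List[Fix] = []
    for fix in fixes:
        if not shown or fix.accuracy_m < shown[-1].accuracy_m:
            shown.append(fix)
    return shown


if __name__ == "__main__":
    # Made-up arrival order: a coarse cell fix first, then better sources.
    arrivals = [
        Fix("cell", 39.70, -105.00, 1500.0),
        Fix("wifi", 39.75, -104.99, 100.0),
        Fix("cell", 39.70, -105.00, 2000.0),  # worse, so it is ignored
        Fix("gps", 39.751, -104.991, 10.0),
    ]
    for f in refine(arrivals):
        print(f.source, f.accuracy_m)
```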

BTW, A-GPS (Assisted GPS) mainly works by sending the unit the satellites' ephemerides (orbital parameters), which would otherwise take 37+ seconds for the GPS unit to pick up. This lets the GPS calculate where the satellites are and which direction they're moving, and hence the Doppler shift of the signal to expect, which speeds up making sense of the very weak signals.
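The Doppler part is simple arithmetic: for a carrier frequency f and a line-of-sight (radial) velocity v much smaller than the speed of light c, the shift is approximately f·v/c. A Python sketch for the GPS L1 carrier; the ±800 m/s radial velocities in the demo are assumed round figures, not measured ones. Knowing the expected shift in advance narrows the range of frequencies the receiver has to search.

```python
GPS_L1_HZ = 1_575_420_000.0  # GPS L1 carrier frequency (1575.42 MHz)
SPEED_OF_LIGHT_M_S = 299_792_458.0


def doppler_shift_hz(radial_velocity_m_s: float, carrier_hz: float = GPS_L1_HZ) -> float:
    """First-order Doppler shift f * v / c.

    A positive radial velocity means satellite and receiver are closing,
    which raises the received frequency.
    """
    return carrier_hz * radial_velocity_m_s / SPEED_OF_LIGHT_M_S


if __name__ == "__main__":
    # 800 m/s is an assumed round figure for the line-of-sight velocity.
    for v in (-800.0, 0.0, 800.0):
        print(f"{v:+7.1f} m/s -> {doppler_shift_hz(v):+8.1f} Hz")
```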

Lontra canadensis said...

You can get cell tower locations from the FAA, I think. I had to get them a while back when a new airport was being planned here in my county. Can't remember if it's a table of x,y or if it comes as a shapefile, though.

Timbo said...

The Apple press release vindicates your post. As a Windows 7 user, I cracked my own data myself, and ran it through a few GIS programs, as well as Google Fusion Tables.

It was quickly obvious that the data was not tracking data. For starters, it only had 10 timestamps for 7 months! Also, the locations I most frequent did not show up, even after temporal filtering.

If you have the same experience I did, you can select each timestamp cluster and see that the points often form a circle. It looks like a buffer select from Apple's database, with the timestamp being the SQL transaction COMMIT (download) time.
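This check is easy to reproduce: group the points by timestamp, then for each group compute a center and the distance from it to the farthest point. A group that came from a radius query would have every point within roughly that radius of its center. A minimal Python sketch; the (timestamp, lat, lon) record layout and the demo values are assumptions, and a plain average of latitude and longitude is only a reasonable center for small clusters away from the poles and the 180° meridian.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

LatLon = Tuple[float, float]
EARTH_RADIUS_M = 6_371_000.0  # spherical Earth approximation


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def cluster_by_timestamp(
    records: Iterable[Tuple[float, float, float]]
) -> Dict[float, List[LatLon]]:
    """Group (timestamp, lat, lon) records that share exactly the same timestamp."""
    groups: Dict[float, List[LatLon]] = defaultdict(list)
    for ts, lat, lon in records:
        groups[ts].append((lat, lon))
    return dict(groups)


def cluster_shape(points: List[LatLon]) -> Tuple[LatLon, float]:
    """Center (plain lat/lon mean) and distance in metres to the farthest point."""
    clat = sum(lat for lat, _ in points) / len(points)
    clon = sum(lon for _, lon in points) / len(points)
    radius = max(haversine_m(clat, clon, lat, lon) for lat, lon in points)
    return (clat, clon), radius


if __name__ == "__main__":
    # Made-up records: two timestamps, i.e. two candidate "downloads".
    records = [
        (100.0, 39.75, -105.00),
        (100.0, 39.75, -104.98),
        (100.0, 39.76, -104.99),
        (200.0, 51.00, -0.10),
    ]
    groups = cluster_by_timestamp(records)
    print(f"{len(groups)} distinct timestamps")
    for ts, pts in sorted(groups.items()):
        (clat, clon), r = cluster_shape(pts)
        print(f"{ts}: {len(pts)} points, center ({clat:.4f}, {clon:.4f}), "
              f"max radius {r:.0f} m")
```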

So it's just a big media beat-up again, especially when you can just click an iTunes checkbox to 'encrypt backup'.

Now the Playstation network getting hacked, that's worth getting excited about. Massive...