The privacy issues with so-called 'anonymised' data
September 22, 2014
A group of scientists and researchers partly supported by German software giant SAP has taken a look at one of the critical privacy issues with so-called 'anonymised' data.
The issue is the way spatial correlations in mobile data can be used to re-identify individuals in large data sets.
When it comes to privacy, location data is a particularly serious problem, the Singapore-led group says.
Even if the resolution of a phone's GPS records is reduced in a stored dataset, tracking a user's movements for long enough will readily identify that user, which is a serious concern for most people.
“To be sure, simply removing identifiers from location information, or reducing the granularity of the location or time, does not prevent disclosure of personally identifiable information,” the paper states. “Individuals are highly re-identifiable with only a few spatio-temporal points”.
Just how revealing smartphone location trajectories are is shown by their analysis of 56 million records: “With two random points, about 61.3 percent of the trajectories are unique”, they write.
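To make the idea of a "unique" trajectory concrete, here is a minimal Python sketch of how such a figure could be measured. It is an illustration only, not the researchers' code: the function name `unique_fraction`, the toy users and the (cell, hour) record format are all assumptions made for the example.

```python
import random

def unique_fraction(trajectories, k, seed=0):
    """Fraction of users whose trajectory is the only one containing
    k points drawn at random from it.

    trajectories: dict mapping a (hypothetical) user id to a list of
    (cell, hour) records. This data layout is an assumption.
    """
    rng = random.Random(seed)
    as_sets = {user: set(points) for user, points in trajectories.items()}
    unique = 0
    for user, points in as_sets.items():
        # Pick up to k known points, as an observer might learn them.
        sample = set(rng.sample(sorted(points), min(k, len(points))))
        # Count every trajectory that contains all of the sampled points.
        matches = [u for u, other in as_sets.items() if sample <= other]
        if len(matches) == 1:
            unique += 1
    return unique / len(as_sets)

# Toy data: alice and bob share their morning, carol shares nothing.
toy = {
    "alice": [("A", 8), ("B", 9), ("C", 18)],
    "bob": [("A", 8), ("B", 9), ("D", 18)],
    "carol": [("E", 8), ("F", 9), ("G", 18)],
}
print(unique_fraction(toy, k=2))
```

On real data the result depends on how coarse the cells and time slots are, which is why reducing resolution alone is reported to be insufficient.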
The researchers say anonymisation of mobile datasets is greatly improved if the “trajectories” (literally, the records of where the user has been) are shortened.
That way, anonymity can be better protected without destroying the utility of the dataset itself.
The researchers, Yi Song of the National University of Singapore, working under an SAP internship, Daniel Dahlmeier of SAP and Stephane Bressan of the National University of Singapore, note that the longer a user's location data can be strung into a trajectory, the easier it is to identify that user.
In other words, a couple of location data points is nowhere near as revealing as 24 hours' worth of the user's movements.
The idea behind the approach is that only one parameter needs to be adjusted to give users better anonymity: the time window that trajectories are cut into.
That is a simple enough operation that the researchers believe, and hope, it will scale to very large data sets.
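As a rough illustration of cutting a trajectory by time window, the Python sketch below splits one user's records into fixed-length pieces. The function name `cut_into_windows`, the (hour, cell) record format and the sample day are assumptions for the example. The article does not describe what is done with each piece afterwards, so that step is left out.

```python
from collections import defaultdict

def cut_into_windows(trajectory, window_hours):
    """Split one user's (hour, cell) records into consecutive windows
    of window_hours each. window_hours is the single tunable parameter."""
    pieces = defaultdict(list)
    for hour, cell in sorted(trajectory):
        # Records in the same window share the same integer index.
        pieces[int(hour // window_hours)].append((hour, cell))
    # Return the non-empty windows in chronological order.
    return [pieces[i] for i in sorted(pieces)]

# Hypothetical day of movements: (hour of day, cell id).
day = [(0, "A"), (5, "B"), (7, "C"), (13, "D"), (23, "E")]
six_hour_pieces = cut_into_windows(day, 6)
whole_day = cut_into_windows(day, 24)
```

A smaller window yields shorter pieces that are harder to link to one person, at some cost to analyses that need long histories. Because each record is handled once, the cut itself is cheap even on large datasets.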