Paul Ohm has written the first comprehensive law review article that incorporates an important new subspecialty of computer science, reidentification science, into legal scholarship. His research and findings unearth a tension that shakes a foundational belief about data privacy: Data can be either useful or perfectly anonymous, but never both. The excerpt highlights his findings on the failures of anonymization, a long-standing privacy protection technique that has become the bedrock of privacy legal frameworks, laws, and policies. While the author's audience is regulators and the legal community, his findings are relevant to location data, technology, and application providers who collect, aggregate, and distribute location data, and who seek to develop proactive policies to balance business objectives with privacy protections.
With the permission of Paul Ohm, we are reproducing an excerpt of his original article, Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization, published in the UCLA Law Review (57 UCLA Law Review 1701 (2010)).
Computer scientists have recently undermined our faith in the privacy-protecting power of anonymization, the name for techniques that protect the privacy of individuals in large databases by deleting information like names and social security numbers. These scientists have demonstrated that they can often “reidentify” or “deanonymize” individuals hidden in anonymized data with astonishing ease. By understanding this research, we realize we have made a mistake, labored beneath a fundamental misunderstanding, which has assured us much less privacy than we have assumed. This mistake pervades nearly every information privacy law, regulation, and debate, yet regulators and legal scholars have paid it scant attention.
Anonymization: The Purging of Personal Information
Anonymization plays a central role in modern data handling, forming the core of standard procedures for storing or disclosing personal information. Anonymization is a process by which information in a database is manipulated to make it difficult to identify data subjects. Database experts have developed scores of different anonymization techniques, which vary in their cost, complexity, ease of use, and robustness. A very common technique is suppression, whereby a data administrator suppresses data by deleting or omitting it entirely. For example, a hospital data administrator tracking prescriptions will suppress the names of patients before sharing data in order to anonymize it.
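The suppression technique described above can be sketched in a few lines of code. This is a minimal, hypothetical illustration: the records, field names, and `suppress` helper are invented here for the sake of the example, not drawn from any real system.

```python
def suppress(records, fields_to_suppress):
    """Return copies of records with the named fields removed entirely."""
    return [
        {field: value for field, value in record.items()
         if field not in fields_to_suppress}
        for record in records
    ]

# Hypothetical prescription records held by a hospital data administrator.
records = [
    {"name": "Jane Doe", "zip": "02139", "drug": "metformin"},
    {"name": "John Roe", "zip": "10001", "drug": "lisinopril"},
]

# Suppress the name field before sharing the data.
anonymized = suppress(records, {"name"})
print(anonymized)
```

Note that suppression removes only the named fields; the remaining attributes (here, zip code and drug) are still shared, which is precisely what later reidentification work exploits.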
Data administrators anonymize to protect the privacy of data subjects when storing or disclosing data. They disclose data to three groups:
- THIRD PARTIES: For example, health researchers share patient data with other health researchers, websites sell transaction data to advertisers, and phone companies can be compelled to disclose call logs to law enforcement officials.
- THE PUBLIC: Increasingly, administrators do this to engage in what is called crowdsourcing—attempting to harness large groups of volunteer users who can analyze data more efficiently and thoroughly than smaller groups of paid employees.
- OTHERS WITHIN THEIR ORGANIZATION: Particularly within large organizations, data collectors may want to protect data subjects’ privacy even from others in the organization. For example, large banks may want to share some data with their marketing departments, but only after anonymizing it to protect customer privacy.
Reidentification: The “Reverse Engineering” of Anonymized Data
The reverse of anonymization is reidentification or deanonymization. An adversary reidentifies anonymized data by linking anonymized records to outside information, hoping to discover the true identity of the data subjects. Advances in reidentification should trigger a sea change in the law, because nearly every information privacy law or regulation grants a get-out-of-jail-free card to those who anonymize their data. In the United States, federal privacy statutes carve out exceptions for those who anonymize.
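The linking step described above can be sketched as a simple join between an anonymized table and an outside dataset on shared attributes. This is a hypothetical sketch: the records, field names, and choice of linking attributes (zip code, birth date, sex) are invented for illustration.

```python
def link(anon_rows, outside_rows, keys=("zip", "birth", "sex")):
    """Reidentify anonymized rows whose shared attributes match exactly
    one record in the outside dataset, recovering the subject's name."""
    matches = []
    for anon in anon_rows:
        hits = [out for out in outside_rows
                if all(anon[k] == out[k] for k in keys)]
        if len(hits) == 1:  # a unique match reveals the data subject
            matches.append({**anon, "name": hits[0]["name"]})
    return matches

# Hypothetical anonymized health records: names suppressed, but other
# attributes retained so the data remains useful.
anonymized = [
    {"zip": "02139", "birth": "1970-07-31", "sex": "F", "diagnosis": "asthma"},
    {"zip": "10001", "birth": "1985-01-02", "sex": "M", "diagnosis": "flu"},
]

# Hypothetical outside dataset (e.g. a public registry) that does
# include names alongside the same attributes.
outside = [
    {"name": "Jane Doe", "zip": "02139", "birth": "1970-07-31", "sex": "F"},
]

print(link(anonymized, outside))
```

In this toy example, the asthma record links uniquely to "Jane Doe," even though her name was suppressed before the health data was shared.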
About fifteen years ago, researchers started to chip away at the robust anonymization assumption, the foundation upon which [privacy policies and laws] have been built. Recently, however, they have done more than chip away; they have essentially blown it up, casting serious doubt on the power of anonymization, proving its theoretical limits and establishing what I call the easy reidentification result. This is not to say that all anonymization techniques fail to protect privacy—some techniques are very difficult to reverse—but researchers have learned more than enough already for us to reject anonymization as a privacy-providing panacea.
Examples of Anonymization Failure
A. THE AOL DATA RELEASE. On August 3, 2006, America Online (AOL) announced a new initiative called...
The complete article is available in the Spring 2012 Digital Issue.