Protecting Respondents' Identities in Microdata Release
Pierangela Samarati
Dipartimento di Informatica
Università degli Studi di Milano
pierangela.samarati@unimi.it
Abstract
Today's globally networked society places great demand on the
dissemination and sharing of information. While in the past released
information was mostly in tabular and statistical form, many
situations call today for the release of specific data (microdata).
In order to protect the anonymity of the entities (called respondents) to
which information refers, data holders often remove or encrypt
explicit identifiers such as names, addresses, and phone
numbers. De-identifying data, however, provides no guarantee of
anonymity. Released information often contains other data, such as
race, birth date, sex, and ZIP code, that can be linked to publicly
available information to re-identify respondents and to infer
information that was not intended for disclosure.
In this paper we address the problem of releasing microdata while
safeguarding the anonymity of the respondents to which the data
refer. The approach is based on the definition of
k-anonymity. A table provides k-anonymity if attempts to link
explicitly identifying information to its content map the information
to at least k entities. We illustrate how k-anonymity can be
provided without compromising the integrity (or truthfulness) of
the information released by using generalization and suppression
techniques. We introduce the concept of minimal
generalization that captures the property of the release process not
to distort the data more than needed to achieve k-anonymity, and
present an algorithm for the computation of such a generalization. We
also discuss possible preference policies to choose among different
minimal generalizations.
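
As a rough illustration of the k-anonymity requirement and of generalization, the following Python sketch checks whether every combination of linkable attribute values in a table occurs at least k times, and then generalizes ZIP codes by masking trailing digits. The function names, the choice of linkable attributes (`sex`, `zip`), and the sample rows are all hypothetical and invented for this example; the sketch does not reproduce the paper's algorithm for computing a minimal generalization.

```python
from collections import Counter


def is_k_anonymous(rows, linkable_attrs, k):
    """Return True if each combination of values over linkable_attrs
    appears in at least k rows (vacuously True for an empty table)."""
    counts = Counter(tuple(row[a] for a in linkable_attrs) for row in rows)
    return all(c >= k for c in counts.values())


def generalize_zip(rows, digits_kept):
    """Return a copy of rows whose 'zip' value keeps only the first
    digits_kept characters; the rest are replaced with '*'."""
    generalized = []
    for row in rows:
        new_row = dict(row)  # leave the input rows untouched
        z = row["zip"]
        new_row["zip"] = z[:digits_kept] + "*" * (len(z) - digits_kept)
        generalized.append(new_row)
    return generalized


# Made-up, de-identified microdata: names are already removed, but
# sex and ZIP code could still be linked to external information.
rows = [
    {"sex": "F", "zip": "94139", "disease": "flu"},
    {"sex": "F", "zip": "94138", "disease": "asthma"},
    {"sex": "M", "zip": "94141", "disease": "flu"},
    {"sex": "M", "zip": "94142", "disease": "cold"},
]
LINKABLE = ["sex", "zip"]

before = is_k_anonymous(rows, LINKABLE, 2)
after = is_k_anonymous(generalize_zip(rows, 4), LINKABLE, 2)
print(before, after)
```

In this toy table each (sex, ZIP) pair is unique, so the original rows fail the check for k = 2; masking the last ZIP digit merges the rows into two groups of two, and the released values remain truthful, only less specific.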
IEEE Transactions on Knowledge and Data Engineering, vol. 13, no. 6, November/December 2001, pp. 1010-1027