Anonymization of Set-Valued Data via Top-Down, Local Generalization

Proceedings of the International Conference on Very Large Data Bases (VLDB)

Set-valued data, in which a set of values is associated with
an individual, is common in databases ranging from market
basket data, to medical databases of patients’ symptoms
and behaviors, to query engine search logs. Anonymizing
this data is important if we are to reconcile the conflicting
demands arising from the desire to release the data for
study and the desire to protect the privacy of individuals
represented in the data. Unfortunately, the bulk of existing
anonymization techniques, which were developed for scenarios
in which each individual is associated with only one sensitive
value, are not well-suited for set-valued data. In this
paper we propose a top-down, partition-based approach to
anonymizing set-valued data that scales linearly with the input
size and scores well on an information-loss data quality
metric. We further note that our technique can be applied
to anonymize the infamous AOL query logs, and we discuss the
merits and challenges of anonymizing query logs with our
approach.
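
To make the idea of top-down, partition-based generalization concrete, the following is a minimal Python sketch and not the algorithm from the paper. It assumes a k-anonymity-style requirement (every released itemset is shared by at least k records), a hypothetical item hierarchy (PARENT), records that are non-empty sets of leaf items, and a simplified rule that rejects any split producing a group smaller than k. The names generalize and anonymize are illustrative only.

```python
from collections import defaultdict

# Hypothetical item hierarchy (child -> parent); "ALL" is the root.
PARENT = {
    "dairy": "ALL", "alcohol": "ALL",
    "milk": "dairy", "cheese": "dairy",
    "beer": "alcohol", "wine": "alcohol",
}
CHILDREN = defaultdict(list)
for child, parent in PARENT.items():
    CHILDREN[parent].append(child)


def ancestors(item):
    """Return the item followed by its ancestors, ending at the root."""
    chain = [item]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def generalize(record, cut):
    """Replace each leaf item by its unique ancestor-or-self in the cut."""
    return frozenset(
        next(a for a in ancestors(item) if a in cut) for item in record
    )


def anonymize(records, k, cut=frozenset({"ALL"})):
    """Return (generalized_itemset, count) groups.

    Assumes len(records) >= k and that all records in this partition
    already share the same generalization under `cut`.
    """
    current = generalize(records[0], cut)
    for node in sorted(current):
        if not CHILDREN[node]:
            continue  # leaf node: cannot be specialized further
        # Specialize one node of the cut into its children (top-down step).
        new_cut = (cut - {node}) | set(CHILDREN[node])
        parts = defaultdict(list)
        for r in records:
            parts[generalize(r, new_cut)].append(r)
        # Simplifying assumption: accept the split only if every
        # sub-partition still holds at least k records.
        if all(len(p) >= k for p in parts.values()):
            groups = []
            for p in parts.values():
                # Each sub-partition is refined independently, so different
                # groups may end at different levels (local generalization).
                groups.extend(anonymize(p, k, new_cut))
            return groups
    return [(current, len(records))]


records = [
    {"milk", "beer"}, {"cheese", "beer"},
    {"milk", "wine"}, {"cheese", "wine"},
    {"milk"}, {"milk"},
]
groups = anonymize(records, k=2)
```

In this toy run the two records containing only milk keep the exact item, while the mixed baskets keep beer or wine but have milk and cheese generalized to dairy. Different partitions stop at different levels of the hierarchy, which is what distinguishes local from global generalization.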