Privacy-Aware Personalized Entity Representations for Improved User Understanding
- Levi Melnick,
- Hussein Elmessilhy,
- Vassilis Polychronopoulos,
- Gilsinia Lopez,
- Yuancheng Tu,
- Omar Zia Khan,
- Ye-Yi Wang,
- Chris Quirk
PrivateNLP @ WSDM
Representation learning has transformed the field of machine learning. Advances like ImageNet, word2vec, and BERT demonstrate the power of pre-trained representations to accelerate model training. The effectiveness of these techniques derives from their ability to represent words, sentences, and images in context. Other entity types, such as people and topics, are crucial sources of context in enterprise use cases, including the organization, recommendation, and discovery of vast streams of information. But learning representations for these entities from private data aggregated across user shards carries the risk of privacy breaches. Personalizing representations by conditioning them on a single user’s content eliminates this risk while providing a rich source of context that can change the interpretation of words, people, documents, groups, and other entities commonly encountered in workplace data. In this paper, we explore methods that embed user-conditioned representations of people, key phrases, and emails into a shared vector space based on an individual user’s emails. We evaluate these representations on a suite of representative communication inference tasks using both a public email repository and live user data from an enterprise. We demonstrate that our lightweight, privacy-preserving, unsupervised representations rival supervised approaches. When used to augment supervised approaches, these representations are competitive with deep-learned multi-task models based on pre-trained representations.
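To make the setup concrete, the sketch below shows one plausible instantiation of lightweight, user-conditioned embeddings of the kind the abstract describes: a word2vec-style model trained only on a single user's emails, with person mentions and key phrases included as vocabulary tokens so that people, phrases, and whole emails share one vector space. This is an illustrative assumption, not the paper's actual method; the toy corpus, the `PERSON:` token scheme, and the mean-pooled email embedding are all hypothetical.

```python
# Illustrative sketch (assumed, not the paper's method): train word2vec-style
# embeddings on ONE user's emails, treating correspondents and key phrases as
# tokens, so people, key phrases, and emails land in a shared vector space.
from gensim.models import Word2Vec
import numpy as np

# Hypothetical per-user corpus: each email is a token sequence in which
# correspondents appear as "PERSON:" tokens and key phrases as single tokens.
user_emails = [
    ["PERSON:alice", "quarterly_review", "please", "send", "the", "deck"],
    ["PERSON:bob", "quarterly_review", "meeting", "moved", "to", "friday"],
    ["PERSON:alice", "deck", "attached", "see", "quarterly_review", "notes"],
]

# Lightweight unsupervised training conditioned on this single user's data;
# nothing here aggregates content across user shards.
model = Word2Vec(sentences=user_emails, vector_size=64, window=5,
                 min_count=1, sg=1, epochs=50, seed=0)

def embed_email(tokens):
    """Represent an email as the mean of its token vectors, placing it in
    the same space as person and key-phrase embeddings."""
    vecs = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(model.vector_size)

# People, key phrases, and emails now live in one shared space, so they can
# be compared directly for communication inference tasks.
alice_vec = model.wv["PERSON:alice"]
phrase_vec = model.wv["quarterly_review"]
email_vec = embed_email(user_emails[0])
```

Because training never leaves the individual user's shard, no cross-user aggregation occurs, which is the privacy property the abstract emphasizes.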