Hermes: Clustering Users in Large-Scale E-mail Services
ACM Symposium on Cloud Computing 2010 (ACM SOCC 2010) |
Published by Association for Computing Machinery, Inc.
Hermes is an optimization engine for large-scale enterprise e-mail services. Such services could be hosted by a virtualized e-mail service provider, or by dedicated enterprise data centers. In both cases we observe that the pattern of e-mails between employees of an enterprise forms an implicit social graph. Hermes tracks this implicit social graph, periodically identifies clusters of strongly connected users within the graph, and co-locates such users on the same server. Co-locating the users reduces storage requirements: senders and recipients who reside on the same server can share a single copy of an e-mail. Co-location also reduces inter-server bandwidth usage.
We evaluate Hermes using a trace of all e-mails within a major corporation over a five month period. The e-mail service supports over 120,000 users on 68 servers. Our evaluation shows that using Hermes results in storage savings of 37% and bandwidth savings of 50% compared to current approaches. The overheads are low: a single commodity server can run the optimization for the entire system.
Copyright © 2007 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept, ACM Inc., fax +1 (212) 869-0481, or [email protected]. The definitive version of this paper can be found at ACM's Digital Library --http://www.acm.org/dl/.