Time Adaptive Sketches (Ada-Sketches) for Summarizing Data Streams
- Arnd Christian König ,
- Anshumali Srivastava ,
- Misha Bilenko
ACM SIGMOD Conference |
Published by ACM - Association for Computing Machinery
Obtaining frequency information of data streams, in limited space, is a well-recognized problem in literature. A number of recent practical applications (such as those in computational advertising) require temporally-aware solutions: obtaining historical count statistics for both time-points as well as time-ranges. In these scenarios, accuracy of estimates is typically more important for recent instances than for older ones; we call this desirable property as “Time Adaptiveness“. With this observation, earlier work introduced the “Hokusai” technique based on count-min sketches for estimating frequencies of any given item at any given time. The proposed approach is problematic in practice as its memory requirements grow linearly with time and it produces discontinuities in the estimation accuracy. In this work, we describe a new method, “Time Adaptive Sketches” (Ada-sketches), that overcomes these limitations, while extending and providing a strict generalization of several existing popular sketching algorithms.
The core idea of our method is inspired by the well-known digital Dolby noise reduction procedure that dates back to 1960s. Theoretical analysis presented could be of independent interest in itself, as it provides clear results for the time-adaptive nature of the errors. Experimental evaluation on real streaming datasets demonstrates the superiority of the described method over Hokusai in estimating point and range queries over time. The method is simple to implement and offers a variety of design choices for future extensions. The simplicity of the procedure and the method’s generalization of classic sketching techniques give hope for wide applicability of Ada-sketches in practice.
© ACM. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version can be found at http://dl.acm.org.