On-The-Fly Progress Detection In Iterative Stream Queries
- Badrish Chandramouli ,
- Jonathan Goldstein ,
- David Maier
International Conference on Very Large Databases (VLDB), Lyon, France |
Multiple researchers have proposed cyclic query plans for evaluating iterative queries over streams or rapidly changing input. The Declarative Networking community uses cyclic plans to evaluate Datalog programs that track reachability and other graph traversals on networks. Cyclic query plans can also evaluate pattern-matching and other queries based on event sequences.
An issue with cyclic queries over dynamic inputs is knowing when the query result has progressed to a certain point in the input, since the number of iterations is data dependent. One option is a “strictly staged” computation, where the query plan quiesces between inputs. This option introduces significant latency, and may also “underload” inter-operator buffers. An alternative is to settle for soft guarantees, such as “eventual consistency”. Such imprecision can make it difficult, for example, to know when to purge state from stateful operators.
We propose a third option in which cyclic queries run continuously, but detect progress “on the fly” by means of a Flying Fixed-Point (FFP) operator. FFP sits on the cyclic loop and circulates speculative predictions on forward progress, which it then validates. FFP is always able to track progress for a class of queries we term strongly convergent. A key advantage of FFP is that it works with existing algebra operators, thereby inheriting their capabilities, such as windowing and dealing with out-of-order input. Also, for stream systems that explicitly model input-event lifetimes, we know exactly which values are in the query result at each point in time.
A key implementation decision is the method for speculating. Using the high-water mark of data events minimizes the number of speculative punctuations. Probing operators on the cyclic loop to determine their external progress circulates many more speculative messages, but tracks actual output progress more closely. We show how a hybrid approach limits predictions while coming close the progress-tracking ability of Probing.
All articles published in this journal are protected by copyright, which covers the exclusive rights to reproduce and distribute the article (e.g., as offprints), as well as all translation rights. No material published in this journal may be reproduced photographically or stored on microfilm, in electronic data bases, video disks, etc., without first obtaining written permission from Very Large Data Bases Endowment Inc.