Theory Seminar: Zero-One Laws for Sliding Windows and Universal Sketches

Alan Roytman (Tel-Aviv University)

Wednesday, 19.11.2014, 12:30

Taub 401

As the amount of data being generated continues to grow at a staggering rate, streaming algorithms are increasingly becoming more important as a practical tool to analyze and make sense of all the information. In practice, such applications generate vast amounts of data in a very short period of time, and hence it is infeasible to store everything. This presents a pressing question: when is it possible to summarize data while still providing approximate solutions with good theoretical guarantees?

In this talk, we strive to answer this question for frequency-based, monotonically increasing functions in the sliding window model. We make progress on two significant, open problems outlined by Nelson and by Sohler. Specifically, we formalize the notion of universality for streaming over sliding windows, and our main result is the construction of a universal algorithm that uses polylogarithmic space in the timestamp-based sliding window model for a broad class of monotonically increasing functions. That is, we define a class of functions and design a single streaming algorithm that produces a data structure with the following guarantee. When querying the data structure with a function G taken from the class, our algorithm approximates \sum_{i=1}^n G(m_i) without knowing G in advance (here, m_i represents the frequency that element i from a universe of size n appears in the window).

Along the way to achieving our main result, we design a zero-one law for a broader class of monotonically increasing functions G that specifies when the summation \sum_{i=1}^n G(m_i) can be approximated with high probability in one pass, using polylogarithmic memory. If G satisfies the conditions specified by the test, then given the function G we construct an explicit, general algorithm that is able to approximate the summation using polylogarithmic memory. If the function G does not pass the test, then we provide a lower bound which proves it is impossible to do so using polylogarithmic memory.

In this talk, we strive to answer this question for frequency-based, monotonically increasing functions in the sliding window model. We make progress on two significant, open problems outlined by Nelson and by Sohler. Specifically, we formalize the notion of universality for streaming over sliding windows, and our main result is the construction of a universal algorithm that uses polylogarithmic space in the timestamp-based sliding window model for a broad class of monotonically increasing functions. That is, we define a class of functions and design a single streaming algorithm that produces a data structure with the following guarantee. When querying the data structure with a function G taken from the class, our algorithm approximates \sum_{i=1}^n G(m_i) without knowing G in advance (here, m_i represents the frequency that element i from a universe of size n appears in the window).

Along the way to achieving our main result, we design a zero-one law for a broader class of monotonically increasing functions G that specifies when the summation \sum_{i=1}^n G(m_i) can be approximated with high probability in one pass, using polylogarithmic memory. If G satisfies the conditions specified by the test, then given the function G we construct an explicit, general algorithm that is able to approximate the summation using polylogarithmic memory. If the function G does not pass the test, then we provide a lower bound which proves it is impossible to do so using polylogarithmic memory.