Events
The Taub Faculty of Computer Science Events and Talks
Amir Abboud (M.Sc. Thesis Seminar)
Wednesday, 30.05.2012, 16:30
Advisor: Prof. A. Schuster and Prof. D. Keren
Many monitoring tasks over distributed data streams can be formulated
as a continuous query using a function that is defined over the
global average of data vectors derived from the streams.
The query will typically produce an alert when the value of
the function crosses a predefined threshold.
A fundamental problem in efficient scalable implementation of
such threshold queries is that the data streams are
distributed, sometimes over a wide geographical region.
Moving all the data to a centralized data center for query processing
may incur infeasible communication overheads and inflated data center
resource costs.
In some cases it may be prohibited altogether by the sheer
aggregated size of the data, or by privacy laws.
The goal is thus to enhance scalability by processing the query
locally, using as little communication and global coordination as
possible.
We present a novel scheme for communication reduction in distributed
monitoring using local constraints.
Communication and global coordination are required only in the
event that the local constraints are violated by the incoming data.
Our work improves on previous work in a few critical aspects.
First, whereas previous work required constructing a
``distributed cover'' of the entire convex hull of the local data vectors,
our work compiles constraints that are designed to cover only the global
average; further, they are directly matched and tailored
to fit the local data distribution at each stream.
The result is a dramatic decrease in the required volume of
communication compared to previous state of the art, up to two orders
of magnitude in our experiments with real-life data. Both the experiments
and theoretical study suggest that the improvement factor increases
with the dimension of the data.
Also, in contrast to previous work, which necessitated complicated
constraints and required enormous computational effort over each of
the streams, our scheme can use very simple constraints which incur
negligible local overhead.
This latter advantage makes our new approach applicable to thin,
battery-operated sensors and cellular devices.