Colloquia and Seminars
To join the email distribution list of the cs colloquia, please visit the list subscription page.
Computer Science events calendar in HTTP ICS format for of Google calendars, and for Outlook.
Academic Calendar at Technion site.
Upcoming Colloquia & Seminars
Wednesday, 26.5.2021, 18:00
Rafael will hold a Meetup meeting with the participation of Gidi Weiss, VP of Marketing and Business Development in the division, who will talk about the most advanced security technologies in the world.
The meeting will take place on Wednesday, May 26, 2021, at the Nola Socks Pub, Haifa, and participation requires pre-registration
Michael Leybovich, M.Sc. Thesis Seminar
Advisor: Prof. Oded Shmueli
There has been extensive research on data provenance. Previous works were concerned with annotating the results of database (DB) queries with provenance information which is useful in explaining query results at various resolution levels. In this work, we track the lineage of tuples throughout their database lifetime. That is, we consider a scenario in which tuples (records) that are produced by a query may affect other tuple insertions into the DB, as part of a normal workflow. As time goes on, provenance explanations for such tuples become deeply nested, increasingly consuming space, and resulting in decreased clarity and readability.
We present a novel approach for approximating lineage tracking. We use Machine Learning (ML) and Natural Language Processing (NLP) techniques; mainly, word embedding.
The basic idea is summarizing (and approximating) the lineage of each tuple via a small set of constant-size vectors (the number of vectors per-tuple is a hyperparameter). For explicitly (and independently of DB contents) inserted tuples - the vectors are obtained via a pre-trained word vectors model over their underlying database domain “text”. During the execution of a query, we construct the lineage vectors of the final (and intermediate) result tuples in a similar fashion to that of semiring-based exact provenance calculations. We extend the + and * operations to generate sets of lineage vectors, while emphasizing the ability to propagate information and preserve the compact representation. Therefore, our solution does not suffer from space complexity blow-up over time. Another significant benefit of our approach is a "natural ranking" of explanations to the existence of a tuple in the DB.
We devise a genetics-inspired improvement to our basic method. The columns of an entity are a tuple’s basic properties, i.e., the “genes” that combine to form its genetic code. In this setting, finding the lineage of a tuple in the DB is analogous to finding its predecessors via DNA examination. We design an alternative lineage tracking mechanism, that of keeping track of and querying lineage (via embeddings) at the column (“gene”) level. This way, we manage to better distinguish between the provenance features and the textual characteristics of a tuple.
We further introduce several improvements and extensions to the basic method:
* Emphasizing important columns with a query-dependent column weighting.
* Filtering non-contributing tuples using Bloom Filters of queries.
* Filtering non-contributing tuples by a tuple creation timestamp.
* Similarity weighting with query dependency DAG.
* Extending the method to track where provenance.
We integrate our lineage computations into the PostgreSQL system via an extension (ProvSQL) and experimentally exhibit useful results in terms of accuracy against exact, semiring-based justifications. In the experiments, we focus on tuples with multiple generations of tuples in their lifelong lineage and analyze them in terms of direct and distant lineage. The examples we present suggest a high usefulness potential for the proposed approximate lineage methods and the further suggested enhancements. This especially holds for the column-based vectors method which exhibits high precision and per-level recall.
Wednesday, 16.6.2021, 12:30
You are invited to the finals event of the Best Project Contest, to be held on Wednesday, June 16, 2021, starting at 12:30, at the CS Taub Lobby.
The event will take place in the format of a project fair, and in accordance with the guidelines of the green pass instructions.
You are all invited to come and meet the best researchers and researches!