To join the email distribution list of the cs colloquia, please visit the list subscription page.
Computer Science events calendar in HTTP ICS format for of Google calendars, and for Outlook.
Academic Calendar at Technion site.
Taub 9
Suppose that n forecasting experts are providing daily rain/no-rain predictions, and the best among them is mistaken in at most k many days. For how many days will an optimal learner allowed to observe the predictions mis-predict? This is a fundamental problem in online learning, and other important classification problems can be reduced to it. It was studied in the 90’s by Cesa-Bianchi, Freund, Helmbold, and Warmuth who gave fine-grained bounds for the case where the learner may not use randomness. We study this problem without this restriction, and provide fine-grained bounds.
There are many generalizations of this problem where the learner only receives partial feedback on its predictions. One suggested model is “apple tasting feedback”, in which the learner only knows the true outcome when it predicts a rainy day. Another very common model is “bandit feedback”, in which there are L > 2 possible outcomes, and the learner only receives “correct” or “incorrect” feedback on its predictions. We study these models in various settings and provide nearly-tight mistake bounds.
In this talk, we will discuss some of the techniques used to prove those mistake bounds. The mistake bound for the most classical full-information problem is proved via a reduction of the problem to calculating the (average) depth of binary trees, by using novel complexity measures of the set of experts. The bounds for the bandit feedback problem are proved by using minimax duality to obtain a dual variation of the problem, which is easier to handle.
Machinery for data analysis often requires a numeric representation of the input. Towards that, a common practice is to embed components of structured data into a high-dimensional vector space. We study the embedding of the tuples of a relational database, where existing techniques are often based on optimization tasks over a collection of random walks from the database. The focus of this paper is on the recent FoRWaRD algorithm that is designed for dynamic databases, where walks are sampled by following foreign keys between tuples. Importantly, different walks have different schemas, or “walk schemes,” that are derived by listing the relations and attributes along the walk. Also importantly, different walk schemes describe relationships of different natures in the database.
We show that by focusing on a few informative walk schemes, we can obtain tuple embedding significantly faster, while retaining the quality. We define the problem of scheme selection for tuple embedding, devise several approaches and strategies for scheme selection, and conduct a thorough empirical study of the performance over a collection of downstream tasks. Our results confirm that with effective strategies for scheme selection, we can obtain high-quality embeddings considerably (e.g., three times) faster, preserve the extensibility to newly inserted tuples, and even achieve an increase in the precision of some tasks.
Flow matching models, ODE-based generative models, generate samples by gradually morphing a simple source distribution into a target distribution. In practice, these models still fall short of perfectly replicating the target distribution, mainly due to imperfections of the learned mapping. Previous work mainly focus on alleviating discretization error, which rises from sampling a continuous trajectory with a finite number of steps. In this work we focus on prediction error, an error that is inherent in the model. Our main contribution is identifying a trajectory that complies with the imperfect flow model and leads exactly to the target distribution. Based on this finding, we propose Marginal Matching---a simple inference-time correction scheme to steer the generated samples in the direction of the data. This scheme proves to reduce a bound on the distance between the data and the learned distribution, motivating two different implementations for the correction function. We show that our proposed method improves sample quality on CIFAR-10 and ImageNet-64, with minimal overhead in computation time, or non at all when applying approximated correction.
Taub 601
DNA data storage presents an efficient solution for archiving, though synthesis time and cost pose challenges.
This seminar focuses on cyclic synchronized synthesis technologies like photolithography, introducing performance metrics based on synthesis cycles.
We extend prior work on channel capacity, achieving higher rates and capacities through improved encoding.
Additionally, we analyze cost bounds and explore alphabet sizes larger than the standard four, inspired by recent advancements in DNA synthesis methods.