יום שני, 20.1.2020, 12:30
The task of synthetic data generation can be, roughly, stated as follows: A learner gets to observe examples, sampled from some unknown distributions, and needs to output "synthetic" examples that "look similar". For example, think of an algorithm that receives examples of music tunes from some genre (e.g. jazz, classical, etc.) and in turn generates new synthetic pieces of music from that genre. This task has won considerable attention lately, due to the introduction of the algorithmic framework of Generative Adversarial Networks (GANs) which exhibit impressive empirical performance. From a theoretical perspective, the problem of synthetic data generation raises many interesting open problems. In fact, even to model the problem turns out to be challenging: on the one hand, the task requires us to generate examples that resemble the observed examples, but on the other hand, we are required not to copy the empirical sample but to actually generate new and unobserved examples.
I will introduce two mathematical frameworks for the task of generating synthetic data. The first model we consider is inspired by GANs, and the learning algorithm has only indirect access to the target distribution via a discriminator. The second model, called DP–Foolability, exploits the notion of differential privacy as a criterion for "non-memorization". We characterize learnability in each of these models as well as discuss the interrelations. As an application, we prove that privately PAC learnable classes are DP-foolable. As we will discuss, this can be seen as an analog of the equivalence between uniform convergence and learnability in classical PAC learning.
Roi is a senior lecturer at Department of Electrical Engineering at Tel Aviv University. His research focuses on Theoretical Machine Learning. He received his Ph.D. from the Center for Brain Sciences (ELSC) at The Hebrew University of Jerusalem under the supervision of Amir Globerson.