CS Lecture: Learning From Dependent Data And Its Modeling Through The Ising Model

Yuval Dagan

Monday, 01.01.2024, 10:30

Auditorium 012, floor 0

I will present a theoretical framework for analyzing learning algorithms that rely on dependent, rather than independent, observations. While a common assumption is that the learning algorithm receives independent datapoints, such as unrelated images or texts, this assumption often does not hold. An example is data on opinions across a social network, where the opinions of related people are often correlated, for example as a consequence of their interactions. I will present a line of work that models the dependence between such related datapoints using a probabilistic framework in which the observed datapoints are assumed to be sampled from some joint distribution, rather than sampled i.i.d. The joint distribution is modeled via the Ising model, which originated in the theory of spin glasses in statistical physics and has since been used in various research areas. We frame the problem of learning from dependent data as the problem of learning the parameters of the Ising model, given a training set that consists of only a single sample from the joint distribution over all datapoints. We then propose using the Pseudo-MLE algorithm and provide a corresponding analysis, improving upon the prior literature, which required multiple samples from this joint distribution. Our proof benefits from sparsifying the model's interaction network by conditioning on subsets of variables that make the dependencies in the resulting conditional distribution sufficiently weak. We use this sparsification technique to prove generic concentration and anti-concentration results for the Ising model, which have found applications beyond the scope of our work.

Based on joint work with Constantinos Daskalakis, Anthimos Vardis Kandiros, Nishanth Dikkala, Siddhartha Jayanti, Surbhi Goel and Davin Choo.

Yuval Dagan is a postdoctoral researcher at the Simons Institute for the Theory of Computing at UC Berkeley and at the Foundations of Data Science Institute (FODSI). He received his PhD from the Electrical Engineering and Computer Science Department at MIT, advised by Professor Constantinos Daskalakis (2018-2023). He received his Bachelor’s and Master’s degrees from the Technion, where he was advised by Professor Yuval Filmus (2011-2017). During his PhD, he received the Meta Research Fellowship in Machine Learning (2021-2022). He was also a visitor of the Simons Foundation at the Causality program (2022) and a research intern at Google Mountain View, hosted by Vitaly Feldman (2019). Prior to his PhD, he was a research assistant to Professor Ohad Shamir at the Weizmann Institute (2018).
