Information Theory of Deep Learning

Naftali Tishby - COLLOQUIUM LECTURE

Tuesday, 21.11.2017, 14:30

Room 337 Taub Bld.

I will present a novel, comprehensive theory of large-scale learning with
Deep Neural Networks, based on the correspondence between Deep Learning
and the Information Bottleneck framework. The theory is based on the
following components: (1) Rethinking learning theory. I will prove a new
generalization bound, the input-compression bound, which shows that
compression of the input variable is far more important for
generalization than the dimension of the hypothesis class, an
ill-defined notion for deep learning. (2) I will then prove that for large
scale Deep Neural Networks the mutual information on the input and the
output variables, for the last hidden layer, provides a complete
characterization of the sample complexity and accuracy of the network.
This establishes the Information Bottleneck bound as the optimal trade-off
between sample complexity and accuracy for ANY learning algorithm.
(3) I will then show how stochastic gradient descent, as used in Deep
Learning, actually achieves this optimal bound. In that sense, Deep
Learning is a method for solving the Information Bottleneck problem for
large-scale supervised learning problems. The theory gives concrete
predictions for the structure of the layers of Deep Neural Networks, and
design principles for such networks, which turn out to depend solely on
the joint distribution of the input and output and the sample size.
Based partly on joint work with Ravid Shwartz-Ziv and Noga Zaslavsky.
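
For readers unfamiliar with the Information Bottleneck framework mentioned
above: in its standard form it seeks a representation T of the input X that
minimizes I(X;T) - beta * I(T;Y), i.e. compresses X while preserving
information about the output Y. The following is only a minimal sketch of
that trade-off, assuming small discrete variables whose joint distributions
are known exactly. It is not the method presented in the talk, and the
function names `mutual_information` and `ib_objective` are hypothetical.

```python
import math


def mutual_information(joint):
    """Mutual information, in bits, of a discrete joint distribution.

    joint[i][j] is P(A = i, B = j); the entries are assumed to sum to 1.
    """
    row = [sum(r) for r in joint]          # marginal P(A)
    col = [sum(c) for c in zip(*joint)]    # marginal P(B)
    mi = 0.0
    for i, r in enumerate(joint):
        for j, p in enumerate(r):
            if p > 0:                      # the term 0 * log 0 is taken as 0
                mi += p * math.log2(p / (row[i] * col[j]))
    return mi


def ib_objective(p_xt, p_ty, beta):
    """Information Bottleneck Lagrangian I(X;T) - beta * I(T;Y).

    p_xt and p_ty are the joint distributions of (X, T) and (T, Y);
    beta (an assumed trade-off parameter) weights prediction vs. compression.
    """
    return mutual_information(p_xt) - beta * mutual_information(p_ty)


# Toy example: X takes 4 equally likely values, T keeps only one bit of X,
# and that bit determines the binary label Y.
p_xt = [[0.25, 0.0], [0.25, 0.0], [0.0, 0.25], [0.0, 0.25]]
p_ty = [[0.5, 0.0], [0.0, 0.5]]
objective = ib_objective(p_xt, p_ty, beta=2.0)
```

In this toy case T retains 1 bit about X (out of the 2 bits X carries) and
1 bit about Y, so larger values of beta favor this representation over a
constant T that discards everything.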

Short Bio:
==========
Dr. Naftali Tishby is a professor of Computer Science, and the incumbent of the
Ruth and Stan Flinkman Chair for Brain Research at the Edmond and Lily Safra
Center for Brain Science (ELSC) at the Hebrew University of Jerusalem. He is one
of the leaders of machine learning research and computational neuroscience in
Israel, and his numerous former students hold key academic and industrial research
positions all over the world. Prof. Tishby was the founding chair of the new
computer-engineering program, and a director of the Leibnitz research center in
computer science, at the Hebrew University. Tishby received his PhD in theoretical
physics from the Hebrew University in 1985 and was a research staff member at MIT
and Bell Labs from 1985 to 1991. Prof. Tishby was also a visiting professor at
Princeton NECI, University of Pennsylvania, UCSB, and IBM Research.
His current research is at the interface between computer science, statistical physics,
and computational neuroscience. He pioneered various applications of statistical physics
and information theory in computational learning theory. More recently, he has been
working on the foundations of biological information processing and the connections
between dynamics and information. Together with his colleagues, he has introduced new theoretical
frameworks for optimal adaptation and efficient information representation in biology,
such as the Information Bottleneck method and the Minimum Information principle for
neural coding.