Ido Hakimi, Ph.D. Thesis Seminar
Tuesday, 11.1.2022, 13:00
Advisor: Prof. Assaf Schuster
Training deep neural networks in the distributed asynchronous setting is complicated. Because the computational devices run in parallel, information propagates between them with a delay. This delay, often referred to as staleness, harms both the training process and the quality of the resulting network. Staleness is one of the main obstacles to scaling asynchronous training to a large number of devices, since it worsens as the number of devices grows; left unaddressed, it can have a devastating impact on the network's quality even with relatively few devices. In this work, we present three novel algorithms for training neural networks in different asynchronous configurations. These algorithms scale efficiently to large clusters while maintaining high accuracy and fast convergence. We show for the first time that momentum can be fully incorporated into distributed asynchronous training with almost no ramifications to final accuracy. Finally, we demonstrate that our algorithms can train faster than an optimized synchronous algorithm while achieving similar, and in some cases even better, final test accuracy.
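To make the staleness phenomenon concrete, here is a minimal toy simulation (not one of the thesis algorithms): workers in asynchronous SGD read the current parameter, and their gradients are applied only several updates later. The staleness of a gradient is the number of updates applied in between, and its average grows with the number of workers. All names and the 1-D quadratic loss are illustrative assumptions.

```python
import random

def async_sgd(num_workers, steps, lr=0.01, seed=0):
    """Simulate asynchronous SGD on f(w) = w^2 (gradient 2w) with delayed updates."""
    rng = random.Random(seed)
    w = 10.0            # parameter being optimized
    in_flight = []      # (read_step, w_read): gradients still being computed
    staleness_log = []
    for t in range(steps):
        # every idle worker reads the current parameter
        while len(in_flight) < num_workers:
            in_flight.append((t, w))
        # a random worker finishes and its (possibly stale) gradient is applied
        read_step, w_read = in_flight.pop(rng.randrange(len(in_flight)))
        staleness_log.append(t - read_step)
        w -= lr * 2.0 * w_read   # gradient of w^2 evaluated at the stale parameter
    return w, sum(staleness_log) / len(staleness_log)

# Average staleness grows with the number of workers.
w4, s4 = async_sgd(num_workers=4, steps=2000)
w32, s32 = async_sgd(num_workers=32, steps=2000)
print(f"avg staleness with 4 workers: {s4:.1f}, with 32 workers: {s32:.1f}")
```

With a larger learning rate, the stale updates in the 32-worker run overshoot and the iterates oscillate or diverge, which is exactly why staleness must be addressed before scaling to large clusters.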