Events

Predicting a Better Future for Asynchronous SGD with DANA
Speaker: Ido Hakimi, M.Sc. seminar lecture
Date: Monday, 24.12.2018, 11:00
Location: Taub 601
Distributed training can significantly reduce the training time of neural networks. Despite this potential, it has not been widely adopted because the training process is difficult to scale. Existing methods suffer from slow convergence and low final accuracy when scaled to large clusters, and often require substantial re-tuning of hyperparameters. We propose DANA, a novel approach that scales to large clusters while maintaining final accuracy and convergence speed similar to those of a single worker. DANA estimates the future value of the model parameters by adapting Nesterov Accelerated Gradient to a distributed setting, and so mitigates the effect of gradient staleness, one of the main difficulties in scaling SGD to more workers. Evaluation on three state-of-the-art network architectures and three datasets shows that DANA scales as well as or better than existing work without having to tune any hyperparameters or tweak the learning schedule. For example, DANA achieves 75.73% accuracy on ImageNet when training ResNet-50 with 16 workers, similar to the non-distributed baseline.
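For readers unfamiliar with the look-ahead idea the abstract builds on, here is a minimal sketch of classical, single-worker Nesterov Accelerated Gradient on a toy one-dimensional objective. It is not the DANA algorithm and does not model multiple workers or stale gradients. The function names (`lookahead`, `nag_step`, `minimize`), the toy objective, and the step-size and momentum values are all assumptions chosen for illustration.

```python
def lookahead(theta, velocity, momentum):
    """Estimate where the parameter will be after the momentum step."""
    return theta + momentum * velocity


def nag_step(theta, velocity, grad_fn, lr=0.1, momentum=0.9):
    """One classical Nesterov update (single worker, illustrative only)."""
    # Evaluate the gradient at the predicted future position, not at theta.
    grad = grad_fn(lookahead(theta, velocity, momentum))
    new_velocity = momentum * velocity - lr * grad
    return theta + new_velocity, new_velocity


def minimize(grad_fn, theta0, steps=200, lr=0.1, momentum=0.9):
    """Run a fixed number of Nesterov steps starting from theta0."""
    theta, velocity = theta0, 0.0
    for _ in range(steps):
        theta, velocity = nag_step(theta, velocity, grad_fn, lr, momentum)
    return theta


if __name__ == "__main__":
    # Toy objective f(x) = 0.5 * x**2, whose gradient is x (an assumption
    # made for this sketch; the talk evaluates on neural networks).
    print(minimize(lambda x: x, theta0=5.0))
```

In an asynchronous setting, a worker's gradient is typically computed on parameters that are already out of date by the time it is applied. The abstract describes DANA as extending this kind of future-position estimate to that setting; the details are in the talk, not in this sketch.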