

SMEGA2: Distributed Deep Learning Using a Single Momentum Buffer
Refael Cohen, M.Sc. Thesis Seminar
Monday, 24.1.2022, 10:00
Zoom Lecture: 92984244781
Advisor: Prof. A. Schuster
As the field of deep learning progresses and models grow ever larger, training deep neural networks has become a demanding task. It requires an enormous amount of compute power and can be very time consuming, especially when using just a single GPU. Distributed deep learning tackles this problem through a variety of asynchronous training algorithms; however, most of these algorithms suffer from decreased accuracy as the number of workers increases. We introduce a new method, Single MomEntum Gradient Accumulation ASGD (SMEGA2), which outperforms existing methods in terms of final test accuracy and scales to as many as 64 asynchronous workers.
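To make the idea in the title concrete, here is a minimal, hypothetical sketch of a parameter server that keeps a single momentum buffer shared by all asynchronous workers, rather than one buffer per worker. The class name, update rule, and round-based accumulation below are illustrative assumptions; the actual SMEGA2 algorithm is specified in the thesis, not reproduced here.

```python
class SingleMomentumServer:
    """Toy parameter server keeping ONE momentum buffer shared by all
    asynchronous workers (illustrative sketch only; not the exact
    SMEGA2 update rule from the thesis)."""

    def __init__(self, params, lr=0.1, momentum=0.5, num_workers=4):
        self.params = list(params)
        self.lr = lr
        self.momentum = momentum
        self.num_workers = num_workers
        self.velocity = [0.0] * len(self.params)  # the single momentum buffer
        self.accum = [0.0] * len(self.params)     # accumulated worker gradients
        self.received = 0

    def push_gradient(self, grad):
        """Called by workers as their gradients arrive. Gradients are
        accumulated, and one momentum step is applied per full round
        of worker pushes."""
        for i, g in enumerate(grad):
            self.accum[i] += g
        self.received += 1
        if self.received == self.num_workers:
            for i in range(len(self.params)):
                avg = self.accum[i] / self.num_workers
                self.velocity[i] = self.momentum * self.velocity[i] + avg
                self.params[i] -= self.lr * self.velocity[i]
                self.accum[i] = 0.0
            self.received = 0


# Toy objective f(w) = 0.5 * ||w||^2, whose gradient is w itself, so the
# parameters should shrink toward zero as training proceeds. (Real
# asynchronous workers would compute gradients on stale weights; this
# simplification uses one snapshot per round.)
server = SingleMomentumServer([1.0, 1.0, 1.0])
for _ in range(30):                    # 30 rounds of 4 worker pushes
    snapshot = list(server.params)     # workers read the current weights
    for _ in range(4):
        server.push_gradient(snapshot)
print(sum(w * w for w in server.params))  # close to 0
```

The point of the single shared buffer is that the server's momentum state stays consistent no matter how many workers contribute gradients, which is one plausible way to avoid the per-worker state that hurts accuracy as worker counts grow.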