דלג לתוכן (מקש קיצור 's')
אירועים

אירועים והרצאות בפקולטה למדעי המחשב ע"ש הנרי ומרילין טאוב

event speaker icon
בוריס גינסבורג (NVIDIA) - הרצאת אורח מיוחד במדעי המחשב
event date icon
יום חמישי, 19.10.2017, 14:30
event location icon
חדר 337, בניין טאוב למדעי המחשב
A common way to speed up training of large convolutional networks is to add computational units. Training is then performed using data-parallel synchronous Stochastic Gradient Descent (SGD) with a mini-batch divided between computational units. With an increase in the number of nodes, the batch size grows. However, training with a large batch often results in lower model accuracy. We argue that the current recipe for large batch training (linear learning rate scaling with warm-up) is not general enough and training may diverge. To overcome these optimization difficulties, we propose a new training algorithm based on Layer-wise Adaptive Rate Scaling (LARS). Using LARS, we scaled AlexNet and ResNet-50 to a batch size of 16K without loss in accuracy