Accurate, Large Minibatch SGD

Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

When the minibatch size is multiplied by \(k\), how should the learning rate be scaled?

Multiply the learning rate by \(k\) (Linear Scaling Rule)
What is the technique used to train large minibatch SGD.

Linear scaling rule.

Machine Learning Research Flashcards is a collection of flashcards associated with scientific research papers in the field of machine learning. Best used with Anki. Edit MLRF on GitHub.