Accurate, Large Minibatch SGD

Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

When the minibatch size is multiplied by \(k\), how should the learning rate be scaled?

Multiply the learning rate by \(k\) (the linear scaling rule).
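
A minimal sketch of the rule as arithmetic, in case a worked number helps. The function name `scaled_learning_rate` and its parameters are hypothetical and are not taken from the paper's code. The reference setting of 0.1 at a minibatch size of 256 is used only as an illustration.

```python
def scaled_learning_rate(base_lr: float, base_batch_size: int, batch_size: int) -> float:
    """Apply the linear scaling rule.

    If the minibatch size grows by a factor k = batch_size / base_batch_size,
    the learning rate is multiplied by the same factor k.
    """
    if base_lr <= 0 or base_batch_size <= 0 or batch_size <= 0:
        raise ValueError("learning rate and batch sizes must be positive")
    k = batch_size / base_batch_size  # factor by which the minibatch grew
    return base_lr * k


# Illustrative values (assumed for this example).
reference_lr = 0.1
reference_batch = 256
large_batch = 8192

lr = scaled_learning_rate(reference_lr, reference_batch, large_batch)
print(f"k = {large_batch // reference_batch}, scaled learning rate = {lr:.2f}")
```

Here \(k = 8192 / 256 = 32\), so the learning rate goes from 0.1 to about 3.2. The sketch covers only the scaling arithmetic, not a training loop.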

What technique is used to train with large minibatches in SGD?

Linear scaling rule.