How does LoRA work?
LoRA freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture.
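Concretely, for a pre-trained weight matrix \(W_0 \in \mathbb{R}^{d \times k}\), the update is constrained to a low-rank decomposition, so the modified forward pass becomes:

\[
h = W_0 x + \Delta W x = W_0 x + BAx, \quad B \in \mathbb{R}^{d \times r},\ A \in \mathbb{R}^{r \times k},\ r \ll \min(d, k).
\]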
![](https://raw.githubusercontent.com/Mxbonn/MLRF/master/flashcards/metadata/media/paste-93f70700ed8f25149460afe0c85505cd00ee1d46.jpg)
LoRA uses a random Gaussian initialization for \(A\) and zero for \(B\), so \(\Delta W = BA\) is zero at the beginning of training and the adapted model starts out exactly equal to the pre-trained one.
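A minimal PyTorch sketch of a LoRA-adapted linear layer, assuming the class name `LoRALinear` and the default hyperparameters \(r = 8\), \(\alpha = 16\) for illustration (this is not the reference implementation):

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen linear layer with a trainable low-rank update Delta W = B @ A."""

    def __init__(self, in_features: int, out_features: int, r: int = 8, alpha: float = 16.0):
        super().__init__()
        # Frozen pre-trained weight W0 (randomly initialized here just for the sketch).
        self.weight = nn.Parameter(torch.randn(out_features, in_features), requires_grad=False)
        # Trainable low-rank factors: A is Gaussian-initialized, B starts at zero,
        # so Delta W = B @ A is zero before the first training step.
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        self.scaling = alpha / r  # constant rescaling of the update, as in the paper

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # h = W0 x + (alpha / r) * B A x
        return x @ self.weight.T + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)


# Usage: only lora_A and lora_B receive gradients.
layer = LoRALinear(768, 768, r=8)
x = torch.randn(2, 768)
out = layer(x)  # identical to the frozen layer's output at initialization
```

Because \(B\) starts at zero, the layer's output at initialization is exactly that of the frozen model, and training only has to learn the low-rank correction.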