How does LoRA work?
LoRA freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture.
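Concretely, for a pre-trained weight matrix \(W_0 \in \mathbb{R}^{d \times k}\), LoRA constrains the update to a low-rank decomposition, so the modified forward pass for an input \(x\) is:

\[
h = W_0 x + \Delta W x = W_0 x + B A x, \qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k).
\]

Only \(A\) and \(B\) receive gradient updates; \(W_0\) stays frozen.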


The LoRA authors use a random Gaussian initialization for \(A\) and zeros for \(B\), so \(\Delta W = BA\) is zero at the beginning of training and the adapted model starts out identical to the pre-trained one.
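
A minimal sketch of this in PyTorch (the class name `LoRALinear` and the `rank`/`alpha` arguments are illustrative, not from any particular library; the update is scaled by \(\alpha / r\) as in the LoRA paper):

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update Delta W = B A."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pre-trained weights

        d_out, d_in = base.weight.shape
        # A: random Gaussian init, B: zeros, so B @ A is zero at the start of training.
        self.lora_A = nn.Parameter(torch.randn(rank, d_in) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(d_out, rank))
        self.scaling = alpha / rank  # scale the update by alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # h = W0 x + (alpha / r) * B A x
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling
```

A typical usage pattern would be wrapping an existing projection, e.g. `LoRALinear(nn.Linear(768, 768), rank=8)`, and passing only the parameters with `requires_grad=True` to the optimizer.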