
I don't want to get into the weeds of evaluation, hyperparameter tuning and model comparisons, but let's just say that subsequent studies have shown that LoRA (consistent with most parameter-efficient tuning methods) underperforms full fine-tuning: https://arxiv.org/abs/2203.06904

A simple way to think about it is this: if LoRA really gives full fine-tuning performance, why would anyone ever fully fine-tune a model?



>why would anyone ever fully fine-tune a model?

You're asking it as if it were a rhetorical question, but I think it carries more weight than many people seem to believe.


To balance my view a little, it is definitely valid to ask "how far can we get with parameter-efficient tuning?", and I firmly believe that as models get larger, the answer is "very, very far".

That said, I also dislike it when it is carelessly claimed that parameter-efficient tuning is as good as full fine-tuning, without qualifications or nuance.


I agree that it does carry weight.

It is not apparent to me that full fine-tuning should be better, especially since the LoRA method seems like it could be robust against catastrophic forgetting.
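
For anyone wondering where the forgetting intuition comes from: LoRA keeps the pretrained weight matrix frozen and only trains a low-rank update that is added on top. Here is a rough sketch of that idea in plain Python. The sizes, values and helper names are made up for illustration, and the "training" step is just a hand-written change to B, not a real optimizer.

```python
# Toy LoRA-style update: effective weight = W + B @ A, with W frozen.
# All sizes and values here are hypothetical, chosen only for illustration.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    """Element-wise sum of two same-shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

d, k, r = 4, 4, 1  # weight is d x k, adapter rank is r

# Stand-in for a pretrained weight matrix; it is never modified below.
W = [[float(i * k + j) for j in range(k)] for i in range(d)]
W_snapshot = [row[:] for row in W]

# Low-rank factors: B starts at zero, so the adapted model starts out
# identical to the base model.
B = [[0.0] for _ in range(d)]   # d x r
A = [[0.5, -0.5, 1.0, 2.0]]     # r x k

def effective_weight(W, B, A):
    """Return W + B @ A without touching W itself."""
    return add(W, matmul(B, A))

starts_identical = effective_weight(W, B, A) == W

# Pretend a training step changed B (hand-picked numbers, not a real update).
B = [[0.1], [0.2], [-0.3], [0.0]]
W_adapted = effective_weight(W, B, A)

full_params = d * k        # what full fine-tuning would update
lora_params = r * (d + k)  # what the adapter updates
```

Dropping the adapter gives back the original W exactly, and the update is constrained to rank r. That only shows the base weights are untouched. It does not show that the adapted model itself avoids forgetting, which is the part that needs evidence.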


I 100% agree with you, but I (we) need more evidence.



