
Nice, but you're leaving some performance on the table (if you have a GPU).

ExLlama + GPTQ is the way to go.

llama.cpp + GGUF is great on CPUs.

More data: https://oobabooga.github.io/blog/posts/gptq-awq-exl2-llamacp...


