
Nice, but you're leaving some performance on the table (if you have a GPU).

ExLlama + GPTQ is the way to go.

llama.cpp + GGUF is great on CPUs.

More data: https://oobabooga.github.io/blog/posts/gptq-awq-exl2-llamacp...


