I've found the local models useful for non-coding tasks, but the 8B parameter models have so far proven lacking enough for coding that I'm waiting another few months for whatever the Moore's law equivalent of LLM progress is to catch up. Until then, I'm sticking with Sonnet 3.7.
No, that sounds right. 24GB isn’t enough to feasibly run a 27B parameter model. The rule of thumb is roughly 1GB of RAM per billion parameters.
Someone in another comment on this post mentioned using one of the micro models (Qwen 0.6B I think?) and having decent results. Maybe you can try that and then progressively move upwards?
That rule of thumb only applies to 8-bit quants at low context. The default for ollama is 4-bit, which puts a 27B model at roughly 14GB.
The vast majority of people run between 4 and 6 bit, depending on system capability. The extra accuracy above 6 bit tends not to be worth the performance hit.
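If you want to sanity-check these numbers yourself, here's a rough back-of-the-envelope sketch of the arithmetic: weight memory is parameters times bits per weight divided by 8, and the flat 1.5GB overhead for KV cache and runtime buffers is just an assumed placeholder, not a measured figure.

```python
# Rough VRAM estimate: parameter count * bits-per-weight / 8, plus a
# guessed overhead for KV cache and runtime buffers (assumption, not measured).
def estimate_vram_gb(params_billion: float, quant_bits: int, overhead_gb: float = 1.5) -> float:
    weights_gb = params_billion * quant_bits / 8  # 1B params at 8-bit ~= 1 GB
    return weights_gb + overhead_gb

for bits in (8, 6, 4):
    print(f"27B at {bits}-bit: ~{estimate_vram_gb(27, bits):.1f} GB")
# 8-bit: ~28.5 GB, 6-bit: ~21.8 GB, 4-bit: ~15.0 GB
```

That lines up with the comments above: 8-bit won't fit in 24GB, while a 4-bit quant lands around the ~14GB mark before overhead.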
deepseek-r1:8b screams on my 12GB GPU. gemma3:12b-it-qat runs just fine, a little faster than I can read. Once you exceed GPU RAM it offloads a lot of the model to the CPU, and splitting between GPU and CPU is dramatically (80%? 95%?) slower.
Yes, my wife and I were watching a group of hikers one time, and we both looked at each other and talked about how none of them had even seen a demo on using an ice axe. It felt like walking into a kitchen and seeing the chefs juggling knives.
At the end of my three full days of avalanche training, the instructor said, “Now remember, you are now the least qualified people to go into the backcountry.”