
Would it be realistic to buy and self-host the hardware to run, for example, the latest Llama 4 models, assuming a budget of less than $500,000?


Yes - I'm able to run Llama 3.1 405B on 3x A6000 + 3x 4090.

I'll have Llama 4 Maverick running in 4-bit quantization (which typically results in only minor quality degradation) once llama.cpp support is merged.

Total hardware cost well under $50,000.

The 2T Behemoth model is tougher, but enough Blackwell RTX 6000 Pro cards (16 of them) should be able to run it for under $200k.
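
If it helps, here's a minimal sketch of loading a 4-bit GGUF split across several GPUs with llama-cpp-python; the model filename and tensor split are placeholders, not the exact setup above:

    # Minimal sketch using llama-cpp-python; the GGUF filename and the
    # 6-way tensor split are assumptions -- adjust to your own files/cards.
    from llama_cpp import Llama

    llm = Llama(
        model_path="Llama-4-Maverick-Q4_K_M.gguf",   # hypothetical 4-bit quant
        n_gpu_layers=-1,                  # offload all layers to the GPUs
        tensor_split=[1, 1, 1, 1, 1, 1],  # spread weights evenly across 6 cards
        n_ctx=8192,
    )

    out = llm("Explain mixture-of-experts inference in one paragraph.", max_tokens=256)
    print(out["choices"][0]["text"])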


Llama 4 Scout is a 17B x 16-expert MoE, so only 17B parameters are active per token. That makes it faster to run, but the memory requirements are still large. They claim it fits on a single H100, so under 80GB. A Mac Studio with 96GB could run this. By run I mean inference; Ollama is easy to use for that. 4x 3090 NVIDIA cards would also work, but it's not the easiest PC build. The tinybox https://tinygrad.org/#tinybox is $15k and you can do LoRA fine-tuning on it. You could also use a regular PC with 128GB of RAM, but it would be quite slow.
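
A minimal sketch of the Ollama route via its Python client; the "llama4:scout" tag is an assumption, so check what Ollama actually publishes:

    # Minimal sketch with the ollama Python client; the model tag is an
    # assumption -- run `ollama list` to see what you actually have pulled.
    import ollama

    response = ollama.chat(
        model="llama4:scout",  # assumed tag
        messages=[{"role": "user", "content": "Summarize MoE inference in two sentences."}],
    )
    print(response["message"]["content"])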


A box of AMD MI300X GPUs (1.5TB of memory) is much less than $500k, and AMD made sure to have day-zero support in vLLM.

That said, I'm obviously biased but you're probably better off renting it.
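
For reference, serving on a box like that with vLLM's offline Python API looks roughly like this; the Hugging Face model id and tensor-parallel size are assumptions:

    # Rough sketch with vLLM's offline API; the checkpoint id and
    # tensor_parallel_size=8 are assumptions for an 8-GPU node.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="meta-llama/Llama-4-Maverick-17B-128E-Instruct",  # assumed HF id
        tensor_parallel_size=8,  # shard the weights across the 8 GPUs in the node
    )
    params = SamplingParams(temperature=0.7, max_tokens=256)
    outputs = llm.generate(["Why rent GPU capacity instead of buying it?"], params)
    print(outputs[0].outputs[0].text)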


You can do it with regular GPUs for less.



