Hacker News

labrador on May 1, 2024 | on: Run llama3 locally with 1M token context

I asked Claude Opus about it and got an estimate. It's going to take over 100GB of RAM (a 128 GB RAM stick can cost around $1,500) and probably multiple Nvidia A100s ($10,000 each) to run locally, so it's possible if you're rich or if work is supplying the hardware or cloud.

What are the hardware requirements to run the model I gave you?

Based on the provided document, the hardware requirements for running the llama3-gradient model depend on the desired context window size:

To use a 256k context window, at least 64GB of memory is required. To use a 1M+ context window, significantly more memory (100GB+) is needed.

The exact requirements are not specified, but it's clear that running this model with very large context windows requires a machine with a substantial amount of RAM, likely a high-end workstation or server rather than a typical consumer PC. The computational demands also likely necessitate a powerful CPU and/or GPU.

What type of CPU and what type of GPUs, and how many, would it take?

The document does not provide specific details on the CPU or GPU requirements for running the llama3-gradient model. However, based on the memory requirements mentioned and the nature of the model, we can make some educated guesses:

CPU:

For optimal performance, a high-end, multi-core server-grade CPU would likely be necessary, such as an Intel Xeon or AMD EPYC processor with a high core count (e.g., 16+ cores).

GPU:

Running large language models often benefits greatly from the use of one or more powerful GPUs. High-end consumer GPUs like the Nvidia GeForce RTX 3090 (24GB VRAM) or data center GPUs like the Nvidia A100 (40GB-80GB VRAM) would likely be suitable.

The number of GPUs required would depend on the specific implementation and the desired performance. A single high-end GPU might be sufficient, but using multiple GPUs could speed up processing.

Keep in mind that these are estimates based on the provided information. The actual hardware requirements may vary depending on the specific implementation, optimizations, and performance requirements. Running such a model with very large context windows would likely require a powerful, expensive system with ample RAM, fast storage, and robust cooling. Detailed benchmarks or recommendations from the model's developers would provide more definitive guidance on the exact hardware needed.
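
For a rough sanity check on the memory figures quoted above, here is a back-of-the-envelope sketch of the KV cache size, which is usually what dominates memory at very long context. It assumes the Llama 3 8B layout (32 layers, 8 key/value heads, head dimension 128) and 16-bit cache entries; those parameters and the helper name `kv_cache_bytes` are my assumptions, not something stated in the model's documentation, and the estimate ignores model weights and runtime overhead.

```python
def kv_cache_bytes(context_tokens, n_layers=32, n_kv_heads=8,
                   head_dim=128, bytes_per_value=2):
    """Estimate KV cache size in bytes for a given context length.

    Defaults assume the Llama 3 8B layout with an fp16 cache
    (an assumption, not taken from the llama3-gradient docs).
    """
    # Each token stores one key and one value vector per layer per KV head.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return context_tokens * per_token


GIB = 1024 ** 3

for label, tokens in [("256k", 256 * 1024), ("1M", 1024 * 1024)]:
    size_gib = kv_cache_bytes(tokens) / GIB
    print(f"{label} context: ~{size_gib:.0f} GiB of KV cache")
```

Under these assumptions the cache alone comes to about 32 GiB at 256k tokens and about 128 GiB at 1M tokens, which is in the same ballpark as the 64GB and 100GB+ figures once weights and overhead are added. A quantized cache or a different model size would change the numbers.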

boredemployee on May 1, 2024

Great. Thanks!

I wish there was something like "can you run it" but for LLMs.

EarthAmbassador on May 1, 2024

I second that! The lack of a tool for this is nuts.
