
I am impressed. Your personal website is down. HN doesn't allow private messages.

I'm Jeff Carr. I co-founded DigitalOcean. I assume I can't post email addresses here, but I will try. Let's see how smart things are about banning me. I am: wit AT wit com



The state of the art for local models is even further along.

For example, look into https://github.com/kvcache-ai/ktransformers, which achieves >11 tokens/s on a relatively old two-socket Xeon server plus a retail RTX 4090 GPU. Even more interesting is the prefill speed of more than 250 tokens/s. This is very useful in use cases like coding, where large prompts are common.

The above is achievable today. In the meantime, Intel engineers are working on something even more impressive. In https://github.com/sgl-project/sglang/pull/5150 they claim to achieve >15 tokens/s generation and >350 tokens/s prefill. They don't share the exact hardware they run this on, but from bits and pieces across various PRs I reverse-engineered that they use 2x Xeon 6980P with MRDIMM 8800 RAM, without a GPU. The total cost of such a setup will be around $10k once cheap engineering samples hit eBay.
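
To put those rates in perspective for a coding-style request, here is a rough back-of-the-envelope sketch. The prompt and response sizes (8000 and 500 tokens) are assumptions picked for illustration, and the function name is hypothetical. Since the quoted speeds are lower bounds (">11", ">250", and so on), the resulting times are closer to upper bounds.

```python
def estimate_latency_s(prompt_tokens, output_tokens, prefill_tps, decode_tps):
    """Estimate end-to-end time for one request at batch size 1.

    Prefill processes the whole prompt at prefill_tps tokens/s,
    then generation emits output tokens at decode_tps tokens/s.
    """
    prefill_s = prompt_tokens / prefill_tps
    decode_s = output_tokens / decode_tps
    return prefill_s + decode_s


# Assumed request shape (not from the thread): a large coding prompt.
PROMPT_TOKENS = 8000
OUTPUT_TOKENS = 500

# (prefill tokens/s, generation tokens/s) as quoted in the comment above.
setups = {
    "ktransformers, 2x Xeon + RTX 4090": (250, 11),
    "sglang PR 5150, CPU only (claimed)": (350, 15),
}

for name, (prefill_tps, decode_tps) in setups.items():
    total = estimate_latency_s(PROMPT_TOKENS, OUTPUT_TOKENS, prefill_tps, decode_tps)
    print(f"{name}: about {total:.0f} s")
```

The estimate ignores prompt caching and any overlap between phases, so treat it only as a way to see how much of the wait comes from prefill versus generation.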


It's neither impressive nor efficient when you consider batch sizes > 1.


All of this is for batch size 1.


I know. That was my point.

Throughput doesn't scale on CPU as well as it does on GPU.
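
One way to see why is a simplified roofline-style sketch: each decode step reads the weights once for the whole batch (memory-bound) and does a fixed amount of compute per sequence (compute-bound), and the slower of the two sets the step time. Every number below is a made-up placeholder, not a measurement of any real CPU or GPU, and the model ignores KV-cache traffic, MoE routing, and interconnect effects.

```python
def decode_throughput_tps(batch_size, weight_bytes, flops_per_token, mem_bw, peak_flops):
    """Aggregate tokens/s for one decode step under a simple roofline model."""
    # Weights are streamed from memory once per step, shared by the batch.
    memory_time = weight_bytes / mem_bw
    # Compute grows linearly with the number of sequences in the batch.
    compute_time = batch_size * flops_per_token / peak_flops
    step_time = max(memory_time, compute_time)
    return batch_size / step_time


# Hypothetical model: 20e9 bytes of weights read per step, 40e9 FLOPs per token.
WEIGHT_BYTES = 20e9
FLOPS_PER_TOKEN = 40e9

# Hypothetical hardware: (memory bandwidth in bytes/s, peak compute in FLOP/s).
hardware = {
    "cpu (assumed)": (500e9, 20e12),
    "gpu (assumed)": (1000e9, 400e12),
}

for name, (mem_bw, peak_flops) in hardware.items():
    for batch_size in (1, 8, 64):
        tps = decode_throughput_tps(batch_size, WEIGHT_BYTES, FLOPS_PER_TOKEN, mem_bw, peak_flops)
        print(f"{name} batch={batch_size}: {tps:.0f} tokens/s")
```

With these placeholder figures the CPU hits its compute ceiling well before batch 64, while the GPU is still memory-bound and keeps scaling almost linearly. The real crossover points depend entirely on the actual hardware and model.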


We agree. Batch size 1 is only relevant to people who want to run models on their own private machines, which is the OP's case.


Pretty sure you can post email addresses here, this is mine: saagar@saagarjha.com. It's more about avoiding spam.


You can post emails fine, you just might get spammed (because it's a public forum).


You can put your email in your profile


fyi, your website is also down... wit.com doesn't resolve for me


Bold of you to assume that an email domain needs a web server listening on port 80 for HTTP packets.


I went to his LinkedIn, which has a link to wit.com as his website.


You don’t even need an A/AAAA record on the domain.
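
A minimal sketch of why, as I understand the SMTP rules (RFC 5321): a sending server looks at the domain's MX records first and only falls back to the domain's own address records when there are no MX records at all. The function name, the record dictionary shape, and the host names below are hypothetical stand-ins for real DNS answers, and no actual lookup is performed.

```python
def mail_delivery_hosts(domain, records):
    """Return the hosts a sender would try for `domain`, in order.

    `records` is a hypothetical stand-in for DNS answers, e.g.
    {"MX": [(10, "mx1.mailhost.example")], "A": ["192.0.2.1"]}.
    """
    # MX records win; a lower preference value is tried first.
    mx_records = sorted(records.get("MX", []))
    if mx_records:
        return [host for _preference, host in mx_records]
    # No MX at all: the domain itself is treated as an implicit mail host,
    # which only works if it has an A or AAAA record.
    if records.get("A") or records.get("AAAA"):
        return [domain]
    return []


# A domain with MX records only: mail works, but the bare domain
# has no address record, so a browser cannot resolve it.
mail_only = {"MX": [(20, "mx2.mailhost.example"), (10, "mx1.mailhost.example")]}
print(mail_delivery_hosts("example.org", mail_only))
```

So a domain whose website "doesn't resolve" can still receive email, as long as its MX records point at mail servers that do have addresses.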



