But the bottleneck is generating embeddings either way.
memchunk has a throughput of 164 GB/s. A really fast embedder can deliver maybe 16k embeddings/sec, or ~1.6 MB/s (if you assume 100-char sentences).
That's about five orders of magnitude of difference. Chunking is not the bottleneck.
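Back of the envelope (the embedder rate and the 100-byte sentence size are assumptions, not measured figures):

    # Rough comparison of chunking vs embedding throughput; the embedder
    # rate and sentence size are assumptions for illustration.
    chunker_bps = 164e9             # memchunk: ~164 GB/s
    embeddings_per_sec = 16_000     # a fast embedder (assumed)
    bytes_per_sentence = 100        # ~100-char sentences (assumed)

    embedder_bps = embeddings_per_sec * bytes_per_sentence  # ~1.6 MB/s
    print(f"gap: ~{chunker_bps / embedder_bps:,.0f}x")       # ~102,500x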
It might be an architectural issue - you stuff chunks into an MQ, and you want full visibility into queue size ASAP - but otherwise it doesn't matter how fast you chunk; your embedder will slow you down.
It's still a neat exercise on principle, though :)
It doesn't matter that A takes much more time than B if B is still large in absolute terms. You're still saving resources and time by optimising B. Also, you seem to assume that every chunk will get embedded - they may be revisiting some pages where the chunks are already present in the database.
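A minimal sketch of what I mean, assuming you keep a set of content hashes for chunks that are already in the database (the names here are made up):

    import hashlib

    # Hypothetical sketch: only hand chunks to the embedder if their
    # content hash isn't already known to the vector store.
    def unseen_chunks(chunks, seen_hashes):
        for chunk in chunks:
            digest = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
            if digest not in seen_hashes:
                seen_hashes.add(digest)
                yield chunk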
Amdahl's law still holds, though. If A and B differ in execution time by orders of magnitude, optimising B yields minimal returns (assuming streaming rather than fully serial processing).
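Quick illustration with made-up numbers - the 1% share of wall time is an assumption, roughly in line with the throughput gap above:

    # Amdahl's law: overall speedup when a fraction p of the work is
    # accelerated by a factor s.
    def amdahl(p: float, s: float) -> float:
        return 1.0 / ((1.0 - p) + p / s)

    # If chunking is ~1% of pipeline time and you make it 100x faster:
    print(amdahl(p=0.01, s=100))  # ~1.01, i.e. about a 1% end-to-end gain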
And sure, you can reject already-embedded chunks, but a) the rejection isn't free, and b) you're still bound by embedding speed.
As for resource savings... not in the Wikipedia data range. If you scale up massively and go to a PB of data, going from kiru to memchunk saves you ~25 CPU days. But you also suddenly need to move from bog-standard high-CPU machines to machines that can sustain 164 GB/s of memory throughput, likely bare metal with 8 memory channels. I'm too lazy to do the math, but it's going to be a mild difference, at O($100).
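Roughly, with kiru's throughput as a placeholder assumption (not a benchmark - plug in the real number):

    # Back-of-envelope CPU time to chunk 1 PB; kiru's throughput is an
    # assumed placeholder, memchunk's is the advertised 164 GB/s.
    PB = 1e15                  # bytes
    memchunk_bps = 164e9
    kiru_bps = 0.5e9           # assumed ~0.5 GB/s, for illustration

    saved_days = (PB / kiru_bps - PB / memchunk_bps) / 86_400
    print(f"~{saved_days:.0f} CPU days saved")  # ~23 with these numbers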
Again, I'm not arguing this isn't a cool achievement. But it's very much engineering fun, not "crucial optimization".
So they're talking about this becoming an issue when chunking TBs of data (I assume), not your 1 KB random string...