A while ago, I also implemented a dense eigenvalue solver in Python following a similar approach, but found that it did not converge in O(n^3) as is sometimes claimed in the literature. I then read about the Divide-and-conquer eigenvalue algorithm, which did the trick. It seems to have a reasonable Wikipedia page these days: https://en.wikipedia.org/wiki/Divide-and-conquer_eigenvalue_...


Ooh, thanks for sharing that algorithm! Somehow, I didn't come across this and jumped straight into using the QR algorithm cited everywhere.

I found it hard to find a good reference that had a clean implementation end to end (without calling BLAS/LAPACK subroutines under the hood). It also wasn't easy to find proper convergence properties for different classes of matrices, but I fear I likely wasn't looking in the right places.
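
For illustration, here is a minimal sketch of what such an end-to-end implementation might look like: the unshifted QR iteration in pure Python, with a classical Gram-Schmidt factorization and no BLAS/LAPACK calls. It only shows the shape of the algorithm; the shifts and deflation needed for reliable (and faster) convergence, and any treatment of non-symmetric matrices, are omitted.

    def qr_decompose(A):
        """Classical Gram-Schmidt QR factorization of a square matrix A (list of rows)."""
        n = len(A)
        cols = [[A[i][j] for i in range(n)] for j in range(n)]  # columns of A
        Q_cols = []
        R = [[0.0] * n for _ in range(n)]
        for j in range(n):
            v = cols[j][:]
            for i, q in enumerate(Q_cols):
                R[i][j] = sum(q[k] * cols[j][k] for k in range(n))
                v = [v[k] - R[i][j] * q[k] for k in range(n)]
            R[j][j] = sum(x * x for x in v) ** 0.5
            Q_cols.append([x / R[j][j] for x in v])
        Q = [[Q_cols[j][i] for j in range(n)] for i in range(n)]
        return Q, R

    def matmul(A, B):
        n = len(A)
        return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

    def qr_eigenvalues(A, iterations=500):
        """Repeatedly factor A = QR and set A <- RQ; for symmetric matrices with
        eigenvalues of distinct magnitude, the diagonal converges to the eigenvalues."""
        for _ in range(iterations):
            Q, R = qr_decompose(A)
            A = matmul(R, Q)
        return [A[i][i] for i in range(len(A))]

    print(qr_eigenvalues([[2.0, 1.0], [1.0, 2.0]]))  # approximately [3.0, 1.0]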


Could you explain how? I can't seem to figure it out.

DeepSeek-V3.2-Exp has 37B active parameters, GLM-4.7 and Kimi K2 have 32B active parameters.

Let's say we are dealing with Q4_K_S quantization for roughly half the size. We still need to move 16 GB 30 times per second, which requires a memory bandwidth of 480 GB/s, or maybe half that if speculative decoding works really well.

Anything GPU-based won't work at that speed, because PCIe 5 provides only 64 GB/s and $2000 cannot buy enough VRAM (~256 GB) for a full model.

That leaves CPU-based systems with high memory bandwidth. DDR5 would work (somewhere around 300 GB/s with 8x DDR5-4800 modules), but the RAM alone would cost about twice that $2000 budget, disregarding the rest of the system.
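
A quick back-of-envelope check of those numbers (everything approximate; ~32B active parameters, ~0.5 bytes per parameter after Q4 quantization, and one module per 64-bit channel are the assumptions):

    active_params = 32e9         # active parameters per token for the MoE models above
    bytes_per_param = 0.5        # roughly 4-bit weights, e.g. Q4_K_S
    tokens_per_second = 30

    required_bw = active_params * bytes_per_param * tokens_per_second / 1e9
    print(f"required bandwidth: {required_bw:.0f} GB/s")  # ~480 GB/s

    # theoretical peak of 8 channels of DDR5-4800 (8 bytes per transfer per channel)
    ddr5_bw = 8 * 4800e6 * 8 / 1e9
    print(f"8x DDR5-4800: {ddr5_bw:.0f} GB/s")            # ~307 GB/s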

Can you get enough memory bandwidth out of DDR4 somehow?


> Are .pdfs and .epub safe these days?

Depends on the viewer. Acrobat Reader? Probably not. PDF.js in some browser? Probably safe enough unless you are extremely rich.


Did you have to use any special prompts when using LLMs for writing assistance, or did it just work?


GPT-5.2 was not out at the time this was finished, unfortunately... but GPT-4o mini was used throughout to make the points in the book "hit" a little better. See, I'm not a native English speaker, so making something "sound" the way it sounds in my native language is hard, and I felt AI could help reasonably well with that in a book that is supposed to feel very opinionated.

But if you are insinuating AI made all this up on its own, I have to disappoint you. My points and my thoughts are my own, and I am very human.


> But if you are insinuating AI made all this up on its own, I have to disappoint you.

No worries, I am not a native English speaker myself. I was genuinely interested in whether commercial LLMs would use "bad" words without some convincing.


Oh, it was a hassle for sure! It kept rewriting the sentences I fed to it, trying to style them properly, and it kept throwing out words and changing the rebellious tone I wanted in the book. It was worth it for some pieces; they really became punchier and more to the point. But for others, looking back at it, I could have saved the time and just published them as-is. So it was a medium success for me.


That was my experience as well. Sometimes, LLMs were a big help, but other times, my efforts would have been better spent writing things myself. I always tell myself that experience will make me choose correctly next time, but then a new model is released and things are different yet again.


Try some Made In PRC models. They do not give a shit.


I have tried a few Qwen-2.5 and 3.0 models (<=30B), even abliterated ones, but it seems that some words have been completely wiped from their pretraining dataset. No amount of prompting can bring back what has never been there.

For comparison, I have also tried the smaller Mistral models, which have a much more complete vocabulary, but their writing sometimes lacks continuity.

I have not tried the larger models due to lack of VRAM.


You can give their hosted versions a go using one of the free CLIs. (The Qwen Coder CLI has Qwen models; opencode has a different selection all the time, it was GLM recently. There's also DeepSeek, which is quite cheap.)


The last data point is from January 2026, which has just begun. If you extrapolate the 321 questions by multiplying by 10 to account for the remaining 90 % of the month, you get to within the same order of magnitude as December 2025 (3862). The small difference is probably due to the turn of the year.


Those popups were a big part of why I stopped using SO. I stopped updating my uBlock Origin rules when LLMs became good enough. I am now using the free Kimi K2 model via Groq over the CLI, which is much faster.


> This is from the man who has no finished open source projects

To be fair, which open source project can really claim that it is "finished", and what does "finished" even mean?

The only projects that I can truly call "finished" are those that I have laid to rest because they were superseded by newer technologies, not because they achieved completeness; there is always more to do.


Then replace "finished" with "production software".


> not because they have achieved completeness, because there is always more to do.

this is because SWEs love bloat and any good idea eventually needs to balloon into some ever-growing monstrosity :)


> To be fair, which open source project can really claim that it is "finished", and what does "finished" even mean?

https://github.com/left-pad


> it always work[s]

That was not my experience, at least for very large files (100+ GB). There was a workaround (since patched) where you could link files into your own Google Drive and circumvent the bandwidth restriction that way. The current workaround is to link the files into a directory and then download the directory containing the link as an archive, which does not count against the bandwidth limit.


I see. I never had to download such large files from Drive. For files up to 10 GB I never had any issues, though.


HTML to PNG:

    chromium --headless --disable-gpu --screenshot=output.png --window-size=1920,1080 --hide-scrollbars index.html
Also works great for HTML to PDF:

    chromium --headless --disable-gpu --no-pdf-header-footer --run-all-compositor-stages-before-draw --print-to-pdf=output.pdf index.html


Let's take the Samsung 9100 Pro M.2 as an example. It has a sequential read rate of ~6700 MB/s and a 4K random read rate of ~80 MB/s:

https://i.imgur.com/t5scCa3.png

https://ssd.userbenchmark.com/ (click on the orange double arrow to view additional columns)

That is a latency of about 50 µs for a random read, compared to 4-5 ms latency for HDDs.
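
The latency figure is just the 4K throughput turned around (a rough sketch, assuming queue depth 1 so the reads are fully serialized):

    block_bytes = 4 * 1024      # one 4K random read
    throughput = 80e6           # ~80 MB/s from the benchmark above
    print(f"{block_bytes / throughput * 1e6:.0f} us per read")  # ~51 us, vs. 4-5 ms for an HDD seek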


Datacenter storage will generally not be using M.2 client drives. Those employ optimizations that win many benchmarks but sacrifice consistency along multiple dimensions (no power-loss protection, write performance that degrades as they fill, perhaps others).

With SSDs, the write pattern is very important to read performance.

Datacenter and enterprise class drives tend to have a maximum transfer size of 128k, which is seemingly the NAND block size. A block is the thing that needs to be erased before rewriting.

Most drives seem to have an indirection unit size of 4k. If a write is not a multiple of the IU size or not aligned, the drive will have to do a read-modify-write. It is the IU size that is most relevant to filesystem block size.

If a small write happens atop a block that was fully written with one write, a read of that LBA range will lead to at least two NAND reads until garbage collection fixes it.

If all writes are done such that they are 128k aligned, sequential reads will be optimal, and with sufficient queue depth random 128k reads may match sequential read speed. Depending on the drive, sequential reads may retain an edge due to the drive’s read ahead. My own benchmarks of gen4 U.2 drives generally back up these statements.

At these speeds, the OS or app performing buffered reads may lead to reduced speed because cache management becomes relatively expensive. Testing should be done with direct IO using libaio or similar.
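
For illustration, a minimal direct-IO read sketch in Python (Linux only). The path is a hypothetical placeholder for a large pre-written test file or a dedicated block device, and for real measurements a tool like fio with the libaio engine is the better choice, especially at queue depths above 1:

    import mmap
    import os
    import random
    import time

    PATH = "/path/to/testfile"  # hypothetical: a large pre-written file or raw test device
    BLOCK = 128 * 1024          # 128k reads, matching the transfer size discussed above
    COUNT = 1000

    fd = os.open(PATH, os.O_RDONLY | os.O_DIRECT)
    size = os.lseek(fd, 0, os.SEEK_END)
    buf = mmap.mmap(-1, BLOCK)  # anonymous mmap is page-aligned, as O_DIRECT requires
    offsets = [random.randrange(0, size // BLOCK) * BLOCK for _ in range(COUNT)]

    start = time.perf_counter()
    for off in offsets:
        os.preadv(fd, [buf], off)  # bypasses the page cache; size and offset stay aligned
    elapsed = time.perf_counter() - start
    os.close(fd)

    print(f"{COUNT * BLOCK / elapsed / 1e6:.0f} MB/s random 128k reads at queue depth 1")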


Are the 4K random reads impacted by the fact that you still cannot switch Samsung SSDs to 4K native sectors?


I think that has a bigger impact on writes than reads, but it certainly means there is some gap from optimal.

To me, a 4K read seems anachronistic from a modern application perspective. I gather 4 KB pages are still common in many filesystems, but that doesn’t mean the majority of reads are 4K random in a real-world scenario.


Below a particular table size, it’s literally faster to just do a full table scan.

