The major constraint is that the compiler needs to guarantee that transformations produce semantically identical results to the unoptimized code, with the exception of undefined behavior or specific opt-outs (eg. `-ffast-math` rules).
An ML model can fit into existing compiler pipelines anywhere that heuristics are used though, as an alternative to PGO.
The custom scroll behavior on this page is infuriating and distracts from the content. It is like scrolling in treacle. If someone who works at Microsoft sees this and is able to file an issue, could you please do so?
I've seen talks on this topic at Rust conferences which seemed strongly influenced by Swift's approach, so that will probably be the direction this ends up going in.
I use fish + atuin. I leave the Up arrow bound to fish's default history search (see https://docs.atuin.sh/faq/#how-do-i-remove-the-default-up-ar...), which keeps the UI minimal when I'm just going back one or two commands, and use Atuin via Ctrl+R when I need to find a command from earlier in my history. At that point Atuin provides a nicer UI for searching the history.
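For anyone wanting to replicate this, the setup amounts to something like the following (this assumes atuin's `--disable-up-arrow` init flag; check `atuin init --help` for your version):

```shell
# ~/.config/fish/config.fish
# Initialize atuin but keep fish's default Up-arrow history search;
# Ctrl+R still opens atuin's full-history search UI.
atuin init fish --disable-up-arrow | source
```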
Thanks! It would be interesting to see if the Rust/LLVM folks can get the compiler to apply this optimization whenever possible, since Rust can be much more precise w.r.t. memory initialization.
I think Rust may be able to get it by adding a `freeze` intrinsic to the codegen here. That would force LLVM to pick a deterministic value if there was poison, and should thus unblock the optimization (which is fine here because we know the value isn't poison).
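For readers unfamiliar with `freeze`: at the LLVM IR level it converts a possibly-undef/poison value into an arbitrary but fixed concrete value, so later passes can optimize without having to reason about poison propagation. A minimal illustration (not the actual codegen in question):

```llvm
define i32 @read_maybe_uninit(ptr %p) {
  %v = load i32, ptr %p   ; may be undef/poison if the memory is uninitialized
  %f = freeze i32 %v      ; now an arbitrary but fixed i32
  ret i32 %f
}
```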
I think in this case the Rust and C code aren't equivalent, which may have caused this slowdown. The union trick also affects alignment: the C struct is 32-bit aligned, but the Rust struct only has 16-bit alignment because it only contains fields with 16-bit alignment. In practice the fields are likely correctly aligned to 32 bits anyway, but compiler optimizations may have a hard time verifying that.
Have you tried manually defining alignment of Rust struct?
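Something like the following should do it (a sketch, not the actual struct from the article; the field names are made up):

```rust
// Natural alignment is that of the largest field: 2 bytes here.
#[repr(C)]
struct Pair16 {
    a: u16,
    b: u16,
}

// Forced to 4-byte alignment, matching a C struct whose union
// member gives it 32-bit alignment.
#[repr(C, align(4))]
struct Pair16Aligned {
    a: u16,
    b: u16,
}
```

With `align(4)` the compiler can assume 32-bit alignment at every use site instead of having to prove it.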
It varies. New public APIs or language features may take a long time, but changes to internals and missed optimizations can be fixed in days or weeks, in both LLVM and Rust.
The code for both the client and server is open source (https://github.com/hypothesis/h), so this is possible. The server is designed to support the needs of large scale deployments, so it does come with some complexity compared to a system you would design for smaller scale usage.
The text on https://web.hypothes.is/ mostly targets schools and universities, because Hypothesis pays for itself by selling integrations with online learning platforms (Canvas, D2L, Blackboard etc.) and associated support.
One interesting thing I discovered comparing various matrix multiplication implementations used in ML libraries is that several of them (ONNX Runtime, XNNPack, any others?) skip the step, from BLIS's textbook algorithm, of packing the LHS matrix. Instead they pack only the RHS. Since those are the weights, this can be done once ahead of time and then an inference pass does not need to do any packing at all.
From skimming various papers, it seems the original motivation for packing the LHS, even though only a single element is broadcast from it at a time (n.b. this is the opposite of the ordering in this post, where the row count in the microkernel is a multiple of the register size rather than the column count), was to reduce TLB misses. Apparently this is not a problem in practice on modern CPUs and for problem sizes common in ML inference.
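The RHS-only packing scheme can be sketched roughly like this (illustrative code, not any library's actual implementation; `NR` and the function names are made up, and real kernels would use SIMD):

```rust
const NR: usize = 4; // microkernel column tile width (illustrative)

// Pack B (k x n, row-major) once, into panels of NR columns, so the
// microkernel walks each panel with unit stride. For weights this can
// be done ahead of time, before any inference pass.
fn pack_rhs(b: &[f32], k: usize, n: usize) -> Vec<f32> {
    assert!(n % NR == 0, "sketch assumes n is a multiple of NR");
    let mut packed = Vec::with_capacity(k * n);
    for panel in 0..n / NR {
        for row in 0..k {
            for j in 0..NR {
                packed.push(b[row * n + panel * NR + j]);
            }
        }
    }
    packed
}

// Multiply using the packed B; A (m x k, row-major) is read directly,
// with no LHS packing step.
fn matmul_packed(a: &[f32], bp: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
    let mut c = vec![0.0f32; m * n];
    for panel in 0..n / NR {
        let pbase = panel * k * NR;
        for i in 0..m {
            for p in 0..k {
                let aval = a[i * k + p]; // single A element, broadcast
                for j in 0..NR {
                    c[i * n + panel * NR + j] += aval * bp[pbase + p * NR + j];
                }
            }
        }
    }
    c
}
```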
For context, ultralytics is the Python package for YOLO v8 and YOLO v11, two of the most widely used object detection models. The GitHub repo has 33K stars.
1. Try not to stress too much. The point of review is to improve the quality of the code that lands, and the health of the codebase as a whole, not to grade the author.
2. When you think your work is ready for an initial review, take a short break and then review it line by line yourself, looking for any obvious mistakes or possible simplifications. I recommend doing this in a different editor or view than the one you authored the code in originally.
When reviewing someone else's code, it is annoying if there are silly mistakes that the author could obviously have caught themselves. Issues where the author wasn't aware of a subtle detail, or of a coding practice in some other part of the codebase, are less of a problem: identifying those is exactly what reviews are for.
> This effectively means that if I were to bundle ffmpeg and ffprobe executables within my app, I would have to make the app open-source as well and provide it under the same license.
This is a misunderstanding of the LGPL license requirements, as they are usually interpreted. LGPL requires that if your application dynamically links to a modified version of the library, then you must make the source code _for the modified version of the library_ open source.
The original use case for the LGPL was the C runtime, which practically every binary on a system links to, proprietary or otherwise. The binary can be closed source, but any modifications made to the C runtime itself must be distributed, so that an end user could take your modified version of the C library and further customize it to their needs.
In the context of bundled ffmpeg and ffprobe executables, the user could replace them with their own versions. You should make clear which version of the source was used to build them, and if you have made any modifications, those must be open source.
I can't speak to the interpretation of the LGPL, but there are several copies of ffmpeg on my computer because people interpret it the way the article does and install a new copy with their application.
Interesting that the macOS protections can be subverted simply by not setting an attribute.