
fmstephe · 2026-02-10T04:10:17 1770696617

Last time this was asked I was working on this

https://github.com/fmstephe/simd_explorer

A little TUI app for interactively running different SIMD instructions and seeing the outputs.

Since then I have completed the tool for AVX/2. At this stage that's as far as I intend to go.

It's potentially valuable as an interactive quick reference guide for SIMD instructions.

It works on Windows and Linux, and with the right environment variables it will successfully pretend to be AMD64 running on an Apple M chip.

Arm NEON instructions are not supported at all. Currently Go's assembler does not include these instructions directly, so I didn't attempt to build for them. Maybe one day.

Next up, learn Zig - be happy.

fmstephe · 2026-01-05T22:40:04 1767652804

Can someone clarify this part of the article for me?

"if you search forward, you need to scan through the entire window to find where to split. you’d find a delimiter at byte 50, but you can’t stop there — there might be a better split point closer to your target size. so you keep searching, tracking the last delimiter you saw, until you finally cross the chunk boundary. that’s potentially thousands of matches and index updates."

So I understand that this is optimal if you want to make your chunks as large as possible for a given chunk size.

What I don't understand is why it is desirable to grab the largest chunk possible for a given chunk limit.

Or have I misunderstood this part of the article?

snyy · 2026-01-05T22:48:24 1767653304

You have the right understanding.

We've found that maximizing chunk size gives the best retrieval performance and is easier to maintain since you don't have to customize chunking strategy per document type.

The upper limit for chunk size is set by your embedding model. After a certain size, encoding becomes too lossy and performance degrades.

There is a downside: blindly splitting into large chunks may cut a sentence or word off mid-way. We handle this by splitting at delimiters and adding overlap to cover abbreviations and other edge cases.
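To make the backward search concrete, here is a minimal Go sketch of the idea discussed above: cut each chunk at the last delimiter that still fits within the size limit. The function name `chunkAtDelimiter`, the single-byte delimiter, and the hard-cut fallback are assumptions for illustration, not the article's implementation, and the overlap step is left out.

```go
package main

import (
	"bytes"
	"fmt"
)

// chunkAtDelimiter splits text into chunks of at most maxSize bytes.
// For each chunk it searches backward from the size limit for the last
// delimiter, so the chunk is as large as possible while still ending on
// a delimiter. If the window has no delimiter it falls back to a hard cut.
// (Hypothetical helper; overlap between chunks is not implemented.)
func chunkAtDelimiter(text []byte, delim byte, maxSize int) [][]byte {
	if maxSize <= 0 {
		return nil
	}
	var chunks [][]byte
	for len(text) > maxSize {
		window := text[:maxSize]
		// One backward scan finds the split point directly, instead of
		// tracking every delimiter seen while scanning forward.
		cut := bytes.LastIndexByte(window, delim)
		if cut < 0 {
			cut = maxSize - 1 // no delimiter in the window: hard cut
		}
		chunks = append(chunks, text[:cut+1])
		text = text[cut+1:]
	}
	if len(text) > 0 {
		chunks = append(chunks, text)
	}
	return chunks
}

func main() {
	text := []byte("one. two. three. four.")
	for _, c := range chunkAtDelimiter(text, '.', 10) {
		fmt.Printf("%q\n", c)
	}
}
```

With a limit of 10 bytes the first chunk is "one. two." rather than "one.", which is the "largest chunk that fits" behaviour being asked about.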

fmstephe · 2025-12-14T23:17:03 1765754223

Working on a TUI tool which demonstrates the behaviour of X86 SIMD instructions. This is all done in Go assembly, and is probably most valuable for Go programmers.

The problem for me was trying to read and understand a swiss map implementation. The SIMD instructions were challenging to understand and the documentation felt difficult to read. I thought that if I had an interactive tool where I could set the inputs to a SIMD instruction and then read the outputs, understanding the instructions would be much easier.

This turned out to be true.
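For a flavour of the kind of instruction involved: swiss-table designs commonly compare a group of 16 control bytes against one byte and pack the results into a bitmask, which on x86 is the PCMPEQB / PMOVMSKB pair. Below is a scalar Go sketch of what that pair computes. It is an emulation for illustration only, not code from the tool or from any map implementation, and `matchByte` is a made-up name.

```go
package main

import "fmt"

// matchByte emulates, one lane at a time, what PCMPEQB followed by
// PMOVMSKB computes for a 16-byte group: bit i of the result is set
// when group[i] == target. The real instructions do all 16 lanes at once.
func matchByte(group [16]byte, target byte) uint16 {
	var mask uint16
	for i, b := range group {
		if b == target {
			mask |= uint16(1) << i
		}
	}
	return mask
}

func main() {
	group := [16]byte{7, 3, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7}
	// Lanes 0, 2 and 15 hold the target, so those bits are set.
	fmt.Printf("%016b\n", matchByte(group, 7))
}
```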

Building this tool for all AVX/AVX2 instructions turned out to be a larger task than I had expected. Naively I just went off a Wikipedia page on AVX and assumed it had listed all the instructions (this was a bad assumption).

I am nearly there. Looking forward to completing this project so I can actually use it to do some fun stuff processing text and maybe even get back to that swiss map implementation.

https://github.com/fmstephe/simd_explorer

(This is also my first attempt at a TUI app)

fmstephe · 2025-12-15T00:04:52 1765757092

If anyone wants to try it out, please do (the UI is a bit rough). I will try to fix up any issues that are uncovered.

wonger_ · 2025-12-15T04:41:30 1765773690

UI seems fine to me! It's easy to understand and use. A screenshot in the README would be nice.

fmstephe · 2025-12-06T05:32:40 1764999160

    Location: New Zealand, Manawatu
    Remote: Yes
    Willing to relocate: No
    Technologies: Go, Java, Git, Erlang, Postgres, Linux
    Resume: https://www.linkedin.com/in/francis-stephens/
    Email: francisstephens@gmail.com

I work primarily on backend systems, with a strong focus on performance and system stability/resilience. I worked as a performance engineer at the mobile ad-attribution company Adjust.

Some interesting open-source projects include

https://github.com/fmstephe/memorymanager An exploratory manual memory allocator for building large in-memory data structures with near zero GC cost.

https://github.com/fmstephe/matching_engine A financial trading matching engine with a somewhat novel red+black tree implementation.

https://github.com/fmstephe/flib A set of packages primarily in support of a lock-free single-producer single-consumer queue.

My ideal position would be working on backend systems primarily in Go.

fmstephe · 2025-11-04T02:18:34 1762222714

    Location: New Zealand, Manawatu
    Remote: Yes
    Willing to relocate: No
    Technologies: Go, Java, Git, Erlang, Postgres, Linux
    Resume: https://www.linkedin.com/in/francis-stephens/
    Email: francisstephens@gmail.com

I work primarily on backend systems, with a strong focus on performance and system stability/resilience. I worked as a performance engineer at the mobile ad-attribution company Adjust.

Some interesting open-source projects include

https://github.com/fmstephe/memorymanager An exploratory manual memory allocator for building large in-memory data structures with near zero GC cost.

https://github.com/fmstephe/matching_engine A financial trading matching engine with a somewhat novel red+black tree implementation.

https://github.com/fmstephe/flib A set of packages primarily in support of a lock-free single-producer single-consumer queue.

My ideal position would be working on backend systems primarily in Go.

fmstephe · 2025-06-03T00:14:55 1748909695

    Location: New Zealand, Manawatu
    Remote: Yes
    Willing to relocate: No
    Technologies: Go, Java, Git, Erlang, Postgres, Linux
    Resume: https://www.linkedin.com/in/francis-stephens/
    Email: francisstephens@gmail.com

I work primarily on backend systems, with a strong focus on performance and system stability/resilience. I worked as a performance engineer at the mobile ad-attribution company Adjust.

Some interesting open-source projects include

https://github.com/fmstephe/memorymanager An exploratory manual memory allocator for building large in-memory data structures with near zero GC cost.

https://github.com/fmstephe/matching_engine A financial trading matching engine with a somewhat novel red+black tree implementation.

https://github.com/fmstephe/flib A set of packages primarily in support of a lock-free single-producer single-consumer queue.

My ideal position would be working on backend systems primarily in Go.

fmstephe · 2025-05-27T21:19:04 1748380744

In New Zealand, where I live, the Salvation Army (charity second hand shop) offers a service where they will come and clear out a house for you. They will take everything and dispose of the trash and keep and resell anything of value.

This is really used to clear out houses of deceased relatives etc.

This doesn't resolve your problem of generally selling your used goods conveniently. But I always found it to be a really interesting service. Because it identifies that there is real practical difficulty in simply giving away a lot of goods, and the solution is to provide this complete service to make it easier.

fmstephe · 2025-05-26T01:34:34 1748223274

I've been working on an offheap allocator for Go.

In contrast to the popular arena-based allocators (which target quickly allocating/freeing short-lived per-request allocations), I am targeting an allocator for building very large in-memory dbs or caches with almost no garbage collection cost.

There's a little no-gc string interner package in there as well.

https://github.com/fmstephe/memorymanager

It's somewhat on pause right now as I have just started a new job (but it has been a very fun project, nerdy joy).

Related to the memorymanager, and intended to support it, are

https://github.com/fmstephe/fuzzhelper A library for setting up fuzz tests for complex data structures.

https://github.com/fmstephe/gossert A library for adding runtime assertions to Go code. It's developed so that when the assertions are switched off the compiler should be able to completely eliminate the assertions. But this requires build tags to switch the assertions on.
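A rough sketch of the general technique, not gossert's actual API: guard the assertion with a compile-time constant so the compiler can treat the check as dead code when it is off. In a real project the constant would normally live in two files selected by a build tag; the names `assertionsEnabled` and `assertThat` here are invented for illustration.

```go
package main

import "fmt"

// assertionsEnabled is a compile-time constant. Normally there would be
// two files, one per build-tag setting (for example a hypothetical
// "//go:build assertions" and "//go:build !assertions"), each defining
// this constant with a different value.
const assertionsEnabled = false

// assertThat panics when cond is false, but only if assertions are
// enabled. Because assertionsEnabled is a constant false here, the
// compiler is free to drop the body of the if statement entirely.
func assertThat(cond bool, msg string) {
	if assertionsEnabled && !cond {
		panic("assertion failed: " + msg)
	}
}

// half is an example caller with a precondition on its argument.
func half(n int) int {
	assertThat(n%2 == 0, "n must be even")
	return n / 2
}

func main() {
	fmt.Println(half(10))
	fmt.Println(half(7)) // no panic: assertions are switched off
}
```

One caveat with this shape: the condition passed at the call site is still written as an ordinary expression, so whether its evaluation disappears too depends on the compiler inlining the call.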

fmstephe · 2025-04-22T03:48:24 1745293704

This article is a fun read.

If you enjoyed this, or if you need more control over some memory allocations in Go, please have a look at this package I wrote. I would love to have some feedback or have someone else use it.

https://github.com/fmstephe/memorymanager

It bypasses the GC altogether by allocating its own memory separately from the runtime. It also disallows pointer types in allocations, but replaces them with a Reference[T] type, which offers the same functionality. Freeing memory is manual though - so you can't rely on anything being garbage collected.

These custom allocators in Go tend to be arenas intended to support groups of allocations which live and die together. But the offheap package was intended to build large long-lived data structures with zero garbage collection cost. Things like large in-memory caches or databases.
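To illustrate the idea of swapping pointers for a pointer-free handle with manual freeing, here is a toy Go sketch. It is not the memorymanager API: `Store`, `Ref`, `Alloc`, `Get` and `Free` are hypothetical names, and the storage is an ordinary Go slice rather than memory allocated outside the runtime. The generation counter is an added assumption to catch use after free.

```go
package main

import "fmt"

// Ref is a hypothetical pointer-free handle: a slot index plus a
// generation number used to detect stale handles.
type Ref[T any] struct {
	idx uint32
	gen uint32
}

type slot[T any] struct {
	val T
	gen uint32
}

// Store is a toy allocator that hands out Refs instead of pointers.
// Nothing is reclaimed automatically; callers must call Free.
type Store[T any] struct {
	slots []slot[T]
	free  []uint32 // indexes of freed slots available for reuse
}

// Alloc stores v, reusing a freed slot when one is available.
func (s *Store[T]) Alloc(v T) Ref[T] {
	if n := len(s.free); n > 0 {
		idx := s.free[n-1]
		s.free = s.free[:n-1]
		s.slots[idx].val = v
		return Ref[T]{idx: idx, gen: s.slots[idx].gen}
	}
	s.slots = append(s.slots, slot[T]{val: v})
	return Ref[T]{idx: uint32(len(s.slots) - 1)}
}

// Get returns the value for r, or false if r is stale or out of range.
func (s *Store[T]) Get(r Ref[T]) (T, bool) {
	var zero T
	if int(r.idx) >= len(s.slots) || s.slots[r.idx].gen != r.gen {
		return zero, false
	}
	return s.slots[r.idx].val, true
}

// Free releases the slot manually. It reports false on a double free.
func (s *Store[T]) Free(r Ref[T]) bool {
	if int(r.idx) >= len(s.slots) || s.slots[r.idx].gen != r.gen {
		return false
	}
	var zero T
	s.slots[r.idx].val = zero
	s.slots[r.idx].gen++ // invalidates every outstanding Ref to this slot
	s.free = append(s.free, r.idx)
	return true
}

func main() {
	var s Store[int]
	a := s.Alloc(42)
	v, ok := s.Get(a)
	fmt.Println(v, ok)
	s.Free(a)
	_, ok = s.Get(a)
	fmt.Println(ok)
}
```

Because a `Ref` contains no Go pointers, a structure built only from such handles gives the garbage collector nothing to trace through, which is the property that makes this style attractive for very large long-lived data.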

kbolino · 2025-04-22T13:55:05 1745330105

Do you think the problem that is addressed by offheap could also have been addressed with a generational garbage collector?

fmstephe · 2025-04-24T10:19:26 1745489966

For the problems that arena allocators solve, relatively short lived allocations which die soon, yes. A generational collector would allow for faster allocation rates (a thread local bump allocator would become easy to use).

But very long-lived data structures, like caches and in-memory databases, still need to be marked during full heap garbage collection cycles. These are less frequent with a generational collector though.

fmstephe · 2025-04-07T20:05:06 1744056306

    Location: New Zealand, Manawatu
    Remote: Yes
    Willing to relocate: No
    Technologies: Go, Java, Git, Erlang, Postgres, Linux
    Resume: https://www.linkedin.com/in/francis-stephens/
    Email: francisstephens@gmail.com

I work primarily on backend systems, with a strong focus on performance and system stability/resilience. I worked as a performance engineer at the mobile ad-attribution company Adjust.

Some interesting open-source projects include

https://github.com/fmstephe/memorymanager An exploratory manual memory allocator for building large in-memory data structures with near zero GC cost.

https://github.com/fmstephe/matching_engine A financial trading matching engine with a somewhat novel red+black tree implementation.

https://github.com/fmstephe/flib A set of packages primarily in support of a lock-free single-producer single-consumer queue.

My ideal position would be working on backend systems primarily in Go.