"if you search forward, you need to scan through the entire window to find where to split. you’d find a delimiter at byte 50, but you can’t stop there — there might be a better split point closer to your target size. so you keep searching, tracking the last delimiter you saw, until you finally cross the chunk boundary. that’s potentially thousands of matches and index updates."
So I understand that this is optimal if you want to make your chunks as large as possible for a given chunk size.
What I don't understand is why it is desirable to grab the largest chunk possible for a given chunk limit.
We've found that maximizing chunk size gives the best retrieval performance and is easier to maintain since you don't have to customize chunking strategy per document type.
The upper limit for chunk size is set by your embedding model. After a certain size, encoding becomes too lossy and performance degrades.
There is a downside: blindly splitting into large chunks may cut a sentence or word off mid-way. We handle this by splitting at delimiters and adding overlap to cover abbreviations and other edge cases.
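To make the backward search concrete, here is a minimal Go sketch of splitting at the last delimiter before the size limit. The function name chunkBackward, the single-byte delimiter and the hard-split fallback are my own assumptions for illustration; it leaves out overlap entirely and is not taken from any particular chunking library.

```go
package main

import (
	"bytes"
	"fmt"
)

// chunkBackward splits text into chunks of at most maxSize bytes. For each
// chunk it searches backward from the size limit for the last delimiter, so a
// single search finds the largest chunk that still ends on a delimiter.
func chunkBackward(text []byte, maxSize int, delim byte) [][]byte {
	if maxSize <= 0 {
		return nil
	}
	var chunks [][]byte
	for len(text) > maxSize {
		window := text[:maxSize]
		cut := bytes.LastIndexByte(window, delim)
		if cut < 0 {
			// No delimiter inside the window: fall back to a hard split.
			cut = maxSize - 1
		}
		// The delimiter stays at the end of the chunk it terminates.
		chunks = append(chunks, text[:cut+1])
		text = text[cut+1:]
	}
	if len(text) > 0 {
		chunks = append(chunks, text)
	}
	return chunks
}

func main() {
	text := []byte("one two three four five six seven")
	for _, c := range chunkBackward(text, 12, ' ') {
		fmt.Printf("%q\n", c)
	}
}
```

A forward search would have to visit every delimiter in the window and remember the last one; here the first hit from the end is already the answer.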
Working on a TUI tool which demonstrates the behaviour of x86 SIMD instructions. This is all done in Go assembly, and is probably most valuable for Go programmers.
The problem for me was trying to read and understand a swiss map implementation. The SIMD instructions were challenging to understand and the documentation felt difficult to read. I thought that if I had an interactive tool where I could set the inputs to a SIMD instruction and then read the outputs, understanding the instructions would be much easier.
This turned out to be true.
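As an illustration of the "set the inputs, read the outputs" idea, here is a scalar Go emulation of two SSE2 instructions, PCMPEQB (byte-wise compare) and PMOVMSKB (collect the top bit of each byte into a mask). Swiss-table designs such as Abseil's use this pair to probe a group of 16 control bytes at once. This is only a sketch in plain Go, not code from the tool, and the helper names cmpEqBytes, moveMask and broadcast are made up for the example.

```go
package main

import "fmt"

// cmpEqBytes emulates PCMPEQB on a 16-byte vector: each output byte is 0xFF
// where the two inputs match and 0x00 where they differ.
func cmpEqBytes(a, b [16]byte) [16]byte {
	var out [16]byte
	for i := range a {
		if a[i] == b[i] {
			out[i] = 0xFF
		}
	}
	return out
}

// moveMask emulates PMOVMSKB: bit i of the result is the top bit of byte i.
func moveMask(v [16]byte) uint16 {
	var mask uint16
	for i, b := range v {
		mask |= uint16(b>>7) << i
	}
	return mask
}

// broadcast fills all 16 lanes with the same byte.
func broadcast(b byte) [16]byte {
	var out [16]byte
	for i := range out {
		out[i] = b
	}
	return out
}

func main() {
	// Lanes 1 and 3 hold the byte we are looking for; unlisted lanes are zero.
	ctrl := [16]byte{0x11, 0x2A, 0x07, 0x2A}
	mask := moveMask(cmpEqBytes(ctrl, broadcast(0x2A)))
	fmt.Printf("%016b\n", mask)
}
```

Each set bit in the mask marks a lane whose byte matched, so one compare plus one move-mask replaces sixteen separate byte comparisons.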
Building this tool for all AVX/AVX2 instructions turned out to be a larger task than I had expected. Naively I just went off a Wikipedia page on AVX and assumed it had listed all the instructions (this was a bad assumption).
I am nearly there. Looking forward to completing this project so I can actually use it to do some fun stuff processing text and maybe even get back to that swiss map implementation.
Location: New Zealand, Manawatu
Remote: Yes
Willing to relocate: No
Technologies: Go, Java, Git, Erlang, Postgres, Linux
Resume: https://www.linkedin.com/in/francis-stephens/
Email: francisstephens@gmail.com
I work primarily on backend systems, with a strong focus on performance and system stability/resilience. I worked as a performance engineer at the mobile ad-attribution company Adjust.
Some interesting open-source projects include
https://github.com/fmstephe/memorymanager An exploratory manual memory allocator for building large in-memory data structures with near zero GC cost.
https://github.com/fmstephe/matching_engine A financial trading matching engine with a somewhat novel red+black tree implementation.
https://github.com/fmstephe/flib A set of packages primarily in support of a lock-free single-producer single-consumer queue.
My ideal position would be working on backend systems primarily in Go.
In New Zealand, where I live, the Salvation Army (charity second hand shop) offers a service where they will come and clear out a house for you. They will take everything and dispose of the trash and keep and resell anything of value.
This is mostly used to clear out the houses of deceased relatives etc.
This doesn't resolve your problem of generally selling your used goods conveniently. But I always found it to be a really interesting service, because it recognises that there is real practical difficulty in simply giving away a lot of goods, and the solution is to provide a complete service that makes it easier.
In contrast to the popular arena-based allocators (which target quickly allocating/freeing short-lived per-request allocations), I am targeting an allocator for building very large in-memory dbs or caches with almost no garbage collection cost.
There's a little no-gc string interner package in there as well.
https://github.com/fmstephe/gossert
A library for adding runtime assertions to Go code. It's developed so that when the assertions are switched off the compiler should be able to completely eliminate the assertions. But this requires build tags to switch the assertions on.
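A rough sketch of how compile-time elimination of assertions can work in Go. It is not gossert's actual API: the names assertsEnabled, Assert and safeDiv are hypothetical, and the build-tag switch is collapsed into a single constant so the example fits in one file.

```go
package main

import "fmt"

// assertsEnabled is a compile-time constant. In a real setup it would be
// defined twice, in two files guarded by opposite build tags, once as true
// and once as false. Here it is fixed to true so the example is one file.
const assertsEnabled = true

// Assert panics with msg when assertions are enabled and cond is false.
// When assertsEnabled is false the condition below is constant-false, so the
// compiler is free to drop the body as dead code.
func Assert(cond bool, msg string) {
	if assertsEnabled && !cond {
		panic("assertion failed: " + msg)
	}
}

// safeDiv shows an assertion guarding a precondition.
func safeDiv(a, b int) int {
	Assert(b != 0, "divisor must be non-zero")
	return a / b
}

func main() {
	fmt.Println(safeDiv(10, 2))
}
```

One caveat with this shape: the arguments to Assert are still evaluated at the call site, so they need to be cheap and free of side effects if the call is meant to vanish completely.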
If you enjoyed this, or if you need more control over some memory allocations in Go, please have a look at this package I wrote. I would love to have some feedback or have someone else use it.
It bypasses the GC altogether by allocating its own memory separately from the runtime. It also disallows pointer types in allocations, but replaces them with a Reference[T] type, which offers the same functionality. Freeing memory is manual though - so you can't rely on anything being garbage collected.
These custom allocators in Go tend to be arenas intended to support groups of allocations which live and die together. But the offheap package was intended to build large long-lived data structures with zero garbage collection cost. Things like large in-memory caches or databases.
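To give a feel for the Reference[T] idea, here is a heavily simplified sketch of a handle that replaces a pointer, together with manual freeing. It is not the package's implementation or API: the Store type and its methods are invented for this example, and for simplicity the memory lives in an ordinary Go slice rather than being allocated outside the runtime.

```go
package main

import "fmt"

// Reference is a pointer-free handle to a value held in a Store.
// It is an illustrative stand-in, not the package's actual type.
type Reference[T any] struct {
	idx uint32
}

// Store hands out slots from a backing slice and recycles freed slots.
type Store[T any] struct {
	slots []T
	free  []uint32
}

// Alloc reserves a slot, reusing a previously freed one when available.
func (s *Store[T]) Alloc() Reference[T] {
	if n := len(s.free); n > 0 {
		idx := s.free[n-1]
		s.free = s.free[:n-1]
		return Reference[T]{idx: idx}
	}
	var zero T
	s.slots = append(s.slots, zero)
	return Reference[T]{idx: uint32(len(s.slots) - 1)}
}

// Value returns a pointer to the slot. The pointer is only safe to use until
// the next Alloc, because append may move the backing array.
func (s *Store[T]) Value(r Reference[T]) *T {
	return &s.slots[r.idx]
}

// Free zeroes the slot and makes it available for reuse. Nothing stops the
// caller from using r afterwards; that is the cost of manual management.
func (s *Store[T]) Free(r Reference[T]) {
	var zero T
	s.slots[r.idx] = zero
	s.free = append(s.free, r.idx)
}

type point struct{ x, y int }

func main() {
	var s Store[point]
	r := s.Alloc()
	s.Value(r).x = 3
	fmt.Println(s.Value(r).x)
	s.Free(r)
}
```

Because a Reference holds an index instead of a pointer, structures built from them contain nothing for the collector to trace, which is the property the long-lived cache and database use case depends on.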
For the problems that arena allocators solve, relatively short-lived allocations which die soon, yes. A generational collector would allow for faster allocation rates (a thread-local bump allocator would become easy to use).
But very long-lived data structures, like caches and in-memory databases, still need to be marked during full heap garbage collection cycles. These are less frequent with a generational collector though.
https://github.com/fmstephe/simd_explorer
A little TUI app for interactively running different SIMD instructions and seeing the outputs.
Since then I have completed the tool for AVX/2. At this stage that's as far as I intend to go.
It's potentially valuable as an interactive quick reference guide for SIMD instructions.
It works on Windows and Linux, and with the right environment variables it will successfully pretend to be AMD64 running on an Apple M chip.
Arm NEON instructions are not supported at all. Go's assembler does not currently include these instructions directly, so I didn't attempt to build for them. Maybe one day.
Next up, learn Zig - be happy.