LLVM is able to auto-vectorize the generated IR extremely well. There are no branches to mis-predict, so, theoretically, it just blasts through the data.
Since it emits standard LLVM IR, LLVM handles the actual instruction set targeting. Right now in v0.1.0, the compiler hardcodes a SIMD width of 8 (assuming AVX2). However, parameterized SIMD widths are already on the roadmap for v0.4.0. Once that is added, you will be able to pass a --target-width flag to compile down to narrower vector units (like SSE on older CPUs) or up to AVX-512 and ARM NEON.
Loops are banned outright inside compute kernels, with no loopholes. Inside a shader block, execution is 100% linear. However, the host application calling the pipeline effectively acts as the loop over the data elements. To help, we allow linear accumulators: you consume these with a fold operation, which the compiler lowers into a lock-free parallel reduction tree rather than a traditional for loop.
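Roughly, that lowering looks like this (illustrative scalar sketch, not actual compiler output; assumes a power-of-two count for brevity):

```c
#include <stddef.h>

/* Each pass combines pairs and halves the active width (a reduction
   tree). Every pass is a flat loop with no cross-iteration dependency,
   which is exactly the shape that auto-vectorizes well. Note this
   destroys the buffer's contents as it folds. */
static float tree_fold_sum(float *v, size_t n) {
    for (size_t stride = n / 2; stride > 0; stride /= 2)
        for (size_t i = 0; i < stride; i++)
            v[i] += v[i + stride];   /* independent per-i updates */
    return v[0];
}
```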
The memory model is a host-owned static arena where your host application allocates a flat, contiguous block of memory and passes that pointer to Lockstep_BindMemory(ptr). Lockstep does all its reads and writes exclusively within that allocated buffer. Because it doesn't have arbitrary pointers, it can't reach outside that arena, which is exactly how we mathematically guarantee the noalias pointer optimizations in LLVM.
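Host-side, the setup is minimal. A sketch (only `Lockstep_BindMemory` is real; the stub below stands in for the actual runtime so the snippet is self-contained, and the helper and sizes are illustrative):

```c
#include <stdlib.h>

/* Stub standing in for the real runtime entry point named above. */
static void *g_arena = NULL;
static void Lockstep_BindMemory(void *ptr) { g_arena = ptr; }

/* Allocate one flat, contiguous block and hand it to the runtime;
   every kernel read and write stays inside this buffer. */
static float *setup_arena(size_t elems) {
    float *arena = malloc(elems * sizeof *arena);
    if (arena)
        Lockstep_BindMemory(arena);
    return arena;
}
```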
It's a bit philosophically different from ISPC. When SIMD lanes diverge, the ISPC compiler implicitly handles the execution masks and lane disabling behind the scenes.
We take a more draconian approach. We completely ban if and else inside compute kernels. If you want conditional logic, you must explicitly use branchless intrinsics like mix, step, or select. The goal is to make the cost of divergence mathematically explicit to the programmer rather than hiding it in the compiler. If a pathway is truly divergent, you handle it at the pipeline level using a filter node to split the stream.

We also ban arbitrary pointers entirely. All memory is handled via a Host-Owned Static Arena, and structs are automatically decomposed into Struct-of-Arrays layouts. Because the compiler controls the exact byte-offset and knows there are no arbitrary pointers, it can aggressively decorate every LLVM IR pointer with noalias.
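In scalar C terms, those intrinsics behave roughly like this (illustrative sketches with GLSL-style semantics, not the actual kernel language; a compiler lowers the comparison to cmp + select, never a jump):

```c
/* Branchless building blocks. */
static float step_f(float edge, float x)       { return x < edge ? 0.0f : 1.0f; }
static float mix_f(float a, float b, float t)  { return a + t * (b - a); }
static float select_f(float a, float b, int c) { return c ? b : a; }

/* What you'd write instead of "if (x >= 0) y = hi; else y = lo;" */
static float pick_by_sign(float x, float lo, float hi) {
    return mix_f(lo, hi, step_f(0.0f, x));
}
```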
That's just making things harder to write by disabling useful features.
Always using `select`s can actually be LESS efficient than using predictable branches: for a select, both sides of the conditional must be evaluated, while for non-divergent "warps", masking+branching skips those instructions entirely.
It's not hard to learn that `if` has a performance cost depending on its divergence (every GPU programmer already knows that) - sure, it's harder to quantify, but it can be strictly better than select, so this is actually inviting inefficiency in certain workloads.
Artificial intelligence grew exponentially in 01 and eventually led to the creation of newer, more advanced AI, reaching technological singularity. Here, the Synthients learned to better co-exist with nature in their efforts to utilize and sustain it efficiently, becoming solar-powered and self-sufficient in the process.
These advancements helped 01 quickly become a global superpower. Eventually, all of Earth's industries, from medical and computer to automotive and household, became reliant on 01's exports, culminating in the rise and dominance of 01 stocks on the global trade market. Human currency plummeted as 01's currency rose. Soon, 01's technology, including its chips and AI, invaded all facets of human society. Ill-prepared to face the technological developments before them, humanity was unable to compete and feared economic collapse, causing the United Nations to place an embargo on 01.
Really enjoyed this, very clever and hilarious.
If you want to see it absolutely lose its composure, try feeding it a thought experiment that sneaks in a change of coordinates, like reformulating physics in logarithmic (hyperbolic) space instead of the usual linear one.
Turns out, if you push the AI to actually work through the math, it goes from snarky to existential crisis mode pretty fast. It even starts reluctantly admitting that a lot of our so-called “constants” and “singularities” might just be artifacts of using the wrong measuring stick. The best part is watching it try (and fail) to find flaws when you show how the fine structure constant could drop out of geometry and zeta functions.
this compiles to 10kb on my machine with no golfing. it literally just supports grabbing whatever is at the url given (http 1.0)
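roughly the shape of it (not my actual code, just a sketch of the protocol side; the socket part is the usual connect/write/read):

```c
#include <stdio.h>
#include <string.h>

/* http/1.0 needs just one request line plus Host; the server closing
   the connection marks end-of-body, so no chunked parsing needed. */
static int build_request(char *buf, size_t cap,
                         const char *host, const char *path) {
    int n = snprintf(buf, cap, "GET %s HTTP/1.0\r\nHost: %s\r\n\r\n",
                     path, host);
    return (n > 0 && (size_t)n < cap) ? n : -1;
}
/* the rest is socket(), connect(), write() the request, then read()
   until EOF and dump to stdout. */
```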