
x86-specific optimization for a language so focused on *portability*, heavy abstractions, and business logic is kind of ehh. Especially with ARM rearing its head.

If you desire performance close to the chip, you chose the wrong language and should write your code in a language closer to the chip. Unless the abstractions and concepts required for that performance work are so different from what you use for day-to-day work (data science, ML: Python, with C++ bindings for interacting with the GPU).



The language is still portable; this is a change in the JVM, the runtime, which should have all the optimizations. I don't understand your issue.

Java is a higher-level language: you just want to call sort on a list without having to worry about low-level performance characteristics, because there are people much smarter who can polish that.
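To make that concrete, a toy sketch (hypothetical data) of the level the caller works at; any intrinsic or SIMD speedup happens entirely underneath this call:

```java
import java.util.Arrays;

public class SortDemo {
    public static void main(String[] args) {
        // The caller just sorts; whether the JDK uses a plain dual-pivot
        // quicksort or a vectorized intrinsic is invisible at this level.
        int[] data = {5, 3, 8, 1, 9, 2};
        Arrays.sort(data);
        System.out.println(Arrays.toString(data)); // [1, 2, 3, 5, 8, 9]
    }
}
```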


I think what he's saying is that instead of writing that in platform-specific C++ they could have worked on a Vector API and used that instead, so it would automatically work with other (future) SIMD implementations of the same width.

A poster in another comment mentioned such an API is being worked on, and what I described above is exactly how .NET is tackling this: they built a Vector API and are building optimizations like that in C# on top of it, which also gives developers the ability to write SIMD-oriented code in C# rather than resorting to platform-specific C++ and interop/JNI.

In my opinion that's a better approach; it's discussed in great detail here: https://devblogs.microsoft.com/dotnet/performance_improvemen...


The Vector API exists to write SIMD code in Java. This is an intrinsic inserted by the JIT compiler. HotSpot intrinsics are always written in assembly or compiler IR because they are inserted in the generated assembly.
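For reference, a minimal sketch of what code against the incubating Vector API looks like. This assumes JDK 16 or later and running with `--add-modules jdk.incubator.vector`; the element-wise add shown here is my own illustration, not anything from the sorting intrinsic itself:

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorSpecies;

public class VecAdd {
    // SPECIES_PREFERRED picks the widest vector shape the hardware
    // supports (128/256/512-bit), so the same code runs on x86 and ARM.
    static final VectorSpecies<Float> S = FloatVector.SPECIES_PREFERRED;

    static void add(float[] a, float[] b, float[] out) {
        int i = 0;
        int upper = S.loopBound(a.length);
        for (; i < upper; i += S.length()) {
            FloatVector va = FloatVector.fromArray(S, a, i);
            FloatVector vb = FloatVector.fromArray(S, b, i);
            va.add(vb).intoArray(out, i); // one SIMD add per lane group
        }
        for (; i < a.length; i++) {
            out[i] = a[i] + b[i]; // scalar tail for the leftover elements
        }
    }
}
```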

Arrays.sort() could very conceivably be called in a hot loop, so you really don't want to allocate Java objects in it.
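A sketch of why that matters (hypothetical `sortMany` helper): sorting a primitive `int[]` works in place with essentially no per-call allocation, whereas an `Integer[]` would box every element and pressure the GC inside a hot loop:

```java
import java.util.Arrays;

public class HotLoop {
    // Sorting primitive arrays in a loop creates no Java objects per
    // iteration; boxing to Integer[] here would allocate one object
    // per element on every pass.
    static void sortMany(int[][] batches) {
        for (int[] batch : batches) {
            Arrays.sort(batch); // in-place, allocation-free for small arrays
        }
    }
}
```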


Yeah, that work in C# required a lot of other things to minimize allocations.

What I was thinking of is something similar to how they implemented things like IndexOf [0], which is a pure C# implementation that the JIT compiles into code on par with the equivalent C++. The advantage of doing things this way is that when ARM adds a 256-bit-wide SIMD extension, they will only need to support it as a Vector256 implementation to get that code working with no other changes.

[0]: https://github.com/dotnet/runtime/blob/2a1b52a1b691c42a7f407...


SIMD is not Intel-only. ARM has SIMD support. So does AMD.

Portability is not a problem. The C/C++ compilers have nice wrappers that let the JVM take advantage of them, and there's always the non-SIMD version to fall back to.

The JVM is the correct abstraction layer at which to implement this for portability. Any Java program doing sorting benefits from this on all supported platforms.


A precedent for x86 SIMD in those low level performance building blocks would also set a precedent for the inclusion of ARM equivalents. A heavy abstraction environment is exactly the right spot to place a set of ergonomic, long SIMD levers, one for each architecture.


Or a portable one that already works on Arm, RISC-V, AVX2 etc :) See the vqsort link above.




