
x86-specific optimization for a language so focused on *portability*, heavy abstractions, and business logic is kind of ehh. Especially with ARM rearing its head.

If you desire performance close to the chip, you chose the wrong language and should write your code in a language closer to the chip. Unless the abstractions and concepts required for that performance work are so different from what you use for day-to-day work (data science, ML: Python, with C++ bindings for interacting with the GPU).



The language is still portable; this is a change in the JVM, the runtime, which should have all the optimizations. I don't understand your issue.

Java is a higher-level language: you just want to call sort on a list without having to worry about low-level performance characteristics, because there are people much smarter who can polish that.
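To make that concrete, a toy sketch (hypothetical data) of the level the caller works at; any intrinsic or SIMD speedup happens entirely underneath this call:

```java
import java.util.Arrays;

public class SortDemo {
    public static void main(String[] args) {
        // The caller just sorts; whether the JDK uses a plain dual-pivot
        // quicksort or a vectorized intrinsic is invisible at this level.
        int[] data = {5, 3, 8, 1, 9, 2};
        Arrays.sort(data);
        System.out.println(Arrays.toString(data)); // [1, 2, 3, 5, 8, 9]
    }
}
```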


I think what he's saying is that instead of writing that in platform-specific C++ they could have worked on a Vector API and used that instead, so it would automatically work with other (future) SIMD implementations of the same width.

A poster in another comment mentioned such an API is being worked on, and what I described above is exactly how .NET is tackling this: they built a Vector API and are building optimizations like that in C# on top of it, which also gives developers the ability to write SIMD-oriented code in C# rather than resorting to platform-specific C++ and interop/JNI.

In my opinion that's a better approach; it's discussed in great detail here: https://devblogs.microsoft.com/dotnet/performance_improvemen...


The Vector API exists to write SIMD code in Java. This is an intrinsic inserted by the JIT compiler. HotSpot intrinsics are always written in assembly or compiler IR because they are inserted in the generated assembly.
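For reference, a minimal sketch of what code against the incubating Vector API looks like. This assumes JDK 16 or later and running with `--add-modules jdk.incubator.vector`; the element-wise add shown here is my own illustration, not anything from the sorting intrinsic itself:

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorSpecies;

public class VecAdd {
    // SPECIES_PREFERRED picks the widest vector shape the hardware
    // supports (128/256/512-bit), so the same code runs on x86 and ARM.
    static final VectorSpecies<Float> S = FloatVector.SPECIES_PREFERRED;

    static void add(float[] a, float[] b, float[] out) {
        int i = 0;
        int upper = S.loopBound(a.length);
        for (; i < upper; i += S.length()) {
            FloatVector va = FloatVector.fromArray(S, a, i);
            FloatVector vb = FloatVector.fromArray(S, b, i);
            va.add(vb).intoArray(out, i); // one SIMD add per lane group
        }
        for (; i < a.length; i++) {
            out[i] = a[i] + b[i]; // scalar tail for the leftover elements
        }
    }
}
```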

Arrays.sort() could very conceivably be called in a hot loop, so you really don't want to allocate Java objects in it.
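A sketch of why that matters (hypothetical `sortMany` helper): sorting a primitive `int[]` works in place with essentially no per-call allocation, whereas an `Integer[]` would box every element and pressure the GC inside a hot loop:

```java
import java.util.Arrays;

public class HotLoop {
    // Sorting primitive arrays in a loop creates no Java objects per
    // iteration; boxing to Integer[] here would allocate one object
    // per element on every pass.
    static void sortMany(int[][] batches) {
        for (int[] batch : batches) {
            Arrays.sort(batch); // in-place, allocation-free for small arrays
        }
    }
}
```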


Yeah, that work in C# required a lot of other things to minimize allocations.

What I was thinking of is something similar to how they implemented things like IndexOf [0], which is a pure C# implementation that the JIT compiles into code on par with the equivalent C++. The advantage of doing things this way is that when ARM adds a 256-bit-wide SIMD extension, they will only need to support it as a Vector256 implementation to get that code working with no other changes.

[0]: https://github.com/dotnet/runtime/blob/2a1b52a1b691c42a7f407...


SIMD is not Intel-only. ARM has SIMD support. So does AMD.

Portability is not a problem. The C/C++ compilers have nice wrappers that let the JVM take advantage of them, and there's always the non-SIMD version to fall back to.

The JVM is the correct abstraction layer at which to implement this for portability. Any Java program doing sorting benefits from this on all supported platforms.


A precedent for x86 SIMD in those low level performance building blocks would also set a precedent for the inclusion of ARM equivalents. A heavy abstraction environment is exactly the right spot to place a set of ergonomic, long SIMD levers, one for each architecture.


Or a portable one that already works on Arm, RISC-V, AVX2 etc :) See the vqsort link above.




