Did I get this right: the JVM also uses a JIT, so basically first you compile your Java program into Java bytecode, and then once it's running there's also a JIT further optimising the bytecode?
First of all, there are many JVM implementations, and the commercial ones used to offer AOT compilation as well, going back to the early 2000s.
Then the ones that have JITs offer multiple flavours.
One way is to initially interpret the bytecodes. After enough profiling information is gathered, the first-level JIT gets into action and compiles that block into native code; here a block is usually a method, but it can be something else.
This first-level compiler is rather simple and does only basic optimizations.
The runtime keeps profiling execution, and when it eventually notices that the already-compiled block (now native code) keeps being used significantly, it is time to bring in the big-brother JIT, which is somewhat equivalent to -O3 on gcc, and recompile to native code using all major optimizations.
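A minimal way to watch this happen on HotSpot is the standard -XX:+PrintCompilation flag; the toy class here is mine, but the tier numbers in the log are real (tiers 1-3 are the simple compiler, tier 4 the optimizing one):

```java
// Run with:  java -XX:+PrintCompilation HotLoop
// The log shows square() (and the loop in main) compiled at a low tier
// first, then recompiled at tier 4 once they stay hot.
public class HotLoop {
    static long sum;

    // Small method that becomes "hot" after enough invocations.
    static long square(long x) {
        return x * x;
    }

    public static void main(String[] args) {
        for (long i = 0; i < 1_000_000; i++) {
            sum += square(i);
        }
        System.out.println(sum);
    }
}
```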
Other JVMs (like JRockit) never interpret; when they start, the first level is already the dumb level-one compiler straight to native code.
Then all of them now support JIT caches, meaning that after a run the JITed methods get saved and reused by the next execution, so the profiler gets to learn from previous runs and the system already starts from a much better performance state.
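The exact mechanism and flags are implementation-specific. One concrete example is Eclipse OpenJ9, whose shared class cache also stores AOT-compiled method bodies between runs (the flag below is OpenJ9-only; other JVMs spell their caches differently):

```java
// First run populates the named cache with classes plus compiled code:
//   java -Xshareclasses:name=demoCache HotLoop
// Subsequent runs reuse the cached native code and warm up much faster:
//   java -Xshareclasses:name=demoCache HotLoop
```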
Yes, although HotSpot has multiple tiers, not just two.
It initially interprets, and when a specific threshold is reached (you can configure it), the C1 compiler gets called into action and does basic optimizations.
After a while, if that generated native code keeps getting hotter, the C2 compiler (the one with -O3-like capabilities) gets called into action.
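If you want to poke at those thresholds, these are standard HotSpot flags (the numbers are illustrative; defaults vary by JDK version):

```java
// Raise the bar before C2 recompiles a very hot method:
//   java -XX:Tier4InvocationThreshold=20000 HotLoop
//
// Stop at C1 entirely, trading peak speed for faster warm-up:
//   java -XX:TieredStopAtLevel=1 HotLoop
//
// Classic single-tier threshold (only meaningful with tiering off):
//   java -XX:-TieredCompilation -XX:CompileThreshold=5000 HotLoop
```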
In both cases the optimized code gets safety guards to validate that the assumptions made by the JIT still hold. For example, if a dynamic dispatch always lands on the same method, it gets replaced by a direct call instead. If even that assumption is later proven wrong, the JIT throws the optimized code away and starts over with the new assumptions.
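Here is a minimal sketch of that devirtualize-then-deoptimize dance; the class is mine, but you can observe the effect with the same -XX:+PrintCompilation flag (look for compiled methods being marked "made not entrant" once the second type shows up):

```java
// Run with:  java -XX:+PrintCompilation Deopt
public class Deopt {
    interface Shape { long area(); }
    static class Square implements Shape { public long area() { return 4; } }
    static class Circle implements Shape { public long area() { return 3; } }

    // Dynamic dispatch; while only Square is ever seen, the JIT can
    // speculate that the call site is monomorphic and use a direct call.
    static long call(Shape s) {
        return s.area();
    }

    public static void main(String[] args) {
        long sum = 0;
        Shape sq = new Square();
        for (int i = 0; i < 1_000_000; i++) {
            sum += call(sq);   // phase 1: assumption holds, code gets optimized
        }
        Shape ci = new Circle();
        for (int i = 0; i < 1_000_000; i++) {
            sum += call(ci);   // phase 2: assumption broken, JIT deoptimizes
        }
        System.out.println(sum);
    }
}
```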
Then, in what concerns OpenJDK, you actually have two C2-level JIT compilers available: HotSpot's own C2, written in C++ and still the default, and Graal, written in Java and taken from the GraalVM (née Maxine VM) project. Currently Graal is much better than C2 at escape analysis, for example, but worse in other scenarios.
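On JDK builds that still bundle Graal (roughly JDK 10 through 16 for stock OpenJDK, or any GraalVM distribution), swapping it in as the top-tier compiler looks like this:

```java
// C1 still handles the first tier; only the C2 role is taken over by
// Graal, plugged in through the JVMCI interface:
//   java -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI \
//        -XX:+UseJVMCICompiler HotLoop
```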
In both cases, OpenJDK has inherited the JIT cache infrastructure from JRockit, so you also get to save the native code between runs and start much faster in subsequent runs.
As a note: even though it is usually not a good idea, if you set the interpreter threshold to zero, C1 kicks in right at the start, but it won't have any profiling information available, so the generated code is most likely going to be worse than just interpreting.
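Both extremes are exposed as standard HotSpot flags, if you want to measure the difference yourself:

```java
// Compile everything on first invocation, skipping the interpreter;
// with no profile data the generated code is often mediocre:
//   java -Xcomp HotLoop
//
// The opposite extreme: interpret only, never JIT (handy as a baseline):
//   java -Xint HotLoop
```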