Can someone ELI5 (or at least ELI20): Why does Forth do better on a different CPU architecture than other languages? Or, why does Forth do worse on the architecture that most other languages do fine on?
Forth is a stack-based language, so it does really well on a stack-based CPU. Optimizers for common languages like C analyze which values in a function are used most often, and they want to control whether those values live in registers or in RAM; register-based CPU ISAs give them that fine-grained control.
But if you're working in Forth, everything is stack ops anyway, so you wouldn't benefit much from that kind of analysis. A stack-based CPU is going to give you the most speed, because the most common operations each boil down to a single instruction.
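To make the "one instruction per word" point concrete, here's a toy Forth-style evaluator. It's only an illustration: the word set (`+`, `*`, `dup`, `swap`, `drop`, integer literals) is a hypothetical subset, not real Forth. Each word is one push/pop-style operation. On a stack CPU that's roughly one instruction per word. On a register CPU a naive implementation keeps this stack in memory, so every word turns into several loads and stores, which is the overhead the C-style register analysis avoids.

```python
# Toy Forth-style evaluator (illustrative subset, not real Forth).
# Every word is a single operation on the data stack.

def run(program: str) -> list[int]:
    stack: list[int] = []
    for word in program.split():
        if word == "+":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif word == "*":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif word == "dup":
            stack.append(stack[-1])          # copy top of stack
        elif word == "swap":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif word == "drop":
            stack.pop()
        else:
            stack.append(int(word))          # numeric literal: push it
    return stack


print(run("2 3 + 4 *"))  # [20]
```

`2 3 + 4 *` is the postfix form of `(2 + 3) * 4`: there are no variables to allocate to registers, just a sequence of stack operations.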
Chuck Moore doesn't mind this because he writes literally everything in Forth. Most of us who work in other languages would mind the performance hit.
Is there still a need for that, now that WASM implementations generate native code? Jazelle tried running bytecode directly in hardware for the JVM on ARM, but I also get the impression that compilers made Jazelle unnecessary.
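A rough sketch of why compilers can make stack hardware unnecessary: a translator can simulate the stack at compile time, pushing register *names* instead of values, and emit register-style three-address code. After translation no stack traffic is left, and stack shuffles like `dup`/`swap` cost nothing. The op names (`add`, `mul`, `dup`, `swap`) are hypothetical and only loosely Wasm-like. This is a simplified take on the general technique, not how any particular WASM engine or JVM JIT is implemented. A real compiler would then run register allocation over the output.

```python
# Stack bytecode -> register three-address code, by tracking a
# compile-time "virtual stack" of register names (hypothetical op set).

def to_registers(ops: list[str]) -> list[str]:
    vstack: list[str] = []   # holds register names, not runtime values
    code: list[str] = []
    next_reg = 0

    def fresh() -> str:
        nonlocal next_reg
        reg = f"r{next_reg}"
        next_reg += 1
        return reg

    for op in ops:
        if op in ("add", "mul"):
            b, a = vstack.pop(), vstack.pop()
            dst = fresh()
            code.append(f"{dst} = {op} {a}, {b}")
            vstack.append(dst)
        elif op == "dup":
            vstack.append(vstack[-1])        # no code emitted
        elif op == "swap":
            vstack[-1], vstack[-2] = vstack[-2], vstack[-1]  # no code emitted
        else:                                # assume an integer constant
            dst = fresh()
            code.append(f"{dst} = const {int(op)}")
            vstack.append(dst)
    return code


for line in to_registers(["2", "3", "add", "4", "mul"]):
    print(line)
```

Note that `dup` and `swap` vanish entirely: they only rename which register is "on top" during compilation.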