To add on to what the sibling said — setting aside that modern CISC chips have a frontend that breaks complex instructions down into an internal RISC-like micro-op set, which blurs the distinction — RISC instruction sets do tend to win on performance and power largely because the encoding has a fixed width. That means once you fetch a cache line, say a 128-byte line of 4-byte instructions, you can start decoding all 32 instructions in parallel, whereas x86's variable-length encoding makes it much harder to keep a superscalar pipeline full (its decoder is significantly more complex in order to extract the same parallelism, which costs power and latency). It's a bit more complicated on ARM (and RISC-V with the compressed extension), where you can have two widths, but even then it's easier to extract performance than on x86, where an instruction can be anywhere from 1 to 15 bytes, which makes it hard to find instruction boundaries in parallel.
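A toy sketch of the boundary-finding difference (made-up encodings, not real ISAs — the point is just the data dependency, not the bit layout):

```python
# Toy illustration: why fixed-width decode parallelizes trivially
# while variable-length decode has a serial dependency chain.

# Fixed-width (RISC-like, 4-byte): every instruction boundary is known
# the instant the line arrives, so all of them can be fed to decoders
# at once -- no decoding needed just to find the starts.
def fixed_width_boundaries(line_bytes, width=4):
    return list(range(0, len(line_bytes), width))

# Variable-length (x86-like): the length of instruction N depends on
# its own bytes, so you only learn where instruction N+1 starts after
# at least length-decoding instruction N.
def variable_length_boundaries(line_bytes, length_of):
    offsets, pos = [], 0
    while pos < len(line_bytes):
        offsets.append(pos)
        pos += length_of(line_bytes, pos)  # must decode before advancing
    return offsets

line = bytes(128)  # pretend 128-byte cache line

# All 32 boundaries known immediately:
print(fixed_width_boundaries(line))

# Hypothetical length decoder: pretend the first byte encodes a
# length of 1..15 (real x86 length decoding is far messier).
fake_len = lambda b, i: (b[i] % 15) + 1
print(variable_length_boundaries(line, fake_len))
```

The `while` loop is the whole problem: in hardware that chain is what the extra x86 decoder complexity exists to break up.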
There’s a reason that Apple is whooping AMD and Intel on performance/watt and it’s not solely because they’re on a newer fab process (it’s also why AMD and Intel utterly failed to get mobile CPU variants of their chips off the ground).
Any reading resources? I’d love to learn more about the techniques they use to get better parallelism. The most obvious solution I can imagine is brute force: start decoding at every possible boundary and rely on either hitting an invalid instruction or late-latching the result until it’s confirmed that it was a valid instruction boundary. Is that generally the technique, or are they doing even more than that? The downside of this approach, of course, is that you risk wasting energy and execution units on phantom instructions versus an architecture that didn’t have as much phantom potential in the first place.
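Roughly what I’m imagining, as a toy sketch (hypothetical length decoder, not real x86 — `length_of` here stands in for whatever the hardware does per byte offset):

```python
# Sketch of the "speculatively decode at every offset, then keep only
# the real ones" idea. Toy encoding, purely illustrative.
def speculative_decode(line_bytes, length_of):
    # Step 1 (parallel in hardware): tentatively length-decode at EVERY
    # byte offset, as if each were an instruction start. Most of these
    # are phantoms whose work gets thrown away -- the energy cost.
    lengths = [length_of(line_bytes, i) for i in range(len(line_bytes))]

    # Step 2 (a much cheaper serial pass, since the lengths are already
    # computed): starting from the one known-good boundary at offset 0,
    # chase the length chain to mark which speculative decodes were real.
    valid, pos = set(), 0
    while pos < len(line_bytes):
        valid.add(pos)
        pos += lengths[pos]
    return valid

# Hypothetical length decoder: first byte encodes a length of 1..15.
fake_len = lambda b, i: (b[i] % 15) + 1
print(sorted(speculative_decode(bytes(range(64)), fake_len)))
```

The win is that step 1 has no serial dependency at all, and step 2 is just chasing precomputed pointers rather than doing full decodes one after another.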