Others have pointed out that adding bits to identify instruction types eats into your instruction length, so let's go stupid big time: what if you had the instructions as described here, with no instruction length encoded in the instruction itself, and stored the length separately? (3 bits per word would be plenty.) You might 1. put the lengths in a contiguous bit string somewhere, or 2. put them at the bottom of the cache line that holds the instructions (the cache line being 512 bits, I assume).
Okay, for 1. you'd have to do two fetches to get stuff into the I-cache (not so for option 2., where the lengths are part of the same cache line). Either way you're going to reduce instruction density because the length bits use up cache, and there's nothing you can do about that. But at least it would allow n-bit instructions to be genuinely n bits long, which is a big advantage.
That this hasn't been done before to my knowledge is proof that it's a rotten idea, but can the experts here please explain why – thanks
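To put rough numbers on option 2, here is a minimal sketch of the packing arithmetic. It assumes a 512-bit line and a 3-bit length tag per word, as in the post; the word sizes tried (32 and 16 bits) and the function name `packing` are assumptions for illustration only.

```python
LINE_BITS = 512  # cache line size assumed in the post


def packing(word_bits, tag_bits, line_bits=LINE_BITS):
    """Return how a line splits up when each word carries a separate length tag."""
    # Each word costs its own bits plus its tag bits, all within the same line.
    words = line_bits // (word_bits + tag_bits)
    payload = words * word_bits
    tags = words * tag_bits
    return {
        "words": words,
        "payload_bits": payload,
        "tag_bits": tags,
        "unused_bits": line_bits - payload - tags,
        "density": payload / line_bits,  # fraction of the line holding instructions
    }


if __name__ == "__main__":
    for word_bits in (32, 16):
        print(word_bits, packing(word_bits, tag_bits=3))
```

Under these assumptions a line holds 14 words of 32 bits (448 instruction bits, 42 tag bits, 22 bits left over), so about 87.5% of the line is instructions.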
I think this is the big downside. You're effectively taking information which will always be needed at the same time, and storing it in two different places.
There is never a need for one piece of information without the other, so why not store it together.
Interesting idea. It effectively moves the extra decode stage in front of the I-cache, making the I-cache a bit like a CISC trace/micro-op cache. On a 512-bit line you would add 32 bits to mark the instruction boundaries. At which point you start to wonder whether there is anything else worth adding that simplifies the later decode chain, and whether the roughly 5% increase in I-cache size (figuring less than 1/16th, since a lot of the overhead is shared) is worth it.
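A rough sketch of what 32 boundary bits per 512-bit line could look like, assuming one "instruction starts here" bit per 16-bit parcel. The parcel size, the bit order, and the helper names are assumptions, not something stated in the thread.

```python
PARCEL_BITS = 16                       # assumed granularity of instruction starts
LINE_BITS = 512
PARCELS = LINE_BITS // PARCEL_BITS     # 32 parcels, so 32 marker bits per line


def build_start_mask(lengths_in_parcels):
    """Set bit i when an instruction begins at parcel i of the line."""
    mask = 0
    pos = 0
    for n in lengths_in_parcels:
        if n < 1:
            raise ValueError("instruction length must be at least one parcel")
        if pos + n > PARCELS:
            raise ValueError("instructions do not fit in one line")
        mask |= 1 << pos
        pos += n
    return mask


def instruction_starts(mask):
    """Recover start positions from the mask alone, without decoding any instruction bits."""
    return [i for i in range(PARCELS) if (mask >> i) & 1]


# Raw marker overhead relative to the line, before any shared-overhead discount.
MARKER_OVERHEAD = PARCELS / LINE_BITS  # 32 / 512

if __name__ == "__main__":
    mask = build_start_mask([1, 2, 1, 3])  # mixed 16-, 32- and 48-bit instructions
    print(bin(mask), instruction_starts(mask), MARKER_OVERHEAD)
```

The point of the mask is that every start position is known up front, so the decoder does not have to walk the instructions one after another to find the next boundary.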
Which Unicode encoding are you talking about? It sounds a bit like you're talking about UTF-16 surrogate pairs, but that's not how those work. It's not how UTF-8 or UTF-32 work either. So, which encoding is this?
If I understand you correctly, the guy I'm responding to is proposing to allow mixing of different-sized instructions. Your suggestion effectively says "I'm starting a run of compressed instructions/I'm finishing a run of compressed instructions", which is a different proposition. Just my take though.