Why would you need a 64 bit instruction; what kinds of things are going to be us...

camel-cdr · on Oct 24, 2023

From the RVI thread on 48 bit instructions, 64 bit ones would probably look similar:

> There are several 48-bit instruction possibilities.

> 1. PC-relative long jump

> 2. GP-relative addressing to support large small data area, effectively giving GP-relative access to entire data address space of most programs

> 3. Load upper 32-bits of 64-bit constants or addresses

> 4. Or lower 32-bits of 64-bit constants or addresses

> 5. And with 32-bit mask

> 6. More effective ins/ext of 64-bit bit fields

Another thing thats offten discussed is moving the vtype and setvl into each vector instructions, I'm not sure if that requries 48 or 64 bit instructions.

_a_a_a_ · on Oct 24, 2023

I was really asking about 64-bit instructions specifically, but going with what you've put, if you don't mind...

> 1. PC-relative long jump

My understanding is that these are rare

> 2. GP-relative addressing to support large small data area, effectively giving GP-relative access to entire data address space of most programs

What is 'GP' here? but "...access to entire data address space of most programs" In this case you are just going to be bouncing all over the address space, substantially missing any level of cache much of the time, surely?. Maybe you get a little extra code density but you aren't going to get any extra speed to speak of.

> 3. Load upper 32-bits of 64-bit constants or addresses

> 4. Or lower 32-bits of 64-bit constants or addresses

> 5. And with 32-bit mask

Well yeah, but how common is this? I understand the alpha architecture team looked at this and found it uncommon which is why they were okay with less-than-32-bit constants. If it really speeded things up you might build a specific cache to store constants (a kind of larger, stupider, register set). It would seem a simpler solution.

I'm not sure what you mean with 6, and I'm not familiar with vtype/setvl

dzaima · on Oct 25, 2023

On vtype/setvl: in the RISC-V V extension (aka RVV / Vector (≈SIMD)), due to the 32-bit instruction length, there's a separate instruction that does some configuration (operated-on element size, register group size, masked-off element behavior, target element count), which arith/etc operations afterwards will work by. So e.g. if you wanted to add vectors of int32_t-s, you'd need something like "vsetvli x0,x0,e32,m1,ta,ma; vadd.vv dst,src1,src2"

Often one vsetvl stays valid for multiple/most/all instructions, but sometimes there's a need to toggle it for a single instruction and then toggle it back. With 48-bit or 64-bit instructions, such temporary changes could be encoded in the operation instruction itself.

Additionally, masked instructions always mask by v0, which could be expanded to allow any register (and perhaps built-in negation) by more instruction bits too.

classichasclass · on Oct 24, 2023

> My understanding is that these are rare

Depends on how many bits you had to start with. On Power ISA they aren't common either, but when they happen you need up to seven instructions (lis, ori, rldicl, oris, ori, then for branches mtctr/b(c)ctr) to specify the new address or larger value. Most other RISCs are similar when full 64-bit values must be specified. This is a significant savings.

benj111 · on Oct 24, 2023

Well you can embed longer immediates directly in the opcode.

You could have a lot more registers.

The first example, I'm not sure you'd want a full 64bit encoding space. You still aren't going to be able to load a 64bit immediate directly so I'd rather see an instruction that uses the next instruction as the immediate. But then 50% of the time you're still going to be padding this to 64bit alignment, so it's unclear to me that this is a benefit over 2 lots of the same but with 32bit immediates.

The second option is interesting. But if you've got 256 addressable registers say, what use are the 32 and 16 bit instructions that can only address a tiny proportion of those registers.

Joker_vD · on Oct 24, 2023

How do you even use all those registers? Serious question. I've toyed with a couple of 256-register ISAs, and the moment you hit function calls/parameter passing you realize that to utilize those efficiently, you really need some way to indirectly refer to registers, be it register windows, or MMIX's register slide, or Am29k's IPA/IPB/IPC registers; the only other option seems to be to perform global register allocation but that hardly works in scenarios with separate compilation/dynamic code loading.

benj111 · on Oct 24, 2023

Off the top of my head I don't really know. But then if you had asked me 20 years ago if we'd need multi core multi GHz multi GB computers to display a web page I'd probably have said no.

I suppose the os could reserve registers for itself to save swapping in and out quite so often.

Register windows for applications/functions/threads.

Or maybe something radically different, like get rid of the stack, and treat them conceptually like a list?

_a_a_a_ · on Oct 24, 2023

> I've toyed with a couple of 256-register ISAs, and the moment you hit function calls/parameter passing you realize that to utilize those efficiently

Very revealing, thanks, this had never occurred to me

hajile · on Oct 24, 2023

The sweet spot for scalar code is about 24 registers, but that leads to weird offset-bits (there's an ISA that does this, but I forget what it's called), so 32 registers is easier to implement and provides a mild improvement in the long tail of atypical functions.

On the flip side, the ability to have more registers is very good for SIMD/GPU applications.

benj111 · on Oct 24, 2023

Absolutely, I'm not saying a 64bit instruction length with 5/6/7/8 bits of registers would be bad per se. In fact I'd be interested to see where it leads.

But if you have a processor that also uses 16 bit instructions those extra registers become unusable. Thumb can't encode all registers in all instructions so you have the high registers that are significantly less useful than the low registers.

X86 is the same, never really done 64bit ASM so I don't know if they improved that.

So then you may aswell just divide up the registers so you've got 16 general purpose registers and 16 registers for simd or whatever.

classichasclass · on Oct 24, 2023

Power10 added "prefixed" instructions, which are effectively 64-bit instructions in two 32-bit halves (the nominal instruction size). They are primarily used for larger immediates and branch displacements.

https://www.talospace.com/2021/04/prefixed-instructions-and-...

TanjB · on Oct 24, 2023

MIPS had load const to high or low half. More that 40 years ago Transputer had shift-and-load 8 bit constants. Lots of ancient precedents for rare big constants.

classichasclass · on Oct 24, 2023

So does classic PowerPC, SPARC, and many other ISAs. It's the most common way to handle it on RISC. The Power10 prefixed instruction idea just expands on it.

hajile · on Oct 24, 2023

Personally, I like the idea of doubling the instruction length every time -- 16, 32, 64, 128, etc. There's a big use case on the longer instruction end for VLIW/DSP/GPU applications.

Pet_Ant · on Oct 24, 2023

AFAIK you want short instructions for VLIW because you want to pack multiple of them into a single word.