It’s interesting that there is a zero stored in a register and used for hours - is that significantly faster than just using an actual zero each time? Perhaps CPUs need an “always zero” register or some similar mnemonic to help harden against this.
Intel has never really needed to have a zero register because xor register, register as a zeroing idiom is so fast and so well recognized that Intel has optimized the hell out of it. On Sandy Bridge and onward it doesn't even go through an execution port, even for the vector registers.
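For example (x86-64, Intel syntax; the pointer in rdi is just for illustration):
xor eax, eax    ; recognized zeroing idiom: eax = 0, zero-extended into rax; on recent cores handled at register rename, no execution port
mov [rdi], rax  ; e.g. write that zero through a pointer without ever fetching a constant from memory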
The problem is really whether to indulge bad programmers who don't respect the ABI at the cost of a minimal sliver of performance (even though it's not taking up an execution port, the extra instruction still takes up cache space, fetch bandwidth, and decode slots). Yeah, they should probably zero the register before they zero the pointer, but they shouldn't have to if other people respected the ABI.
I think it's not just the xor trick, but that Intel has lots of addressing modes, including ones with immediate operands that you can use in many situations.
In RISC-like machines, most of the operations are register-register, and you have load/store instructions for referencing memory.
To use an immediate operand (literal constant in the code itself), you may have to load it into a register, like
move r7, #42
add r1, r1, r7 ;; ok, now we have 42 in r7, we can increment r1 by 42.
Whereas in a CISC you would have
add r1, #42 ;; two operand form
or maybe
add r1, r1, #42 ;; three operand form
When you need a zero, you just use the immediate operand zero, and thus you don't need to pick some register to clear.
In summary, zero registers in RISC-like instruction set architectures effectively provide a literal zero that can be used wherever a register is required, which helps because only register operands can be used in many instructions.
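Concretely, on a RISC with a hardwired zero register (RISC-V shown here, GNU-style syntax), that literal zero drops in wherever a register operand is required:
mv   t0, zero       # t0 = 0 (expands to addi t0, zero, 0)
sw   zero, 0(a0)    # store a zero to memory without burning a scratch register
beq  a1, zero, done # branch if a1 == 0, again using the zero register as a plain operand
done: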
That's a good point. But all of the x86 SIMD stuff is register/register, and we don't have xmm0/ymm0/zmm0 being 0 like we'd expect on a load/store-style RISC architecture.
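The same xor trick is what gets used there instead - zeroing a vector register with itself is a recognized idiom too, something like (AVX, Intel syntax):
vpxor  xmm0, xmm0, xmm0   ; zero xmm0; the VEX encoding also zeroes the upper lanes of ymm0/zmm0
vxorps xmm1, xmm1, xmm1   ; same idea with the floating-point flavor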
Yeah, something like "mov ax, 0h" but I suppose that is way more memory intensive as you have to load a 0 into memory somewhere and then copy it into the register.
It strikes me as somehow the compiler is making assumptions that aren't being enforced by the ... OS? Language? Not sure what, but it's assuming functions restore the registers they use, and that isn't enforced by anything. From my (long ago) time there was PUSHA and POPA, but I assume those take quite a bit of "oomph" and are avoided if possible.
> It strikes me as somehow the compiler is making assumptions that aren't being enforced by the ... OS? Language?
One case of this problem was in a handwritten assembly file. The other was a compiler bug.
This is a case where the ABI requires that if you use a certain register you must save its previous value and restore it afterwards; the two independent bugs were cases of forgetting to look after a certain register.
An ABI is simply an agreement as to how things should work: what registers you are free to clobber, which ones you must look after when you use them, how certain data must be laid out in memory, etc. ABIs are typically language specific, though there may be a lot of commonality at the very high level (i.e. how you use sections in an ELF file) and at the very low level (anybody using unboxed integers will probably do the same thing).
You are welcome to violate the ABI as you see fit in your own code. The OS doesn't care; it has its own constraints (how to make a system call, how to pass arguments to each -- though cf. above when I talked about ints). So, say, a Lisp compiler can lay out stack frames differently from a C++ compiler because of the languages' different semantics, but if your Lisp program wants to call a library written in C++ it must make sure memory at the call site follows the C++ ABI, because that's what the C++ compiler will have assumed.
Both bugs were programming errors in assembly language files. One was inline assembly that was missing entries from a clobber list; the other was an assembly function that lacked invocations of the macros that were supposed to be used to preserve/restore the registers. There was no compiler bug.
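As a rough sketch of what the preserve/restore obligation looks like (x86-64 System V style, hypothetical function; the actual platform and macros in the story may differ):
my_func:              ; hypothetical function that wants to use rbx
    push rbx          ; rbx is callee-saved under this ABI, so stash the caller's value first
    xor  ebx, ebx     ; now it is safe to clobber rbx
    ; ... do work that uses rbx ...
    pop  rbx          ; restore the caller's value before returning
    ret               ; rax, rcx, rdx, etc. are caller-saved and need no such care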
It seems to me this is something that could be found by some kind of valgrind-like tool - it'd be much slower than normal code, but it could report "ABI exception detected" or something.
Interesting conclusion given that I found two functions and a (presumed) third-party driver/what-not that were violating the ABI. One of these was causing crashes, and the other one was going to. The crashes went on for over a year and a half, so, ...
The compiler is making assumptions (which it is supposed to make) but nobody is enforcing the assumptions. The only player who could reasonably enforce the assumptions would be the compiler, in a special checking mode. I am not aware of a compiler that does this. Pity.
Because you’d either need a special “always zero” register (some chips have this), or a mnemonic for some or all of the instructions that assume zero as an operand (some chips have this), or wipe a register (this is the problem here - it uses a register), or use memory.
Adding an implicit zero may make sense for some instructions but probably not all.
Look at the format of the 68000's MOVEQ instruction. The zero is part of the instruction, and does not take an extra four bytes to hold it. There's no memory that's used (other than the instruction itself), no extra memory to hold the argument, and no "always zero" register.
MOVEQ can move more than a zero. It can move any small number (-128 to 127), so 0 is not "special" here.
Also check out the CLR instruction (though that may be what you meant by "a mnemonic for some or all of the instructions that assume zero").
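Roughly, in 68000 assembly:
moveq  #0, d0     ; single 16-bit opcode; the literal (-128..127, sign-extended to 32 bits) lives in the instruction word
clr.l  d1         ; dedicated clear instruction, no immediate needed at all
moveq  #42, d2    ; same trick for any other small constant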
Where it's not optimized away, getting "an actual zero" requires a memory operation of some kind. Register ops are faster in that they are right there, no fetch needed.
Depends on the instruction set architecture. The 68000 had some ways of burying a small literal operand in the instruction. If I recall correctly, MOVEQ.L would let you move zero to a register without touching memory (other than the instruction fetch), and it wasn't a long instruction.
However, moving a zero to a register does take time. Time that would otherwise be used operating with the zero value already present in the zero register.
The second best is what moto did.
As you point out, there is the instruction fetch, which could be the intended operation, rather than developing the zero itself.
On par with that is having enough registers to just hold a zero, and whether that made sense depended on the need and developer strategy.
I am a big fan of the moto CPU's, starting with the 6809. Just to be clear.
But moving it from the zero register to another register would also take time. If what you want is a zero in a register other than the zero register (say, one that is going to serve as the index of a loop, which the zero register cannot do), then MOVEQ should not take any longer than a MOVE from the zero register to another register.
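Side by side, both cost one short instruction (68000 on one line, a zero-register RISC like RISC-V on the other, just for comparison):
moveq #0, d1      ; 68000: immediate zero encoded in the opcode
mv    t1, zero    ; RISC-V: one instruction to copy the hardwired zero into t1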
Say we are zeroing memory. No advantage there. Coupla cycles right at the start, then a ton of writes.
Say we are forming a bitmask. Could be an advantage there in that having a zero handy in a register means no fetching one. When a lot of dynamically created masks are needed, this can be a nice gain.
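A rough sketch of both cases (RISC-V, register choices arbitrary):
# zeroing memory: the zero register mainly saves the one-time setup
sw   zero, 0(a0)    # store 0 directly, no scratch register dedicated to holding it
# building a dynamic mask: a ready-made zero (or its complement) avoids reloading constants in a loop
not  t1, zero       # t1 = all ones (xori t1, zero, -1)
sll  t1, t1, a2     # shift left by a runtime amount: low a2 bits become 0
not  t1, t1         # invert: mask with the low a2 bits set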
I'm sure we can come up with more. It's not always important, and like you mention with the moto designs, may not matter too much due to many other optimizations possible given a good instruction set.
Some people would rather have the register free for general use! I'm one of those, but if there is a zero register, I use it to get the benefit of it when I can. On the devices I've seen, there are generally a lot of registers so the marginal impact of having a zero register isn't significant. There are plenty to work with.
Maybe I should be clear here too. I personally don't care whether there is one. If it's there, I do things in ways that leverage it, and was just pointing out why devices that have one, ahem... have one! Those that don't may or may not have options that make sense. The way moto did it is very good, and there are other pretty great optimizations possible with their ISA, abusing the stack to write memory, etc...
If not, then I do other things. It's assembly language! Work the chip, right?