It’s interesting that there is a zero stored in a register and used for hours - is that significantly faster than just using an actual zero each time? Perhaps CPUs need an “always zero” register or some similar mnemonic to help harden against this.
Intel has never really needed to have a zero register because xor register, register as a zeroing idiom is so fast and so well recognized that Intel has optimized the hell out of it. On Sandy Bridge and onward it doesn't even go through an execution port, even for the vector registers.
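For example (x86-64, Intel syntax; the pointer in rdi is just for illustration):
xor eax, eax    ; recognized zeroing idiom: eax = 0, zero-extended into rax; on recent cores handled at register rename, no execution port
mov [rdi], rax  ; e.g. write that zero through a pointer without ever fetching a constant from memory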
The problem is really whether to indulge bad programmers who don't respect the ABI at the cost of a minimal sliver of performance (even though it's not taking up an execution port, the extra instruction still takes up cache space, fetch bandwidth, and decode slots). Yeah, they should probably zero the register before they zero the pointer, but they shouldn't have to if other people respected the ABI.
I think it's not just the xor trick, but that Intel has lots of addressing modes, including ones with immediate operands that you can use in many situations.
In RISC-like machines, most of the operations are register-register, and you have load/store instructions for referencing memory.
To use an immediate operand (literal constant in the code itself), you may have to load it into a register, like
move r7, #42
add r1, r1, r7 ;; ok, now we have 42 in r7, we can increment r1 by 42.
Whereas in a CISC you would have
add r1, #42 ;; two operand form
or maybe
add r1, r1, #42 ;; three operand form
When you need a zero, you just use the immediate operand zero, and thus you don't need to pick some register to clear.
In summary, zero registers in RISC-like instruction set architectures effectively provide a literal zero that can be used wherever a register is required, which helps because only register operands can be used in many instructions.
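Concretely, on a RISC with a hardwired zero register (RISC-V shown here, GNU-style syntax), that literal zero drops in wherever a register operand is required:
mv   t0, zero       # t0 = 0 (expands to addi t0, zero, 0)
sw   zero, 0(a0)    # store a zero to memory without burning a scratch register
beq  a1, zero, done # branch if a1 == 0, again using the zero register as a plain operand
done: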
That's a good point. But all of the x86 SIMD stuff is register/register, and we don't have xmm0/ymm0/zmm0 being 0 like we'd expect on a load/store-style RISC architecture.
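The same xor trick is what gets used there instead - zeroing a vector register with itself is a recognized idiom too, something like (AVX, Intel syntax):
vpxor  xmm0, xmm0, xmm0   ; zero xmm0; the VEX encoding also zeroes the upper lanes of ymm0/zmm0
vxorps xmm1, xmm1, xmm1   ; same idea with the floating-point flavor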
Yeah, something like "mov ax, 0h" but I suppose that is way more memory intensive as you have to load a 0 into memory somewhere and then copy it into the register.
It strikes me as somehow the compiler is making assumptions that aren't being enforced by the ... OS? Language? Not sure what, but it's assuming functions restore the registers they use, and that isn't enforced by anything. From my (long ago) time there was PUSHA and POPA, but I assume those take quite a bit of "oomph" and are avoided if possible.
> It strikes me as somehow the compiler is making assumptions that aren't being enforced by the ... OS? Language?
One case of this problem was in a handwritten assembly file. The other was a compiler bug.
This is a case where the ABI requires that if you use a certain register you must save its previous value and restore it afterwards; the two independent bugs were cases of forgetting to look after a certain register.
An ABI is simply an agreement as to how things should work: what registers you are free to clobber, which ones you must look after when you use them, how certain data must be laid out in memory, etc. ABIs are typically language specific, though there may be a lot of commonality at the very high level (i.e. how you use sections in an ELF file) and at the very low level (anybody using unboxed integers will probably do the same thing).
You are welcome to violate the ABI as you see fit in your own code. The OS doesn't care; it has its own constraints (how to make a system call, how to pass arguments to each -- though cf. above when I talked about ints). So, say, a Lisp compiler can lay out stack frames differently from a C++ compiler because of the languages' different semantics, but if your Lisp program wants to call a library written in C++ it must make sure memory at the call site follows the C++ ABI, because that's what the C++ compiler will have assumed.
Both bugs were programming errors in assembly language files. One was inline assembly that was missing entries from a clobber list; the other was an assembly function that lacked invocations of the macros that were supposed to be used to preserve/restore the registers. There was no compiler bug.
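As a rough sketch of what the preserve/restore obligation looks like (x86-64 System V style, hypothetical function; the actual platform and macros in the story may differ):
my_func:              ; hypothetical function that wants to use rbx
    push rbx          ; rbx is callee-saved under this ABI, so stash the caller's value first
    xor  ebx, ebx     ; now it is safe to clobber rbx
    ; ... do work that uses rbx ...
    pop  rbx          ; restore the caller's value before returning
    ret               ; rax, rcx, rdx, etc. are caller-saved and need no such care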
It seems to me this is something that could be found by some kind of valgrind-like tool - it'd be much slower than normal code, but it could report "ABI exception detected" or something.
Interesting conclusion given that I found two functions and a (presumed) third-party driver/what-not that were violating the ABI. One of these was causing crashes, and the other one was going to. The crashes went on for over a year and a half, so, ...
The compiler is making assumptions (which it is supposed to make) but nobody is enforcing the assumptions. The only player who could reasonably enforce the assumptions would be the compiler, in a special checking mode. I am not aware of a compiler that does this. Pity.
Because you’d either need a special “always zero” register (some chips have this), or a mnemonic for some or all of the instructions that assume zero as an operand (some chips have this), or wipe a register (this is the problem here - it uses a register), or use memory.
Adding an implicit zero may make sense for some instructions but probably not all.
Look at the format of the 68000's MOVEQ instruction. The zero is part of the instruction, and does not take an extra four bytes to hold it. There's no memory that's used (other than the instruction itself), no extra memory to hold the argument, and no "always zero" register.
MOVEQ can move more than a zero. It can move any small number (-128 to 127), so 0 is not "special" here.
Also check out the CLR instruction (though that may be what you meant by "a mnemonic for some or all of the instructions that assume zero").
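Roughly, in 68000 assembly:
moveq  #0, d0     ; single 16-bit opcode; the literal (-128..127, sign-extended to 32 bits) lives in the instruction word
clr.l  d1         ; dedicated clear instruction, no immediate needed at all
moveq  #42, d2    ; same trick for any other small constant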
Where it's not optimized away, getting "an actual zero" requires a memory operation of some kind. Register ops are faster in that they are right there, no fetch needed.
Depends on the instruction set architecture. The 68000 had some ways of burying a small literal operand in the instruction. If I recall correctly, MOVEQ.L would let you move zero to a register without touching memory (other than the instruction fetch), and it wasn't a long instruction.
However, moving a zero to a register does take time. Time that would otherwise be used operating with the zero value already present in the zero register.
The second best is what moto did.
As you point out, there is the instruction fetch, which could be the intended operation, rather than developing the zero itself.
On par with that is having enough registers to just hold a zero, and whether that made sense depended on the need and developer strategy.
I am a big fan of the moto CPU's, starting with the 6809. Just to be clear.
But moving it from the zero register to another register would also take time. If what you want is a zero in a register other than the zero register (say, one that is going to serve as the index of a loop, which the zero register cannot do), then MOVEQ should not take any longer than a MOVE from the zero register to another register.
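Side by side, both cost one short instruction (68000 on one line, a zero-register RISC like RISC-V on the other, just for comparison):
moveq #0, d1      ; 68000: immediate zero encoded in the opcode
mv    t1, zero    ; RISC-V: one instruction to copy the hardwired zero into t1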
Say we are zeroing memory. No advantage there. Coupla cycles right at the start, then a ton of writes.
Say we are forming a bitmask. Could be an advantage there in that having a zero handy in a register means no fetching one. When a lot of dynamically created masks are needed, this can be a nice gain.
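A rough sketch of both cases (RISC-V, register choices arbitrary):
# zeroing memory: the zero register mainly saves the one-time setup
sw   zero, 0(a0)    # store 0 directly, no scratch register dedicated to holding it
# building a dynamic mask: a ready-made zero (or its complement) avoids reloading constants in a loop
not  t1, zero       # t1 = all ones (xori t1, zero, -1)
sll  t1, t1, a2     # shift left by a runtime amount: low a2 bits become 0
not  t1, t1         # invert: mask with the low a2 bits set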
I'm sure we can come up with more. It's not always important, and like you mention with the moto designs, may not matter too much due to many other optimizations possible given a good instruction set.
Some people would rather have the register free for general use! I'm one of those, but if there is a zero register, I use it to get the benefit of it when I can. On the devices I've seen, there are generally a lot of registers so the marginal impact of having a zero register isn't significant. There are plenty to work with.
Maybe I should be clear here too. I personally don't care whether there is one. If it's there, I do things in ways that leverage it, and was just pointing out why devices that have one, ahem... have one! Those that don't may or may not have options that make sense. The way moto did it is very good, and there are other pretty great optimizations possible with their ISA, abusing the stack to write memory, etc...
If not, then I do other things. It's assembly language! Work the chip, right?