
One of my best hacks back in the day was an 8080 emulator for the 8088/8086, from a time when what you really wanted was to run 8080 code as fast as possible (the first PCs were slow, and emulated 8080 code ran much slower than 8080 code on a native machine).

At the time, 8080 emulators invariably had a dispatch loop: fetch the next opcode, then dispatch through a jump table to code emulating each of the 256 opcodes. One problem with this was that you couldn't dispatch this way without altering the CPU flags, so you'd also have to save and restore those with LAHF/SAHF (fast) or PUSHF/POPF (slow). In general you'd need about ten 8086 instructions to emulate one 8080 instruction, and in the early 1980s this meant your emulated CP/M program would run much slower than it did on your old CP/M computer.
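For comparison, a conventional fetch-and-dispatch loop of the kind described above can be sketched in Python. This is a minimal illustration, not period code: it models only three opcodes (0x3E MVI A,imm, 0x94 SUB H and 0x76 HLT), ignores flags entirely, and the names `Cpu`, `run`, `DISPATCH` and the handler functions are hypothetical.

```python
class Cpu:
    """Minimal 8080 state for the sketch: registers A and H, a PC and 64K of memory."""
    def __init__(self, program: bytes):
        self.a = 0
        self.h = 0
        self.pc = 0
        self.mem = bytearray(0x10000)
        self.mem[:len(program)] = program
        self.halted = False

def op_mvi_a(cpu):
    # MVI A,imm: the immediate byte follows the opcode
    cpu.a = cpu.mem[cpu.pc]
    cpu.pc = (cpu.pc + 1) & 0xFFFF

def op_sub_h(cpu):
    # SUB H: A = A - H, wrapped to 8 bits (flags not modelled)
    cpu.a = (cpu.a - cpu.h) & 0xFF

def op_hlt(cpu):
    cpu.halted = True

def op_unimplemented(cpu):
    raise NotImplementedError(f"opcode {cpu.mem[(cpu.pc - 1) & 0xFFFF]:#04x}")

# The 256-entry jump table, indexed by opcode
DISPATCH = [op_unimplemented] * 256
DISPATCH[0x3E] = op_mvi_a
DISPATCH[0x94] = op_sub_h
DISPATCH[0x76] = op_hlt

def run(cpu):
    # The central dispatch loop: fetch, advance PC, jump through the table
    while not cpu.halted:
        opcode = cpu.mem[cpu.pc]
        cpu.pc = (cpu.pc + 1) & 0xFFFF
        DISPATCH[opcode](cpu)
    return cpu
```

Every emulated instruction pays for the loop test, the fetch, the table lookup and the indirect call on top of its own work; that per-instruction overhead is what the scheme described next cuts down.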

My emulator would emulate one 8080 opcode with four 8086 instructions, as follows. An 8080 instruction, say 0x94 = SUB H, would be emulated with code loaded at address 0x9494. That code would be:

  sub bh,dl   ;bh = bh-dl, emulate h with bh, a with dl
  lodsb       ;al = *si++, get next opcode, increment emulated pc
  mov ah,al   ;eg 0x94 -> 0x9494
  jmp ax      ;jump to next instruction
This is a classic example of trading space for time: your emulator is sparsely distributed through 64K of RAM, so you need 128K to emulate your 64K 8080.
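The address trick can be modelled in Python to make the arithmetic visible: copying the opcode into both halves of AX (`mov ah,al`) yields `opcode * 0x101`, so each of the 256 routines gets its own address and every routine ends by computing where to jump next, with no central loop. This is a hypothetical sketch: `Regs`, `next_op`, `CODE` and `thread` are illustrative names, the subtraction is written as A = A - H, and the Python driver loop merely stands in for the chain of `jmp ax` instructions.

```python
from dataclasses import dataclass, field

@dataclass
class Regs:
    """8086 state used by the scheme: si = emulated PC, dl = A, bh = H."""
    si: int = 0
    dl: int = 0
    bh: int = 0
    mem: bytearray = field(default_factory=lambda: bytearray(0x10000))
    halted: bool = False

def next_op(r: Regs) -> int:
    # The common suffix: lodsb; mov ah,al; jmp ax
    al = r.mem[r.si]               # lodsb: al = *si ...
    r.si = (r.si + 1) & 0xFFFF     # ... and si++
    ah = al                        # mov ah,al
    return (ah << 8) | al          # ax = opcode * 0x101, the jmp ax target

def sub_h(r: Regs) -> int:
    r.dl = (r.dl - r.bh) & 0xFF    # sub dl,bh  (A = A - H)
    return next_op(r)

def mvi_a(r: Regs) -> int:
    al = r.mem[r.si]               # lodsb: al = immediate byte
    r.si = (r.si + 1) & 0xFFFF
    r.dl = al                      # mov dl,al  (A = imm)
    return next_op(r)

def hlt(r: Regs) -> int:
    r.halted = True
    return -1

# The "code segment": the routine for opcode N lives at address N * 0x0101
CODE = {op * 0x0101: fn for op, fn in ((0x94, sub_h), (0x3E, mvi_a), (0x76, hlt))}

def thread(r: Regs) -> Regs:
    addr = next_op(r)              # prime the chain with the first opcode
    while not r.halted:
        addr = CODE[addr](r)       # each routine returns its own jump target
    return r
```

Because consecutive routines start 257 bytes apart, each has roughly 257 bytes of room, which is why the emulator ends up spread through a whole 64K code segment.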


Shower thought the next day: OMG, I got the subtraction the wrong way round! It should be (of course):

  sub dl,bh ;dl = dl-bh, emulate A with dl, H with bh
Not all opcodes are one-to-one like this one. For example, 0x3E = MVI A,imm (move an immediate value to the A register) would be:

  lodsb     ;al = *si++, get the immediate value
  mov dl,al ;dl = al, A = imm
This time I've omitted the three-instruction postscript (lodsb; mov ah,al; jmp ax) common to all opcodes.
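Part of what lets the corrected `sub dl,bh` stand alone for SUB H is that the 8086 arithmetic flags line up with the 8080's: after a subtraction both CPUs set sign, zero, even parity and carry-as-borrow the same way, and the low byte of the 8086 FLAGS register (what LAHF copies into AH) uses the same bit positions as the 8080 flag byte. The Python sketch below computes those four flags for an 8-bit SUB and packs them in that shared layout. It is illustrative only; `sub8` is a hypothetical name, and the auxiliary-carry flag is deliberately not modelled.

```python
# Bit positions shared by the 8080 flag byte and the low byte of 8086 FLAGS
S, Z, P, CY = 0x80, 0x40, 0x04, 0x01
ALWAYS_ONE = 0x02  # bit 1 reads as 1 on both CPUs

def sub8(a: int, b: int) -> tuple[int, int]:
    """Return (result, flags) for 8-bit A = A - B; auxiliary carry is omitted."""
    result = (a - b) & 0xFF
    flags = ALWAYS_ONE
    if result & 0x80:
        flags |= S                     # sign: copy of bit 7 of the result
    if result == 0:
        flags |= Z                     # zero
    if bin(result).count("1") % 2 == 0:
        flags |= P                     # parity: set when the number of 1 bits is even
    if a < b:
        flags |= CY                    # carry doubles as borrow on subtraction
    return result, flags
```

For example, `sub8(3, 5)` wraps to 0xFE and sets S and CY, matching what both CPUs report for that subtraction.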

While I'm here adding extra information, note another strength of my approach: the si, al and ah registers are the only resources consumed by the emulation machinery. The di, bx, cx and dl registers and the 8086 flags are available to emulate the DE, HL, BC, A and F registers without any memory accesses (the dh register was sadly wasted). Remember, the host CPU was the 16-bit 8088/8086, which also had 'segment' registers to allow access to 20 bits of addressable memory rather than 16 (so 1M, not 64K). The segment registers had a role in emulation, basically setting up one 64K 'code' segment to run the emulator and another shared 64K 'stack' and 'data' segment to emulate the 8080's 64K of memory. Amusingly, getting this emulator going today would involve running another emulator to emulate this first iteration of the now venerable x86 architecture.
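The segment arithmetic behind that layout is simple enough to show directly: in real mode the 8086 forms a 20-bit physical address as segment × 16 + offset, so two segments whose values differ by 0x1000 occupy adjacent, non-overlapping 64K blocks, which accounts for the 128K total. The Python sketch below assumes hypothetical segment values (0x1000 for code, 0x2000 for stack and data); the thread doesn't say which values were actually used.

```python
def physical(segment: int, offset: int) -> int:
    """8086 real-mode address: (segment << 4) + offset, truncated to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF

# Hypothetical layout: CS holds the opcode routines spaced 0x101 apart,
# DS = SS holds the emulated 8080's 64K of memory.
CS = 0x1000
DS = SS = 0x2000

def segment_span(seg: int) -> range:
    # Physical bytes reachable through offsets 0x0000..0xFFFF of one segment
    return range(physical(seg, 0x0000), physical(seg, 0xFFFF) + 1)
```

Since lodsb reads from DS:SI, the emulated 8080 program counter can sit in SI unchanged, with no per-access address translation.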


I was wondering about the registers: why not use the "standard" mapping of BC/DE/HL => CX/DX/BX, and DI for A?

One disadvantage of this would be that you have to "xchg ax,di" before and after every accumulator op, but that is a single-byte opcode (and on the 8088, instruction fetch is the main bottleneck). Just from intuition, it seems like for common code sequences like "MOV D,H ! MOV E,L", having the byte registers directly addressable would be a win. But maybe you profiled how common the various instructions are in real code?
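To make the trade-off in the question concrete: with DE held in DI, which has no 8-bit halves, `MOV D,H` becomes a splice into a 16-bit value, whereas with DE in DX it is a single byte move (`mov dh,bh`). The Python helpers below are a hypothetical illustration of that byte-level work, not code from either emulator.

```python
def high(word: int) -> int:
    """High byte of a 16-bit register pair (H of HL, D of DE)."""
    return (word >> 8) & 0xFF

def low(word: int) -> int:
    """Low byte of a 16-bit register pair (L of HL, E of DE)."""
    return word & 0xFF

def mov_d_h(de: int, hl: int) -> int:
    # 8080 MOV D,H: D = H, E unchanged. With DE in DX and HL in BX this is
    # one `mov dh,bh`; with DE in DI the high byte must be spliced in, which
    # costs extra 8086 instructions.
    return (high(hl) << 8) | low(de)

def mov_e_l(de: int, hl: int) -> int:
    # 8080 MOV E,L: E = L, D unchanged (`mov dl,bl` if DE lived in DX)
    return (high(de) << 8) | low(hl)
```

Run back to back, the two moves copy HL into DE.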

>Amusingly, getting this emulator going today would involve running another emulator to emulate this first iteration of the now venerable x86 architecture.

Not necessarily. Modern x86 CPUs running in 64-bit mode still support 16-bit code/data segments (the only thing not supported is a 64-bit OS + V86 mode, and you don't need that for 8080 emulation). At least on Linux, this is made available to userspace via the modify_ldt syscall.

Of course, when you have 16-bit code and a CP/M emulator, you might as well also make it run CP/M-86 programs (most of which do not use the kind of segment arithmetic that would require V86 mode). Then the 8080 CPU emulator would be just another code segment, and probably the easier part to write, compared to the code necessary to emulate the CP/M filesystem on Linux.


Good question and extra information, thanks. I wasn't aware of 16 bit code/data segments on 64 bit x86 CPUs.

I don't think I seriously considered using di for A (I've sort of started using a convention of lower case = 8086 registers, upper case = 8080 registers, so I may as well stick to it), simply because of the many cases like the two I've used as examples, both of which are really simplified by having an 8086 8-bit register for A. As an 8080/Z80 programmer learning 8086 at the time, it took me a while to realise that al being off-limits for register A (because of lodsb) wasn't really a big problem, because any of the 8086 8-bit registers was as expressive as the 8080 accumulator. At that point I stopped worrying about using the standard 8080 -> 8086 register mappings. It's more than possible that I was seduced by apparent simplicity, and that actual runtime performance could have been tweaked a little higher on average by something with a little more superficial complexity. I think I was in a hurry, looking to get my 8-bit tools running on my new box over a weekend or similar (I can't really remember exactly).

It was such a long time ago that I've forgotten a lot. Only after I wrote the "shower thought" post above did I remember that, as well as dh, I had bp in reserve (basically I'd forgotten about bp). I suppose the extra resource (bp) means even a normal dispatch-loop emulator could be written without needing to emulate any 8080 register with memory, contrary to what I wrote above.

One optimisation I think I did make was to emulate MVI A,imm and other instructions with a single-byte immediate using lodsw rather than lodsb, effectively prefetching the next opcode. I had the 256 opcode routines packed together initially; at runtime I'd distribute them to their ultimate 0x0000, 0x0101, 0x0202 ... locations and append the lodsb; mov ah,al; jmp ax suffix. So there was no problem if the code for MVI A,imm was lodsw; mov dl,al; mov al,ah; jmp ax. The appended suffix would still be there, it would just never execute.
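The relocation step can be sketched in Python. This assumes the standard 8086 encodings for the instructions named in the thread (`lodsb` = AC, `lodsw` = AD, `mov ah,al` = 8A E0, `mov al,ah` = 8A C4, `mov dl,al` = 8A D0, `sub dl,bh` = 2A D7, `jmp ax` = FF E0); the `relocate` function and `PACKED` table are hypothetical. Each slot starts at `opcode * 0x101`, so a routine plus suffix may use at most 257 bytes, except the slot for 0xFF at 0xFFFF, which has only one byte before the end of the 64K segment. The thread doesn't say how that case was handled, so the sketch simply rejects anything that doesn't fit.

```python
SUFFIX = bytes([0xAC, 0x8A, 0xE0, 0xFF, 0xE0])   # lodsb; mov ah,al; jmp ax
SLOT = 0x0101                                     # spacing between routines
SEGMENT = 0x10000                                 # one 64K code segment

# Packed routine bodies keyed by 8080 opcode, without the common suffix
PACKED = {
    0x94: bytes([0x2A, 0xD7]),                    # SUB H: sub dl,bh
    # MVI A,imm with lodsw prefetch: lodsw; mov dl,al; mov al,ah; jmp ax
    0x3E: bytes([0xAD, 0x8A, 0xD0, 0x8A, 0xC4, 0xFF, 0xE0]),
}

def relocate(packed: dict[int, bytes]) -> bytearray:
    """Copy each body to opcode * 0x101 and append the shared suffix."""
    code = bytearray(SEGMENT)
    for opcode, body in packed.items():
        start = opcode * SLOT
        routine = body + SUFFIX
        limit = min(start + SLOT, SEGMENT)        # next slot or segment end
        if start + len(routine) > limit:
            raise ValueError(f"routine for {opcode:#04x} does not fit its slot")
        code[start:start + len(routine)] = routine
    return code
```

In the lodsw variant for 0x3E the body already ends in its own `jmp ax`, so the appended suffix is present but unreachable, as described above.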

My first implementation of this emulator took the form of a binary prefix that could be prepended to any CP/M-80 .com file with the operating system's concatenate command (PIP?) to make a CP/M-86 .cmd file. Later I adapted it to MS-DOS with a similar deployment strategy; not a huge job, because the basic system calls I needed were close cousins of their MS-DOS equivalents. As you say, quite different for Linux.

In general, the window of widespread usefulness for this tool was limited: basically the period when CP/M-80 software was still competitive and PCs were not yet at least an order of magnitude faster than their CP/M-80 predecessors (after that point conventional emulators were fine). I never took the chance to try to share my code with the world; that was harder then. So sadly it never helped anyone but me.

Love your rep_lodsb handle by the way.


How would full translation work? Are the opcodes close? Could you do a just-in-time version?


Yes, pretty close, so full translation is easy if you have the source code. If not, emulation is the normal way. My emulator was faster than native 8080 code on everything but vanilla 8088 PCs, so it was more than sufficient.

I address translation in depth in my Retro Sargon project, an exercise in getting a vintage Z80 (8080 superset) assembly language program working on modern PCs. https://github.com/billforsternz/retro-sargon See the "Yet More Details" section in the README.
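As a rough illustration of why source-level translation is straightforward, here is a hypothetical Python sketch that rewrites a few 8080 mnemonics into 8086 assembly using the register mapping from the emulator described earlier in the thread (A→dl, B→ch, C→cl, H→bh, L→bl). It handles only MOV between those registers, MVI and SUB, rejects everything else, and makes no claim about how Retro Sargon actually performs its translation.

```python
import re

# 8080 register -> 8086 byte register, following the emulator's mapping.
# D and E are left out because DE lives in di, which has no byte halves.
REG = {"A": "dl", "B": "ch", "C": "cl", "H": "bh", "L": "bl"}

def translate(line: str) -> str:
    """Translate one 8080 instruction (a tiny subset) to 8086 assembly."""
    m = re.fullmatch(r"\s*(\w+)\s*(.*?)\s*", line.upper())
    if not m:
        raise ValueError(f"cannot parse: {line!r}")
    op = m.group(1)
    args = [a.strip() for a in m.group(2).split(",") if a.strip()]
    if op == "MOV" and len(args) == 2 and all(a in REG for a in args):
        return f"mov {REG[args[0]]},{REG[args[1]]}"
    if op == "MVI" and len(args) == 2 and args[0] in REG:
        return f"mov {REG[args[0]]},{args[1]}"
    if op == "SUB" and len(args) == 1 and args[0] in REG:
        return f"sub dl,{REG[args[0]]}"           # A is always the destination
    raise ValueError(f"unsupported: {line!r}")
```

Usage: `translate("SUB H")` gives `sub dl,bh`, the corrected form from earlier in the thread.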


I think the 8086 was designed such that 8080 assembly code could be assembled as 8086 machine code. So I'd guess the instruction emulation would probably be reasonably efficient.


That’s genius.


Thank you! I really appreciate that.



