I know this article is about bootloaders, but more generally there are two ways I've tackled x86 as a total noob: compile a simple language down to ~5 instructions [0, 1] and, separately, write an emulator for ~5 instructions that can run basic C programs [2, 3].
I think it's fairly common to recommend that beginners write a compiler, but less common to recommend trying to emulate parts of x86. I think it's a particularly easy way to get started, just because it's the architecture you already know, or because it's the architecture that all compiler tutorials use (if they don't use LLVM). And if you are a programmer you probably already have gcc, gdb, and objdump on your system ready to help you out.
Doing both compilers _and_ emulators really helped my understanding of x86 and C (even if I wasn't writing C).
My background is in web development and my reason for doing these projects/writing is purely educational.
I sometimes wonder whether assembly code would even be considered "scary" today if IBM had picked the Motorola 68000 instead of the Intel 8088 for their PC.
The x86 instruction set was a cobbled-together mess from day one, while 68k assembly coding was pure joy because of its elegant and consistent instruction set.
It's "scary" on almost every architecture because there's no fault tolerance, recovery, exception handling, or reporting. If you make a mistake you get to reassemble the smashed plate of your memory image in order to work out what happened - assuming you can get at it at all. Some platforms will just reboot on you.
Hmm, I don't agree: stepping through assembly code in most (C/C++) debuggers works just as well as with high-level languages, and process isolation in operating systems applies to assembly code too, so it's just as unlikely to crash the whole computer with assembly as with a high-level language (otherwise we'd have a massive problem if operating system security depended on banning low-level programming).
Bare metal embedded coding is a different topic of course, but regular application development with assembly works just fine.
Inline assembly is certainly the least scary case and may be the easiest to get started with. But the original article targets the 16-bit DOS-era platform as a bootloader, where your code _is_ the operating system and you have no help.
Sure, that's the case today, but in the 80's and early 90's it wasn't all that uncommon to write big applications completely in assembly code. With a proper macro assembler and IDE-like coding environment that wasn't as bad as it sounds today. The focus has just shifted away from assembly programming to high-level languages, and the tools moved along. At least reading assembly code is still important today though.
> The game was developed in a small village near Dunblane over the course of two years.[2][5] Sawyer wrote 99% of the code for RollerCoaster Tycoon in x86 assembly language, with the remaining one percent written in C.[3]
Even into the late 90s 2D PC games were being written with large chunks of assembler. 3D APIs killed that. I'm sure there were huge slabs of machine code in console games until the early 2000s, especially systems that needed a lot of specialized code to fully exploit them, e.g. PlayStation 2.
> 68k assembly coding was pure joy because of its elegant and consistent instruction set
That's maybe a little spun. The separated address/data registers (there are two kinds of registers, and memory addressing has to go through the address registers while arithmetic wants the data registers) played hell with optimizer strategies for years.
In fact 68k compiler output was significantly sub-par pretty much throughout its lifetime. The "cobbled-together mess" had (post-386 anyway) a significantly more orthogonal instruction set and was just plain easier to optimize, even for humans.
The 68k was certainly great fun after 8bit systems, and the x86 was almost just more of the same.
ARM32 is also nice, apart from some corner cases.
I think everything has corner cases, apart from x86 where the whole thing is an ugly mess.
Yes. But what you cannot do is a load with the sum of any two GPRs. They need to be in the right partitions. That makes register assignment a huge pain for the optimizer, and historically hurt the architecture.
That's the kind of complexity that really hurts software. Compare that with the commonly cited x86 nonsense (the REP prefix, say), which complicates silicon implementations but generally makes software easier to write (cf. decades of optimized inline memcpy implementations).
The point being the 68k was a dead end in a different direction. It was a "clean" architecture from the perspective of a 1970s assembly programmer, but not from that of a late-80s compiler writer.
I don't want to be too blunt, but have you looked at early 90's era 68k compiler output? It was crap. When Sun launched SPARC, like half the advantage of the platform was that the compiler was suddenly generating this amazingly clean code. The phantom spills and intra-GPR movs everyone was used to disappeared overnight.
As an owner of the original 1984 Mac, I bought a copy of the classic Lance Leventhal 68000 assembly book. I never got much into actually coding in it, but I remember how "this makes sense, mostly" the book felt.
And likely far more by now, had it become the PC instruction set architecture. The x86 instruction set was also simpler back in the late 70s, in the 8086 era when the Motorola 68000 was first released, than it is now.
Back then it only had the one register width (16-bit, e.g. "ax"), whereas now it has the 32-bit registers (e.g. "eax") and the 64-bit registers (e.g. "rax"). It also now has SIMD extensions (SSE/AVX), virtualization support, and other technologies. Back then it had just one operating mode (real mode), whereas now it has protected mode, long mode, system management mode, and a few other intermediate modes (e.g. "unreal mode").
So a lot of the complexity that x86 has now was introduced after that decision was made. It's definitely conceivable that the 68000 line would have developed similarly had it been chosen instead of x86 for the PC.
8086 let you address the upper and lower halves of the 16-bit registers as well, so don't trick yourself into thinking everything could only be treated as 16-bit words.
Totally agree. I learnt a bit of 6502/6809 in my early teens and into college/uni. I couldn't be bothered with the x86 - a complete pita, although to be fair that may have been the not-so-great manuals I had at the time.
But I started my first job disassembling 68000 - very easy to work with.
Ideally, IBM would have created and used their own microprocessor based on the System/370 architecture. But no - for them the PC was a glorified typewriter: even the PS/2 (based on the 80286) was mostly positioned for use merely as a "smart" terminal for mainframes. So today the entire world basically runs on faster "typewriters" (sometimes enhanced to include "windows" - even on the server side).
Yeah, X64 is only used by Windows, that's like ~80% of the desktop computing market, plus nearly every server and cloud instance out there, so not much at all. /s
Why do some users assume that the whole world revolves around Apple's iOS/M1 Mac ecosystem as if it exists in a vacuum?
Because well over 90% of shipped computing devices are ARM based.
Desktop computers are a declining and relatively small market. Your home, car and office are full of ARM devices. Potentially hundreds.
x86 lives in the data centre (Linux, not Windows, so contrary to grandparent post, but whatever) and in some desktop systems. Not the majority of systems.
ARM assembly is easy to learn, because of its relatively small and orthogonal instruction set. However, load store architectures are annoying to program in, since you're constantly having to juggle memory. The lack of a convenient way to spill registers is also really really annoying! (you can only spill and load registers two at a time)
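To illustrate the two-at-a-time point (assuming AArch64 is meant here, where STP/LDP are the pair store/load instructions), a typical prologue/epilogue spill looks roughly like this - a hand-written sketch, not from any particular codebase:

    func:
        stp x29, x30, [sp, #-32]!   // push frame pointer + link register, pre-decrementing SP
        stp x19, x20, [sp, #16]     // spill two callee-saved registers in one instruction
        mov x29, sp
        // ... body ...
        ldp x19, x20, [sp, #16]     // reload them, again two at a time
        ldp x29, x30, [sp], #32     // restore FP/LR and pop the frame
        ret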
Except, of course, that RISC-V snippet doesn't actually load 0x7FFFFFFF into a0 because Reasons, it has to be
    LUI  a0, 0x80000      ; a0 = 0x80000000
    ADDI a0, a0, 0xFFF    ; the 12-bit immediate 0xFFF sign-extends to -1, giving 0x7FFFFFFF
"But the assembler has the LI pseudoinstruction so it will properly do this calculation for you!". Right, so much for "nice assembly language": you need an actual smart macroassembler to write it.
Talkin' out of my ass here, but is this an artifact of the parameter having to fit in the same instruction word as the opcode ('cause RISC) and because the register is the size of a word (which is partially used by the opcode now), you can't actually load a whole register with an immediate in one go?
Absolutely, and different RISCs coped with it in their own ways. ARM has 12-bit immediates which it treats as having an 8-bit part and a 4-bit part: the 8-bit part is extended into 32 bits and then rotated right by twice the number in the 4-bit part. MIPS has 16-bit immediates and the 32-bit load is generally done with LUI then ORI (since MIPS zero-extends the immediates, unlike RISC-V which sign-extends them). And RISC-V has 12-bit (for the lower part of the word) and 20-bit (for the upper part) immediates.
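To make that concrete, here's the same constant from upthread (0x7FFFFFFF) built both ways - a rough sketch in GNU-as-style syntax, 32-bit registers assumed:

    # MIPS: LUI sets the upper 16 bits, ORI zero-extends its 16-bit immediate
    lui  $t0, 0x7FFF          # t0 = 0x7FFF0000
    ori  $t0, $t0, 0xFFFF     # t0 = 0x7FFFFFFF

    # RISC-V: ADDI sign-extends its 12-bit immediate, so LUI has to overshoot by one
    lui  t0, 0x80000          # t0 = 0x80000000
    addi t0, t0, -1           # -1 is the 0xFFF bit pattern; t0 = 0x7FFFFFFF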
The funny part is, there is now the "C" extension to RISC-V which introduces 16-bit instructions that are allowed to freely mix with 32-bit instructions — so now those 32-bit instructions can be 16-bit aligned and even be split between two physical memory pages, which kinda kills the whole "but at least fixed-length encoding prevents Spectre-like exploits" argument.
> "But the assembler has the LI pseudoinstruction so it will properly do this calculation for you!". Right, so much for "nice assembly language": you need an actual smart macroassembler to write it.
In x86 the mnemonic "MOV" can be translated into instructions with different opcodes according to the addressing mode, immediate value size, or target register.
Most x86 instructions have a similar issue, while RISC-V macroassemblers are pretty simple.
Therefore, the x86 assembler must actually contain much more intelligence than the RISC-V assembler to make it look "simple" and "nice".
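For example (NASM, 32-bit; the exact bytes can differ where an assembler has more than one legal encoding to choose from):

    mov eax, 1            ; B8 01 00 00 00   (B8+rd, imm32)
    mov eax, ebx          ; 89 D8            (89 /r, register to register)
    mov eax, [ebx]        ; 8B 03            (8B /r, memory to register)
    mov byte [ebx], 1     ; C6 03 01         (C6 /0, imm8 to memory)

One mnemonic, four different opcodes - the assembler picks the encoding, which is exactly the "intelligence" being described above.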
Well, sure, it's awkward but also consistent. So once you've learned the rules/pattern for this kind of thing, you can read/write code without having to look up a whack of different instructions and modes and register sets, which x86 is notorious for.
Assembly isn't supposed to be convenient and expressive to write; for that we have high-level languages. But some consistency makes for less error-prone code and easier analysis.
x86 is also consistent, just in a different way. There are basic moves and arithmetic, basic branching, and basic stack-related stuff. Next, there are extensions: a string-manipulating extension (STOS/LODS/etc.), a multiplication/division extension (MUL/DIV with their idiosyncratic use of DX:AX), a floating-point extension, control-register extensions (tons of those), vectorized extensions, etc. Inside any one set of instructions, things are pretty consistent. It's just that Intel's Software Developer's Manual is not structured this way; it lumps all of those instructions together.
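The DX:AX convention is a good example of that within-extension consistency; a quick 16-bit sketch (NASM syntax, values arbitrary):

    mov ax, 1234h
    mov bx, 10h
    mul bx            ; unsigned 16x16 multiply, 32-bit result in DX:AX
    div bx            ; DX:AX / BX -> quotient in AX, remainder in DX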
But the RISC-V specification is explicitly structured around describing several basic cores and the extensions to those, so it looks like it's all very unified and consistent: and indeed, it mostly is, since it was developed mostly in one continuous effort with consistency in mind. But there are still some inconsistencies between how things are done in different extensions, for example the "C" extension uses zero-extended immediates in half of its instructions, unlike the rest of the ISA and the other half of this very extension, for pragmatic reasons: nobody would want negative offsets in those shortened instructions, so they are unsigned.
To be clear, that happens simply because ADDI sign-extends its immediate argument. This saves the need for separate immediate opcodes, and is reasonably consistent with the use of sign-extension elsewhere in the ISA.
Super dumb question (haven't taken a microprocessors class in over a decade):
How hard would it really be to custom-build a chip that was really simple, but modern? I'm thinking like the 6502 in the Commodore, but much faster. Or is the complexity in x86 an inherent property of modern performance? I guess what I'm getting at is: could you build something that is actually pretty darn fast if you don't need to run a modern OS like Windows or Linux on it, but keep it drastically simple?
I've been thinking it might be neat to have a blazing fast NUC sized computer that just boots into some barebones forth sitting on top of a few assembly words. Maybe with just enough peripherals to do some actual work (like load from SD card).
There's inherent complexity in the x86 instruction decoder, but after decoding, modern x86 CPUs don't particularly resemble legacy x86 CPUs internally. The native machine code is all proprietary "micro ops".
A lot of the performance gains over the last couple of decades haven't come from machine-code changes, but rather from various forms of pipelining and superscalar execution. Rather than run one instruction at a time, CPUs keep hundreds of instructions in flight at once. The complexity of doing that is that many instructions depend on the results of instructions immediately before them, so the CPU needs a lot of shortcut paths internally to keep from stalling out.
You could get to GHz speeds with a custom ASIC, and you could use base RISC-V as a modern, not-crufty ISA. It still won't be anywhere near as fast as a high-end x86 or ARM CPU, though, unless it's similarly pipelined and superscalar.
Yes, but specifically only for a general purpose CPU.
You could make a CPU optimized specifically for Forth or whatever, and likely achieve better performance for the number of transistors than otherwise. For example, if virtual memory isn't helpful, then you can omit that whole subsystem.
It's called an FPGA: a software-defined microprocessor (most higher-tier processors can be reprogrammed to a certain extent through microcode, but an FPGA is a specialised chip designed specifically for this). You can get ridiculously high performance for your specific application, but it will never excel at general-purpose computing. Good enough for Forth, great for ML, DSP, and number crunching, bad for running random apps from GitHub.
I've used FPGAs before (also ages ago). I guess I could create a simple chip on an FPGA and run a Forth on that, but I was thinking something more physical.
But would you not need to use an FPGA anyway for experimentation and prototyping? If you want something more physical, fast and small you need a spot at an actual fab. There is no other alternative. You might also want to take a look at the "Minimal Fab" technology. I found out about it a year or so ago and it looked really interesting!
I find it kind of funny that the majority of FPGA uses are indeed to implement a simple CPU (basically a programmable state machine), and the problem at hand is then solved by writing a program for it.
The convergence of RISC & CISC has shown that both things are true. A really simple ISA with a modern design can be pretty darn fast. At the same time, additional ISA complexity helps you wring the utmost out of your hardware, and IMO also helps in the bazaar environment - macro ops let a menagerie of hardware do work in the way most efficient for itself without hardware-specific binaries.
Under the covers, however, the hardware complexity is an inherent property of modern performance. For example, branch prediction.
I feel like X64 doesn't get enough love these days. I think new tutorials should be X64-first, and then after you get the hang of it talk about X32. It's harder the other way around, especially when it comes to calling conventions.
On the contrary, I felt that understanding where x86 had come from helped me understand the reason why the registers were named what they were, why some instructions are longer than others, etc.
It's confusing as a beginner to encounter the "di" and "dx" registers, and to understand why the 8-bit version of "r13" is "r13b" but the 8-bit version of "rdx" is either "dh" or "dl".
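For concreteness, this is how the aliases line up in 64-bit mode (NASM syntax; the value is arbitrary):

    mov rdx, 0x1122334455667788
    mov eax, edx      ; low 32 bits: 0x55667788 (writing a 32-bit reg also zeroes the upper half of rax)
    mov ax,  dx       ; low 16 bits: 0x7788
    mov al,  dl       ; low 8 bits:  0x88
    mov ah,  dh       ; bits 8-15:   0x77
    mov r13, rdx
    mov al,  r13b     ; the "new" registers r8-r15 only get a low-byte alias; there is no "r13h"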
Wouldn't it be better if you knew about all the current registers first and then learned about the older ones? Like most people, I learned X32 first, and I feel like I still read X64 relative to X32.
I suppose it depends how you learn! For some it is likely better to do what you suggested and start with what's current and then work backwards - but for me I needed to know why certain limitations were in place, and that required me to start at the beginning and work forwards.
As a newer developer this is a problem I've had with computing in general - a lot of newer languages, web frameworks, etc. are solving problems that I only understand through seeing what came before me. For example - why should I care about memory ownership if I've never tried to write a large-scale C program?
New tutorials? Don't they all date from when 32 bit registers were new, 16 bit registers mainstream, and MMX and x87 the new complicated instruction set additions?
It would certainly be a smaller tutorial, but the historical perspective might be valuable for understanding the cruft layers of design decisions that made sense decades ago and addressed problems that have ceased to exist.
I started with x86, after that x64 felt simply like a different flavor. There was more Googling to do, but I did that a ton already for x86, so I was used to the type of game I was playing.
You also need to define the system when talking about assembly; I think this is more relevant than the difference between architecture variants. I would recommend Linux because of the available documentation. Without doing any system-specific syscalls and learning about calling conventions, you won't get very far.
I think it depends on your goals. I am with you for writing assembly. For reading compiler-generated code, however, I think Windows is ideal as a starting point because you don't often get spoiled by the availability of source code and debugging symbols.
Umm, did anyone read the reference manual for x86_64? Intel's is big enough to kill a goat (~2000 pages). Add AMD's on top of that and you can kill a small tribe of goats.
There are not that many compiler-generated instructions, for either X32 or X64, if you look at it as a percentage of code. You have to look up the manual for rare instructions no matter what; you hardly need to read more than maybe a dozen pages to understand the common operations and control flow. My argument is that if you start with the current arch, that will be your frame of reference instead of X32 (why not X16, if earlier is better?).
I think it's much easier to begin with x86. Especially when the assembled code is being run the way it is on this website. x64 is a little harder because just the switch to x64 mode alone almost fills the boot sector.
And then you lose the ability to temporarily switch back to real mode to call BIOS functions, so you'd better load the rest of your program before switching to x64.
x86's purely stack-oriented C ABI calling convention makes things far less tedious. Sure, x86_64 executes faster with the register passing, but what a miserable thing to read/write as a human.
Maybe this is just a case of having learned and written 32-bit x86 assembler for years in my youth, but I strongly prefer it to x86_64.
C's stack-oriented ABI calling conventions also started with the PDP-11, for which the first portable C compiler was written, and the convention then tagged along to nearly every other architecture as pcc was progressively ported to other platforms.
If a function return value could fit into %r0 on the PDP-11 (%ax/%eax/%rax on x86), it would be returned in there. %r1-%r4 would be used to pass function parameters in, if they could fit, and/or spill over into the stack.
Heck, even UNIX system call conventions on x86 can be traced back to PDP-11, i.e. the syscall number is passed in %r0 (%eax) followed by a TRAP (INT on x86) instruction (can't remember which TRAP number, though).
But would you still prefer X32 if you had started with X64? My argument is that some would not find it tedious if that was the first thing they learned. Similar to how a Python programmer would share your sentiment about C because they learned Python first. I learned C first, so while I accept that it is more verbose and more work in general, I enjoy writing C more.
I think so. It's nice to have more registers and all, but the consistency of an entirely stack-oriented calling convention is simply more elegant and ergonomic IMHO. The register passing overflowing into stack-oriented at some arbitrary limit is just hideously warty, and obnoxious if you're actually writing assembly with C ABI function calling.
Changing the signature of a function requires rearranging which registers are being populated, and if there are enough parameters some go on the stack, and if you've rearranged the order, now you're moving some from the stack back into registers and vice versa. At least when everything always went through the stack, you just rearranged their positions on the stack. x86_64 will always be more annoying in this regard; it's not an ABI decision made with humans in mind at all. The assumption is (rightfully) that compilers are doing this work, and the perf win is significant.
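For anyone who hasn't written both by hand, here's the same call under each convention - a sketch, with a made-up function int add3(int a, int b, int c):

    ; 32-bit cdecl: everything goes through the stack
    push dword 3          ; arguments pushed right to left
    push dword 2
    push dword 1
    call add3
    add  esp, 12          ; caller removes the arguments
    ; result in eax

    ; x86_64 System V: first integer args in rdi, rsi, rdx, rcx, r8, r9
    mov edi, 1
    mov esi, 2
    mov edx, 3
    call add3             ; (the ABI also wants RSP 16-byte aligned at the call)
    ; result in eax

Add a fourth or a seventh parameter and it's easy to see why rearranging a signature is more annoying in the second form.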
I am fascinated with bootloaders and kernel writing. I am not very good at it, but I am fascinated by it, and every so often I try to learn some more. It feels a bit like a useless skill to learn (legacy BIOS bootloaders, that is) given UEFI dominance. But it connects me with my childhood playing with 286es and wondering how to program them.
I love articles like this that break it down. The biggest challenges so far have been getting the assembler to output the correct format (i.e. 16-bit real mode) and learning inline assembler in C (and getting GCC to output the correct format).
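For anyone stuck on the "correct format" part: with NASM the trick is the flat binary output plus bits/org directives. A minimal sketch (assumes NASM and QEMU are installed; not taken from the article itself):

    ; boot.asm - assemble with: nasm -f bin boot.asm -o boot.img
    ; run with:                 qemu-system-i386 -drive format=raw,file=boot.img
    bits 16                     ; real mode
    org  0x7c00                 ; the BIOS loads the boot sector here

    start:
        xor ax, ax
        mov ds, ax              ; org 0x7c00 assumes DS = 0
        mov si, msg
    .print:
        lodsb                   ; AL = next byte at DS:SI, SI++
        test al, al
        jz .hang
        mov ah, 0x0e            ; BIOS teletype output
        int 0x10
        jmp .print
    .hang:
        hlt
        jmp .hang

    msg db "Hello from the boot sector", 0

        times 510 - ($ - $$) db 0   ; pad to 510 bytes
        dw 0xaa55                   ; boot signature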
I definitely don't think it is a waste of time. I found it really fascinating to find out how a PC boots up from zero and how a very basic kernel is coded.
A long time ago I coded much of my professional stuff in assembly; I was hired specifically to optimize stuff. But that was 20 years ago, and compilers were not very smart.
How smart are compilers these days? Say, to optimize a small function, for example computing a scalar product or applying a 3D matrix transformation to a set of points.
IME: "it depends". You'll have to check the generated assembly code and then tweak your "high-level" C code to appease the compiler to generate code that's acceptable. Sometimes a small change in the high-level code is enough to break the "pattern matching" in optimizer passes.
Here's an example that looks like magic at first glance where the compiler converts manual bit twiddling code to a popcnt instruction (with the right compiler setting), but do the bit counting any other way, and the whole thing falls apart:
Pretty smart. The examples you gave are math-heavy, so to get the best performance you need to use some kind of SIMD instructions. For these you need to drop down a level, although not really to assembly - there are compiler intrinsics that you can use. And for simple functions, compilers are getting fairly good at autovectorization, meaning they introduce SIMD instructions automatically. But it's not something you can rely on.
Generally, they do lots of inlining, and then once you inline you can get some more optimizations in, rinse and repeat. Ends up pretty optimal. (This is C++, can't speak for other languages.)
We work on very perf-sensitive code and we never drop down to assembly. For hot loops, we usually inspect the generated assembly and if it's not great, it's fairly easy to "nudge" the compiler towards the better-performing solutions by tweaking the source code. Also some manual unrolling might be needed to better saturate the vector processing cores of modern CPUs.
And when you're working with signed integers, you still have to do stuff like a >> 1 instead of a / 2 :)
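The reason being that C's signed division truncates toward zero while an arithmetic shift rounds toward negative infinity, so for a possibly-negative int the compiler has to emit a fix-up. Roughly (Intel syntax, sketched by hand - actual output depends on compiler and flags):

    ; int a / 2, with a in edi
    mov eax, edi
    shr eax, 31        ; 1 if a is negative, else 0
    add eax, edi       ; bias negative values by +1 so the shift rounds toward zero
    sar eax, 1
    ; int a >> 1, with a in edi
    mov eax, edi
    sar eax, 1

If the value is known to be non-negative (or the type is unsigned), the fix-up disappears.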
If it was on StackOverflow, I would give it a "correct answer" flag :-)
But if it's now just down to some nudging, that's vastly better to me.
I provided math stuff and vectorisable stuff on purpose :-) Happy to see that vectorisation is somewhat automatic. I remember the MMX days and those were not that fun :-)
Someone well-versed in assembly can still do much better than the compiler in many cases over a reasonably small function. You could of course do better for large functions too, but at some point the cost becomes prohibitive: if you're going to do this, stick to a few hotspots which are as compact as possible.
I do know x86, for example, has things like STOS/STOSB/STOSW/STOSD, which specify a lot of behavior in a single instruction, and combined with the REP prefixes they're a pretty elegant way to do memory block operations. I don't think the 68000 had anything like that.
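For the curious, the whole "memset in a handful of instructions" trick looks roughly like this (16-bit NASM; "buffer" is just a placeholder label, and ES is assumed to already point at the right segment):

    cld                  ; make string ops move forward
    mov di, buffer       ; destination in ES:DI
    xor ax, ax           ; AL = fill byte (0)
    mov cx, 512          ; byte count
    rep stosb            ; store AL at ES:DI, advance DI, repeat CX times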
To be fair, Intel nerfed those instructions for a long time. See, DOS was using REPNZ STOSW for timing things. So as the processors got faster, those instructions didn't, so as to make DOS BIOS continue to operate correctly. It was a terrible time for x86.
At the same time, I used a clever 12-byte sequence of those opcodes to swap software interrupt vectors on process switch. So each process could have its own floating-point exception handlers etc. We called it the 'soft vector chain' and it was among the least known parts of our kernel.
A colleague (John McGinty) suggested in our old age we could consult by scratching our beards and saying "Ah! It must be the soft vector chain!" for every problem.
> its obvious why x86 won - you could get the critical jobs done.
It is a bit less obvious, in fact.
Motorola never looked at the 68k CPUs as a serious business - more like toying around with the CPUs all the time, or treating them as a less important spin-off of the main business. Their main sources of income were defence contracts (e.g. specialised or hardened microchips), microcontrollers, DSPs (which were pretty cool, by the way - all implementing the Harvard architecture), memory chips (I think), radios and field radio equipment, and later mobiles.
They carried largely the same attitude over to the 88k RISC and PowerPC CPU lines (albeit trying to compete more seriously for a while), but ultimately failed to catch up, leading to PowerPC's eventual demise. After that failure, they spun off anything CPU-, DSP- and microcontroller-related into Freescale, and the rest is now history.
It doesn't matter that much, really. What truly is ugly is interfacing with the rest of the computer, ugh. You can't use the VGA BIOS from x64, so go-o-o-od luck doing it via PCI. And properly setting up IOAPIC?
The problem I've always had with learning assembly (ARM in my case) was that I never really had a project to work on that would use any of that knowledge.
There was a recent thread [1] on optimizing some very primitive trig functions.
I recently watched a djb interview where he talked about the importance of fully utilizing available hardware [2 at 5:15]. That can be a good starting point, although at work it's usually easier to just consume more resources than to use what you've got more efficiently.
As my professor once said, "The only difference between programming in assembler and a high-level language is that you have to type more." I can confirm - early in my career I was able to sustainably produce more than a thousand lines of working x86 assembly code a day.
[0] https://notes.eatonphil.com/compiler-basics-lisp-to-assembly...
[1] https://notes.eatonphil.com/compiler-basics-an-x86-upgrade.h...
[2] https://notes.eatonphil.com/emulating-amd64-starting-with-el...
[3] https://notes.eatonphil.com/emulator-basics-a-stack-and-regi...