It’s funny, this is the exact kind of thing I’d want on a non-x86 system. Raspberry Pico, for example, would be a great place for a 2KB HTTP server, but with no support for HTTPS, IPv6, or any non-x86 architecture it’s a bit of a non-starter for my use cases. Still, very cool project!
The CPU architecture is actually the least of your concerns there—I'm pretty sure qemu-user can run httpdito on ARM with less than an order of magnitude performance overhead. There are a lot of embedded systems where an HTTP transaction per second per MHz would be more than sufficient.
The bigger problem is that the Raspberry Pico is a dual-core Cortex-M0+, which doesn't have an MMU, so it can't run Linux and especially can't handle fork(). But httpdito is basically scripting the Linux system call interface in assembly language—it needs to run on top of a filesystem, an implementation of multitasking that provides allocation of different memory to different tasks, and a TCP/IP stack. Any one of these is probably a larger amount of complexity than the 296 CPU instructions in httpdito.
The smallest TCP/IP stack I know of is Adam Dunkels's uIP. Running `sloccount .` in uip/uip cloned from https://github.com/adamdunkels/uip gives a count of 2796 lines of source code ("generated using David A. Wheeler's 'SLOCCount'."). uIP can run successfully on systems with as little as 2KiB of RAM, as long as you have somewhere else to put the code, but for most uses lwIP is a better choice; it minimally needs 10KiB or so. uIP is part of Dunkels's Contiki, which includes a fairly full-featured web server and a somewhat less-full-featured browser. I think he got both the server and the browser to run in 16KiB of RAM on a Commodore PET, but not at the same time.
(twIP http://dunkels.com/adam/twip.html is only 139 bytes of C source but doesn't support TCP or any physical-layer protocol such as Ethernet, PPP, or SLIP.)
However, Adam Dunkels has also written Miniweb http://dunkels.com/adam/miniweb/, which implements HTTP and enough of TCP and IP to support it, in 400 lines of C. It needs at least 30 bytes of RAM. Like twIP, it doesn't provide a physical layer. But that's solvable.
You can build mainline Linux without an MMU, and there are even some pretty crazy setups where you can run it on an ARM Cortex (though usually an M4). It is not a standard system, though: very little software will run without modification. The biggest issue for such processors is usually lack of memory: they have relatively little built in, and most have no external memory buses. There's at least one project where the external memory is bit-banged through GPIO!
> Not having MMU means there's no virtual memory and instructions refer to physical memory addresses, cmiiw?
Pretty much, yeah.
> You say Linux won't work without MMU, it can't handle physical addresses? Moreover, why won't fork() work without MMU?
When httpdito fork()s two child processes, each of them starts receiving the HTTP request into the request buffer at `buf`. This works because the semantics of fork() give those two children two different buffers at the same memory address, one in each process's address space. The Linux userland relies relatively heavily on these semantics. It was a major obstacle to getting an SSH server running on Cisco IOS, for example.
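To make the "same address, different buffer" point concrete, here is a minimal sketch in Python rather than httpdito's assembly. It assumes a Unix system with a working fork(); the function name `fork_buffer_demo` and the 16-byte buffer are made up for illustration and are not taken from httpdito.

```python
import ctypes
import os


def fork_buffer_demo():
    """Fork once; return (same_address, parent_bytes, child_bytes)."""
    buf = ctypes.create_string_buffer(16)  # stands in for httpdito's `buf`
    buf.value = b"original"
    parent_addr = ctypes.addressof(buf)

    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: overwrite the buffer, then report its address and contents.
        try:
            os.close(read_fd)
            buf.value = b"child data"
            os.write(write_fd, b"%d %s" % (ctypes.addressof(buf), buf.value))
        finally:
            os._exit(0)  # never fall back into the parent's code path

    # Parent: collect the child's report, then reap the child.
    os.close(write_fd)
    report = b""
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:
            break
        report += chunk
    os.close(read_fd)
    os.waitpid(pid, 0)

    child_addr, child_bytes = report.split(b" ", 1)
    # The parent's copy is untouched even though the address is identical.
    return int(child_addr) == parent_addr, buf.value, child_bytes
```

On a system with an MMU, the child's write lands in its own copy of the page, so the parent still sees its original bytes at the same virtual address. Without address translation there is only one buffer at that address, which is exactly the problem described above.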
An event-driven server like darkhttpd is a much better fit for an MMUless system. Implementing multithreading is easy (it's half a page of assembly) but implementing memory mapping without an MMU requires some kind of interpreter.
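For contrast, here is a rough sketch of the event-driven style, where one process keeps an explicit buffer per connection instead of relying on fork() to give each child its own. This is not darkhttpd's code; `add_connection`, `poll_once`, and the canned `RESPONSE` are hypothetical names for illustration, and a real server would use non-blocking sockets and buffer its writes too.

```python
import selectors
import socket

# Canned reply; the body "hi\n" is 3 bytes, matching Content-Length.
RESPONSE = b"HTTP/1.0 200 OK\r\nContent-Length: 3\r\n\r\nhi\n"


def add_connection(sel, buffers, conn):
    """Start tracking a connection with its own, explicitly allocated buffer."""
    buffers[conn] = b""
    sel.register(conn, selectors.EVENT_READ)


def poll_once(sel, buffers, timeout=0.1):
    """Run one pass of the event loop over every readable connection."""
    for key, _ in sel.select(timeout):
        conn = key.fileobj
        data = conn.recv(1024)
        buffers[conn] += data  # state lives in the dict, not in a process
        if not data or b"\r\n\r\n" in buffers[conn]:
            if data:
                conn.sendall(RESPONSE)  # request headers are complete
            sel.unregister(conn)
            del buffers[conn]
            conn.close()
```

Because every connection's state is an ordinary entry in `buffers`, nothing here needs two different objects at the same address, so the same structure carries over to a system with no MMU.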
(Actually you can implement fork() without virtual memory and without an MMU, for example with PDP-11-style segmentation, but the Cortex-M0+ doesn't have any of those facilities either.)
>"The Linux userland relies relatively heavily on these semantics. It was a major obstacle to getting an SSH server running on cisco IOS, for example."
Can you elaborate on this? Hasn't Cisco IOS at various times run on MIPS and X86 processors?
The original Cisco IOS ran on 68000 series processors which lacked an MMU. Even the later 68K models used an "embedded" version of a processor which did not have an MMU. For example, the Cisco 2500 used a 68EC030. Regular 68030s had MMUs, but the "EC" model did not. Later versions did run on MIPS though.
Unfortunately I'm just reporting secondhand rumors from people who worked at Cisco, and I probably should have made that clear. So I don't know how IOS works at the machine-instruction level, just the command line.
Without an MMU, you can't do paging. That means fork() cannot do the normal copy-on-write business, because there's no page table to copy the entries in.
You also have no inter-process security, so everything can crash everything else including the kernel, and no swap.
I'm pretty sure Linux ELF has always allowed you to specify the initial load address. When I first wrote StoneKnifeForth https://github.com/kragen/stoneknifeforth its load address was 0x1000, but at some point Linux stopped allowing load addresses lower than 0x10000 by default (vm.mmap_min_addr). I originally wrote it in 02008, using the lower load address, and fixed it in 02017. It's still not using 0x8048000 like normal executables but 0x20000. ASLR does not affect this.
Maybe you mean that before ELF support, Linux a.out executables had to be loaded at a fixed virtual address? That's possible—I started using Linux daily in 01995, at which point a.out was already only supported for backward compatibility.
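If you want to check the vm.mmap_min_addr limit on your own machine, something like the following sketch works. It assumes the usual Linux /proc path for the sysctl and falls back to 0x10000 (the default mentioned above) when the file isn't readable; the function names are made up for illustration, and the limit is generally only enforced for unprivileged processes.

```python
DEFAULT_MIN_ADDR = 0x10000  # assumption: fallback when the sysctl is unreadable


def read_mmap_min_addr(path="/proc/sys/vm/mmap_min_addr"):
    """Return the current vm.mmap_min_addr, or the assumed default."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return DEFAULT_MIN_ADDR


def load_address_allowed(load_addr, min_addr):
    """True if an unprivileged mapping at load_addr clears the minimum."""
    return load_addr >= min_addr


if __name__ == "__main__":
    limit = read_mmap_min_addr()
    for addr in (0x1000, 0x20000):
        verdict = "ok" if load_address_allowed(addr, limit) else "too low"
        print(f"{addr:#x}: {verdict} (limit {limit:#x})")
```

With a limit of 0x10000, the old 0x1000 load address is rejected and 0x20000 is accepted, which matches the StoneKnifeForth history described above.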