I've been messing around with writing a toy database for fun/learning, and realised I've got a fairly big gap in my knowledge when it comes to performance and durability of file reads/writes.
Examples of the sort of questions I'd like to be able to answer, or at least make reasonable decisions about (note: I don't actually want answers to these now, they're just examples of the kind of thing I'd like to read about in depth to build up
some background knowledge):
* how to ensure data's been safely written (e.g. when to flush, fsync, what
guarantees that gives, using WAL)
* block sizes to read/write for different purposes, tradeoffs, etc.
* considerations for writing to different media/filesystems (e.g. disk, ssd, NFS)
* when to rely on OS disk cache vs. using own cache
* when to use/not use mmap
* performance considerations (e.g. multiple small files vs. few larger ones,
concurrent readers/writers, locking, etc.)
* OS specific considerations
I recall reading some posts about this (related to Redis/SQLite/Postgres), which made me realise it's a fairly complex topic, but not one I've found a good entry point for.
Any pointers to books, documentation, etc. on the above would be much
appreciated.
* Bigger blocks = better performance. The bigger you can make them, the faster you'll go. Your limiting factor is usually the granularity the user actually needs (i.e. aggregating small records into bigger blocks will inevitably result in under-utilized space). There's a small block-size measurement sketch after this list.
* Disk, SSD and NFS don't all belong in the same category. Most modern storage products are developed with the expectation that the media is SSD; virtually nobody wants to enter the HDD market. The performance gap is just too big, and the existing products that still use HDDs rely on fast caching in something like flash memory anyway. NFS is a hopelessly backwards and outdated technology. It's the least common denominator, and that's why various storage products do support it, but if you want to go fast, forget about it. The tradeoff there is usually between writing your own client (often a kernel module) to do I/O efficiently, or sparing users the need to install a custom kernel module (often a security-audit issue) and letting them go slow...
* "OS disk cache" is somewhat of a misnomer, and there are two things that tend to get confused here. The OS doesn't cache data written to the disk -- the disk does, in its own on-board cache; the OS just provides the mechanism to talk to the disk and instruct it to flush that cache. Separately, there's the filesystem cache -- that's what the OS does: it keeps the contents of recently accessed files in memory it manages. See the fsync sketch after this list for how the two interact on a write.
* I/O through mmap is a gimmick, just one of the ways to abuse a system API to do something it's not really intended for. You can safely ignore it. If you are looking into making I/O more efficient, look into io_uring (there's a minimal example after this list).
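
To make the block-size point a bit more concrete, here's a minimal sketch (not taken from any real database; the file and block size are whatever you pass in) that reads a file with a given buffer size so you can compare read() call counts and throughput for, say, 4 KiB vs 1 MiB blocks. For a fair comparison you'd want to defeat the filesystem cache between runs (drop caches or use O_DIRECT), otherwise the second run is mostly measuring memory copies.

```c
/* block_read.c -- read a file with a given block size and report throughput.
 * Build: cc -O2 -o block_read block_read.c
 * Usage: ./block_read <file> <block_size_bytes>
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char **argv) {
    if (argc != 3) {
        fprintf(stderr, "usage: %s <file> <block_size_bytes>\n", argv[0]);
        return 1;
    }
    size_t blksz = (size_t)strtoul(argv[2], NULL, 10);
    char *buf = malloc(blksz);
    if (!buf) { perror("malloc"); return 1; }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    long long total = 0, calls = 0;
    ssize_t n;
    while ((n = read(fd, buf, blksz)) > 0) {   /* one syscall per block */
        total += n;
        calls++;
    }

    clock_gettime(CLOCK_MONOTONIC, &t1);
    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%lld bytes in %lld read() calls, %.3fs (%.1f MB/s)\n",
           total, calls, secs, total / secs / 1e6);

    close(fd);
    free(buf);
    return 0;
}
```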
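
On the durability/cache point, this is the classic POSIX pattern as I understand it: write the data, fsync the file so the kernel writes out its cached pages and asks the device to flush its own write cache, and, for a newly created file, also fsync the containing directory so the directory entry survives a crash. The paths and the "WAL record" below are made up for illustration.

```c
/* durable_append.c -- sketch of durably writing a new file (Linux/POSIX).
 * Error handling is deliberately minimal.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    const char *dir  = "/tmp/toydb";                   /* hypothetical data dir */
    const char *path = "/tmp/toydb/wal-000001.log";    /* hypothetical WAL segment */

    mkdir(dir, 0755);                                  /* ignore error if it already exists */

    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) { perror("open"); return 1; }

    const char *record = "INSERT ...\n";               /* placeholder log record */
    if (write(fd, record, strlen(record)) < 0) { perror("write"); return 1; }

    /* Flush the kernel's cached pages for this file and ask the device to
     * persist its own cache. Only after this returns is it reasonable to
     * tell a client the record is durable. */
    if (fsync(fd) < 0) { perror("fsync"); return 1; }
    close(fd);

    /* The file's name lives in the directory, so a newly created file also
     * needs the directory fsync'd, or the entry itself can be lost. */
    int dfd = open(dir, O_RDONLY | O_DIRECTORY);
    if (dfd < 0) { perror("open dir"); return 1; }
    if (fsync(dfd) < 0) { perror("fsync dir"); return 1; }
    close(dfd);
    return 0;
}
```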
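
And since the last point mentions io_uring, here is roughly the smallest example I know of using liburing (the usual userspace wrapper), assuming liburing is installed and you link with -luring; worth double-checking the calls against the liburing man pages. It only shows the submit/complete mechanics with a single read; the real payoff comes from queueing many requests before waiting.

```c
/* uring_read.c -- minimal single read via io_uring/liburing.
 * Build: cc -O2 -o uring_read uring_read.c -luring
 */
#include <fcntl.h>
#include <liburing.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv) {
    if (argc != 2) { fprintf(stderr, "usage: %s <file>\n", argv[0]); return 1; }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct io_uring ring;
    int ret = io_uring_queue_init(8, &ring, 0);        /* queue depth 8, chosen arbitrarily */
    if (ret < 0) { fprintf(stderr, "queue_init: %s\n", strerror(-ret)); return 1; }

    char buf[4096];
    struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
    io_uring_prep_read(sqe, fd, buf, sizeof(buf), 0);  /* read 4 KiB at offset 0 */
    io_uring_submit(&ring);                            /* hand the request to the kernel */

    struct io_uring_cqe *cqe;
    ret = io_uring_wait_cqe(&ring, &cqe);              /* block until it completes */
    if (ret < 0) { fprintf(stderr, "wait_cqe: %s\n", strerror(-ret)); return 1; }

    if (cqe->res < 0)
        fprintf(stderr, "read failed: %s\n", strerror(-cqe->res));
    else
        printf("read %d bytes\n", cqe->res);

    io_uring_cqe_seen(&ring, cqe);                     /* mark the completion as consumed */
    io_uring_queue_exit(&ring);
    close(fd);
    return 0;
}
```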