Yes, our thinking here was that for SSDs or NVMe drives the cost of scheduling is sometimes not worth it, and "noop" can be a good choice, since the device is already so fast relative to the CPU and memory-access cost of scheduling each request.
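For reference, a minimal C sketch of how one might inspect and switch the scheduler through sysfs. The device name nvme0n1 is an assumption, and note that on modern multi-queue (blk-mq) kernels the equivalent of "noop" is called "none":

```c
/* Sketch: inspect and switch the block-layer I/O scheduler via sysfs.
 * Assumes the device is nvme0n1; on blk-mq kernels the "noop"
 * equivalent is named "none". Writing requires root. */
#include <stdio.h>

int main(void) {
    const char *path = "/sys/block/nvme0n1/queue/scheduler";
    char line[256];

    /* Read: sysfs lists the available schedulers, active one in brackets,
     * e.g. "[none] mq-deadline kyber bfq". */
    FILE *f = fopen(path, "r");
    if (!f) { perror("fopen"); return 1; }
    if (fgets(line, sizeof line, f))
        printf("schedulers: %s", line);
    fclose(f);

    /* Write: select "none" (would be "noop" on legacy single-queue kernels). */
    f = fopen(path, "w");
    if (!f) { perror("fopen for write"); return 1; }
    fputs("none\n", f);
    fclose(f);
    return 0;
}
```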
As far as we understand mechanical sympathy for flash, what counts most is either a large bulk I/O that the device can internally parallelize, or else sufficiently parallel smaller I/Os.
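To make the "sufficiently parallel" part concrete, here's a rough C sketch using io_uring via liburing (the file name, queue depth, and block size are made up for illustration). The point is that all the writes are queued first and handed to the kernel in a single submission, so the device can work on them concurrently rather than one at a time:

```c
/* Sketch: keep the device's internal parallelism fed by submitting a
 * batch of small writes in one syscall (build with -luring). */
#include <liburing.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define BLOCK_SIZE 4096
#define DEPTH      32      /* writes in flight at once: illustrative */

int main(void) {
    struct io_uring ring;
    int ret = io_uring_queue_init(DEPTH, &ring, 0);
    if (ret < 0) { fprintf(stderr, "queue_init: %d\n", ret); return 1; }

    int fd = open("data.file", O_WRONLY | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    void *buf;
    if (posix_memalign(&buf, BLOCK_SIZE, BLOCK_SIZE)) return 1;

    /* Queue DEPTH writes (same payload buffer, for brevity), then submit
     * them all together so they reach the device as one parallel batch. */
    for (int i = 0; i < DEPTH; i++) {
        struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
        io_uring_prep_write(sqe, fd, buf, BLOCK_SIZE, (__u64)i * BLOCK_SIZE);
    }
    io_uring_submit(&ring);

    /* Reap all completions. */
    for (int i = 0; i < DEPTH; i++) {
        struct io_uring_cqe *cqe;
        if (io_uring_wait_cqe(&ring, &cqe) < 0) break;
        if (cqe->res < 0) fprintf(stderr, "write failed: %d\n", cqe->res);
        io_uring_cqe_seen(&ring, cqe);
    }

    io_uring_queue_exit(&ring);
    close(fd);
    free(buf);
    return 0;
}
```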
Then, being sector-aligned to spare the kernel from having to fix up alignment with a memcpy (and since we're using Direct I/O, we have to be), and especially trying to issue writes that all have a similar "time of death". For example, if you're writing out the blocks for a new LSM table on disk, it's in fact good to keep these close together, since they'll likely be compacted and reclaimed at the same time in the future.
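Here's a minimal C sketch of that alignment requirement under O_DIRECT. The 4096-byte sector size and file name are assumptions (the real logical sector size can be queried with the BLKSSZGET ioctl); the loop shows the buffer address, length, and file offset all kept sector-aligned, with one table's blocks written at adjacent offsets:

```c
/* Sketch: Direct I/O requires buffer address, length, and file offset
 * to all be multiples of the logical sector size (4096 assumed here;
 * query the real value with ioctl(BLKSSZGET)). */
#define _GNU_SOURCE        /* for O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define SECTOR       4096
#define TABLE_BLOCKS 8     /* illustrative: blocks of one LSM table */

int main(void) {
    int fd = open("table.lsm", O_WRONLY | O_CREAT | O_DIRECT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    void *block;
    if (posix_memalign(&block, SECTOR, SECTOR)) return 1;  /* aligned address */
    memset(block, 0xAB, SECTOR);

    /* Aligned length and offsets: no bounce-buffer memcpy in the kernel,
     * and the table's blocks land next to each other on disk, so they
     * can be reclaimed together when the table is compacted away. */
    for (int i = 0; i < TABLE_BLOCKS; i++) {
        if (pwrite(fd, block, SECTOR, (off_t)i * SECTOR) != SECTOR) {
            perror("pwrite");
            return 1;
        }
    }

    free(block);
    close(fd);
    return 0;
}
```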
Something like a disk elevator at a higher level of the stack?