Yes, our thinking here was that for SSDs or NVMe drives the cost of scheduling is sometimes not worth it, and "noop" can be a good choice, since the device is already so fast relative to the CPU and memory-access cost of scheduling each request.
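For reference, a minimal C sketch of how one might inspect and switch the scheduler through sysfs. The device name nvme0n1 is an assumption, and note that on modern multi-queue (blk-mq) kernels the equivalent of "noop" is called "none":

```c
/* Sketch: inspect and switch the block-layer I/O scheduler via sysfs.
 * Assumes the device is nvme0n1; on blk-mq kernels the "noop"
 * equivalent is named "none". Writing requires root. */
#include <stdio.h>

int main(void) {
    const char *path = "/sys/block/nvme0n1/queue/scheduler";
    char line[256];

    /* Read: sysfs lists the available schedulers, active one in brackets,
     * e.g. "[none] mq-deadline kyber bfq". */
    FILE *f = fopen(path, "r");
    if (!f) { perror("fopen"); return 1; }
    if (fgets(line, sizeof line, f))
        printf("schedulers: %s", line);
    fclose(f);

    /* Write: select "none" (would be "noop" on legacy single-queue kernels). */
    f = fopen(path, "w");
    if (!f) { perror("fopen for write"); return 1; }
    fputs("none\n", f);
    fclose(f);
    return 0;
}
```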
As far as we understand mechanical sympathy for flash, what counts most is either a large bulk I/O that the device can internally parallelize, or else sufficiently parallel smaller I/Os.
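To make the "sufficiently parallel" part concrete, here's a rough C sketch using io_uring via liburing (the file name, queue depth, and block size are made up for illustration). The point is that all the writes are queued first and handed to the kernel in a single submission, so the device can work on them concurrently rather than one at a time:

```c
/* Sketch: keep the device's internal parallelism fed by submitting a
 * batch of small writes in one syscall (build with -luring). */
#include <liburing.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define BLOCK_SIZE 4096
#define DEPTH      32      /* writes in flight at once: illustrative */

int main(void) {
    struct io_uring ring;
    int ret = io_uring_queue_init(DEPTH, &ring, 0);
    if (ret < 0) { fprintf(stderr, "queue_init: %d\n", ret); return 1; }

    int fd = open("data.file", O_WRONLY | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    void *buf;
    if (posix_memalign(&buf, BLOCK_SIZE, BLOCK_SIZE)) return 1;

    /* Queue DEPTH writes (same payload buffer, for brevity), then submit
     * them all together so they reach the device as one parallel batch. */
    for (int i = 0; i < DEPTH; i++) {
        struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
        io_uring_prep_write(sqe, fd, buf, BLOCK_SIZE, (__u64)i * BLOCK_SIZE);
    }
    io_uring_submit(&ring);

    /* Reap all completions. */
    for (int i = 0; i < DEPTH; i++) {
        struct io_uring_cqe *cqe;
        if (io_uring_wait_cqe(&ring, &cqe) < 0) break;
        if (cqe->res < 0) fprintf(stderr, "write failed: %d\n", cqe->res);
        io_uring_cqe_seen(&ring, cqe);
    }

    io_uring_queue_exit(&ring);
    close(fd);
    free(buf);
    return 0;
}
```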
Then, being sector-aligned to spare the kernel from having to fix up alignment with a memcpy (and since we're using Direct I/O, we have to be), and especially trying to issue writes that all have a similar "time of death". For example, if you're writing out the blocks for a new LSM table on disk, it's in fact good to keep these close together, since they'll likely be compacted and reclaimed at the same time in the future.
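Here's a minimal C sketch of that alignment requirement under O_DIRECT. The 4096-byte sector size and file name are assumptions (the real logical sector size can be queried with the BLKSSZGET ioctl); the loop shows the buffer address, length, and file offset all kept sector-aligned, with one table's blocks written at adjacent offsets:

```c
/* Sketch: Direct I/O requires buffer address, length, and file offset
 * to all be multiples of the logical sector size (4096 assumed here;
 * query the real value with ioctl(BLKSSZGET)). */
#define _GNU_SOURCE        /* for O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define SECTOR       4096
#define TABLE_BLOCKS 8     /* illustrative: blocks of one LSM table */

int main(void) {
    int fd = open("table.lsm", O_WRONLY | O_CREAT | O_DIRECT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    void *block;
    if (posix_memalign(&block, SECTOR, SECTOR)) return 1;  /* aligned address */
    memset(block, 0xAB, SECTOR);

    /* Aligned length and offsets: no bounce-buffer memcpy in the kernel,
     * and the table's blocks land next to each other on disk, so they
     * can be reclaimed together when the table is compacted away. */
    for (int i = 0; i < TABLE_BLOCKS; i++) {
        if (pwrite(fd, block, SECTOR, (off_t)i * SECTOR) != SECTOR) {
            perror("pwrite");
            return 1;
        }
    }

    free(block);
    close(fd);
    return 0;
}
```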
Something like a disk elevator at a higher level of the stack?