How is data distributed over multiple hosts? Aside from being replicated, is it sharded? Does it have any sort of redundancy? What happens if you lose a host?
No sharding: every host has all the data. (This sounds crazy, until you do the math on how cheap storage is and how comparatively "little" data people actually need in the real world.) So we have 6x redundancy in normal operation, spread across three datacenters (which also means three different power grids and three different providers). There are also nightly backups.
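To make the "what happens if you lose a host" part concrete, here is a minimal sketch of how many full copies survive a failure under full replication. The host names, datacenter names, and the even split of two hosts per datacenter are assumptions for illustration; the answer only states six copies across three datacenters.

```python
from collections import Counter

# Hypothetical layout: six hosts, each holding a full copy of the data.
# The two-hosts-per-datacenter split is an assumption, not stated in the answer.
HOSTS = {
    "host1": "dc-a", "host2": "dc-a",
    "host3": "dc-b", "host4": "dc-b",
    "host5": "dc-c", "host6": "dc-c",
}

def surviving_copies(hosts, failed_hosts=(), failed_datacenters=()):
    """Count the full data copies still available, grouped by datacenter."""
    alive = Counter()
    for host, dc in hosts.items():
        if host in failed_hosts or dc in failed_datacenters:
            continue  # this copy is unreachable
        alive[dc] += 1
    return alive

normal = surviving_copies(HOSTS)
one_host_down = surviving_copies(HOSTS, failed_hosts={"host3"})
one_dc_down = surviving_copies(HOSTS, failed_datacenters={"dc-a"})

print(sum(normal.values()))         # copies in normal operation
print(sum(one_host_down.values()))  # copies after losing one host
print(sum(one_dc_down.values()))    # copies after losing a whole datacenter
```

Because nothing is sharded, any single surviving host can serve the complete data set, so losing a host reduces redundancy but not availability of any particular record.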