Hi, can someone tell me the difference between RAIDZ1 and RAIDZ2 in a practical sense, as if you were in the field, rather than the theory?
I know RAIDZ1 is the ZFS variant of RAID5 and RAIDZ2 is the ZFS variant of RAID6; I will use all these terms interchangeably because I'm not too interested in the ZFS special sauce here.
In the early 2000s, a lot of people were pushing RAID5. Having worked in a hosting / colocation data centre for many years, I witnessed many RAID5 failures. What would happen is an array would degrade, and more often than not a second drive would fail under the extra load placed on the array while it was degraded. Failures also often happened during the rebuild process, because a lot of the HW implementations were flaky -- but again, also because of the undue stress on all drives as you rebuild. This is why I would suggest a RAID10 setup at the time: with some luck it can survive a double failure, and more importantly you can trivially use a software implementation, which is much safer. Also, a lot of the motherboards at the time were offering RAID, but this was really just a binary blob in the kernel doing software RAID with a facade that made it appear like hardware, which fooled a lot of people.
Well, we've finally done away with hardware / proprietary RAID and we have ZFS, mdadm, etc. I've normally dismissed RAID6/RAIDZ2 because of the parity/rebuild process and concerns about putting undue stress on the drives. But I think maybe this was premature and that I didn't really understand the consequences of a single drive failure versus a double drive failure. So this is kind of what I want to know:
1. When a single drive fails, is there any undue stress on the array? Or, because the array can pretty much operate unaffected, is there actually no performance degradation until you rebuild the missing drive, and in the case of software is it really just a negligible hit on the CPU if it has to do hashing/erasure coding/etc.? I guess the rebuild process is really just the cost of a zfs scrub at this point, but at least it is on a healthy array.
2. The good news of RAID6 over RAID10 is that you can always survive a two-drive failure; but I think this is where things get concerning, because a rebuild across two drives places a lot of undue stress on the remaining disks, and if any of those disks die then you're shit outta luck. That scenario is much more like a single drive failure in a RAID5 array. But again, I think the rebuild cost is that of a zfs scrub, just with the minimal set of disks. So RAIDZ2 would be a much more solid choice over RAID10, right; at least you will always know you can survive a two-drive failure?
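To put a rough number on the "lucky double failure" point, here's a back-of-the-envelope sketch. It's purely combinatorial, assumes two simultaneous independent failures, and ignores rebuild dynamics entirely; the helper name is just made up for illustration:

```python
from math import comb

# Rough combinatorial sketch (not a reliability model): given two disks that
# fail at the same time, RAIDZ2 always survives, while 2-way-mirror RAID10
# only survives if the two failures land in different mirror pairs.
def raid10_survives_two(n_disks: int) -> float:
    pairs = n_disks // 2
    total = comb(n_disks, 2)   # ways to pick which two disks failed
    fatal = pairs              # both failures hit the same mirror pair
    return 1 - fatal / total

for n in (4, 6, 8, 12):
    print(f"{n} disks: RAID10 survives a double failure ~{raid10_survives_two(n):.0%}, RAIDZ2: 100%")
```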
One thing that raidz has going for it over block-level RAID is that zfs knows which blocks are in use, so recovery does not need to read every block on every disk.
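As a toy illustration of why that matters in practice (the disk size and throughput below are made-up assumptions, nothing ZFS-specific):

```python
# Toy illustration with assumed numbers: a block-level rebuild has to read
# whole disks regardless of how full the pool is, while an allocation-aware
# resilver only has to read the space that's actually in use.
DISK_TB = 10          # assumed size of each disk
READ_MB_PER_S = 150   # assumed sequential read speed

def hours_to_read(tb: float) -> float:
    return tb * 1_000_000 / READ_MB_PER_S / 3600

for fullness in (0.10, 0.50, 0.90):
    print(f"pool {fullness:.0%} full: block-level ~{hours_to_read(DISK_TB):.1f} h/disk, "
          f"allocation-aware ~{hours_to_read(DISK_TB * fullness):.1f} h/disk")
```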
Zfs also prioritizes user reads and writes over recovery (resilvering), which may not be universal practice with RAID.
It's certainly still the case that if your redundancy drives have failed, you're at greater risk of data loss, and IMHO the disk failure rate during a repair is higher than the base rate: maybe from the extra use, but also because of the risk of correlated failures --- if your environment contributed to the failure of the first N disks, the other disks were also in that environment and may be about to fail.
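To put hand-wavy numbers on that last point (the failure rates and rebuild window below are assumptions, not measurements, and a real reliability model would be much more involved):

```python
# Hand-wavy sketch with assumed numbers: chance that at least one of the
# remaining disks fails during the rebuild window, comparing a baseline
# annual failure rate with an elevated one standing in for rebuild stress
# and correlated failures.
def p_any_failure(remaining_disks: int, annual_fail_rate: float, rebuild_hours: float) -> float:
    p_one = 1 - (1 - annual_fail_rate) ** (rebuild_hours / (365 * 24))
    return 1 - (1 - p_one) ** remaining_disks

for afr in (0.02, 0.10):   # 2% "catalog" AFR vs a 10% stressed/correlated guess
    print(f"AFR {afr:.0%}: P(another failure during a 24h rebuild, 7 disks left) "
          f"= {p_any_failure(7, afr, rebuild_hours=24):.2%}")
```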
I think ZFS also supports online rebuilds. I remember in my data centre days that a few of the RAID5 products only did rebuilds offline. The HP/Compaq Smart Array stuff was the bee's knees though; reliable stuff.
Almost all “normal” damage is, and should be, repaired online with zfs. Those offline repairs meant the hardware controller had no idea how to interact with the filesystem directly, probably for the best. Level-of-abstraction purists don’t like this aspect of zfs.
If something particularly bad happened, or you tried being really “clever”, you can get into a rare situation of not being able to import the pool, or only being able to import it in read-only mode. There are tools to help repair that kind of metadata damage. Then proceed with the normal online repair if needed.
raidz doesn't work exactly like RAID, but conceptually it's helpful to carry over that knowledge. The biggest difference is that all drives can potentially have parity blocks on them.
1. If a drive is missing and it contained a data block you want to read, then parity calculations are needed to reconstruct that block; potentially every remaining drive must spend read capacity on that reconstruction. I would consider that stress, and max read throughput is significantly reduced. If your block size is very large, or your files are much smaller, you might get away with a minimal performance hit, but you're also wasting a lot of the capacity/benefit of z2. (In certain pathological cases z2 can have the storage profile of a double mirror, but with all the complications of z2.) The rebuild process requires a reconstruction for every missing block, so basically every drive will need to perform a read for each restored block. Writing new data to a degraded z2 pool can also force zfs to be quite wasteful: for example, a 5-disk z2 pool with 1 drive missing can write at most 2 data blocks with 2 parity blocks per stripe, instead of the expected 3 and 2 (see the stripe-width sketch after this list). Restoring that drive will not automatically restore that capacity unless the files are written again to the restored pool. The drives will be filled unevenly, which has performance and storage-efficiency penalties.
2. If you replace both degraded drives at the same time, and you aren't using resilver_defer, it should only need to read all the other drives once and write to both new disks. But you might not want this, depending on many complicated factors.
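For the degraded-write example in point 1, the arithmetic looks roughly like this. It's a deliberately simplified block-level view of raidz2 that ignores padding sectors and variable stripe widths; the function name is just for illustration:

```python
# Simplified stripe-width arithmetic (ignores padding sectors and variable
# stripe widths): how much of what gets written is actually data, for a
# healthy 5-disk raidz2 stripe versus the same pool with one disk missing.
def raidz2_data_fraction(disks_available: int, parity: int = 2) -> float:
    data = disks_available - parity   # data blocks per full stripe
    return data / (data + parity)

print(f"healthy 5-disk z2 : {raidz2_data_fraction(5):.0%} of written blocks are data")  # 3 data + 2 parity
print(f"degraded, 4 disks : {raidz2_data_fraction(4):.0%} of written blocks are data")  # 2 data + 2 parity
```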