
I've run hundreds of drives across hundreds of terabytes of appliances for years. UREs and resilvers are a common occurrence, as in at every monthly scrub across 200+ drives. That isn't 200 drives in a single array; they're spread over 4 geographically distributed appliances.

The drives have been champs overall. They're approaching an average runtime of about 8 years, and over those 8 years we've lost about 20% of them in various ways.
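For a sense of scale, here's a rough conversion of "20% lost over 8 years" into a per-year rate. This is only a sketch: it assumes a constant failure rate that compounds year over year, which real drives don't strictly follow, and the function name is just something made up for illustration.

```python
def annualized_failure_rate(fraction_lost: float, years: float) -> float:
    """Constant yearly failure rate that yields `fraction_lost` after `years`.

    Assumption: every drive faces the same failure probability each year,
    so the surviving fraction is (1 - rate) ** years.
    """
    surviving = 1.0 - fraction_lost
    return 1.0 - surviving ** (1.0 / years)


# Figures from the comment above: about 20% of drives lost over about 8 years.
afr = annualized_failure_rate(0.20, 8)
print(f"approximate annualized failure rate: {afr:.2%}")
```

Under that assumption it works out to a little under 3% per year.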

It is almost guaranteed that when a drive fails, another drive will hit a URE during the resilver process. This is a non-issue because we run RAID-Z3 with multiple online hot spares.
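To illustrate why a URE during resilver is close to a sure thing at this scale, here's a back-of-the-envelope sketch. The numbers are assumptions, not measurements from these appliances: a URE rate of 1 per 10^14 bits read is a commonly quoted datasheet-style figure, the amounts of data read are hypothetical, and bit errors are treated as independent, which is a simplification.

```python
import math


def prob_at_least_one_ure(bytes_read: float, ure_per_bit: float = 1e-14) -> float:
    """Probability of hitting at least one URE while reading `bytes_read` bytes.

    Assumes independent errors at a fixed per-bit rate (hypothetical default
    of 1 in 10^14 bits). Uses log1p/expm1 so tiny rates don't round to zero.
    """
    bits = bytes_read * 8
    # 1 - (1 - p) ** bits, computed in a numerically stable way.
    return -math.expm1(bits * math.log1p(-ure_per_bit))


# Hypothetical amounts of data read from surviving drives during one resilver.
for terabytes in (4, 12.5, 50, 100):
    p = prob_at_least_one_ure(terabytes * 1e12)
    print(f"{terabytes:>6} TB read -> P(at least one URE) = {p:.3f}")
```

The probability climbs quickly with the amount of data read, which is the reason to want enough parity left over to absorb a read error mid-resilver.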



> monthly scrub

Are they used 24/7 at high iops? Why not nightly scrub?


We could do weekly. The volume of data is large enough that even a sequential scrub while idle takes about 18 hours. As it is, we're happy with monthly scrubbing on the Z3 arrays. We don't bother pulling drives until they run out of reallocatable sectors, which extends the service lifetime by a year in most cases.

I intentionally provisioned one of the long-term, archive-only appliances with 12 hot spares. This was to avoid needing another site visit before we lifecycle the appliance. It's currently down to seven hot spares.

That replacement will probably happen later this year. The lower power requirement should cut the colo cost enough that the replacement 200TB appliance pays for itself in 18 months.



