Back when I was even less experienced in self-hosting, I set up my media/backup server as a RAIDZ1 array of 3 x 8TB disks. It’s been running well for a while, with no problems and no disk errors.

But today I read a post about ‘pool design rules’ stating that RAIDZ1 configurations should not use drives over 1TB, because the chance of errors occurring during resilvering is high. I wish I had known this sooner.

What can I do about this? I send ZFS snapshots to 2 single large (18TB) hard drives for cold backups, so I have the capacity to migrate to a new pool layout. But which layout? The same article I referenced above says not to use RAIDZ2 or RAIDZ3 with fewer than 6 drives, and I don’t want to buy 3 more drives. Do I buy an additional 8TB drive (for a total of 4 x 8TB) and stripe across two sets of mirrors? Does that make any sense?
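If it helps, here’s roughly the layout I’m picturing (pool and device names below are just placeholders):

```
zpool create newpool mirror /dev/sda /dev/sdb mirror /dev/sdc /dev/sdd
```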

Thank you!

  • mlaga97@lemmy.mlaga97.space · 6 months ago

    I think it’s worth pointing out that this article is 11 years old, so that 1TB rule of thumb probably needs to be adjusted for modern disks.

    If you have 2 full backups of the array (18TB drives being more than sufficient), especially if one of them is offsite, then I’d say you’re really not at a high enough risk of losing data during a rebuild to justify proactively rebuilding the array until you have at least 2 more disks to add.

    • Hopfgeist@feddit.de · 6 months ago

      Let’s do the math:

      The error rate of modern hard disks is usually on the order of one unrecoverable read error per 1E15 bits read; see for example the data sheet for the Seagate Exos 7E10. An 8 TB disk contains 6.4E13 (usable) bits, so reading the whole disk gives an expected 6.4E13 / 1E15 ≈ 0.064 errors, or roughly a 1 in 16 chance of an unrecoverable read error. That’s fine with ZFS while all disks are working: the error correction will detect and fix it. But during a resilver it can be a big problem.
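      As a sanity check, here is that estimate as a quick calculation (it treats the quoted rate as an independent per-bit probability, which is a simplification):

      ```python
      # Back-of-the-envelope URE estimate; assumes errors are independent
      # per bit, which real drives don't strictly obey.
      ure_rate = 1e-15        # unrecoverable errors per bit read (Exos 7E10 spec)
      disk_bits = 8e12 * 8    # an 8 TB disk holds about 6.4E13 bits
      print(disk_bits * ure_rate)  # 0.064, i.e. about a 1-in-16 chance per full read
      ```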

      • mlaga97@lemmy.mlaga97.space · 6 months ago

        If the actual error rate were anywhere near that high, modern enterprise hard drives wouldn’t be usable as a storage medium at all.

        A 65% full array of 10 x 20TB drives would average at least 1 bit error on every single scrub (which is a full read of all data present in the array), but that doesn’t actually happen with any real degree of regularity.
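        Running the same naive per-bit estimate on that example (again assuming independent errors):

        ```python
        # Same per-bit estimate, applied to one scrub of a 65%-full
        # array of 10 x 20TB drives.
        ure_rate = 1e-15                   # unrecoverable errors per bit read
        bits_read = 10 * 20e12 * 8 * 0.65  # bits touched by a full scrub
        print(bits_read * ure_rate)        # ~1.04 expected errors per scrub
        ```

        If the spec-sheet rate held in practice, every scrub would surface an error on average, and that just doesn’t match real-world experience.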