Comment 4 for bug 1548009

Richard Laager (rlaager) wrote:

The disk load added by scrubbing is almost entirely reading. There is a small amount of metadata writing to track the scrub's progress (so it can resume after a reboot). The only time a scrub would write significant data is if it found a bad block (checksum error) and needed to rewrite a good copy of it (from another disk or another part of the disk).
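For reference, starting and monitoring a scrub is a couple of commands; the pool name "tank" below is illustrative:

```shell
# Start a scrub; it runs in the background and survives reboots
zpool scrub tank

# Watch progress: the "scan:" line shows how much has been scanned
# and whether any data was repaired
zpool status -v tank

# Stop a running scrub if it is causing trouble
zpool scrub -s tank
```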

Since it walks the ZFS tree rather than proceeding in logical block order, scrubbing adds a lot of seeks. That is the load I want to avoid when a pool is already degraded, out of an abundance of caution, for both performance and safety. (I can't remember actually seeing or hearing of a scrub killing another disk, but the ZFS approach is to be conservative about data safety.) Under normal circumstances, it's definitely not a problem. Scrubbing and resilvering are throttled, and the #1 complaint I've seen is that the throttling is too conservative/slow. (Google "ZFS resilver performance" to see everyone talking about how their resilvers are slow and how they want to speed them up.)
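On ZFS on Linux, that throttling is exposed as kernel module parameters; the exact parameter names vary between versions, so this is only a sketch of where to look, not a tuning recommendation:

```shell
# List whatever scan/scrub throttling knobs your ZFS module version exposes
# (names differ across releases, so inspect rather than assume)
grep -r . /sys/module/zfs/parameters/ 2>/dev/null | grep -i scan

# Scrub I/O also competes in the ZIO scheduler at low priority,
# which is why it yields to normal pool activity
```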

At my day job, we have 100% of our storage on ZFS (not on Linux). Scrubs take 1.5 days on one cluster and 4 days on the other, on the exact same hardware, for the exact same data. (The data is the same because they each replicate their data to the other site for disaster recovery purposes.) The difference is that one system has more normal activity than the other, so scrubs have to yield to that. We've never noticed any sort of performance problem while scrubs are running.

Our storage is from Nexenta (whose system is Illumos-based, not Linux, but still OpenZFS). Providing ZFS storage to enterprises is essentially their entire business, and they recommend scrubs. Oracle, the creator of ZFS, has traditionally (and recently) recommended "at least quarterly" scrubs as a general rule of thumb if you don't know where to start.
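The "at least quarterly" rule of thumb is easy to automate with a cron entry; the file name and pool name here are illustrative:

```shell
# /etc/cron.d/zfs-scrub (hypothetical file name)
# Scrub the pool "tank" at 02:00 on the first day of each quarter
0 2 1 1,4,7,10 * root /sbin/zpool scrub tank
```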

In comparing this to the precedent of mdadm: a ZFS scrub would read the same amount of data or less (since MD has to read the entire disk, but ZFS only reads the blocks it has in use), but its reads would be far more random.
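To make the comparison concrete, here is what each check looks like in practice; the device names "md0" and "tank" are illustrative:

```shell
# mdadm-style check: sequentially reads every sector of every member disk
echo check > /sys/block/md0/md/sync_action
cat /proc/mdstat          # shows check progress as a percentage

# ZFS scrub: reads only allocated blocks, following the block-pointer
# tree, so it covers less data but with more random I/O
zpool scrub tank
zpool status tank
```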

I asked in #zfsonlinux and this also seemed worth noting:
(09:17:03) kash: rlaager: i do weekly scrubs everywhere
(09:25:25) kash: rlaager: my scrub zones have included AWS, i don't consider it an expensive operation