Comment 6 for bug 1548009

Richard Laager (rlaager) wrote :

A false negative, where an objectively corrupt block is treated as valid, is not ideal, but not harmful. The scrub would fail to correct the error, but it wouldn't make it worse. It would be detected as bad on the next read (scrub or otherwise).

There's also the case where a bad block is overwritten and the replacement data gets corrupted on the way to disk. Again, that's less than ideal, but the scrub hasn't made anything worse: the block was bad before and is still bad. Since block pointers can't be overwritten in place, the block pointer hasn't changed, and thus the checksum hasn't changed. So the block will be detected as bad on the next read/scrub.

A false positive, where an objectively valid block is treated as corrupt, is where things have the potential to go bad. If the scrub overwrites that block and the good data makes it to disk, it's a no-op, which isn't harmful. If the scrub overwrites that block and the good data is corrupted on its way to disk, that's the only case where a scrub makes things worse. As before, the checksum hasn't changed, so we'll know the block is bad. And we obviously had a good copy from which to overwrite, so there's still the potential to repair this on the next read (scrub or otherwise).
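
To illustrate why a corrupted repair write still gets caught, here is a minimal Python sketch. It is a toy model, not ZFS code: `checksum` uses SHA-256 from `hashlib` purely as a stand-in, and `pointer_checksum` is a made-up name for the checksum held in the unchanged block pointer.

```python
import hashlib


def checksum(data: bytes) -> bytes:
    # Stand-in checksum for illustration only; not what ZFS necessarily uses.
    return hashlib.sha256(data).digest()


good = b"block contents"

# Hypothetical name: the checksum recorded in the parent block pointer.
# The repair write does not touch it, so it still describes the good data.
pointer_checksum = checksum(good)

# Simulate the repair write being corrupted in flight: flip one bit.
on_disk = bytes([good[0] ^ 0x01]) + good[1:]

# The next read (scrub or otherwise) verifies against the old checksum.
detected = checksum(on_disk) != pointer_checksum
print("corruption detected on next read:", detected)
```

The point is only that the checksum lives outside the rewritten data, so damaging the data cannot make the block validate.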

So it is technically possible that a scrub could make something worse, but it requires the following:
1) Good data fails checksum validation due to a memory fault.
2) Another copy exists. That copy passes checksum validation.
3) The write is corrupted due to a memory fault on its way to disk.

If we're seeing enough memory faults to trigger this, the filesystem is already in serious danger from normal writes.
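
The case analysis above can be checked by brute force. The following Python sketch is a toy model under stated assumptions, not the actual scrub logic: `scrub_block` and its three boolean inputs are hypothetical names, a verification fault is modeled as simply flipping the verdict, and the other copy is assumed to always pass validation.

```python
from itertools import product


def scrub_block(disk_good: bool, verify_fault: bool, write_fault: bool) -> bool:
    """Return True if the on-disk copy is good after one scrub pass."""
    # A memory fault during verification flips the verdict:
    # good data looks bad (false positive), bad data looks good (false negative).
    judged_good = disk_good != verify_fault
    if judged_good:
        # The scrub leaves the block alone, so its state is unchanged.
        return disk_good
    # Judged bad: rewrite from the other copy (assumed to pass validation).
    # The result is good unless the write is corrupted on its way to disk.
    return not write_fault


cases = []
for disk_good, verify_fault, write_fault in product([True, False], repeat=3):
    after = scrub_block(disk_good, verify_fault, write_fault)
    cases.append((disk_good, verify_fault, write_fault, after))

# "Worse" means the block was good before the scrub and bad after it.
worse = [c for c in cases if c[0] and not c[3]]
print(worse)
```

Of the eight combinations, the only one in `worse` is good data, a verification fault, and a write fault together, which matches conditions 1 through 3.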