ZFS Tricks: Scheduling Scrubs

Content mirrored at https://blogs.oracle.com/storageops/entry/zfs_trick_scheduled_scrubs

A frequently asked question on ZFS Appliance-related mailing lists is "How often should I scrub my disk pools?"  The answer is often quite challenging, because it really depends on you and your data.

Usually, when asked a question, I like to first answer the questions that should have been asked before it, so that I'm certain our shared conversational contexts match up. So here are some background questions that we should have answers to before tackling the "How often" question.

What is a scrub?

To "scrub" a disk means to read data from all disks in all vdevs in a pool. This process compares blocks of data against their checksums; if any of the blocks don't match the related checksum, ZFS assumes that data has been corrupted (bit rot happens to every form of storage!) and will look for valid copies of the data. If found, it'll write a good copy of the data to that storage, marking the old copy as "bad".
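
The read-verify-repair cycle described above can be sketched as a toy model. This is not ZFS code: the FNV-1a checksum, the two-way mirror layout, and every name below (checksum, makeBlock, scrub) are assumptions made purely for illustration.

```javascript
// Toy model of a scrub over a two-way mirror. This is NOT ZFS code:
// the checksum (FNV-1a) and the data layout are illustrative assumptions.

// 32-bit FNV-1a hash, standing in for a real block checksum.
function checksum(text) {
    var hash = 0x811c9dc5, i;
    for (i = 0; i < text.length; i += 1) {
        hash ^= text.charCodeAt(i);
        hash = Math.imul(hash, 0x01000193) >>> 0;
    }
    return hash;
}

// Each block keeps its expected checksum plus one copy per mirror side.
function makeBlock(data) {
    return {sum: checksum(data), copies: [data, data]};
}

// Read every copy of every block; rewrite bad copies from a good one.
function scrub(pool) {
    var result = {repaired: 0, unrecoverable: 0};
    pool.forEach(function (block) {
        var good = block.copies.filter(function (c) {
            return checksum(c) === block.sum;
        });
        if (good.length === 0) {
            result.unrecoverable += 1;   // no valid copy left to heal from
            return;
        }
        block.copies = block.copies.map(function (c) {
            if (checksum(c) === block.sum) {
                return c;
            }
            result.repaired += 1;        // overwrite the rotted copy
            return good[0];
        });
    });
    return result;
}

var pool = [makeBlock("alpha"), makeBlock("beta"), makeBlock("gamma")];
pool[1].copies[0] = "bexa";              // simulate bit rot on one side
pool[2].copies = ["gamna", "gamm@"];     // simulate rot on both sides
var scrubResult = scrub(pool);
console.log(scrubResult);
```

The block with one rotted copy is healed from its mirror, while the block with both copies rotted can only be reported. That second case is why the redundancy level of the pool matters so much.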

What is the benefit of a disk scrub?

Most people have a lot more "stale" data than they think they do: stuff that was written once, and never read from again. If data isn't read, there's no way to tell if it's gone bad due to bit rot or not. ZFS will self-heal data if bad data is found, so a scrub forces a read of all data in the pool to verify that it isn't currently bit-rotted, and heal the data if it is.

What performance impact is there to a scrub?

The ZFS appliance runs disk scrubs at a very low priority as a nearly-invisible background process. While there is a performance impact to scrubbing disk pools, this very low-priority background process should not have much if any impact to your environment. But the busier your appliance is with other things, and the more data is on-disk, the longer the scrub takes.

How long do scrubs run?

On a fresh system with little data and low utilization, scrubs complete very quickly.  For instance, on a brand-new, quiescent pool with 192 4TB disks, scrubs typically complete in just moments. There is no data to read, therefore the scrubs return almost as soon as we start them.

On very busy systems with very large pools and lots of I/O, it's possible for scrubs to run for months before completion. For example, a 192-disk, full-rack 7410 with 2TB drives in the Oracle Cloud recently required eight months to complete a pool scrub. The system was used around-the-clock with extreme write loads; the low quantity of RAM (256GB/head), compression (LZJB better than 2:1), and nearly-full pool (80%+) conspired to force the scrub to run extremely slowly.
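
As a rough back-of-the-envelope sketch of why load matters so much, scrub time is roughly the allocated data divided by whatever read rate the scrub actually gets. The function name and both example rates below are hypothetical; they are not measurements from any appliance.

```javascript
// Back-of-the-envelope scrub duration. Both inputs are hypothetical:
// measure your own pool's allocated data and observed scrub rate.
function estimateScrubDays(allocatedTB, effectiveMBps) {
    var bytes = allocatedTB * 1e12;              // decimal terabytes
    var seconds = bytes / (effectiveMBps * 1e6); // decimal megabytes/second
    return seconds / 86400;                      // seconds per day
}

var idleDays = estimateScrubDays(100, 500);      // quiet system (assumed rate)
var busyDays = estimateScrubDays(100, 5);        // saturated system (assumed rate)
console.log(idleDays.toFixed(1) + " days idle, " + busyDays.toFixed(1) + " days busy");
```

With these made-up numbers, a hundredfold drop in effective scrub rate turns a scrub of a couple of days into one lasting most of a year, which is the same shape as the eight-month example above.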

If the slow-running, low-impact scrub needs to complete in a shorter time than that, contact Support and ask for a workflow to prioritize your scrubs to run a little faster.  Realize, of course, that the performance impact goes up when scrubs run at a higher priority!

Should I scrub my pools?

  1. Is the pool formatted with either RAIDZ or Mirror2 configuration? Although these two options offer higher performance than RAIDZ2 or Mirror3, redundancy is lower. (No, I'm not going to talk about Stripe. That should only ever be used on a simulator; I don't even know why it exists on a ZFS appliance.)
  2. Are you unable to guarantee with 100% certainty that every byte of data in the pool is read frequently?  Note that even databases that the DBAs think of as "very busy" often have blocks of data that go un-read for years and are at risk of bit rot. Ask me how I know...
  3. Do you run restore tests of your data less frequently than once per year?
  4. Do you back up every byte of data in your pool less frequently than once per quarter?

If you answer "Yes" to any of the above questions, then you probably want to scrub your pools from time to time to guarantee data consistency.

How often should I scrub my pools?

This question is challenging for Support to answer, because as always the true answer is "It Depends".  So before I offer a general guideline, here are a few tips to help you create an answer more tailored to your use pattern.

  1. What is the expiration of your oldest backup? You should probably scrub your data at least as often as your oldest tapes expire so that you have a known-good restore point.
  2. How often are you experiencing disk failures? While the recruitment of a hot-spare disk invokes a "resilver" -- a targeted scrub of just the VDEV which lost a disk -- you should probably scrub at least as often as you experience disk failures on average in your specific environment.
  3. How often is the oldest piece of data on your disk read? You should scrub occasionally to prevent very old, very stale data from experiencing bit-rot and dying without you knowing it.

If any of your answers to the above are "I don't know", I'll provide a general guideline: you should probably be scrubbing your zpool at least once per quarter. It's a schedule that works well for most use cases, provides enough time for scrubs to complete before starting up again on all but the busiest & most heavily-loaded systems, and even on very large zpools (192+ disks) should complete fairly often between disk failures.
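
One way to turn the tips above into a number is sketched below. It is only an illustration: the function and field names are made up, it covers only the two tips that yield a concrete interval, and the 12-week fallback is chosen to match the workflow's default schedule as a stand-in for "once per quarter".

```javascript
// Fallback interval when an answer is "I don't know" (assumption: 12 weeks,
// matching the default schedule of the workflow in this article).
var DEFAULT_WEEKS = 12;

// answers.backupExpiryWeeks: age at which your oldest backup expires.
// answers.weeksBetweenDiskFailures: average gap between disk failures.
// Leave a field undefined to mean "I don't know".
function suggestScrubIntervalWeeks(answers) {
    var candidates = [], keys = ["backupExpiryWeeks", "weeksBetweenDiskFailures"];
    keys.forEach(function (key) {
        if (typeof answers[key] === "number") {
            candidates.push(answers[key]);
        } else {
            candidates.push(DEFAULT_WEEKS);   // unknown answer: use the guideline
        }
    });
    // Scrub at least as often as the most demanding answer requires.
    return Math.min.apply(null, candidates);
}

console.log(suggestScrubIntervalWeeks({backupExpiryWeeks: 8, weeksBetweenDiskFailures: 20}));
console.log(suggestScrubIntervalWeeks({backupExpiryWeeks: 26}));
```

Taking the minimum reflects the "at least as often as" wording of each tip: the shortest interval satisfies all of them at once.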

How do I schedule a pool scrub automatically?

There exists no easy mechanism to schedule pool scrubs from the BUI or CLI as of February 2015. I opened an RFE a few months back for one to be provided, but I'm not certain how far down the development pipeline such a feature is, if it will exist at all. So in Oracle IT, we just rolled our own.

The below code is an example of how this can be accomplished. It is provided as-is, with no warranty expressed or implied. Use it at your own risk.

It's been working well for us for many months. Simply copy and paste the code below into a conveniently named file, such as "safe_scrub.akwf", then upload that workflow to your appliance using the "maintenance workflows" BUI screen.  The default schedule runs once every 12 weeks on a Sunday. To match your needs, you can either change the default schedule in the source code before uploading, or adjust the schedule manually from the "maintenance workflows" context of the command-line interface after uploading.

/*globals run, continue, list, printf, print, get, set, choices,
akshDump, nas, audit, shell, appliance*/
/*jslint maxerr: 50, indent: 4, plusplus: true, forin: true */

/*
 * A workflow to initiate a scrub on a schedule.
 * Author: Matthew P. Barnson
 * Update history:
 * 2014-10-09 Initial concept
 * 2014-11-20 EIS deployment
 * 2015-02-19 Sanitized for more widespread use
 * 2015-02-19 Multiple pool functionality added by: Adam Rappner
 */

/* This program is provided 'as is' without warranty of any kind, expressed or
*  implied, including, but not limited to, the implied warranties of
*  merchantability and fitness for a particular purpose.*/

var MySchedules = [
        // Offset 3 days (Sunday), 9 hours, 00 minutes, week interval.
        // The UNIX Epoch -- January 1, 1970 -- occurred on a Thursday.
        // Therefore the ZFS appliance's week in a schedule starts on Thursday.
        // Sample offset: Every week
        //{offset: (3 * 24 * 60 * 60) + (9 * 60 * 60), period: 604800, units: "seconds"}
        // Sample offset: Every 4 weeks
        //{offset: (3 * 24 * 60 * 60) + (9 * 60 * 60), period: 2419200, units: "seconds"}
        // Sample offset: Once every 12 weeks on a Sunday
        {offset: (3 * 24 * 60 * 60) + (9 * 60 * 60), period: 7257600, units: "seconds"}
];

var workflow = {
    name: 'Scheduled Scrub',
    origin: 'Oracle PDIT mbarnson',
    description: 'Scrub on a schedule',
    version: '1.2',
    hidden: false,
    alert: false,
    setid: true,
    scheduled: true,
    schedules: MySchedules,
    execute: function (params) {
        "use strict";
        var myDate = run('date'), myReturn = "", pools = nas.listPoolNames(), p = 0;
        // Iterate over pools & start scrubs
        for (p = 0; p < pools.length; p = p + 1) {
            myDate = run('date');
            try {
                run('cd /');
                run('configuration storage set pool=' + pools[p]);
                run('configuration storage scrub start');
                myReturn += "New scrub started on pool: " + pools[p] + " ";
                audit('Scrub started on pool: ' + pools[p] + ' at ' + myDate);
            } catch (err) {
                // Any failure here is reported as a scrub already in progress.
                myReturn += "Scrub already running on pool: " + pools[p] + " ";
                audit('Scrub already running on pool: ' + pools[p] + ' at ' + myDate);
            }
        }
        return ('Scrub in progress. ' + myReturn + '\n');
    }
};
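
If you want a different day, hour, or interval, the offset and period arithmetic from the schedule comments can be wrapped in a small helper. This is a standalone sketch for working out the numbers, not part of the workflow: makeSchedule and DAYS_AFTER_THURSDAY are made-up names, and the sanity check assumes the offset is interpreted in UTC, which the workflow itself does not state.

```javascript
var HOUR = 60 * 60, DAY = 24 * HOUR, WEEK = 7 * DAY;

// Days after Thursday for each weekday, because epoch-aligned weeks
// start on Thursday (January 1, 1970).
var DAYS_AFTER_THURSDAY = {Thu: 0, Fri: 1, Sat: 2, Sun: 3, Mon: 4, Tue: 5, Wed: 6};

// Build a schedule entry in the same shape as the MySchedules entries.
function makeSchedule(weekday, hour, weeks) {
    return {
        offset: DAYS_AFTER_THURSDAY[weekday] * DAY + hour * HOUR,
        period: weeks * WEEK,
        units: "seconds"
    };
}

var quarterly = makeSchedule("Sun", 9, 12);

// Sanity check: applied to the epoch, the offset lands on a Sunday at 09:00
// (assuming UTC; your appliance's time zone handling may differ).
var firstSlot = new Date(quarterly.offset * 1000);
console.log(quarterly, firstSlot.toUTCString());
```

The values for quarterly match the default entry in the workflow: an offset of 291600 seconds and a period of 7257600 seconds.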

Happy scrubbing!