zilstat

How do you know if a separate ZIL log device will help your ZFS performance? 

A while back, I had a workload which I suspected had a fair amount of synchronous writes on a ZFS file system. The general recommendation for this case is to use a separate ZIL log, preferably on a device which has a nonvolatile write cache or a write-optimized SSD. However, once you add a separate log to a pool, you cannot easily remove it. So before you make that move, it is a better idea to look at your workload and answer the question, "how much ZIL write activity is generated by my workload?"

I posed that question to Roch, and quickly received a short dtrace script which basically sums the size of the data being sent to the ZIL over an interval. I took that one-liner and transformed it into a more useful tool I call zilstat.

Usage: zilstat.ksh [gMt][-l linecount] [-p poolname] [interval [count]]
    -M  # print numbers as megabytes (base 10)
    -t  # print timestamp
    -p poolname      # only look at poolname
    -l linecount    # print header every linecount lines (default=only once)
    interval in seconds or "txg" for transaction group commit intervals
             note: "txg" only appropriate when -p poolname is used
    count will limit the number of intervals reported

    examples:
        zilstat.ksh # default output, 1 second samples
        zilstat.ksh 10  # 10 second samples
        zilstat.ksh 10 6    # print 6 x 10 second samples
        zilstat.ksh -p rpool    # show ZIL stats for rpool only

    output:
        [TIME]
        N-Bytes    - data bytes written to ZIL over the interval
        N-Bytes/s  - data bytes/s written to ZIL over ther interval
        N-Max-Rate - maximum data rate during any 1-second sample
        B-Bytes    - buffer bytes written to ZIL over the interval
        B-Bytes/s  - buffer bytes/s written to ZIL over ther interval
        B-Max-Rate - maximum buffer rate during any 1-second sample
        ops        - number of synchronous iops per interval
        <=4kB      - number of synchronous iops <= 4kBytes per interval
        4-32kB     - number of synchronous iops 4-32kBytes per interval
        >=32kB     - number of synchronous iops >= 32kBytes per interval
    note: data bytes are actual data, total bytes counts buffer size

The main question to be answered by zilstat is, “how much write activity is going to the ZIL?” If you run your workload and the results remain all balls (zeros), then it is a good indication that your workload would not benefit from a separate log because the workload does not generate synchronous writes. However, if you are seeing a lot of activity, then perhaps your workload would benefit from a separate log. In general, NFS and database servers are the beneficiaries of a separate log.

Here is a sample output from my desktop. I apologize if this wraps on your browser.

# ./zilstat.ksh -t 60
TIME                    N-Bytes  N-Bytes/s N-Max-Rate    B-Bytes  B-Bytes/s B-Max-Rate    ops  <=4kB 4-32kB >=32kB
2009 Feb  6 14:25:22          0          0          0          0          0          0      0      0      0      0
2009 Feb  6 14:26:22     287368       4789     283064     618496      10308     524288      7      0      2      5
2009 Feb  6 14:27:22      10304        171      10304      94208       1570      94208      2      0      0      2
2009 Feb  6 14:28:22     223120       3718     213152     618496      10308     544768      7      0      2      5
2009 Feb  6 14:29:22          0          0          0          0          0          0      0      0      0      0
2009 Feb  6 14:30:22       3336         55       1264      28672        477      12288      7      7      0      0
2009 Feb  6 14:31:22          0          0          0          0          0          0      0      0      0      0
2009 Feb  6 14:32:22     149768       2496     145480     294912       4915     282624      8      1      2      5
2009 Feb  6 14:33:22        248          4        248       4096         68       4096      1      1      0      0

From this sample you can see that the amount of data being synchronously written by the applications is rather modest. This data is fit into buffers which is what is actually sent to storage. For my activity this afternoon, the maximum rate of buffer bytes written peaked at 544,768 in a one-second sample. I really didn't notice this as a performance impact, mostly because it is my desktop machine, and it already performs pretty well. Your workload might be very different and zilstat might be able to give you an idea of how much data would be written to a separate log. Clearly, if the numbers are mostly zero, most of the time, then a separate log will likely not help the performance of your workload.

In some cases, you may want to see the ZIL activity per transaction group (txg) commit. This will provide a better idea of the separate log device size. To see per-txg measurements, use the string "txg" instead of a number for the interval. You will also need to specify a pool with the txg option because there may be multiple active pools, and I'm not sure that there is a reasonable way to present that data concisely. If you want to look at multiple pools with the txg option concurrently, just run two instances of zilstat. Here is an example of txg output.

# ./zilstat.ksh -p rpool txg
waiting for txg commit...
       txg    N-Bytes  N-Bytes/s N-Max-Rate    B-Bytes  B-Bytes/s B-Max-Rate    ops  <=4kB 4-32kB >=32kB
   3873100          0          0          0          0          0          0      0      0      0      0
   3873101          0          0          0          0          0          0      0      0      0      0
   3873102      24160        805      24160      45056       1501      45056      6      1      5      0
   3873103          0          0          0          0          0          0      0      0      0      0

I find that using the txg view with timestamps is very useful. However, the output is too long to fit nicely on a web page example.

Note that the time interval for txg commits is variable. By design, this should be no more than approximately 30 seconds. However, you may see them occur more frequently. For example, if you are using the Time Slider feature of OpenSolaris, then when ZFS takes snapshots it forces a txg commit. This is another reason why recording timestamps with the -t option is useful.

Note: a packaging bug in OpenSolaris may affect your ability to run DTrace scripts. If you run zilstat and DTrace complains about "Invalid library dependency in /usr/lib/dtrace/iscsit.d: /usr/lib/dtrace/iscsi.d" then you will need to install the SUNWiscsitgt package. See CR 6855460 for details.

Č
ċ
zilstat.ksh
(9k)
Richard Elling,
Jul 24, 2009, 2:11 PM
Comments