How do you know if a separate ZIL log device will help your ZFS performance? A while back, I had a workload which I suspected had a fair amount of synchronous writes on a ZFS file system. The general recommendation for this case is to use a separate ZIL log, preferably on a device which has a nonvolatile write cache or a write-optimized SSD. However, once you add a separate log to a pool, you cannot easily remove it. So before you make that move, it is a better idea to look at your workload and answer the question, "how much ZIL write activity is generated by my workload?" I posed that question to Roch, and quickly received a short dtrace script which basically sums the size of the data being sent to the ZIL over an interval. I took that one-liner and transformed it into a more useful tool I call zilstat.
The main question to be answered by zilstat is, “how much write activity is going to the ZIL?” If you run your workload and the results remain all balls (zeros), then it is a good indication that your workload would not benefit from a separate log because the workload does not generate synchronous writes. However, if you are seeing a lot of activity, then perhaps your workload would benefit from a separate log. In general, NFS and database servers are the beneficiaries of a separate log. Here is a sample output from my desktop. I apologize if this wraps on your browser.
From this sample you can see that the amount of data being synchronously written by the applications is rather modest. This data is fit into buffers which is what is actually sent to storage. For my activity this afternoon, the maximum rate of buffer bytes written peaked at 544,768 in a one-second sample. I really didn't notice this as a performance impact, mostly because it is my desktop machine, and it already performs pretty well. Your workload might be very different and zilstat might be able to give you an idea of how much data would be written to a separate log. Clearly, if the numbers are mostly zero, most of the time, then a separate log will likely not help the performance of your workload. In some cases, you may want to see the ZIL activity per transaction group (txg) commit. This will provide a better idea of the separate log device size. To see per-txg measurements, use the string "txg" instead of a number for the interval. You will also need to specify a pool with the txg option because there may be multiple active pools, and I'm not sure that there is a reasonable way to present that data concisely. If you want to look at multiple pools with the txg option concurrently, just run two instances of zilstat. Here is an example of txg output.
I find that using the txg view with timestamps is very useful. However, the output is too long to fit nicely on a web page example. Note that the time interval for txg commits is variable. By design, this should be no more than approximately 30 seconds. However, you may see them occur more frequently. For example, if you are using the Time Slider feature of OpenSolaris, then when ZFS takes snapshots it forces a txg commit. This is another reason why recording timestamps with the -t option is useful. Note: a packaging bug in OpenSolaris may affect your ability to run DTrace scripts. If you run zilstat and DTrace complains about "Invalid library dependency in /usr/lib/dtrace/iscsit.d: /usr/lib/dtrace/iscsi.d" then you will need to install the SUNWiscsitgt package. See CR 6855460 for details. |