Characterizing workloads in Data ONTAP

One of the most important things when analyzing Data ONTAP performance is being able to characterize the workload that you’re seeing. This can be done a couple of ways, but I’ll outline one here. Note that this is not a primer on analyzing storage bottlenecks — for that, you’ll have to attend Insight this year! — but rather a way to see how much data a system is serving out and what kind of data it is.

When it comes to characterizing workloads on a system, everyone has a different approach — and that’s totally fine. I tend to start with a few basic questions:

  1. Where is the data being served from?
  2. How much is being served?
  3. What protocols are doing the most work?
  4. What kind of work are the protocols doing?

You can do this with the the stats command if you know what counters to look for. If you find a statistic and you don’t know what it means, we have a good help system at hand. To wit and to use stats explain counters:

# rsh charles "priv set -q diag; stats explain counters readahead:readahead:incore"
Counters for object name: readahead
Name: incore
Description: Total number of blocks that were already resident
Properties: delta
Unit: none

Okay, so that’s not the greatest explanation, but it’s a start! More commonly-used counters are going to have better descriptions.

When it comes to analyzing your workload, a good place to start are the read_io_type counters in the <ttwafl object. These statistics are not normally collected, so you have to start counting and stop counting them. Here’s an example on a 7-mode system — I’m going to use rsh non-interactively because there are a lot of counters!

# rsh charles "priv set -q diag; stats show wafl:wafl:read_io_type"
wafl:wafl:read_io_type.cache:0%
wafl:wafl:read_io_type.ext_cache:0%
wafl:wafl:read_io_type.disk:0%
wafl:wafl:read_io_type.bamboo_ssd:0%
wafl:wafl:read_io_type.hya_hdd:0%
wafl:wafl:read_io_type.hya_cache:0%

Here is an explanation of these counters:

  • cache are reads served from the Filer’s RAM
  • ext_cache are reads served from the Filer’s Flash Cache
  • disk are reads served from any model of spinning disk
  • bamboo_ssd are reads served from any model of solid-state disk
  • hya_hdd are reads served from the spinning disks that are part of a Flash Pool
  • hya_cache are reads served from the solid-state disks that are part of a Flash Pool

Note that these counters are present regardless of your system’s configuration. For example, the Flash Cache counters will be visible even if you don’t have Flash Cache installed.

Now that you know where your reads are being served from, you might be curious as to how big the reads are, and if they’re random or sequential. Those statistics are available, but it takes a bit more grepping and cutting to get them. Here’s a little from my volume, sefiles. Note that I need to prefix this capture with priv set:

# rsh charles "priv set -q diag; stats start -I perfstat_nfs"
# rsh charles "priv set -q diag; stats stop -I perfstat_nfs"

Okay, so there’s a lot of output there. It’s data for the entire system. That said, it’s a great way to learn — you can see everything the system sees. And if you dump it to a text file, it will save you using rsh over and over again! Let’s use rsh again and drill down to the volume stats, which is the most useful way to investigate a particular volume:

# rsh charles "priv set -q diag; stats show volume:sefiles"
volume:sefiles:instance_name:sefiles
volume:sefiles:instance_uuid:a244800b-5506-11e1-91cc-123478563412
volume:sefiles:parent_aggr:aggr1
volume:sefiles:avg_latency:425.77us
volume:sefiles:total_ops:93/s
volume:sefiles:read_data:6040224b/s
volume:sefiles:read_latency:433.44us
volume:sefiles:read_ops:92/s
volume:sefiles:write_data:0b/s
volume:sefiles:write_latency:0us
volume:sefiles:write_ops:0/s
volume:sefiles:other_latency:12.17us
volume:sefiles:other_ops:1/s
volume:sefiles:nfs_read_data:6040224b/s
volume:sefiles:nfs_read_latency:433.44us
volume:sefiles:nfs_read_ops:92/s
volume:sefiles:nfs_write_data:0b/s
volume:sefiles:nfs_write_latency:0us
volume:sefiles:nfs_write_ops:0/s
volume:sefiles:nfs_other_latency:15.09us
volume:sefiles:nfs_other_ops:0/s

I’ve removed some of the internal counters for clarity. Now, let’s take a look at a couple of things: the volume name is listed as sefiles in the instance_name variable. During the time I was capturing statistics, the average amount of operations per second was 93 (“total_ops“). Read latency (“read_latency“) was 433us, which is 0.433ms. Notice that we did no writes ("write_ops") and basically no metadata either ("other_ops").

If you’re interested in seeing what the system is doing in general, I tend to go protocol-by-protocol. If you grep for nfsv3, you can start to wade through things:

# rsh charles "priv set -q diag; stats show nfsv3:nfs:nfsv3_read_latency_hist"
nfsv3:nfs:nfsv3_read_latency_hist.0 - <1ms:472
nfsv3:nfs:nfsv3_read_latency_hist.1 - <2ms:7
nfsv3:nfs:nfsv3_read_latency_hist.2 - <4ms:10
nfsv3:nfs:nfsv3_read_latency_hist.4 - <6ms:8
nfsv3:nfs:nfsv3_read_latency_hist.6 - <8ms:7
nfsv3:nfs:nfsv3_read_latency_hist.8 - <10ms:0
nfsv3:nfs:nfsv3_read_latency_hist.10 - <12ms:1

During our statistics capture, approximately 472 requests were served at 1ms or better. 7 requests were served between 2ms and 1ms. 10 requests were served between 4ms and 2ms, and so on. If you keep looking, you’ll see write latencies as well:

# rsh charles "priv set -q diag; stats show nfsv3:nfs:nfsv3_write_latency_hist"
nfsv3:nfs:nfsv3_write_latency_hist.0 - <1ms:1761
nfsv3:nfs:nfsv3_write_latency_hist.1 - <2ms:0
nfsv3:nfs:nfsv3_write_latency_hist.2 - <4ms:2
nfsv3:nfs:nfsv3_write_latency_hist.4 - <6ms:1
nfsv3:nfs:nfsv3_write_latency_hist.6 - <8ms:0
nfsv3:nfs:nfsv3_write_latency_hist.8 - <10ms:0
nfsv3:nfs:nfsv3_write_latency_hist.10 - <12ms:0

So that’s 1761 writes at 1ms or better. Your writes should always show excellent latencies as writes are acknowledged once they’re committed to NVRAM — the actual writing of the data to disk will occur at the next consistency point.

Keep going and you’ll see latency figures that are concatenated for all types of requests — reads, writes, other (i.e. metadata):

# rsh charles "priv set -q diag; stats show nfsv3:nfs:nfsv3_latency_hist"
nfsv3:nfs:nfsv3_latency_hist.0 - <1ms:2124
nfsv3:nfs:nfsv3_latency_hist.1 - <2ms:5
nfsv3:nfs:nfsv3_latency_hist.2 - <4ms:16
nfsv3:nfs:nfsv3_latency_hist.4 - <6ms:18
nfsv3:nfs:nfsv3_latency_hist.6 - <8ms:13
nfsv3:nfs:nfsv3_latency_hist.8 - <10ms:6
nfsv3:nfs:nfsv3_latency_hist.10 - <12ms:1

Once you’ve established latencies, you may be interested to see the I/O sizes of the actual requests. Behold:

# rsh charles "priv set -q diag; stats show nfsv3:nfs:nfsv3_read_size_histo"
nfsv3:nfs:nfsv3_read_size_histo.0-511:0
nfsv3:nfs:nfsv3_read_size_histo.512-1023:0
nfsv3:nfs:nfsv3_read_size_histo.1K-2047:0
nfsv3:nfs:nfsv3_read_size_histo.2K-4095:0
nfsv3:nfs:nfsv3_read_size_histo.4K-8191:328
nfsv3:nfs:nfsv3_read_size_histo.8K-16383:71

These statistics are represented like the latencies were: 0 requests were served that were between 0-511 bytes in size, 0 between 512 & 1023 bytes, and so on; then, 328 requests were served that were between 4KB & 8KB in size and 71 requests between 8KB & 16KB.

When it comes to protocol operations, we break things down by the unique operation functions — including metadata operations. I’ll show the count numbers, but it’ll also show percentages of requests and individual latencies for each:

# rsh charles "priv set -q diag; stats show nfsv3:nfs:nfsv3_op_count"
nfsv3:nfs:nfsv3_op_count.getattr:4
nfsv3:nfs:nfsv3_op_count.setattr:0
nfsv3:nfs:nfsv3_op_count.lookup:0
nfsv3:nfs:nfsv3_op_count.access:1
nfsv3:nfs:nfsv3_op_count.readlink:0
nfsv3:nfs:nfsv3_op_count.read:424
nfsv3:nfs:nfsv3_op_count.write:1767
nfsv3:nfs:nfsv3_op_count.create:0
nfsv3:nfs:nfsv3_op_count.mkdir:0
nfsv3:nfs:nfsv3_op_count.symlink:0
nfsv3:nfs:nfsv3_op_count.mknod:0
nfsv3:nfs:nfsv3_op_count.remove:0
nfsv3:nfs:nfsv3_op_count.rmdir:0
nfsv3:nfs:nfsv3_op_count.rename:0
nfsv3:nfs:nfsv3_op_count.link:0
nfsv3:nfs:nfsv3_op_count.readdir:0
nfsv3:nfs:nfsv3_op_count.readdirplus:0

In this case, most of my requests were reads & writes; there was very little metadata operation (just 4 attribute lookups). If you want to know what these operations represent, you can read some of my previous blog entries or check out the RFC that defines your protocol.

One last thing that I forgot to add is trying to figure out if your workload is random or sequential. The easiest way that I’ve found to do that is to look in the readahead stats, as the readahead engine is a particularly useful way of determining how your reads are operating. We don’t need to focus on writes as much because any write much larger than 16KB is coalesced in memory and written sequentially — even if it started as random at the client. (I’ll talk about RAVE, the new readahead engine in DOT 8.1+, in later posts.)

I’ve filtered out some of the irrelevant stuff. So let’s gather what we can use and see:

# rsh charles "priv set -q diag; stats show readahead:readahead:total_read_reqs"
readahead:readahead:total_read_reqs.4K:629
readahead:readahead:total_read_reqs.8K:77
readahead:readahead:total_read_reqs.12K:0
readahead:readahead:total_read_reqs.16K:4
readahead:readahead:total_read_reqs.20K:1
readahead:readahead:total_read_reqs.24K:7
readahead:readahead:total_read_reqs.28K:8
readahead:readahead:total_read_reqs.32K:0
readahead:readahead:total_read_reqs.40K:0
readahead:readahead:total_read_reqs.48K:0
readahead:readahead:total_read_reqs.56K:0
readahead:readahead:total_read_reqs.64K:77662

These statistics work much the same as our previous ones. 629 read requests were serviced that were 4KB blocks, 77 were serviced that were 8KB blocks, etc. The numbers continue right up to 1024KB. (Keep this in mind whenever someone tells you that WAFL & ONTAP can only read or write 4KB at a time!) Now if you want to see sequentiality:

# rsh charles "priv set -q diag; stats show readahead:readahead:seq_read_reqs"
readahead:readahead:seq_read_reqs.4K:36%
readahead:readahead:seq_read_reqs.8K:46%
readahead:readahead:seq_read_reqs.12K:0%
readahead:readahead:seq_read_reqs.16K:0%
readahead:readahead:seq_read_reqs.20K:0%
readahead:readahead:seq_read_reqs.24K:0%
readahead:readahead:seq_read_reqs.28K:0%
readahead:readahead:seq_read_reqs.32K:0%
readahead:readahead:seq_read_reqs.40K:0%
readahead:readahead:seq_read_reqs.48K:0%
readahead:readahead:seq_read_reqs.56K:0%
readahead:readahead:seq_read_reqs.64K:96%

We can see that, of the 4KB reads that we serviced, 36% were sequential. Mathematically this means that ~64% would be random, which we can confirm here:

# rsh charles "priv set -q diag; stats show readahead:readahead:rand_read_reqs"
readahead:readahead:rand_read_reqs.4K:63%
readahead:readahead:rand_read_reqs.8K:53%
readahead:readahead:rand_read_reqs.12K:0%
readahead:readahead:rand_read_reqs.16K:100%
readahead:readahead:rand_read_reqs.20K:100%
readahead:readahead:rand_read_reqs.24K:100%
readahead:readahead:rand_read_reqs.28K:100%
readahead:readahead:rand_read_reqs.32K:0%
readahead:readahead:rand_read_reqs.40K:0%
readahead:readahead:rand_read_reqs.48K:0%
readahead:readahead:rand_read_reqs.56K:0%
readahead:readahead:rand_read_reqs.64K:3%

Here, of our 4KB requests, 63% of them are random. Of our 8KB requests, 53% were random. Notice that the random percentages and sequential percentages for each block histogram will add up to ~100% depending on rounding. If you dig deeper into the stats, readahead can provide you with some useful information about how your data came to be read:

# rsh charles "priv set -q diag; stats readahead:readahead"
readahead:readahead:requested:2217395
readahead:readahead:read:82524
readahead:readahead:incore:25518
readahead:readahead:speculative:79658
readahead:readahead:read_once:82200

You can interpret those numbers thusly. Again, I’ve removed the irrelevant stuff:

  • requested is the sum of all blocks requested by the client
  • read are the blocks actually read from disk (where the readahead engine had a “miss”)
  • incore are the blocks read from cache (where the readahead engine had a “hit”)
  • speculative are blocks that were readahead because we figured the client might need them later in the I/O stream
  • read_once are blocks that were not read ahead and was not cached because it didn’t match the cache policy (e.g. flexscale.enable.lopri_blocks being off)

With this information, you should have a pretty good idea about how to characterize what your Filer is doing for workloads. Although I’ve focused on NFSv3 here, the same techniques can be applied to the other protocols. If you’ve got any questions, please holler!

Advertisements