Getting started with NetApp Storage QoS

Storage QoS is a new feature in Clustered Data ONTAP 8.2. It is a full-featured QoS stack, which replaces the FlexShare stack of previous ONTAP versions. So what does it do and how does it work? Let’s take a look!

The administration of QoS involves two parts: policy groups, and policies. Policy groups define boundaries between workloads, and contain one or more storage objects. We can monitor, isolate and limit the workloads of storage objects at every level of granularity, from entire Vservers through whole volumes and individual LUNs all the way down to single files.

The actual policies are behavior modifiers that are applied to a policy group. Right now, we can set throughput limits based on operation counts (i.e., IOps) or throughput counts (i.e., MB/s). When limiting throughput, storage QoS throttles traffic at the protocol stack. Therefore, a client whose I/O is being throttled will see queuing in their protocol stack (e.g., CIFS or NFS) and latency will eventually rise. However, the addition of this queuing will not affect the NetApp cluster’s resources.
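
By the way, the cluster can show you that latency from its own side too. The qos statistics family includes a per-workload latency breakdown; for example:

dot82cm::> qos statistics workload latency show

Each workload’s latency is broken out by component (network, data, disk, and a QoS column for time spent queued by the throttle), which makes it easy to distinguish throttle-induced queuing from genuine back-end contention.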

In addition to the throttling of workloads, storage QoS also includes very effective measuring tools. And because QoS is “always on”, you don’t even need to have a policy group in order to monitor performance.
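
For example, to get a quick cluster-wide view of the busiest workloads without configuring anything first, there’s:

dot82cm::> qos statistics workload performance show

which reports IOps, throughput and latency per workload.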

So, let’s get started. Applying our first policy to a workload actually requires four steps:

  1. Create the policy group (with qos policy-group create...)
  2. Set a throughput limit on it (with qos policy-group modify...)
  3. Apply the policy group to a storage object such as a volume (with volume modify...)
  4. Monitor what’s going on (with qos statistics...)

Before we start, let’s verify that we’re starting from a clean slate:

dot82cm::> qos policy-group show
This table is currently empty.

Okay, good: no policy groups exist yet. Step one is to create the policy group itself, which we’ll call blog-group. In reality, you’d specify a throughput limit (either IOps or MB/s), but for now we won’t bother limiting the throughput:

dot82cm::> qos policy-group create blog-group -vserver vs0

Let’s make sure the policy group was created:

dot82cm::> qos policy-group show                    
Name             Vserver     Class        Wklds Throughput  
---------------- ----------- ------------ ----- ------------
blog-group       vs0         user-defined -     0-INF

Because we didn’t specify a throughput limit, though, the Throughput column is still showing 0 to infinity (0-INF). Let’s add a limit of 1500 IOps:

dot82cm::> qos policy-group modify -policy-group blog-group -max-throughput 1500iops

(If we wanted to limit that volume to 1500MB/s, we could have substituted 1500mb/s for 1500iops.)
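
Incidentally, if you already know the limit you want, creation and limiting can be collapsed into a single step; as far as I can tell, qos policy-group create accepts the same -max-throughput parameter:

dot82cm::> qos policy-group create blog-group -vserver vs0 -max-throughput 1500iops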

And verify:

dot82cm::> qos policy-group show                                                
Name             Vserver     Class        Wklds Throughput  
---------------- ----------- ------------ ----- ------------
blog-group       vs0         user-defined 0     0-1500IOPS

So, step three is to associate the new policy group with an actual object whose I/O we wish to throttle. The object can be one or many volumes, LUNs or files. For now though, we’ll apply it to a single volume, blog_volume:

dot82cm::> volume modify blog_volume -vserver vs0 -qos-policy-group blog-group

Volume modify successful on volume: blog_volume

Let’s confirm that it was successfully modified:

dot82cm::> qos policy-group show                                                
Name             Vserver     Class        Wklds Throughput  
---------------- ----------- ------------ ----- ------------
blog-group       vs0         user-defined 1     0-1500IOPS

Cool! We can see that Workloads has gone from 0 to 1. I’ve mounted that volume via NFS on a Linux VM, and will throw a bunch of workloads at it using dd.
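
For the curious, the load generator is nothing fancy. Here’s a minimal sketch of the dd workload, assuming the volume is NFS-mounted on the client; the TARGET path is a placeholder, and defaults to a local scratch directory purely so the sketch can be dry-run anywhere:

```shell
# Where to write; point TARGET at the NFS mount of blog_volume in real use.
# The default is a local scratch directory so this runs anywhere.
TARGET=${TARGET:-/tmp/blog_volume_test}
mkdir -p "$TARGET"

# Write 100 x 64KiB blocks of zeroes (~6.5MB); 64KiB matches the request
# size visible in the qos statistics output. conv=fsync pushes the data
# out to the server instead of leaving it in the client page cache.
dd if=/dev/zero of="$TARGET/ddfile" bs=64k count=100 conv=fsync 2>/dev/null

# Report the resulting file size in bytes.
wc -c < "$TARGET/ddfile"
```

Run several of these in parallel (or in a loop) to push the concurrency numbers up.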

While the workload is running, here’s how it looks:

dot82cm::> qos statistics workload characteristics show                         
Workload          ID     IOPS      Throughput      Request size    Read  Concurrency 
--------------- ------ -------- ---------------- --------------- ------- ----------- 
-total-              -      392         5.26MB/s          14071B     14%          14 
_USERSPACE_APPS     14      170       109.46KB/s            659B     32%           0 
_Scan_Backgro..  11702      115            0KB/s              0B      0%           0 
blog_volume-w..  11792     1679       104.94MB/s          65527B      0%           4

As you can see, our volume blog_volume is pretty busy — it’s pushing almost 1,700 IOps at over 100MB/sec. So, let’s see if the throttling is effective. First, we’ll give the policy group a low throughput maximum:

dot82cm::> qos policy-group modify -policy-group blog-group -max-throughput 100iops

Now let’s check its status:

dot82cm::> qos policy-group show
Name             Vserver     Class        Wklds Throughput  
---------------- ----------- ------------ ----- ------------
blog-group       vs0         user-defined 1     0-100IOPS

Now let’s see how the Filer is doing:

dot82cm::> qos statistics workload characteristics show
Workload          ID     IOPS      Throughput      Request size    Read  Concurrency 
--------------- ------ -------- ---------------- --------------- ------- ----------- 
-total-              -      384         6.71MB/s          18333B     11%          33 
_USERSPACE_APPS     14      169         2.50MB/s          15528B     26%           0 
_Scan_Backgro..  11702      115            0KB/s              0B      0%           0 
blog_volume-w..  11792       83         5.19MB/s          65536B      0%          15 
-total-              -      207         4.81MB/s          24348B      0%          17

You can see that the throughput has gone way down! At 83 IOps, it’s now just below our limit of 100, which is exactly what’s supposed to happen. Now let’s remove the limit and see if things return to normal:

dot82cm::> qos policy-group modify -policy-group blog-group -max-throughput none

dot82cm::> qos statistics workload characteristics show                            
Workload          ID     IOPS      Throughput      Request size    Read  Concurrency 
--------------- ------ -------- ---------------- --------------- ------- ----------- 
-total-              -     1073        44.37MB/s          43363B      8%           1 
blog_volume-w..  11792      626        39.12MB/s          65492B      0%           1 
_USERSPACE_APPS     14      302         4.90MB/s          17041B     29%           0 
_Scan_Backgro..  11702      115            0KB/s              0B      0%           0 
-total-              -      263       471.86KB/s           1837B     19%           0

Because QoS policy groups can be applied to entire Vservers, volumes, LUNs, and files, it is important to keep track of what’s applied where. This is how you’d apply a policy group to an individual volume:

dot82cm::> volume modify -vserver vs0 -volume blog_volume -qos-policy-group blog-group

(To remove the policy, set the -qos-policy-group field to none.)

To apply a policy group against an entire Vserver (in this case, Vserver vs0):

dot82cm::> vserver modify -vserver vs0 -qos-policy-group blog-group

(Again, to remove the policy, set the -qos-policy-group field to none.)
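
The same pattern extends to block storage. To assign the policy group to a LUN (the path /vol/blog_volume/lun1 here is just a placeholder; as far as I know, lun modify accepts the same -qos-policy-group field):

dot82cm::> lun modify -vserver vs0 -path /vol/blog_volume/lun1 -qos-policy-group blog-group

As before, set the field to none to remove it.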

To see which volumes are assigned our policy group:

dot82cm::> volume show -vserver vs0 -qos-policy-group blog-group                       
Vserver   Volume       Aggregate    State      Type       Size  Available Used%
--------- ------------ ------------ ---------- ---- ---------- ---------- -----
vs0       blog_volume  aggr1_node1  online     RW          1GB    365.8MB   64%

To see all volumes in all Vservers’ QoS policy groups:

dot82cm::> volume show -vserver * -qos-policy-group * -fields vserver,volume,qos-policy-group

vserver    volume       qos-policy-group 
---------- ------------ ---------------- 
vs0        blog_empty   -                
vs0        blog_volume  blog-group       

To see a Vserver’s full configuration, including its QoS policy group:

dot82cm::> vserver show -vserver vs0

                                    Vserver: vs0
                               Vserver Type: data
                               Vserver UUID: c280658e-bd77-11e2-a567-123478563412
                                Root Volume: vs0_root
                                  Aggregate: aggr1_node2
                        Name Service Switch: file, nis, ldap
                        Name Mapping Switch: file, ldap
                                 NIS Domain: newengland.netapp.com
                 Root Volume Security Style: unix
                                LDAP Client: -
               Default Volume Language Code: C
                            Snapshot Policy: default
                                    Comment: 
                 Antivirus On-Access Policy: default
                               Quota Policy: default
                List of Aggregates Assigned: -
 Limit on Maximum Number of Volumes allowed: unlimited
                        Vserver Admin State: running
                          Allowed Protocols: nfs, cifs, ndmp
                       Disallowed Protocols: fcp, iscsi
            Is Vserver with Infinite Volume: false
                           QoS Policy Group: -

To get a list of all Vservers by their policy groups:

dot82cm::> vserver show -vserver * -qos-policy-group * -fields vserver,qos-policy-group 
vserver    qos-policy-group 
---------- ---------------- 
dot82cm    -                
dot82cm-01 -                
dot82cm-02 -                
vs0        blog-group       
4 entries were displayed.

If you’re in a hurry and want to remove all instances of a policy from volumes in a particular Vserver:

dot82cm::> vol modify -vserver vs0 -volume * -qos-policy-group none

That should be enough to get us going. Stay tuned, because in the next episode I’ll show some video with iometer running!
