NetApp Keystone – A deep dive into ONTAP AQoS and how they drive STaaS Service Levels

I’ve been spending some time getting spun up on Keystone, and subsequently went down the rabbit hole as far as I could go. Keystone is structured around service levels. And to properly understand those service levels it helps to understand the concept of IO Density, alongside how that is implemented with ONTAP Adaptive QoS. Join me as I go down said rabbit hole through the underlying concepts and hopefully pop up on the other end as a Keystone service level expert.

I/O Density

Let’s start with I/O density. NetApp defines it as the “number of input/output operations processed per second based on the total space that is being consumed by the workload.” Within Keystone, “total space” is measured in TiB. Put simply, each TiB of allocated space comes with a defined amount of performance.

To use a simple example, let’s say we have an I/O density of 100 IOPS per TiB. If I have a 10 TiB volume, that gives me an expected performance of 1,000 IOPS. If I have a 50 TiB volume, that increases to 5,000 IOPS, a 250 TiB volume would be 25,000 IOPS, and so on.
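The arithmetic in that example is just a multiplication, and a minimal Python sketch makes it easy to try other sizes. The function name and the 100 IOPS/TiB density are purely illustrative and do not come from any NetApp tooling.

```python
def iops_from_density(iops_per_tib: float, capacity_tib: float) -> float:
    """Return the IOPS implied by an I/O density and a capacity in TiB."""
    return iops_per_tib * capacity_tib


# Hypothetical density of 100 IOPS/TiB, matching the example in the text.
for size_tib in (10, 50, 250):
    print(f"{size_tib} TiB -> {iops_from_density(100, size_tib):,.0f} IOPS")
```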

Going forward the nomenclature I’ll use for I/O density is IOPS per TiB of data, or IOPS/TiB.

Two quick extra points… NetApp might use MBps/TiB instead of IOPS/TiB, but I’ve seen the latter used more frequently during sizing and contracting. The second is that yes, I understand that every IO is different and having performance metrics without defining the basis for that IO is horrible practice. NetApp defines an IO within Keystone as “32KB block size, and a random combination of 70% read and 30% write IO operations.”
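Because the I/O size is fixed at 32KB, IOPS/TiB and MBps/TiB are two views of the same number. Here is a rough sketch of the conversion, assuming 32KB means 32 KiB and that 1 MB is treated as 1,024 KiB. Both are my assumptions, and the function name is made up for illustration.

```python
BLOCK_SIZE_KIB = 32  # Keystone's stated I/O size, treated here as 32 KiB (assumption)


def iops_to_mbps(iops: float, block_size_kib: int = BLOCK_SIZE_KIB) -> float:
    """Convert IOPS to MB per second, treating 1 MB as 1024 KiB (assumption)."""
    return iops * block_size_kib / 1024


# At a 32 KiB block size, every 1,024 IOPS moves 32 MB per second.
print(iops_to_mbps(1024))
```

Under those assumptions, 12,288 IOPS/TiB works out to 384 MBps/TiB, which lines up with the two figures in the Extreme column of the service level table later in this article.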

Adaptive Quality of Service (AQoS)

Next up on the concept docket is Adaptive QoS. First introduced in ONTAP 9.3, Adaptive QoS (which I’ll simplify to AQoS) lets users set performance minimums and maximums on a per-volume basis. Within Keystone* there are two core elements to an AQoS policy…

Expected IOPS
Peak IOPS

At first glance they’re pretty self-explanatory, but there’s some nuance that needs to be defined.

Expected IOPS is based on the I/O density applied to the provisioned (logical) capacity of a volume. For example, if your AQoS policy is set with Expected IOPS = 100, then ONTAP allocates 100 IOPS for every TiB of provisioned capacity. A 50 TiB volume would then receive 5,000 IOPS.

Expected IOPS also serves as an effective performance floor. In the above example ONTAP will always seek to allocate 5,000 IOPS of available performance regardless of how much data is being consumed within said volume.

Peak IOPS on the other hand is the maximum number of IOPS that a volume can consume. Anything over that limit and ONTAP will throttle the workload. The intent behind this setting is to prevent runaway workloads from impacting the rest of the environment.

An example here would be setting Peak IOPS to 200 per TiB. This would mean the ceiling for a 50 TiB volume would be 10,000 IOPS.

When you purchase Keystone, NetApp sets up these AQoS policies for you, and sample commands are provided in their documentation.

qos adaptive-policy-group create -policy-group <Keystone_performance> -vserver <SVM_name> -expected-iops 1024 -peak-iops 2048 -expected-iops-allocation allocated-space -peak-iops-allocation used-space -block-size 32K -absolute-min-iops 250

You can see where expected IOPS and peak IOPS are defined.

*There are other elements of AQoS which ONTAP administrators can change, such as the way IOPS are allocated. However those settings are fixed by default through Keystone so I’m not going to cover alternate uses here. There are also a ton of cool things you can do with AQoS to really fine tune performance availability that I’m reluctantly not getting into here.

**You’ll also see a setting for Absolute Minimum IOPS. Keystone uses this as a minimum threshold that is not based on I/O density. While Keystone volumes are supposed to have a minimum size of 1 TiB, I suppose there’s nothing stopping a user from creating a smaller one, and I suppose the minimum is defined just so you don’t bottom out at some unusable IOPS number.
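To make the sample policy more concrete, here is a minimal Python model of the three numbers in that command (1024 expected, 2048 peak, 250 absolute minimum). The class and method names are hypothetical. Applying the absolute minimum as a simple max() against the calculated floor is my reading of how the setting behaves, not a statement of ONTAP internals.

```python
from dataclasses import dataclass


@dataclass
class AdaptivePolicy:
    expected_iops_per_tib: int  # -expected-iops
    peak_iops_per_tib: int      # -peak-iops
    absolute_min_iops: int      # -absolute-min-iops

    def expected_iops(self, allocated_tib: float) -> float:
        """Floor: scales with allocated space, never below the absolute minimum."""
        return max(self.absolute_min_iops, self.expected_iops_per_tib * allocated_tib)

    def peak_iops(self, used_tib: float) -> float:
        """Raw peak figure: scales with used space."""
        return self.peak_iops_per_tib * used_tib


policy = AdaptivePolicy(expected_iops_per_tib=1024, peak_iops_per_tib=2048, absolute_min_iops=250)

# A 1 TiB volume gets the density-based floor.
print(policy.expected_iops(1.0))
# A 0.1 TiB volume would calculate to about 102 IOPS, so the absolute minimum applies.
print(policy.expected_iops(0.1))
```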

Service Levels

Now for the meat and potatoes: combining I/O density and AQoS into Keystone service levels. Keystone, as a service offering, uses defined performance levels to translate storage needs into storage as a service. These performance levels use I/O density to define the available performance capacity of Keystone, and they are implemented by the Adaptive Quality of Service functionality in ONTAP.

I hope now you understand why I took so long laying out the foundation. It took me a bit to wrap my head around everything, so don’t worry I’ll have some examples in a bit to help.

Let’s take a look at the different service levels provided through Keystone…

Service Level	Extreme	Premium	Performance	Standard	Value
Targeted Workloads	Analytics, databases, mission-critical apps	VDI, VSI, software development	OLTP, OLAP, containers, software development	File shares, web servers	Backup
Expected IOPS (per TiB)	6,144	2,048	1,024	256	64
Peak IOPS (per TiB)	12,288	4,096	2,048	512	128
Maximum MBps (per TiB)	384	128	64	16	4
Target 90th Percentile Latency	<1 ms	<2 ms	<4 ms	<4 ms	<17 ms

*S3 object storage (StorageGRID), and cloud storage (CVO) do not have any defined performance levels within Keystone and are just consumed at a straight $ per TiB.

For our first example, let’s say we create a 100 TiB volume within the Performance service level QoS policy.

100 TiB (provisioned capacity) * 1024 (Expected IOPS) = 102,400 IOPS

This is the expected performance floor that ONTAP will attempt to maintain for that volume.

What if only 25 TiB of that volume is consumed?

25 TiB (used capacity) * 2048 (Peak IOPS) = 51,200 Peak IOPS

In this situation the peak IOPS is lower than the expected IOPS, thus it becomes irrelevant. However once you get past 50% of utilization the peak number comes into play…

75 TiB (used capacity) * 2048 (Peak IOPS) = 153,600 Peak IOPS

If my math is right, with Keystone’s service levels, the Expected IOPS figure is the defining performance metric up to 50% of volume capacity. Over 50% capacity, the Peak IOPS figure comes in to provide the limiter. In other words, the effective ceiling at any point is whichever number is larger: Expected IOPS on allocated capacity or Peak IOPS on used capacity.
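The crossover can be checked with a short sketch. It uses the Performance figures from the examples above (1024 expected, 2048 peak) and assumes the effective ceiling is simply the larger of the two calculated values. The function and constant names are mine.

```python
EXPECTED_PER_TIB = 1024  # Performance level figures used in the examples above
PEAK_PER_TIB = 2048


def effective_ceiling(allocated_tib: float, used_tib: float) -> float:
    """Larger of the expected figure (allocated space) and the peak figure (used space)."""
    expected = EXPECTED_PER_TIB * allocated_tib
    peak = PEAK_PER_TIB * used_tib
    return max(expected, peak)


# Fraction of the volume that must be used before the peak figure takes over.
crossover = EXPECTED_PER_TIB / PEAK_PER_TIB

print(f"crossover at {crossover:.0%} utilization")
for used in (25, 50, 75, 100):
    print(f"{used} TiB used of 100 TiB -> {effective_ceiling(100, used):,.0f} IOPS")
```

Because every service level in the table has one IOPS figure that is exactly double the other, the same 50% crossover would apply to each of them under this assumption.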

Now What

Three pages of text into this and you’re probably wondering what’s the practical impact of this information. Well as you work with your partner/NetApp to size out a Keystone solution your primary focus should be determining an appropriate service level for each of your volumes.

The next step is planning your volume configuration. Because the allocated capacity and logical used capacity numbers determine the available performance for the volume, different configurations will provide different results.

Let’s map out the examples from before. 100 TiB volume, starting with 25 TiB logical used capacity, and the Performance service level.

At 25 TiB, the AQoS policies calculate to 102,400 expected IOPS and 51,200 peak IOPS… thus the max is 102,400 IOPS.

At 75 TiB, the AQoS policies calculate to 102,400 expected IOPS and 153,600 peak IOPS… thus the max is 153,600 IOPS.

Now what if you over provision the volume by taking it from 100 TiB to 200 TiB?

At 25 TiB, the AQoS policies calculate to 204,800 expected IOPS and 51,200 peak IOPS… thus the max is 204,800 IOPS.

At 75 TiB, the AQoS policies calculate to 204,800 expected IOPS and 153,600 peak IOPS… thus the max is 204,800.

By increasing the allocated volume capacity you create a higher expected IOPS floor. Doubling or tripling the allocation scales the available performance accordingly.

But what does this mean from a practical perspective? If you have a volume that needs a specific performance threshold, you can provision a larger volume to get more IOPS. Remember that Keystone is charged on logical used capacity, and oversubscribing volume sizes doesn’t impact overall array utilization thanks to the magic of thin provisioning.
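Turning that around, you can start from the IOPS you need and work back to a volume size. This is a rough sizing sketch only. It assumes the Performance level’s 1024 expected IOPS per TiB from the examples, whole-TiB allocations, and a hypothetical helper name.

```python
import math


def allocation_for_target(target_iops: float, expected_per_tib: int = 1024) -> int:
    """Smallest whole-TiB allocation whose expected IOPS meets the target."""
    return math.ceil(target_iops / expected_per_tib)


# The 204,800 IOPS floor from the over-provisioned example needs 200 TiB allocated.
print(allocation_for_target(204_800))
# A target that is not an exact multiple rounds up to the next TiB.
print(allocation_for_target(150_000))
```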

But Wait, There’s More!

The latency objectives defined in the service levels are based on 90th percentile measurements… in other words 90% of IOPS fall within the defined limit. Throughout the day ONTAP samples volume latency every 5 minutes. Over the course of the day those numbers are combined to form a full picture of the volume latency, and in theory, at least 90% of the IOPS are within the limit. Anything above that constitutes degraded performance.
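To illustrate the 90th percentile idea, here is a sketch that takes one day of 5-minute latency samples (288 of them) and checks them against a target. The nearest-rank method, the equal weighting of every sample, and the sample values are all my assumptions. The source does not describe how Keystone actually aggregates the measurements.

```python
import math


def percentile_nearest_rank(samples: list[float], pct: float = 90.0) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def meets_target(samples: list[float], target_ms: float) -> bool:
    """True when the 90th percentile latency is under the target."""
    return percentile_nearest_rank(samples) < target_ms


# Made-up day: 270 samples at 1.5 ms and 18 slow samples at 6.0 ms.
samples_ms = [1.5] * 270 + [6.0] * 18
print(meets_target(samples_ms, target_ms=4.0))
```

In this made-up day the slow samples account for less than 10% of the total, so the 90th percentile stays at 1.5 ms and a <4 ms target would be met.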

It’s also worth noting that only on-system latency is monitored. Latency caused by factors outside of the array, such as network or application issues, isn’t reflected in Keystone metric reporting.

Additional Resources

NetApp Keystone documentation
- Most of this article is based on that documentation, but pulled from different pages across the whole of the documentation
NetApp Keystone STaaS Service Description
NetApp KB What is Adaptive QoS and how does it work?

Publication History

August 11, 2024 – First Draft
August 26, 2024 – More than a few updates later…
August 27, 2024 – Finally public!