Why cache is better than tiering for performance

When it comes to adjusting storage performance (whether increasing or decreasing it), two approaches are common: caching and tiering. Caching refers to a process whereby commonly accessed data gets copied into the storage controller’s high-speed, solid-state cache. A client request for cached data therefore never needs to hit the storage’s disk arrays at all; it is simply served right out of the cache. As you can imagine, this is very, very fast.

Tiering, in contrast, refers to the movement of a data set from one set of disk media to another, whether from slower to faster disks (for high-performance, high-importance data) or from faster to slower disks (for low-performance, low-importance data). For example, your high-performance data might live on an SSD, FC, or SAS disk array, while your low-performance data may only require the IOps that relatively low-performance SATA disks can provide.
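To make that concrete, here is a minimal sketch of a tier-placement policy in Python. The tier names and IOps thresholds are hypothetical and not taken from any particular product; the point is simply that placement is a policy decision based on the performance a data set needs.

```python
# Illustrative only: a simplified tier-placement policy. Tier names and
# IOps thresholds are hypothetical, not taken from any real array.
TIERS = [
    ("SSD", 20000),   # highest-performance tier
    ("SAS", 5000),    # mid-tier FC/SAS spindles
    ("SATA", 500),    # capacity tier for cold, low-importance data
]

def choose_tier(required_iops):
    """Return the slowest (cheapest) tier that still meets the IOps need."""
    for name, max_iops in reversed(TIERS):
        if required_iops <= max_iops:
            return name
    return TIERS[0][0]  # nothing slower will do; fall back to the fastest tier

print(choose_tier(300))    # -> SATA
print(choose_tier(8000))   # -> SSD
```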

Both solutions have pros and cons. Cache is typically less configurable by the user, since its operation is managed by the storage controller. It is considerably faster, as the cache lives on the controller’s bus: it doesn’t need to traverse the disk subsystem (via SAS, FCAL, etc.) to get there, nor does it have to compete with other I/O along the disk subsystem(s). But it is also more expensive: high-grade, high-performance solid-state cache memory costs more than SSD disks. Last but not least, the cache needs to “warm up” in order to be effective, though in the real world this does not take long at all!

Tiering’s main advantage is that it is more easily tunable by the customer. However, all is not simple: in complex environments, tuning the tiering may simply be too complex to bother with. Manual tiering also relies on you being able to predict the needs of your storage and adjust tiering accordingly: how do you know today which business application will be hit hardest tomorrow? Again, in complex environments, this relatively simple question may be decidedly difficult to answer. On the positive side, tiering offers more flexibility in terms of where you put your data. Cache is cache, regardless of environment; data is either in the cache or it’s not. Tiering, on the other hand, lets you take advantage of more types of storage: SSD, FC, SAS, or SATA, depending on your business needs.

But if you’re tiering for performance (which is the focus of this blog post), then you have to deal with one big issue: the very act of tiering increases the load on your storage system! Tiering creates latency as it occurs: in order to move the data from one storage tier to another, we are generating IOps on the storage back-end to accomplish the performance increase. That is, in order to get higher performance, we’re actually hurting performance in the meantime (i.e., while the data is moving around).
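A quick back-of-the-envelope example (with entirely made-up numbers) shows why: every chunk that gets promoted has to be read from the old tier and written to the new one, and all of that traffic lands on the very back-end you are trying to relieve.

```python
# Back-of-the-envelope illustration with hypothetical numbers: moving a data
# set between tiers generates its own back-end I/O before any benefit is seen.
dataset_gb = 200            # hypothetical data set being promoted to a faster tier
io_size_kb = 64             # hypothetical chunk size used for the migration

chunks = dataset_gb * 1024 * 1024 // io_size_kb
extra_ios = chunks * 2      # each chunk is read from the old tier and written to the new one

print(f"{extra_ios:,} extra back-end I/Os before the promotion pays off")
# -> 6,553,600 extra back-end I/Os before the promotion pays off
```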

In stark contrast, caching reduces latency and increases throughput as it happens. This is because the data doesn’t really move: the first time data is requested, a cache lookup is made (and misses, since the data isn’t in the cache yet) and the data is served from disk. On its way to the client, though, the data stays in the cache for a while. If it’s requested again, another cache lookup is made (and hits, since the data is already in the cache) and the data is served from cache. And it’s served fast!
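Here is a deliberately simplified read-through cache sketch that captures the idea; it is not how NetApp, or any particular controller, implements caching. A miss falls through to disk once and populates the cache, and every subsequent read of that block is a hit served from memory.

```python
# A minimal read-through cache sketch -- the general idea only, not how any
# particular storage controller implements it.
class ReadCache:
    def __init__(self, backend_read):
        self.backend_read = backend_read   # slow path: read from the disk subsystem
        self.entries = {}                  # fast path: data already in cache

    def read(self, block_id):
        if block_id in self.entries:       # cache hit: served without touching disk
            return self.entries[block_id]
        data = self.backend_read(block_id) # cache miss: go to disk this one time
        self.entries[block_id] = data      # keep it around for subsequent reads
        return data

cache = ReadCache(backend_read=lambda block_id: f"data-for-{block_id}")
cache.read(42)   # miss: served from disk, then cached (the "warm up")
cache.read(42)   # hit: served straight from cache
```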

(It’s worth noting that NetApp’s cache solutions actually offer more than “simple” caching: we can even cache things like file/block metadata. And customers can tune their cache to behave the way they want it to.)

Below is a graph from a customer’s benchmark. It was a large SQL Server read workload, but what is particularly interesting is the behavior of the graph: throughput (in red) goes up while latency (in blue) actually drops!

If you were seeking a performance boost via tiering instead, there would be two possibilities. If your data were already tiered, throughput would go up while latency remained the same. If your data weren’t already tiered, throughput would drop and latency would rise while the data was being tiered; only after the tiering completed would you actually see an increase in throughput.

For gaining performance in your storage system, caching is simply better than tiering.

Is VMware forcing the homogenization of enterprise storage?

Stephen Foskett wrote this article about possible futures of enterprise storage; it was in turn examined further by Scott Lowe here.

Stephen’s article posits this: is VMware forcing the homogenization of enterprise storage? (If it is, will that be at the expense of its corporate parent, EMC?) Is VMware’s progress in technologies like VMFS, snapshots, etc. moving customers to make different purchasing decisions, away from their traditional storage vendors? My answer is: no. At least, not in a significant fashion.

In order for VMware’s storage offerings to prosper, they need to be able to supplant existing storage vendors that are not offering significant value-adds. Those value-adds can come in many forms: great hardware, great software, or perhaps even great pricing. The companies that don’t offer anything strategic or unique are likely to fall by the wayside, because those are the companies whose products VMware can “replace” with its own offerings.

It’s important, then, to keep in mind the key differentiators of the various storage vendors. If they have a product or a solution that fills a particular niche better than anyone else, they’re likely to survive (even against VMware acting as a storage software vendor in its own right). However, like any player in any market, if you make a poor product in a competitive industry, you’re going to be replaced. The only thing that makes VMware’s offerings uniquely potent is that they’re sometimes able to leverage the vSphere platform better than anyone else, because they literally wrote the APIs they’re hooking into.