Why efficiency matters: NetApp versus Isilon for scale-out

Disclaimer: I am a NetApp employee. The views I express are my own.

As the marketplace for scale-out offerings heats up, it’s interesting to see the approaches different companies take with their product development. The two leaders in scale-out NAS & SAN are NetApp and Isilon, and they take rather different approaches to scale-out technology when it comes to performance. In this post, I will attempt to quantify the result of those differences: how NetApp does more, with less, than a comparable Isilon offering.

In terms of reference material, I’ll be drawing from the SPECsfs2008_nfs.v3 submissions from both NetApp and Isilon. NetApp’s 4-node FAS6240 submission is here, and Isilon’s 28-node S200 submission is here. First things first: I picked the two submissions that most closely matched one another in terms of performance. As you can see from the SPECsfs2008 results overview, there are a lot of submissions to choose from. I chose NetApp’s smallest published cluster-mode offering, then looked for an Isilon submission that was roughly equivalent.

As part of this exercise, I put together list prices based on data taken from here (NetApp) and here (Isilon). I chose list prices because there is no standard discount rate from one vendor, or one customer, to another. If you have an updated list price sheet for either vendor, please let me know. Here are the results:

NetApp

  • 260,388 IOps
  • $1,086,808 list
  • 288 disks for 96TB usable

Isilon

  • 230,782 IOps
  • $1,611,932 list
  • 672 disks for 172TB usable

Doing some basic math, that’s $4.17 per IOp for NetApp and $6.98 per IOp for Isilon.

To get a roughly equivalent level of performance, Isilon needs more than twice as many disks. That brings us full circle to the point about efficiency: NetApp does more with less disk, which means significantly more performance per disk than Isilon gets.
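
For the curious, here is a minimal Python sketch that reproduces both figures – cost per IOp and IOps per disk – from nothing but the prices, throughputs and disk counts quoted above:

```python
# Reproduce the efficiency figures using only the numbers quoted
# above from the two SPECsfs2008 submissions.
systems = {
    "NetApp FAS6240 (4 nodes)": {"iops": 260_388, "list": 1_086_808, "disks": 288},
    "Isilon S200 (28 nodes)": {"iops": 230_782, "list": 1_611_932, "disks": 672},
}

for name, s in systems.items():
    dollars_per_iop = s["list"] / s["iops"]
    iops_per_disk = s["iops"] / s["disks"]
    print(f"{name}: ${dollars_per_iop:.2f}/IOp, {iops_per_disk:.0f} IOps/disk")

# NetApp FAS6240 (4 nodes): $4.17/IOp, 904 IOps/disk
# Isilon S200 (28 nodes): $6.98/IOp, 343 IOps/disk
```

Per spindle, that works out to roughly 904 IOps for NetApp against 343 for Isilon – each NetApp disk delivered more than two and a half times the work.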

“But wait”, I hear you cry, “NetApp has FlashCache! That’s cheating, because you’re driving up the IOps of the disks via cache!” It’s true – the submission did include FlashCache: 2TB in total, 512GB in each of the four FAS6240 controllers. But Isilon’s submission had solid-state media of its own: 5.6TB in total, from a 200GB SSD in each of the 28 nodes.

“But wait”, I hear you cry, “RAID-DP has all of that overhead! WAFL, snap reserve space; you must be short-stroking the disks to get those numbers!” Wrong again – we used more space than the competition did.

In order to meet roughly the same performance goal, Isilon needed to provision almost twice as much usable space (172TB versus NetApp’s 96TB). That’s hardly storage-efficient. It’s not cost-efficient either, because you have to buy all of those spindles to reach a performance goal even if you don’t have that much data to run at that speed.

“But wait”, I hear you cry, “those NetApp boxes are huge! They must be chock-full of RAM. And CPUs. And NVRAM too!” True again: each NetApp controller has 48GB of RAM, for a total of 192GB. By contrast, Isilon has 1344GB of RAM – 48GB in each of the 28 nodes. Isilon does have slightly less NVRAM (14GB in total) than NetApp (16GB).

“But wait”, I hear you cry, “NetApp requires 10Gb Ethernet for that performance!” Yes – and so does Isilon. Let’s look at efficiency not only in the nodes themselves, but also in the load-generating clients.

Although 10Gb Ethernet switch ports are coming down in price, they’re still not particularly cheap. And look at the client throughput: Isilon struggled to get more than 10,000 IOps out of each client, which means you have to scale out your client architecture as well – which, of course, means more money.
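
As a rough, purely illustrative estimate – assuming each load generator really does top out around 10,000 IOps, as the Isilon submission suggests – the client fan-out needed to drive this benchmark looks like this:

```python
import math

# Rough fan-out estimate: how many load-generating clients are needed
# to drive a given IOps target, assuming a per-client ceiling of
# ~10,000 IOps (the figure suggested above; your hardware may differ).
def clients_needed(target_iops: int, per_client_iops: int = 10_000) -> int:
    return math.ceil(target_iops / per_client_iops)

print(clients_needed(230_782))  # 24 clients, plus the switch ports to match
```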

“But wait”, I hear you cry, “NetApp is still going to use more power and space with their big boxes!” Not true. Here are the environmental specs for both:

Isilon did use less rack space (56RU) than NetApp (64RU). The environmental data were taken from here (Isilon) and here (NetApp).

Every single graph pictured above was compiled with data taken only from the three sources listed: the SPECsfs submissions themselves, the list-price sheets and the environmental specifications – all found via Google. I will gladly provide the .xls file from which I generated the graphs if anyone is interested.

Thoughts and comments are welcome!


16 thoughts on “Why efficiency matters: NetApp versus Isilon for scale-out”

  1. Why does the cluster of four 6240s show roughly a 30% performance hit on the same SPECsfs benchmark as a standalone 6240? Why would one waste 30% of their hardware spend for no change in operational features?

    Reply
    • Hi all, Dimitris from NetApp here.

      @ the mysterious Dr. Scale-out:

      It’s a good thing for the other scale-out vendors that they don’t have stand-alone nodes to show in benchmarks I guess 🙂

      Any clustering implementation exacts some sort of performance penalty.

      Read my analysis here:

      http://bit.ly/K2FBz1

      We made the cluster accesses use indirect, unoptimized paths on purpose, in order to show a worst-case scenario and be realistic.

      In effect, nearly all the I/O went over the cluster interconnects, instead of going directly out of the nodes that owned the data.

      Still kicked butt.

      If we hadn’t gone the unoptimized route, people would complain that we had cheated, and the FUD cycle would continue.

      D

      Reply
  2. Why was only half of the available capacity exported for use? Doesn’t this drive up the cost per usable TB? Is this how customers are expected to use the products as well? Is it worth getting 30% less performance and 50% less capacity utilization for the CAPEX spend on each filer when using NetApp for scale-out?

    Reply
    • Only half of the capacity was exported because that was all that was needed to reach the IOps number reported in the benchmark. I don’t know how familiar you are with the SPECsfs benchmark, but for every op of load generated, there is a corresponding amount of data written to disk. Hence Isilon used less disk space than NetApp did, because Isilon produced fewer IOps than NetApp.
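
      To make that concrete, here is a small sketch of the relationship. It assumes the roughly 120MB of file-set data per op/s of requested load that the SPECsfs2008 run rules describe – treat that constant as an approximation, not gospel:

      ```python
      # SPECsfs2008 sizes its file set in proportion to the requested load:
      # roughly 120MB of data per op/s (approximate figure from the run
      # rules; treat it as an assumption here).
      MB_PER_OPS = 120

      def fileset_size_tb(requested_ops: int) -> float:
          """Approximate file-set size, in decimal TB, for a requested load."""
          return requested_ops * MB_PER_OPS / 1_000_000

      # Using the achieved throughputs above as a stand-in for requested load:
      print(f"NetApp: ~{fileset_size_tb(260_388):.0f}TB")  # ~31TB
      print(f"Isilon: ~{fileset_size_tb(230_782):.0f}TB")  # ~28TB
      ```

      More IOps mechanically means more data written, which is why the faster system touched more space.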

      Also, I don’t know whether you’re representing a vendor or a VAR, but in the interest of transparency, please say so if you have ties to one.

      Reply
  3. Hmmm . . . NetApp required 8 file systems/volumes (4 data, 4 OS). Isilon used 1 file system/1 volume.

    How do you balance performance and capacity across 4 volumes and aggregates located on 4 different filers?

    As in ONTAP 7-mode, a file can only reside within a single volume, locked within a single aggregate, locked on a single storage controller. Admins are still required to manage file placement within the controllers’ aggregates, and monitor the space and performance utilization of each of these 4 discrete volumes. In a real-world customer environment, there will certainly be more aggregates and many more volumes, amplifying the management tasks.

    C-mode simply clusters scale-up filers and, along with them, all of their constraints and operational inefficiencies. Really, how does this benefit customers requiring scale-out?

    Reply
    • Yes, Isilon did require one data filesystem whereas NetApp required four. I’m not counting the root volumes, because they’re meaningless here – there’s simply nothing to manage in them, and their separation is beneficial in every other respect.

      Admins are required to manage aggregate and volume placement, but not file placement. I’m not sure how familiar you are with OnCommand System Manager, but managing aggregates and volumes is hardly taxing. Indeed, the ability to manage both actually lets us throw away previous constraints — for example, moving volumes from one controller to another (hot) allows for the controller to be removed, upgraded or even retired.

      Which other constraints do you think still exist? Part of the reason cluster-mode is such a milestone for NetApp is because so many of the previous constraints are now gone. If those constraints were still around, it would have been released a lot sooner 🙂

      Reply
      • “Admins are required to manage aggregate and volume placement, but not file placement. ”

        If you are not concerned with file placement… you are a joke of a storage admin. The burden cannot fall on the user and/or application alone to deal with data management and expected performance.
        If you are not part of the process of building a complete working system… why are you involved at all? Storage admins have to be concerned with more than storage management.
        Filesystems, and expectations of those filesystems with regard to data management and performance, matter as well.

      • Hi Charles, thanks for posting. There is a difference between “required to manage” and “it’s good to manage”, and the statement was supposed to convey the lack of requirement.

        That said, I would argue that (on NetApp) file placement isn’t as important as file management. For example, it doesn’t matter so much if you put a file in /vol/foo or /vol/foo/bar, but it does matter if /vol/foo (or /vol/foo/bar) contains 15 million files.

        I wholeheartedly agree with your last sentence & couldn’t have said it better myself.

  4. Wow! 48TB from a cluster that can conceivably scale to 8PB of capacity. Found a benchmark sweet spot, did we? Talk about a waste of CAPEX.

    Reply
