VVols Matter, Just Maybe Not To Every Storage Product

There were many interesting takeaways from VMworld 2016 in Vegas, one of which was the continuing love-hate relationship that storage vendors have with VVols. The storage management technology, which shipped with vSphere 6.0, has been in the public spotlight for so long (remember the first tech previews back in 2012?) that it’s sometimes hard to even remember what it was supposed to do for VMware customers! One of the most ridiculous claims I heard from two different companies was that VVols was designed to make block storage look like NFS, so there wasn’t really any reason to use VVols if you had a good NFS product. No, I’m not going to mention the vendor names, but yes, those two.

There are three things I want to cover here. First, let’s look at what VVols is and what it isn’t. Then, we’ll look at the perception that customers aren’t interested in VVols and that adoption is slow. Finally, we’ll look at VVols and NFS and put that canard to rest.

If you want a very technical deep-dive into the architecture of VVols, there are lots of great resources out there, including this great blog by Andy Banta and this overview by Cormac Hogan. I don’t want to go down that path again, mostly because I can’t do it as well as either of those gentlemen. What I want to make clear is this: VVols isn’t a storage technology. It doesn’t make slow storage faster. It doesn’t give storage features to arrays that don’t already have them. It doesn’t make storage cheaper, or help you use less of it. VVols is a storage management platform designed to marry the storage requirements of VMs as closely as possible to the operational cadence of the VMware platform administrators who handle the day-to-day tasks. It was conceived as a way to use policies to manage how storage is consumed, instead of relying on deep knowledge of the storage infrastructure to handle placement, remediation and service-level management. Its goal is to let administrators apply policy at the smallest possible storage construct (the individual VMDK, or virtual disk) without making that granular management more of a burden on already busy staff.

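To make “policy at the VMDK level” concrete, here’s a deliberately simplified sketch in Python. It is not the vSphere/SPBM or VASA API; the class and field names are hypothetical. The point is the shape of the model: each virtual disk carries a policy of requirements, the array advertises capabilities, and placement and compliance are just a match between the two.

```python
# Toy model of policy-based storage management at the virtual-disk level.
# NOT VMware's SPBM/VASA API -- names and fields here are illustrative only.
from dataclasses import dataclass

@dataclass
class StoragePolicy:
    name: str
    requirements: dict  # e.g. {"min_iops": 5000, "replication": True}

@dataclass
class StorageContainer:
    name: str
    capabilities: dict  # what the array would advertise through VASA

    def satisfies(self, policy: StoragePolicy) -> bool:
        """Compliant only if every requirement in the policy is met."""
        for key, required in policy.requirements.items():
            offered = self.capabilities.get(key)
            if isinstance(required, bool):
                if offered != required:
                    return False
            elif offered is None or offered < required:
                return False
        return True

@dataclass
class VirtualDisk:
    name: str
    policy: StoragePolicy

def place(disk: VirtualDisk, containers: list) -> StorageContainer:
    """Pick the first container whose capabilities satisfy the disk's policy."""
    for container in containers:
        if container.satisfies(disk.policy):
            return container
    raise RuntimeError(f"No compliant storage for {disk.name} ({disk.policy.name})")

# Two disks belonging to the same VM can land on different service levels.
gold = StoragePolicy("gold", {"min_iops": 5000, "replication": True})
bronze = StoragePolicy("bronze", {"min_iops": 500, "replication": False})
containers = [
    StorageContainer("sc-archive", {"min_iops": 1000, "replication": False}),
    StorageContainer("sc-prod", {"min_iops": 10000, "replication": True}),
]
print(place(VirtualDisk("db.vmdk", gold), containers).name)      # sc-prod
print(place(VirtualDisk("logs.vmdk", bronze), containers).name)  # sc-archive
```

In this kind of model the administrator’s job is to define and assign policies; placement, compliance checking and remediation fall out of the matching logic rather than out of tribal knowledge about the array.
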
Today, VVols is available in vSphere 6.0, with the VASA 2.0 spec on the management side of the solution. The VASA 3.0 spec has been released to partners, and we expect to see it ship with the next version of vSphere. VMware’s own SDS product, VSAN, is managed using VASA and the same storage policy framework that VVols uses, so the idea that VVols isn’t being adopted, or that there isn’t any interest from customers, is misleading. VMware administrators will adopt ANY product they think better fits their day-to-day work, especially if it puts them in control of relevant technology that used to be siloed off.

If we take it as fact that VMware is going to continue down the VASA path with storage, and that VMware administrators and users are going to demand an end to the many-to-one construct that is the VMFS-formatted datastore, then why do we keep hearing “no one is interested in VVols” from the ecosystem? I think there are a couple of reasons. First, enterprise VMware shops have gotten more risk-averse as they have virtualized their tier 1 production applications, and upgrading to an x.0 release of vSphere isn’t something that is done lightly. To compound this, there’s a general perception that VMware QA on new releases has gotten progressively worse, and most customers I know won’t look at a new feature, especially one that sits in the data path, until at least the second release. For VVols and VASA 2.0/3.0, that means the next vSphere release, and my suspicion is that we’ll see a lot more uptake and interest at that time.

Second, the VASA 2.0 spec isn’t complete. There are some big things (replication, for instance, or SRM support) that aren’t available yet, and for which VVols would be a natural fit. Despite the 2.0 tag, this version of VASA is a completely new construct, and there’s a lot of work left to do. It will certainly improve over time, but if VMware can’t convince customers that it can QA a new product well enough to make them comfortable deploying it into production, the adoption lag will continue.

Finally, putting it gently, most of the storage vendors have shitty implementations, if they bother to support VVols at all. It’s not their fault, of course, since none of us have really demanded that storage vendors modernize their platforms over the last 20 years, but saying that a legacy architecture struggles in a VVols environment is putting it mildly. When I say they struggle, what do I mean? Well, let’s look at the relevant components of a VVols implementation:

  • VASA Provider – This is the management engine. Even though it’s out-of-band, it’s critically important to make sure it’s always available. Why? Because without it, pesky things like powering on VMs become impossible. VMware let the storage vendors decide how to implement the VASA provider, and while some (like SolidFire) have built it into the controllers themselves, giving it the benefit of failover protection, most have deployed it as a VM and left protecting it up to the customer. I want to be clear: if your storage vendor is deploying VASA as a VM, it’s a shit implementation and I wouldn’t put it into production. It’s lazy, and it will end up putting you in a bad spot.
  • Protocol Endpoint – This is a logical IO proxy for the connected hosts. On block arrays, it’s a LUN that gets presented. How much IO can your array handle over a single LUN? VMware has allowed storage vendors to decide how to deploy PEs, but each vendor is constrained by its architecture. If you have a scale-out design, it’s pretty logical to deploy (at least) one PE per node. But what if you have a traditional dual-controller design? How do you decide how many PEs to have, where to place them, how to size them to account for controller failure, and how to balance load across them?
  • Storage Container – The storage container is the pool that VVols are created in. It’s not a LUN, but an aggregation of storage capacity. The question is, what does the storage container correspond to on the array? Is it locked to a physical construct, like an aggregate or a disk pool? Or is it ephemeral and connected to a storage cluster, able to scale as the array scales? SolidFire certainly chose the latter option, but then again our architecture allows that to happen easily. Look at this workflow from the EMC Unity documentation and ask yourself: where is the simplicity? What happened to putting things into the context of the VMware administrator? The first step, “Create storage pools,” includes things like determining array licensing levels, choosing disk tiers and picking RAID levels. This isn’t how VVols is supposed to work. Storage containers are logical constructs, used to create separate security or reporting zones, not physical constructs overlaid on a traditional, inflexible, disk-driven storage design.
    [Screenshot: storage pool provisioning workflow from the EMC Unity documentation]
  • Storage Policy Based Management – This is the real magic of VVols: the ability to create policies that are relevant to your business, or to your processes, and use them to drive processes that used to be manual or static, like initial placement, service levels and ensuring that VMs sit on storage that provides the relevant features. The trick is that when administrators can create and apply policy as quickly as their business requirements change, holding them back because applying a policy means physically moving blocks of storage, and the VMDKs they make up, from disk tier to disk tier is a cruel joke. Having the policies is step one. Having a storage architecture that can make them powerful tools for VMware administrators is the hard part. If your vendor makes you move data to change storage policies, it’s a shitty implementation.
  • Virtual Volumes – The most granular point of management that VMware has ever given customers to work with: virtual disks, configuration files, swap and all the other storage objects that make up a virtual machine. This is the actual thing that gets attached to the VM, and the thing customers want to allocate capacity and performance to. At a basic level, this is also the part that overwhelms a traditional storage array. In all of my years in the storage ecosystem as a customer and a vendor, I got used to arrays supporting dozens of volumes, maybe hundreds in an extreme case. With VVols, a single VM can have 3-7 VVols, and now you have to support thousands or even tens of thousands of volumes on those same controllers. As storage vendors realized this, we started to see support for VVols fall off. EMC, NetApp, HDS and others have all chosen to punt support into new or future versions of hardware and code that have a better chance of keeping up. If your vendor takes this awesome ability to create a virtual disk that is free of all physical attachments and puts it on the same old disk group, using the same old disks and the same old tiering, that’s a shitty implementation. SolidFire creates a separate, individual volume for EVERY VVol, which means we can apply our unique min/max/burst QoS individually to every VVol, independent of type. We can snapshot, clone and replicate individual VVols, or coordinated groups of them. We can report on different logical tenants separately and show performance and efficiency telemetry. This is exactly the use case SolidFire was designed for, and we’ve been doing it at scale in the OpenStack ecosystem for years. (There’s a small illustrative sketch of this policy-to-QoS idea right after this list.)

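Pulling the last two bullets together, here’s a small, hypothetical sketch (plain Python, not SolidFire’s actual API and not the VASA spec) of why a one-volume-per-VVol architecture makes storage policy a live tool instead of a migration project: when every virtual disk is its own volume with its own QoS settings, changing its policy is a settings update, not a data move.

```python
# Hypothetical sketch: per-VVol QoS driven by storage policy.
# Not SolidFire's real API or the VASA spec -- illustrative names only.
from dataclasses import dataclass

@dataclass
class QoS:
    min_iops: int
    max_iops: int
    burst_iops: int

# Each policy maps directly to QoS settings the array enforces per volume.
POLICIES = {
    "bronze": QoS(min_iops=500, max_iops=1_000, burst_iops=2_000),
    "gold": QoS(min_iops=5_000, max_iops=15_000, burst_iops=30_000),
}

@dataclass
class VVol:
    """One array volume per virtual disk -- the 1:1 mapping described above."""
    name: str
    policy: str

    @property
    def qos(self) -> QoS:
        return POLICIES[self.policy]

def change_policy(vvol: VVol, new_policy: str) -> None:
    # Because QoS is enforced per volume, a policy change is just a settings
    # update: no tier migration, no data relocation, no Storage vMotion.
    if new_policy not in POLICIES:
        raise ValueError(f"Unknown policy: {new_policy}")
    vvol.policy = new_policy

disk = VVol("sql01.vmdk", policy="bronze")
print(disk.qos)              # QoS(min_iops=500, max_iops=1000, burst_iops=2000)
change_policy(disk, "gold")  # takes effect immediately; only the limits change
print(disk.qos)              # QoS(min_iops=5000, max_iops=15000, burst_iops=30000)
```

Contrast that with an architecture where “gold” and “bronze” are different disk tiers: there, the same policy change means moving the VMDK’s blocks, which is exactly the cruel joke described above.
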
Finally, there’s this idea that customers who are using NFS don’t need VVols because they already have VMDKs as individual objects. Mostly, I hear this from storage vendors who have realized that they don’t have an architecture that can handle VVols, and so they dismiss what VVols actually provides. Even setting aside the unique features that SolidFire offers, there are some HUGE gaps between NFS and VVols. The first and most obvious is the policy-driven nature of the entire platform. How do you let VMware admins decide what service levels to offer, and then give them the ability to audit and remediate those inside their environments? How do you separate the control and data planes and protect each independently? How do you implement all of the VAAI features that block storage has had for years without resorting to yet another vCenter plugin? How do you distribute traffic evenly across all of the nodes/controllers and scale that intelligently? For all of its ability to be “VM-aware” because it sees each VMDK as an individual file (no shit), NFS doesn’t provide much of what VVols can. Don’t believe me? Here’s the internal blog post from NetApp in 2014, when they were full steam ahead with VVols support. Today, NetApp is focusing all VVols development on the SolidFire platform. Why? Not because NFS is “good enough” that VMware customers don’t need VVols, but because architecture really, really matters.

Customers are DYING for a good VVols implementation. The problem is that, outside of VSAN and SolidFire, who else actually has one that lets them take full advantage of what VVols can offer? Don’t let storage vendors blame that on VMware. It’s disingenuous at best, and dishonest at worst. If a vendor deploys VASA in a VM, limits a storage container to a fixed group of disks, doesn’t manage protocol endpoint layouts intelligently, requires data movement to implement or change policy, can’t support the same number of VMs that it could on static datastores, and then says it isn’t seeing any demand for VVols from its customer base, you are being misled. If you want to know more about what it looks like when VVols are done right, reach out and let me know, and I’ll show you a demo of how we’ve implemented them on the SolidFire platform.