The Untimely Demise of Software Defined Storage?

It’s been an interesting couple of days in storage-land, that’s for sure. On Thursday, El Reg released a report that confirmed what seemed like a crazy rumor: DellEMC was killing off the software-only version of ScaleIO, their only node-based, shared-nothing block storage platform.

Now, with hindsight, the writing may have been on the wall for a while. It’s no secret that no matter how many glowing reviews the platform got from the public, EMC struggled to find a niche for it. Because it wasn’t sold with hardware, the total amount of revenue driven was probably smaller than most deals, and neither the company nor its sales team seemed to give it much attention. Five years after its acquisition of the product, and seven years since its inception, there have still been fewer than a handful of updates to the product, and the long-promised “ScaleIO.Next” that was touted all over the place in early 2017 still hasn’t materialized. There’s a DellEMC community blog for the product that hasn’t been updated since August of 2016, and a couple of threads of people asking over and over again when the next version of the software will be available. There’s no compression, no deduplication, no VVols support, and no native replication of any kind at the moment.

So, is ScaleIO dead? I don’t think so. Competitively it’s the only thing that plays (from an architectural standpoint) in the same ballpark as SolidFire and other node-based products, and we see lots of interest in that architecture from customers. For organizations that have moved on from the legacy dual-controller, shared-shelf model of storage, there are only so many products available on the market. ScaleIO is good tech, in my opinion, and is viable in the market, especially if the mythical v3.0 can add some of the basic data services that storage arrays have to provide in 2018. Unfortunately, being part of the Dell server organization isn’t going to do the product any favors from a development standpoint, and if I were a customer I’d be asking some hard questions of my sales team. VMware seems to have more and more of the upper hand within the Dell Technologies universe, and with the relative success of vSAN, and with some of the friction between the ScaleIO team and VMware lingering because of the forced inclusion of ScaleIO into the ESXi kernel, some clarity needs to be forthcoming. There are probably more details that matter, but those have been covered far better by the likes of Chris Evans and others.

All of that aside (and I work for a DellEMC competitor, so please take the preceding paragraphs with as much salt as you’d like), there’s a bigger question here about the intersection of HCI, Software Defined Storage, and the current storage industry. What’s going on? Is there something important being said when a storage product that has a significant number of enterprise-class customers gets pushed into an HCI-only storage model running on specific hardware?

First, I think we need to put a clear line between “software defined storage” and “software only storage” because they are two very, very different things. Software defined storage, to me, is storage software that runs on commodity hardware, not custom hardware built to suit. ScaleIO, SolidFire, vSAN and others fit this model, whether they are called “appliances” or “ready-nodes”. Software Only, on the other hand, is a very different consumption model, and puts a different kind of strain on the vendors. ScaleIO, vSAN, Ceph and others are available as a straight software license, which can be installed on customer provided hardware. In my opinion, standard SDS products should be seamlessly available in either model, and the best of them should be interchangeable and have feature parity.

The Software Only model is, frankly, a pain in the ass for everyone involved. You are taking something that is very mission critical (there is no bad day like a storage bad day), very inter-woven with the hardware below it, and very complex, and releasing it to customers to install on hardware of their choosing. Most companies will provide a hardware compatibility list to make sure there are guardrails, but even there you have some pretty well-documented HCL failures that impacted customers.

In my experience there are very few customers who are actually capable of managing a supply chain efficiently enough to see a net cost savings by going software only. There are definitely customers who have existing hardware they want to reuse, and customers who will pay more for a specific brand of server hardware to keep the operational costs of managing the data center down, but I can count on one hand the number of global, Fortune 50 customers who are interested in and capable of keeping up with the demands of hardware at scale.

If managing a hardware supply chain is hard, and validating and supporting customer hardware is hard, and designing software that is predictable and performant regardless of hardware platform is hard, why does anyone bother?

Usually, software only SDS solutions occupy the outer limits of storage buyers: the very, very rich, and the very, very cost averse. Large companies buy tons and tons of software only because they already manage their hardware supply chain, and procuring yet another software product to deploy is pretty straightforward. On the other side of the coin, products like Ceph, Nexenta and others live in places where budgets are tight, or a build-it-yourself mentality and risk tolerance for open source storage exists. Nexenta has been in the game since 2005, and in the 13 years since has generated a cumulative $100m in revenue. For the sake of contrast, NetApp generated 55x more revenue than that... in 2016 alone. Ceph is another product that seems to inhabit the “we had a couple old servers lying around, how can we pool the storage?” space. Don’t get me wrong, there are definitely enterprise customers who are running large, high performance environments on those products, but that’s not the standard use case. I met with a data collection company in Florida who had the most impressive all-flash Ceph setup I’d ever seen: multiple availability zones, running with no support contract at all, managed in-house using a hardware budget that approached $10M a year. The storage was free, but the storage wasn’t cheap, if you get my meaning. Red Hat has done a great job (as they usually do) of providing the support needed to get enterprise customers to trust open source software, but it’s ironic that the vast majority of Ceph deployments I’ve interacted with at customers are using no support contract at all.

SolidFire added a couple of additional wrinkles to the game. On one hand, the product did some things that were both new and very beneficial to customers. Creating a platform that was flash-only, using the data path to optimize commercial SSDs, selling nodes with a fixed amount of IOPS each, and guaranteeing performance to individual volumes were all things that customers really, really liked about the product, but they made “software only” hard. Sure, even the SolidFire branded nodes were simply off-the-shelf Dell nodes, but each model was engineered to have a specific performance profile, which made them predictable, supportable and able to be mixed and matched. How do you move that to a software only model? Do you individually qualify every bill of materials a customer wants to use, and commit to an amount of QoS you are willing to guarantee? Do you create validated bundles with specific vendors? Do you release an Intel-based reference design and then let customers deploy those templates from whatever vendor they choose? Do you go the route of a standard HCL and create a benchmarking tool that lets customers “score” their build to see how many IOPS it’s capable of? Depending on the customer you ask, the answer is generally “all of the above”, which is hard to justify.
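To make the fixed-IOPS point a bit more concrete, here is a rough sketch of why a known per-node rating makes per-volume guarantees easy to reason about. Everything in it is an assumption for illustration: the `Node` and `Volume` types, the `can_guarantee` helper, and all of the numbers are made up, and this is not how SolidFire (or anyone else) actually implements QoS. It only shows the basic bookkeeping: if every node has a known rating, you can add up the guaranteed minimums and see whether they fit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    rated_iops: int  # hypothetical fixed rating for this node model


@dataclass(frozen=True)
class Volume:
    name: str
    min_iops: int  # hypothetical guaranteed floor for this volume


def can_guarantee(nodes, volumes):
    """Return (ok, headroom).

    ok is True when the sum of volume minimums fits within the sum of
    node ratings; headroom is the uncommitted IOPS (negative if over).
    """
    capacity = sum(n.rated_iops for n in nodes)
    committed = sum(v.min_iops for v in volumes)
    return committed <= capacity, capacity - committed


# Illustrative numbers only, not real product ratings.
nodes = [Node("node-a", 50_000), Node("node-b", 50_000), Node("node-c", 100_000)]
volumes = [Volume("db", 80_000), Volume("vdi", 60_000), Volume("logs", 20_000)]

ok, headroom = can_guarantee(nodes, volumes)
print(ok, headroom)
```

On arbitrary customer hardware, `rated_iops` is exactly the number nobody knows up front, which is why the qualification, reference design, and benchmarking questions above all come up at once.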

And regardless of the path you choose as a storage vendor, who are you actually building it for? Is there enough market to serve to justify the significant additional costs associated with this deployment model, and with the associated roadmap items that are needed to make it viable over time? Can you actually make money? My guess is that this ended up being a significant part of the internal discussion about ScaleIO, and in the end it got rolled into an HCI platform that won’t cater to the same customers that were buying the SDS product. DellEMC has decided it’s not worth the standalone, software-only development effort, right or wrong. If I were a customer, I’d be asking hard questions of any hardware vendor offering SDS with a software only model. What’s the plan, long term? What does the committed roadmap look like? How does it fit into the larger business? What’s the strategy for moving to new hardware platforms and form factors? How does the product integrate into the rest of the portfolio?

Personally, I think that being able to deliver a node-based, software-only product is a critical part of being able to help customers get through the transition that involves putting dual controller architectures in their past. If we, as an industry, are going to drive the concepts of transparency, programmability, simplicity, scale, and automation into storage, we must put our product roadmaps where our mouths are. Putting faster disk interconnects and fibre channel ports with more bandwidth in place is simply kicking the technical can down the road. The architectures that are the most natural fit for SDS and software-only consumption models, and especially the ones that can make the transition from appliance to software without compromising the performance and feature set along the way, are the ones customers are most interested in as their operational postures mature.

To make it clearer, I think DellEMC is making a huge mistake here, regardless of any short-term concerns with market positioning or product mix. ScaleIO is the only storage product they have that is interesting (IMO) from a scale, architecture and operational standpoint, and they should be putting more resources into the feature roadmap, and enabling it to be used more easily in a software-only model. By throwing it to the PowerEdge team, and making it available only on Dell hardware and only as part of an HCI package, they are not only hurting the product, they are alienating the customers who could easily fund that development with their revenue going forward.

I don’t mean to pick on DellEMC; anyone else who takes a kick-ass, node-based, shared-nothing, scale-out storage platform and decides to hide it behind hardware and/or a packaging exercise would get the same advice from me. Free up your customers to focus on the things that are important to them and be able to deliver to them the same product, with the same features, across whatever consumption model makes sense for them. Vendors shouldn’t be in the business of deciding for customers what’s in their best interest. We should be making the best possible software products we can, and delivering them in any way they are needed.