Old Complexity + New Complexity = Progress?

Martin Glassborow posted a thought-provoking piece on his blog a couple of weeks back about how the intelligence native to infrastructure seems to be working its way up from the individual components.  Specifically, he references the EMC acquisition of ScaleIO, and how storage seems to be trending towards a DAS+software model.  He was also good enough to engage in some back-and-forth with me on Twitter, and it got me thinking enough to put this post together.

My fundamental problem is that so many vendors are working hard to give customers the opportunity to solve the wrong problems.  The only thing that matters, in the end, is the availability and performance of ALL the apps required to make an enterprise business successful.  Full stop.  Those are the variables we are all solving for, and everything on the other side of the equal sign should add up to providing as much of those two things as possible.


And the complexity is that there are a lot of pieces on the other side of the equal sign.  Acquisition cost is one, but it’s a relatively minor one over the lifecycle of an infrastructure stack.  Environmental factors are another, with space and cooling becoming increasingly important.  But those are far from the only ones…

  1. Performance: This isn’t a one-size-fits-all, or even a two-sizes-fit-all, conversation.  In a typical enterprise there can be dozens or even hundreds of applications, all with their own performance profiles.  CPU, RAM, I/O, capacity, latency, scalability…all of these things are needed in different measures, to support different SLAs on the same platform at the same time.  Can your infrastructure not only provide the needed resources in the proper combinations, but also provide assurance (QoS) between those tenants?
  2. Availability: Part of this is just “do you have tolerance to component failures” and honestly, most converged infrastructure stacks have that part taken care of, or at least give customers the ability to purchase such a configuration (that said, please don’t buy this FlexPod). The other parts are more complicated: how do you operate the infrastructure?  Can you operate it with context or just as a collection of components?  Has your provider given you any tools to integrate the operation of the infrastructure into the tools you have already invested in?  How hard is it to troubleshoot a problem?  How much support does your infrastructure provider give you when there’s trouble?  How many partners are there in the ecosystem to help when you need it?  How much confidence do you have in the vendor(s) you are working with, and will they be around in 3, 5, 7 years?
  3. Operational Costs: Same-site scalability has been largely removed as an obstacle at this point, at least from a technology standpoint.  In years past, adding more capacity meant adding another storage array, or another set of individual servers, and that meant increasing the number of administrative touch points.  A combination of scale out technology (think Isilon) and better instrumentation (think multi-device aware Unisphere or UCSM) have helped customers reduce the need to scale staff as they scale capacity at a particular site.  But what about multi-site?  Can your instrumentation handle having dozens of “stacks” spread around multiple regions, time zones or continents?  Can you manage each of those stacks in context of not just the applications running on them, but their relationship with the other stacks, and with the business needs that they are fulfilling?  Can you maintain operational efficiency as you scale both horizontally and vertically?  Can your infrastructure be managed horizontally, in parallel with the business tasks it is performing, or does it perpetuate the silos that have been so destructive to enterprise IT?
  4. Risk Mitigation: Although I’ve listed this one last, it may be the most critical of all.  Each of the preceding three variables is colored by the amount of risk that a company is willing to assume.  I know of financial trading companies who will do in-house re-writes of HP server BIOS firmware in order to get better latency timings on their trading floor systems.  Yes, they are assuming an incredible amount of risk in doing this, but the business outcome justifies it.  On the other end of the spectrum there are SME companies that simply don’t have the money to spend on enterprise-class hardware.  Trevor Pott lives and breathes this sector, and works hard to illuminate the gap between SME and Enterprise.  He wrote a great article for The Register on the core dichotomy and disconnect, based on the needs of the SME customer.  These SME customers choose to go with products that are cheaper to acquire, understanding that the lack of maturity, the lack of transparency and the startup nature of the companies they are buying from introduce risk, but they are willing to assume that risk while they frantically look to move their applications into the public cloud.  But for the standard enterprise, risk is something you work hard to minimize.
  • Implementation timelines are a risk; does your vendor minimize them?  Availability of components is a risk; does your vendor maintain their own stock, or work JIT with a manufacturer? 
  • QA is a risk; does your vendor manufacture your stack based on a tested physical and logical configuration, or are you left to figure out what to do with the boxes on your own? 
  • Interoperability is a risk; does your vendor provide the entire stack, or are you left to fend for yourself on the pieces they don’t offer, or when the pieces they do offer aren’t confidence-inspiring? 
  • Upgrades are a risk; does your vendor give you regular, tested, certified upgrade paths, the tools to verify the version integrity of the entire stack and the services to help when you need it? 
  • Tool sprawl is a risk; does your vendor provide an API to allow you to leverage the tools you have already invested in, or do they force you to use their stand-alone tool that doesn’t talk to anything else in your environment?  Does your vendor give you the ability to scale those tools across multiple separate instances, even in different locations? 
  • Touchpoints with other business solutions are a risk; does your vendor certify interoperability with BRS and multi-site replication solutions?  Do they have products that can be included as part of the stack purchase to handle backup, restore, migration, replication, adjacent storage and data protection?

It’s risk mitigation that I think we lose sight of when considering companies who offer to “simplify” the data center and abstract all of the hardware on the back end with a black-box software shim.  All they are doing is hiding the old complexity (which we largely understand and have trained our personnel to handle) behind new complexity.  Does it remove the need for a storage fabric?  No, but it does mean you don’t have to have it external to your servers.  Does it remove the need for a storage array?  No, but the new storage array might look different than the old one does.  Does your business have the ability to assume that risk?  At the SME level, you bet it does.

Companies like Scale, Nutanix, Simplivity and others like them will find lots of customers for whom the acquisition cost, rightly or wrongly, is close to the end-all, be-all of the risk equation.  For them, it’s a direct line comparison between doing their business in the public cloud and getting the cheapest VMware/Hyper-V private cloud they can.  This isn’t a bad thing, by the way!  When your entire data center consists of low-powered applications that don’t need stringent SLAs, you don’t need the same number of people to manage them, and you don’t need the same kind of infrastructure to support them.


Moving that model into the larger enterprise, however, is fraught with peril.  In addition to having more competitors in that space who have deep and long-lasting relationships with the customers, the requirements on the left side of the equal sign are significantly higher.  If you think about it, there’s a reason why converged infrastructure is such a small part of the overall technology spend: most offerings force you into a compromise.  If you choose a reference architecture, you get lots of help on the front end, and the vendors certainly do their best to make it easy to purchase the relevant technology, but there’s no active post-sales support specific to the platform purchased.  No operational tooling, no integrated support, no ability to natively include ancillary business solutions like workload mobility, replication and backups, no validated interoperability guidance…

For all of the reference architectures I can think of, there’s no native intellectual property included at all, either in the design or the solution, which means the customer still has to manage all of that risk themselves, just like they’ve always done.  Yes, they get to pick the components they wanted.  Yes, they get some implementation guidance.  Yes, they can design a solution that supports all of the applications that their business needs to support.  Yes, there’s some incremental value in that.  No, it’s not the best you can do.  With every day that passes after implementation, that platform drifts further away from the tested config.  I think this Tweet says it best…

[Embedded tweet: “Shared Hallucination”]

A reference architecture is a gateway drug to a shared hallucination: the vendor pretends they have sold something with on-going value, and the customer believes that he’s done more than take advantage of a marketing campaign.  In the end, this is the status quo, repackaged.

On the other side of the spectrum, the hyper-converged vendors ask customers to make a different kind of compromise.  In return for a lower acquisition cost, and ease of scale, customers have to understand that there’s NO flexibility to adapt to workload requirements outside of a very narrow set of parameters.  When all you have is DAS of different types and a relatively untested software algorithm to move data back and forth and provide the error-checking and redundancy necessary, there’s only so much you can do.  There are certainly some interesting players in the hyper-converged space, but if you look at the list of requirements above, how many of them are really satisfied by a combination of generic hardware, DAS storage and a Google or ZFS file system?  How many large enterprises are going to run mission critical SAP and Oracle instances on a Nutanix cluster?  Like I said earlier, the smaller end of the market will be willing to make the tradeoffs necessary to minimize the acquisition cost, so it’s not like there’s no market for hyper-convergence, but my opinion is that it’ll come at the expense of companies like Dell and HP, not companies like Cisco and EMC.  They are just different markets.

It’ll be the hyper-converged companies, however, that are hurt most by the inevitable move of business and productivity apps to the cloud, in my opinion.  If I were an SMB or start-up these days, there’s probably no chance I’d buy hardware at all.  Every application I need is available in a SaaS model, from multiple vendors with proven track records, so why bother?  This holds true at larger enterprises too, but usually only in specific lines of business, and there will always be legacy applications that have to stay in house.  The risk to the larger hardware providers is mitigated somewhat by the fact that many of the SaaS companies and the service providers that host them need to have enterprise class hardware on the back-end, and it’s no coincidence that some of the largest service providers and systems integrators in the world have standardized on integrated infrastructure like the Vblock Systems.

All of this leads us to a new question: what kind of converged infrastructure is your vendor trying to sell you?  With the lines between reference architecture and converged infrastructure being blurred, inadvertently at times by vendors who have offerings in both categories and deliberately by vendors who don’t, I’ve had more than one person reach out and ask “what exactly am I being pitched here?”  This gets especially interesting with the analyst groups who are relatively easily snowed by vendors who want to move revenue from one category to the other.  For example, I have never, ever, even once, seen a pre-configured HP Cloud Matrix infrastructure solution that used 100% HP technology, was quoted on a single PO and manufactured and delivered as a unit by HP.  Never.  Yet HP still reports almost $500MM in revenue from “Integrated Infrastructure Systems” in the Gartner Market Share Analysis report from late last year.  Ridiculous.  But who can tell?  How do you tell the difference?

Before VMworld, I hope to release a flowchart that helps to resolve this confusion.  I’ve worked with some friends at Simplivity, HP, Dell and a couple other vendors to create a list of real, cross-vendor characteristics for integrated infrastructure, reference architectures and integrated workload appliances.  Using it, customers will be able to easily walk through the flowchart and see exactly what they are getting, and hopefully we can help clear up the confusion on the analyst side of things as well.  If you are a vendor of integrated infrastructure and you want to be involved in this process, reach out to me and let me know.  The more input we get, the better the outcome for the industry, and this isn’t meant to be a competitive thing as much as an educational tool.

What do you think?  Polite comments and discussion are always welcomed below, and please make sure to disclose any relevant affiliations that are important to the conversation!