Scaling Out vs. Scaling Up: An SP Viewpoint

This topic has been discussed a lot, but Scott Lowe’s posts here and here show that the argument still has life.  More importantly, some of the new hardware being released (Cisco UCS, HP ProLiant G7, etc.) is creating tension between the desire to conserve capital and the need to mitigate risk, especially with virtualization in the mix.

As usual, the actual argument happens in the Service Provider space much as it does in the enterprise space, but the variables and business drivers are just different enough that I thought I’d walk through the issue…from our side of the fence. 🙂

As a service provider with a large, successful, and mature Enterprise Cloud offering that we can deliver to our customers in multiple ways, we realize additional margin with every additional VM we consolidate onto a single vSphere host.  It’s just simple math: less capital spent on infrastructure + more customer revenue = happy executive team!  Prior to the Nehalem generation of processors, the CPU was almost always our bottleneck in a multi-tenant environment, which made the decision easy: once you started seeing scheduler contention/wait time, you added more hosts.  We never filled a host with RAM because we’d never use it all.  The advent of the Nehalem CPU turned that “soft” limitation on its ear.
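To put a rough sketch behind that “simple math” (all figures below are made up purely for illustration, not our actual pricing), the margin per host climbs almost linearly with the consolidation ratio once the host’s capital cost is fixed:

```python
# Hypothetical numbers, purely for illustration -- not actual pricing.
HOST_CAPEX = 12_000.0          # assumed cost of one vSphere host (chassis, CPUs, RAM)
AMORTIZATION_MONTHS = 36       # assumed depreciation period
REVENUE_PER_VM_MONTH = 75.0    # assumed monthly revenue per customer VM

def monthly_margin(vms_per_host: int) -> float:
    """Margin per host per month: VM revenue minus the amortized host cost."""
    host_cost_per_month = HOST_CAPEX / AMORTIZATION_MONTHS
    return vms_per_host * REVENUE_PER_VM_MONTH - host_cost_per_month

# Every extra VM consolidated onto the same host is nearly pure margin,
# because the host's capital cost is fixed.
for density in (10, 20, 40):
    print(f"{density:>3} VMs/host -> ${monthly_margin(density):,.2f}/month")
```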

With the current generation of HP G6 servers, we have yet to find a host that runs into CPU contention issues.  We’ve moved steadily upwards from 24GB to our current 72GB of RAM per host.  Honestly, we’d probably put more RAM in them if the price of the damn 8GB DIMMs weren’t so ridiculous!  A 2U server that can hold 192GB of RAM is pretty crazy (and don’t even get me started on the 384GB of RAM in a blade using Cisco UCS!).

In an enterprise, there are definitely use cases for that kind of density, in both the physical and virtual server spaces.  There are also use cases in the SP industry, but it’s a hard line to walk with customers.  Remember the triangle rule: everyone competes on price, service, or technology, but you can only excel at two.  Some offerings in the public cloud are obviously price plays: the SLA is weak or non-existent, the platform lacks transparency or any significant detail about the infrastructure supporting the workloads, and billing is usually done via website and based on some usage metric.  The customer expects to pay little and gets little in return.  I would (in my opinion) classify most of the Hyper-V hosters, most of the vCloud Express providers, and AWS in this bucket.  It’s not a bad business if you can generate the volume, but it’s a value play, not a technology play.

At my company, we compete on the service and technology fronts; we understand we aren’t the cheapest option.  Our offerings are built on the best hardware, using the best hypervisor, and are delivered to each customer as a custom configuration through an iterative sales process.  It is an enterprise-class product, designed to manage enterprise-class workloads for customers with enterprise-class requirements.  Because of this, the scale-out-or-up question is tricky.  Scaling up increases margins, but also increases risk for the customers.  Scaling out increases cost and lowers density, but is much more manageable if (when) issues happen that call on VMware’s HA process to aid in the resolution.
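To make that trade-off concrete, here is a minimal sketch (with assumed, illustrative consolidation ratios and host costs) of how scaling up shrinks the host count and capital spend, but widens the “blast radius” that VMware HA has to recover from when a single host fails:

```python
# Assumed, illustrative figures -- not real consolidation ratios or prices.
from dataclasses import dataclass

@dataclass
class Design:
    name: str
    vms_per_host: int   # consolidation ratio
    host_cost: float    # assumed capital cost per host

def compare(total_vms: int, designs: list[Design]) -> None:
    for d in designs:
        hosts = -(-total_vms // d.vms_per_host)   # ceiling division
        capex = hosts * d.host_cost
        # "Blast radius": the VMs that VMware HA must restart elsewhere
        # if a single host fails.
        print(f"{d.name:<10} hosts={hosts:<3} capex=${capex:>9,.0f} "
              f"VMs hit by one host failure={d.vms_per_host}")

compare(200, [
    Design("scale-out", vms_per_host=20, host_cost=8_000),
    Design("scale-up",  vms_per_host=50, host_cost=15_000),
])
```

The denser design wins on capital spend, but every host failure takes out more customer VMs at once, which is exactly the risk our enterprise customers pay us to manage.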

Every service provider is going to have a different target audience and price point, and that is going to drive the design of the platform.  We have a clientele that demands uptime and performance, not the lowest price and the ability to swipe a credit card, and we work hard to give that to them.