Facebook Is Not IT, But Is It Cloud?

With all the news about Facebook lately, it’s interesting to look at what they’ve become. Some pundits have labeled them the bellwether for growth in the IT space. With the release of their “Open Compute Project” data center, there’s talk of them ushering in a “new era” of data center design.

Don’t get me wrong, you can’t get to the size that Facebook is without being a difference-maker. The bigger you are, the more water you displace, and Facebook definitely gets everyone’s attention when they start moving around in the pool. That being said, I don’t see them as particularly visionary outside of the confines of their application design, and I’m not sure that I’d call them a bellwether for anything.

Look, marketing angle aside (and I personally think the “Open Compute Project” is brilliant) Facebook isn’t doing anything that Google and others haven’t already done. When you have a massively scale-out application (and almost nothing else) it becomes both wise and easy to create thousands of custom-built servers to support it. When you also have the ability to deliver that application on a completely custom code-base, there are other ways you can leverage the “cheapest-of-breed” model. When you can plan for the loss of servers on a massive scale (think ~50% at one time) you really don’t care about the quality or reliability of those servers anymore, right? In that respect Facebook and Google are doing almost the identical thing: racks and racks of throw-away hardware in a data center that provides the absolute minimum protection at the lowest cost per node possible. Of course it’s efficient. It’s also a model that doesn’t matter to the other 99% of the industry.

Ask the largest owners/providers of data center space (large enterprise, large co-lo, telco and SP companies) and all of them will shrug their shoulders. Outside air? Higher ambient temps? Physically isolated cold/hot aisles? Forced plenum? Efficient server power supplies? Modular design? Come on, all of these have been in general use for years. I worked for a regional co-lo company for six years, and almost every one of these principles had been in use there since 2004 or so. None of this is revolutionary, and I don’t think it really matters other than as the final proof point for these technologies; if Facebook is doing it, it must be OK…

On the server side, the idea of packing a data center with 10,000 servers that are all identical is a pipe dream for “normal” data center operators. With multi-tenant facilities, the customer brings their own servers. Even in enterprise data centers there are multiple workload profiles that have to be accommodated, and even the magic of virtualization doesn’t round out all the rough edges. Different processor vendors and steppings have a material impact on the ability of the customer to use all of the features of a standard virtualization stack, so unless you are going to apply CPU masking all over the place, or replace every server wholesale at once to keep the CPU models similar enough to be useful, you are going to have an issue. Unless you are running a single code-base that you can purposefully abstract away from the processor speed/type (think something like SETI@Home, where the “work product” is just a completed job, and the time to complete is tied to the hardware doing the work), this kind of model just doesn’t make sense. Mark Thiele has a good discussion of this on his blog, and he estimates that just 25% of the Facebook data center design will be relevant to the greater market. I think that number might be high.
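To make the CPU masking point a little more concrete, here is a minimal sketch of the idea, not any particular vendor’s implementation. It assumes each host advertises a set of CPU feature flags, and that a mask simply exposes the features common to every host. The host names, the flag sets and both helper functions are made up for illustration.

```python
# Hypothetical hosts and CPU feature flags, invented for this example.
hosts = {
    "host-a": {"sse4_2", "aes", "avx"},
    "host-b": {"sse4_2", "aes"},
    "host-c": {"sse4_2", "avx"},
}


def common_baseline(feature_sets):
    """Return the features every host supports (what a mask would expose)."""
    sets = list(feature_sets)
    if not sets:
        return set()
    return set.intersection(*sets)


def masked_features(host_features, baseline):
    """Return the features a host has but must hide to match the baseline."""
    return host_features - baseline


baseline = common_baseline(hosts.values())
hidden = {name: masked_features(flags, baseline) for name, flags in hosts.items()}
```

The more varied the hardware, the smaller the shared baseline and the more capability each newer server has to hide, which is the rough edge a mixed fleet keeps running into.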

So it’s not IT. At least not the IT that we’ve known traditionally. And it’s not a general-purpose data center. Well then, what is it? Is it “The Cloud” we’ve all heard so much about? Well…maybe.

In my opinion, it’s cloud in the most basic sense, born out of the legacy “software-as-a-service” model. In this case you have a single application that is scaled out to an unbelievable degree. It’s so big it’s almost hard to wrap your mind around, but Facebook isn’t the only one inhabiting that universe. Google, Amazon, eBay, Twitter, ShutterFly and others are in that same space, where they have the need to run a small number of applications at incredible scale, at variable/seasonal loads, for millions of end-users who could be anywhere on the planet. In the purest sense, they are an “application cloud”, and their entire business model supports this to some degree. The challenge is that while this is normal and understandable to these kinds of businesses, it’s not something that the majority of the world’s IT can (or should) emulate. As much as server/storage/virtualization vendors would like to have us believe otherwise, the world does not revolve around the infrastructure; the users, and their applications, always come first. If you ever doubt this, just look at Oracle. When you have control of all the applications that the end-users need, you can treat them like crap and have them come back for more. In the case of Facebook and Google, the data center, every server it holds and even the geographic location of the facility have been focused on supporting the one application stack they provide to their end-users, which isn’t so different from how an enterprise uses specific types of servers and environmental design to cover the requirements of each of the application types it supports. The difference is in the number of applications that are required, and the scale at which they are used.

Does that make it “cloud” in the sense that we are talking about today, with our “private cloud” and “hybrid cloud” nomenclature? I don’t think so. Calling Facebook’s “Open Compute Project” a private cloud is miscounting the trees because of the perceived size of the forest. There are only a couple of trees, and even though they are huge, a private cloud is typically designed to handle a different workload profile. I don’t want to take away from what Facebook has done; as a guy who lived and breathed data centers for a long time I’m appreciative of the efforts they and Google have put into their environments. I’m just not sold that it’s revolutionary, a bellwether for where general IT is heading or a good example of a private cloud.

Thoughts, comments and (polite) objections are always welcome in the comments. Please disclose any vendor affiliation to help keep the conversation on track.

(BTW: for those who said it was impossible, there’s an 1,100+ word blog post that doesn’t mention my employer or our flagship product even once. 🙂)