Technical Impressions of UCS

I’ve written a lot about the process we went through in the lab, and some of the challenges we faced from a business standpoint, but I wanted to try to give a straight technical review of the UCS platform.  Regardless of your business model or focus, there’s a lot to like here, and there are some concepts that are genuinely new.  As the first real new entrant to the x86-based server market in years, Cisco had the ability to step back from how things are being done by the likes of HP, Dell and IBM and make some changes that definitely challenge some of the traditional wisdom.  It’s also easy to see that there was a healthy dose of input from a company very, very comfortable with the concepts of data muxing and de-muxing, as well as with the framework needed to manage very large streams composed of different kinds of packetized data.

The first thing to note is that while I will talk about the UCS product in conjunction with the VCE alliance and the vBlock, it is definitely a stand-alone product.  Cisco sells the system directly, or in conjunction with other storage partners like NetApp.  At its core, it’s an x86-based blade chassis, holding 8 half-width or 4 full-width blades per chassis.  The half-width blades are relatively unremarkable from a CPU/RAM standpoint, looking like many similar products on the market.  The full-width blades have an interesting distinction: Cisco Extended Memory Technology, which allows for double the memory footprint of a standard Nehalem-based server.  384GB of RAM is an almost ridiculous amount of memory, but it would certainly come in handy in some application use cases.

The first real innovation happens at the back of the blade, where it ties into the chassis interconnect.  Cisco offers two kinds of “de-muxing” (my word, not theirs) interconnects: a CNA based on a standard Emulex or QLogic chipset, and a Virtual Interface Card (VIC) based on a Cisco chipset (formerly code-named “Palo”).  The CNAs function much like any currently available on the market: in one direction they take the converged network stream, break it out into virtual NICs and HBAs and hand it off to the blade, and in the other direction they mux the data types into one stream and hand it off to the fabric interconnect.

The CNAs are good, and work as advertised.  There are some drawbacks to note, however:

  • There is no offload engine, and all of the muxing/de-muxing happens on the blade processor.
  • There is no flexibility in how the presentation to the hardware looks; you get two vNICs and two vHBAs, no more, no less.

In an environment where you have multiple physical networks connected to the hosts (which, as a service provider, we do), you are going to have to think hard about how to collapse that down.  Especially when used in conjunction with the Nexus 1000v-based dvSwitch in VMware, you end up with a very, very simple network layer, which can be challenging.

The VIC/Palo card is brand new (I believe the first of them started shipping last week) but very intriguing.  First of all, it offloads all of the interconnect operations onto the card itself, so there is going to be a performance gain for the blade just by having it in there.  It also allows you to present up to 128 vNICs and vHBAs to the host, allowing much more flexibility with regards to the backend network.  It will be great for environments that have multiple physical networks, but it will be a necessity for hosts that require connections to more than one discrete storage fabric at a time.  Unfortunately we didn’t get any hands-on time with the Palo cards in the lab, but I’m definitely looking forward to getting my hands on them.
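
To make that flexibility concrete, here’s a rough sketch (Python, purely for illustration) of how a host like ours might carve up a VIC’s virtual interfaces.  The interface names and counts are my own hypothetical layout, not a Cisco reference design:

```python
# Hypothetical carve-up of a VIC/Palo card for a host that touches several
# physical networks and two discrete storage fabrics. Names and counts are
# illustrative only -- the card can present up to 128 vNICs/vHBAs in total.
virtual_interfaces = {
    "vnics": [
        "mgmt",            # hypervisor management
        "vmotion",         # live migration
        "prod-frontend",   # first physical network
        "prod-backend",    # second physical network
        "backup",          # dedicated backup network
    ],
    "vhbas": [
        "san1-fabric-a", "san1-fabric-b",   # first storage fabric, A/B paths
        "san2-fabric-a", "san2-fabric-b",   # second storage fabric, A/B paths
    ],
}

total = len(virtual_interfaces["vnics"]) + len(virtual_interfaces["vhbas"])
assert total <= 128, "well within what the VIC can present"
print(f"{total} virtual interfaces presented to the host")
```

Contrast that with the CNA’s fixed two-vNIC/two-vHBA presentation and it’s easy to see why the VIC matters for a multi-network, multi-fabric design.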

Now that we’ve looked at the blades, let’s turn to the chassis itself, which is interesting mostly because of the unified fabrics in the back.  Each chassis has two physical fabrics, and each fabric has a total of 40Gbps of bandwidth upstream.  While that seems like a lot, a chassis fully populated with half-width blades is still oversubscribing that bandwidth 4:1 (the blades have two 10Gbps vNICs each, for a total of 160Gbps of bandwidth). It remains to be seen how that will affect us in the real world.
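
For the curious, the back-of-the-envelope math looks like this (a quick Python sketch using the figures quoted above):

```python
# Quick sanity check on the 4:1 oversubscription figure quoted above.
blades_per_chassis = 8          # fully populated with half-width blades
vnics_per_blade = 2             # two 10 Gbps vNICs each
vnic_gbps = 10
uplink_gbps_per_fabric = 40     # upstream bandwidth per physical fabric

blade_facing_gbps = blades_per_chassis * vnics_per_blade * vnic_gbps   # 160 Gbps
ratio = blade_facing_gbps / uplink_gbps_per_fabric                     # 4.0

print(f"{blade_facing_gbps} Gbps of blade bandwidth over a "
      f"{uplink_gbps_per_fabric} Gbps fabric uplink -> {ratio:.0f}:1 oversubscription")
```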

Upstream from the chassis are the UCS 6100-series fabric interconnects (FI).  There are a couple of parts of the UCS magic that happen here.  The first is that this is where the data gets aggregated from the network and SAN layers.  The FI has both FC and 10GE connections, and can be configured to allow for more bandwidth on either side as needed.  Each FI services a single fabric connection downstream, so there’s no interconnect between them at all.  Upstream you can have any fibre channel infrastructure you want, and I suppose you could have any 10GE infrastructure as well.  Obviously Cisco makes the MDS switch series and the Nexus 7000 series specifically for this environment, and everything plays well together.

The last bit of “magic” is also contained in the FI devices: the UCS Manager interface.  The UCS Manager, as a piece of software, is honestly about as impressive as the hardware itself.  It (finally) introduces the concept of a “service profile” into the mix, allowing you to pre-provision both network and storage by chassis slot, regardless of whether there’s a server present or not.  This completely separates the workload from the hardware, making hardware failures easier to manage (no more pulling HBAs or re-zoning/masking LUNs!).  The interface is a little cluttered, but with everything you can do from the LAN, Server and SAN standpoints it’s amazingly functional.  The Ionix UIM that is included as part of the vBlock build actually does a pretty poor job of recreating the UCSM functionality; or, said another way, it doesn’t bring a lot to the table from a UCS standpoint that wasn’t already done well on the UCSM side.
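
To give a rough feel for what a service profile decouples from the physical blade, here’s a conceptual sketch.  This is plain Python with field names of my own invention, not the actual UCSM object model or XML API:

```python
# Conceptual sketch of what a UCS service profile captures: the identity and
# connectivity of a workload, defined independently of any physical blade.
# Field names are my own shorthand, not UCSM's real schema.
from dataclasses import dataclass, field

@dataclass
class ServiceProfile:
    name: str
    uuid: str                                      # server identity seen by the OS
    vnic_macs: list = field(default_factory=list)  # MACs travel with the profile
    vhba_wwpns: list = field(default_factory=list) # so does the SAN identity
    vlans: list = field(default_factory=list)
    boot_target: str = ""                          # e.g. boot-from-SAN LUN

# Pre-provision a profile against a chassis slot, whether or not a blade is
# installed yet.  If the blade in that slot dies, the profile (and all of its
# identities) can simply be associated with a spare -- no re-zoning or
# re-masking required.
web01 = ServiceProfile(
    name="esx-web01",
    uuid="hypothetical-uuid-0001",
    vnic_macs=["00:25:b5:00:00:01", "00:25:b5:00:00:02"],
    vhba_wwpns=["20:00:00:25:b5:00:00:01", "20:00:00:25:b5:00:00:02"],
    vlans=[10, 20],
    boot_target="hypothetical-array-lun-0",
)
print(f"{web01.name} can move between blades without touching the SAN config")
```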

One thing that deserves a quick comment is the environmental footprint that the UCS requires.  There’s some variability in the build-out, of course, but the standard vBlock build-out includes:

For each Compute Node (4 full UCS Chassis, 2 UCS 6120XPs, Rack, Cabling)

  • 1300lbs of equipment
  • 12,568W of power draw
  • 51,628 BTU/hr of heat dissipation

For each Aggregation/Services Node (2 Nexus 7010, 2 MDS 9509, Rack, Cabling)

  • 840lbs of equipment
  • 10,539W of power draw
  • 43,437 BTU/hr of heat dissipation

That’s an INCREDIBLE amount of strain on even a new data center.  33kW of power draw in three cabinets works out to about 500W/sqft, and that’s a lot.  As we work out the feasibility of using this product set in our company, you can bet that I’m going to go talk to the facilities engineering team first.  This is definitely something that you can’t throw into any data center (much less a multi-tenant one) without some planning and discussion.
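
For reference, the density math works out roughly like this.  The floor space allocated per cabinet is my own assumption, so treat this as a sketch rather than a facilities calculation:

```python
# Rough density math behind the ~500 W/sqft figure above.  The allocated
# floor space per cabinet (cabinet plus clearance) is an assumption on my
# part; change it for your own facility and the number moves accordingly.
total_watts = 33_000          # the ~33 kW across three cabinets cited above
cabinets = 3
sqft_per_cabinet = 22         # assumed allocated floor space per cabinet

density = total_watts / (cabinets * sqft_per_cabinet)
print(f"~{density:.0f} W/sqft")   # ~500 W/sqft
```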

Overall, the hardware is well made, well thought out and offers a lot of functionality.  The operational savings that the design and UCSM offer are awesome, as is the single point of support and purchasing that the vBlock offers.  Despite the cost in both capital and environmentals, it’s a compelling offering from three of our biggest partners, and it’s definitely one we’ll consider carefully.  If you have any questions or if there’s something I didn’t cover here, please leave a comment and I’ll try to get you an answer.