Thin Provisioning, Here We Come?

As a follow-up to my post yesterday (this morning, whatever…) about disk sizing, we came across some interesting data I wanted to share.

I want to preface this post with an admission: I am crazy about data.  The bigger the pile, the better.  I love digging through a mountain of seemingly unrelated information trying to find what secrets it holds and what paths it reveals.  As a former baseball player, manager and league commissioner, I've long been known to have a soft spot for statistical analysis, and this comes in handy during my day-to-day job (although it sometimes drives my team crazy).

This obsession with data (and a healthy case of OCD) has led to some interesting habits:

  • I never delete anything.  Ever.  I might need it. 
  • Yes, I have every e-mail I've ever written, stored locally in PSTs by year and archived off-site.  Xobni is your friend!
  • I have to limit the time I spend doing anything with regard to statistical analysis; if I don't, I'll waste an entire day playing!

At my job, this plays out with me having spreadsheets for everything.  Doing product development/management requires lots of trending (sizing, scalability, pricing, etc.) and I've always got Excel open trying to figure out how to make the numbers work.  The partners we work with, especially a certain District Manager at EMC, like to mock my spreadsheets, but they are how I bring order to my world.

So, back to the actual reason for the post.  Today, courtesy of my favorite VMware Ninja, we decided to take a peek at the utilization of not just the VMFS volumes, but also the disks presented directly to the Guest OSs, which we can see thanks to the VMware Tools running inside them.  The question was: how much of the presented storage could be recovered, on average, if we enabled VMware thin provisioning?

Using our mountain of vCenter-based data, we queried every volume on every Guest, got both full capacity and free space, and dumped it all into, you guessed it, a spreadsheet!  I have to say I was pretty surprised by the results, because almost 60% of the space being presented to the VMs is sitting unused!  60%!!  The first decision point on how we bring our provisioned and sold disk amounts closer together is definitely going to include thin provisioning of the VMs.  Since there's (almost) zero performance impact on the guests/hosts, and since we already do a good job of managing resource availability in the datastores, there's really nothing to lose except the bloat.  Of course, since this is a production environment we'll test, test and test some more, but I think we're on to something.
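For anyone who'd rather script the spreadsheet math, here's a minimal sketch of the roll-up, assuming you've already exported one row per guest volume with its capacity and free space (the `GuestVolume` record, its field names and the sample numbers are all hypothetical, not our actual data or export format). It sums free space across every volume and divides by total presented capacity, which weights big disks more heavily than a simple average of per-volume percentages would.

```python
from dataclasses import dataclass


@dataclass
class GuestVolume:
    """One guest volume as reported via VMware Tools (hypothetical export format)."""
    vm: str
    volume: str
    capacity_gb: float
    free_gb: float


def unused_fraction(volumes: list[GuestVolume]) -> float:
    """Capacity-weighted share of presented space that is sitting unused."""
    total = sum(v.capacity_gb for v in volumes)
    free = sum(v.free_gb for v in volumes)
    # Avoid dividing by zero on an empty export.
    return free / total if total else 0.0


def recoverable_gb(volumes: list[GuestVolume]) -> float:
    """Free space inside the guests; an upper bound on what thin provisioning could give back."""
    return sum(v.free_gb for v in volumes)


if __name__ == "__main__":
    # Illustrative numbers only.
    sample = [
        GuestVolume("vm01", "C:", 100.0, 60.0),
        GuestVolume("vm01", "D:", 40.0, 10.0),
        GuestVolume("vm02", "/", 60.0, 30.0),
    ]
    print(f"{unused_fraction(sample):.0%} unused, {recoverable_gb(sample):.0f} GB recoverable")
```

Treat the recoverable figure as an upper bound: a thin disk only stays small for blocks the guest has never written, so space that was filled and later freed inside the guest won't automatically shrink back.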

Better resource utilization is good for everyone.  The customers get a product that costs less and that is more "point" scalable, and Peak 10 gets to expand the number of customers we can support on a given platform.  I'm excited that the data gave us the justification we needed to start looking in a new direction, and this is definitely a spreadsheet I won't be deleting. 🙂