Tuesday, October 31, 2006

In the age of virtualisation, does hardware matter anymore?

Hmmm... some recent developments at my company got me thinking. Basically, the old 'x86 tender' reared its ugly head again, as it does every few years. In the past, I have been fairly passionate about the hardware that runs the platforms I have supported or developed SOEs for. I have actually turned down jobs because they didn't use HP, and resigned from others because they decided to start buying cheaper hardware. I strongly believe you get what you pay for. For example, two engineers worth $50K each do NOT equal one engineer worth $100K. Similarly, if a company decides to go for inferior hardware because it's cheaper, showing a blatant disregard for the work/life balance of their support staff, then they can go fuck themselves. But that kind of thinking belongs in a world without virtualisation.

With virtualisation well entrenched in the roadmap, it didn't seem to faze me so much this time around. All the pain that inferior hardware causes (dodgy components, poor hardware design, bad drivers / firmware, frequent driver / firmware updates, frequent engagement of vendor support, etc.) pretty much goes away with virtualisation, if your farms are sized intelligently and you are leveraging high availability technologies like HA / DRS on the VMware VI3 platform or clustering with Microsoft Virtual Server.
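To put some numbers on "sized intelligently": the basic sanity check is whether the farm can still carry all of its VMs with one host down - the classic N+1 idea that HA failover planning is built around. Here's a rough back-of-the-envelope sketch in Python; every host and VM figure in it is a made-up placeholder, not sizing advice.

# Rough N+1 sanity check for a virtualisation farm: can the surviving
# hosts still carry every VM if one host dies? All figures below are
# hypothetical placeholders, not sizing advice.

HOSTS = 6               # hosts in the farm
RAM_PER_HOST_GB = 32    # usable RAM per host
AVG_VM_RAM_GB = 2       # average RAM allocated per VM
VM_COUNT = 70           # VMs you plan to run on the farm

ram_needed = VM_COUNT * AVG_VM_RAM_GB
ram_with_one_host_down = (HOSTS - 1) * RAM_PER_HOST_GB

print(f"RAM needed by VMs:      {ram_needed} GB")
print(f"RAM with one host down: {ram_with_one_host_down} GB")
if ram_needed <= ram_with_one_host_down:
    print("Farm survives a single host failure (N+1 holds)")
else:
    print("Farm is oversubscribed - a host failure means downtime")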

One of the things I hated most in my time in the pharma industry was driver validation. Trust me, when a new print driver is released that you really need in order to fix a nasty bug (I'm talking around the time of the NT4 to 2000 migration), and the well-being of thousands, or even tens or hundreds of thousands, of people depends upon its ability to correctly print the micro symbol from an application written for a German MUI that is being run in a manufacturing facility in Kuala Lumpur on a machine with a Malaysian MUI, you WILL care about testing it. Then test it again, then have someone else test it, then have another group validate it, then do a UAT, then verify the UAT, then have the validation department verify the UAT, then have the validation department run their own UAT, which is validated by another independent internal QA group... you get the picture. Oh, and then do all of that for each of the 11 MUIs your company mandates must be supported. Virtualisation obviously doesn't help with print drivers, but with video, network, array controller, system management processor and chipset drivers taken care of, there's a whole lot less to worry about. I don't want to know what would happen if 10 milligrams of the active ingredient in Viagra went into each pill instead of 10 micrograms - a thousand-fold overdose. At least they'd all die smiling, I guess.

As anyone who has built SOEs for large enterprises knows, hardware differentiation is the cause of most of the work. Doing hardware enumeration and the appropriate driver and software installation causes more work than the actual OS configuration by a long shot. But again, with VMs, this pain goes away. Most updates to an SOE revolve around driver / firmware changes, having to include new hardware models in the build, and so on. It's all gone now, or going. System recovery? Easy - restore the last snapshot of the VM. Hell, bring it up on your DR infrastructure instantly, because that's where your disk-based backup is, right?
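For anyone who hasn't lived it, here's a toy Python sketch of the model-detection-and-driver-mapping step that every physical SOE build ends up carrying around. The model strings and share paths are invented for the example; the point is that on a VM the whole table collapses to a single well-known entry.

# Toy sketch of the hardware-enumeration step a physical SOE build has
# to carry: detect the model, pick the matching driver pack, bail out
# on unknown kit. Model names and share paths are hypothetical.
import subprocess

DRIVER_PACKS = {
    "ProLiant DL380 G4": r"\\buildserver\drivers\dl380g4",
    "ProLiant BL20p G3": r"\\buildserver\drivers\bl20pg3",
    # ...one entry (and one round of driver testing) per supported model
}

def detect_model() -> str:
    # Query the model name via the stock wmic tool (Windows only).
    out = subprocess.check_output(["wmic", "csproduct", "get", "name"], text=True)
    lines = [line.strip() for line in out.splitlines() if line.strip()]
    return lines[1] if len(lines) > 1 else "unknown"

def pick_driver_pack(model: str) -> str:
    if "VMware" in model or "Virtual" in model:
        # On a VM the whole table collapses to one well-known "model".
        return r"\\buildserver\drivers\vm-tools"
    try:
        return DRIVER_PACKS[model]
    except KeyError:
        raise SystemExit(f"Unsupported hardware model: {model!r}")

if __name__ == "__main__":
    model = detect_model()
    print(f"Detected {model!r} -> driver pack {pick_driver_pack(model)}")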

No matter how much redundancy you build into a single piece of hardware, you can't account for everything. Ever bought x86 hardware with a redundant motherboard? How about a storage frame with a redundant circuit breaker? Didn't think so. So when you look at it like Google does, hardware just doesn't really matter anymore. We should design / engineer our infrastructure with regular hardware failure as a given. That way, we'll end up with something that is both resilient and cost effective. And we may even get that work/life scale tipped back to horizontal.
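To put a rough number on "regular hardware failure as a given": assume a made-up 3% annual failure rate per box and independent failures, and watch how quickly "a failure" becomes business as usual as the fleet grows.

# Probability of at least one server failure per year across a fleet,
# assuming independent failures and a made-up 3% annual failure rate.
ANNUAL_FAILURE_RATE = 0.03

for servers in (10, 100, 1000):
    p_any_failure = 1 - (1 - ANNUAL_FAILURE_RATE) ** servers
    print(f"{servers:>4} servers: {p_any_failure:.1%} chance of at least one dying this year")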