Why Are Organizations Struggling to Become 100% Virtualized? The nth Generation of Virtualization
Organizations are on a quest to increase operating efficiencies and reduce costs by maximizing the utilization of their precious resources. One of the key technologies enabling this is virtualization. The more virtualized an organization is, the more flexible it can be with its resources, sharing resources and costs among multiple groups. To reach maximum flexibility, organizations set a goal of being 100% virtualized. Yet many organizations are getting stuck at 75-85% virtualized. The question is: why? Is it that some applications and their workloads do not perform well when virtualized? Could it simply be that the budget needed to reach 100% is not available or realistic? Or is there something more at play that is preventing the organization from reaching the objective?
Over the past two decades I have spent time working with some of the largest IT organizations in the world. I have come to believe that the challenges are not as simple as the applications or the budget. Rather, I believe the many generations of infrastructure that have been deployed are what is keeping organizations from becoming 100% virtualized.
Over the past decade, organizations have focused on consolidating physical resources into virtualized ones with a variety of server, storage, networking, application, management and operational technologies. When architects of this virtualization revolution fail to achieve the level of virtualization hoped for, they tend to look to the next generation of technologies in the hope that the newer technologies will further their virtualization efforts. These next-generation technologies have ranged from servers with terabytes of RAM and a large number of cores, to flash-based storage systems and virtualization platforms that can support thousands of virtual machines. Even with the advent of hyper-converged systems that integrate the entire stack into a single appliance, organizations are struggling to virtualize the last 10-15% of their environments. The challenges in reaching 100% virtualization are many and require IT and business line managers to rethink their environments. This multi-generational rethinking, or the next generation of virtualization platforms and technologies, is what I like to refer to as the “nth generation of virtualization.”
In technology, we tend to expect the next generation of technology to solve the problems of the past generation. With virtualization technologies, we expect to do more with less, but this is not always true. Many of today’s environments are based on 20-30 year old technology that was delivered to solve the challenges of its own day, not those of today.
As an example, we all remember the carburetor in cars of the 1970s and 1980s. Having gone to university in upstate New York with its freezing cold winters, I came to hate my 1981 Toyota Corolla with its unreliable carburetor. When the temperature dropped below freezing, the choke did not always work, and the carburetor would not open to let the air the engine needs to burn fuel reach it, so the car would not start. Instead, I was left to freeze with a long screwdriver in hand, trying to pry open the frozen choke plate and start the car. Fast forward a few years and better technology came to market in the form of fuel injection. No longer was the unreliable carburetor there to plague my cold winter mornings; it was replaced by a reliable fuel and air delivery system that worked no matter what the temperature.
In IT, we often use shims (like the screwdriver) on our current solutions to enable them to solve new problems. As a result, we never really solve the problems that have plagued us. This discussion brings us to the first fallacy of the march to the nth generation of virtualization: one can’t solve today’s problems with 20-30 year old technology that was designed to solve the problems of yesterday.
A perfect example of this thinking is RAID technology. RAID was invented in 1987 to solve the problem of data availability. Drives were combined with other drives, and an additional parity calculation was done so that data could be reconstructed in the event of a drive failure. However, this did nothing to ensure performance until newer RAID levels were invented. Those RAID levels have survived, mostly unchanged, for almost 30 years in many enterprise storage products. This old way of doing things has provided higher data availability in the event of single or multiple drive failures, but at the expense of quality of service, performance, and cost, therefore impeding progress toward the 100% virtualized goal.
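To make the parity idea concrete, here is a minimal sketch of single-parity protection using XOR, the scheme commonly associated with RAID levels such as RAID 5. The block contents and the `xor_blocks` helper are illustrative assumptions, not code from any storage product.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data blocks, one per drive in the stripe.
data = [b"virt", b"ual ", b"mach"]

# The parity block is the XOR of all data blocks and lives on a fourth drive.
parity = xor_blocks(data)

# Simulate losing the second drive, then rebuild its block
# from the surviving data blocks plus the parity block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

assert rebuilt == data[1]
```

Note that the rebuild has to read every surviving block in the stripe, which is why recovery competes with normal reads and writes for the same drives and processors.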
What’s the reason for this? Physical RAID groups suffer heavy contention for blocks of storage, whether on disk or on a flash-based system. This is because the blocks are usually shared among the virtual machines to aid management efficiency and scale. These environments require better architectures that share nothing in the physical layout at scale. Many organizations are looking at tens of thousands of virtual machines in their environments, and they require quality of service and performance to hold up through a drive failure, or multiple drive failures, with recovery in minutes, not days. These physical RAID groups are also built with extremely large drives, each holding terabytes of data. When a drive fails, the system processors that normally write and read data are diverted to rebuilding the failed drive’s contents onto a hot-spare drive, using mathematical calculations to recover the data from parity. The rebuild can take anywhere from a day to over a week, depending on the workload on the storage array at the time and the amount of data to be recovered. This non-deterministic approach to recovery and performance has been the bane of the enterprise for over a decade. Can you imagine going to your CIO and saying, “We had a failure and we don’t know how long our performance and quality of service are going to be degraded”? Sounds like a good time to update your resume. This challenge is further illustrated in the service provider market, where quality of service events require the provider to refund payments for service. Internal IT organizations are mirroring service providers as their lines of business demand quality of service for their top-tier applications, yet they are relying on legacy platforms that suffer when failures occur.
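A rough back-of-the-envelope calculation shows why rebuild times vary so widely. The drive size and rebuild rates below are hypothetical figures chosen only for illustration; real arrays differ, and the rate left over for rebuilding depends on how busy the array is.

```python
def rebuild_hours(drive_tb, rebuild_mb_per_s):
    """Hours needed to rewrite one full drive at a sustained rebuild rate."""
    drive_mb = drive_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    return drive_mb / rebuild_mb_per_s / 3600

# Assumed numbers: an 8 TB drive, rebuilt at 100 MB/s on a quiet array
# versus 10 MB/s when production workloads take most of the bandwidth.
idle = rebuild_hours(drive_tb=8, rebuild_mb_per_s=100)
busy = rebuild_hours(drive_tb=8, rebuild_mb_per_s=10)

print(f"idle array: {idle:.1f} h, busy array: {busy / 24:.1f} days")
# prints "idle array: 22.2 h, busy array: 9.3 days"
```

Under these assumptions the same failure takes roughly a day on a quiet system and more than a week on a busy one, which is the unpredictability described above.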
This is why it’s so important for storage solutions to focus on solving the problems of today and the future, ending IT’s reliance on shims like the screwdriver (plus a little hope and a prayer) to keep outdated technologies in step with the organization’s needs, while also removing barriers to the 100% virtualization goal. IT can find such solutions in INFINIDAT platforms, which are built from the ground up to solve the challenges of storing the future. By ensuring that performance and quality of service are not interrupted when components fail, we have enabled enterprises to scale beyond the limits of legacy platforms. Organizations can reach 100% virtualized while ensuring that performance, reliability and budget targets are met. When drives fail in the InfiniBox enterprise storage array, performance is not impacted and the system protects itself in under 15 minutes, even if there are multiple drive failures. In fact, INFINIDAT’s InfiniBox is designed for lights-out environments that would normally cripple legacy systems, which require constant care to ensure maximum performance and reliability. Based on our testing, we expect an InfiniBox system to go more than three years without requiring service for drive replacement, because the system can have multiple drives fail and not suffer performance degradation.
One of our clients, TriCore Solutions, a leading Oracle Solutions provider, deployed an InfiniBox and had four drives fail in 14 months. When we showed up to replace the drives, they asked why we were there, as they had had no performance or quality of service incidents. We asked the team at TriCore what would have happened if four drives had failed on the legacy system they used before InfiniBox, and they said that they would have lost data. According to TriCore, their legacy provider’s service personnel would have been rushing onsite after the first drive failed for fear of more drives failing, and TriCore’s quality of service would have been a problem for anywhere from a day to a week. This is just one example of how the InfiniBox is built to ensure that organizations can deliver performance and quality of service as they reach 100% virtualization, breaking the chains of the nth generation of virtualization.