De-Risking Enterprise Storage Upgrades (Part 1)
Guest Blogger: Eric Burgener, Research Vice President, Infrastructure Systems, Platforms and Technologies, IDC
During the life cycle of an enterprise storage platform, administrators will likely upgrade that platform a number of times. By upgrades, we specifically mean within-system upgrades: firmware and software upgrades, software patches, and, where relevant, hardware upgrades (e.g., controllers and storage devices). For storage platforms that host at least some mission-critical workloads, the ability to perform within-system upgrades non-disruptively is a critical requirement.
The digital transformation that most enterprises are undergoing puts IT infrastructure much more directly in the path of critical business success, which only heightens the need for high availability across a variety of workloads. Design decisions and feature implementations can significantly reduce upgrade risk (and in some cases remove it entirely), and there are a number of things to look for in a storage system that must support non-disruptive upgrades.
IDC has noted a trend towards increasingly dense mixed enterprise workload consolidation as customers refresh their storage infrastructure. While workload consolidation can drive compelling economic and administrative benefits, it also potentially increases the size of the fault domain. This makes high availability a foundational requirement for a workload consolidation system. The overall availability of a system is impacted not only by the level of redundancy and the failure recovery routines built into a system but also by how it behaves during routine within-system upgrades. With many enterprise storage vendors claiming to deliver "five or more nines" reliability during normal operation, customers also need to look at and understand how upgrades are performed and how those upgrades may impact application services.
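To put the "nines" in perspective, the following minimal sketch converts an availability level into a yearly downtime budget and shows how quickly disruptive upgrades could consume it. The function names and the upgrade scenario (two 30-minute outages per year) are illustrative assumptions, not figures from IDC or any vendor, and the year is simplified to 365 days.

```python
import math

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def downtime_budget_minutes(nines: int) -> float:
    """Yearly downtime allowed at an availability of `nines` nines."""
    return MINUTES_PER_YEAR * 10.0 ** (-nines)


def achieved_nines(downtime_minutes: float) -> float:
    """Number of nines implied by a given amount of yearly downtime."""
    unavailability = downtime_minutes / MINUTES_PER_YEAR
    return -math.log10(unavailability)


if __name__ == "__main__":
    for n in (3, 4, 5, 6):
        print(f"{n} nines -> {downtime_budget_minutes(n):.2f} min/year")

    # Hypothetical scenario: two disruptive upgrades per year, 30 minutes each.
    upgrade_downtime = 2 * 30
    print(f"{upgrade_downtime} min/year of upgrade downtime "
          f"-> {achieved_nines(upgrade_downtime):.2f} nines")
```

At five nines the budget is a little over five minutes per year, so under these assumptions even one short disruptive upgrade window would pull a system below that level regardless of how it behaves during normal operation.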
That said, what should enterprises be looking for in a storage platform that needs to support truly non-disruptive upgrades? Here are some of the key characteristics, with brief discussions of why they are important:
- A more current "clean sheet" design. Many enterprise storage systems still being sold today were originally designed in the 2000s (or even earlier). Architectural design objectives were very different from what they are now, as were the performance, availability and functionality requirements. Persistent flash was not even on the horizon, in-line storage efficiency technologies didn't exist, and primary storage systems were envisioned to support tens of terabytes of capacity and only gigabytes of system bandwidth – to quote only three examples. As these systems get upgraded to support new media types, the core architectural assumptions can't change very much. Twenty years ago few (if any) enterprise storage systems were designed for non-disruptive upgrades – there was no expectation for it. A "clean sheet" approach enables vendors to free themselves from legacy baggage, designing the systems most efficiently to meet today's requirements. The subsequent discussion below calls out some of the more modern design approaches that enable support for non-disruptive operations.
- Software-defined designs. Many of today's established enterprise storage providers' platforms were architected under a "hardware-defined" assumption. A clear trend in enterprise storage over the last decade has been the evolution toward more "software-defined" designs. These are preferable because, by moving more functionality into software, they are easier to enhance with new features, support better non-disruptive upgrade capabilities, can better accommodate new and different hardware technologies and device geometries, and offer increased flexibility. New software releases can offer higher performance and critical new functionality without requiring any hardware upgrades. While some functionality can be moved into software even with older hardware-defined designs, a "clean sheet" architecture that starts with a software-defined orientation provides much greater flexibility to support extensive feature innovation through software upgrades.
Software-defined approaches enable the virtualization of both the front end and back end of a storage system. With this separation, functional updates to one half do not impose ripple effects on the other, which supports more reliable non-disruptive upgrades. Software-defined designs also enable better fault isolation, minimizing the impact of any given failure, and allow less complex fault management routines, which affect not only failure recovery but also upgrade execution. These benefits have been proven in practice, which is one of the reasons why the industry is moving away from hardware-defined designs toward more software-defined ones.
- Operating system architecture that places most of the operating system features in user (not kernel) space. Legacy storage operating systems built around monolithic designs were much more prone to failures or other impacts when even a small glitch occurred. These storage operating systems also required comprehensive regression testing when new features were added, and imposed significant risk when making changes. Newer designs restrict core infrastructure plumbing capabilities to the kernel, making it much smaller and more reliable, and run features like data reduction, snapshots, encryption, quality of service, replication and other storage management features in user space. With this approach, any of these features can fail or be upgraded without impacting the kernel, enabling the system as a whole to continue servicing applications. This design approach makes systems more reliable, enables less risky upgrades, and requires less regression testing when changes are made by either vendors or users.
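The fault containment and in-place feature replacement described in the last point can be illustrated with a toy model. This is only a loose, in-process analogy under stated assumptions: real storage operating systems rely on kernel/user-space and process isolation, and the `StorageCore` class, its methods, and the feature names here are hypothetical, not any vendor's design.

```python
import zlib
from typing import Callable, Dict, Set

Feature = Callable[[bytes], bytes]  # a feature transforms a data block


class StorageCore:
    """Hypothetical minimal core: it only moves blocks and supervises features."""

    def __init__(self) -> None:
        self.features: Dict[str, Feature] = {}
        self.failed: Set[str] = set()

    def install(self, name: str, feature: Feature) -> None:
        """Install or upgrade a feature in place; the core keeps running."""
        self.features[name] = feature
        self.failed.discard(name)

    def write(self, block: bytes) -> bytes:
        """Run a block through every healthy feature and return the result."""
        for name, feature in list(self.features.items()):
            if name in self.failed:
                continue
            try:
                block = feature(block)
            except Exception:
                # Contain the fault: disable the feature, keep serving I/O.
                self.failed.add(name)
        return block


def buggy_reduction(block: bytes) -> bytes:
    """Stand-in for a data reduction feature with a defect."""
    raise RuntimeError("simulated feature bug")


if __name__ == "__main__":
    core = StorageCore()
    core.install("data_reduction", buggy_reduction)
    core.write(b"payload")                # I/O still completes
    print("failed features:", core.failed)

    # "Upgrade" the feature without restarting the core.
    core.install("data_reduction", zlib.compress)
    stored = core.write(b"payload")
    print("round trip ok:", zlib.decompress(stored) == b"payload")
```

The design choice the sketch highlights is that the core does very little itself, so a defect in, or a replacement of, any one feature does not require stopping the component that services I/O.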
Use of these three design considerations is widespread among enterprise storage array vendors that have architected their systems within the last six or seven years. But Infinidat, an enterprise storage provider that focuses on selling large-scale systems specifically architected for dense mixed enterprise workload consolidation, has gone beyond these three common approaches with additional innovations that further improve its ability to provide non-disruptive operations. That is the focus of my next blog on the topic of de-risking upgrades.