
Introducing InfiniBox 3.0

When you set out to change an industry like enterprise storage, you need to deliver new innovation constantly. Just to note, INFINIDAT is now up to 62 granted patents, with another 60 pending. With the release of version 3.0 of the InfiniBox yesterday, we are very grateful for the opportunity to share the latest innovations in compression, iSCSI support, and performance analytics that we’ve built into what is already a very patent-rich solution. Read on to learn more.

Data Compression Without Performance Impact

When it comes to compression, our goal was to develop a compression engine that lets you store more data on the box with absolutely no impact on I/O latency.


Here’s how our compression works. As a write comes into an InfiniBox, it goes into DRAM on one of the three controllers — which is 10x faster than any All-Flash array. Next, we mirror that write over our InfiniBand network, putting a copy of the pending write into memory on one of the other two controllers. Once we have the write in two places, on two separate physical servers, we acknowledge the write back to the host. This entire process takes 180 microseconds.
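To make that sequence concrete, here is a minimal sketch of the idea in Python; the class and function names are our own illustration of the description above, not actual product code.

```python
# Conceptual sketch of the InfiniBox write path described above.
# Names and structure are illustrative only, not INFINIDAT code.

class Controller:
    def __init__(self, name):
        self.name = name
        self.dram = {}  # pending writes held in the DRAM cache

    def stage_write(self, block_id, data):
        self.dram[block_id] = data

def handle_host_write(primary, mirror, block_id, data):
    """Acknowledge a write once it exists in DRAM on two controllers."""
    primary.stage_write(block_id, data)   # 1. land the write in DRAM
    mirror.stage_write(block_id, data)    # 2. mirror it over InfiniBand
    return "ACK"                          # 3. ack the host (~180 microseconds end to end)

a, b, c = Controller("A"), Controller("B"), Controller("C")
print(handle_host_write(a, b, block_id=42, data=b"payload"))
```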

When compression is enabled, nothing changes in that critical data path. The write comes in. It gets mirrored. We acknowledge the write back to the host. The host clears its buffer and continues to work.

We have implemented our compression mechanism on the back end of our architecture, which is completely virtualized and independent of any host I/O. As writes accumulate in our DRAM cache (every InfiniBox has up to three terabytes of DRAM), a process pulls data sections out of memory and assembles them into structures that we call destage stripes. This is where we compute parity and data protection; we call it InfiniRAID™. We then persist that data by writing it to long-term media. Compression is implemented during this destage: in version 3.0, the allocator pulls data sections out of memory and compresses them before creating the stripe, before computing the data protection, and before writing to backend media. This gives the InfiniBox all the benefits of compression without the performance hit that comes from the act of data compression itself.
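A simplified sketch of that destage ordering follows; stdlib zlib and a plain XOR parity are stand-ins for the real compression engine and InfiniRAID, purely for illustration.

```python
import zlib
from functools import reduce

def destage(sections):
    """Toy destage: compress each data section first, then compute parity
    and assemble the stripe written to backend media."""
    compressed = [zlib.compress(s) for s in sections]   # compression happens first
    width = max(len(c) for c in compressed)
    padded = [c.ljust(width, b"\x00") for c in compressed]
    parity = reduce(lambda x, y: bytes(a ^ b for a, b in zip(x, y)), padded)
    return {"sections": compressed, "parity": parity}   # persisted to long-term media

stripe = destage([b"host write one " * 64, b"host write two " * 64])
print(len(stripe["sections"][0]), "compressed bytes in the first section")
```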

Learn more about InfiniBox 3.0 in this episode on The Cube.

Compression is a completely inline process. There is never a case where uncompressed data is written and then compressed later by a background process, such as a scrubbing pass. The only way data can reach persistent media is by passing through the compression engine.

The other interesting aspect of our implementation is that we built it as a data-aware compression framework. At the moment of destage, the compression engine can choose from a catalog of compression algorithms, selecting on a section-by-section basis the one that will give the highest compression ratio. In version 3.0, the first algorithm we have dropped into this catalog is LZ4.
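As a sketch of what such a per-section choice looks like, the snippet below uses stdlib zlib and lzma as stand-ins for the catalog (the product's first shipping algorithm is LZ4, per the above).

```python
import zlib
import lzma

# Illustrative catalog; zlib and lzma stand in for the real algorithm set.
CATALOG = {
    "zlib": zlib.compress,
    "lzma": lzma.compress,
}

def best_compression(section: bytes):
    """Pick, per data section, the algorithm giving the highest ratio."""
    results = {name: fn(section) for name, fn in CATALOG.items()}
    name, data = min(results.items(), key=lambda kv: len(kv[1]))
    return name, data, len(section) / len(data)

name, _, ratio = best_compression(b"log line: status=ok\n" * 512)
print(f"chose {name} at {ratio:.1f}:1")
```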

The other thing to note is that this is a pure software implementation that supports all of our protocols. No special hardware is required: no ASICs, no FPGAs, no dedicated silicon of any kind is needed for our compression capabilities.

The InfiniBox already has a significant pricing advantage over other enterprise systems on the market, and with compression we’re extending that advantage. In our lab testing and in real-world customer scenarios, we are seeing compression ratios of 2:1, and sometimes as high as 9:1, depending on the data type. As a result, InfiniBox now offers the ability to store as much as 5PB of effective capacity in a single 42U rack. This is unprecedented in the industry and will enable massive consolidation savings for our customers.
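To put those numbers in perspective: effective capacity is simply usable capacity multiplied by the achieved compression ratio, so, illustratively, a rack holding roughly 2.5PB of usable data at a 2:1 ratio reaches the 5PB effective figure above (the exact usable figure per rack depends on configuration).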

Example Compression Ratios by Storage Pool

“Carrier Grade” iSCSI

INFINIDAT has taken iSCSI support a step further in version 3.0, enabling additional reductions in the amount of rack space required for a given amount of usable capacity. Until now, InfiniBox supported iSCSI via a dedicated node that we call an iSCSI node — a well-supported approach used by many of our customers.

In version 3.0 of the InfiniBox software, the iSCSI service is delivered natively from the same nodes that deliver NFS and Fibre Channel. This reduces the rack space required to support iSCSI and makes it a first-level peer of all the other production protocols that we support.


In the past, solution architects have always advised customers that they can use iSCSI, but that for the lowest latency and the most deterministic performance, they should invest in a Fibre Channel SAN. What is unique about our new implementation is that it is true carrier-grade iSCSI. Just as with our SLAs for Fibre Channel and NFS, we offer seven nines of reliability, with absolutely no performance penalty.

More customers are looking to iSCSI to help lower their costs, and with our implementation they can do so without sacrificing reliability. Our iSCSI support also means that managed service providers and cloud providers, who need to deliver storage that is 100% available, can now do so over Ethernet fabrics.

Understand Your Storage Better with Performance Analytics

Version 3.0 also features a major update to our performance monitoring framework — performance analytics. We have put a great deal of research behind this feature and believe we are the first storage software company to offer this capability.

The performance analytics tool in version 3.0 allows storage operators to do multidimensional analysis of performance and operational metrics coming off their storage systems and their storage fabrics. If you’re familiar with data warehousing concepts and OLAP cubes, this will totally make sense to you.


Our implementation of performance analytics is what state of the art looks like in the industry today. It provides the ability to take various storage objects and run time-series analysis on them; for example, you can show IOPS, latency, and throughput, all with respect to time. This can be done at the level of an individual volume or pool, for an individual file system, or for an entire storage rack.
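As a rough sketch of that kind of time-series slicing (the sample records and field names here are hypothetical, not the product's actual schema):

```python
from collections import defaultdict

# Hypothetical metric samples; field names are illustrative only.
samples = [
    {"t": 0, "object": "vol1",  "iops": 5000, "latency_ms": 0.9, "mbps": 400},
    {"t": 1, "object": "vol1",  "iops": 5200, "latency_ms": 1.1, "mbps": 410},
    {"t": 0, "object": "pool7", "iops": 900,  "latency_ms": 2.0, "mbps": 80},
]

# Group each object's samples into a time series of (t, iops, latency, throughput).
series = defaultdict(list)
for s in samples:
    series[s["object"]].append((s["t"], s["iops"], s["latency_ms"], s["mbps"]))

for obj, points in sorted(series.items()):
    print(obj, points)
```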

Our data science team has also been drawing on the very rich instrumentation that we have in our code. Fundamentally, this feature is an extension of our functional-module software architecture.

In the example below, we’re looking at a NAS volume consuming resources, using traces of operations; here we are showing SCSI and NFS commands. This lets you look at latency at the system level, for SAN and for NAS, with respect to time. This is how you get started.

NAS Data Resource Profile

If you right-click on any trace on the screen, you are presented with a context-sensitive menu. Right-clicking on a particular pool, for example, gives you the option to drill down by I/O category, by I/O type, or by latency, letting you choose which dimension to analyze.

You really have a ton of options here. You can create a histogram of the various I/O sizes and see how the commands break down, or do the same by latency and see whether particular I/O sizes are causing certain behavior. The most powerful option of all is Filter. Filtering a particular trace brings up a context-sensitive menu that lets you start discriminating on the data you want to analyze. You can start by looking at all of the traffic going into a particular tenant or pool, then have the filter show only metadata operations. This excludes reads and writes but includes, for example, NFS READDIR commands and file opens and closes; it is this metadata traffic that tends to be very problematic for legacy storage systems.
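Here is a small sketch of that filter-then-histogram flow over a trace; the record format and operation names are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical trace records; the real trace schema is internal to InfiniBox.
trace = [
    {"proto": "NFS",  "op": "READDIR", "size": 0,    "latency_ms": 4.2},
    {"proto": "NFS",  "op": "WRITE",   "size": 8192, "latency_ms": 0.7},
    {"proto": "NFS",  "op": "OPEN",    "size": 0,    "latency_ms": 3.1},
    {"proto": "SCSI", "op": "READ",    "size": 4096, "latency_ms": 0.5},
]

METADATA_OPS = {"READDIR", "OPEN", "CLOSE", "GETATTR"}

# Filter: keep only metadata operations, excluding reads and writes.
metadata = [r for r in trace if r["op"] in METADATA_OPS]

# Histogram: command counts broken down by I/O size.
size_histogram = Counter(r["size"] for r in trace)

print("metadata ops:", [r["op"] for r in metadata])
print("I/O size histogram:", dict(size_histogram))
```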

The whole point of this feature is to let storage operators start at a very high level, looking at performance at the data center level, then drill down to individual storage flows from one endpoint to another, and slice and dice them across various dimensions.

These metrics are also JSON formatted, so it’s easy to parse them and incorporate them into your existing infrastructure. This is especially helpful for large shops or service providers that have their own interfaces for managing their environments.
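Since the output is JSON, pulling it into existing tooling takes only a few lines; the payload shape below is invented for illustration.

```python
import json

# Example payload; the actual field names and structure are assumptions.
payload = '{"object": "vol1", "metrics": [{"t": 0, "iops": 5000}, {"t": 1, "iops": 5200}]}'

doc = json.loads(payload)
for point in doc["metrics"]:
    print(doc["object"], point["t"], point["iops"])
```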

Conclusion

It is important to point out that ALL of these features are available to 100% of the InfiniBoxes that have been sold and deployed in the field. INFINIDAT has hundreds of systems deployed since late 2013, and every one of them can take advantage of these new features — AT NO CHARGE. The upgrade is non-disruptive and compatible with every InfiniBox system out in the field. Continuing our mantra that ‘add-on software license features are bad,’ clients who have purchased a system and maintenance receive the upgrade at no additional cost. This approach ensures that customers have a true understanding of the system’s TCO at the outset, unlike other storage vendors who charge for new features.

These are the three core anchor features of our version 3.0 release. In addition to these new capabilities, we have enhanced the system in other ways too. For example, support for as much as 200TB of SSD in a system helps increase our lead when it comes to outperforming All-Flash arrays with real-world workloads.

About Steve Kenniston

Steve is a well-known storage industry evangelist and blogger (www.thestoragealchemist.com). Over the last decade, Steve has worked for storage startups such as Avamar and Storwize and helped to sell and integrate these companies into EMC and IBM respectively. As VP of Product Marketing at INFINIDAT, his mission is to help clients understand the best technologies to extract the most from their data.
