High Performance Computing: GPFS on InfiniBox
I’m always interested to see how our customers use INFINIDAT storage. We have customers running InfiniBox in VMware, OpenStack, Oracle, SAP, and numerous other environments. This time I had a chance to talk to a team that runs our storage as part of their High Performance Computing (HPC) setup.
Data intensive workloads require highly tuned, specialized compute and storage solutions. There are many different use cases, and large scale analysis of genomic data is one of them.
Last year, the Weizmann Institute of Science, a multidisciplinary research center with around 2,500 scientists, started to redesign its HPC environment to provide more capacity and performance for its genomic research teams.
The Weizmann Institute of Science. Photo courtesy of Elikrieg.
The Nancy & Stephen Grand Israel National Center for Personalized Medicine (G-INCPM) at Weizmann decided to deploy a new environment with an InfiniBox F6000 system as the underlying storage for IBM General Parallel File System (GPFS).
As of today, the G-INCPM Weizmann team is highly satisfied with this setup. GPFS on top of InfiniBox runs in production, and the GPFS cluster delivers multiple GB/s of throughput to various client systems.
InfiniBox performance, capacity, and density allow G-INCPM Weizmann to serve different environments from a single storage array:
Figure 1: InfiniBox in a mixed GPFS and VMWare environment
Most of the InfiniBox capacity is allocated to GPFS storage. The filesystem size is 815TB, with 627TB used by over 120 million files. After thorough testing, the Weizmann team deployed two GPFS storage nodes, each with two 16Gbps HBAs connected to the InfiniBox storage. Based on the file size breakdown and performance expectations, the GPFS block size was set to 16MB. A smaller block size could provide some additional space savings for smaller files, but would incur a performance overhead.
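For readers curious what such a setup looks like administratively, here is a minimal sketch of creating a GPFS filesystem with a 16MB block size on InfiniBox-backed LUNs. The device paths, NSD names, and server names are hypothetical, and the actual stanza layout depends on the cluster; this is not the Weizmann team's exact configuration.

```shell
# Hypothetical NSD stanza file (nsd.stanza) describing two InfiniBox LUNs,
# each served by both GPFS storage nodes for redundancy. With InfiniBox
# handling data protection, a single dataAndMetadata pool suffices --
# no separate metadata tier is defined:
#
#   %nsd: device=/dev/mapper/ibox_lun01
#     nsd=ibox_nsd01
#     servers=gpfs-node1,gpfs-node2
#     usage=dataAndMetadata
#
#   %nsd: device=/dev/mapper/ibox_lun02
#     nsd=ibox_nsd02
#     servers=gpfs-node2,gpfs-node1
#     usage=dataAndMetadata

# Register the NSDs with the cluster
mmcrnsd -F nsd.stanza

# Create the filesystem with a 16MB block size, mounted at /gpfs/fs1
mmcrfs fs1 -F nsd.stanza -B 16M -T /gpfs/fs1

# Mount it on all nodes
mmmount fs1 -a
```

Alternating the `servers=` order across stanzas spreads NSD traffic over both storage nodes while keeping either node able to serve every LUN.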
Each GPFS storage node also has one InfiniBand connection to serve the HPC clusters: an SGI UV and a PBS-managed batch cluster. In addition, GPFS protocol nodes are deployed as VMs on the VMware cluster, accessing GPFS over an Ethernet/InfiniBand bridge. These nodes provide NFS and SMB access to applications with lower performance requirements, which run as virtual machines in the VMware ESX environment or on researchers’ laptops.
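As a sketch of how protocol nodes can expose a GPFS tree over NFS and SMB (assuming the Cluster Export Services feature of GPFS/Spectrum Scale is configured; the path, share name, and client range below are hypothetical):

```shell
# Export a GPFS directory over NFS, read/write, to a lab subnet
mmnfs export add /gpfs/fs1/projects \
    --client "10.0.0.0/24(Access_Type=RW,Squash=no_root_squash)"

# Create an SMB share on the same tree for desktop and laptop access
mmsmb export add projects /gpfs/fs1/projects
```

Serving NFS and SMB from dedicated protocol nodes keeps the slower clients off the high-throughput InfiniBand path used by the HPC clusters.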
The G-INCPM Weizmann team achieved over 12GB/s of throughput to the storage array during pre-production tests. The production workload regularly sustains over 4.9GB/s. Storage response latency is relatively high because the Weizmann team deliberately configured the GPFS volumes without an SSD caching layer on InfiniBox.
Figure 2: InfiniBox performance at G-INCPM Weizmann as measured by the InfiniMetrics tool.
GPFS Made Simpler
“Running GPFS on InfiniBox storage makes the overall configuration simpler. There is no need to define a special tier for GPFS metadata, and no need for intensive tuning of GPFS parameters to achieve higher performance at the storage level. There is also no need to replicate disk groups for higher reliability. All of these aspects are handled by the InfiniBox storage,” says Leon Samuel, Operations Manager at G-INCPM Weizmann.
The G-INCPM Weizmann team is very satisfied with the InfiniBox setup and is now deploying a second InfiniBox system at its DR site. The team is testing GPFS Active File Management (AFM) for GPFS replication, and may also consider native InfiniBox replication for the VMware environment.
“InfiniBox storage is a key part of our production environment. Once set up, it just works and provides great performance. GPFS usually requires a lot of attention from the IT team, but the underlying INFINIDAT storage gives us the necessary confidence in data availability,” says Sharon Dahan, Head of the Information Technology Unit at G-INCPM Weizmann.