Friday, August 13, 2021

AI, the new battlefield for file systems

I originally wrote this article for StorageNewsletter, published April 16th, 2021

The recent Nvidia GTC 2021 conference was once again an opportunity for storage vendors to refresh, update and promote their storage offerings for AI, aligned with new product announcements from the GPU giant.

Historically, HPC was essentially deployed at research centers, universities and some scientific/technical sites with very specific needs. Some vendors have tried to push HPC into the enterprise, and some storage players followed that direction, polishing the space and their products with new designs. Essentially covered by parallel file systems, this effort anticipated a much larger adoption of systems dedicated to AI, ML and, specifically, deep learning. I notice a sort of convergence between HPC and AI, both in terms of needs and requirements and in vendors’ solutions.

As said, AI brings and extends HPC to the enterprise with some similarities but, of course, differences, and it really shakes some historical storage approaches as these applications are highly demanding. AI presents new IO patterns with a need for high bandwidth and low latency, a mix of file sizes and a mix of file access patterns – small and large, random and sequential – but read operations clearly dominate the IO interaction with storage. These operations have a direct impact on the training phase, which can be limited by the data reading rate and, of course, by multiple re-reads. The memory and storage hierarchy plays a fundamental role here. And as a general idea, bringing parallelism to every stage is the answer, illustrated by the famous “Divide and Conquer” mantra, to deliver linear scalability. In other words, performance is the dominant factor and a must-have requirement.
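To make the “Divide and Conquer” idea concrete at the smallest scale, here is a minimal sketch – my own illustration, not any vendor’s implementation – that splits one large file into chunks and reads them in parallel with POSIX threads; the file name and thread count are hypothetical placeholders:

    /* parread.c - divide-and-conquer reads: split one file into NTHREADS
     * chunks and read them concurrently. Build: cc -O2 -pthread parread.c */
    #include <fcntl.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define NTHREADS 8                 /* hypothetical degree of parallelism */

    typedef struct {
        const char *path;
        off_t offset;                  /* start of this thread's chunk */
        size_t length;                 /* bytes this thread reads */
    } chunk_t;

    static void *read_chunk(void *arg)
    {
        chunk_t *c = arg;
        int fd = open(c->path, O_RDONLY);
        if (fd < 0) { perror("open"); return NULL; }
        char *buf = malloc(c->length);
        /* pread() carries its own offset, so threads never contend on a
         * shared file position and need no locking */
        if (pread(fd, buf, c->length, c->offset) < 0)
            perror("pread");
        free(buf);
        close(fd);
        return NULL;
    }

    int main(int argc, char **argv)
    {
        const char *path = argc > 1 ? argv[1] : "dataset.bin"; /* hypothetical */
        int fd = open(path, O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }
        off_t size = lseek(fd, 0, SEEK_END);   /* total file size */
        close(fd);

        pthread_t tid[NTHREADS];
        chunk_t chunk[NTHREADS];
        off_t step = size / NTHREADS;
        for (int i = 0; i < NTHREADS; i++) {
            chunk[i].path = path;
            chunk[i].offset = (off_t)i * step;
            chunk[i].length = (i == NTHREADS - 1)
                              ? (size_t)(size - chunk[i].offset)
                              : (size_t)step;
            pthread_create(&tid[i], NULL, read_chunk, &chunk[i]);
        }
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(tid[i], NULL);
        return 0;
    }

Parallel file systems and scale-out NAS apply the same recipe one level up: the chunks become stripes spread across storage nodes, and the threads become clients reading them concurrently.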

On the network side, of course, IB, RoCE and Ethernet – multiple 100Gb/s or 200Gb/s ports are pretty classic here – configured as non-blocking networks are widely deployed, with some vendors offering the capability to group interfaces.


I tried to summarize the file storage solutions I see coupled with Nvidia DGX systems, POD and SuperPOD. I see essentially two file storage families here, NAS and parallel file systems, all based on NVMe to satisfy performance requirements. At the limit, HDDs could be considered for a tier-2 behind these, but tiering has to be managed carefully to avoid any impact on data access.

To boost IOs, Nvidia introduced GPUDirect Storage (GDS) to avoid the CPU path and its role in data exchanges, providing data to GPUs faster. This feature is enabled via the Magnum IO API superset.
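For illustration, here is a minimal sketch of what the cuFile API – the storage component of Magnum IO – looks like from an application, assuming a Linux host with the GDS driver and a supported file system; the file path and transfer size are hypothetical and error handling is reduced to the essentials:

    /* gds_read.cu - read file data straight into GPU memory with cuFile.
     * Build (paths may vary): nvcc gds_read.cu -lcufile -o gds_read */
    #define _GNU_SOURCE                             /* for O_DIRECT */
    #include <cuda_runtime.h>
    #include <cufile.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        const size_t size = 1 << 20;                /* hypothetical 1 MiB read */
        int fd = open("/mnt/train/sample.bin",      /* hypothetical dataset file */
                      O_RDONLY | O_DIRECT);         /* GDS requires O_DIRECT */
        if (fd < 0) { perror("open"); return 1; }

        cuFileDriverOpen();                         /* initialize the GDS driver */

        CUfileDescr_t descr;                        /* wrap the POSIX fd for cuFile */
        memset(&descr, 0, sizeof(descr));
        descr.handle.fd = fd;
        descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
        CUfileHandle_t handle;
        cuFileHandleRegister(&handle, &descr);

        void *devPtr;
        cudaMalloc(&devPtr, size);                  /* destination buffer on the GPU */
        cuFileBufRegister(devPtr, size, 0);         /* register it with the driver */

        /* the read DMAs file data directly to GPU memory,
         * with no CPU bounce buffer in the path */
        ssize_t n = cuFileRead(handle, devPtr, size, 0, 0);
        printf("read %zd bytes into GPU memory\n", n);

        cuFileBufDeregister(devPtr);
        cudaFree(devPtr);
        cuFileHandleDeregister(handle);
        close(fd);
        cuFileDriverClose();
        return 0;
    }

Without GDS, the same read would land in a host buffer first and then be copied to the GPU with cudaMemcpy; cuFileRead collapses those two hops into one.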

For NAS, I don’t mean a classic file server but rather a super fast scalable NAS such as Pure Storage FlashBlade with AIRI, or VAST Data Universal Storage and its LightSpeed specific flavor leveraging NFS over RDMA. Some of them developed their own internal architecture, like VAST Data with a shared-everything model or Pure Storage with a specific hardware-based shared-nothing approach, all of them playing in the scale-out NAS area. In some NFS-based configurations I also see NetApp AFF A-Series and Dell PowerScale, also a shared-nothing model, and all these players use the nconnect NFS mount option (maximum is 16) to gain some parallelism effect from the NFS farm, as shown in the example below.
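For the record, a hedged example of such an NFS mount – the server name, export path and mount point are placeholders to adapt to your environment – opening 16 TCP connections to the same server:

    mount -t nfs -o nconnect=16,rsize=1048576,wsize=1048576 nfs-server:/export/data /mnt/data

The client then spreads its RPC traffic across the 16 connections, which is where the parallelism effect comes from.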

In the other category, parallel file systems are present as they’re aligned by design with the parallelism needs of AI workloads. It’s important to keep in mind that this model is intrusive, with an agent or a piece of software installed on the DGX side. These offerings are represented by DDN AI400X embedding ExaScaler based on Lustre, WekaIO with Weka AI based on WekaFS, IBM Spectrum Scale and BeeGFS promoted by NetApp, among others. As I wrote recently, I’m still surprised that HPE, with Cray coupled with Lustre and also WekaIO in their catalog for AI, picked IBM Spectrum Scale, claiming they need it for AI. And if you check the Weka AI reference architecture below, you will see some HPE ProLiant servers in the configuration, and this HPE white paper continues to illustrate the fuzzy HPE strategy. Also, as said, DDN leverages Lustre with ExaScaler within the AI400X, especially for AI. As an example, an AI400X delivers 50GB/s in read and 3 million IO/s, and DDN recommends one appliance for every 4 DGX systems. Linear scalability was demonstrated with 10 AI400X coupled to 16 DGX, offering 500GB/s in read – ten times the throughput of a single appliance. But it’s almost a philosophical or even religious decision; at least what is good is the wide choice users can consider.


Clearly, AI sets a new level of reference architecture (RA), and all the vendors listed above have published RAs for DGX, some with POD, and it seems that DDN is the only one validated for SuperPOD. You can visit this Nvidia page listing DDN, Dell, IBM, NetApp, Pure Storage, VAST Data and WekaIO.
Of course, other vendors outside of this official Nvidia list have published RAs as well; I can list Pavilion Data or HPE with WekaIO, for instance.

I also see the multi-protocol aspect of these storage systems as an attractive attribute, like the possibility to expose the same namespace via S3 and NFS without the need to copy or duplicate content. It could be used, for instance, to ingest data remotely via S3 from dispersed IoT sensors and then process the accumulated data “locally” via NFS.

The term U3 – Unified, Universal and Ubiquitous – that I introduced some time ago could also be used to qualify these offerings.

Definitely, AI is the new battlefield for file systems, and innovation runs fast in that domain.