Tuesday, November 21, 2023

Hammerspace promotes pNFS for AI workloads

Hammerspace, a fast-growing file storage company dedicated to U3 file access, just published an interesting reference architecture for AI workloads. U3 is the acronym and the model I introduced years ago in a famous (French) blog post; it stands for Universal, Unified and Ubiquitous.

As the most active promoter of pNFS, Hammerspace illustrates a new use case where the NFS extension can be used and deployed to deliver very high throughput results.

pNFS combines several advantages over classic parallel file storage and classic NFS:

  1. It is a standard, developed many years ago, and it is part of NFS. Standard NFS means that the NFS client is already in the Linux distribution and can be deployed right away. For background, and to explain the maturity and expertise around pNFS: Hammerspace, in its previous incarnation as Primary Data, acquired the Israeli company Tonian Systems in 2013, a developer of a metadata director tailored for pNFS. For those interested in Tonian, I wrote two blog posts about them in 2011 and 2012, in French, sorry (1 & 2). And for the others, I did the very first interview of David Flynn when he launched Primary Data in San Jose.
  2. It is parallel, like other classic parallel file systems that rely on their own client and protocol, even widely adopted ones like Lustre. So in terms of performance, it should deliver high numbers.
  3. It is purely software like a few other approaches and doesn't require any additional hardware element or specific component.

As AI workloads involve tons of clients, thousands of GPUs, and many data silos and instances, the capability to trigger and aggregate I/O operations is paramount. This is the design objective of pNFS: doing parallel I/O in a standard way. In the architecture displayed below, three entities exist: the consumer group at the top, made of client machines with CPUs and GPUs that need to process data; the storage group, represented by all the NVMe units serving data to the consumers via NFSv3; and on the side, a cluster of metadata servers with replication enabled, exposing the file system layout and answering client requests. In a nutshell, this asymmetric model is well known in the HPC/AI domain.
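
To make the asymmetric model a bit more concrete, here is a minimal Python sketch of the idea: a client asks a metadata service where the pieces of a file live, then fetches those pieces from several storage nodes in parallel. Everything here is a toy and an assumption of mine: the `MetadataServer`, `DataServer` and `Stripe` classes, the server names `ds0` to `ds2`, the file path and the round-robin striping are all hypothetical and run in memory. It is not the pNFS wire protocol and not Hammerspace's implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Stripe:
    server: str   # hypothetical data server name
    offset: int   # byte offset of this stripe within the file
    length: int   # number of bytes in this stripe


class DataServer:
    """Toy stand-in for one storage node; keeps stripe bytes in memory."""

    def __init__(self):
        self.blocks = {}  # (path, offset) -> bytes

    def write(self, path, offset, data):
        self.blocks[(path, offset)] = data

    def read(self, path, offset, length):
        return self.blocks[(path, offset)][:length]


class MetadataServer:
    """Toy stand-in for the metadata service: it knows layouts, never file data."""

    def __init__(self):
        self.layouts = {}  # path -> list of Stripe, ordered by offset

    def set_layout(self, path, stripes):
        self.layouts[path] = stripes

    def get_layout(self, path):
        return self.layouts[path]


def store_striped(path, data, mds, servers, stripe_size):
    """Spread a file round-robin over the data servers and record its layout."""
    names = sorted(servers)
    stripes = []
    for i, offset in enumerate(range(0, len(data), stripe_size)):
        chunk = data[offset:offset + stripe_size]
        name = names[i % len(names)]
        servers[name].write(path, offset, chunk)
        stripes.append(Stripe(name, offset, len(chunk)))
    mds.set_layout(path, stripes)


def parallel_read(path, mds, servers):
    """Ask the metadata server once, then read every stripe concurrently."""
    layout = mds.get_layout(path)
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        # pool.map yields results in layout order, so the chunks join correctly.
        chunks = pool.map(
            lambda s: servers[s.server].read(path, s.offset, s.length), layout
        )
        return b"".join(chunks)


servers = {f"ds{i}": DataServer() for i in range(3)}
mds = MetadataServer()
payload = bytes(range(256)) * 4  # 1024 bytes of sample data
store_striped("/data/sample.bin", payload, mds, servers, stripe_size=100)
result = parallel_read("/data/sample.bin", mds, servers)
```

The point of the split is that the metadata service is only consulted for the layout, while the bulk data flows directly between the client and the storage nodes, so adding storage nodes adds data paths without putting the metadata service in the data path.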

And the beauty is that Hammerspace can combine its global file services model, powered by a global namespace, with its advanced data placement feature to group various media and storage subsystems.

It is also an opportunity for Hammerspace to share its contribution to the world, as the engineering team is an active developer of pNFS, and more broadly of NFS v4.2, and the designer of a new approach coupling NFS and SSD. It is explained in this paper.

Now this reference architecture has to be deployed; it will be interesting to see the pace of adoption over the next few months. As mentioned at the beginning, Hammerspace is a very good example of U3.

We'll learn more about this during the coming IT Press Tour in January in California.
