Thursday, July 06, 2023

An easy way to access files from S3

After discovering CunoFS at SuperComputing 22 in Dallas, TX, I invited the team to the recent 51st edition of The IT Press Tour in Berlin. It was a very interesting session.

In fact, the product is developed by PetaGene, a UK-based ISV dedicated to life sciences data challenges.

The basic problem has been the same since AWS released the S3 API to access its object storage service: how can an application interact with a storage system exposed through that interface if the application speaks and supports only file semantics? There are tons of such applications whose I/O behavior users can’t change.
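To make the mismatch concrete, here is a minimal sketch in Python. The ToyObjectStore class is a hypothetical in-memory stand-in that I made up for illustration; it is neither the real S3 API nor anything from CunoFS. It only assumes what an object interface typically gives you, whole-object put and get, while a file-based application expects to seek and overwrite a few bytes in place.

```python
import os
import tempfile


class ToyObjectStore:
    """Hypothetical in-memory stand-in for an object API (not the real S3 API)."""

    def __init__(self):
        self._objects = {}

    def put_object(self, key, body):
        # Objects are written whole: a put replaces any previous content.
        self._objects[key] = bytes(body)

    def get_object(self, key):
        return self._objects[key]


def patch_file_in_place(path, offset, data):
    """File semantics: seek to an offset and overwrite a few bytes."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)


def patch_object(store, key, offset, data):
    """Object semantics: fetch the whole object, modify it, write it all back."""
    body = bytearray(store.get_object(key))
    body[offset:offset + len(data)] = data
    store.put_object(key, bytes(body))


def demo():
    original = b"hello object world"

    # What a file-only application does.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.bin")
        with open(path, "wb") as f:
            f.write(original)
        patch_file_in_place(path, 6, b"OBJECT")
        with open(path, "rb") as f:
            file_result = f.read()

    # What the same change looks like against a whole-object interface.
    store = ToyObjectStore()
    store.put_object("sample.bin", original)
    patch_object(store, "sample.bin", 6, b"OBJECT")

    return file_result, store.get_object("sample.bin")


if __name__ == "__main__":
    print(demo())
```

Both paths end with the same bytes, but the application code is different, and that rewrite is exactly what users of file-only applications can’t do. Any translation layer has to hide the second pattern behind the first.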

I know, we all know, plenty of services, from entry level to enterprise ones, sometimes free, that have done this object-file translation for more than a decade, like Arcitecta, CTera, Hammerspace, Panzura, Nasuni, Spectra Logic, Tiger Technology or XenData to name a few, and even AWS announced such capabilities a few weeks ago. Some can be considered gateways, others side engines, but it’s a bit different here as CunoFS is client software installed on every system that needs to access this data. There is no gateway in the architecture and no network file system involved either. Everything done at the file level happens locally on the client, which appears to applications as a local file system. CunoFS of course supports POSIX semantics for users and nodes and, since clients are independent, it doesn’t suffer from inter-node communication, consensus or synchronization.

Having said that, this service layer delivers performance levels that are dramatically different from other implementations. Results speak for themselves with 56.9Gb/s in read mode and 52.3Gb/s in write mode for copying files, compared with EFS, FSx, S3fs or other solutions. The same holds for time to copy, for both reads and writes, and for aggregated throughput, which scales better than linearly. The CunoFS team lists EBS in its configuration comparisons; it should be listed together with the file system coupled with it, as EBS is a block storage service. I’m surprised to read on slide 9 of the presentation that file storage is based on RAID and object storage on erasure coding. Isilon has been built on erasure coding since day 1, more than 2 decades ago, and some object storage players, started later, offered only replication at first and added EC later. This is the case for Caringo, founded in 2005 and now owned by DataCore, for Cloudian, or for Scality, founded in 2009.
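Coming back to the throughput figures quoted above, a quick back-of-the-envelope conversion helps to read them. This is only a sketch under my own assumptions: that Gb/s means gigabits per second with decimal units (1 GB = 8 Gb, 1 TB = 1000 GB), and that the rate is sustained for the whole copy. The 1 TB dataset size is an arbitrary example, not a number from the session.

```python
BITS_PER_BYTE = 8


def gbps_to_gigabytes_per_s(gigabits_per_s):
    """Convert gigabits per second to gigabytes per second (decimal units)."""
    return gigabits_per_s / BITS_PER_BYTE


def seconds_to_transfer(size_gigabytes, gigabits_per_s):
    """Idealized copy time, assuming the rate is sustained end to end."""
    return size_gigabytes / gbps_to_gigabytes_per_s(gigabits_per_s)


if __name__ == "__main__":
    # Rates quoted in the post; 1000 GB is an arbitrary example size.
    for label, rate in (("read", 56.9), ("write", 52.3)):
        gigabytes_per_s = gbps_to_gigabytes_per_s(rate)
        seconds = seconds_to_transfer(1000, rate)
        print(f"{label}: {gigabytes_per_s:.2f} GB/s, 1 TB in about {seconds:.0f} s")
```

Under those assumptions both rates land in the range of several gigabytes per second, so a terabyte moves in a few minutes rather than hours.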

It’s important to understand that object storage is a technology and an internal organization with a specific access method, but an object-based access API like S3 doesn’t imply an object storage back-end; we find S3 today on top of almost everything.

CunoFS definitely represents an attractive solution for every user wishing to combine ease of use and performance with full POSIX compliance. Then it’s a question of philosophy, as client software must be installed on every machine, which could be considered intrusive and time-consuming. For more details I invite you to check their web site and, interestingly, also the performance white paper published by Dell, available here.
