Tuesday, September 17, 2024

MooseFS, a little-known pioneer that deserves a try

Distributed file systems have been a hot topic for many years, with many initiatives both commercial and open source. In the open source world there are lots of projects, but MooseFS has a special identity: at almost 20 years old, it is one of the pioneers in the domain. We can also list Gluster, Lustre, Ceph, SaunaFS, RozoFS, BeeGFS, OrangeFS or XtreemFS, and some commercial offerings from Weka, Quobyte or Panasas, to name a few.

Many of these have their roots in the famous Google File System paper published in 2003. The philosophy relies on three elements: a backing store fueled by a series of data servers, here called chunk servers; a directory engine for data placement, locking and so on, controlled by central servers named metadata servers, one of them being the leader coupled with followers; and finally the client layer, which represents the access layer where the file system is exposed. Chunk servers run Linux and their local disks are formatted with classic disk file systems such as XFS, ext2 or ZFS, and each chunk is a file within a tree structure, with its name encoding the chunk reference. Clients can run Linux, macOS or Windows, supporting various flavors of FUSE, and receive a software agent that exposes POSIX semantics and establishes communication with metadata and data servers. On Windows, for instance, MooseFS leverages Dokany as a FUSE wrapper. As these machines access data directly and operate as standard machines, they usually run applications.
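To make the split of roles concrete, here is a minimal Python sketch of that design. All names, structures and the chunk-file naming scheme are illustrative assumptions, not MooseFS internals or its wire protocol: the metadata server only keeps the namespace and placement, while chunk servers store each chunk as a plain file on a local file system.

```python
# Illustrative sketch of the metadata/chunk-server split.
# Names and layout are hypothetical, not MooseFS's actual format.

class MetadataServer:
    """Keeps the namespace and chunk placement, never the file data."""
    def __init__(self):
        self.placement = {}  # path -> list of (chunk_id, [chunk servers])

    def register(self, path, chunks):
        self.placement[path] = chunks

    def lookup(self, path):
        # Clients ask the metadata server where chunks live,
        # then talk to the chunk servers directly for the data itself.
        return self.placement[path]

def chunk_file_name(chunk_id, version=1):
    """A chunk server stores each chunk as a regular file on a local
    file system; the file name encodes the chunk reference."""
    return f"chunk_{chunk_id:016X}_{version:08X}.dat"

meta = MetadataServer()
meta.register("/data/video.mp4", [(0x2A, ["cs1", "cs2"]), (0x2B, ["cs3", "cs1"])])
print(meta.lookup("/data/video.mp4"))
print(chunk_file_name(0x2A))  # chunk_000000000000002A_00000001.dat
```

The key design point this illustrates is that the metadata path and the data path are separate: the leader metadata server is consulted for placement, then I/O flows directly between clients and chunk servers.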

This architecture has proven itself. For MooseFS, everything started in 2005 when Gemius initiated an internal file system project. Some clusters deployed at the time continue to run and deliver services today without interruption. Being fully hardware agnostic, MooseFS is a perfect example of Software-Defined Storage.


The other important point to consider is that MooseFS is not a NAS, even if clients can expose NFS and SMB via Ganesha and Samba extensions respectively, and even S3 with MinIO, to continue the full open source dynamic. The product is also able to expose a block interface, which I have to say surprised me, even if I understand the team's desire to address a vast variety of needs. For sizing information, MooseFS supports clusters up to 16EB and 2 billion files.
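As a back-of-the-envelope check on those limits, a quick Python computation (plain arithmetic on the figures quoted above, no MooseFS code) shows what they imply together:

```python
# Back-of-the-envelope math on the stated cluster limits:
# 16 EB of capacity and 2 billion files.
EB = 10**18                 # exabyte, decimal convention
capacity = 16 * EB
max_files = 2 * 10**9

avg_file_size = capacity // max_files
print(f"{avg_file_size // 10**9} GB average file size at both limits")
# With files averaging under 8 GB, the file-count ceiling is reached
# long before the capacity ceiling.
```

In other words, for small-file workloads the 2-billion-file limit is the one to watch, not the 16EB capacity.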


The team had four main goals, well illustrated by core features of the product:

  1. Scalability: by multiplying servers, capacity is delivered,
  2. Performance: I/O is processed in parallel between clients and chunk servers,
  3. Reliability: replication first, then erasure coding,
  4. TCO: via the support of any commodity hardware.

To give details on how data is accessed from the client machine, it's important to understand that below 64MB, a client sends data to only one server; above that level, data is chunked and distributed to different data servers. All this operates in parallel, so we can qualify MooseFS as a parallel file system as well, beyond being a distributed one. In other words, a distributed file system can be parallel or not, but a parallel file system is for sure distributed.
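To illustrate that threshold, here is a small Python sketch. The 64MB figure comes from the description above; the round-robin placement is a simplifying assumption for illustration, not MooseFS's actual placement algorithm:

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64MB threshold, as described above

def place_file(size_bytes, chunk_servers):
    """Return (chunk_index, server) pairs for a file of the given size.
    A file under one chunk goes to a single server; a larger file is
    cut into 64MB chunks spread across servers (round-robin here,
    purely for illustration)."""
    if size_bytes <= CHUNK_SIZE:
        return [(0, chunk_servers[0])]
    n_chunks = -(-size_bytes // CHUNK_SIZE)  # ceiling division
    return [(i, chunk_servers[i % len(chunk_servers)]) for i in range(n_chunks)]

servers = ["cs1", "cs2", "cs3"]
print(place_file(10 * 1024 * 1024, servers))   # small file: one server
print(place_file(200 * 1024 * 1024, servers))  # 200MB: 4 chunks, 3 servers
```

Because the chunks of a large file land on different servers, a client can read and write them concurrently, which is what makes the system parallel and not merely distributed.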

For erasure coding based on Reed-Solomon, the mechanism is controlled by the chunk servers and works in the background. First, data is written to chunk servers as fast as possible without any protection. These servers then trigger replication across servers to provide minimal protection, and later they initiate the erasure coding phase with the split of data, parity calculation, and redistribution of the data, with all placement information sent to the metadata server for future access by clients. The stripe unit size appears to be 256kB.
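To give a feel for the 8+1 scheme mentioned later for the community edition, here is a toy Python example. With a single parity unit, Reed-Solomon reduces to plain XOR parity, so any one lost part can be rebuilt from the other eight; the padding, layout and stripe handling here are illustrative assumptions, not MooseFS's implementation:

```python
from functools import reduce

STRIPE = 256 * 1024  # 256kB stripe unit, as reported above

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode_8p1(data: bytes):
    """Split data into 8 equal parts plus one XOR parity part.
    With one parity unit, Reed-Solomon degenerates to XOR."""
    part_len = -(-len(data) // 8)            # ceiling division
    data = data.ljust(8 * part_len, b"\0")   # pad to a multiple of 8
    parts = [data[i * part_len:(i + 1) * part_len] for i in range(8)]
    parity = reduce(xor, parts)
    return parts, parity

def rebuild(parts, parity, lost_index):
    """Recover one missing data part by XOR-ing survivors with parity."""
    survivors = [p for i, p in enumerate(parts) if i != lost_index]
    return reduce(xor, survivors + [parity])

parts, parity = encode_8p1(b"some chunk payload" * 100)
assert rebuild(parts, parity, 3) == parts[3]  # lose part 3, rebuild it
```

The overhead side of the trade-off is visible too: 8+1 adds only 12.5% of capacity for protection, versus 100% or more for replication, which is why the background conversion from replicas to erasure-coded stripes saves so much space.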

Two editions exist for the product: a community edition with everything available on GitHub, fully open source and free of charge, and a pro edition that is sold based on raw capacity and presents some unique enterprise features like advanced tiering or snapshots. The cluster is managed via a CLI and a web GUI, and also presents an API.

The company behind MooseFS is based in Warsaw, Poland, and is privately held and profitable. Its revenue comes from selling pro licenses, sold as lifetime licenses plus support (no subscription exists so far), and many users started with the community edition before expanding to the pro one. In terms of use cases or vertical industries, the team is very open and doesn't really target specific domains, as they promote a universal approach; they rather rely on partners to "verticalize" the offering.

During the recent meeting at The IT Press Tour in Istanbul, Turkey, the team launched Community Edition 4.0, several years after the pro version. This version shares 97% of the pro version's code and offers manual failover, erasure coding limited to an 8+1 model (but good enough for many configurations), and tiering.

The MooseFS team will be at SuperComputing in Atlanta in mid-November, a perfect place to continue the conversation, discover the solution and start evaluating it.