The company was founded in 1998 in Melbourne, Australia, and remains Australian-owned to this day. The company is self-funded and profitable. Its founding idea is to manage data at scale, in all its forms, with its Mediaflux product.
The firm targets large projects with large volumes of data, often distributed across various systems and storage units. Mediaflux operates as a universal data management solution, moving data to the right tier while maintaining access; the product thus offers tiering capabilities and reduces the overall cost of the data storage environment.
The product relies on a purpose-built database named XODB, designed by the company, that stores all metadata collected and discovered through a transparent mechanism. By metadata I mean system, user-defined, embedded or application-oriented metadata, all protected by XODB. So far XODB scales up to 100 billion records and is able to keep track of versions. It also adds some security features and offers a fast WAN data transfer capability.
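To make those metadata categories concrete, here is a purely illustrative sketch of what a single asset's metadata could look like; the field names and structure are my own invention, as the actual XODB schema was not detailed in the session.

```python
# Illustrative only: the four metadata categories mentioned above, grouped
# for one asset. Not Arcitecta's actual XODB schema or data model.
asset_metadata = {
    "system": {                       # filesystem-level attributes
        "size_bytes": 52_428_800,
        "mtime": "2022-11-03T09:14:00Z",
        "owner": "jdoe",
    },
    "user_defined": {                 # tags added by users or curators
        "project": "coral-survey-2022",
        "retention": "10y",
    },
    "embedded": {                     # extracted from the file itself (e.g. EXIF, DICOM)
        "camera": "Nikon D850",
        "gps": (-37.8136, 144.9631),
    },
    "application": {                  # written by the producing application
        "pipeline_version": "3.2.1",
    },
    "versions": ["v1", "v2"],         # XODB is said to keep track of versions
}
```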
To be fully transparent, the team has developed its own NFS, SMB and S3 service layer and doesn't rely on standard services when a scalable file service is needed.
Mediaflux sits in the data path, and clearly there are advantages to both approaches, in or out of the data path. The product confirms that, at scale, scanning the file system is not feasible; the solution is clearly to feed an external database, as GUFI, developed at Los Alamos, does. We see some other solutions on the market leveraging "classic" databases like MySQL or PostgreSQL but, again, at scale this database layer turns out to be a performance bottleneck. We learnt from this session that an XODB entry consumes 1kB, so the metadata for 1 billion files represents 1TB of database, which fits perfectly on current SSDs widely available on the market. We can even imagine this XODB implemented as an in-memory database if needed.
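A quick back-of-the-envelope check of those figures, as a minimal sketch: the 1kB-per-entry number comes from the session, the rest is simple arithmetic (I assume 1kB means 1,000 bytes here).

```python
# Sizing check for the metadata database, using the figure quoted in the session.
ENTRY_SIZE_BYTES = 1_000  # ~1 kB per XODB record (quoted figure, assumed decimal kB)

def metadata_footprint_tb(file_count: int) -> float:
    """Approximate metadata database size in TB for a given file count."""
    return file_count * ENTRY_SIZE_BYTES / 1e12

print(metadata_footprint_tb(1_000_000_000))    # 1 billion files   -> ~1.0 TB
print(metadata_footprint_tb(100_000_000_000))  # 100 billion files -> ~100 TB
```

Even at the claimed 100 billion record ceiling, the footprint stays in a range that a modest SSD pool can absorb, which supports the in-memory or flash-resident argument.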
The team believes that real data management must reside in the data path. This is clearly debatable, but what is true is the following: to manage unstructured data correctly, the solution has to manage metadata extensively.
Mediaflux runs XODB as a single instance: in a small configuration, one node operates the database itself, but when other nodes are added to build a scale-out deployment, there is still only one database, with extra access nodes connected to that dedicated database node. This is also why the team has developed its own protocol implementations, NFS, SMB or S3 API, sitting in the data path: they need this implicit connection to XODB to feed the experience and understand where the data reside after potential moves... One of the good effects of this approach is the single global namespace spanning all access nodes and the database node, which transparently exposes the same file storage view to users and applications. The data can be accessed through any protocol, and the same content is served without needing to be copied across sub-zones.
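The sketch below illustrates that single-database, many-access-node pattern as I understand it; the class and method names are hypothetical and do not represent Arcitecta's actual architecture or APIs.

```python
# Illustration of the pattern described above: several protocol front-ends
# (NFS/SMB/S3) share one metadata catalog, so every node exposes the same
# namespace. All names here are hypothetical.
class Catalog:
    """Single metadata database node (stands in for XODB)."""
    def __init__(self):
        self._locations = {}          # path -> current physical location

    def record_move(self, path: str, new_location: str) -> None:
        self._locations[path] = new_location

    def locate(self, path: str) -> str:
        return self._locations[path]

class AccessNode:
    """Protocol front-end (NFS, SMB or S3) sitting in the data path."""
    def __init__(self, protocol: str, catalog: Catalog):
        self.protocol = protocol
        self.catalog = catalog        # every access node shares the same catalog

    def read(self, path: str) -> str:
        # Ask the shared catalog where the data currently lives, regardless
        # of which protocol or node the client happens to use.
        return f"{self.protocol} read of {path} from {self.catalog.locate(path)}"

catalog = Catalog()
catalog.record_move("/projects/genomics/run42.bam", "tier2-nas-07")
nfs, s3 = AccessNode("NFS", catalog), AccessNode("S3", catalog)
print(nfs.read("/projects/genomics/run42.bam"))
print(s3.read("/projects/genomics/run42.bam"))   # same content, same location view
```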
In terms of storage back-ends, Mediaflux supports file servers, NAS, tape, cloud and object storage. It clearly reminds me of what StrongLink has also offered for some years, even if that company's situation seems to be at a critical moment. In the past we saw some users moving from Arcitecta to StrongBox, the previous name of StrongLink, and honestly the overall pitch is very similar even if the reality is different.
This versioned file management enables a transparent point-in-time model when you need to restore data and navigate within the file namespace.
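For readers unfamiliar with the idea, here is a minimal sketch of how a point-in-time lookup over versioned metadata can work: for a given path, pick the latest version recorded at or before the requested timestamp. This only mimics the behaviour described, it is not XODB code.

```python
# Hypothetical point-in-time lookup over versioned metadata.
from bisect import bisect_right

versions = {
    # path -> list of (timestamp, version_id), sorted by timestamp
    "/data/report.docx": [(100, "v1"), (250, "v2"), (400, "v3")],
}

def version_as_of(path: str, when: int) -> str | None:
    """Return the version of `path` visible in the namespace at time `when`."""
    history = versions.get(path, [])
    idx = bisect_right([t for t, _ in history], when)
    return history[idx - 1][1] if idx else None

print(version_as_of("/data/report.docx", 300))  # -> "v2", the namespace as of t=300
```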
One case study shown during the session is the unification of 30 independent ZFS NAS servers, representing a total of 2 billion files and 6PB of data, under a horizontal access layer inviting users to create policies to move, migrate, tier and group data across this server population. Before Mediaflux, it was impossible, or let's say difficult, to move data between servers, create transparent access and share a global view of the data. At the same time, this is not the role of ZFS, which acts as a disk or local file system and delivers a very good experience at that level. Again, the idea is not to replace these entities but to glue them all together via an access layer above the file servers. Other services have been added as well, such as DR with file copies to a remote location or archiving to AWS S3 Glacier Deep Archive.
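To give a flavour of the kind of policy mentioned in this case study, here is a minimal, hypothetical rule: files untouched for a given period on the ZFS servers become candidates for migration to an archive tier. Field names, thresholds and tier labels are illustrative only, not Mediaflux syntax.

```python
# Hypothetical tiering/migration policy of the kind described above.
from dataclasses import dataclass

@dataclass
class FileRecord:
    path: str
    size_bytes: int
    days_since_access: int
    current_tier: str                 # e.g. "zfs-nas-12"

def archive_candidates(records, min_idle_days=365, target="aws-deep-archive"):
    """Yield (record, target_tier) pairs for files idle longer than the threshold."""
    for rec in records:
        if rec.days_since_access >= min_idle_days and rec.current_tier != target:
            yield rec, target

files = [
    FileRecord("/seq/run001.fastq", 12_000_000_000, 700, "zfs-nas-03"),
    FileRecord("/seq/run999.fastq", 9_000_000_000, 20, "zfs-nas-17"),
]
for rec, tier in archive_candidates(files):
    print(f"migrate {rec.path} from {rec.current_tier} to {tier}")
```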
On the pricing model, the firm uses a unique-users license model rather than a classic capacity-based one. The unique-user count is based on a 30-minute window. All features are included and there are no add-ons or options.
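My interpretation of that metric is a count of distinct users active within any 30-minute window, which the small sketch below illustrates; this is an assumption on my part, not Arcitecta's licensing code or exact definition.

```python
# Sketch of a "unique users over a 30-minute window" count (my interpretation).
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=30)

def unique_users_in_window(events, window_end):
    """events: iterable of (username, datetime). Count distinct users active
    in the 30 minutes ending at window_end."""
    start = window_end - WINDOW
    return len({user for user, ts in events if start < ts <= window_end})

now = datetime(2023, 1, 15, 12, 0)
events = [
    ("alice", datetime(2023, 1, 15, 11, 45)),
    ("bob",   datetime(2023, 1, 15, 11, 50)),
    ("alice", datetime(2023, 1, 15, 11, 55)),   # same user, counted once
    ("carol", datetime(2023, 1, 15, 10, 0)),    # outside the window
]
print(unique_users_in_window(events, now))       # -> 2
```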
As already mentioned, the team targets very large data sets with billions of files and tries to be involved in demanding data management projects worldwide through specific partners like Spectra Logic, Dell and others, as the company doesn't have any sales reps. This choice also explains their limited market reach, as they rely on partners who have their own agenda.
The company is under the radar and needs to become more visible; there is a disconnect between the very comprehensive solution Arcitecta offers and the global recognition of the company. This is why Coldago Research positions Arcitecta as a specialist in its latest Map 2022 for Unstructured Data Management, available here.