October 6, 2016

LizardFS, an Open Source SDS again

LizardFS (www.lizardfs.com), an open source SDS developed by Skytechnology from Poland, continues to incrementally penetrate the market. Nothing really new with this product, except that it perfectly illustrates the dynamism of this segment, the spread of both open source and SDS, and the close link between the two. The project came out of MooseFS, which tries to clone the Google File System.
The philosophy is the same one I have preached for more than a decade, based on commodity servers and local disks: shared-nothing, distributed and, in this case, asymmetric. LizardFS is POSIX compliant and distributed under the GPLv3 license. Like many hyperscale products, and for performance reasons, LizardFS adopts an asymmetric model with 3 components:
  • Metadata servers (MDS) are deployed with a minimum of 2 servers working in a master/slave manner, but many secondary/standby servers can be deployed. You can also consider metadata loggers, or metaloggers, to protect your metadata activity. Their information can be used when all metadata servers are lost.
  • Chunkservers, aka data servers, store the data, and each file submitted to the system is segmented into 64MB chunks. A minimum of 2 chunkservers is recommended for obvious redundancy reasons.
  • Clients run a LizardFS layer - available for Windows (wow, good) and of course Linux - that implements the logic and the interactions with metadata servers and chunkservers.
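To make the 64MB segmentation concrete, here is a minimal Python sketch of how a file's byte range maps onto fixed-size chunks. It only illustrates the arithmetic, it is not LizardFS code: the function name `split_into_chunks` is hypothetical, and reading "64MB" as 64 MiB is my assumption.

```python
# Assumption: "64MB" is read as 64 MiB; this is an illustration, not LizardFS code.
CHUNK_SIZE = 64 * 1024 * 1024


def split_into_chunks(file_size, chunk_size=CHUNK_SIZE):
    """Return (offset, length) pairs covering a file of file_size bytes."""
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    chunks = []
    offset = 0
    while offset < file_size:
        # The last chunk may be shorter than chunk_size.
        length = min(chunk_size, file_size - offset)
        chunks.append((offset, length))
        offset += length
    return chunks


# A 150 MiB file gives two full chunks plus a 22 MiB tail.
layout = split_into_chunks(150 * 1024 * 1024)
print(len(layout), layout[-1][1] // (1024 * 1024))  # prints "3 22"
```

In a design like this, the metadata server only needs to track which chunkservers hold each chunk, while the clients move the chunk data itself.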
In terms of data protection, LizardFS offers replication and erasure coding. Replication works at the file or directory level and is implemented in 2 modes: standard and XOR copies. In standard mode, the configured number of copies is the number of replicas each chunk should have. The replica placement is controlled by the master, i.e. the active metadata server. The XOR mode is a bit different, as it introduces an additional parity chunk, essentially for cost efficiency. LizardFS 3.10.0 has made great progress with the introduction of an erasure coding mode that pushes the protection model even further. The next model is geo-replication, where data can be written to different geographic locations, which is very helpful for disaster recovery but also for collaboration across and between teams.
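The idea behind a parity chunk, and why it is cheaper than full copies, can be sketched in a few lines of Python. This is a generic XOR parity illustration under my own assumptions, not the LizardFS on-disk format: the helper names (`xor_parity`, `recover_missing`, `xor_overhead`) and the choice of 3 data parts are hypothetical.

```python
from functools import reduce


def xor_parity(parts):
    """XOR equally sized byte strings together into a single parity part."""
    if not parts:
        raise ValueError("at least one part is required")
    if len({len(p) for p in parts}) != 1:
        raise ValueError("all parts must have the same length")
    # zip(*parts) walks the parts column by column, one byte from each.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*parts))


def recover_missing(surviving_parts, parity):
    """Rebuild the single lost data part from the survivors and the parity."""
    return xor_parity(list(surviving_parts) + [parity])


def xor_overhead(data_parts):
    """Raw capacity consumed per byte of user data with one parity part."""
    return (data_parts + 1) / data_parts


# Hypothetical example: 3 data parts protected by 1 parity part.
parts = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_parity(parts)

# Lose the middle part, then rebuild it from the two others plus the parity.
rebuilt = recover_missing([parts[0], parts[2]], parity)
print(rebuilt == parts[1])  # prints "True"

# Keeping 2 full copies costs 2.0x raw capacity; 3 data + 1 parity costs about 1.33x.
print(round(xor_overhead(3), 2))  # prints "1.33"
```

The trade-off is that a single parity part only survives the loss of one part at a time, which is where a more general erasure coding mode pushes the protection further.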
Besides common characteristics shared with more than 40 SDS products on the market, LizardFS has some interesting features: multi-data-center support, transparent trash bin, snapshots, quality of service, quotas and monitoring tools. The product is available for CentOS, Red Hat Enterprise Linux, Ubuntu LTS and Debian, and can potentially be deployed elsewhere as well. I invite you to visit LizardFS on GitHub at https://github.com/lizardfs/lizardfs. Good luck.
