Tuesday, December 08, 2020

StorPool deserves an update

StorPool, a European leader in block software-defined storage, joined The IT Press Tour for the second time. The first time was in June 2014, after I discovered the company in 2013 and published a first blog post about it, at a time when nobody had heard of them.

StorPool is a real SDS if you consider my definition: "SDS transforms a rack of (classic and standard) servers (with internal disks) into a large storage farm. You can find on the market block, file or object storage as it is essentially how the storage entity is exposed outside". In other words, take 3 servers, install Linux and the StorPool software and you get a high-performance and resilient storage array.

The team develops what is probably one of the most comprehensive distributed block software layers; others include ScaleIO (acquired by EMC), Datera, Kaminario/Silk and, more recently, Lightbits Labs or Excelero, to list a few.

With a minimum of 3 nodes, the team implements a very rich block storage offering:

  • special IP around on-disk format (CoW model with 4k aligned to the page size of Linux), protocols, client layer...
  • exposed as a block device via iSCSI or the StorPool Linux client
  • active/active controller, dedicated or shared volumes across application servers
  • targeting primary storage needs with HDD and flash
  • superior performance vs. local storage
  • ready for bare-metal or cloud-native applications deployed on KVM, Kubernetes, VMware and Hyper-V
  • 3x replication, snapshots, async remote copy and data integrity...
  • "Run from backup" mode and backup streaming
  • a full REST API approach
  • integration with Ansible
  • in decoupled topology or converged mode
  • and an intuitive UI for management and monitoring.

In a nutshell, the product delivers outstanding performance levels. Look at these numbers:

  • Latency < 100μs measured at the VM level
  • Throughput > 1M IOPS per server, scaling linearly at 250k IOPS per CPU core

One element that continues to surprise me is that StorPool doesn't support erasure coding (EC), only replication. Replication is a good mechanism, no doubt about it, but erasure coding is also interesting at large scale and for large files. It has demonstrated a huge impact and some other players offer it. So it makes me think that either there is a difficulty in their design that prevents adding EC, or they don't receive such requests. But why? The value is better durability, better storage efficiency and lower costs. Let's do simple math to compare. If you consider 500TB of usable storage, 3 copies mean 1500TB of raw storage, with an overhead of 200% (x3) and an efficiency ratio of 33%. Compare with EC using 6 data fragments and 2 parity fragments, noted EC 6+2: you obtain 33% overhead and 75% efficiency, so we can reach almost similar durability with just 667TB, or 42 HDDs. Of course I could even add one more parity fragment to reach 6+3. Let's put price in the equation and consider a common 16TB SATA internal drive on Amazon at $400. The difference in capacity is 833TB (500 x 3 - 500 x 1.33), which represents 52 HDDs and $20,800: the 94 drives initially needed (1500TB / 16TB, rounded up) drop by 52 to 42, and the total cost drops from $37,600 to $16,800. And here we don't count the energy, the chassis size or the number of nodes... With that StorPool would become unbeatable...
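The arithmetic above can be checked with a few lines of Python. This is only a sketch under the same assumptions as the text (500TB usable, 16TB drives at $400, drive counts rounded up); the function name `storage_plan` and its parameters are mine, chosen for illustration, and replication is modeled as 1 data unit plus 2 extra copies.

```python
import math

def storage_plan(usable_tb, data_units, redundancy_units,
                 drive_tb=16, drive_price_usd=400):
    """Raw capacity, overhead, efficiency, drive count and cost for a
    protection scheme with `data_units` data + `redundancy_units` extra units.
    3x replication is (1, 2); EC 6+2 is (6, 2)."""
    total_units = data_units + redundancy_units
    raw_tb = usable_tb * total_units / data_units
    drives = math.ceil(raw_tb / drive_tb)  # partial drives don't exist
    return {
        "raw_tb": raw_tb,
        "overhead_pct": 100 * redundancy_units / data_units,
        "efficiency_pct": 100 * data_units / total_units,
        "drives": drives,
        "cost_usd": drives * drive_price_usd,
    }

# Illustrative schemes taken from the discussion above.
schemes = {"3x replication": (1, 2), "EC 6+2": (6, 2), "EC 6+3": (6, 3)}
for name, (k, m) in schemes.items():
    plan = storage_plan(500, k, m)
    print(f"{name}: {plan['raw_tb']:.0f}TB raw, "
          f"{plan['overhead_pct']:.0f}% overhead, "
          f"{plan['efficiency_pct']:.0f}% efficiency, "
          f"{plan['drives']} drives, ${plan['cost_usd']:,}")
```

With these inputs, replication needs 94 drives ($37,600) and EC 6+2 needs 42 drives ($16,800), a difference of 52 drives and $20,800. Going to 6+3 raises the raw capacity to 750TB, or 47 drives, which is still half of what 3 copies require.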

The team focuses on performance and achieves impressive numbers, available publicly, with different benchmark tools and environments. It explains why adoption started a few years ago, which is confirmed by the profitability of the company.

As a business, StorPool sells via a mix of direct sales and resellers; the French ISP Iguane Solutions is a customer, for instance. Cloud and internet providers have represented a key strategic direction for the company for several years and the footprint there is significant.

We'll continue to follow StorPool, again one of the key European storage innovators in block SDS.
