Yahoo weighs three dimensions when selecting software-defined storage (SDS):
- Cost trade-off,
- Access methods, which Yahoo calls Interfaces, and
- Storage abstractions: Block, File or Object.
Yahoo calls its approach Cloud Object Storage (COS), and it started with Flickr in a multi-PB configuration. In 2015, COS supported more than 100 PB for Flickr, Yahoo Mail and Tumblr. Wow, impressive.
After an in-depth comparison of OpenStack Swift, Ceph and a few commercial solutions, Yahoo selected Ceph. The configuration Yahoo chose is a federation of clusters, all Ceph-based, which gives the company the flexibility it needs. In addition, Yahoo develops its own hashing algorithm and embeds it into applications to place data on the right cluster. Each Ceph cluster is 3 PB raw, which keeps recovery simple and fast, and can be seen as the increment size of the supercluster. Yahoo also prefers erasure coding in an 8+3 mode (8 data chunks plus 3 coding chunks). In terms of storage media, Yahoo uses SSD, HDD and SMR drives.
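The federation-plus-placement idea can be sketched in a few lines. Yahoo's own hashing algorithm is not described here, so this sketch swaps in rendezvous (highest-random-weight) hashing as a stand-in; the `Cluster` type, the cluster names and the helper functions are hypothetical. It also works through the 8+3 erasure-coding arithmetic: with k = 8 data chunks and m = 3 coding chunks, each object consumes 11/8 = 1.375 times its size in raw capacity, so a 3 PB raw cluster holds about 24/11 ≈ 2.18 PB of data and can tolerate losing any 3 of an object's 11 chunks.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """Hypothetical handle for one Ceph cluster in the federation."""
    name: str
    raw_pb: float = 3.0  # the text gives 3 PB raw per cluster


def _score(cluster: Cluster, object_key: str) -> int:
    # Deterministic 64-bit score for the (cluster, key) pair.
    digest = hashlib.sha256(f"{cluster.name}:{object_key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_cluster(object_key: str, clusters: list[Cluster]) -> Cluster:
    """Rendezvous hashing (a stand-in, not Yahoo's algorithm): highest score wins."""
    if not clusters:
        raise ValueError("federation has no clusters")
    return max(clusters, key=lambda c: _score(c, object_key))


def ec_overhead(k: int = 8, m: int = 3) -> float:
    """Raw bytes stored per logical byte with k data + m coding chunks."""
    return (k + m) / k


def ec_usable_pb(raw_pb: float, k: int = 8, m: int = 3) -> float:
    """Logical capacity left after erasure-coding overhead."""
    return raw_pb / ec_overhead(k, m)


if __name__ == "__main__":
    federation = [Cluster(f"ceph-{i}") for i in range(4)]
    print(pick_cluster("flickr/photo-42.jpg", federation).name)
    print(f"overhead: {ec_overhead():.3f}x")  # 1.375x for 8+3
    print(f"usable per 3 PB cluster: {ec_usable_pb(3.0):.2f} PB")  # 2.18 PB
```

Rendezvous hashing fits a federation that grows one 3 PB cluster at a time: adding a cluster only moves the keys the new cluster now wins, while every other key keeps its placement. A plain `hash % N` scheme would remap most keys instead. For comparison, 3x replication would cost 3 raw bytes per logical byte versus 1.375 for 8+3.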
Yahoo also contributes to the project and has changed a few things to improve response time and reduce latency. For instance, in the S3 interface, a bucket (the container that stores objects, in Amazon terminology) belonged to a single node; Yahoo changed that so the bucket is now sharded across multiple nodes to increase parallelism. The next deployments will focus on geo-replication for business continuity, small-object configurations and lifecycle management. A very real use case for Ceph at a very large scale. Impressive.
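The bucket sharding change can be illustrated with a toy index. This is a hypothetical sketch, not Ceph's implementation: `ShardedBucketIndex`, the SHA-256-based shard choice and the shard count are all assumptions. Each shard stands in for a piece of the bucket's index that could live on a different node, so writes to different keys can proceed in parallel instead of all hitting one node.

```python
import hashlib


class ShardedBucketIndex:
    """Hypothetical bucket index split into N shards instead of living on one node."""

    def __init__(self, bucket: str, num_shards: int) -> None:
        if num_shards < 1:
            raise ValueError("num_shards must be at least 1")
        self.bucket = bucket
        self.num_shards = num_shards
        # One key -> size map per shard; each could sit on a different node.
        self.shards: list[dict[str, int]] = [{} for _ in range(num_shards)]

    def shard_for(self, key: str) -> int:
        # Hash the object key so keys spread evenly over the shards.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.num_shards

    def put(self, key: str, size: int) -> int:
        # A write only touches the one shard that owns the key.
        shard = self.shard_for(key)
        self.shards[shard][key] = size
        return shard

    def list_objects(self) -> list[str]:
        # A listing has to merge every shard, which is the cost of sharding.
        return sorted(key for shard in self.shards for key in shard)


if __name__ == "__main__":
    index = ShardedBucketIndex("photos", num_shards=8)
    for i in range(20):
        index.put(f"img-{i:03d}.jpg", size=1024 * i)
    print([len(s) for s in index.shards])
    print(index.list_objects()[:3])
```

The trade-off shows up in `list_objects`: a write touches a single shard, but a full listing has to read and merge every shard.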