Oct 8, 2015

Cloudera introduces a new storage engine for Hadoop

Cloudera (www.cloudera.com), one of the leading companies behind the Hadoop wave, recently announced Kudu, co-developed with Intel, a new storage engine based on a columnar data model. This announcement highlights the need for a very efficient storage engine in Big Data and analytics projects. Kudu represents a third choice alongside HDFS and HBase. Unlike HBase, which runs on top of HDFS, Kudu is a native storage layer for Hadoop with real-time I/O capabilities. It can perform updates in place, which makes it well suited to streaming environments. Deployments are often complex, depending on how the Hadoop platform is used: HBase is excellent for small queries with updates in place, while HDFS is better suited to large datasets. Until now, users had to design and deploy hybrid configurations combining both storage services. The other motivation comes from changes in the IT environment itself, with more GB of RAM per server and, of course, the broad adoption of flash and SSD.
Kudu was designed with multiple objectives in mind:
- performance for both scans and random access,
- CPU efficiency,
- IO efficiency,
- updates in place, and
- active-active replicated data sets.
Three years after the decision to define and build a new storage layer, Kudu is ready and fully open source, available under the Apache License 2.0. It is interesting to see that several companies are already participating in the development effort, such as AtScale, Intel (as mentioned above), Splice Machine, Xiaomi and Zoomdata. Cloudera is considering donating the project to the ASF (Apache Software Foundation) to sustain the momentum around the platform.