Mar 17, 2017

CrateDB, a new DB for IoT

Crate.IO, the European developer of the fast-growing database CrateDB, organized a very interesting session with the IT Press Tour group last week in San Francisco.
The company targets the IoT market segment, built around the four famous Vs identified as the new data challenges:
  1. Velocity: 100B+ data points per day; for instance, a connected vehicle generates 2k readings per second and some sensors sample at 100 kHz,
  2. Volume: accumulation and retention of data, with dozens of TB to query,
  3. Variety: multiple data types are needed, including unstructured, structured, geospatial, time series, JSON...
  4. Variability: essentially the associated elasticity.
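
To put the velocity figures above in perspective, here is a quick back-of-the-envelope sketch. It only converts the per-second rates quoted in the list into daily totals; the function and variable names are mine, and the fleet size is simple arithmetic on those quoted numbers, not a figure given by Crate.IO.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def points_per_day(rate_per_second: int) -> int:
    """Data points produced in one day by a source emitting at a constant rate."""
    return rate_per_second * SECONDS_PER_DAY


# Rates quoted in the list above.
vehicle_per_day = points_per_day(2_000)    # connected vehicle, 2k readings/s
sensor_per_day = points_per_day(100_000)   # one sensor sampling at 100 kHz

# How many such vehicles would it take to reach 100B points per day?
# (ceiling division; illustrative arithmetic only)
target = 100_000_000_000
vehicles_needed = -(-target // vehicle_per_day)

print(f"vehicle: {vehicle_per_day:,} points/day")
print(f"100 kHz sensor: {sensor_per_day:,} points/day")
print(f"vehicles for 100B/day: {vehicles_needed}")
```

In other words, a single connected vehicle already produces roughly 173 million readings per day, and a few hundred of them are enough to pass the 100B mark.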
Many users have chosen a "classic" DB such as MySQL, PostgreSQL or Oracle for structured data; others have tried NoSQL solutions like Cassandra, Splunk, Elastic or MongoDB but have quickly hit a wall. The first group runs into scalability and speed limits, while the second is pretty difficult to adopt, especially if the application is already designed around a relational model and integrated with SQL.

Even though the database jungle picture made by 451 Research a few years ago is visually hard to read, it conveys the complexity of this crowded segment, and users seem to have real difficulty selecting the right approach. Crate.IO has recognized these difficulties and designed a best-of-both-worlds DB: SQL to simplify integration and protect investments in code, people and models, and a scale-out model to absorb huge volumes of data and I/O operations. In other words, CrateDB is an open source SQL DB on a NoSQL architecture, designed for real-time machine data. The solution offers a SQL interface and search capabilities. CrateDB has several advantages over the competition:
  • Super easy to use and to integrate into the application stack,
  • Simply scalable,
  • Query and data type versatility,
  • Real-time performance, and
  • Open source, of course.

The product already has more than 1 million downloads and 1,000+ clusters deployed worldwide, with 50+ customers in production. Three case studies were detailed: Gantner Instruments, for nuclear, wind and solar real-time sensor data with 1,000 channels at 10 kHz; Skyhigh Networks, with millions of users, billions of inserts per day (peak 100k/sec) and tens of thousands of concurrent TCP connections, where CrateDB replaced MySQL and Elastic; and finally Space Time Insight, for power grid status visualization supporting 200,000 rows/sec and several TB of data globally. In terms of architecture, CrateDB has many interesting design characteristics, such as a masterless design, auto-sharding, dynamic schema and in-memory speed, among others, illustrated by the following slide.

2017 will be an interesting year for Crate.IO, and the company will likely need an additional VC round to accelerate its US market entry and penetration. I invite you to follow the company closely over the next few months.
