Jun 7, 2016

Big Data Services reinvented

The company is taking advantage of the Spark Summit this week in San Francisco to unveil the result of at least two years of active development. Yesterday I had the privilege of talking with Asaf Somekh, founder and CEO, and Yaron Haviv, founder and CTO, about their mission and solution approach.

First, some information about the company: it was founded in 2014 in Tel-Aviv by Asaf Somekh, Yaron Haviv and Yaron Segev, founder of XtremIO, and has so far raised $15M in Series A funding from Magma Venture Partners and Jerusalem Venture Partners. Today the company has 40+ employees, all with a strong background in fast networks and storage and with experience at XIV, XtremIO, Voltaire and Mellanox, to name a few. As for the name, think about the Iguazu falls and you get the idea of the data deluge challenge the company wishes to address and solve.

The founders, having worked in demanding environments, realized that Big Data services lack simplicity: they are built on classic layers that are not optimized for big data processing, and the resulting data pipeline is complex. Data and data tools are siloed, because users and vendors have simply unified and glued existing pieces together without rethinking the I/O stack or considering a radically new approach that removes performance barriers and bottlenecks at various points in the architecture. The company took this challenge very seriously and designed, built and developed a new approach that is aligned with big data challenges and can be integrated with all the well-known big data processing tools and products.

In fact, some users have started to tackle this challenge internally, with lots of difficulty, especially in finding people with the right skills, while others refuse to deal with the horizontal complexity of integrating data pipelining solutions (data movement, several copies, ETL and security). So two solutions finally exist. The obvious first one is the Amazon AWS or Microsoft Azure approach: everything is externalized, but it remains complex, with unpredictable performance, multiple copies that impact timing and, above all, a very high cost. The second approach is the company's own, which redefines all layers with only one copy and a super fast pipelining capability, with cost optimization in mind.

The company realized that many aspects of computing have made significant progress over the last decade, but the storage software stack is still based on assumptions inherited from the HDD world. And above all, at different layers, you find pieces of software doing pretty similar things and consuming lots of CPU cycles.

First, the company defines a common data repository with a set of data services that sit above this data lake. The solution is fully storage agnostic and provides multiple data access methods (file, objects, streams, HDFS, KV, records, new APIs...) so that it can be integrated with now-classic big data products such as Hadoop, Spark or ElasticSearch. You can see the product as a super fast, scalable, universal and hyper resilient access and data layer between consumers (big data applications and users) and back-end storage.

This is what the company calls a three-layer architecture, with Applications & APIs, Data Services, and Media:
  • The Applications & APIs layer is stateless, so failure resistance comes by design; the model is extensible and elastic, and it commits all updates to zero-latency, concurrent storage. It is responsible for mapping and virtualizing standard file, object, stream, or NoSQL APIs onto the common data services. Also, and this is key both in the company's approach and in the big data world, changes are immediately visible in a consistent fashion.
  • The Data Services layer provides key data processing, with inspection, indexing, compression and storage performed in an intelligent and efficient way on low-latency non-volatile memory or fast NVMe flash drives. Data can then be moved to the appropriate storage tier. The company has introduced the notion of a data container to store objects, which provides consistency, durability and availability.
  • The last layer is Media, with an application-aware K/V API mapped directly to different types of storage, including NV memory, flash, block, file, or cloud.
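
To make the layering more concrete, here is a minimal sketch in Python of how stateless API adapters could map different access methods onto a single stored copy. It is only an illustration under my own assumptions: the class and method names (MediaLayer, DataServices, FileAPI, ObjectAPI) are hypothetical and are not the company's actual API, a dictionary stands in for the storage media, and zlib stands in for whatever compression the product really uses.

```python
import zlib


class MediaLayer:
    """Hypothetical media layer: a plain key/value store standing in
    for NV memory, flash, block, file, or cloud back ends."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, value):
        self._blobs[key] = value

    def get(self, key):
        return self._blobs[key]

    def __len__(self):
        return len(self._blobs)


class DataServices:
    """Hypothetical data services layer: indexes and compresses each
    item before handing it to the media layer."""

    def __init__(self, media):
        self.media = media
        self.index = {}  # key -> uncompressed size in bytes

    def write(self, key, data):
        self.index[key] = len(data)
        self.media.put(key, zlib.compress(data))

    def read(self, key):
        return zlib.decompress(self.media.get(key))


class FileAPI:
    """Stateless adapter: maps a file path to a key."""

    def __init__(self, services):
        self.services = services

    def write_file(self, path, data):
        self.services.write(path.lstrip("/"), data)

    def read_file(self, path):
        return self.services.read(path.lstrip("/"))


class ObjectAPI:
    """Stateless adapter: maps a bucket and object name to the same key space."""

    def __init__(self, services):
        self.services = services

    def put_object(self, bucket, name, data):
        self.services.write(f"{bucket}/{name}", data)

    def get_object(self, bucket, name):
        return self.services.read(f"{bucket}/{name}")


if __name__ == "__main__":
    media = MediaLayer()
    services = DataServices(media)
    # Write through the file interface, read back through the object interface.
    FileAPI(services).write_file("/logs/day1.txt", b"hello " * 100)
    print(ObjectAPI(services).get_object("logs", "day1.txt")[:12])
    print("copies stored:", len(media))
```

The adapters hold no data of their own, so any number of them can be started or lost without affecting the stored copy. That is the property the stateless Applications & APIs layer relies on, and it is also why a write through one interface is immediately visible through another.
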
In terms of adoption model, the company promotes a self-service approach that allows rapid deployment and production-ready systems. It has made significant progress for the industry and takes immediate leadership in that segment, without real competition. Congratulations to the team.
For a live introduction to this approach, Yaron Haviv, the company's CTO, will be interviewed at TheCube this afternoon, June 7, at 3pm PST.
