March 23, 2016

Trifacta promotes Data Preparation before any processing

Trifacta, a leader in data preparation with its Data Wrangling approach, continues its rapid market penetration. We met the executive team yesterday, during the second day of the 18th edition of The IT Press Tour. Back to the genesis of the product: Trifacta started at Berkeley and Stanford universities with the DataWrangler project from Joe Hellerstein and Sean Kandel, among others. Trifacta, as a company, was founded pretty recently, in 2012, and has so far raised $76.3M in 4 rounds from 4 investors. Jeffrey Heer joined Joe Hellerstein and Sean Kandel to launch the company, forming a highly complementary trio able to move fast in a rapidly changing market.
So what's the mission of Trifacta? In a nutshell, the company wishes to bridge raw data and analysis from both an IT and a business point of view. The team realized that 80% of the work in any data project is spent preparing the data for analysis, not on the analysis itself. In other words, reducing and optimizing this portion of the project would have a drastic impact on the overall project duration. Analysts have sized this opportunity: in early 2016, Gartner predicted that the Self-Service Data Preparation Software Market will reach $1.9B by 2019 with a 16.6% CAGR. To address this, Trifacta built a data preparation platform named Trifacta Wrangler Enterprise, able to manipulate large volumes of data in various formats in a reasonable time. It is sold as an annual subscription with unlimited volumes and scalability. The product has a small companion, Trifacta Wrangler, limited in features and data size but more than enough to test and evaluate it on your Windows or Mac machine. You can download and play with the product as I did; it's very simple, just have a CSV ready.

How does it work? The Trifacta approach becomes clear once you understand what is behind their Data Wrangling term. It consists of 6 clear steps: Discovering, Structuring, Cleaning, Enriching, Validating and Publishing. The magic lies in these 6 steps, which help the user build a scenario that can then be run on all the data, at a larger scale, on a Hadoop cluster. Trifacta Wrangler works on a sample of the real, live data, and you simulate and model the result you expect on the total volume of data you wish to analyze. In fact, it reminds me of what QbE (Query by Example) was for databases 20 years ago. With this self-service approach, once users are satisfied with the model they have built and the data exploration they have done, they can easily apply it to their entire dataset.
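To make the "design on a sample, then run on everything" idea concrete, here is a minimal Python sketch. It is not Trifacta's API or its recipe language; every function name, the tiny inline CSV and the country-to-region lookup are hypothetical and only illustrate the steps named above (discovering is the manual inspection of the preview).

```python
import csv
import io

# Hypothetical raw data; in practice this would be a large file or table.
RAW_CSV = """name,country,amount
 alice ,fr,10
BOB,us,
carol,FR,7
"""

def structure(text):
    """Structuring: parse raw CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def clean(rows):
    """Cleaning: trim whitespace, normalize case, drop rows with no amount."""
    cleaned = []
    for row in rows:
        if not row["amount"].strip():
            continue
        cleaned.append({
            "name": row["name"].strip().title(),
            "country": row["country"].strip().upper(),
            "amount": int(row["amount"]),
        })
    return cleaned

def enrich(rows):
    """Enriching: add a region column from a (hypothetical) lookup table."""
    regions = {"FR": "Europe", "US": "North America"}
    return [dict(row, region=regions.get(row["country"], "Unknown")) for row in rows]

def validate(rows):
    """Validating: fail loudly if a cleaned row breaks an expected rule."""
    for row in rows:
        if row["amount"] < 0:
            raise ValueError(f"negative amount in row: {row}")
    return rows

# The "scenario": an ordered list of steps, designed once and replayed later.
RECIPE = [clean, enrich, validate]

def run_recipe(rows, recipe=RECIPE):
    """Apply each step of the recipe in order and return the resulting rows."""
    for step in recipe:
        rows = step(rows)
    return rows

all_rows = structure(RAW_CSV)
sample = all_rows[:2]            # work interactively on a small sample
preview = run_recipe(sample)     # inspect the preview, adjust the steps
result = run_recipe(all_rows)    # publishing: replay the same recipe on all rows
```

The point of the design is that the recipe is data-independent: the same ordered list of steps is previewed on the sample and then replayed unchanged on the full dataset, which in the product's case would run on the cluster instead of in a local loop.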
Like Datameer, Trifacta is gaining traction rapidly, and the installed base is pretty impressive, with more than 3,000 companies and 10,000+ users. Among them are some famous names such as GoPro, PepsiCo, Orange, LinkedIn, RBS and Zurich.
Trifacta also wrote an O'Reilly book to define, explain and detail their Data Wrangling approach; you can download the book here.
Data Preparation is a must, especially with the huge volume of data accumulated every day. Better response times and accurate results must be delivered fast, and Data Wrangling really helps enterprises achieve this holy grail. To really get a feel for Trifacta and the power of the solution, download Trifacta Wrangler; you will love it.
