Business Intelligence for the Real Time Enterprise

September 1, 2014 - Hangzhou, China

Invited Industrial Talks

"Building Analytics Engines for the Big Data Age"


Dr. Badrish Chandramouli

Microsoft Research


With the increasing volumes of data being acquired, stored, and processed in the Cloud today, online and offline analytics to derive value from such data has become very important. In this talk, we explore two dimensions of analytics: (1) the diversity, in terms of the settings and types of analytics found in today's Cloud application workflows; and (2) the cost of performing such analytics over large datasets. Specifically, this talk overviews how a temporal streaming engine can seamlessly support a diverse range of analytics, and can enable progressive queries with early results to reduce cost. We then describe Trill - a new streaming engine that we have architected as a library to support embedded execution within Cloud applications and distributed fabrics. What sets Trill apart is its high performance across the latency spectrum: for example, Trill's throughput for streaming queries is 2-4 orders of magnitude higher than comparable streaming engines. More information on Trill can be found at http://aka.ms/trill.

"Optimistic Failure Recovery in Distributed Stream Processing"


Dr. Qiming Chen

HP Labs


Support transaction property and fault-tolerance is the key to applying stream processing to industry-scale applications; however the corresponding latency overhead must be minimized for accommodating real-time analytics. In this work we have developed the backtrack based and the window oriented failure recovery mechanisms which are more optimistic than the previous approaches thus can significantly enhance the overall performance. We also tackled the hard problems found in implementing a transactional layer on-top of an existing stream processing platform, including how to keep track the physical messaging channels for resending the possible missing tuples in failure recovery, and how to ensure resending tuples not to disrupt the regular order of data flow. We have implemented the proposed mechanisms on Fontainebleau, the distributed stream analytics infrastructure we built on top of the open-sourced Storm platform. Our experiment results reveal the novelty of the proposed technologies and the feasibility to support fault-tolerance with minimal latency overhead for real-time stream processing.