Building an Enterprise Grade Distributed Online Analytics Platform

In this course, you'll develop an understanding of analytics capabilities, and you'll learn how to build a full-blown, holistic, distributed analytics system using Kafka, Cassandra, Storm, and Elasticsearch.
Course info
Rating
(44)
Level
Intermediate
Updated
May 8, 2017
Duration
3h 46m
Table of contents
Course Overview
Introduction to Online BigData Analytics
Utilizing Apache Kafka as a Data Backbone
Introducing Distributed Computation with Apache Storm
Integrating Apache Cassandra as Our Distributed Database
Gathering Insights with Elasticsearch
Summary
Description

In this course, Building an Enterprise Grade Distributed Online Analytics Platform, you'll learn how to build a full-blown distributed analytics system using Kafka, Cassandra, Storm, and Elasticsearch. First, you'll learn what online analytics is and how it differs from offline analytics. You'll then discuss and analyze the parts of a modern online analytics system, including the data backbone, storage, processing, and insight generation. Next, you'll develop an understanding of each chosen technology, its features, and why it was chosen for a specific task. Finally, you'll explore how to integrate each technology into your solution in the way that benefits it most. Each technology you use will be examined closely, and you'll see how it provides scalability and fault tolerance and, most importantly, how it contributes to achieving the functionality you want. By the end of this course, you'll be ready to enrich your enterprise with online analytics capabilities.

About the author

Kobi Hikri is a self-proclaimed software addict who provides a range of professional software consultancy services for commercial clients worldwide. His client list spans diverse industries, including banking, human resource management, textual data analysis, image processing, and forensics. Kobi appreciates simple, elegant software architecture.

More from the author
Getting Started with OpenCV in .NET
Beginner
1h 27m
27 Jun 2015
Section Introduction Transcripts

Course Overview
Hi everyone. My name is Kobi Hikri, and I would like to welcome you to my course, Building an Enterprise Grade Distributed Online Analytics Platform. I am a software architecture and development consultant, and I have been doing that for over a decade now with various enterprises and at different scales. Do you need to provide a scalable, fault-tolerant analytical solution for a large amount of incoming data? Do you need to gather insights about your data close to its creation time? In this course we are going to do just that. This course focuses on building an enterprise grade system composed of the following major parts: the data backbone as an entry point to our system, implemented with the amazing Apache Kafka; the computation layer, for which we will use Apache Storm; the storage layer, which will be implemented with Apache Cassandra; and an insights engine provided by the highly popular Elasticsearch. By the end of this course, you will have a clear understanding of what an online analytics platform looks like and be able to build one on your own. Before beginning the course, you should be comfortable with basic software engineering terms such as scalability and fault tolerance. Prior experience with Linux-based operating systems will also help, though it is not mandatory. I hope you will join me on this journey to learn online data analytics with the Building an Enterprise Grade Distributed Online Analytics Platform course at Pluralsight.

Introducing Distributed Computation with Apache Storm
Hi, and welcome to Pluralsight. I am Kobi Hikri, and this module is Introducing Distributed Computation with Apache Storm. In the previous modules, we've discussed and begun to understand the nature of the domain we are dealing with. The domain of big data has a lot to do with properly distributing our hardware and software resources. We should do so in order to be able to process big datasets, where big refers both to the rate at which the datasets flow into our system and to the amount of data we need to observe in order to gather insights. In this module, we will extend the discussion of this course and introduce an enterprise grade distributed computation layer suited for an online analytics platform. The technology with which we will implement the computation layer is Apache Storm, which is the core of this module. We will begin with a bird's eye view of the technology. We will then understand how an Apache Storm cluster is designed. Following that, we will go deeper into the technology and discuss how Apache Storm abstractions work in our favor in order to perform distributed computations. Once we've got a good understanding of what Apache Storm is and how it works, we will move on to deploying a real Apache Storm cluster. To conclude the module, we will take the Apache Storm cluster we've just built and integrate it with our distributed analytics system. We've got a lot to cover, so without further ado, let's get started.

Integrating Apache Cassandra as Our Distributed Database
Hi, and welcome to Pluralsight. I am Kobi Hikri, and this module is Integrating Apache Cassandra as Our Distributed Database. At this point in the course, I hope both you and I are on the same page and can agree that the core function of the system we are building is to manage data. By manage, we refer to retrieving data from the data backbone and doing some work with it or on it in the computation layer, which both consumes data from and produces data to our storage layer. And the storage layer is the topic of this module. In this module, we will analyze our requirements for distributed storage so that it fits well in an online analytics system. We will then begin our examination of Apache Cassandra. As a starting point, we will understand the type of database Cassandra is. We will also discuss the tradeoffs we've made by making our choice. Moving forward, we will begin introducing the abstraction terms used in Cassandra. These will be the deployment abstraction terms as well as the data organization abstraction terms. Having come this far, we will be ready to build a real-life multi-datacenter Cassandra cluster, and so we shall. Finally, we will integrate our Cassandra-based storage layer with our online analytics system. Let's get started!

Gathering Insights with Elasticsearch
Hi, and welcome to Pluralsight. I am Kobi Hikri, and this module is Gathering Insights with Elasticsearch. Up to this point we knew exactly what kind of insights we were looking for. They were strictly defined, and we tailored our solution, especially the computation layer and the storage layer, to allow us to gather such insights in an online manner. However, in reality, such strictness doesn't always cover everything the enterprise requires. It might suddenly be the case that some ad hoc analytical questions need to be asked. And that brings us to the final role in our analytics system: the insights engine. In this module we will discuss the role of an insights engine in our system, in particular as a complementary part. We will begin by understanding the use cases an insights engine aims to solve. Moving forward, we will get familiar with Elasticsearch as our choice for an insights engine. At this point we will be introduced to the Elasticsearch architecture and design and understand how the technology is able to work for us as a crucial analytical complement. With our theoretical base laid, we will move on to deploying a distributed Elasticsearch cluster. To conclude this module, we will integrate Elasticsearch with our online analytics system. Without further ado, let's get started.