Creating Your First Big Data Hadoop Cluster Using Cloudera CDH

Data by itself has no meaning; it is what you do with it that counts. In this course, you'll fast track to Hadoop & Big Data with the Cloudera QuickStart VM, and then you'll learn how to set up a Hadoop cluster with Cloudera CDH.
Course info
Rating
(59)
Level
Intermediate
Updated
Dec 2, 2016
Duration
1h 33m
Description

"Ask Bigger Questions" is Cloudera's vision. You may not be familiar with this phrase, but you're likely familiar with "Knowledge is Power". To get knowledge, you need to analyze and understand huge amounts of structured and unstructured data - Big Data. In this course, Creating Your First Big Data Hadoop Cluster Using Cloudera CDH, you'll get started on Big Data with Cloudera, taking your first steps with Hadoop using a pseudo cluster and then moving on to set up your own cluster using CDH, which stands for Cloudera's Distribution including Hadoop. First, you'll explore the case for Hadoop, Big Data, and Cloudera. Next, you'll learn about the fast track to Big Data with Cloudera's QuickStart VM, and you'll also learn how to create a virtualization environment with VirtualBox. Then, you'll discover how to create a clean Linux cluster with CentOS. Finally, you'll follow the steps to install and configure a cluster with the help of Cloudera Manager. By the end of this course, you'll have a Hadoop cluster, and you'll be ready to start your journey to Big Data.

About the author

Xavier is very passionate about teaching and helping others understand search and Big Data. He is also an entrepreneur, project manager, technical author, and trainer. He holds a few certifications with Cloudera, Microsoft, and the Scrum Alliance, and he is a Microsoft MVP.

Section Introduction Transcripts

Course Overview
Hi, my name is Xavier Morera, and I'm passionate about teaching. I would like to welcome you to my course, Creating Your First Big Data Hadoop Cluster Using Cloudera CDH. Did you know that by some estimates there are over 2.5 quintillion bytes of data created daily? That's a lot of data, but beyond that, this data is very valuable, as there are hidden insights and trends that can help save lives, make money, or help improve the world we live in for the better. In this course, we're going to get started on big data with Cloudera, taking our first steps with Hadoop using a pseudo cluster, and then we'll move on to set up our own cluster using CDH, which stands for Cloudera's Distribution including Hadoop. Some of the major topics that we will cover include the case for Hadoop, big data, and Cloudera; the fast track to big data with Cloudera's QuickStart VM; how to create a virtualization environment with VirtualBox; how to create a Linux cluster with CentOS; and finally, how to install and configure a cluster with the help of Cloudera Manager. By the end of this course, you will have a Hadoop cluster to start your journey to big data. I hope you'll join me on this course to learn about Hadoop and Cloudera with the Creating Your First Big Data Hadoop Cluster Using Cloudera CDH course, at Pluralsight.

Fast Track: Getting Started with the Cloudera QuickStart VM
Fast Track: Getting Started with the Cloudera QuickStart VM. Big data usually means a lot of machines working together, bringing the computation to the data, and getting insights that otherwise might not be possible. But before thinking big, you need to know how the various projects can be used to solve a problem. Should I use Pig, Hive, or Spark? When can I use Solr? Is there anything that I need to know about ZooKeeper? What is Hue? Should I save my data to HDFS, or can I use something else? Should I use Scala, Java, or Python? Well, that's a lot, but let's say that you're ready to get started, and you need to practice and learn, and you have to do it now. You just need to roll up your sleeves and get cracking. Oh boy, do I have good news for you. The Cloudera QuickStart VM is what you're looking for. With this VM you can learn Hadoop, try new ideas, test big data jobs, and demo your application, all from a single virtual machine, which is just one download away, in what's called a pseudo cluster. But what exactly is a pseudo cluster? Let me answer this question by explaining the difference between a cluster and a pseudo cluster. This will make it very clear what the QuickStart VM is. In a cluster, you have many machines working together, where you install multiple different services and specific or dedicated roles. In a pseudo cluster, you have a single machine that has all these services and roles installed, and they work together so that you can get started right away with big data. Long story short, the QuickStart VM is your fast track to big data with Cloudera.
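
To make the cluster versus pseudo cluster distinction concrete, here is a minimal Python sketch that models each layout as a mapping from role to hosts. It is only an illustration: the host names and the particular split of roles between one master and several workers are assumptions, not the layout used by the QuickStart VM or by the course. The four role names are standard HDFS and YARN roles.

```python
# Illustrative only: host names and the master/worker split are assumptions.
ROLES = ["NameNode", "DataNode", "ResourceManager", "NodeManager"]


def pseudo_cluster(host):
    """Pseudo cluster: every role runs on the same single machine."""
    return {role: [host] for role in ROLES}


def cluster(master, workers):
    """Cluster: master roles on one host, worker roles spread over many hosts."""
    return {
        "NameNode": [master],
        "ResourceManager": [master],
        "DataNode": list(workers),
        "NodeManager": list(workers),
    }


def hosts_used(layout):
    """Return the sorted list of distinct machines a layout needs."""
    return sorted({host for hosts in layout.values() for host in hosts})


if __name__ == "__main__":
    print(hosts_used(pseudo_cluster("quickstart")))
    print(hosts_used(cluster("master1", ["worker1", "worker2", "worker3"])))
```

Both layouts carry the same set of roles; the only difference is how many machines those roles are spread across, which is the point made in the transcript.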

Installing Your First Big Data Cluster Using Cloudera CDH
Installing Your First Big Data Cluster Using Cloudera CDH. This is the main module of interest of this training. It is what we call the meat and potatoes. In this module, we will learn about the different ways of installing Hadoop using Cloudera. As expected, there are multiple ways of installing a cluster, with several factors that need to be taken into consideration, based on your preference or intention. And thus, we have the three installation paths, A, B, and C. Let's say that you want to set up a development cluster, so you can select a particular installation path because you're short on time. But in other cases, you might want a production cluster, so you need to take a different path. At this point, we will just cover a little bit of theory to understand what the three installation paths are, and then we will go straight to the point and get a cluster up and running.