Big Data | Hadoop | Spark | Scala | Kafka


  • Batch Timings:
  • Starting Date:

Course Overview

Hadoop is a Big Data framework that helps store, process, and analyze large volumes of unstructured data on commodity hardware. It is an open-source software framework written in Java that supports distributed applications. It was introduced by Doug Cutting & Michael J. Cafarella in mid-2006, and Yahoo became its first large commercial user of Hadoop in 2008.

Hadoop has two major generations: Hadoop 1.0 and Hadoop 2.0, the latter based on the YARN (Yet Another Resource Negotiator) architecture. Enterprises looking to leverage big data environments need Big Data Architects who can design, build, and deploy large-scale Hadoop applications.

Big Data refers to collections of data so huge that they are hard to measure, manage, and process with traditional tools. We live in the data age, and this flood of Big Data comes from many different sources, such as the New York Stock Exchange, Facebook, Twitter, aircraft sensors, Walmart, etc.

Apache Spark is a big data processing framework whose popularity lies in the fact that it is fast, easy to use, and offers sophisticated tools for data analysis. Its built-in modules for streaming, machine learning, SQL, and graph processing make it useful in diverse industries such as banking, insurance, retail, healthcare, and manufacturing.

COURSE FEATURES

  • Resume & Interview Preparation Support
  • Hands-on Experience on Projects
  • 100% Placement Assistance
  • Multiple Flexible Batches
  • Missed Sessions Covered
  • Practice Course Material

At the end of this training course, participants will be able to:

  • Understand the Apache Hadoop framework completely
  • Learn to work with HDFS
  • Discover how MapReduce works with data and processes it
  • Design and develop big data applications using the Hadoop ecosystem
  • Learn how YARN helps in managing resources in a cluster
  • Write and execute programs on YARN
  • Understand the fundamentals of Apache Spark and Scala
  • Differentiate between Spark and Hadoop
  • Implement Spark on a cluster
  • Learn the Scala programming language and its concepts
  • Understand Scala and Java interoperability

Course Duration

  • Weekend: 60 hours

Prerequisites:

  • Basics of Core Java and OOP concepts.
  • Basic knowledge of databases, SQL, and query languages is helpful.

Who Should Attend?

  • Software Engineers looking to upgrade their Big Data skills
  • Data Engineers and ETL Developers
  • Data Scientists and Analytics Professionals
  • IT Professionals
  • Software Testing Professionals
  • Graduates looking to build a career in Big Data
  • Anyone interested in Big Data analytics

Course Content

1.1 Big Data Introduction

  • What is Big Data
  • Evolution of Big Data
  • Why Big Data?
  • Role Played by Big Data
  • Data management – Industry Challenges
  • Types of data
  • Sources of Big Data
  • Big Data examples
  • What is streaming data?
  • Batch vs Streaming data processing

1.2 Hadoop Introduction

  • History of Hadoop
  • Problems with traditional large-scale systems
  • Requirements for a new approach
  • Why is Hadoop in demand in the market today?
  • Why do we need Hadoop?
  • What is Hadoop?
  • How Hadoop solves the Big Data problem
  • Hadoop Architecture.
  • Hadoop ecosystem Components
  • HDFS Overview
  • Hadoop 1.x vs Hadoop 2.x
  • Hadoop 1.X Architecture
  • Hadoop 1.X Core Components
  • Hadoop 1.X Job Process
  • Overview of Hadoop Daemons
  • Hadoop Daemons in Hadoop Release-1.x
  • Hadoop Daemons in Hadoop Release-2.x
  • Hadoop Release-3.x
  • Comparing Hadoop & SQL.

1.3 Basic Java Overview for Hadoop

  • Object-oriented concepts
  • Variables and data types
  • Static data types
  • Primitive data types
  • Objects & classes
  • Wrapper classes
  • Java operators
  • Methods and their types
  • Constructors
  • Conditional statements
  • Looping in Java
  • Access modifiers
  • Inheritance
  • Polymorphism
  • Method overloading & overriding
  • Interfaces

1.4 Building Blocks

  • Quick tour of Java (as Hadoop is written in Java, this will help us understand it better)
  • Quick tour of Linux commands (basic commands to navigate the Linux OS)
  • Quick tour of RDBMS concepts (needed for Hive and Impala)
  • Quick hands-on experience with SQL
  • Introduction to Cloudera VM and usage instructions

1.5 Setting up the Development Environment (Cloudera QuickStart VM)

  • Overview of Big Data Tools
  • Different vendors providing Hadoop and where each fits in the industry
  • Setting up the development environment & performing a Hadoop installation on the user’s laptop
  • Hadoop daemons
  • Starting and stopping daemons using the command line and Cloudera Manager

1.6 Hadoop Cluster (Introduction and Installation)

  • Nodes in a Hadoop cluster (master & slave)
  • Setting up a Hadoop cluster
  • Preparing nodes for Hadoop and VM settings
  • Installing Java and configuring passwordless SSH across nodes
  • Basic Linux commands
  • Hadoop 1.x single-node deployment
  • Hadoop Daemons – NameNode, JobTracker, DataNode, TaskTracker, Secondary NameNode
  • Hadoop configuration files and running Hadoop
  • Important web URLs and logs for Hadoop
  • Running HDFS and Linux commands
  • Hadoop 1.x multi-node deployment
  • Running sample jobs on single-node and multi-node clusters

1.7 HDFS (Hadoop Distributed File System)

  • HDFS design goals
  • Understanding the HDFS architecture
  • Understanding blocks and how to configure block size
  • Block replication and replication factor
  • Understanding Hadoop rack awareness and configuring racks in Hadoop
  • Anatomy of file reads and writes in HDFS
  • Configuring HDFS name and space quotas
  • Configuring and using WebHDFS (REST API for HDFS)
  • Understanding NameNode safemode, the file system image and edits
  • Configuring the Secondary NameNode and using the checkpointing process to support NameNode recovery
  • HDFS DFSAdmin and file system shell commands
  • Hadoop NameNode/DataNode directory structure
  • Metadata, FSImage, edit log, Secondary NameNode and safemode
  • How to add a new DataNode dynamically
  • How to decommission a DataNode dynamically (without stopping the cluster)
  • Data processing and the replication pipeline
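
To make the file read/write anatomy concrete, here is a minimal sketch of the HDFS client (FileSystem) API in Scala. It assumes the usual core-site.xml/hdfs-site.xml are on the classpath; the path and file contents are illustrative.

    import java.nio.charset.StandardCharsets

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    object HdfsQuickTour {
      def main(args: Array[String]): Unit = {
        // fs.defaultFS, block size and replication factor come from the cluster config files.
        val conf = new Configuration()
        val fs   = FileSystem.get(conf)

        // Write a small file (illustrative path); overwrite if it already exists.
        val file = new Path("/tmp/hdfs-demo/hello.txt")
        val out  = fs.create(file, true)
        out.write("hello from the HDFS client API\n".getBytes(StandardCharsets.UTF_8))
        out.close()

        // Read it back.
        val in  = fs.open(file)
        val buf = new Array[Byte](1024)
        val n   = in.read(buf)
        in.close()
        println(new String(buf, 0, n, StandardCharsets.UTF_8))

        // Inspect block placement: the part that rack awareness and replication influence.
        val status = fs.getFileStatus(file)
        fs.getFileBlockLocations(status, 0, status.getLen).foreach { block =>
          println(s"block hosts: ${block.getHosts.mkString(", ")}")
        }
        fs.close()
      }
    }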

1.8 MapReduce

  • Introduction to MapReduce
  • Concepts of MapReduce
  • MapReduce architecture
  • Advanced concepts of MapReduce
  • Understanding how distributed processing solves the big data challenge and how MapReduce addresses it
  • Understanding the concept of Mappers and Reducers
  • Phases of a MapReduce program
  • Anatomy of a Map Reduce Job Run
  • Data-types in Hadoop MapReduce
  • Role of InputSplit and RecordReader
  • Input format and Output format in Hadoop
  • Concepts of Combiner and Partitioner
  • Running and Monitoring MapReduce jobs
  • Writing your own MapReduce job using MapReduce API
  • Difference between Hadoop 1 & Hadoop 2
  • The Hadoop Java API for MapReduce
  • Mapper Class
  • Reducer Class
  • Driver Class

1.9 Configuration

  • Basic Configuration of MapReduce
  • Writing and Executing the Basic MapReduce Program using Java
  • Submission & Initialization of MapReduce Job.
  • Explain the Driver, Mapper and Reducer code
  • Word count problem and solution (a sketch follows this list)
  • Configuring development environment – Eclipse
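
For orientation, below is a minimal word-count sketch against the Hadoop MapReduce API. The course exercises use Java; this version is written in Scala (the Mapper/Reducer/Driver structure is the same), and the class names and input/output paths are illustrative.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.io.{IntWritable, LongWritable, Text}
    import org.apache.hadoop.mapreduce.{Job, Mapper, Reducer}
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

    // Mapper: emit (word, 1) for every word in the input split.
    class TokenMapper extends Mapper[LongWritable, Text, Text, IntWritable] {
      private val one  = new IntWritable(1)
      private val word = new Text()
      override def map(offset: LongWritable, line: Text,
                       ctx: Mapper[LongWritable, Text, Text, IntWritable]#Context): Unit =
        line.toString.split("\\s+").filter(_.nonEmpty).foreach { w =>
          word.set(w.toLowerCase)
          ctx.write(word, one)
        }
    }

    // Reducer (also usable as a combiner): sum the counts for each word.
    class SumReducer extends Reducer[Text, IntWritable, Text, IntWritable] {
      override def reduce(word: Text, counts: java.lang.Iterable[IntWritable],
                          ctx: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit = {
        var sum = 0
        val it  = counts.iterator()
        while (it.hasNext) sum += it.next().get()
        ctx.write(word, new IntWritable(sum))
      }
    }

    // Driver: configures and submits the job; input/output paths come from the command line.
    object WordCount {
      def main(args: Array[String]): Unit = {
        val job = Job.getInstance(new Configuration(), "word count")
        job.setJarByClass(classOf[TokenMapper])
        job.setMapperClass(classOf[TokenMapper])
        job.setCombinerClass(classOf[SumReducer])
        job.setReducerClass(classOf[SumReducer])
        job.setOutputKeyClass(classOf[Text])
        job.setOutputValueClass(classOf[IntWritable])
        FileInputFormat.addInputPath(job, new Path(args(0)))
        FileOutputFormat.setOutputPath(job, new Path(args(1)))
        System.exit(if (job.waitForCompletion(true)) 0 else 1)
      }
    }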

2.1 Pig

  • What is Apache Pig?
  • Why Apache Pig?
  • Pig features
  • Where should Pig be used
  • Where not to use Pig
  • Why Pig when MapReduce is already there?
  • Pig architecture and components
  • Pig installation
  • Accessing the Pig Grunt shell
  • Pig data types
  • Pig commands – LOAD, STORE, DESCRIBE, DUMP
  • Pig relational operators
  • Pig user-defined functions
  • Configuring Pig to use HCatalog
  • Tight coupling between Pig and MapReduce
  • Pig Latin scripting
  • Pig running modes
  • MapReduce vs. Pig
  • Pig in local mode
  • Pig in MapReduce mode
  • Execution mechanism and data processing
  • Writing UDFs
  • Macros in Pig

2.2 Hive

  • Overview of Hive
  • Background of Hive
  • Hive vs Pig
  • Hive Architecture
  • Components of Hive
  • Installation & Configuration
  • Working with Tables
  • Primitive and Complex Data Types
  • Hive Bucketed Tables and Sampling
  • Dynamic Partitioning
  • Differences between ORDER BY, DISTRIBUTE BY and SORT BY
  • Bucketing and Sorted Bucketing with Dynamic Partitioning
  • RCFile Format
  • Indexes and Views
  • Map-side Joins
  • Compression on Hive Tables and Migrating Hive Tables
  • Dynamic (Variable) Substitution in Hive and Different Ways of Running Hive
  • How to Enable Updates in Hive
  • Log Analysis with Hive
  • Accessing HBase Tables using Hive
  • Hive Services: Hive Shell, HiveServer and Hive Web Interface (HWI)
  • Metastore
  • Creating Tables, Loading Datasets & Performing Analysis on those Datasets
  • How to Capture Inserts, Updates and Deletes in Hive
  • Hue Interface for Hive
  • How to Analyse Data using Hive Scripts
  • Differences between Hive and Impala

2.3 Sqoop

  • Introduction to Apache Sqoop
  • Sqoop Architecture and installation
  • Import Data using Sqoop in HDFS
  • Import all tables in Sqoop
  • Export data from HDFS
  • Setting up an RDBMS server and creating & loading datasets into MySQL
  • Writing Sqoop import commands to transfer data from the RDBMS to HDFS/Hive/HBase
  • Writing Sqoop export commands to transfer data from HDFS/Hive to the RDBMS

2.4 Flume

  • Installation
  • Introduction to Flume
  • Flume Agents: Sources, Channels and Sinks
  • Flume Commands
  • Flume Use Cases
  • How to load data coming from a web server or other storage into Hadoop
  • How to load streaming Twitter data into HDFS using Flume

2.5 NoSQL Databases – HBase

  • Introduction to NoSQL Databases and HBase
  • HBase vs RDBMS, HBase Components, HBase Architecture
  • HBase Cluster Deployment

2.6 HBase

  • Introduction to HBase
  • HBase Architecture
  • HBase Installation and Configuration
  • HBase Concepts
  • HBase Data Model and Comparison between RDBMS and NoSQL
  • Master & Region Servers
  • HBase Operations (DDL and DML) through the Shell and Programmatically
  • Catalog Tables
  • Block Cache and Sharding
  • Splits
  • Data Modeling (Sequential, Salted, Promoted and Random Keys)
  • Java APIs and the REST Interface
  • Client-side Buffering and Processing 1 Million Records using Client-side Buffering
  • HBase Counters
  • Enabling Replication and HBase Raw Scans
  • HBase Filters
  • Bulk Loading and Coprocessors (Endpoints and Observers, with Programs)
  • Real-world Use Case combining HDFS, MapReduce and HBase
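
A minimal sketch of the HBase Java client API called from Scala, covering a single Put and Get. The table name, column family and row key are assumptions, and the table is assumed to already exist.

    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory, Get, Put}
    import org.apache.hadoop.hbase.util.Bytes

    object HBasePutGet {
      def main(args: Array[String]): Unit = {
        // hbase-site.xml on the classpath supplies the ZooKeeper quorum.
        val conf       = HBaseConfiguration.create()
        val connection = ConnectionFactory.createConnection(conf)

        // Table 'user_profiles' with column family 'info' is assumed to exist,
        // e.g. created in the HBase shell with: create 'user_profiles', 'info'
        val table = connection.getTable(TableName.valueOf("user_profiles"))

        // DML through the API: write one cell for row key "user#1001".
        val put = new Put(Bytes.toBytes("user#1001"))
        put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("city"), Bytes.toBytes("Pune"))
        table.put(put)

        // Read the cell back.
        val result = table.get(new Get(Bytes.toBytes("user#1001")))
        val city   = Bytes.toString(result.getValue(Bytes.toBytes("info"), Bytes.toBytes("city")))
        println(s"city = $city")

        table.close()
        connection.close()
      }
    }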

2.7 Oozie, HUE and Yarn (Hadoop Processing Framework)

  • Oozie Fundamentals
  • Oozie: Components
  • Oozie workflow creations
  • Scheduling with Oozie
  • Concepts of Coordinators and Bundles
  • Client Nodes
  • Hands-on Training on Oozie Workflow
  • Oozie Coordinator
  • Oozie Commands
  • Oozie Web Console
  • Oozie for MapReduce
  • Hive in Oozie
  • An Overview of Hue
  • Hue in Real-time Scenarios
  • Introduction to YARN
  • Significance of YARN
  • YARN Daemons – Resource Manager, NodeManager etc.
  • Job assignment & Execution flow

2.8 Zookeeper

  • Introduction to Zookeeper
  • How Zookeeper helps in Hadoop Ecosystem
  • How to load data from Relational storage in Hadoop
  • Data Model of ZooKeeper
  • Operations of ZooKeeper
  • ZooKeeper Implementation
  • Sessions, States and Consistency.

2.9 Introduction to YARN

  • What is YARN? YARN Architecture
  • YARN daemons
  • Active and Standby NameNodes
  • Resource Manager and Application Master
  • Node Manager
  • Container Objects and Container
  • NameNode Federation

2.10 Hadoop on Google Cloud

  • Introduction to Google Cloud infrastructure
  • Creating VM instances on Google Cloud
  • Deploying data to Google Cloud
  • Choosing the size of an instance
  • Configuration of an EMR instance
  • Creating a virtual cluster on Google Cloud
  • Creating a project on Google Cloud
  • Deploying the project

3.1 Introduction to Data Analysis with Spark

  • What Is Apache Spark?
  • A Unified Stack
  • Spark Core
  • Spark SQL
  • Spark Streaming
  • MLlib
  • GraphX
  • Cluster Managers
  • Who Uses Spark, and for What?
  • Data Science Tasks
  • Data Processing Applications
  • A Brief History of Spark
  • Spark Versions and Releases
  • Storage Layers for Spark

3.2 Downloading Spark and Getting Started

  • Downloading Spark
  • Introduction to Spark’s Python and Scala Shells
  • Introduction to Core Spark Concepts
  • Standalone Applications
  • Initializing a SparkContext
  • Building Standalone Applications
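
A minimal standalone-application sketch showing SparkContext initialization in Scala. The local[*] master and the input path are illustrative; on a real cluster the master is usually supplied by spark-submit.

    import org.apache.spark.{SparkConf, SparkContext}

    object FirstSparkApp {
      def main(args: Array[String]): Unit = {
        // local[*] runs Spark inside this JVM; on a cluster the master is
        // normally passed by spark-submit rather than hard-coded here.
        val conf = new SparkConf().setAppName("FirstSparkApp").setMaster("local[*]")
        val sc   = new SparkContext(conf)

        // Illustrative input path; any text file visible to the application works.
        val lines     = sc.textFile("data/sample.txt")
        val withSpark = lines.filter(_.contains("Spark"))
        println(s"lines mentioning Spark: ${withSpark.count()}")

        sc.stop()
      }
    }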

3.3 Scala Programming

  • Why Scala?
  • OOP
  • Environment Setup
  • Data Types, Variables
  • Functions, Comments
  • Inheritance
  • Annotations
  • if – else statement
  • String
  • Singleton & Companion Object
  • Case Class
  • Implement Abstract Class
  • String Method
  • Method Overloading
  • Method & Field Overriding
  • String Interpolation
  • Arrays
  • Operators
  • While Loop
  • Do While Loop
  • For Loop
  • Loop Control Statement
  • Control Structures
  • Tuples
  • Map
  • Sets
  • Constructor
  • Extractors
  • Iterators
  • Pattern Matching
  • List
  • Closures
  • Option
  • Final & This
  • Trait
  • Regular Expressions
  • Partial Functions
  • Currying Functions
  • Access Modifiers
  • File I/O
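
A small, self-contained Scala sketch touching several of the topics above (case classes, pattern matching, higher-order functions, Option, string interpolation, for-comprehensions). All names and values are illustrative.

    object ScalaTour {
      // Case classes give immutable fields, equals/hashCode and pattern-matching support for free.
      case class Employee(name: String, dept: String, salary: Double)

      // Pattern matching on a field value.
      def bonus(e: Employee): Double = e.dept match {
        case "Engineering" => e.salary * 0.15
        case "Sales"       => e.salary * 0.10
        case _             => e.salary * 0.05
      }

      def main(args: Array[String]): Unit = {
        val staff = List(
          Employee("Asha",  "Engineering", 90000),
          Employee("Ravi",  "Sales",       60000),
          Employee("Meera", "HR",          50000)
        )

        // Higher-order functions over an immutable collection.
        val totalBonus = staff.map(bonus).sum
        println(f"total bonus: $totalBonus%.2f")

        // Option instead of null, consumed with pattern matching.
        staff.find(_.name == "Ravi") match {
          case Some(e) => println(s"${e.name} works in ${e.dept}")
          case None    => println("not found")
        }

        // String interpolation inside a for-comprehension with a guard.
        for (e <- staff if e.salary > 55000) println(s"${e.name} earns ${e.salary}")
      }
    }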

3.4 Programming with RDDs

  • RDD Basics
  • Creating RDDs
  • RDD Operations
  • Transformations
  • Actions
  • Lazy Evaluation
  • Passing Functions to Spark
  • Common Transformations and Actions
  • Basic RDDs
  • Converting Between RDD Types
  • Persistence (Caching)

3.5 Working with Key/Value Pairs (Cloudera QuickStart VM)

  • Motivation
  • Creating Pair RDDs
  • Transformations on Pair RDDs
  • Aggregations
  • Grouping Data
  • Joins
  • Sorting Data
  • Actions Available on Pair RDDs
  • Data Partitioning (Advanced)
  • Determining an RDD’s Partitioner
  • Operations That Benefit from Partitioning
  • Operations That Affect Partitioning
  • Example: PageRank
  • Custom Partitioners
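
A sketch of pair-RDD operations in Scala: reduceByKey aggregation, explicit partitioning with a HashPartitioner, a join, and sorting. The sales and lookup data is made up for illustration.

    import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

    object PairRddBasics {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("PairRddBasics").setMaster("local[*]"))

        // (productId, amount) sales and (productId, name) lookup data; values are illustrative.
        val sales = sc.parallelize(Seq(("p1", 100.0), ("p2", 40.0), ("p1", 60.0), ("p3", 25.0)))
        val names = sc.parallelize(Seq(("p1", "keyboard"), ("p2", "mouse"), ("p3", "cable")))

        // Aggregation: total amount per product.
        val totals = sales.reduceByKey(_ + _)

        // Partition explicitly so operations that reuse `partitioned` can benefit from it.
        val partitioned = totals.partitionBy(new HashPartitioner(4)).cache()
        println(s"partitioner = ${partitioned.partitioner}")

        // Join with the lookup RDD and sort by total, descending.
        val report = partitioned.join(names)
          .map { case (id, (total, name)) => (name, total) }
          .sortBy(_._2, ascending = false)

        report.collect().foreach { case (name, total) => println(f"$name%-10s $total%8.2f") }
        sc.stop()
      }
    }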

3.6 Loading and Saving Your Data

  • Hadoop Input and Output Formats
  • Text Files
  • JSON
  • Comma-Separated Values and Tab-Separated Values
  • SequenceFiles
  • Object Files
  • File Compression
  • Filesystems
  • Local/“Regular” FS
  • HDFS
  • Structured Data with Spark SQL
  • Apache Hive
  • Databases
  • Java Database Connectivity
  • HBase
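
A Scala sketch of a few of the load/save paths above: text files, manually parsed CSV, SequenceFiles, and compressed text output. All file paths and the CSV layout (name,age per line, no header) are assumptions.

    import org.apache.spark.{SparkConf, SparkContext}

    object LoadAndSave {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("LoadAndSave").setMaster("local[*]"))

        // Plain text: one record per line. Paths may be local, hdfs:// URIs,
        // or anything Hadoop's FileSystem API understands.
        val logs = sc.textFile("data/access.log")

        // CSV parsed manually at the RDD level (Spark SQL's csv reader is the higher-level option).
        val rows   = sc.textFile("data/people.csv").map(_.split(","))
        val adults = rows.filter(r => r(1).trim.toInt >= 18).map(r => (r(0), r(1).trim.toInt))

        // SequenceFiles store key/value pairs of Hadoop Writable types.
        adults.saveAsSequenceFile("out/adults-seq")
        val reloaded = sc.sequenceFile[String, Int]("out/adults-seq")

        // Compressed text output using a codec shipped with Hadoop.
        logs.saveAsTextFile("out/logs-gz", classOf[org.apache.hadoop.io.compress.GzipCodec])

        println(s"adult rows: ${reloaded.count()}")
        sc.stop()
      }
    }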

3.7 Advanced Spark Programming

  • Introduction
  • Accumulators
  • Accumulators and Fault Tolerance
  • Custom Accumulators
  • Broadcast Variables
  • Optimizing Broadcasts
  • Working on a Per-Partition Basis
  • Piping to External Programs
  • Numeric RDD Operations
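
A sketch of broadcast variables and accumulators in Scala. The lookup map and input records are illustrative; note that Spark only guarantees exactly-once accumulator updates inside actions, so counts updated in transformations should be treated as approximate.

    import org.apache.spark.{SparkConf, SparkContext}

    object AccumulatorsAndBroadcast {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("AdvancedSpark").setMaster("local[*]"))

        // Broadcast a small lookup table once per executor instead of shipping it with every task.
        val countryCodes = sc.broadcast(Map("IN" -> "India", "US" -> "United States"))

        // Accumulator updated on the executors; read its value only on the driver.
        val badRecords = sc.longAccumulator("badRecords")

        // Illustrative "countryCode,amount" records.
        val raw = sc.parallelize(Seq("IN,200", "US,350", "XX,10", "IN,125"))
        val amountsByCountry = raw.flatMap { line =>
          val Array(code, amount) = line.split(",")
          countryCodes.value.get(code) match {
            case Some(name) => Some(name -> amount.toDouble)
            case None       => badRecords.add(1); None   // unknown code: count and drop
          }
        }.reduceByKey(_ + _)

        amountsByCountry.collect().foreach(println)
        println(s"bad records skipped: ${badRecords.value}")
        sc.stop()
      }
    }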

3.8 Running on a Cluster

  • Introduction
  • Spark Runtime Architecture
  • The Driver
  • Executors
  • Cluster Manager
  • Launching a Program
  • Driver Class

3.9 Deploying Applications with spark-submit

  • Packaging Your Code and Dependencies
  • A Java Spark Application Built with Maven
  • A Scala Spark Application Built with sbt
  • Dependency Conflicts
  • Scheduling Within and Between Spark Applications
  • Cluster Managers
  • Standalone Cluster Manager
  • Hadoop YARN
  • Apache Mesos
  • Amazon EC2
  • Which Cluster Manager to Use?

3.10 Tuning and Debugging Spark

  • Configuring Spark with SparkConf
  • Components of Execution: Jobs, Tasks, and Stages
  • Finding Information
  • Spark Web UI
  • Driver and Executor Logs
  • Key Performance Considerations
  • Level of Parallelism
  • Serialization Format
  • Memory Management
  • Hardware Provisioning
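
A sketch of configuring Spark through SparkConf in Scala. The property values are illustrative; properties set directly on SparkConf take precedence over spark-submit flags and spark-defaults.conf.

    import org.apache.spark.{SparkConf, SparkContext}

    object TunedApp {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("TunedApp")
          .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") // faster than Java serialization
          .set("spark.default.parallelism", "200")   // level of parallelism for shuffles
          .set("spark.executor.memory", "4g")        // per-executor heap
          .set("spark.ui.port", "4041")              // Spark web UI port

        val sc = new SparkContext(conf)
        println(sc.getConf.toDebugString)            // inspect the effective configuration
        sc.stop()
      }
    }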

3.11 Spark SQL

  • Linking with Spark SQL
  • Using Spark SQL in Applications
  • Initializing Spark SQL
  • Basic Query Example
  • SchemaRDDs
  • Caching
  • Loading and Saving Data
  • Apache Hive
  • Parquet
  • JSON
  • From RDDs
  • JDBC/ODBC Server
  • Working with Beeline
  • Long-Lived Tables and Queries
  • User-Defined Functions
  • Spark SQL UDFs
  • Hive UDFs
  • Spark SQL Performance
  • Performance Tuning Options

3.12 Spark Streaming

  • What is Spark Streaming?
  • Need for Streaming in Apache Spark
  • Why Streaming in Spark?
  • Spark Streaming Architecture and Advantages
  • Goals of Spark Streaming
  • How does Spark Streaming work?
  • Streaming Sources
  • Streaming Operations
  • Transformation Operations in Spark
  • Output Operations in Apache Spark
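
A minimal Spark Streaming sketch in Scala: a socket text source, transformation operations on the DStream, and an output operation. The host/port and the 5-second batch interval are illustrative (the source can be fed locally with a tool such as netcat).

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object StreamingWordCount {
      def main(args: Array[String]): Unit = {
        // At least two local threads: one to receive data, one to process it.
        val conf = new SparkConf().setAppName("StreamingWordCount").setMaster("local[2]")
        val ssc  = new StreamingContext(conf, Seconds(5))   // 5-second micro-batches

        // Streaming source: a text socket (illustrative host/port).
        val lines = ssc.socketTextStream("localhost", 9999)

        // Transformation operations on the DStream, then an output operation.
        val counts = lines.flatMap(_.split("\\s+"))
          .map(word => (word, 1))
          .reduceByKey(_ + _)
        counts.print()            // output operation: print each batch's counts

        ssc.start()               // start receiving and processing
        ssc.awaitTermination()    // block until the streaming job is stopped
      }
    }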

3.13 Spark Streaming Tool – Kafka

  • What is Kafka?
  • Use of an Apache Kafka Cluster
  • Kafka Architecture
  • Components
    • Topic
    • Producer
    • Consumer
    • Broker
    • ZooKeeper
  • Partition in Kafka
  • Kafka Use Cases
  • Apache Kafka vs Apache Flume
  • RabbitMQ vs Apache Kafka
  • Traditional queuing systems vs Apache Kafka
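
A minimal Kafka producer sketch in Scala using the Kafka Java client. The broker address, topic name and record contents are assumptions, and the topic is assumed to exist. The record key determines the partition, which is how per-key ordering is preserved.

    import java.util.Properties

    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    object KafkaProducerDemo {
      def main(args: Array[String]): Unit = {
        // Broker address is illustrative; key/value serializers turn Strings into bytes.
        val props = new Properties()
        props.put("bootstrap.servers", "localhost:9092")
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

        val producer = new KafkaProducer[String, String](props)
        try {
          // Records with the same key always land in the same partition of the topic.
          for (i <- 1 to 5) {
            val record = new ProducerRecord[String, String]("page-views", s"user-$i", s"/home visited, event $i")
            producer.send(record)
          }
        } finally {
          producer.flush()
          producer.close()
        }
      }
    }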

FAQ

What are the system requirements?

  • Java 1.6.x or higher.
  • Linux and Windows are the supported operating systems, but BSD, Mac OS X, and OpenSolaris are known to work.

What is Apache Spark, and how does it relate to Hadoop?

Spark is a fast and general processing engine compatible with Hadoop data. It can run in Hadoop clusters through YARN or Spark’s standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat. It is designed to perform both batch processing (similar to MapReduce) and new workloads like streaming, interactive queries, and machine learning.

How is the training conducted?

The “Spark with Scala” training is a hands-on course. All the code and exercises are done in the classroom sessions. Our batch sizes are generally small so that personalized attention can be given to each and every learner.

What are the batch timings?

Classes are held on weekdays and weekends. You can check the available schedules and choose the batch timings that are convenient for you.
