Big Data with Spark and Scala


  • Batch Timings :
  • Starting Date :

Course Overview

Apache Spark is a big data processing framework whose popularity lies in its speed, ease of use, and sophisticated data-analysis capabilities. Its built-in modules for streaming, machine learning, SQL, and graph processing make it useful in diverse industries such as banking, insurance, retail, healthcare, and manufacturing.

COURSE FEATURES

  • Resume & Interviews Preparation Support
  • Hands-on Experience on Projects
  • 100% Placement Assistance
  • Multiple Flexible Batches
  • Missed Sessions Covered
  • Practice Course Material

At the end of the Big Data with Spark and Scala training course, participants will be able to:

  • Understand the fundamentals of Apache Spark and Scala
  • Explain the differences between Apache Spark and Hadoop
  • Implement Spark on a cluster
  • Learn the Scala programming language, its core concepts, and how programs are implemented in Scala
  • Write Spark applications in Scala, Java, and Python
  • Use Scala–Java interoperability
  • Learn Storm architecture and basic distributed-systems concepts
  • Learn the features of Big Data
  • Understand the legacy architecture of real-time systems
  • Understand the logic, dynamics, and components in Storm
  • Develop real-life Storm projects

Course Duration

  • Weekend: 8-10 weekends
  • Weekdays: 2 Months

Prerequisites:

  • Basic knowledge of databases, SQL, and query languages is helpful.
  • Basic knowledge of object-oriented programming is sufficient.

Who Should Attend?

  • Software Engineers looking to upgrade Big Data skills
  • Data Engineers and ETL Developers
  • Data Scientists and Analytics Professionals
  • Graduates looking to make a career in Big Data

Course Curriculum

1.1 Big Data Introduction

  • What is data? Types of data; what is Big Data?
  • Evolution of Big Data; the need for Big Data analytics
  • Sources of data; defining Big Data using the three V's

1.2 Scala Basics

  • Functional programming basics
  • Scala Programming constructs walkthrough
  • Apache Spark Basics
  • What is Apache Spark?
  • Starting the Spark Shell
  • Using the Spark Shell
  • Introduction to Spark framework
  • Spark Architecture

1.3 Scala Programming

  • Why Scala?
  • Data Types, Variables
  • Operators
  • While Loop
  • Do While Loop
  • For Loop
  • Loop Control Statement
  • Control Structures
  • Functions, Comments
  • ArrayBuffer
  • Tuples
  • Map
  • Sets
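The constructs listed above can be sketched in a single self-contained Scala program (all names and values here are illustrative):

```scala
import scala.collection.mutable.ArrayBuffer

object ScalaBasicsDemo {
  // A simple function with typed parameters and a return type
  def square(n: Int): Int = n * n

  def main(args: Array[String]): Unit = {
    // Variables: val is immutable, var is mutable
    val greeting: String = "Hello, Scala"
    var counter: Int = 0

    // While loop
    while (counter < 3) { counter += 1 }

    // For loop over a range
    var sum = 0
    for (i <- 1 to 5) sum += i

    // ArrayBuffer: a mutable, growable sequence
    val buf = ArrayBuffer(1, 2, 3)
    buf += 4

    // Tuple: a fixed-size group of possibly heterogeneous values
    val pair = ("spark", 3)

    // Map and Set
    val versions = Map("spark" -> 3, "scala" -> 2)
    val langs = Set("Scala", "Java", "Python")

    println(s"$greeting; sum=$sum; square=${square(4)}")
    println(s"buf=${buf.mkString(",")}; ${pair._1} ${versions("scala")}; ${langs.size} languages")
  }
}
```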

1.4 Programming with RDDs

  • RDD Overview
  • RDD Data Sources
  • Creating and Saving RDDs
  • RDD Operations
  • Activity: Working with RDDs
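A minimal sketch of creating, transforming, and saving an RDD, assuming a Spark installation; the file paths are illustrative:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddDemo {
  def main(args: Array[String]): Unit = {
    // Local-mode context for experimentation; use a cluster master in production
    val sc = new SparkContext(new SparkConf().setAppName("RddDemo").setMaster("local[*]"))

    // Create an RDD from a local collection...
    val nums = sc.parallelize(Seq(1, 2, 3, 4, 5))

    // ...or from a text file (path is illustrative)
    val lines = sc.textFile("hdfs:///data/input.txt")

    // map is a lazy transformation; collect is an eager action
    val doubled = nums.map(_ * 2)
    println(doubled.collect().mkString(","))

    // Save an RDD back out as text
    doubled.saveAsTextFile("hdfs:///data/output")

    sc.stop()
  }
}
```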

1.5 Transforming Data with RDDs

  • Writing and Passing Transformation Functions
  • Transformation Execution
  • Use Cases with RDDs
  • Activity: Transforming Data using RDDs
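Transformation functions can be written as named methods or anonymous functions and passed to operations such as `map` and `filter`. A sketch, assuming an existing `SparkContext` and an illustrative log-file path:

```scala
object TransformDemo {
  // A named transformation function, passed to map below
  def toUpper(line: String): String = line.toUpperCase

  def run(sc: org.apache.spark.SparkContext): Unit = {
    val lines = sc.textFile("weblogs/*.log")  // illustrative path

    // Passing a named function
    val upper = lines.map(toUpper)

    // Passing anonymous functions, chained as lazy transformations
    val errorSources = lines
      .filter(_.contains("ERROR"))
      .map(_.split(" ")(0))   // keep the first field, e.g. an IP address

    // Nothing executes until an action such as count() or collect()
    println(s"error lines: ${errorSources.count()}")
  }
}
```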

1.6 Introduction to Pair RDDs

  • Key-Value Pair RDDs
  • Map-Reduce with Pair RDDs
  • Other Pair RDD Operations
  • Activity: Joining Data Using Pair RDDs
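Key-value pair RDDs unlock map-reduce style operations such as `reduceByKey` and `join`. A sketch, assuming an existing `SparkContext`:

```scala
object PairRddDemo {
  def run(sc: org.apache.spark.SparkContext): Unit = {
    // Build (key, value) pairs, then reduce by key: classic word count
    val words = sc.parallelize(Seq("spark", "scala", "spark"))
    val counts = words.map(w => (w, 1)).reduceByKey(_ + _)
    // yields ("spark", 2) and ("scala", 1)

    // Join two pair RDDs on their keys
    val users  = sc.parallelize(Seq((1, "alice"), (2, "bob")))
    val orders = sc.parallelize(Seq((1, "book"), (1, "pen")))
    val joined = users.join(orders)
    // yields (1, ("alice", "book")) and (1, ("alice", "pen"))
    joined.collect().foreach(println)
  }
}
```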

1.7 Spark SQL: DataFrames and Schemas

  • Getting Started with Datasets and DataFrames
  • DataFrames Operations
  • Activity: Exploring DataFrames Using the Apache Spark Shell
  • Creating DataFrames from Data Sources
  • Saving DataFrames to Data Sources
  • DataFrame Schemas
  • Eager and Lazy Execution
  • Querying DataFrames Using Column Expressions
  • Grouping and Aggregation Queries
  • Joining DataFrames
  • Activity: Analyzing Data with DataFrame Queries
  • Activity: Working with DataFrames and Schemas
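The DataFrame operations above can be sketched end to end; file names and column names here are illustrative:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DataFrameDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("DataFrameDemo").master("local[*]").getOrCreate()
    import spark.implicits._

    // Create a DataFrame from a data source, inferring the schema
    val sales = spark.read.option("header", "true").option("inferSchema", "true")
      .csv("sales.csv")

    sales.printSchema()   // inspect the inferred schema

    // Column expressions, grouping, and aggregation (all lazy until an action)
    val byRegion = sales
      .where($"amount" > 100)
      .groupBy($"region")
      .agg(sum($"amount").as("total"))

    // Join with another DataFrame and save the result to a data source
    val regions = spark.read.json("regions.json")
    byRegion.join(regions, "region")
      .write.mode("overwrite").parquet("sales_by_region")

    spark.stop()
  }
}
```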

1.8 Querying tables using Spark SQL

  • Querying Files and Views
  • The Catalog API
  • Comparing Spark SQL, Apache Impala, and Apache Hive-on-Spark
  • Activity: Querying Tables and Views with SQL
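A sketch of querying views and files with Spark SQL and inspecting the Catalog API, assuming an active `SparkSession` named `spark`; table and path names are illustrative:

```scala
// Register a DataFrame as a temporary view, then query it with SQL
val df = spark.read.parquet("sales_by_region")
df.createOrReplaceTempView("sales")

val top = spark.sql("SELECT region, total FROM sales ORDER BY total DESC LIMIT 10")
top.show()

// Query a file directly, without registering a view first
spark.sql("SELECT * FROM parquet.`sales_by_region`").show()

// The Catalog API lists databases, tables, and columns
spark.catalog.listTables().show()
spark.catalog.listColumns("sales").show()
```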

1.9 Working with Datasets in Scala

  • Datasets and DataFrames
  • Creating Datasets
  • Loading and Saving Datasets
  • Dataset Operations
  • Activity: Using Datasets in Scala
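Datasets add compile-time types on top of DataFrames via case classes. A sketch; the `Person` class and file names are illustrative:

```scala
import org.apache.spark.sql.SparkSession

// A case class gives the Dataset its typed schema
case class Person(name: String, age: Long)

object DatasetDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("DatasetDemo").master("local[*]").getOrCreate()
    import spark.implicits._

    // Create a Dataset from a local collection
    val people = Seq(Person("alice", 34), Person("bob", 27)).toDS()

    // Typed operations are checked at compile time
    val adultNames = people.filter(_.age >= 30).map(_.name)
    adultNames.show()

    // Load a Dataset by converting an untyped DataFrame
    val fromJson = spark.read.json("people.json").as[Person]
    fromJson.write.mode("overwrite").parquet("people_out")

    spark.stop()
  }
}
```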

1.10 Writing, Configuring, and Running Apache Spark Applications

  • Writing a Spark Application
  • Building and Running an Application
  • Application Deployment Mode
  • The Spark Application Web UI
  • Configuring Application Properties
  • Activity: Writing, Configuring, and Running a Spark Application
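A minimal standalone application sketch, with an illustrative `spark-submit` invocation showing where the master, deploy mode, and configuration properties are supplied:

```scala
import org.apache.spark.sql.SparkSession

object WordCountApp {
  def main(args: Array[String]): Unit = {
    // No .master() here: the master is supplied at submit time
    val spark = SparkSession.builder.appName("WordCountApp").getOrCreate()
    val sc = spark.sparkContext

    // args(0): input path, args(1): output path
    val counts = sc.textFile(args(0))
      .flatMap(_.split("\\s+"))
      .map((_, 1))
      .reduceByKey(_ + _)

    counts.saveAsTextFile(args(1))
    spark.stop()
  }
}

// Package the application (e.g. with sbt), then submit it; jar name is illustrative:
//   spark-submit --class WordCountApp --master yarn --deploy-mode cluster \
//     --conf spark.executor.memory=2g wordcount.jar input.txt output
```

Progress and executor details for the running job can then be inspected in the Spark Application Web UI.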

1.11 Apache Spark Streaming: Introduction to DStreams

  • Apache Spark Streaming Overview
  • Example: Streaming Request Count
  • DStreams
  • Developing Streaming Applications
  • Activity: Writing a Streaming Application
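A sketch of a DStream application along the lines of the streaming request-count example; the socket host and port are illustrative:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingDemo {
  def main(args: Array[String]): Unit = {
    // Micro-batch interval of 2 seconds; at least 2 local threads are
    // needed so the receiver does not starve the processing
    val conf = new SparkConf().setAppName("StreamingDemo").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(2))

    // A DStream of lines from a socket source
    val lines = ssc.socketTextStream("localhost", 9999)

    // Count requests per batch, keyed by the first field of each line
    val counts = lines.map(line => (line.split(" ")(0), 1)).reduceByKey(_ + _)
    counts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```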

1.12 Apache Spark Streaming: Processing Multiple Batches

  • Multi-Batch Operations
  • Time Slicing
  • State Operations
  • Sliding Window Operations
  • Activity: Processing Multiple Batches of Streaming Data
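State and sliding-window operations span multiple batches and therefore require checkpointing. A sketch, assuming a `StreamingContext` named `ssc` with a 2-second batch interval and a DStream `pairs` of `(key, 1)` tuples; all durations are illustrative:

```scala
import org.apache.spark.streaming.Seconds

ssc.checkpoint("checkpoints")   // required for stateful and window operations

// Stateful running count across all batches seen so far
val running = pairs.updateStateByKey[Int] { (newVals: Seq[Int], state: Option[Int]) =>
  Some(newVals.sum + state.getOrElse(0))
}

// Sliding window: a 30-second window recomputed every 10 seconds
// (both durations must be multiples of the batch interval)
val windowed = pairs.reduceByKeyAndWindow(
  (a: Int, b: Int) => a + b,
  Seconds(30),
  Seconds(10))

running.print()
windowed.print()
```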

1.13 Apache Spark Streaming: Data Sources

  • Streaming Data Source Overview
  • Apache Flume and Apache Kafka Data Sources
  • Example: Using a Kafka Direct Data Source
  • Activity: Processing Streaming Apache Kafka Messages
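A sketch of a Kafka direct data source using the kafka010 integration, assuming a `StreamingContext` named `ssc`; the broker address, group id, and topic name are illustrative:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "localhost:9092",
  "key.deserializer"  -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id"          -> "spark-course",
  "auto.offset.reset" -> "latest"
)

// Direct stream: Spark reads Kafka partitions itself, with no receiver
val stream = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent, Subscribe[String, String](Seq("weblogs"), kafkaParams))

stream.map(record => record.value).print()
```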

1.14 Message Processing with Apache Kafka

  • What Is Apache Kafka?
  • Apache Kafka Overview
  • Apache Kafka Cluster Architecture
  • Apache Kafka Command Line Tools
  • Activity: Producing and Consuming Apache Kafka Messages
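The Kafka command-line tools can be exercised against a local broker roughly as follows (topic name and addresses are illustrative; older Kafka releases use `--zookeeper`/`--broker-list` instead of `--bootstrap-server`):

```shell
# Create a topic
kafka-topics.sh --create --topic weblogs --partitions 1 --replication-factor 1 \
  --bootstrap-server localhost:9092

# Produce messages (type lines, one message per line)
kafka-console-producer.sh --topic weblogs --bootstrap-server localhost:9092

# Consume messages from the beginning of the topic
kafka-console-consumer.sh --topic weblogs --from-beginning \
  --bootstrap-server localhost:9092
```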

1.15 Capturing Data with Apache Flume

  • What Is Apache Flume?
  • Basic Architecture
  • Sources, Sinks, Channels and Configuration
  • Activity: Collecting Web Server Logs with Apache Flume
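A Flume agent is defined entirely in a properties file wiring sources, channels, and sinks together. A minimal sketch for collecting web server logs (agent name, paths, and capacities are illustrative):

```properties
# spooldir source -> memory channel -> HDFS sink
agent1.sources  = web
agent1.channels = mem
agent1.sinks    = hdfs1

agent1.sources.web.type     = spooldir
agent1.sources.web.spoolDir = /var/log/web
agent1.sources.web.channels = mem

agent1.channels.mem.type     = memory
agent1.channels.mem.capacity = 10000

agent1.sinks.hdfs1.type          = hdfs
agent1.sinks.hdfs1.hdfs.path     = hdfs:///flume/weblogs
agent1.sinks.hdfs1.hdfs.fileType = DataStream
agent1.sinks.hdfs1.channel       = mem
```

The agent is then started with something like `flume-ng agent --name agent1 --conf-file web.conf`.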

1.16 Integrating Apache Flume and Apache Kafka

  • Overview
  • Use Cases
  • Configuration
  • Activity: Sending Messages from Flume to Kafka
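To route Flume events into Kafka, the sink definition above can be swapped for Flume's Kafka sink (available since Flume 1.6); the broker address and topic are illustrative:

```properties
agent1.sinks.k1.type                    = org.apache.flume.sink.kafka.KafkaSink
agent1.sinks.k1.kafka.bootstrap.servers = localhost:9092
agent1.sinks.k1.kafka.topic             = weblogs
agent1.sinks.k1.channel                 = mem
```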

FAQ

Spark is a fast and general processing engine compatible with Hadoop data. It can run in Hadoop clusters through YARN or Spark’s standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat. It is designed to perform both batch processing (similar to MapReduce) and new workloads like streaming, interactive queries, and machine learning.

The “Spark with Scala” course is a hands-on training: all code and exercises are done during the classroom sessions. Our batch sizes are kept small so that every learner receives personalized attention.

Feel free to drop us a mail at info@stackodes.com and we will get back to you at the earliest regarding your queries on the “Spark with Scala” course.
