Overview
This course is an entry point for developers who need to build big data applications that use Spark to analyse large datasets stored in Apache Hadoop.
Topics include: An overview of the Hortonworks Data Platform (HDP), including HDFS and YARN; using Spark Core APIs for interactive data exploration; Spark SQL and DataFrame operations; Spark Streaming and DStream operations; data visualisation, reporting, and collaboration; performance monitoring and tuning; building and deploying Spark applications; and an introduction to the Spark Machine Learning Library.
Duration
4 days
Who the course is for
Software engineers who want to develop streaming and in-memory applications for time-sensitive, highly iterative workloads in an enterprise HDP environment.
Prerequisites
Students should have programming skills and previous software development experience in either Python or Scala. Previous experience with data streaming, SQL, and HDP is helpful but not required.
Hands-On Lab Activities
Lab 0: Pre-lab Setup
About This Lab
Objective:
Set up the lab environment and confirm that HDP 2.5 is working
Lab 1: Using HDFS Commands
About This Lab
Objective:
View, add, manipulate, and remove files and directories in HDFS using hdfs dfs commands.
Lab 2: Introduction to Spark REPLs and Zeppelin
About This Lab
Objective:
Access and browse Spark REPLs and Zeppelin
File Locations:
N/A
Successful Outcome:
Use Spark REPLs and browse Zeppelin
Lab 3: Creating and Manipulating RDDs (Scala/Python)
About This Lab
Objective:
Create and manipulate RDDs using Scala and Zeppelin
Lab 4: Create and Manipulate Pair RDDs (Scala/Python)
About This Lab
Objective:
Create pair RDDs and use various functions to transform them using Scala in Zeppelin.
File Locations:
/home/zeppelin/spark/data/
Successful Outcome:
REQUIRED: Create pair RDDs and perform various operations.
OPTIONAL: Complete challenge labs performing more complex operations.
Lab 5: Basic Spark Streaming (Scala/Python)
About This Lab
Objective:
Set up basic Spark Streaming operations using the REPL
File Locations:
/root/spark/data/
Successful Outcome:
Stream data from HDFS directories and TCP sockets using Spark Streaming
Lab 6: Basic Spark Streaming Transformations (Scala/Python)
About This Lab
Objective:
Learn to use basic Spark Streaming transformations on data streams
File Locations:
/root/spark/data/
Successful Outcome:
Perform several basic transformations on streaming data
Lab 7: Spark Streaming Window Transformations (Scala/Python)
About This Lab
Objective:
Use Spark Streaming Window Transformations
File Locations:
N/A
Successful Outcome:
Perform several Spark Streaming Window Transformations
Lab 8: Create and Save DataFrames & Tables (Scala/Python)
About This Lab
Objective:
Create and save DataFrames and tables
File Locations:
N/A
Successful Outcome:
Use various methods to create and save DataFrames and tables
Lab 9: Working with DataFrames (Scala/Python)
About This Lab
Objective:
Learn to use the DataFrames API.
File Locations:
N/A
Successful Outcome:
Manipulate DataFrames using the DataFrames API
Lab 10: Data Visualization, Reporting and Collaboration using Zeppelin (Scala/Python)
About This Lab
Objective:
Learn to use Zeppelin to perform data visualizations, collaborate, and integrate visualizations into reports.
File Locations:
N/A
Successful Outcome:
Use Zeppelin to perform data visualization, collaboration, and reporting tasks.
Lab 11: Job Monitoring (Scala/Python)
About This Lab
Objective:
Monitor Spark jobs using the Spark Application UI
File Locations:
N/A
Successful Outcome:
Monitor Spark jobs.
Lab 12: Performance Tuning (Scala/Python)
About This Lab
Objective:
Practice performance tuning techniques
File Locations:
/home/zeppelin/spark/data/
Successful Outcome:
Apply the performance tuning techniques from the lesson in code
Lab 13: Build and Submit Applications to YARN (Scala/Python)
About This Lab
Objective:
Apply programming knowledge to build stand-alone applications and submit them to a YARN cluster
File Locations:
N/A
Successful Outcome:
Build and submit a cluster-mode application to YARN
Lab 14: Machine Learning Walkthrough
About This Lab
Objective:
Observe and run code examples that demonstrate machine learning processes.
File Locations:
N/A
Successful Outcome:
Import a preconfigured note that contains machine learning code samples, read through the note, and run the examples.
Contact us at 27 862 155, 54 828 018, 71 866 142
Duration:
4 days
Daytime classes:
From 9:00 to 14:00
Evening & weekend classes:
From 18:30 to 21:00; Saturday and Sunday mornings from 9:00 to 13:00
Refer one person and receive a 30% discount
Refer a second person and receive an immediate 100% discount
For companies:
We are fully at your disposal to provide the documents needed to deduct training costs under the TFP (Taxe à la Formation Professionnelle, the vocational training tax).