© 2015 BlueCamphor Technologies (P) Ltd.
Hadoop 2.0 & Yarn
Slide 2© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
Session Objectives
ᗍ Introduction to Big Data and Hadoop
ᗍ Understanding Hadoop 2.0 and its features
ᗍ Understanding the differences between Hadoop 1.x and 2.x
ᗍ Understanding YARN
Big Data and its Challenges
Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
Systems and enterprises generate huge amounts of data, from terabytes to even petabytes of information.
It is very difficult to manage data at this scale.
Get Started with BIG Data & Hadoop
Who Generates Big Data?
Have you ever wondered how Google, Facebook or LinkedIn manage to store and use such huge volumes of data?
Today, managing such Big Data is becoming a problem for all of us.
Hadoop can be used to process such huge data easily. We will see how.
Before that, let's understand what Hadoop is.
Hadoop and its Characteristics
Apache Hadoop is a framework that allows the distributed processing of large data sets across clusters of commodity computers using a simple programming model.
It is an open-source data management technology with scale-out storage and distributed processing.
Hadoop Characteristics
ᗍ Flexible
ᗍ Reliable
ᗍ Economical
ᗍ Scalable
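To make the "simple programming model" concrete, here is a minimal single-process sketch of the map, shuffle and reduce steps using word count. It only illustrates the idea and is not Hadoop's API: the function names (map_phase, shuffle, reduce_phase, word_count) are made up for this example, and a real cluster would run the map and reduce steps in parallel on many nodes.

```python
from collections import defaultdict

def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group emitted values by key, the step that sits between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    """Run map over every line, shuffle by word, then reduce each group."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["big data big cluster", "data node"])
print(counts)
```

The developer writes only the map and reduce logic; distribution, grouping and fault handling are the framework's job.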
Hadoop Ecosystem
[Figure: Hadoop ecosystem stack]
ᗍ Flume: import of unstructured or semi-structured data
ᗍ Sqoop: import or export of structured data
ᗍ Apache Oozie (Workflow)
ᗍ Pig Latin (Data Analysis)
ᗍ Hive (DW System)
ᗍ MapReduce Framework
ᗍ HBase
ᗍ Other YARN Frameworks (MPI, Giraph)
ᗍ YARN (Cluster Resource Management)
ᗍ HDFS (Hadoop Distributed File System)
Next Generation Hadoop
Hadoop 1.x
[Figure: Hadoop 1.x architecture. A Client talks to the NameNode (with a Secondary NameNode) and the Job Tracker. Each Data Node stores data blocks (HDFS) and runs a Task Tracker that executes Map and Reduce tasks (MapReduce).]
Challenges for Hadoop 1.x
Problem | Description
NameNode – No horizontal scalability | Single NameNode and single namespace, limited by NameNode RAM
NameNode – No High Availability (HA) | The NameNode is a single point of failure; after a failure it needs manual recovery using the Secondary NameNode
Job Tracker – Overburdened | Spends a significant amount of time and effort managing the life cycle of applications
MRv1 – Only Map and Reduce tasks | The huge amount of data stored in HDFS cannot be used by other workloads, such as graph processing
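A rough worked estimate shows why a single NameNode is "limited by NameNode RAM". The sketch below assumes a commonly quoted rule of thumb of about 150 bytes of NameNode heap per namespace object (file or block) and a 128 MiB block size. Both numbers are assumptions for illustration, not values from these slides, and real usage varies by version and workload.

```python
import math

BYTES_PER_OBJECT = 150        # assumed rule of thumb, not a measured value
BLOCK_SIZE = 128 * 1024**2    # assumed block size of 128 MiB

def namenode_heap_bytes(num_files, avg_file_size):
    """Rough heap estimate: one object per file plus one object per block."""
    blocks_per_file = max(1, math.ceil(avg_file_size / BLOCK_SIZE))
    objects = num_files * (1 + blocks_per_file)
    return objects * BYTES_PER_OBJECT

# 100 million files of 1 MiB each: 1 block per file, so 200 million objects.
estimate = namenode_heap_bytes(100_000_000, 1024**2)
print(f"{estimate / 1024**3:.1f} GiB of NameNode heap")
```

Under these assumptions the metadata for many small files fills the heap of one machine long before the DataNodes run out of disk, which is the scalability limit that Federation addresses.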
Hadoop 2.x Features
Property | Hadoop 1.0 | Hadoop 2.0
Federation | One NameNode and namespace | Multiple NameNodes and namespaces
High Availability | Not present | Highly available
YARN – Processing Control and Multi-tenancy | JobTracker, TaskTracker | Resource Manager, Node Manager, App Master, Capacity Scheduler
Other Important Hadoop 2.0 Features
ᗍ HDFS Snapshots
ᗍ NFSv3 access to data in HDFS
ᗍ Support for running Hadoop on MS Windows
ᗍ Binary Compatibility for MapReduce applications built on Hadoop 1.0
ᗍ Substantial integration testing with the other projects in the Hadoop ecosystem (such as Pig and Hive)
HDFS 1.x Vs 2.x
[Figure: HDFS in Hadoop 1.0 vs Hadoop 2.0]
ᗍ Hadoop 1.0: a single NameNode holds the namespace (NS) and block management; block storage is on the DataNodes
ᗍ Hadoop 2.0: NameNodes NN-1 … NN-k … NN-n each manage their own namespace (NS 1 … NS k … NS n) and their own block pool (Pool 1 … Pool k … Pool n)
ᗍ Hadoop 2.0: DataNode 1 … DataNode m provide common storage shared by all block pools
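The federation layout on this slide can be sketched as a toy model: several NameNodes, each with its own namespace and block pool, and DataNodes that store blocks for every pool. The classes below are hypothetical names for illustration and are not Hadoop classes. Routing by path prefix is an assumption of this sketch, similar in spirit to a client-side mount table.

```python
class NameNode:
    """Toy NameNode: owns one namespace and one block pool ID."""
    def __init__(self, name, pool_id):
        self.name = name
        self.pool_id = pool_id
        self.namespace = {}  # path -> list of block IDs

    def create(self, path, num_blocks):
        blocks = [f"{self.pool_id}-blk{i}" for i in range(num_blocks)]
        self.namespace[path] = blocks
        return blocks

class DataNode:
    """Toy DataNode: common storage holding blocks from every pool."""
    def __init__(self):
        self.pools = {}  # pool ID -> set of block IDs

    def store(self, pool_id, block_id):
        self.pools.setdefault(pool_id, set()).add(block_id)

class FederatedClient:
    """Routes each path to a NameNode by longest matching mount prefix."""
    def __init__(self, mounts, datanodes):
        self.mounts = mounts        # path prefix -> NameNode
        self.datanodes = datanodes

    def resolve(self, path):
        prefix = max((p for p in self.mounts if path.startswith(p)), key=len)
        return self.mounts[prefix]

    def write(self, path, num_blocks):
        nn = self.resolve(path)
        # Spread the file's blocks round-robin over the shared DataNodes.
        for i, block in enumerate(nn.create(path, num_blocks)):
            self.datanodes[i % len(self.datanodes)].store(nn.pool_id, block)
        return nn.name

nn1, nn2 = NameNode("NN-1", "pool1"), NameNode("NN-2", "pool2")
dns = [DataNode(), DataNode()]
client = FederatedClient({"/user": nn1, "/logs": nn2}, dns)
client.write("/user/a.txt", 2)
client.write("/logs/b.log", 1)
```

Each NameNode sees only its own paths, so adding a NameNode adds namespace capacity without touching the others, while the DataNodes remain one shared storage layer.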
Hadoop 2.x – High Availability
[Figure: Hadoop 2.x cluster with NameNode High Availability (HDFS) and Next Generation MapReduce (YARN). The Client talks to the Active NameNode and the Resource Manager. Each Data Node runs a Node Manager that hosts Containers and App Masters.]
ᗍ Active NameNode: all namespace edits are logged to shared NFS storage; single writer (fencing)
ᗍ Standby NameNode: reads the shared edit logs and applies them to its own namespace
ᗍ With NameNode High Availability it is not necessary to configure a Secondary NameNode
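The active/standby behaviour described on this slide can be sketched as follows. This is a simplified illustration under stated assumptions: SharedEditLog and HaNameNode are made-up names, and fencing is modelled as a single "current writer" field, whereas a real deployment fences the old active node through external mechanisms.

```python
class SharedEditLog:
    """Toy shared edit log that accepts writes from one holder only."""
    def __init__(self):
        self.edits = []
        self.writer = None

    def grant_writer(self, node_name):
        # Fencing in this sketch: granting a new writer revokes the old one.
        self.writer = node_name

    def append(self, node_name, edit):
        if node_name != self.writer:
            raise PermissionError(f"{node_name} is fenced")
        self.edits.append(edit)

class HaNameNode:
    def __init__(self, name, log):
        self.name, self.log = name, log
        self.namespace = set()
        self.applied = 0  # number of shared edits already applied locally

    def mkdir(self, path):
        """Active path: log the edit first, then apply it locally."""
        self.log.append(self.name, ("mkdir", path))
        self.namespace.add(path)
        self.applied = len(self.log.edits)

    def catch_up(self):
        """Standby path: replay edits not yet applied to the local namespace."""
        for op, path in self.log.edits[self.applied:]:
            if op == "mkdir":
                self.namespace.add(path)
        self.applied = len(self.log.edits)

log = SharedEditLog()
active, standby = HaNameNode("nn1", log), HaNameNode("nn2", log)
log.grant_writer("nn1")
active.mkdir("/data")
standby.catch_up()           # standby now mirrors the active namespace

log.grant_writer("nn2")      # failover: nn2 becomes the single writer
standby.mkdir("/logs")
```

Because the standby keeps replaying the shared log, it already holds an up-to-date namespace at failover time, and the single-writer rule stops the old active node from corrupting the log.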
Hadoop 1.x Vs 2.x Ecosystem
[Figure: ecosystem stacks, listed top to bottom]
Hadoop 1.x
ᗍ Apache Oozie (Workflow)
ᗍ Hive (DW System), Pig Latin (Data Analysis)
ᗍ MapReduce Framework, HBase
ᗍ HDFS (Hadoop Distributed File System)
Hadoop 2.x
ᗍ Apache Oozie (Workflow)
ᗍ Hive (DW System), Pig Latin (Data Analysis)
ᗍ MapReduce Framework, HBase, Other YARN Frameworks (MPI, Giraph)
ᗍ YARN (Cluster Resource Management)
ᗍ HDFS (Hadoop Distributed File System)
YARN Flow
YARN = Yet Another Resource Negotiator
[Figure: YARN flow. Clients send Job Submissions to the Resource Manager. Node Managers host Containers and App Masters and report Node Status. App Masters send Resource Requests and report MapReduce Status. A JobHistoryServer is also shown.]
Resource Manager
ᗍ Cluster Level Resource Manager
ᗍ Long life, High Quality Hardware
Node Manager
ᗍ One per Data Node
ᗍ Monitors Resources on Data Node
Application Master
ᗍ One per application
ᗍ Short life
ᗍ Manages task/scheduling
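The division of work between the three components above can be sketched as a toy model. It is a minimal illustration and not YARN's API: the class and method names are hypothetical, a container is reduced to "one free slot on a node", and scheduling is a simple first-fit loop rather than a real scheduler.

```python
class NodeManager:
    """One per node: tracks the free container slots on that node."""
    def __init__(self, name, slots):
        self.name, self.free = name, slots

class ResourceManager:
    """Cluster-level allocator: hands out containers from Node Managers."""
    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, count):
        granted = []
        for node in self.nodes:
            while node.free > 0 and len(granted) < count:
                node.free -= 1
                granted.append(node.name)
        return granted  # may be shorter than requested

    def release(self, containers):
        by_name = {n.name: n for n in self.nodes}
        for name in containers:
            by_name[name].free += 1

    def submit(self, num_tasks):
        """Job submission: start an App Master in its own container."""
        am_container = self.allocate(1)
        if not am_container:
            raise RuntimeError("no capacity for App Master")
        return ApplicationMaster(self, am_container[0], num_tasks)

class ApplicationMaster:
    """One per application: requests task containers, frees them when done."""
    def __init__(self, rm, host, num_tasks):
        self.rm, self.host, self.num_tasks = rm, host, num_tasks

    def run(self):
        containers = self.rm.allocate(self.num_tasks)   # resource request
        placed = list(containers)                       # tasks would run here
        self.rm.release(containers + [self.host])       # short-lived: give all back
        return placed

rm = ResourceManager([NodeManager("node1", 2), NodeManager("node2", 2)])
app = rm.submit(num_tasks=3)
placement = app.run()
```

The Resource Manager only tracks capacity, while each short-lived Application Master handles its own tasks. This is the work that overloaded the single Job Tracker in Hadoop 1.x.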
Job Trends – Hadoop
Course Topics
Module 1: Introduction to Big Data and Hadoop
Module 2: HDFS Internals, Hadoop Configurations and Data Loading
Module 3: Introduction to Map Reduce
Module 4: Advanced Map Reduce Concepts
Module 5: Introduction to Pig
Module 6: Advanced Pig and Introduction to Hive
Module 7: Advanced Hive Concepts
Module 8: Extending Hive and HBase Introduction
Module 9: Advanced HBase and Oozie Introduction
Module 10: Project Set-up Discussion
Why SkillSpeed?
ᗍ Course Curriculum from Industry Experts
ᗍ Instructor Led Live Virtual Sessions
ᗍ Lifetime access to Course Content via LMS
ᗍ 100% Placement Assistance
ᗍ 24x7 Support
Corporate Partners
Lines open 24/7
To know more about the course, Please contact:
IND: +91-90660-20904
USA: 1866-607-6547 (Toll Free)
Or reach us at