
Transcript
Page 1: Hadoop Inside

Hadoop Inside

TC Data Platform Division, GFIS Team

Eunjo Lee (이은조)

Page 3: Hadoop Inside

Distributed Processing System

How to process data in distributed environment

how to read/write data

how to control nodes

load balancing

Monitoring

node status

task status

Fault tolerance

error detection

process error, network error, hardware error, …

error handling

temporary error: retry -> risk of duplication, data corruption, …

permanent error: fail over (to which node?)

process hang: timeout & retry

• timeout too long -> long response time

• timeout too short -> endless retry loop
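The timeout-and-retry tradeoff above can be sketched in a few lines. This is an illustrative toy, not Hadoop code; `run_with_retries`, `flaky_task`, and all limits are hypothetical names:

```python
import time

def run_with_retries(task, timeout, max_retries):
    """Retry a task that may hang.  A timeout that is too long delays the
    overall response; one that is too short keeps killing attempts that
    would have succeeded, so the job loops on retries."""
    for attempt in range(max_retries):
        try:
            return task(deadline=time.monotonic() + timeout)
        except TimeoutError:
            continue  # temporary error: retry
    raise RuntimeError("permanent failure after %d retries" % max_retries)

def flaky_task(deadline, state={"calls": 0}):
    """Toy task whose first two attempts 'hang' (raise TimeoutError)."""
    state["calls"] += 1
    if state["calls"] < 3:
        raise TimeoutError
    return "done"
```

With `max_retries=4` the third attempt succeeds; with `max_retries=2` the same task would be reported as a permanent failure.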

Page 4: Hadoop Inside

Hadoop System Architecture

[Diagram: a master node running the JobTracker and the NameNode (with a Secondary NameNode beside it), and three slave nodes, each running a TaskTracker and a DataNode; slaves send heartbeats to the master and handle data reads/writes. Legend: Node, Process, Heart Beat, Data Read/Write.]

HDFS + MapReduce

Page 5: Hadoop Inside

HDFS

vs. Filesystem

inode – namespace

cylinder / track – data node

blocks (bytes) – blocks (MBytes)

Features

very large files

write once, read many times

support for usual file system operations

ls, cp, mv, rm, chmod, chown, put, cat, …

no support for multiple writers or arbitrary modifications
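The bytes-vs-megabytes contrast above (local filesystem blocks vs HDFS blocks) amounts to splitting a file into fixed-size chunks. A minimal sketch; the 4-byte block size is only for illustration, where HDFS defaults are on the order of 64/128 MB:

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Split a byte string into fixed-size blocks; the last block may be
    shorter.  In HDFS each such block is stored and replicated separately."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]
```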

Page 6: Hadoop Inside

Block Replication & Rack Awareness

[Diagram: a file of four blocks (1–4) replicated across servers in two racks; the replicas of each block are spread so that no single rack holds all copies of a block. Legend: Server, Rack, File, Block.]
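The rack-aware placement the diagram shows can be sketched as follows. This follows HDFS's well-known default policy (first replica on the writer's node, second on a different rack, third on another node in that second rack), but `place_replicas` and the node/rack names are hypothetical:

```python
def place_replicas(writer_node, nodes_by_rack):
    """Sketch of rack-aware replica placement for one block.
    nodes_by_rack: {rack_name: [node_name, ...]}."""
    # replica 1: the writer's own node
    writer_rack = next(r for r, ns in nodes_by_rack.items() if writer_node in ns)
    # replica 2: a node on a different rack (survives a whole-rack failure)
    other_rack = next(r for r in nodes_by_rack if r != writer_rack)
    second = nodes_by_rack[other_rack][0]
    # replica 3: another node on that same remote rack (cheap to write)
    third = next(n for n in nodes_by_rack[other_rack] if n != second)
    return [writer_node, second, third]
```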

Page 7: Hadoop Inside

HDFS - Read

[Diagram: a client, the NameNode, and three DataNodes. Legend: Node, Data Block, Data I/O, Operation Message.]

1. Read Request (client -> NameNode)

2. Response (block locations)

3. Request Data (client -> DataNode)

4. Read Data
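The four read steps can be mocked as message exchanges: metadata from the NameNode, then data directly from DataNodes. `NameNode` and `DataNode` here are toy stand-ins, not Hadoop classes:

```python
class NameNode:
    """Holds only metadata: which DataNode has which block of a file."""
    def __init__(self, block_locations):
        self.block_locations = block_locations  # {filename: [(block_id, dn)]}
    def read_request(self, filename):           # steps 1-2
        return self.block_locations[filename]

class DataNode:
    """Holds only block data."""
    def __init__(self, blocks):
        self.blocks = blocks                    # {block_id: bytes}
    def request_data(self, block_id):           # steps 3-4
        return self.blocks[block_id]

def hdfs_read(namenode, datanodes, filename):
    """Client side: ask the NameNode where the blocks are, then read each
    block directly from the DataNode that holds it."""
    data = b""
    for block_id, dn_name in namenode.read_request(filename):
        data += datanodes[dn_name].request_data(block_id)
    return data
```

The point of the split is visible in the sketch: the NameNode never touches block data, so it is not a bandwidth bottleneck.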

Page 8: Hadoop Inside

HDFS - Write

[Diagram: a client, the NameNode, and a pipeline of three DataNodes. Legend: Node, Data Block, Data I/O, Operation Message.]

1. Write Request (client -> NameNode)

2. Response (target DataNodes)

3. Write Data (client -> first DataNode)

4. Write Replica (forwarded along the DataNode pipeline)

5. Write Done (ack back to the client)
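The pipelined replication in steps 3–5 can be sketched as each node storing the block and handing it to the next; `pipeline_write` and the dict-as-storage representation are illustrative only:

```python
def pipeline_write(block, pipeline):
    """Sketch of the HDFS write pipeline: the client sends the block to the
    first DataNode, which forwards a replica to the next, and so on.
    'Write Done' stands for the ack that returns once every node stored it.
    `pipeline` is a list of dicts standing in for DataNode block storage."""
    for store in pipeline:              # step 3, then step 4 repeated
        store[block["id"]] = block["data"]
    return "Write Done"                 # step 5
```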

Page 9: Hadoop Inside

HDFS – Write (Failure)

[Diagram: the same write pipeline, but one DataNode in the pipeline fails; the Write Replica step to it is lost.]

1. Write Request (client -> NameNode)

2. Response

3. Write Data

4. Write Replica (the failed node drops out of the pipeline)

5. Write Done (acked by the remaining DataNodes)

Page 10: Hadoop Inside

HDFS – Write (Failure)

[Diagram: recovery after the pipeline failure.]

Write Replica (copy the block to a new DataNode)

Delete Partial block (discard the incomplete block on the failed node)

Replica Arrangement (the NameNode re-balances replicas to restore the replication factor)

Page 11: Hadoop Inside

MapReduce

Definition

map: (+1) [ 1, 2, 3, 4, …, 10 ] -> [ 2, 3, 4, 5, …, 11 ]

reduce: (+) [ 2, 3, 4, 5, …, 11 ] -> 65

Programming Model for processing data sets in Hadoop

projection, filter -> map task

aggregation, join -> reduce task

sort -> partitioning

Job Tracker & Task Trackers

master / slave

job = many tasks

# of map tasks = # of file splits (default: # of blocks)

# of reduce tasks = user configuration
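The model above can be simulated end to end in a toy word count: map tasks emit key/value pairs, partitioning routes keys to reducers, and reduce tasks aggregate. This simulates the data flow only; it is not the Hadoop API, and `run_mapreduce` is an invented helper:

```python
from collections import defaultdict

def run_mapreduce(splits, map_fn, reduce_fn, n_reducers=2):
    """Simulate the MapReduce data flow: one map task per split,
    hash partitioning into n_reducers buckets, then per-key reduction."""
    partitions = [defaultdict(list) for _ in range(n_reducers)]
    for split in splits:                      # one "map task" per split
        for record in split:
            for key, value in map_fn(record):
                partitions[hash(key) % n_reducers][key].append(value)  # shuffle
    out = {}
    for part in partitions:                   # one "reduce task" per partition
        for key in sorted(part):              # keys arrive sorted at a reducer
            out[key] = reduce_fn(key, part[key])
    return out

def wc_map(line):                             # projection/filter side
    for word in line.split():
        yield word, 1

def wc_reduce(word, counts):                  # aggregation side
    return sum(counts)
```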

Page 12: Hadoop Inside

MapReduce

[Diagram: MapReduce data flow. Input data records are read from the distributed file system as splits; each split feeds a map task, whose output records (key/value pairs) are assigned to partitions, shuffled and sorted, and fed to reduce tasks, which write their output records (key/value pairs) back to the distributed file system.]

Page 19: Hadoop Inside

Mapper - partitioning

double indexed structure

Spill Thread

data sorting: 2nd index (quick sort)

spill file generating: spill data file & index file

flush: merge sort (by key) per partition

[Diagram: the Output Buffer (default: 100 MB) stores serialized key/value records together with a 1st index of (partition, key offset, value offset) entries and a 2nd index of key offsets used for sorting.]
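The spill-time sort above orders buffered records by partition first and key second, so that each spill file is already grouped per partition and a later merge sort by key can combine spill files. A sketch of that ordering (not the real map output buffer; `sort_for_spill` is an invented name):

```python
def sort_for_spill(records, n_partitions):
    """records: list of (key, value) pairs from one map task's buffer.
    Returns per-partition lists sorted by key, i.e. the order in which
    records would be written into a single spill file."""
    def partition_of(key):
        return hash(key) % n_partitions
    # sort by (partition, key) -- the role of the 2nd index quick sort
    ordered = sorted(records, key=lambda kv: (partition_of(kv[0]), kv[0]))
    spill = [[] for _ in range(n_partitions)]
    for key, value in ordered:
        spill[partition_of(key)].append((key, value))
    return spill
```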

Page 20: Hadoop Inside

Reducer - fetching

GetMapEventsThread

map event listener (learns of map completion events from the JobTracker)

MapOutputCopier

data fetching from completed mappers (HTTP)

runs concurrently in multiple threads

Merger

key sorting (heap sort)

[Diagram: the JobTracker delivers map completion events to the reduce-side TaskTracker; Copier threads fetch map output from the map-side TaskTrackers over HTTP GET and pass it to the Reducer.]
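The reduce-side Merger performs a heap-based k-way merge over the already-sorted map outputs fetched by the copiers. Python's standard `heapq.merge` does exactly that, so the step can be sketched directly (`merge_map_outputs` is an invented name):

```python
import heapq

def merge_map_outputs(fetched):
    """fetched: several (key, value) lists, each already sorted by key
    (one per fetched map output).  A heap merges them lazily into one
    globally key-ordered stream, as the reducer's Merger does."""
    return list(heapq.merge(*fetched, key=lambda kv: kv[0]))
```

A heap merge only ever compares the current head of each stream, so memory stays proportional to the number of streams, not the amount of data.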

Page 21: Hadoop Inside

Job Flow

[Diagram: a client node running the MapReduce program and JobClient, a JobTracker node with a job queue, and a TaskTracker node that launches a Child JVM to run the map/reduce task; a shared file system holds the job resources. Legend: Node, JVM, Class, Job Queue, Method Call, I/O, Job, Task.]

1. runJob (MapReduce program -> JobClient)

2. copy job resources (to the shared file system)

3. submit job (JobClient -> JobTracker)

4. retrieve input splits

5. add job (to the job queue)

6. heartbeat (TaskTracker -> JobTracker)

7. assign task

8. retrieve job resources (from the shared file system)

9. launch (Child JVM)

10. run (map/reduce task)

11. read data / write result

Page 22: Hadoop Inside

Monitoring

Heart beat

task tracker status checking

task request / alignment

other commands (restart, shutdown, kill task, …)

Cluster Status

Job / Task Status

JobInProgress

TaskInProgress

Reporter & Metrics

Black list


Page 24: Hadoop Inside

Monitoring (Cluster Info)

Page 25: Hadoop Inside

Monitoring (Job Info)

Page 26: Hadoop Inside

Monitoring (Task Info)

Page 27: Hadoop Inside

Task Scheduler

job queue

red-black tree (java.util.TreeMap)

sort by priority & job id (request time)

load factor

remaining tasks / capacity

task alignment

high priority

new task > speculative execution task > dummy splits task

map task (local) > map task (non-local) > reduce task

padding

padding = MIN(total tasks * pad factor, task capacity)

for speculative execution
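The two scheduler formulas above translate directly into code; the formulas are the slide's, while the function and parameter names are mine:

```python
def load_factor(remaining_tasks, capacity):
    """Load factor used to compare trackers: remaining tasks / capacity."""
    return remaining_tasks / capacity

def padding(total_tasks, pad_factor, task_capacity):
    """Slots held back for speculative execution:
    padding = MIN(total tasks * pad factor, task capacity)."""
    return min(total_tasks * pad_factor, task_capacity)
```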

Page 28: Hadoop Inside

Error Handling

Retry

configurable (default 4 times)

Timeout

configurable

Speculative Execution

launched when: current time – task start time >= 1 minute

and: average progress – task progress > 20%
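The two speculative-execution conditions above combine into a single predicate; `should_speculate` is an illustrative name, with times in seconds and progress as a fraction in [0, 1]:

```python
def should_speculate(now, start_time, progress, average_progress):
    """Launch a backup copy of a task only if it has run for at least a
    minute AND trails the average progress by more than 20 percentage
    points -- both conditions from the slide must hold."""
    return (now - start_time >= 60) and (average_progress - progress > 0.20)
```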


Page 30: Hadoop Inside

Distributed Processing System

Hadoop's mechanisms for "how to process data in a distributed environment": HDFS client (read/write data), master / slave architecture (node control), replication / rack awareness, job scheduler (load balancing)

Page 31: Hadoop Inside

Distributed Processing System

Hadoop's mechanisms for monitoring (node status, task status): heart beat, job/task status, reporter / metrics

Page 32: Hadoop Inside

Distributed Processing System

Hadoop's mechanisms for fault tolerance (error detection and handling): black list, timeout & retry, speculative execution

Page 33: Hadoop Inside

Limitations

map -> reduce network overhead (shuffle)

iterative processing

full (or theta) join

data that is small in size but split into many pieces

Low latency

polling & pulling

job initialization

optimized for throughput

job scheduling

data access

Page 34: Hadoop Inside

Q&A

