Hadoop Inside


Transcript

  • 1. Hadoop Inside TC GFIS

  • 2. What is Hadoop
    Hadoop is a framework and system for parallel processing of large amounts of data in a distributed computing environment
    (http://searchbusinessintelligence.techtarget.in/tutorial/Apache-Hadoop-FAQ-for-BI-professionals)
    - Apache project: open source, Java based
    - clone of the Google system: GFS -> HDFS, MapReduce -> MapReduce

  • 3. Distributed Processing System
    How to process data in a distributed environment
    - how to read/write data
    - how to control nodes, load balancing
    Monitoring
    - node status
    - task status
    Fault tolerance
    - error detection: process error, network error, hardware error, ...
    - error handling
      - temporary error: retry -> risk of duplication, data corruption, ...
      - permanent error: fail over (to which node?)
      - process hang: timeout & retry (timeout too long -> long response time, too short -> infinite loop)

  • 4. Hadoop System Architecture
    [Diagram: HDFS + MapReduce. NameNode, Secondary NameNode, and JobTracker; DataNode/TaskTracker pairs on the worker nodes. Legend: node, process, heartbeat, data read/write]

  • 5. HDFS vs. Filesystem
    - inode <-> namespace
    - cylinder / track <-> data node
    - blocks (bytes) <-> blocks (MBytes)
    Features
    - very large files
    - write once, read many times
    - support for the usual file system operations: ls, cp, mv, rm, chmod, chown, put, cat, ...
    - no support for multiple writers or arbitrary modifications

  • 6. Block Replication & Rack Awareness
    [Diagram: blocks 1-4 of a file replicated across servers in different racks. Legend: file, server, block, rack]

  • 7. HDFS - Read
    Data read:
    1. read request (client -> NameNode)
    2. response (block locations)
    3. request (client -> DataNodes)
    4. read data (from the DataNodes)
    [Diagram legend: node, data block, data I/O, operation message]

  • 8. HDFS - Write
    Data write:
    1. write request (client -> NameNode)
    2. response (target DataNodes)
    3. write (client -> first DataNode)
    4. write replica (DataNode -> DataNode pipeline)
    5. write done
    [Diagram legend: node, data block, data I/O, operation message]

  • 9. HDFS - Write (Failure)
    [Diagram: the same write sequence, but one replica DataNode fails during step 4 (write replica)]

  • 10. HDFS - Write (Failure)
    [Diagram: recovery. The NameNode arranges a new replica placement, the partial block is deleted, and the replica is written to another DataNode]

  • 11. MapReduce
    Definition
    - map: (+1) [1, 2, 3, 4, ..., 10] -> [2, 3, 4, 5, ..., 11]
    - reduce: (+) [2, 3, 4, 5, ..., 11] -> 65
    Programming model for processing data sets in Hadoop (a minimal code sketch follows the data-flow slides below)
    - projection, filter -> map task
    - aggregation, join -> reduce task
    - sort -> partitioning
    JobTracker & TaskTrackers
    - master / slave
    - job = many tasks
    - # of map tasks = # of file splits (default: # of blocks)
    - # of reduce tasks = user configuration

  • 12.-15. MapReduce: Map / Reduce Task
    [Diagram, repeated as animation steps: splits are read from the distributed file system, map tasks emit map output records (key/value pairs), shuffling & sorting groups them into partitions, reduce tasks emit reduce output records (key/value pairs). Legend: distributed file system, split, input data record, map task, map output record, shuffling & sorting, partition, reduce task, reduce output record]
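
    A minimal sketch of the programming model from slide 11, written against the standard org.apache.hadoop.mapreduce Java API as the classic word-count job. The class names, the combiner, and the two-reducer setting are illustrative choices, not taken from the slides.

      import java.io.IOException;
      import java.util.StringTokenizer;

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Job;
      import org.apache.hadoop.mapreduce.Mapper;
      import org.apache.hadoop.mapreduce.Reducer;
      import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
      import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

      public class WordCount {

        // Map task (projection/filter): one output record (word, 1) per input word.
        public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
          private static final IntWritable ONE = new IntWritable(1);
          private final Text word = new Text();

          @Override
          public void map(Object key, Text value, Context context)
              throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
              word.set(itr.nextToken());
              context.write(word, ONE);   // map output record (key/value pair)
            }
          }
        }

        // Reduce task (aggregation): sums the counts per key after shuffle & sort.
        public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
          private final IntWritable result = new IntWritable();

          @Override
          public void reduce(Text key, Iterable<IntWritable> values, Context context)
              throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
              sum += val.get();
            }
            result.set(sum);
            context.write(key, result);   // reduce output record (key/value pair)
          }
        }

        public static void main(String[] args) throws Exception {
          Job job = Job.getInstance(new Configuration(), "word count");
          job.setJarByClass(WordCount.class);
          job.setMapperClass(TokenizerMapper.class);
          job.setCombinerClass(IntSumReducer.class);
          job.setReducerClass(IntSumReducer.class);
          job.setOutputKeyClass(Text.class);
          job.setOutputValueClass(IntWritable.class);
          FileInputFormat.addInputPath(job, new Path(args[0]));    // input splits -> map tasks
          FileOutputFormat.setOutputPath(job, new Path(args[1]));
          job.setNumReduceTasks(2);   // # of reduce tasks = user configuration (slide 11)
          System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
      }

    With two reduce tasks configured, the map output is split by key into two partitions, matching the partitioning and shuffling & sorting steps in the data-flow slides above.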

  • 16.-18. MapReduce: Map / Reduce Task (continued)
    [The same data-flow diagram and legend as slides 12-15, advanced through further animation steps]

  • 19. Mapper - partitioning
    Double-indexed output buffer structure (default: 100 MB)
    - output buffer: key/value records
    - 1st index: partition + key/value offsets
    - 2nd index: key offsets
    Spill thread
    - data sorting: on the 2nd index (quick sort)
    - spill file generation: spill data file & index file
    Flush
    - merge sort (by key) per partition

  • 20. Reducer - fetching
    GetMapEventsThread
    - map completion event listener
    MapOutputCopier
    - fetches data from completed mappers (HTTP)
    - runs concurrently in several threads
    Merger
    - key sorting (heap sort)
    [Diagram: the JobTracker delivers map completion events to the TaskTracker running the reduce task; copiers pull map outputs from the map-task TaskTrackers via HTTP GET]

  • 21. Job Flow
    1. runJob (MapReduce program -> JobClient)
    2. copy job resources (to the shared file system)
    3. submit job (JobClient -> JobTracker)
    4. retrieve input splits
    5. add job (to the job queue)
    6. heartbeat (TaskTracker -> JobTracker)
    7. assign task
    8. retrieve job resources
    9. launch (child JVM)
    10. run (map/reduce task)
    11. read data / write result
    [Diagram legend: node, job queue, job, JVM, method call, map/reduce task, class, I/O]

  • 22. Monitoring
    Heartbeat
    - TaskTracker status checking
    - task request / assignment
    - other commands (restart, shutdown, kill task, ...)
    Cluster status
    Job / Task status
    - JobInProgress, TaskInProgress
    Reporter & Metrics
    Black list

  • 23. Monitoring (Summary)
    [Repeats the items of slide 22]

  • 24. Monitoring (Cluster Info)

  • 25. Monitoring (Job Info)

  • 26. Monitoring (Task Info)

  • 27. Task Scheduler
    Job queue
    - red-black tree (java.util.TreeMap)
    - sorted by priority & job id (request time)
    Load factor
    - remaining tasks / capacity
    Task assignment priority
    - new task > speculative execution task > dummy splits task
    - map task (local) > map task (non-local) > reduce task
    Padding
    - padding = MIN(total tasks * pad fraction, task capacity)
    - used for speculative execution

  • 28. Error Handling
    Retry
    - configurable (default: 4 times)
    Timeout
    - configurable
    Speculative execution, triggered when
    - the task has been running for >= 1 minute, and
    - its progress lags the average progress by > 20%
    (a configuration sketch follows slide 29 below)

  • 29. Distributed Processing System
    [Repeats slide 3, as a lead-in to the summary slides that follow]
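
    As a rough illustration of slide 28, the retry, timeout, and speculation knobs can be set per job through the Hadoop 1.x (JobTracker-era) JobConf API. The values shown simply restate the slide's defaults; the "mapred.task.timeout" property name is an assumption to verify against the Hadoop version in use.

      // Sketch of per-job error-handling settings (Hadoop 1.x mapred API).
      import org.apache.hadoop.mapred.JobConf;

      public class ErrorHandlingConf {
        public static JobConf configure(JobConf conf) {
          // Retry: a failed task attempt is re-run up to this many times (slide 28: default 4).
          conf.setMaxMapAttempts(4);
          conf.setMaxReduceAttempts(4);

          // Timeout: a task reporting no progress for this long is killed and retried
          // (assumed property name; 600000 ms = 10 minutes).
          conf.setLong("mapred.task.timeout", 600000L);

          // Speculative execution: allow duplicate attempts for straggler tasks.
          conf.setMapSpeculativeExecution(true);
          conf.setReduceSpeculativeExecution(true);
          return conf;
        }
      }

    These settings only enable or cap the mechanisms; whether a speculative attempt is actually launched is still decided by the JobTracker using the thresholds listed on slide 28.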

  • 30. Distributed Processing System
    How to process data in a distributed environment [slide 3 revisited, with Hadoop's answers]
    - how to read/write data -> HDFS client
    - how to control nodes, load balancing -> master / slave, replication / rack awareness, job scheduler

  • 31. Distributed Processing System
    Monitoring [slide 3 revisited, with Hadoop's answers]
    - node status -> heartbeat
    - task status -> job/task status, reporter / metrics

  • 32. Distributed Processing System
    Fault tolerance [slide 3 revisited, with Hadoop's answers]
    - error detection & handling -> black list, timeout & retry, speculative execution

  • 33. Limitations
    - map -> reduce network overhead
    - iterative processing
    - full (or theta) join
    - data with small size but many splits
    Low latency
    - polling & pulling
    - job initializing
    - optimized for throughput: job scheduling, data access

  • 34. Q&A