MapReduce

黃威凱, first-year master's student in Computer Science and Information Engineering


Outline

◦Purpose

◦Example

◦Method

◦Advanced

PURPOSE

Purpose

◦Data mining

◦Data processing

EXAMPLE

Example

Find the maximum temperature of each year.

National Climatic Data Center (NCDC)

◦The data is stored in a line-oriented ASCII format, in which each line is a record.

◦There is a directory for each year from 1901 to 2001, each containing a gzipped file for each weather station with its readings for that year.

Example (Data format)

Example (gzipped files for 1990)

% ls raw/1990 | head
010010-99999-1990.gz
010014-99999-1990.gz
010015-99999-1990.gz
010016-99999-1990.gz
010017-99999-1990.gz
010030-99999-1990.gz
010040-99999-1990.gz
010080-99999-1990.gz
010100-99999-1990.gz
010150-99999-1990.gz

METHOD

Method

◦Analyzing the data with Unix tools

◦Analyzing the data with Hadoop

Method (Unix tools)

Method (Unix tools)

Here is the beginning of a run:

% ./max_temperature.sh
1901 317
1902 244
1903 289
1904 256
1905 283
...

The complete run for the century took 42 minutes on a single EC2 High-CPU Extra Large instance.
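The script itself is not included in this text. As a rough analogue, the Java sketch below does the same kind of work in a single process: one sequential pass over every reading, keeping a running maximum per year. It assumes the records have already been parsed into (year, temperature) pairs; the class name `MaxTempSequential` and the sample readings are made up for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-process baseline (hypothetical): one sequential pass over all readings,
// keeping a running maximum per year. Nothing here runs in parallel.
final class MaxTempSequential {

    // Each reading is an already-parsed (year, temperature) pair.
    static Map<Integer, Integer> maxPerYear(List<Map.Entry<Integer, Integer>> readings) {
        Map<Integer, Integer> max = new TreeMap<>();   // sorted by year, like the run output
        for (Map.Entry<Integer, Integer> r : readings) {
            // Keep the larger of the stored maximum and the new reading.
            max.merge(r.getKey(), r.getValue(), Integer::max);
        }
        return max;
    }

    public static void main(String[] args) {
        List<Map.Entry<Integer, Integer>> readings = List.of(
                Map.entry(1901, 250), Map.entry(1901, 317),   // sample values for illustration
                Map.entry(1902, 244), Map.entry(1902, -11));
        maxPerYear(readings).forEach((year, t) -> System.out.println(year + " " + t));
    }
}
```

Every reading flows through one loop on one machine, which is the limitation the Hadoop method addresses by splitting the work across map and reduce tasks.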

Method (Hadoop)

Use MapReduce:

◦Map

◦Shuffle

◦Reduce
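To make the three phases concrete, here is a minimal in-memory sketch in plain Java. It is not Hadoop's API: map, shuffle and reduce are ordinary static methods of a hypothetical class `MaxTempPipeline`, and each input line is assumed to be a simplified "year temperature" pair rather than a raw NCDC record. The sample values reuse the reduce example from these slides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory sketch of map -> shuffle -> reduce (plain Java, not Hadoop classes).
final class MaxTempPipeline {

    // Map: one simplified "year temperature" line -> one (year, temperature) pair.
    static Map.Entry<Integer, Integer> map(String line) {
        String[] fields = line.trim().split("\\s+");
        return Map.entry(Integer.parseInt(fields[0]), Integer.parseInt(fields[1]));
    }

    // Shuffle: group all values under their key; TreeMap keeps keys sorted.
    static Map<Integer, List<Integer>> shuffle(List<Map.Entry<Integer, Integer>> pairs) {
        Map<Integer, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<Integer, Integer> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    // Reduce: pick the maximum temperature for one year.
    static int reduce(int year, List<Integer> temperatures) {
        int max = Integer.MIN_VALUE;
        for (int t : temperatures) {
            max = Math.max(max, t);
        }
        return max;
    }

    // Runs the three phases in order and collects (year, max) results.
    static Map<Integer, Integer> run(List<String> lines) {
        List<Map.Entry<Integer, Integer>> mapped = new ArrayList<>();
        for (String line : lines) {
            mapped.add(map(line));
        }
        Map<Integer, Integer> result = new TreeMap<>();
        shuffle(mapped).forEach((year, temps) -> result.put(year, reduce(year, temps)));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("1950 0", "1950 22", "1950 -11", "1949 111", "1949 78")));
    }
}
```

In Hadoop the framework performs the shuffle itself; the programmer supplies only the map and reduce functions.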

Method (Hadoop): Map function

◦Pull out the year and the air temperature from each record

◦Emit them as (year, temperature) key-value pairs
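A map function for this dataset has to cut the year and temperature out of each fixed-width line and skip unusable readings. The format slide is not reproduced in this text, so the column offsets below are an assumption taken from the widely used Hadoop tutorial version of this example: year at characters 15 to 18 (0-based), a signed temperature at 87 to 91, a one-character quality code at 92, and 9999 marking a missing value. Check them against the NCDC format documentation before relying on them. The class `NcdcRecordMapper` is hypothetical and plain Java, not Hadoop's `Mapper`.

```java
import java.util.Map;
import java.util.Optional;

// Map-side sketch for one raw NCDC line (plain Java, not Hadoop's Mapper class).
// ASSUMPTION: fixed-width offsets as in the common Hadoop tutorial version of this
// example: year at [15, 19), signed temperature at [87, 92), quality code at 92,
// and 9999 meaning "missing".
final class NcdcRecordMapper {
    private static final int MISSING = 9999;
    private static final String GOOD_QUALITY_CODES = "01459"; // assumed acceptable codes

    // Returns the (year, temperature) pair to emit, or empty if the line is unusable.
    static Optional<Map.Entry<Integer, Integer>> map(String line) {
        if (line.length() < 93) {
            return Optional.empty();                 // too short to hold all fields
        }
        int year = Integer.parseInt(line.substring(15, 19));
        // Integer.parseInt accepts a leading '+' or '-', so the signed field parses directly.
        int temperature = Integer.parseInt(line.substring(87, 92));
        char quality = line.charAt(92);
        if (temperature == MISSING || GOOD_QUALITY_CODES.indexOf(quality) < 0) {
            return Optional.empty();                 // skip missing or suspect readings
        }
        return Optional.of(Map.entry(year, temperature));
    }
}
```

Filtering in the map step means bad records never reach the shuffle, so the reducer only sees usable readings.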

Method (Hadoop): The shuffle

◦The framework sorts the map output and groups the values by key before they reach the reducers; each reduce task is fed by many map tasks.
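The sketch below models the shuffle in plain Java: each map task's output is split into one partition per reducer, and every reducer collects its partition from all map tasks, grouped and sorted by key. The partition formula `(key.hashCode() & Integer.MAX_VALUE) % numReducers` is the one used by Hadoop's default `HashPartitioner`; everything else (the class `ShuffleSketch`, its methods, and the use of two reducers in the test) is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Shuffle sketch (plain Java): every map task splits its output by reducer,
// and each reducer gathers its share from all map tasks.
final class ShuffleSketch {

    // Same formula as Hadoop's default HashPartitioner.
    static int partition(Integer key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // mapOutputs.get(m) holds the (year, temperature) pairs emitted by map task m.
    // Returns, for each reducer, its input grouped by key with keys sorted.
    // Values keep arrival order here; real Hadoop makes no ordering promise for values.
    static List<Map<Integer, List<Integer>>> shuffle(
            List<List<Map.Entry<Integer, Integer>>> mapOutputs, int numReducers) {
        List<Map<Integer, List<Integer>>> reducerInputs = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            reducerInputs.add(new TreeMap<>());
        }
        for (List<Map.Entry<Integer, Integer>> oneMapTask : mapOutputs) {
            for (Map.Entry<Integer, Integer> pair : oneMapTask) {
                int r = partition(pair.getKey(), numReducers);
                reducerInputs.get(r)
                        .computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                        .add(pair.getValue());
            }
        }
        return reducerInputs;
    }
}
```

Because `Integer.hashCode()` returns the value itself, with two reducers the key 1949 goes to reducer 1 and 1950 to reducer 0, and each reducer receives values from both map tasks.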

Method (Hadoop): Reduce function

◦Iterate through the list and pick the maximum reading

◦Input: (1949, [111, 78]), (1950, [0, 22, -11])

◦Output: (1949, 111), (1950, 22)
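The reduce step on this slide is a simple scan over one key's values. A minimal plain-Java sketch that reproduces the input and output above (the class name `MaxTempReducer` is hypothetical, and this is not Hadoop's `Reducer` class):

```java
import java.util.List;

// Reduce-side sketch: for one year, scan its temperatures and keep the largest.
final class MaxTempReducer {

    static int reduce(int year, List<Integer> temperatures) {
        if (temperatures.isEmpty()) {
            // Defensive only: a reduce call always receives at least one value.
            throw new IllegalArgumentException("no readings for " + year);
        }
        int max = Integer.MIN_VALUE;
        for (int t : temperatures) {
            max = Math.max(max, t);
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println("(1949, " + reduce(1949, List.of(111, 78)) + ")");
        System.out.println("(1950, " + reduce(1950, List.of(0, 22, -11)) + ")");
    }
}
```

Running `main` prints `(1949, 111)` and `(1950, 22)`, matching the output listed above.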

Method (Hadoop): Data flow

Method (Hadoop): Java MapReduce, Mapper example

Method (Hadoop): Java MapReduce, Reducer example

Method (Hadoop): Java MapReduce, Job example

◦A job supports multiple input paths: FileInputFormat.addInputPath can be called once for each path.

ADVANCED

Advanced: Case 1

Advanced: Case 2

Advanced: Case 3

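Advanced: Combiner functions on map output

◦Example

Map 1 output: (1950, 0), (1950, 20), (1950, 10)

Map 2 output: (1950, 25), (1950, 15)

Grouped by key within each map task: Map 1: (1950, [0, 20, 10]), Map 2: (1950, [25, 15])

Without a combiner, the reduce input is (1950, [0, 20, 10, 25, 15]).

With a combiner that takes the maximum of each map task's output, the reduce input is (1950, [20, 25]).

Either way the reducer outputs (1950, 25), but with the combiner less data is sent from the map tasks to the reducer.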
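The combiner example can be checked with a small plain-Java sketch. It applies the same max function once per map task (the combiner) and once more on the reducer, and compares the reducer's input with and without that extra step. The class `CombinerSketch` and its method names are hypothetical; the values are the ones from the example above.

```java
import java.util.ArrayList;
import java.util.List;

// Combiner sketch: the "max" function runs per map task, then again on the reducer.
final class CombinerSketch {

    static int max(List<Integer> values) {
        int m = Integer.MIN_VALUE;
        for (int v : values) {
            m = Math.max(m, v);
        }
        return m;
    }

    // Without a combiner, the reducer receives every value from every map task.
    static List<Integer> reduceInputWithoutCombiner(List<List<Integer>> perMapValues) {
        List<Integer> all = new ArrayList<>();
        for (List<Integer> values : perMapValues) {
            all.addAll(values);
        }
        return all;
    }

    // With a combiner, each map task sends only its local maximum.
    static List<Integer> reduceInputWithCombiner(List<List<Integer>> perMapValues) {
        List<Integer> partial = new ArrayList<>();
        for (List<Integer> values : perMapValues) {
            partial.add(max(values));
        }
        return partial;
    }

    public static void main(String[] args) {
        List<List<Integer>> maps = List.of(List.of(0, 20, 10), List.of(25, 15));
        List<Integer> plain = reduceInputWithoutCombiner(maps);
        List<Integer> combined = reduceInputWithCombiner(maps);
        System.out.println(plain + " -> " + max(plain));
        System.out.println(combined + " -> " + max(combined));
    }
}
```

This works because max is commutative and associative, so taking partial maxima first cannot change the final answer. A function such as the mean does not have this property, so it cannot be used directly as a combiner.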