Jaegwang Lim Dongguk University

Getting More for Less in Optimized MapReduce Workflows


Page 1: Getting More for Less in Optimized MapReduce Workflows

Jaegwang Lim Dongguk University

Page 2: Getting More for Less in Optimized MapReduce Workflows

Introduction

Page 3: Getting More for Less in Optimized MapReduce Workflows

Introduction
• MapReduce performance depends on several factors
• User must specify the number of reduce tasks
• User must specify the input size

Page 4: Getting More for Less in Optimized MapReduce Workflows

MapReduce Processing Pipeline

Page 5: Getting More for Less in Optimized MapReduce Workflows

Platform Performance Model
• Map: Read, Collect, Spill, Merge
• Reduce: Shuffle, Write
• Record Dataset (Input size & Duration)

Page 6: Getting More for Less in Optimized MapReduce Workflows

Platform Performance Model

Read
No.   Data (MB)   Read (msec)
1     16          2010
2     16          2011
3     32          4056
4     18          2200
…     …           …

Collect
No.   Data (MB)   Collect (msec)
1     8           1210
2     8           1350
3     16          2455
4     16          2411
…     …           …

Spill
No.   Data (MB)   Spill (msec)
1     16          3213
2     16          3222
3     24          4002
4     16          3200
…     …           …

…
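The recorded (data size, duration) pairs can be turned into a per-phase predictor. The slides do not say which model is fitted, so the sketch below assumes a simple linear relationship between data size and duration and fits it by ordinary least squares to the four Read samples listed above. The names `fit_line` and `predict_read_msec` are hypothetical helpers introduced only for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Read-phase samples taken from the table in the slides.
read_mb = [16, 16, 32, 18]
read_msec = [2010, 2011, 4056, 2200]

# Assumption: duration grows linearly with data size.
slope, intercept = fit_line(read_mb, read_msec)


def predict_read_msec(data_mb):
    """Predicted Read duration (msec) for a given data size (MB)."""
    return slope * data_mb + intercept
```

The same fit could be repeated for each of the other phases (Collect, Spill, Merge, Shuffle, Write) using its own recorded samples.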

Page 7: Getting More for Less in Optimized MapReduce Workflows

Platform Performance Model
[Figure panels: Read, Collect, Spill, Merge, Shuffle, Write]

Page 8: Getting More for Less in Optimized MapReduce Workflows

Platform Performance Model
• Evaluation Error

Page 9: Getting More for Less in Optimized MapReduce Workflows

Platform Performance Model
• 2.5 GB, less than 10%
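The slides do not state how the evaluation error is computed. A common choice, assumed here, is the relative error between a predicted and a measured duration. The function name `relative_error` and the sample values in the usage note are hypothetical.

```python
def relative_error(predicted_msec, measured_msec):
    """Relative error |predicted - measured| / measured (assumed metric)."""
    if measured_msec <= 0:
        raise ValueError("measured duration must be positive")
    return abs(predicted_msec - measured_msec) / measured_msec
```

Under this assumption, a prediction of 1100 msec against a measurement of 1000 msec gives `relative_error(1100, 1000)`, a 10% error.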

Page 10: Getting More for Less in Optimized MapReduce Workflows

Work-Flow Performance Model
• Dataset
• The overall input data size
• The Map/Reduce selectivity
• The processing time per record of the Map/Reduce function

[Diagram: chained Map/Reduce stages, each labeled with its InputSize and OutputSize]

Selectivity = Output size / Input size

Page 11: Getting More for Less in Optimized MapReduce Workflows

Work-Flow Performance Model
• Record Dataset
• Pig Program TPC-H Query

Reduce Selectivity = 0.9

Suggested number of reduce tasks: 128 × 0.9 ≈ 115
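The arithmetic on this slide can be sketched as follows. The slide gives only the two numbers, 128 and 0.9; what the 128 stands for and how the result is rounded are not stated, so the parameter name `base_count` and the use of rounding to the nearest integer are assumptions.

```python
def suggested_reduce_tasks(base_count, reduce_selectivity):
    """Scale a base count by the reduce selectivity.

    Assumption: the result is rounded to the nearest integer.
    """
    if base_count <= 0 or reduce_selectivity <= 0:
        raise ValueError("both arguments must be positive")
    # Never suggest fewer than one reduce task.
    return max(1, round(base_count * reduce_selectivity))


# Values from the slide: 128 * 0.9 = 115.2, reported as 115.
tasks = suggested_reduce_tasks(128, 0.9)
```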

Page 12: Getting More for Less in Optimized MapReduce Workflows

Conclusion
• Automated Performance Management System
• Helps users optimize their Map/Reduce applications