Cloud MapReduce: A MapReduce Implementation on top of a Cloud Operating System

9962161 江嘉福, 100062228 徐光成, 100062229 章博遠

2011, 11th IEEE/ACM International Symposium on

Huan Liu, Dan Orban
Accenture Technology Labs

OUTLINE

I. Introduction
II. Cloud MapReduce Architecture & Implementation
III. Pros & Cons of Cloud MapReduce
IV. Experimental Evaluation
V. Conclusions & Future Works
VI. References

INTRODUCTION

1. What is a Cloud OS?

2. Challenges posed by a cloud OS

3. Cloud MapReduce?

4. Advantages of Cloud MapReduce

What is a Cloud OS?

1. Manages the low-level cloud resources
2. Presents a high-level interface to the application programmers
3. Key difference: scalable

Figure 1

Challenges posed by a cloud OS

1. Scalability comes at a price.
2. Trade-offs among data consistency, system availability, and tolerance to network partitions.

Figure 2

Cloud MapReduce?

1. Implements the MapReduce programming model
2. Horizontal scaling
3. Eventual consistency
4. Overcomes the limitations of a cloud OS
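As a reminder of the MapReduce programming model named above, here is a minimal in-memory word count sketch in Python. It is illustrative only and is not Cloud MapReduce's code; the names `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical.

```python
from collections import defaultdict

def map_fn(_key, text):
    """Map: emit (word, 1) for every word in one input record."""
    for word in text.split():
        yield word.lower(), 1

def reduce_fn(word, counts):
    """Reduce: sum all counts emitted for one word."""
    return word, sum(counts)

def run_mapreduce(records):
    """Run map, group values by key (shuffle), then reduce, in one process."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    return dict(reduce_fn(k, v) for k, v in groups.items())

result = run_mapreduce([("doc1", "cloud map reduce"), ("doc2", "Cloud OS")])
print(result)
```

In a real deployment the map and reduce calls run on different nodes and the grouping step moves data between them; this sketch only shows the shape of the two user-supplied functions.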

Advantages of Cloud MapReduce

1. Incremental scalability: Can scale incrementally in the number of computing nodes.
2. Symmetry and decentralization: Every node has the same set of responsibilities.
3. Heterogeneity: Nodes can have varying computation capacity.

Cloud MapReduce Architecture and Implementation

1. The architecture
2. Cloud challenges
3. General solution approaches

The Architecture

Cloud challenges & General solution approaches

1. Long latency
2. Horizontal scaling
3. No way to know when a queue is created for the first time

Cont'd

4. Duplicate messages
5. Potential node failure
6. Nondeterministic eventual consistency windows
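One common way to tolerate duplicate messages (challenge 4) is to tag every message with a unique ID and have the consumer skip IDs it has already seen. The sketch below illustrates that general idea under stated assumptions: the `(map_id, seq)` tag format and the `consume` function are hypothetical and are not taken from the slides.

```python
def consume(messages):
    """Sum values per key, ignoring messages whose tag was already seen.

    Each message is (tag, key, value). The tag (map_id, seq) is a
    hypothetical unique identifier assigned by the producer.
    """
    seen = set()
    totals = {}
    for tag, key, value in messages:
        if tag in seen:
            continue  # duplicate delivery: skip it
        seen.add(tag)
        totals[key] = totals.get(key, 0) + value
    return totals

messages = [
    ((0, 0), "cloud", 1),
    ((0, 1), "os", 1),
    ((0, 0), "cloud", 1),  # same tag delivered twice
    ((1, 0), "cloud", 1),
]
totals = consume(messages)
print(totals)
```

Without the `seen` check, the redelivered message would be counted twice and the total for "cloud" would be wrong.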

Pros

● 3,000 lines of Java code (LOC) vs. 285,375 LOC for Hadoop
● Large & reliable FS
● High bandwidth (fast read/write)
● Single point of contact (high throughput)

Cons

● Uses only the network (no local storage)
● This leads to a bottleneck

Evaluation

Almost twice as fast!

Evaluation

● Hadoop - 385s total, network/CPU underutilized
● CMR - 210s, more efficient network/CPU usage

Evaluation

Wiki Word Count

● Combiner:
  Hadoop - 747s
  CMR - 436s
● No Combiner:
  Hadoop - 1733s
  CMR - 1247s
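The timings quoted so far can be turned into speedup ratios with a few lines of arithmetic. The sketch below uses only the numbers from the slides; the dictionary name `timings` and the run labels are hypothetical.

```python
# (Hadoop seconds, CMR seconds) as quoted on the slides
timings = {
    "unnamed run (385s vs 210s)": (385, 210),
    "wiki word count, combiner": (747, 436),
    "wiki word count, no combiner": (1733, 1247),
}

# Speedup = Hadoop time divided by CMR time
speedups = {name: hadoop / cmr for name, (hadoop, cmr) in timings.items()}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x")
```

The first ratio, about 1.83, is what the "almost twice as fast" claim refers to; the gain is smaller when no combiner is used.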

Evaluation

Amazon

● Word Count -> 400GB using 100 nodes
● Approx. 1hr
● 983,152 Requests -> $0.98
● Using SimpleDB?
● 3.7hrs -> $0.52
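The request cost above can be sanity-checked with simple arithmetic. The sketch assumes a flat rate of $0.01 per 10,000 requests; that rate is an assumption chosen because it reproduces the slide's $0.98, and it is not stated in the slides.

```python
REQUESTS = 983_152      # request count from the slide
PRICE_PER_10K = 0.01    # assumed USD rate per 10,000 requests (not from the slides)

cost = REQUESTS / 10_000 * PRICE_PER_10K
print(f"${cost:.2f}")
```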

Evaluation

Comparison

● Distributed Grep Word Count -> 13GB of data
● CMR = 962 seconds
● Hadoop = 1047 seconds
● Results are almost the same, why?
● More CPU-intensive tasks

Evaluation

12GB - 923,670 HTML files

● Hadoop -> 6hrs+
● CMR -> 297 seconds
● Hadoop - high overhead from task creation

Conclusion

● Cloud cannot be implemented on any system
● Poor performance
● CMR techniques overcome cloud limitations
● No performance degradation
● Good to use for other systems

REFERENCES

Figure 1: http://techcrunch.com/

Figure 2: http://blog.csdn.net/zouqingfang/article/details/7269920

http://zh.wikipedia.org/

https://code.google.com/p/cloudmapreduce/

http://searchcloudcomputing.techtarget.com/definition/MapReduce

http://myblog-maurice.blogspot.tw/2012/08/nosqlcap.html
