
HAMA: An Efficient Matrix Computation with the MapReduce Framework

Sangwon Seo, Edward J. Yoon, Jaehong Kim, Seongwook Jin, Jin-Soo Kim, Seungryoul Maeng (IEEE CloudCom 2010)

Presented by Kyung-Bin Lim, Dec 3, 2014

2 / 35

Outline

Introduction
Methodology
Experiments
Conclusion

3 / 35

Apache HAMA

Easy-to-use tool for data-intensive scientific computation

Massive matrix/graph computations are often used as primary functionalities

The fundamental design has changed from MapReduce with matrix computation to BSP with graph processing

A mimic of Pregel running on HDFS
– Uses ZooKeeper as a synchronization barrier

4 / 35

Our Focus

This paper covers the early version 0.1 of HAMA
– Latest version: 0.7.0, released Mar. 2014

We focus only on matrix computation with MapReduce, and show simple case studies

5 / 35

The HAMA Architecture

We propose a distributed scientific framework called HAMA (based on HPMR)
– Provides transparent matrix/graph primitives

6 / 35

The HAMA Architecture

HAMA API: Easy-to-use Interface
HAMA Core: Provides matrix/graph primitives
HAMA Shell: Interactive User Console

7 / 35

Contributions of HAMA

Compatibility
– Takes advantage of all Hadoop features

Scalability
– Scalable due to compatibility

Flexibility
– Multiple compute engines configurable

Applicability
– HAMA's primitives can be applied to various applications

8 / 35

Outline

Introduction
Methodology
Experiments
Conclusion

9 / 35

Case Study

Using a case-study approach, we introduce two basic primitives with the MapReduce model running on HAMA
– Matrix multiplication and finding a linear solution

And compare them with MPI versions of these primitives

10 / 35

Case Study

Representing matrices
– By default, HAMA uses HBase (a NoSQL database)

HBase is modeled after Google's Bigtable
A column-oriented, semi-structured distributed database with high scalability

11 / 35

Case Study – Multiplication: Iterative Way

Iterative approach (Algorithm)

12 / 35

Case Study – Multiplication: Iterative Way

Simple, naïve strategy

Works well with sparse matrices
– Sparse matrix: most entries are 0

13 / 35
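The algorithm figure for the iterative approach is not reproduced in this transcript, so here is a minimal sketch of the idea: for C = A x B, each map task scans a row of A and emits partial products keyed by the output cell, and the reduce step sums them. The dict-of-dicts sparse representation and the function name are illustrative assumptions, not HAMA's actual API.

```python
from collections import defaultdict

def matmul_iterative(A, B):
    """Row-wise (iterative) multiplication C = A * B over sparse matrices
    stored as dict-of-dicts (only nonzero entries kept), mimicking HAMA's
    row-oriented storage in HBase. Purely illustrative of the MapReduce flow."""
    partials = defaultdict(float)
    # "Map" phase: for each nonzero A[i][k], pair it with row k of B
    # and emit a partial product keyed by the output cell (i, j).
    for i, row in A.items():
        for k, a_ik in row.items():
            for j, b_kj in B.get(k, {}).items():
                partials[(i, j)] += a_ik * b_kj  # "Reduce": sum per cell
    # Regroup the summed cells into rows of C
    C = defaultdict(dict)
    for (i, j), v in partials.items():
        C[i][j] = v
    return dict(C)
```

Because only nonzero entries are ever touched, the work is proportional to the number of nonzeros, which is why this naïve strategy suits sparse matrices.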

Multiplication: Iterative Way

(Slides 14-19: step-by-step illustration of the iterative multiplication; figures only)

Case Study – Multiplication: Block Way

Multiplication can be done using sub-matrices

Works well with dense matrices

20 / 35

Case Study – Multiplication: Block Way

Block Approach
– Minimizes data movement (network cost)

21 / 35

Case Study – Multiplication: Block Way

Block Approach (Algorithm)

22 / 35
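The block algorithm itself appears only as a figure in the slides; a minimal sketch of the underlying computation, assuming square matrices whose size is divisible by the block size, could look like this:

```python
def matmul_block(A, B, n, bs):
    """Block multiplication: C_IJ = sum over K of A_IK * B_KJ.
    A and B are n x n dense lists of lists; bs is the block size
    (assumed to divide n). Each (I, J, K) triple corresponds to one
    task that multiplies a single pair of bs x bs sub-matrices, so a
    distributed job ships whole blocks rather than individual cells."""
    C = [[0.0] * n for _ in range(n)]
    nb = n // bs  # number of blocks per dimension
    for I in range(nb):
        for J in range(nb):
            for K in range(nb):
                # multiply block A[I][K] by block B[K][J],
                # accumulate the result into block C[I][J]
                for i in range(I * bs, (I + 1) * bs):
                    for j in range(J * bs, (J + 1) * bs):
                        s = 0.0
                        for k in range(K * bs, (K + 1) * bs):
                            s += A[i][k] * B[k][j]
                        C[i][j] += s
    return C
```

Moving bs x bs blocks instead of single cells is what reduces the network cost for dense inputs: each transferred block contributes bs^2 multiply-adds rather than one.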

Case Study – Finding Linear Solution

Ax = b
– x = ?

A: a known square, symmetric, positive-definite matrix
b: a known vector

Use the Conjugate Gradient approach

25 / 35

Case Study – Finding Linear Solution

Conjugate Gradient Method
– Find a direction (conjugate direction)
– Find a step size (line search)

26 / 35

Case Study – Finding Linear Solution

Conjugate Gradient Method (Algorithm)

27 / 35
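The CG algorithm slide is a figure; the textbook iteration it depicts can be sketched as follows, assuming A is symmetric positive-definite (pure Python for clarity; the matrix-vector product is the part a MapReduce job would distribute):

```python
def conjugate_gradient(A, b, max_iters=50, tol=1e-10):
    """Textbook CG for Ax = b with A symmetric positive-definite.
    Each iteration does one matrix-vector product, a line search for
    the step size alpha, and an update of the conjugate direction."""
    n = len(b)
    x = [0.0] * n
    r = b[:]              # residual r = b - A*x (x starts at 0)
    p = r[:]              # initial search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iters):
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rs_old / sum(p[i] * Ap[i] for i in range(n))  # line search
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new < tol:
            break
        beta = rs_new / rs_old
        p = [r[i] + beta * p[i] for i in range(n)]  # next conjugate direction
        rs_old = rs_new
    return x
```

In exact arithmetic CG converges in at most n iterations for an n x n system, which is why it pairs well with large sparse SPD matrices.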

Outline

Introduction
Methodology
Experiments
Conclusion

28 / 35

Evaluations

TUSCI (TU Berlin SCI) Cluster
– 16 nodes, two Intel P4 Xeon processors, 1GB memory
– Connected with an SCI (Scalable Coherent Interface) network interface in a 2D torus topology
– Running under OpenCCS (an environment similar to HOD)

Test sets

29 / 35

HPMR’s Enhancements

Prefetching
– Increases data locality

Pre-shuffling
– Reduces the amount of intermediate output to shuffle

30 / 35
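HPMR's pre-shuffling reorganizes work ahead of the map phase; as a rough illustration of the payoff it targets (fewer intermediate records crossing the network), here is a combiner-style local aggregation over the partial products of the iterative multiplication. The function names and record shapes are assumptions for illustration, not HPMR's actual mechanism.

```python
from collections import defaultdict

def map_partials(row_id, row, B):
    # hypothetical map task from the iterative multiplication:
    # emit ((i, j), partial product) records for one row of A
    for k, a_ik in row.items():
        for j, b_kj in B.get(k, {}).items():
            yield (row_id, j), a_ik * b_kj

def local_aggregate(records):
    """Sum partial products per output cell on the map side, so the
    shuffle moves one record per cell instead of one per partial."""
    acc = defaultdict(float)
    for key, value in records:
        acc[key] += value
    return dict(acc)
```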

Evaluations

The comparison of average execution time and scaleup with Matrix Multiplication

31 / 35

Evaluations

The comparison of average execution time and scaleup with CG

32 / 35

Evaluations

The comparison of average execution time with CG, when a single node is overloaded

33 / 35

Outline

Introduction
Methodology
Experiments
Conclusion

34 / 35

Conclusion

HAMA provides an easy-to-use tool for data-intensive computations
– Matrix computation with MapReduce
– Graph computation with BSP
