Gfarm presentation and thesis topic introduction

DESCRIPTION

These slides outline general information about the Gfarm file system and the basis for the presenter's thesis.

Page 1: Gfarm presentation and thesis topic introduction

1

GFARM V2: A grid file system that supports high-performance distributed and parallel data computing

Osamu Tatebe, Satoshi Sekiguchi, AIST, Tsukuba, Japan
Youhei Morita, KEK, Tsukuba, Japan
Noriyuki Soda, SRA, Nagoya, Japan
Satoshi Matsuoka, Titech / NII, Tokyo, Japan

Presentation: Chawanat Nakasan / M1
Laboratory for Software Design and Analysis
Nara Institute of Science and Technology

Seminar II, First Presentation
2013.12.04

2

Agenda

Paper
• What is Gfarm
• Things similar to Gfarm
• Replication in Gfarm

Application
• Networking issues in Gfarm
• Research introduction

O. Tatebe, S. Sekiguchi, Y. Morita, N. Soda, and S. Matsuoka, “Gfarm v2: A Grid file system that supports high-performance distributed and parallel data computing,” in Computing in High Energy Physics and Nuclear Physics, 2004, pp. 1172–1175.

3

Introduction

4

What is Gfarm?

• Distributed File System
• with Parallel Processing

[Figure: a Metaserver (META) and Storage Nodes, each node pairing Storage with a Processor (CPU)]

5

What’s different about Gfarm?

• Other clustering solutions send files to where the jobs are.

Doesn’t work well with BIG DATA.

[Figure: META and two CPU nodes; Files are sent to the nodes where the Jobs run]

6

What’s different about Gfarm?

• Instead, Gfarm sends jobs to nodes with files.

[Figure: META and four CPU nodes holding Files, with Jobs dispatched to the nodes]
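A minimal sketch of "send jobs to nodes with files", not Gfarm's actual scheduler. It assumes a hypothetical map of which nodes hold a replica of each file (`replica_map`) and a hypothetical per-node job count (`node_load`), and places each job on the least-loaded node that already stores its input file.

```python
def place_job(input_file, replica_map, node_load):
    """Pick a node that already stores input_file (hypothetical helper).

    replica_map: dict mapping file name -> set of node names holding a replica
    node_load:   dict mapping node name -> number of jobs currently running
    """
    candidates = replica_map.get(input_file, set())
    if not candidates:
        raise LookupError(f"no node holds a replica of {input_file!r}")
    # Among nodes that hold the file, prefer the least-loaded one;
    # break ties by node name so the result is deterministic.
    chosen = min(candidates, key=lambda n: (node_load.get(n, 0), n))
    node_load[chosen] = node_load.get(chosen, 0) + 1
    return chosen


# Illustrative data only.
replica_map = {"a.dat": {"node1", "node3"}, "b.dat": {"node2"}}
node_load = {"node1": 2, "node2": 0, "node3": 1}
print(place_job("a.dat", replica_map, node_load))  # node3
print(place_job("b.dat", replica_map, node_load))  # node2
```

No file is moved in this sketch: the only thing that travels is the decision about where the job runs.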

7

Replica Management

8

One Big Issue in Distributed Storage: Replication and replica management

• Same files are copied and spread across the system.
• Reasons:
  • Redundancy
  • Locality
  • In Gfarm: job location
• Problem: Consistency.

[Figure: META, four CPU nodes, and three copies of a File]

9

Gfarm directs file opens to the same place.

• This method is very effective for consistency control.
• However, it requires more coordination between the nodes, i.e. more network load and overhead.

[Figure: processes P1 and P2, replicas F1 and F2]

(1) P1 opens file replica F1
(2) P2 tries to open replica F2 (same file, different place)
(3) P2 is redirected to use F1 too, to limit the number of copies open

10

Summary: What is Gfarm?

• A distributed file system …
• with a parallel processing scheduler …
• that sends jobs to files, not files to jobs, …
• and only one replica can be written at a time!

11

Application: Improving Gfarm
Why do we have to improve it?

12

It sounds good, until implementations get too large.

• When it becomes global-scale, we have to think differently.
• This is what appears to us:

[Figure: META and three CPU nodes]

13

It sounds good, until implementations get too large.

• But this is reality:

[Figure: META and three CPU nodes]

14

So how do we simplify this problem? We put an overlay network on top.

[Figure: Physical Network (Reality) and Overlay Network (Gfarm sees)]

15

1. It doesn’t care about locality.

• In this case, the two red arrows are the “same length” according to this topology, because each is just one hop.

Overlay Network

16

1. It doesn’t care about locality.

• However, they are not the same length when we look at the physical diagram.
• Gfarm’s overlay network doesn’t recognize the true distances.

Physical Network
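A small sketch of why the overlay hides distance. The topology is entirely hypothetical (node names, switch, and routers are made up for illustration); it only shows that two peers can both be one overlay hop away while being very different distances apart physically.

```python
from collections import deque


def hop_count(graph, src, dst):
    """Breadth-first search: number of links on the shortest path src -> dst."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError(f"{dst!r} is unreachable from {src!r}")


# Hypothetical physical network: A and B share a switch, C sits behind three routers.
physical = {
    "A": ["sw1"], "B": ["sw1"],
    "sw1": ["A", "B", "r1"],
    "r1": ["sw1", "r2"], "r2": ["r1", "r3"], "r3": ["r2", "C"],
    "C": ["r3"],
}
# Overlay network: every storage node is a direct neighbour of every other.
overlay = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}

for peer in ("B", "C"):
    print(peer, hop_count(overlay, "A", peer), hop_count(physical, "A", peer))
```

In this made-up topology both peers are 1 hop from A in the overlay, but 2 and 5 links away physically.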

17

2. A conventional network doesn’t use every route.

• Examine this topology: there’s more than one way for the circled nodes to reach each other.

Physical Network

Best Route: Always used

Other Route(s): Rarely used

18

3. If we use every route, which would we use?

[Figure: two CPU nodes connected by two routes]

• High latency, more bandwidth: good for data transfer
• Low latency, less bandwidth: good for control messages
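A rough worked example of why each route suits a different kind of traffic. It uses the simple model "time = latency + size / bandwidth" and ignores protocol overhead and congestion. The two routes and their numbers are assumptions chosen for illustration.

```python
def transfer_time(size_bytes, latency_s, bandwidth_bps):
    """Rough one-way transfer time: latency plus serialization time.

    Ignores protocol overhead, congestion and TCP ramp-up (simplifying assumption).
    """
    return latency_s + (size_bytes * 8) / bandwidth_bps


# Hypothetical routes between the same two nodes.
paths = {
    "fat":  {"latency_s": 0.100, "bandwidth_bps": 1_000_000_000},  # 100 ms, 1 Gbit/s
    "fast": {"latency_s": 0.010, "bandwidth_bps": 10_000_000},     # 10 ms, 10 Mbit/s
}


def best_path(size_bytes):
    """Name of the route with the smallest estimated transfer time."""
    return min(paths, key=lambda p: transfer_time(size_bytes, **paths[p]))


print(best_path(200))            # small control message -> fast
print(best_path(1_000_000_000))  # 1 GB data file -> fat
```

With these numbers the 200-byte message takes about 10 ms on the low-latency route versus about 100 ms on the other, while the 1 GB file takes about 8 s on the high-bandwidth route versus about 800 s.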

19

We are about to use the SDN.

• SDN = Software-defined network
• Concept: Use software to dynamically add or change network data flows.

Figure: McKeown, N., & Anderson, T. (2008). OpenFlow: enabling innovation in campus networks. Retrieved from http://dl.acm.org/citation.cfm?id=1355746

20

What can the SDN do?

• SDN can practically let us make a whole new protocol by programming a specific “controller” to do the job.
• With SDN, we can:
  • Change settings dynamically
  • Implement specialized Quality-of-Service (QoS)
  • Differentiate many kinds of connections
    • By application, port, users, network addresses, groups, etc.
  • Use multi-path routing efficiently
  • and much more!

21

So what do we want to do?

We want to accelerate wide-area distributed storage by using a software-defined network to optimize the overlay network.

22

23

INFORMATION for GENERAL PUBLIC

• This work was made by a member of Laboratory for Software Design and Analysis, Graduate School of Information Science, Nara Institute of Science and Technology.
• This presentation is the first of two required for Master’s degree graduation and is presented to faculty and students of the Institute.
• This file has been modified for public disclosure. Actual content during presentation was different.

24

BACKUP SLIDES

• Some of them may not make sense.

25

Gfarm job execution relies on file presence.

26

BACKUP: Gfarm’s not Hadoop

• Gfarm isn’t Hadoop: it provides job scheduling that’s not MapReduce. Of course, Gfarm works with Hadoop if you want it to.

Let’s just say Gfarm doesn’t do this:

[Figure: http://www.ibm.com/developerworks/cloud/library/cl-openstack-deployhadoop/figure4.gif]

27

How to work with file replicas

• To open a file in READ mode: Any replica is OK.

[Figure: four Replicas and three Processes, with accesses labeled Writing and Reading]

28

How to work with file replicas

• To open a file in WRITE mode (in this order):
  • If somebody is writing, use a replica already opened in WRITE mode
  • If nobody is writing, use a replica already opened in READ mode
  • If nobody is reading, use any replica

[Figure: four Replicas and three Processes, with accesses labeled Writing and Reading]
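The READ and WRITE rules can be sketched as a single selection function. This is an illustration of the order stated on the slides, not Gfarm's real code. The `replicas` list of dicts with `writers` and `readers` counters is a hypothetical in-memory view of the open-file state.

```python
def choose_replica(mode, replicas):
    """Pick a replica to open, following the order given on the slides.

    mode:     "r" or "w"
    replicas: list of dicts such as {"node": "n1", "writers": 0, "readers": 2}
              (hypothetical bookkeeping, not a Gfarm data structure)
    """
    if not replicas:
        raise LookupError("file has no replicas")
    if mode == "r":
        return replicas[0]  # READ mode: any replica is OK
    if mode != "w":
        raise ValueError(f"unknown mode {mode!r}")
    # 1. Somebody is writing: use the replica already opened in WRITE mode.
    for r in replicas:
        if r["writers"] > 0:
            return r
    # 2. Nobody is writing: use a replica already opened in READ mode.
    for r in replicas:
        if r["readers"] > 0:
            return r
    # 3. Nobody has the file open: any replica will do.
    return replicas[0]


replicas = [
    {"node": "n1", "writers": 0, "readers": 0},
    {"node": "n2", "writers": 0, "readers": 1},
    {"node": "n3", "writers": 1, "readers": 0},
]
print(choose_replica("w", replicas)["node"])  # n3
```

Funnelling every writer to the replica that is already open for writing is what keeps a single copy being written at a time.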

29

BACKUP: 2. Why don’t we use every possible route?

• So what we can do might be:
  • Transfer File A over the red path
  • Transfer File B over the orange path
• The overall bandwidth would be increased!

Physical Network

30

BACKUP: 2. Why don’t we use every possible route?

• Problems of this solution:
  • TCP segmentation & reordering
  • UDP will result in A LOT of unwanted and uncorrectable reordering
• Mitigation:
  • Separate data & control
  • Just divide the link at file level, so one file on link A, another file on link B, etc.
  • We can do this because it’s a file system and may make use of many files at the same time.
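A sketch of the file-level split, under stated assumptions. Each file travels whole over one path, so packets of a single file are never spread across routes. The greedy "largest file first, earliest finish" rule is a heuristic picked for illustration, not something the slides specify. Path names and sizes are made up.

```python
def assign_files(files, paths):
    """Assign whole files to paths (hypothetical greedy heuristic).

    files: dict file name -> size in bytes
    paths: dict path name -> bandwidth in bytes per second
    Returns dict path name -> list of file names.
    """
    busy_until = {p: 0.0 for p in paths}  # seconds of transfer already queued per path
    plan = {p: [] for p in paths}
    # Largest files first, each to the path that would finish it soonest.
    for name, size in sorted(files.items(), key=lambda kv: (-kv[1], kv[0])):
        best = min(paths, key=lambda p: (busy_until[p] + size / paths[p], p))
        busy_until[best] += size / paths[best]
        plan[best].append(name)
    return plan


files = {"A": 800, "B": 400, "C": 400}
paths = {"red": 100.0, "orange": 50.0}
print(assign_files(files, paths))
```

With these numbers, files A and C go over the red path and file B over the orange path, so both paths carry traffic at the same time.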

31

BACKUP: Why don’t bandwidth and latency correlate?

• Bandwidth is limited by the link capacity and the rate of transmission and receiving.
• Latency is caused by processing time.
  • Per-router processing time is increased in the WAN due to routers being overwhelmed by general public usage of the Internet.
  • There can be more than 10 hops to reach a node in another country.

32

Actually, why NOT SDN?

• Configuration delay: it takes some time for a new route to be installed
• Single Point of Failure (for centralized SDNs like OpenFlow)
• Cannot easily implement multiple SDN instances
  • We can however pre-slice the network and run SDN on each “subnet”, or
  • use solutions like FlowVisor (proxy OpenFlow)
• Controller bugs can break the existing thing (even the simplest controllers can have bugs!)

33

How can we use it with Gfarm?

• Data
  • Use multiple paths
  • Prefer the high-bandwidth path
• Control
  • QoS
  • Prefer the low-latency path
• These methods can be implemented in SDN

34

How can we use it with Gfarm?

• We can use multiple paths to add up bandwidth.
• SDN can differentiate between flows, so paths can be separated.

Multi-path routing?

Physical Network

35

How can we use it with Gfarm?

• Control messages prefer low latency.
• Data transfers prefer greater bandwidth.
• SDN knows the difference between these uses and can optimize.

Application-aware routing?

[Figure: two CPU nodes joined by two paths: one with more latency and more bandwidth, one with less latency and less bandwidth]
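A sketch of what an application-aware routing decision could look like inside a controller. It does not use any real SDN controller API. The port number in `CONTROL_PORTS`, the path names, and the metrics are placeholders assumed for illustration.

```python
# Hypothetical port carrying control traffic; a real deployment would
# take this from its own configuration.
CONTROL_PORTS = {7001}


def classify(dst_port):
    """Label a flow as 'control' or 'data' from its destination port."""
    return "control" if dst_port in CONTROL_PORTS else "data"


def pick_path(dst_port, paths):
    """Choose a path name for a flow.

    paths: dict name -> {"latency_ms": float, "bandwidth_mbps": float}
    Control flows take the lowest-latency path, data flows the highest-bandwidth one.
    """
    if classify(dst_port) == "control":
        return min(paths, key=lambda p: paths[p]["latency_ms"])
    return max(paths, key=lambda p: paths[p]["bandwidth_mbps"])


paths = {
    "upper": {"latency_ms": 100.0, "bandwidth_mbps": 1000.0},
    "lower": {"latency_ms": 10.0, "bandwidth_mbps": 10.0},
}
print(pick_path(7001, paths))  # lower
print(pick_path(7002, paths))  # upper
```

Matching on port is only one option. The same decision could key on addresses, users, or groups.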

36

How can we use it with Gfarm?

• Critical uses such as control messages can be given priority so they can “skip the (potentially very long) queue” of data packets.
• Some SDNs like OpenFlow are beginning to support QoS.

Quality of Service?

Important
• VoIP
• Streaming data
• Control Messages
• Synchronous msgs

Can Wait
• Scheduled jobs
• Data backup
• Background tasks
• Unimportant things
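A toy illustration of "skipping the queue": a strict-priority queue on one outgoing link. Real switches implement QoS in hardware queues; the class names and priority values here are assumptions made for the example.

```python
import heapq
from itertools import count

PRIORITY = {"control": 0, "data": 1}  # lower number is served first (assumed classes)


class PriorityLink:
    """Toy strict-priority queue for one outgoing link."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # keeps FIFO order within the same class

    def enqueue(self, kind, packet):
        heapq.heappush(self._heap, (PRIORITY[kind], next(self._seq), packet))

    def dequeue(self):
        return heapq.heappop(self._heap)[2]


link = PriorityLink()
for i in range(3):
    link.enqueue("data", f"data-{i}")
link.enqueue("control", "lock-request")

print(link.dequeue())  # lock-request
print(link.dequeue())  # data-0
```

The control message arrives last but leaves first, while data packets keep their original order among themselves.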