Gfarm presentation and thesis topic introduction

These slides give an overview of the Gfarm file system and the basis for the presenter's thesis.

1

GFARM V2: A grid file system that supports high-performance distributed and parallel data computing

Osamu Tatebe, Satoshi Sekiguchi, AIST, Tsukuba, Japan
Youhei Morita, KEK, Tsukuba, Japan
Noriyuki Soda, SRA, Nagoya, Japan
Satoshi Matsuoka, Titech / NII, Tokyo, Japan

Presentation: Chawanat Nakasan / M1
Laboratory for Software Design and Analysis
Nara Institute of Science and Technology

Seminar II, First Presentation
2013.12.04

2

Agenda

• What is Gfarm
• Things similar to Gfarm
• Replication in Gfarm
• Networking issues in Gfarm
• Research introduction

Paper

Application

O. Tatebe, S. Sekiguchi, Y. Morita, N. Soda, and S. Matsuoka, “Gfarm v2: A Grid file system that supports high-performance distributed and parallel data computing,” in Computing in High Energy Physics and Nuclear Physics, 2004, pp. 1172–1175.

3

Introduction

4

What is Gfarm?

• Distributed File System
• with Parallel Processing

[Figure: a metaserver (META) and four storage nodes; each storage node has a processor (CPU) and storage]

5

What’s different about Gfarm?

• Other clustering solutions send files to where the jobs are.

[Figure: two CPU nodes running jobs, META, and files being sent to the jobs]

Doesn’t work well with BIG DATA.

6

What’s different about Gfarm?

• Instead, Gfarm sends jobs to nodes with files.

[Figure: four CPU nodes and META; jobs run on the nodes that hold the files]
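
A minimal sketch of the idea on this slide, in Python: place each job on a node that already holds its input file. The function name `schedule_jobs`, the node and file names, and the least-loaded tie-break are assumptions made for illustration; this is not Gfarm's actual scheduler.

```python
from collections import defaultdict


def schedule_jobs(job_inputs, replica_locations):
    """Assign each job to a node that already stores its input file.

    job_inputs: dict mapping job name -> input file name
    replica_locations: dict mapping file name -> list of nodes holding a replica
    Returns a dict mapping job name -> chosen node.
    """
    load = defaultdict(int)  # number of jobs already placed on each node
    placement = {}
    for job, file_name in job_inputs.items():
        candidates = replica_locations.get(file_name, [])
        if not candidates:
            raise ValueError(f"no replica of {file_name!r} is available")
        # Among the nodes holding the file, pick the least-loaded one
        # (ties are broken by node name so the result is deterministic).
        node = min(candidates, key=lambda n: (load[n], n))
        load[node] += 1
        placement[job] = node
    return placement


if __name__ == "__main__":
    # Hypothetical jobs, files, and nodes.
    jobs = {"job1": "a.dat", "job2": "a.dat", "job3": "b.dat"}
    replicas = {"a.dat": ["node1", "node2"], "b.dat": ["node2"]}
    print(schedule_jobs(jobs, replicas))
```

No file is moved in this sketch: only the job placement changes, which is the contrast with the "send files to jobs" approach on the previous slide.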

7

Replica Management

8

One Big Issue in Distributed Storage: Replication and replica management
• Same files are copied and spread across the system.
• Reasons:
  • Redundancy
  • Locality
  • In Gfarm: job location

• Problem: Consistency.

[Figure: four CPU nodes and META, with three copies of a file on different nodes]

9

Gfarm directs file opens to the same place.
• This method is very effective for consistency control.
• However, it requires more coordination between the nodes, i.e. more network load and overhead.

[Figure: processes P1 and P2, and replicas F1 and F2 of the same file]

(1) P1 opens file replica F1

(2) P2 tries to open replica F2 (same file, different place)

(3) P2 is redirected to use F1 too, to limit the number of replicas open

10

Summary: What is Gfarm?

• A distributed file system …
• with a parallel processing scheduler …
• that sends jobs to files, not files to jobs, …
• and only one replica can be written at a time!

11

Application: Improving Gfarm
Why do we have to improve it?

12

It sounds good, until implementations get too large.
• When it becomes global-scale, we have to think differently.
• This is how it appears to us:

[Figure: three CPU nodes and META]

13

It sounds good, until implementations get too large.
• But this is reality:

[Figure: META and three CPU nodes]

14

So how do we simplify this problem? We put an overlay network on top.

Physical Network (Reality)

Overlay Network (what Gfarm sees)

15

1. It doesn’t care about locality.

• In this case, the two red arrows are the “same length” according to this topology, because each one is just a single hop.

Overlay Network

16

1. It doesn’t care about locality.

• However, they are not the same length when we look at the physical diagram.
• Gfarm’s overlay network doesn’t recognize the true distances.

Physical Network
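
The gap between overlay distance and physical distance can be sketched with a hop count over two small graphs. Both topologies below, and every node and router name in them, are hypothetical; they only illustrate how two nodes can look equally close in the overlay while being at different distances physically.

```python
from collections import deque


def hop_count(graph, src, dst):
    """Return the minimum number of hops from src to dst (breadth-first search)."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == dst:
            return hops
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return None  # dst is unreachable from src


# Hypothetical overlay: every storage node looks one hop away from the client.
overlay = {
    "client": ["nodeA", "nodeB"],
    "nodeA": ["client"],
    "nodeB": ["client"],
}

# Hypothetical physical network: nodeA sits behind one router, nodeB behind three.
physical = {
    "client": ["r1"],
    "r1": ["client", "nodeA", "r2"],
    "r2": ["r1", "r3"],
    "r3": ["r2", "nodeB"],
    "nodeA": ["r1"],
    "nodeB": ["r3"],
}

if __name__ == "__main__":
    for target in ("nodeA", "nodeB"):
        print(target, hop_count(overlay, "client", target),
              hop_count(physical, "client", target))
```

Hop count is only a rough stand-in for distance; a fuller model would weight each link by latency or bandwidth.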

17

2. Conventional network doesn’t use every route.
• Examine this topology: there’s more than one way for the circled nodes to reach each other.

Physical Network

Best Route: Always used

Other Route(s): Rarely used

18

3. If we use every route, which would we use?

[Figure: two CPU nodes connected by two routes]

High latency, more bandwidth: good for data transfer

Low latency, less bandwidth: good for control messages
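
Why each route suits a different kind of traffic can be seen with a simple estimate: delivery time is roughly latency plus size divided by bandwidth. The sketch below uses that simplified model (it ignores protocol overhead and congestion), and the route names and figures are assumptions chosen for illustration.

```python
def transfer_time(size_bytes, latency_s, bandwidth_bps):
    """Estimate one-way delivery time as latency plus time to push the bits out."""
    return latency_s + (size_bytes * 8) / bandwidth_bps


def pick_route(size_bytes, routes):
    """Return the name of the route with the smallest estimated delivery time."""
    return min(routes, key=lambda name: transfer_time(size_bytes, *routes[name]))


# Hypothetical routes: (latency in seconds, bandwidth in bits per second).
ROUTES = {
    "fat": (0.200, 1_000_000_000),   # high latency, more bandwidth
    "fast": (0.020, 100_000_000),    # low latency, less bandwidth
}

if __name__ == "__main__":
    print(pick_route(200, ROUTES))             # a small control message
    print(pick_route(1_000_000_000, ROUTES))   # a 1 GB data transfer
```

For the small message the latency term dominates, so the low-latency route wins; for the large transfer the bandwidth term dominates, so the high-bandwidth route wins.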

19

We plan to use SDN.

• SDN = Software-defined networking
• Concept: Use software to dynamically add or change network data flows.

Figure: McKeown, N., & Anderson, T. (2008). OpenFlow: enabling innovation in campus networks. Retrieved from http://dl.acm.org/citation.cfm?id=1355746

20

What can SDN do?

• In practice, SDN lets us build a whole new protocol by programming a specific “controller” to do the job.
• With SDN, we can:
  • Change settings dynamically
  • Implement specialized Quality-of-Service (QoS)
  • Differentiate many kinds of connections
    • By application, port, users, network addresses, groups, etc.
  • Use multi-path routing efficiently
  • and much more!
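
The "differentiate many kinds of connections" idea can be sketched as a match-action table, which is the general shape of what an SDN controller installs on switches. This is plain Python and does not use any real controller API; the `Rule` class, the action names, the port numbers, and the addresses are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """One match-action entry; a field left as None matches any value."""
    action: str
    dst_port: Optional[int] = None
    src_ip: Optional[str] = None
    priority: int = 0

    def matches(self, packet):
        return all(
            wanted is None or packet.get(field) == wanted
            for field, wanted in (("dst_port", self.dst_port), ("src_ip", self.src_ip))
        )


def lookup(rules, packet, default="default_path"):
    """Return the action of the highest-priority rule that matches the packet."""
    hits = [rule for rule in rules if rule.matches(packet)]
    if not hits:
        return default
    return max(hits, key=lambda rule: rule.priority).action


# Hypothetical port numbers and addresses, chosen only for this illustration.
RULES = [
    Rule(action="low_latency_path", dst_port=7001, priority=10),     # control traffic
    Rule(action="high_bandwidth_path", dst_port=7002, priority=10),  # bulk data
    Rule(action="drop", src_ip="10.0.0.99", priority=20),            # blocked host
]

if __name__ == "__main__":
    print(lookup(RULES, {"dst_port": 7001, "src_ip": "10.0.0.1"}))
    print(lookup(RULES, {"dst_port": 7002, "src_ip": "10.0.0.99"}))
    print(lookup(RULES, {"dst_port": 80, "src_ip": "10.0.0.1"}))
```

Changing the network's behavior then amounts to editing the rule list, which is what "change settings dynamically" refers to.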

21

So what do we want to do?

We want to accelerate wide-area distributed storage by using a software-defined network to optimize the overlay network.

23

INFORMATION for GENERAL PUBLIC

• This work was made by a member of the Laboratory for Software Design and Analysis, Graduate School of Information Science, Nara Institute of Science and Technology.
• This presentation is the first of two required for Master’s degree graduation and is presented to faculty and students of the Institute.
• This file has been modified for public disclosure. Actual content during presentation was different.

24

BACKUP SLIDES

• Some of them may not make sense.

25

Gfarm job execution relies on file presence.

26

BACKUP: Gfarm’s not Hadoop

• Gfarm isn’t Hadoop: it provides job scheduling that’s not MapReduce. Of course, Gfarm works with Hadoop if you want it to.

Let’s just say Gfarm doesn’t do this:

[Figure: http://www.ibm.com/developerworks/cloud/library/cl-openstack-deployhadoop/figure4.gif]

27

How to work with file replicas

• To open a file in READ mode: Any replica is OK.

[Figure: four replicas and three processes, labeled Writing and Reading]

28

How to work with file replicas

• To open a file in WRITE mode (in this order):
  • If somebody is writing, use a replica already opened in WRITE mode
  • If nobody is writing, use a replica already opened in READ mode
  • If nobody is reading, use any replica

[Figure: four replicas and three processes, labeled Writing and Reading]
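
The READ and WRITE rules on these two slides can be written down as a small selection function. This is a sketch of the rules as stated on the slides, not Gfarm's implementation; the function name `choose_replica`, the `"r"`/`"w"` mode strings, and the `open_modes` bookkeeping are assumptions.

```python
def choose_replica(mode, replicas, open_modes):
    """Pick a replica for an open request.

    mode: "r" or "w"
    replicas: list of replica names, in preference order
    open_modes: dict mapping replica name -> set of modes it is currently open in
    """
    if not replicas:
        raise ValueError("file has no replicas")
    if mode == "r":
        return replicas[0]  # any replica is acceptable for reading
    if mode != "w":
        raise ValueError(f"unknown mode {mode!r}")
    # 1. Somebody is writing: use the replica already open for writing.
    for name in replicas:
        if "w" in open_modes.get(name, set()):
            return name
    # 2. Nobody is writing: use a replica already open for reading.
    for name in replicas:
        if "r" in open_modes.get(name, set()):
            return name
    # 3. Nobody is reading either: any replica will do.
    return replicas[0]


if __name__ == "__main__":
    replicas = ["F1", "F2", "F3"]
    print(choose_replica("w", replicas, {"F1": {"r"}, "F2": {"w"}}))
    print(choose_replica("w", replicas, {"F3": {"r"}}))
    print(choose_replica("w", replicas, {}))
```

Funnelling every writer onto the same replica is what keeps at most one replica being written at a time.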

29

BACKUP: 2. Why don’t we use every possible route?
• So what we can do might be:
  • Transfer File A over the red path
  • Transfer File B over the orange path
• The overall bandwidth would be increased!

Physical Network

30

BACKUP: 2. Why don’t we use every possible route?
• Problems of this solution:
  • TCP segmentation & reordering
  • UDP will result in A LOT of unwanted and uncorrectable reordering
• Mitigation:
  • Separate data & control
  • Just divide the link at file level, so one file on link A, another file on link B, etc.
  • We can do this because it’s a file system and may make use of many files at the same time.
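
Dividing the links at file level can be sketched as follows: each file travels whole over exactly one path, so packets of the same file are never split across paths and reordered. The greedy "largest file first, onto the path that frees up soonest" policy, the function name, and the example sizes are assumptions made for illustration.

```python
import heapq


def assign_files_to_paths(file_sizes, path_bandwidths):
    """Assign whole files to paths so that no file is split across paths.

    file_sizes: dict mapping file name -> size in bytes
    path_bandwidths: dict mapping path name -> bandwidth in bytes per second
    Returns (assignment, finish_times): assignment maps file -> path, and
    finish_times maps path -> estimated seconds until that path is idle.
    """
    # Min-heap of (time at which the path becomes idle, path name).
    heap = [(0.0, path) for path in sorted(path_bandwidths)]
    heapq.heapify(heap)
    assignment = {}
    # Place the largest files first; each goes to the path that frees up soonest.
    for name, size in sorted(file_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        busy_until, path = heapq.heappop(heap)
        busy_until += size / path_bandwidths[path]
        assignment[name] = path
        heapq.heappush(heap, (busy_until, path))
    finish_times = {path: busy_until for busy_until, path in heap}
    return assignment, finish_times


if __name__ == "__main__":
    # Hypothetical file sizes (bytes) and path bandwidths (bytes per second).
    files = {"A": 400, "B": 300, "C": 100}
    paths = {"red": 100, "orange": 100}
    print(assign_files_to_paths(files, paths))
```

With a single path of the same bandwidth these three files would take 8 seconds in this model; spread over two paths, both paths finish after 4 seconds.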

31

BACKUP: Why don’t bandwidth and latency correlate?

• Bandwidth is limited by the link capacity and the rate of transmission and reception.
• Latency is caused by processing time.
  • Per-router processing time is increased in the WAN due to routers being overwhelmed by general public usage of the Internet
  • There can be more than 10 hops to reach a node in another country.

32

Actually, why NOT SDN?

• Configuration delay: it takes some time for a new route to be installed
• Single Point of Failure (for centralized SDNs like OpenFlow)
• Cannot easily implement multiple SDN instances
  • We can however pre-slice the network and run SDN on each “subnet”, or
  • use solutions like FlowVisor (proxy OpenFlow)
• Controller bugs can break the existing network (even the simplest controllers can have bugs!)

33

How can we use it with Gfarm?

• Data
  • Use multiple paths
  • Prefer high-bandwidth path
• Control
  • QoS
  • Prefer low-latency path
• These methods can be implemented in SDN

34

How can we use it with Gfarm?

• We can use multiple paths to add up bandwidth.
• SDN can differentiate between each flow so paths can be separated.

Multi-path routing?

Physical Network

35

How can we use it with Gfarm?

• Control messages prefer low latency.
• Data transfers prefer greater bandwidth.
• SDN knows the difference between these uses and can optimize.

Application-aware routing?

[Figure: two CPU nodes connected by two routes, one with more latency and more bandwidth, the other with less latency and less bandwidth]

36

How can we use it with Gfarm?

• Critical uses such as control messages can be given priority so they can “skip the (potentially very long) queue” of data packets.
• Some SDNs like OpenFlow are beginning to support QoS.

Quality of Service?

Important:
• VoIP
• Streaming data
• Control messages
• Synchronous messages

Can Wait:
• Scheduled jobs
• Data backup
• Background tasks
• Unimportant things
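
"Skipping the queue" can be sketched as a priority queue in which important messages are always sent before messages that can wait, while messages of the same class keep their arrival order. The class name `PriorityLink` and the two priority levels are assumptions; real QoS on a switch involves per-port queues and rate limits that this sketch leaves out.

```python
import heapq
import itertools

IMPORTANT, CAN_WAIT = 0, 1  # the lower number is served first


class PriorityLink:
    """A link that always sends queued high-priority messages first."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker: FIFO within a priority class

    def enqueue(self, priority, message):
        heapq.heappush(self._heap, (priority, next(self._order), message))

    def dequeue(self):
        """Remove and return the next message to send, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


if __name__ == "__main__":
    link = PriorityLink()
    link.enqueue(CAN_WAIT, "backup chunk 1")
    link.enqueue(CAN_WAIT, "backup chunk 2")
    link.enqueue(IMPORTANT, "control message")
    while (message := link.dequeue()) is not None:
        print(message)
```

The control message is sent first even though it arrived last, and the two backup chunks keep their original order.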
