An Update Model for Network Coding in Cloud Storage Systems


An Update Model for Network Coding in Cloud Storage Systems. 2012 50th Annual Allerton Conference on Communication, Control, and Computing. Mohammad Reza Zakerinasab, Mea Wang. Department of Computer Science, University of Calgary.


1

An Update Model for Network Coding in Cloud Storage Systems

2012 50th Annual Allerton Conference on Communication, Control, and Computing

Mohammad Reza Zakerinasab, Mea Wang

Department of Computer Science, University of Calgary

2

Outline

• Introduction
• Related Works
• Proposed System
• Differential Update Model
• Evaluation
• Conclusion

3

Network Coding (1/2)

• There are different mechanisms for arranging file copies among storage nodes or devices:
  - standard RAID architectures
  - erasure codes
  - network coding

• Network coding in cloud storage systems allows storage nodes to collectively host multiple copies of a file.

4

Network Coding (2/2)

• In a network-coding-assisted cloud storage system:
  - a file is divided into n blocks
    - the blocks are encoded using random coefficients
  - the encoded blocks are distributed in the Cloud
    - the file is decoded from n encoded blocks retrieved from any subset of the storage nodes

5

Problem Definition

• Existing works have focused on mechanisms for preserving the level of redundancy.

• However, coded information in the system must also be kept up to date under the most frequent operations performed on files:
  - file updates

• Any change in the file will impact all coded blocks in the system.
  - all traces of the file must be replaced

6

Application

• Google Docs: an online collaborative office suite that lets users create, edit, and publish documents collaboratively from around the world.

• When a file is updated, changing even a single byte can render all coded blocks in the system outdated, which requires:
  - re-computations
  - re-deliveries

7

Problems

• Re-computing coded blocks is very CPU intensive.

• Replacing all the coded blocks consumes a large amount of bandwidth.

8

Proposed Model

• Send only the modified parts, with the minimum possible overhead.

• The paper presents a mathematical formulation of the Differential Update Model (DUM).
  - the update algorithms can be performed on all nodes.

• The simulation results show that the proposed DUM saves a significant amount of bandwidth in a cloud storage system.

9

Outline

• Introduction
• Related Works
• Proposed System
• Differential Update Model
• Evaluation
• Conclusion

10

Related Works (1/2)

• Commercial cloud storage systems, such as Microsoft Azure [8] and Google Cloud [9], utilize source erasure codes.

• Network coding was originally proposed in information theory in 2000 [1].

• In contrast to source erasure codes, network coding applies coding at intermediate relay nodes throughout the network.

11

Related Works (2/2)

• The benefits of coding at intermediate nodes include:
  - high throughput [1], [3]
  - efficient routing algorithm design [17]
  - energy savings in wireless networking [18]
  - security [19]

• The works most closely related to the update problem address the repair problem:
  - they provide mechanisms for handling the failure of one or more nodes [25].
  - they preserve the level of redundancy.

12

References

[1] R. Ahlswede, N. Cai, S.-Y. R. Li, and R. W. Yeung, “Network Information Flow,” IEEE Transactions on Information Theory, vol. 46, no. 4, pp. 1204–1216, July 2000.

[3] R. Koetter and M. Medard, “An Algebraic Approach to Network Coding,” IEEE/ACM Transactions on Networking, vol. 11, no. 5, pp. 782–795, October 2003.

[8] B. Calder, J. Wang, A. Ogus, N. Nilakantan, A. Skjolsvold, S. McKelvie, Y. Xu, S. Srivastav, J. Wu, H. Simitci, J. Haridas, C. Uddaraju, H. Khatri, A. Edwards, V. Bedekar, S. Mainali, R. Abbasi, A. Agarwal, M. F. ul Haq, M. I. ul Haq, D. Bhardwaj, S. Dayanand, A. Adusumilli, M. McNett, S. Sankaran, K. Manivannan, and L. Rigas, “Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency,” in Proc. of the 23rd ACM Symposium on Operating Systems Principles (SOSP), Cascais, Portugal, October 23-26, 2011, pp. 143–157.

13

References

[9] D. Ford, F. Labelle, F. I. Popovici, M. Stokely, V.-A. Truong, L. Barroso, C. Grimes, and S. Quinlan, “Availability in Globally Distributed Storage Systems,” in Proc. of the 9th USENIX Symposium on Operating Systems Design and Implementation (OSDI), Vancouver, BC, October 4-6, 2010, pp. 1–14.

[17] D. S. Lun, N. Ratnakar, R. Koetter, M. Medard, E. Ahmed, and H. Lee, “Achieving Minimum Cost Multicast: A Decentralized Approach Based on Network Coding,” in Proc. of the 24th Conference of the IEEE Communications Society (INFOCOM), Miami, FL, March 13-17, 2005, pp. 1607–1617.

[18] S. Katti, H. Rahul, W. Hu, D. Katabi, M. Medard, and J. Crowcroft, “XORs in the Air: Practical Wireless Network Coding,” IEEE/ACM Transactions on Networking, vol. 16, no. 3, pp. 497–510, June 2008.

[19] C. Gkantsidis and P. Rodriguez, “Cooperative Security for Network Coding File Distribution,” in Proc. of the 25th Conference of the IEEE Communications Society (INFOCOM), Barcelona, Spain, April 23-29, 2006, pp. 1–13.

14

Outline

• Introduction
• Related Works
• Proposed System
• Differential Update Model
• Evaluation
• Conclusion

15

Modeling the Storage Cloud System

Storage Cloud

End Hosts

16

Modeling the Storage Cloud System

• Simplifying assumptions of the model:
  1. A single original copy of each file is hosted among the source nodes in the Cloud.
     - each source node owns a disjoint set of files.
  2. Each node can be only a source node, a storage node, or a target node at a time.
     - nodes of the same type do not connect to each other.
  3. It is common for a storage system to distribute R copies of each file to provide data redundancy, where R is the replication factor.

17

Network Coding in the Storage Cloud System

• With randomized network coding, a file is divided into n original blocks B = [b1, b2, …, bn], where each block bi has a fixed size of s bytes.

• Encoding a new block ci:
  - the source node first independently and randomly chooses a set of coding coefficients εi = [εi,1, εi,2, …, εi,n] in the Galois field GF(2^8)
  - the coded block is then the linear combination
    $c_i = \sum_{j=1}^{n} \varepsilon_{i,j} \, b_j$

[Figure: the original blocks b1, b2, …, bn of file B are encoded into R × n coded blocks c1, c2, …, cR×n.]
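
The encoding step above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the reduction polynomial 0x11B for GF(2^8), the zero-padding in `split_file`, and all function names are assumptions made for the example.

```python
import random


def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(2^8) elements (reduction polynomial 0x11B is an assumption)."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^8) is XOR
        a <<= 1
        if a & 0x100:
            a ^= 0x11B           # reduce back into 8 bits
        b >>= 1
    return result


def split_file(data: bytes, n: int) -> list:
    """Split data into n blocks of equal size s, zero-padding the tail (hypothetical helper)."""
    s = -(-len(data) // n)       # ceiling division
    padded = data.ljust(s * n, b"\0")
    return [padded[i * s:(i + 1) * s] for i in range(n)]


def encode_block(coeffs, blocks) -> bytes:
    """Return c_i = sum_j coeffs[j] * blocks[j], computed byte-wise over GF(2^8)."""
    coded = bytearray(len(blocks[0]))
    for eps, block in zip(coeffs, blocks):
        for k, byte in enumerate(block):
            coded[k] ^= gf_mul(eps, byte)
    return bytes(coded)


rng = random.Random(0)
blocks = split_file(b"hello network coding", n=4)
coeffs = [rng.randrange(256) for _ in blocks]   # one random coefficient per original block
coded = encode_block(coeffs, blocks)
```

A source node would repeat `encode_block` with fresh coefficients to produce each of the R × n coded blocks, storing the coefficient vector alongside the coded payload.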

18

Network Coding in the Storage Cloud System

• Decoding: any n of the R × n coded blocks are linearly independent with high probability and can be used to recover all original blocks of the corresponding file.
  - a target node locates and downloads n coded blocks, C = [c1, c2, …, cn], from the storage nodes.

• Given the encoding matrix ξ = [ε1, ε2, …, εn], the original blocks B = [b1, b2, …, bn] can be recovered by:
  $B = \xi^{-1} C$
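
Decoding amounts to solving a linear system over GF(2^8). The sketch below uses Gauss-Jordan elimination, which is one standard way to do this and not necessarily what the paper's implementation uses; the reduction polynomial 0x11B and all names are assumptions for illustration.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(2^8) elements (reduction polynomial 0x11B is an assumption)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse of a nonzero element, computed as a^254."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def encode(coeff_matrix, blocks):
    """Compute C = xi * B: one coded block per row of coefficients."""
    coded = []
    for coeffs in coeff_matrix:
        c = bytearray(len(blocks[0]))
        for eps, block in zip(coeffs, blocks):
            for k, byte in enumerate(block):
                c[k] ^= gf_mul(eps, byte)
        coded.append(bytes(c))
    return coded


def decode(coeff_matrix, coded_blocks):
    """Recover B from xi * B = C by Gauss-Jordan elimination over GF(2^8)."""
    n = len(coeff_matrix)
    # Augmented rows: [coefficients | coded payload bytes]
    rows = [list(coeff_matrix[i]) + list(coded_blocks[i]) for i in range(n)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if rows[r][col]), None)
        if pivot is None:
            raise ValueError("coefficient matrix is singular; fetch another coded block")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = gf_inv(rows[col][col])
        rows[col] = [gf_mul(inv, x) for x in rows[col]]
        for r in range(n):
            if r != col and rows[r][col]:
                factor = rows[r][col]
                # Subtraction in GF(2^8) is XOR
                rows[r] = [x ^ gf_mul(factor, y) for x, y in zip(rows[r], rows[col])]
    return [bytes(row[n:]) for row in rows]


blocks = [b"abcd", b"efgh", b"ijkl"]
# Vandermonde rows over distinct field elements, so the matrix is guaranteed invertible
xi = [[1, x, gf_mul(x, x)] for x in (1, 2, 3)]
recovered = decode(xi, encode(xi, blocks))
```

The example uses a Vandermonde matrix only so that invertibility is guaranteed; with random coefficients, a target node would fetch an additional coded block in the rare case that `decode` reports a singular matrix.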

19

The Update Problem

• For every single update, we must:
  - transmit R × n new coded blocks from the source nodes to the storage nodes.
  - transmit K × n coded blocks from the storage nodes to the target nodes.
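
As a rough illustration of these counts, the short Python sketch below tallies the blocks moved by one conventional update. The slide does not define K; treating it as the number of target nodes that re-download the file is an assumption, as are the function name and the example values.

```python
def conventional_update_cost(n: int, R: int, K: int) -> dict:
    """Blocks transmitted for one full re-encode of a file with n blocks.

    R is the replication factor; K is assumed here to be the number of
    target nodes that must re-download the file (not defined on the slide).
    """
    source_to_storage = R * n
    storage_to_target = K * n
    return {
        "source_to_storage": source_to_storage,
        "storage_to_target": storage_to_target,
        "total": source_to_storage + storage_to_target,
    }


cost = conventional_update_cost(n=100, R=3, K=10)
```

The cost is independent of how much of the file actually changed, which is the inefficiency the differential model targets.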

20

Outline

• Introduction
• Related Works
• Proposed System
• Differential Update Model
• Evaluation
• Conclusion

21

Differential Update Model (DUM)

• The authors argue that the update problem is just as essential as the repair problem.

• They propose the DUM, which updates coded blocks by delivering only the blocks affected by an update.
  - this avoids transmitting the entire file for each update.

22

Updating Coded Blocks

• Assume that the current version number of a file is v; version v + 1 then involves arbitrary updates in n’ of the n blocks of the file.
  - Let B = [b1, b2, …, bn] be the original file of version v.
  - Let B’ = [b1’, b2’, …, bn’] be the updated file of version v + 1.
  - Each block bi’ in version v + 1 can be expressed as bi + δi, where δi is the differential vector:
    $B' = B + \Delta$
    where Δ = [δ1, δ2, δ3, …, δn] is the differential matrix.
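
Because addition and subtraction in GF(2^8) are both byte-wise XOR, the differential matrix can be computed directly from the two versions. The following Python sketch illustrates this under that assumption; the function names and sample data are hypothetical.

```python
def differential(old_blocks, new_blocks):
    """Return delta_i = b_i' - b_i for every block; over GF(2^8) this is byte-wise XOR."""
    return [bytes(x ^ y for x, y in zip(old, new))
            for old, new in zip(old_blocks, new_blocks)]


def apply_differential(old_blocks, delta):
    """Return b_i' = b_i + delta_i; XOR again, since addition equals subtraction."""
    return [bytes(x ^ y for x, y in zip(old, d))
            for old, d in zip(old_blocks, delta)]


old = [b"aaaa", b"bbbb", b"cccc"]          # version v
new = [b"aaaa", b"bXbb", b"cccc"]          # version v + 1: one byte changed in block 2
delta = differential(old, new)
n_prime = sum(1 for d in delta if any(d))  # number of blocks the update touches
```

Unchanged blocks yield all-zero δ-vectors, so `n_prime` counts exactly the blocks that need to be communicated.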

23

Updating Coded Blocks

• To encode a new block for version v + 1, the source node again randomly chooses a set of coding coefficients εi’ = [εi,1’, εi,2’, …, εi,n’] in the Galois field GF(2^8):
  $c_i' = \sum_{j=1}^{n} \varepsilon_{i,j}' \, b_j'$

24

Updating Storage Nodes

• A significant amount of bandwidth can be saved, since most updates affect only a small portion of a file.

• Recover Δ from Δ’:
  - Δ is reconstructed by inserting the zero δ-vectors into Δ’ according to the update vector u.

25

Updating Storage Nodes

• Send only the non-zero rows of Δ: Δ’ = [δ1, δ2, δ3, …, δn’]
  - Update vector uv+1 = [uv+1,1, uv+1,2, …, uv+1,n]

• Encode the matrix Δ’

• Decode the matrix Δ’
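
The relationship between Δ, Δ’ and the update vector can be sketched as follows. This assumes the update vector is a 0/1 indicator of which blocks changed, which matches the description of re-inserting zero δ-vectors but is not spelled out on the slide; the function names are hypothetical, and the coding step applied to Δ’ is omitted.

```python
def compress(delta):
    """Split the full differential matrix into (update vector u, non-zero rows delta_prime)."""
    u = [1 if any(row) else 0 for row in delta]          # assumed 0/1 indicator per block
    delta_prime = [row for row, flag in zip(delta, u) if flag]
    return u, delta_prime


def expand(u, delta_prime, block_size):
    """Rebuild the full matrix by inserting zero delta-vectors wherever u has a 0."""
    rows = iter(delta_prime)
    return [next(rows) if flag else bytes(block_size) for flag in u]


delta = [bytes(4), b"\x00\x3a\x00\x00", bytes(4)]
u, delta_prime = compress(delta)
restored = expand(u, delta_prime, block_size=4)
```

Only `u` and the n’ rows of `delta_prime` need to travel over the network; the receiver restores the n-row matrix locally.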

26

Updating Storage Nodes

27

Updating Target Nodes

28

Aggregating Updates Across Multiple Versions (1/4)

• Storage nodes and target nodes may not always be synchronized to the latest version.
  - they may miss several updates for various reasons.

• Assume that a node has missed m updates:
  - its current version is v.
  - the actual version of the file is v + m.

29

Aggregating Updates Across Multiple Versions (2/4)

ه A coded block in version v may be expressed in terms of the coded blocks of version 0 and the summation of coded δ-blocks from version 0 to version m.
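
One way to see why such an expression exists is the linearity of the code. The derivation below is a sketch under the assumption that the same coefficients ε_{i,j} are kept across versions; the superscript (k) for the version index is notation introduced here, not taken from the slides.

```latex
\begin{aligned}
c_i^{(v)} &= \sum_{j=1}^{n} \varepsilon_{i,j}\, b_j^{(v)}
           = \sum_{j=1}^{n} \varepsilon_{i,j} \Bigl( b_j^{(0)} + \sum_{k=1}^{v} \delta_j^{(k)} \Bigr) \\
          &= c_i^{(0)} + \sum_{k=1}^{v} \sum_{j=1}^{n} \varepsilon_{i,j}\, \delta_j^{(k)}
\end{aligned}
```

Each inner sum is a coded δ-block, so a node holding the version-0 coded block only needs the coded δ-blocks of the intervening updates.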

30

Aggregating Updates Across Multiple Versions (3/4)

• To support such an aggregated update, an update table stores:
  - the update vectors
  - the coded δ-blocks

• If a storage node has missed one or more updates, find the first non-empty entry following the empty entries.
  - this entry gives the aggregated Δ’ containing the changes across the missed versions.

31

Aggregating Updates Across Multiple Versions (4/4)

• Computational overhead:
  - generation of the aggregated update vector
  - generation of the n’ aggregated coded δ-vectors
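
The two aggregation steps can be illustrated on uncoded differentials. The Python sketch below is a simplification: it works on plain δ-vectors rather than coded ones, and the data layout (a list of `(u, delta_prime)` pairs, oldest first) and function name are assumptions for the example.

```python
def aggregate(updates, n: int, block_size: int):
    """Combine several missed updates into one (update vector, non-zero rows) pair.

    updates: list of (u, delta_prime) pairs, oldest first, where u is a 0/1
    list of length n and delta_prime holds the non-zero delta rows in order.
    """
    total = [bytearray(block_size) for _ in range(n)]
    for u, delta_prime in updates:
        rows = iter(delta_prime)
        for i, flag in enumerate(u):
            if flag:
                row = next(rows)
                for k in range(block_size):
                    total[i][k] ^= row[k]      # deltas add up by XOR over GF(2^8)
    # A block that was changed and later changed back ends up with a zero delta
    u_agg = [1 if any(block) else 0 for block in total]
    delta_agg = [bytes(block) for block, flag in zip(total, u_agg) if flag]
    return u_agg, delta_agg


missed = [
    ([0, 1, 0], [b"\x01\x00"]),                 # first missed update touches block 2
    ([0, 1, 1], [b"\x01\x00", b"\x00\x02"]),    # second reverts block 2 and touches block 3
]
u_agg, delta_agg = aggregate(missed, n=3, block_size=2)
```

In this sketch the aggregated update vector is derived from the summed deltas rather than by OR-ing the individual vectors, so reverted blocks are not retransmitted.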

32

Outline

• Introduction
• Related Works
• Proposed System
• Differential Update Model
• Evaluation
• Conclusion

33

Numerical Analysis

• The bandwidth saving in updating the storage nodes using DUM.

• The bandwidth saving in updating the target nodes using DUM.
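
The paper's closed-form expressions are not recoverable from the slide text. As a back-of-the-envelope stand-in, the Python sketch below estimates the per-copy saving under stated assumptions: a conventional update sends n blocks of s bytes, DUM sends n’ δ-blocks plus an n-bit update vector, and coding-coefficient overhead is ignored. The function name and cost model are hypothetical.

```python
def dum_saving(n: int, n_prime: int, block_size: int) -> float:
    """Estimated fraction of bytes saved by sending only n_prime delta-blocks.

    Assumed cost model (not the paper's formula): conventional = n blocks,
    DUM = n_prime blocks plus an n-bit update vector rounded up to whole bytes.
    """
    conventional = n * block_size
    dum = n_prime * block_size + (n + 7) // 8
    return 1 - dum / conventional


saving_small_update = dum_saving(n=100, n_prime=10, block_size=1024)
saving_full_update = dum_saving(n=100, n_prime=100, block_size=1024)
```

Under this model the saving is close to 1 − n’/n, and it turns slightly negative when the update touches every block because the update vector is pure overhead in that case.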

34

Experiment Results (1/7)

• The number of blocks n should be no more than 100 to ensure that network coding operates at a rate faster than a typical transmission rate in a network.

• We compare the performance of the conventional network coding update (NC) and DUM.

35

Experiment Results (2/7)

• Bandwidth usage

36

Experiment Results (3/7)

• Bandwidth usage and computational cost

37

Experiment Results (4/7)

• Computational cost on storage nodes dominates the overall cost.

38

Experiment Results (5/7)

• Aggregated updates

39

Experiment Results (6/7)

• Update affects

40

Experiment Results (7/7)

• Simulation study

• Diff [31], bsDiff [32]

[31] J. W. Hunt and M. D. McIlroy, “An Algorithm for Differential File Comparison,” Computing Science Technical Report 41, Bell Laboratories, June 1976.

[32] C. Percival, “Matching with Mismatches and Assorted Applications,” Ph.D. dissertation, Wadham College, University of Oxford, 2006.

41

Outline

• Introduction
• Related Works
• Proposed System
• Differential Update Model
• Evaluation
• Conclusion

42

Conclusion

• DUM saves both communication and computational costs, unless the update affects almost the entire file.

• DUM conserves CPU cycles for large files and when the data is more scattered in the Cloud.

• The paper only considers the case where n’ is smaller than n; what happens if n’ is larger than n?
