Transcript
Page 1: An Update Model for Network Coding in Cloud Storage Systems

1

An Update Model for Network Coding in Cloud Storage Systems

2012 50th Annual Allerton Conference on Communication, Control, and Computing

Mohammad Reza Zakerinasab, Mea Wang

Department of Computer Science, University of Calgary

Page 2: An Update Model for Network Coding in Cloud Storage Systems

2

Outline

• Introduction
• Related Works
• Proposed System
• Differential Update Model
• Evaluation
• Conclusion

Page 3: An Update Model for Network Coding in Cloud Storage Systems

3

Network Coding (1/2)

• There are different mechanisms for arranging file copies among storage nodes or devices:
  - standard RAID architectures
  - erasure codes
  - network coding

• Network coding in cloud storage systems allows storage nodes to collectively host multiple copies of a file.

Page 4: An Update Model for Network Coding in Cloud Storage Systems

4

Network Coding (2/2)

• In a network-coding-assisted cloud storage system
  - a file is divided into n blocks
    - the blocks are encoded using random coefficients.
  - encoded blocks are distributed in the Cloud.
    - the file is decoded from n encoded blocks retrieved from any subset of the storage nodes.

Page 5: An Update Model for Network Coding in Cloud Storage Systems

5

Problem Definition

• Existing works have focused on mechanisms for preserving the level of redundancy.

• However, keeping the coded information in the system up to date with the most frequent operation performed on files remains a problem.
  - file updates

• Any change in the file will impact all coded blocks in the system.
  - all traces of the file must be replaced

Page 6: An Update Model for Network Coding in Cloud Storage Systems

6

Application

• Google Docs: an online collaborative office suite that lets users create, edit, and publish documents collaboratively from around the world.

• When a file is updated, changing even a single byte can render all coded blocks in the system outdated.
  - re-computations
  - re-deliveries

Page 7: An Update Model for Network Coding in Cloud Storage Systems

7

Problems

• Re-computing coded blocks is very CPU intensive.

• Replacing all the coded blocks consumes a large amount of bandwidth.

Page 8: An Update Model for Network Coding in Cloud Storage Systems

8

Proposed Model

• Send only the modified parts, with the minimum possible overhead.

• This paper presents the mathematical formulation of the Differential Update Model (DUM).
  - the update algorithms can be performed on all nodes.

• The simulation results show that the proposed DUM saves a significant amount of bandwidth in a cloud storage system.

Page 9: An Update Model for Network Coding in Cloud Storage Systems

9

Outline

• Introduction
• Related Works
• Proposed System
• Differential Update Model
• Evaluation
• Conclusion

Page 10: An Update Model for Network Coding in Cloud Storage Systems

10

Related Works (1/2)

• Commercial cloud storage systems, such as Microsoft Azure [8] and Google Cloud [9], utilize source erasure codes.

• Network coding was originally proposed in information theory in 2000 [1].

• In contrast to source erasure codes, network coding applies coding at intermediate relay nodes throughout the network.

Page 11: An Update Model for Network Coding in Cloud Storage Systems

11

Related Works (2/2)

• The benefits of coding at intermediate nodes include
  - high throughput [1], [3]
  - efficient routing algorithm design [17]
  - energy savings in wireless networking [18]
  - security [19]

• The works most closely related to the update problem address the repair problem
  - they provide mechanisms for when one or more nodes fail [25].
  - they preserve the level of redundancy.

Page 12: An Update Model for Network Coding in Cloud Storage Systems

12

Reference

• [1] R. Ahlswede, N. Cai, S. R. Li, and R. W. Yeung, “Network Information Flow,” IEEE Transactions on Information Theory, vol. 46, no. 4, pp. 1204–1216, July 2000.

• [3] R. Koetter and M. Medard, “An Algebraic Approach to Network Coding,” IEEE/ACM Transactions on Networking, vol. 11, no. 5, pp. 782–795, October 2003.

• [8] B. Calder, J. Wang, A. Ogus, N. Nilakantan, A. Skjolsvold, S. McKelvie, Y. Xu, S. Srivastav, J. Wu, H. Simitci, J. Haridas, C. Uddaraju, H. Khatri, A. Edwards, V. Bedekar, S. Mainali, R. Abbasi, A. Agarwal, M. F. ul Haq, M. I. ul Haq, D. Bhardwaj, S. Dayanand, A. Adusumilli, M. McNett, S. Sankaran, K. Manivannan, and L. Rigas, “Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency,” in Proc. of the 23rd ACM Symposium on Operating Systems Principles (SOSP), Cascais, Portugal, October 23-26, 2011, pp. 143–157.

Page 13: An Update Model for Network Coding in Cloud Storage Systems

13

Reference

• [9] D. Ford, F. Labelle, F. I. Popovici, M. Stokely, V.-A. Truong, L. Barroso, C. Grimes, and S. Quinlan, “Availability in Globally Distributed Storage Systems,” in Proc. of the 9th USENIX Symposium on Operating Systems Design and Implementation (OSDI), Vancouver, BC, October 4-6, 2010, pp. 1–14.

• [17] D. S. Lun, N. Ratnakar, R. Koetter, M. Medard, E. Ahmed, and H. Lee, “Achieving Minimum Cost Multicast: A Decentralized Approach Based on Network Coding,” in Proc. of the 24th Conference of the IEEE Communications Society (INFOCOM), Miami, FL, March 13-17, 2005, pp. 1607–1617.

• [18] S. Katti, H. Rahul, W. Hu, D. Katabi, M. Medard, and J. Crowcroft, “XORs in the Air: Practical Wireless Network Coding,” IEEE/ACM Transactions on Networking, vol. 16, no. 3, pp. 497–510, June 2008.

• [19] C. Gkantsidis and P. Rodriguez, “Cooperative Security for Network Coding File Distribution,” in Proc. of the 25th Conference of the IEEE Communications Society (INFOCOM), Barcelona, Spain, April 23-29, 2006, pp. 1–13.

Page 14: An Update Model for Network Coding in Cloud Storage Systems

14

Outline

• Introduction
• Related Works
• Proposed System
• Differential Update Model
• Evaluation
• Conclusion

Page 15: An Update Model for Network Coding in Cloud Storage Systems

15

Modeling the Storage Cloud System

[Figure: system model showing the Storage Cloud and the End Hosts]

Page 16: An Update Model for Network Coding in Cloud Storage Systems

16

Modeling the Storage Cloud System

• Model simplification assumptions:

1. A single original copy of each file is hosted among the source nodes in the Cloud.
   - each source node owns a disjoint set of files.

2. Each node can only be a source node, a storage node, or a target node at a time.
   - nodes of the same type do not connect to each other.

3. It is common for a storage system to distribute R ≥ 1 copies of each file to provide data redundancy, where R is the replication factor.

Page 17: An Update Model for Network Coding in Cloud Storage Systems

17

Network Coding in the Storage Cloud System

• With randomized network coding, a file is divided into n original blocks B = [b1, b2, …, bn], where each bi has a fixed size of s bytes.

• Encoding a new block ci
  - the source node first independently and randomly chooses a set of coding coefficients εi = [εi,1, εi,2, …, εi,n] in the Galois field GF(2^8).
    - the coded block is then the linear combination $c_i = \sum_{j=1}^{n} \varepsilon_{i,j} \cdot b_j$.

[Figure: the n original blocks b1, b2, …, bn of B are encoded into R·n coded blocks c1, c2, …, cR·n]
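The encoding step on this slide can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the reduction polynomial 0x11B for GF(2^8) is an assumption (the slides do not name one), and the helper names `gf_mul`, `encode_block`, and `encode_file` are hypothetical.

```python
import random

def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8); the polynomial 0x11B is an assumption."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a          # addition in GF(2^8) is XOR
        b >>= 1
        a <<= 1
        if a & 0x100:            # reduce when the degree reaches 8
            a ^= 0x11B
    return result

def encode_block(coeffs, blocks):
    """Return c = sum_j coeffs[j] * blocks[j], computed byte by byte."""
    coded = bytearray(len(blocks[0]))
    for eps, block in zip(coeffs, blocks):
        for k, byte in enumerate(block):
            coded[k] ^= gf_mul(eps, byte)
    return bytes(coded)

def encode_file(blocks, replication, rng):
    """Produce replication * n coded blocks, each paired with its coefficients."""
    n = len(blocks)
    coded = []
    for _ in range(replication * n):
        coeffs = [rng.randrange(256) for _ in range(n)]
        coded.append((coeffs, encode_block(coeffs, blocks)))
    return coded

blocks = [b"abcd", b"efgh", b"ijkl"]      # n = 3 blocks of s = 4 bytes
coded = encode_file(blocks, replication=2, rng=random.Random(0))
print(len(coded))                         # 6, i.e. R * n coded blocks
```

Each coded block is stored together with its coefficient vector, since a decoder needs both.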

Page 18: An Update Model for Network Coding in Cloud Storage Systems

18

Network Coding in the Storage Cloud System

• Decoding: any n of the R·n coded blocks are linearly independent (with high probability, given random coefficients) and can be used to recover all original blocks of the corresponding file.
  - a target node locates and downloads n coded blocks, C = [c1, c2, …, cn], from the storage nodes.

• Given the encoding matrix ξ = [ε1, ε2, …, εn], the original blocks B = [b1, b2, …, bn] can be recovered by:
  - $B = \xi^{-1} \cdot C$
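Decoding amounts to solving the linear system ξ · B = C over GF(2^8). The sketch below uses Gauss-Jordan elimination, which is one common way to do this and not necessarily what the authors used. The polynomial 0x11B and the names `gf_inv` and `decode` are assumptions made for illustration.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8); the polynomial 0x11B is an assumption."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return result

def gf_inv(a: int) -> int:
    """Multiplicative inverse: a^254, because a^255 = 1 for every non-zero a."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^8)")
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result

def encode_block(coeffs, blocks):
    """Return c = sum_j coeffs[j] * blocks[j], computed byte by byte."""
    coded = bytearray(len(blocks[0]))
    for eps, block in zip(coeffs, blocks):
        for k, byte in enumerate(block):
            coded[k] ^= gf_mul(eps, byte)
    return bytes(coded)

def decode(coeff_matrix, coded_blocks):
    """Solve xi * B = C for B by Gauss-Jordan elimination over GF(2^8)."""
    n = len(coeff_matrix)
    # Augmented rows: [coefficient vector | coded bytes]
    rows = [list(c) + list(blk) for c, blk in zip(coeff_matrix, coded_blocks)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if rows[r][col] != 0), None)
        if pivot is None:
            raise ValueError("coefficient vectors are linearly dependent")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = gf_inv(rows[col][col])
        rows[col] = [gf_mul(inv, x) for x in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x ^ gf_mul(factor, y) for x, y in zip(rows[r], rows[col])]
    return [bytes(row[n:]) for row in rows]

xi = [[1, 2], [3, 4]]                     # example encoding matrix (invertible)
original = [b"hi", b"yo"]
coded = [encode_block(eps, original) for eps in xi]
recovered = decode(xi, coded)             # equals original
```

If the downloaded blocks happen to be linearly dependent, `decode` raises an error and the target node would need to fetch a different block.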

Page 19: An Update Model for Network Coding in Cloud Storage Systems

19

The Update Problem

• For every single update, we must
  - transmit R·n new coded blocks from the source nodes to the storage nodes.
  - transmit K·n coded blocks from the storage nodes to the target nodes.

Page 20: An Update Model for Network Coding in Cloud Storage Systems

20

Outline

• Introduction
• Related Works
• Proposed System
• Differential Update Model
• Evaluation
• Conclusion

Page 21: An Update Model for Network Coding in Cloud Storage Systems

21

Differential Update Model (DUM)

• The authors argue that the update problem is just as essential as the repair problem.

• The authors propose DUM, which updates coded blocks by delivering only the blocks affected by an update.
  - this avoids transmitting the entire file for each update.

Page 22: An Update Model for Network Coding in Cloud Storage Systems

22

Updating Coded Blocks

• Assume that the current version number of a file is v; version v + 1 then involves arbitrary updates in n’ ≤ n blocks of the file.
  - Let B = [b1, b2, …, bn] be the original file of version v.
  - Let B’ = [b1’, b2’, …, bn’] be the updated file of version v + 1.
  - Each block bi’ in version v + 1 can be expressed as bi + δi, where δi is the differential vector.
  - Δ = [δ1, δ2, δ3, …, δn] is the differential matrix.

Page 23: An Update Model for Network Coding in Cloud Storage Systems

23

Updating Coded Blocks

• To encode a new block for version v + 1, the source node again randomly chooses a set of coding coefficients εi’ = [εi,1’, εi,2’, …, εi,n’] in the Galois field GF(2^8).
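What makes a differential update possible is that coding is linear: for any coefficient vector ε, ε · (B + Δ) = ε · B + ε · Δ, and addition in GF(2^8) is a byte-wise XOR. The sketch below checks this identity on a small example. The polynomial 0x11B, the sample data, and the coefficient values are assumptions chosen for illustration.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8); the polynomial 0x11B is an assumption."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return result

def encode_block(coeffs, blocks):
    """Return c = sum_j coeffs[j] * blocks[j], computed byte by byte."""
    coded = bytearray(len(blocks[0]))
    for eps, block in zip(coeffs, blocks):
        for k, byte in enumerate(block):
            coded[k] ^= gf_mul(eps, byte)
    return bytes(coded)

def xor_bytes(x: bytes, y: bytes) -> bytes:
    """Add two equal-length blocks in GF(2^8), i.e. XOR them byte by byte."""
    return bytes(a ^ b for a, b in zip(x, y))

old = [b"abcd", b"efgh", b"ijkl"]         # version v
new = [b"abcd", b"eXgh", b"ijkl"]         # version v + 1: only block 2 changed
delta = [xor_bytes(o, n) for o, n in zip(old, new)]   # differential matrix

coeffs = [7, 19, 200]                     # arbitrary example coefficients
c_old = encode_block(coeffs, old)
c_delta = encode_block(coeffs, delta)     # coded differential block
c_new = encode_block(coeffs, new)

# Coding the delta and adding it gives the same result as re-coding the file.
print(xor_bytes(c_old, c_delta) == c_new)  # True
```

Since XOR is its own inverse, the same operation also serves as subtraction when computing δi from bi and bi’.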

Page 24: An Update Model for Network Coding in Cloud Storage Systems

24

Updating Storage Nodes

• A significant amount of bandwidth can be saved since most updates will affect only a small portion of a file.

• Recover Δ from Δ’
  - Δ is reconstructed by inserting the zero δ-vectors into Δ’ according to the update vector u.

Page 25: An Update Model for Network Coding in Cloud Storage Systems

25

Updating Storage Nodes

• Send only the non-zero rows of Δ: Δ’ = [δ1, δ2, δ3, …, δn’]
  - Update vector uv+1 = [uv+1,1, uv+1,2, …, uv+1,n]

• Encode the matrix Δ’

• Decode the matrix Δ’
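The relation between Δ, Δ’, and the update vector can be sketched as follows. The sketch assumes that entry i of the update vector is 1 exactly when block i changed; the defining equation is missing from the transcript. It also skips the coding of Δ’ and works on the uncoded rows. The function names are hypothetical.

```python
def xor_bytes(x: bytes, y: bytes) -> bytes:
    """Add two equal-length blocks in GF(2^8), i.e. XOR them byte by byte."""
    return bytes(a ^ b for a, b in zip(x, y))

def make_update(old_blocks, new_blocks):
    """Return (u, delta_prime): u[i] = 1 if block i changed (assumed meaning),
    and delta_prime holds only the non-zero differential rows."""
    update_vector, delta_prime = [], []
    for old, new in zip(old_blocks, new_blocks):
        delta = xor_bytes(old, new)
        changed = any(delta)
        update_vector.append(1 if changed else 0)
        if changed:
            delta_prime.append(delta)
    return update_vector, delta_prime

def expand_delta(update_vector, delta_prime, block_size):
    """Rebuild the full matrix by inserting zero rows where u[i] == 0."""
    rows = iter(delta_prime)
    zero = bytes(block_size)
    return [next(rows) if flag else zero for flag in update_vector]

def apply_delta(blocks, delta):
    """Bring every block from version v to v + 1 by adding its delta row."""
    return [xor_bytes(b, d) for b, d in zip(blocks, delta)]

old = [b"abcd", b"efgh", b"ijkl"]
new = [b"abcd", b"eXgh", b"ijkl"]
u, delta_prime = make_update(old, new)    # u == [0, 1, 0], one row to send
full_delta = expand_delta(u, delta_prime, block_size=4)
updated = apply_delta(old, full_delta)    # equals new
```

In this example only one of the three rows is transmitted, along with an n-entry update vector.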

Page 26: An Update Model for Network Coding in Cloud Storage Systems

26

Updating Storage Nodes

Page 27: An Update Model for Network Coding in Cloud Storage Systems

27

Updating Target Nodes

Page 28: An Update Model for Network Coding in Cloud Storage Systems

28

Aggregating Updates Across Multiple Versions (1/4)

• Storage nodes and target nodes may not always be synchronized to the latest version.
  - they may miss several updates for various reasons.

• Assume that a node has missed m updates
  - its current version is v.
  - the actual version of the file is v + m.

Page 29: An Update Model for Network Coding in Cloud Storage Systems

29

Aggregating Updates Across Multiple Versions (2/4)

• A coded block in version v may be expressed in terms of the coded blocks of version 0 and the summation of coded δ-blocks from version 0 to version m.

Page 30: An Update Model for Network Coding in Cloud Storage Systems

30

Aggregating Updates Across Multiple Versions (3/4)

• To support such an aggregated update, the update table stores
  - the update vectors
  - the coded δ-blocks

• If a storage node misses one or more updates, it finds the first non-empty entry following the empty entries.
  - this entry holds the aggregated Δ’ containing the changes across the missed versions.

Page 31: An Update Model for Network Coding in Cloud Storage Systems

31

Aggregating Updates Across Multiple Versions (4/4)

• Computational overhead
  - generation of the aggregated update vector
  - generation of n’ aggregated coded δ-vectors
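One way to picture the aggregation is sketched below. It rests on two assumptions that the transcript does not confirm, because the slide's equations are missing: the aggregated update vector is the element-wise OR of the per-version update vectors, and the aggregated δ-vector of a block is the sum (XOR) of its per-version δ-vectors. The sketch works on uncoded rows, and `aggregate` is a hypothetical name.

```python
def xor_bytes(x: bytes, y: bytes) -> bytes:
    """Add two equal-length blocks in GF(2^8), i.e. XOR them byte by byte."""
    return bytes(a ^ b for a, b in zip(x, y))

def aggregate(updates, block_size):
    """Merge consecutive (update_vector, delta_prime) pairs into one pair.

    Assumption: aggregated u is the OR of all u's, and each aggregated
    delta row is the XOR of that block's delta rows across versions.
    """
    n = len(updates[0][0])
    agg_u = [0] * n
    total = [bytes(block_size) for _ in range(n)]
    for update_vector, delta_prime in updates:
        rows = iter(delta_prime)
        for i, flag in enumerate(update_vector):
            if flag:
                agg_u[i] = 1
                total[i] = xor_bytes(total[i], next(rows))
    agg_delta_prime = [total[i] for i in range(n) if agg_u[i]]
    return agg_u, agg_delta_prime

v0 = [b"aaaa", b"bbbb", b"cccc"]
v1 = [b"aXaa", b"bbbb", b"cccc"]          # update 1 touches block 1
v2 = [b"aXaY", b"bbbb", b"cZcc"]          # update 2 touches blocks 1 and 3

update_1 = ([1, 0, 0], [xor_bytes(v0[0], v1[0])])
update_2 = ([1, 0, 1], [xor_bytes(v1[0], v2[0]), xor_bytes(v1[2], v2[2])])

agg_u, agg_delta_prime = aggregate([update_1, update_2], block_size=4)
# agg_u == [1, 0, 1]; each aggregated row is the delta from v0 straight to v2
```

A node that missed both updates can then jump from v0 to v2 with a single aggregated update instead of replaying each version.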

Page 32: An Update Model for Network Coding in Cloud Storage Systems

32

Outline

• Introduction
• Related Works
• Proposed System
• Differential Update Model
• Evaluation
• Conclusion

Page 33: An Update Model for Network Coding in Cloud Storage Systems

33

Numerical Analysis

• The bandwidth saving in updating the storage nodes using DUM.

• The bandwidth saving in updating the target nodes using DUM.

Page 34: An Update Model for Network Coding in Cloud Storage Systems

34

Experiment Results (1/7)

• The number of blocks n should be no more than 100 to ensure that network coding operates at a rate faster than a typical transmission rate in a network.

• We compare the performance of the conventional network coding update (NC) and DUM.

Page 35: An Update Model for Network Coding in Cloud Storage Systems

35

Experiment Results (2/7)

• Bandwidth usage

Page 36: An Update Model for Network Coding in Cloud Storage Systems

36

Experiment Results (3/7)

• Bandwidth usage and computational cost

Page 37: An Update Model for Network Coding in Cloud Storage Systems

37

Experiment Results (4/7)

• Computational cost on storage nodes dominates the overall cost.

Page 38: An Update Model for Network Coding in Cloud Storage Systems

38

Experiment Results (5/7)

• Aggregated updates

Page 39: An Update Model for Network Coding in Cloud Storage Systems

39

Experiment Results (6/7)

• Update effects

Page 40: An Update Model for Network Coding in Cloud Storage Systems

40

Experiment Results (7/7)

• Simulation study

• Diff [31], bsDiff [32]

[31] J. W. Hunt and M. D. McIlroy, “An Algorithm for Differential File Comparison,” Bell Laboratories 41, Computing Science Technical Report, June 1976.
[32] C. Percival, “Matching with Mismatches and Assorted Applications,” Ph.D. dissertation, Wadham College, University of Oxford, 2006.

Page 41: An Update Model for Network Coding in Cloud Storage Systems

41

Outline

• Introduction
• Related Works
• Proposed System
• Differential Update Model
• Evaluation
• Conclusion

Page 42: An Update Model for Network Coding in Cloud Storage Systems

42

Conclusion

• DUM saves both communication and computational costs, unless the update affects almost the entire file.

• DUM conserves CPU cycles for large files and when the data is more scattered in the Cloud.

• This paper only considers the case where n’ is smaller than n; what happens if n’ is larger than n?