Recovery in Distributed Systems

Chapter 12 of Singhal, Advanced Operating Systems. Sharif University of Technology


DESCRIPTION

Recovery in distributed systems. Chapter 12 of Singhal, Advanced Operating Systems, Sharif University of Technology. Goal: return the system to its normal state. Changes made by the faulty process should be undone. Resources allocated to it should be reclaimed. - PowerPoint PPT Presentation

Citation preview

1 Recovery in Distributed Systems
Goal: return the system to its normal state.
Changes made by the faulty process are undone.
Resources allocated to it are reclaimed.

2 Failure Recovery
Forward Error Recovery
Backward Error Recovery
Performance penalty

3 Backward Error Recovery (B.E.R)
Recovery Points
Log
[Figure: system model with CPU, Stable Storage, Secondary Storage]

4 BER: Operation-Based Approach
undo
UPDATE-IN-PLACE
Log

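5 BER: Operation-Based Approach
do: performs an action and records it in the Log.
Undo: reverses the effect of a do.
Redo: repeats a do using the log.
WAL (write-ahead log):
An object is updated only after its undo log has been recorded.
Before the updates are committed, the undo log and the redo log are recorded.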
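The do, undo and redo operations and the write-ahead rule named on this slide can be illustrated with a small sketch. It is a toy in-memory model, not the book's implementation: the class name `WriteAheadStore`, its methods, and the sentinel `_MISSING` are all hypothetical, and a real system would force the log to stable storage before touching the object.

```python
_MISSING = object()  # hypothetical sentinel: "the key did not exist before"


class WriteAheadStore:
    """Toy object store with undo/redo logging (illustrative only)."""

    def __init__(self):
        self.data = {}  # the objects, updated in place
        self.log = []   # append-only records: (key, old_value, new_value)

    def do(self, key, new_value):
        # Write-ahead rule: record undo (old) and redo (new) information first...
        self.log.append((key, self.data.get(key, _MISSING), new_value))
        # ...and only then update the object in place.
        self.data[key] = new_value

    def undo(self, n=1):
        # Reverse the last n logged operations, newest first.
        for _ in range(min(n, len(self.log))):
            key, old_value, _new = self.log.pop()
            if old_value is _MISSING:
                self.data.pop(key, None)
            else:
                self.data[key] = old_value

    def redo_all(self):
        # Rebuild the state from scratch by replaying the log in order.
        self.data = {}
        for key, _old, new_value in self.log:
            self.data[key] = new_value


store = WriteAheadStore()
store.do("x", 1)
store.do("x", 2)
store.do("y", 5)
store.undo()                      # cancels the write to y
state_after_undo = dict(store.data)
store.data = {}                   # simulate losing the in-place copy
store.redo_all()                  # replay the surviving log
state_after_redo = dict(store.data)
```

Because every record holds both the old and the new value, the same log serves for undo and for redo; keeping two separate logs would work equally well.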

6 BER: 2- State-Based Approach (checkpointing)
checkpoint
rollback: the process returns to a checkpoint.
Shadow paging
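A minimal sketch of the checkpoint and rollback idea, assuming the whole process state fits in one Python object. The class `CheckpointedProcess` and its fields are hypothetical names chosen for illustration; the slides do not prescribe an implementation.

```python
import copy


class CheckpointedProcess:
    """Keeps full copies of the state so it can be restored later."""

    def __init__(self, state):
        self.state = state
        self.checkpoints = []

    def checkpoint(self):
        # Save a deep copy so later in-place changes cannot leak into it.
        self.checkpoints.append(copy.deepcopy(self.state))

    def rollback(self):
        # Restore the most recent checkpoint (it stays available for reuse).
        self.state = copy.deepcopy(self.checkpoints[-1])


proc = CheckpointedProcess({"balance": 100, "history": []})
proc.checkpoint()
proc.state["balance"] -= 30
proc.state["history"].append("withdraw 30")
proc.rollback()
restored = proc.state
```

Unlike the operation-based approach, nothing is undone step by step here: the saved state simply replaces the current one.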

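8 Domino Effect
Rollback sequence in the example: X to X3 and Y to Y2; then, because of message m, X to X2 and Z to Z2; finally X and Y to X1 and Y1, and Z to Z1.
[Figure: processes X, Y, Z with checkpoints X1, X2, X3, Y1, Y2, Z1, Z2 and message m]

9 Lost Messages
Example: processes X and Y with checkpoints X1 and Y1 and message m.
[Figure: processes X, Y with checkpoints X1, Y1 and message m]

10 Livelock
Y fails before receiving n1. When Y rolls back to Y1 it has no record of sending m1, so X must roll back to X1. After recovering, Y sends m2 and receives n1. X, resuming from X1, sends n2 and receives m2. But X has rolled back and has no record of sending n1, so Y must roll back again, and the cycle can repeat indefinitely.
[Figure: livelock example with processes X, Y, checkpoints X1, Y1, messages m1, n1, m2, n2, and a roll-back]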


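13 Koo and Toueg's Algorithm
Assumption: channels are FIFO.
Two kinds of checkpoints: permanent and tentative.

14 Koo and Toueg's Algorithm
First phase: the initiator Pi takes a tentative checkpoint (C) and asks the other processes to take tentative checkpoints. Each process tells Pi whether it succeeded. Pi then decides whether the tentative checkpoints become permanent.
Second phase: Pi sends its decision to the processes. Each process makes its tentative checkpoint permanent or discards it accordingly.

15 Koo and Toueg's Algorithm
Not every process needs to take a checkpoint (C)!
If X starts checkpointing after receiving m, the result is {X2, Y2, Z2}. The set {X2, Y2, Z1} is also consistent, so the checkpoint Z2 is unnecessary.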

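[Figure: processes X, Y, Z with checkpoints X1, X2, Y1, Y2, Z1, Z2 and message m]

16
Every message m carries a label m.l, and labels increase monotonically. ⊥ is the smallest label and ⊤ the largest.
For two processes X and Y, where the checkpoint (C) is X's last permanent or tentative one:

last-label-rcvd_X[Y] = m.l if m is the last message X received from Y after that checkpoint, ⊥ if there is no such message

first-label-sent_X[Y] = m.l if m is the first message X sent to Y after that checkpoint, ⊥ if there is no such message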

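17
When X asks Y to take a tentative checkpoint (C), it sends last-label-rcvd_X[Y] with the request. Y takes the tentative checkpoint only if

last-label-rcvd_X[Y] ≥ first-label-sent_Y[X] > ⊥

That is, X has received a message that Y sent after Y's last checkpoint. Otherwise Y does not need a new checkpoint.

ckpt-cohort_X = {Y | last-label-rcvd_X[Y] > ⊥}
X sends its checkpoint (C) request only to the processes in ckpt-cohort_X.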
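The cohort set and the label test on this slide translate almost directly into code. The sketch below is only an illustration: it assumes labels are positive integers and uses 0 to stand in for the smallest label, and the function names `ckpt_cohort` and `must_take_tentative_ckpt` are hypothetical.

```python
BOTTOM = 0  # assumption: stands in for the smallest label; real labels start at 1


def ckpt_cohort(last_label_rcvd):
    """Processes from which a message arrived since the last checkpoint."""
    return {q for q, label in last_label_rcvd.items() if label > BOTTOM}


def must_take_tentative_ckpt(last_label_rcvd_x_from_y, first_label_sent_y_to_x):
    """Y's test when X asks it to checkpoint: has X seen a post-checkpoint message of Y?"""
    return last_label_rcvd_x_from_y >= first_label_sent_y_to_x > BOTTOM


last_label_rcvd_X = {"Y": 7, "Z": BOTTOM}
cohort = ckpt_cohort(last_label_rcvd_X)

needed = must_take_tentative_ckpt(7, 5)            # X saw a message Y sent after Y's checkpoint
nothing_sent = must_take_tentative_ckpt(7, BOTTOM)  # Y sent nothing since its checkpoint
not_yet_seen = must_take_tentative_ckpt(3, 5)       # X has not received Y's newer messages
```

Only the first of the three calls requires Y to checkpoint; in the other two cases no message sent by Y after its last checkpoint has reached X.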

18 The Checkpoint Algorithm
Initial state at all processes p:
    For all processes q do first-label-sent_p[q] := ⊥;
    OK-to-take-ckpt_p := "yes" if p is willing to take a checkpoint, "no" otherwise;

At initiator process Pi:
    For all processes p ∈ ckpt-cohort_Pi do
        Send Take-a-tentative-ckpt(Pi, last-label-rcvd_Pi[p]) message;
    If all processes replied "yes" then
        For all processes p ∈ ckpt-cohort_Pi do
            Send Make-tentative-ckpt-permanent;
    else
        For all processes p ∈ ckpt-cohort_Pi do
            Send Undo-tentative-ckpt.

19 The Algorithm Continued
At all processes p:
Upon receiving Take-a-tentative-ckpt(q, last-label-rcvd_q[p]) message from q do
Begin
    If OK-to-take-ckpt_p = "yes" AND last-label-rcvd_q[p] ≥ first-label-sent_p[q] > ⊥ then
    begin
        take a tentative checkpoint;
        For all processes r ∈ ckpt-cohort_p do
            Send Take-a-tentative-ckpt(p, last-label-rcvd_p[r]) message;
        If all processes r ∈ ckpt-cohort_p replied "yes" then
            OK-to-take-ckpt_p := "yes"
        else
            OK-to-take-ckpt_p := "no"
    end;
    Send (p, OK-to-take-ckpt_p) to q;
End;

20 The Algorithm Continued
At all processes p:
Upon receiving Make-tentative-ckpt-permanent message do
Begin
    Make tentative checkpoint permanent;
    For all processes r ∈ ckpt-cohort_p do
        Send Make-tentative-ckpt-permanent message;
End;
Upon receiving Undo-tentative-ckpt message do
Begin
    Undo tentative checkpoint;
    For all processes r ∈ ckpt-cohort_p do
        Send Undo-tentative-ckpt message;
End;
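The two phases can be tried out in a small single-threaded simulation. This is a sketch under several assumptions, not the algorithm as a distributed program: messages are delivered instantly by function calls, 0 stands in for the smallest label, the second phase walks over all processes instead of propagating along cohorts, and every name (`Process`, `send`, `take_tentative`, `handle_request`, `checkpoint`) is hypothetical.

```python
BOTTOM = 0  # assumption: stands in for the smallest label; real labels start at 1


class Process:
    def __init__(self, name, willing=True):
        self.name = name
        self.willing = willing        # plays the role of OK-to-take-ckpt
        self.next_label = 0           # labels grow monotonically per sender
        self.last_label_rcvd = {}     # sender name -> label of last message received
        self.first_label_sent = {}    # receiver name -> label of first message sent
        self.tentative = False
        self.permanent_ckpts = 0


def send(procs, src, dst):
    """Deliver one labelled message from src to dst immediately."""
    s, d = procs[src], procs[dst]
    s.next_label += 1
    s.first_label_sent.setdefault(dst, s.next_label)
    d.last_label_rcvd[src] = s.next_label


def take_tentative(procs, p):
    """Phase 1 at p: checkpoint tentatively, then ask p's own cohort."""
    p.tentative = True
    cohort = {q: lbl for q, lbl in p.last_label_rcvd.items() if lbl > BOTTOM}
    replies = [handle_request(procs, procs[q], p.name, lbl) for q, lbl in cohort.items()]
    return all(replies)


def handle_request(procs, p, requester, label):
    """p's reply to Take-a-tentative-ckpt(requester, label)."""
    if not p.willing:
        return False
    if p.tentative:                   # already checkpointed in this run
        return True
    if label >= p.first_label_sent.get(requester, BOTTOM) > BOTTOM:
        return take_tentative(procs, p)
    return True                       # willing, but no checkpoint needed


def checkpoint(procs, initiator):
    """Run both phases; return True if the checkpoints became permanent."""
    p = procs[initiator]
    decision = p.willing and take_tentative(procs, p)
    for q in procs.values():          # phase 2, simplified to a plain loop
        if q.tentative:
            if decision:
                q.permanent_ckpts += 1
                q.last_label_rcvd, q.first_label_sent = {}, {}
            q.tentative = False
    return decision


procs = {n: Process(n) for n in "XYZW"}
send(procs, "Y", "X")
send(procs, "Z", "Y")
send(procs, "X", "W")
committed = checkpoint(procs, "X")
counts = {n: p.permanent_ckpts for n, p in procs.items()}

procs2 = {n: Process(n, willing=(n != "Y")) for n in "XY"}
send(procs2, "Y", "X")
aborted = checkpoint(procs2, "X")
```

In the first run W only received a message from X, so it is not in any cohort and takes no checkpoint. In the second run Y is unwilling, so X's tentative checkpoint is discarded.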

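21 Rollback-Recovery
First phase: the initiator (Pi) asks all processes whether they are willing to restart from their previous checkpoints. A process may reply "no" if it is already taking part in a checkpointing or recovery run started by another process.
Second phase: Pi sends its decision to all processes, and each process acts on it.

22 Rollback-Recovery Continued
Not every process needs to roll back: in the figure, Z need not roll back when X does.
[Figure: processes X, Y, Z with checkpoints X1, X2, Y1, Y2, Z1, Z2]

23 Rollback-Recovery Continued
Last-Label-Sent_X[Y] = the label of the last message X sent to Y before X took its latest permanent checkpoint

When X asks Y to roll back to its checkpoint (C), it sends Last-Label-Sent_X[Y] with the request. Y rolls back to its checkpoint only if
Last-Label-Rcvd_Y[X] > Last-Label-Sent_X[Y]
because in that case X's rollback undoes messages that X sent to Y.
roll-cohort_X = {Y | X can send msgs to Y}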

24 The Recovery Algorithm
Initial state at process p:
    Resume-execution_p := true;
    For all processes q do
        last-label-rcvd_p[q] := ⊤;    (⊤ is the largest value)
    Willing-to-roll_p := "yes" if p is willing to roll back, "no" otherwise;

At initiator process Pi:
    For all processes p ∈ roll-cohort_Pi do
        Send Prepare-to-rollback(Pi, last-label-sent_Pi[p]) message;
    If all processes replied "yes" then
        For all processes p ∈ roll-cohort_Pi do
            Send Roll-back message;
    else
        For all processes p ∈ roll-cohort_Pi do
            Send Donot-roll-back message;

25 The Algorithm Continued
At all processes p:
Upon receiving Prepare-to-rollback(q, last-label-sent_q[p]) message from q do
Begin
    If Willing-to-roll_p = "yes" AND last-label-rcvd_p[q] > last-label-sent_q[p] AND Resume-execution_p then
    Begin
        Resume-execution_p := false;
        For all processes r ∈ roll-cohort_p do
            Send Prepare-to-rollback(p, last-label-sent_p[r]) message;
        If all processes r ∈ roll-cohort_p replied "yes" then
            Willing-to-roll_p := "yes"
        else
            Willing-to-roll_p := "no"
    End;
    Send (p, Willing-to-roll_p) message to q;
End;

26 The Algorithm Continued
Upon receiving Roll-back message AND if Resume-execution_p = false do
Begin
    Restart from p's permanent checkpoint;
    For all processes r ∈ roll-cohort_p do
        Send Roll-back message;
End;
Upon receiving Donot-roll-back message do
Begin
    Resume execution;
    For all processes r ∈ roll-cohort_p do
        Send Donot-roll-back message;
End;
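The label test that decides which processes join a rollback can be sketched separately from the voting. The code below is an illustration only and covers just that test: it assumes every process is willing, treats a missing entry as "no message" with label 0, and uses hypothetical names (`rollback_set` and its three dictionary arguments).

```python
def rollback_set(initiator, last_label_sent, last_label_rcvd, roll_cohort):
    """Processes that restart from their permanent checkpoint.

    last_label_sent[p][q]: label of the last message p sent to q before
                           p's latest permanent checkpoint.
    last_label_rcvd[q][p]: label of the last message q received from p.
    roll_cohort[p]:        processes that p can send messages to.
    """
    rolled = {initiator}
    pending = [initiator]
    while pending:
        p = pending.pop()
        for q in roll_cohort.get(p, ()):
            if q in rolled:
                continue
            # q saw a message that p's rollback undoes, so q must roll back too.
            if last_label_rcvd[q].get(p, 0) > last_label_sent[p].get(q, 0):
                rolled.add(q)
                pending.append(q)
    return rolled


roll_cohort = {"X": ["Y", "Z"], "Y": ["X", "Z"], "Z": ["X", "Y"]}
last_label_sent = {"X": {"Y": 2, "Z": 3}, "Y": {"X": 5, "Z": 1}, "Z": {}}
last_label_rcvd = {"X": {"Y": 5}, "Y": {"X": 4}, "Z": {"X": 3, "Y": 1}}

restarting = rollback_set("X", last_label_sent, last_label_rcvd, roll_cohort)
```

Here Y received message 4 from X although X's checkpoint only covers messages up to 2, so Y joins the rollback. Z has seen nothing beyond what the checkpoints of X and Y record, so it keeps running.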

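27 Async Checkpointing & Recovery
To limit the computation that must be undone on a Rollback, a Log is kept so that the lost work can be redone.

28 Juang & Venkatesan Algorithm
Checkpointing and Log: each event is logged as the triple {s, m, msg-sent}, where s is the state before the event, m is the message that triggered it, and msg-sent is the set of messages sent during the event.
Processors are event-driven: an event fires when a message arrives. The processor handles the message, changes state, and may send messages.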

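29 Juang & Venkatesan Algorithm: Notations
RCVD_i←j(ckpt_i): the number of messages processor i has received from processor j, according to checkpoint ckpt_i.
SENT_i→j(ckpt_i): the number of messages processor i has sent to processor j, according to checkpoint ckpt_i.
After a rollback, if RCVD_j←i(ckpt_j) > SENT_i→j(ckpt_i), processor j has received orphan messages and must roll back as well.

30 Juang & Venkatesan Algorithm
Example: Y rolls back to the state at event eY1. The number of messages Y has sent to X is compared with the number X has received from Y. X, at eX2, is checked against Y in this way, and so is Z.
[Figure: processes X, Y, Z with events ex0 to ex3, ey0 to ey3, ez0 to ez3]

31
A processor that restarts after a failure informs the others that it has failed.
At processor i:
if i is recovering after a failure then ckpt_i := the latest event logged in stable storage
else ckpt_i := the latest event that took place at i (in the stable or the volatile log)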

32
for k := 1 to N do (* N is the # of processors *)
begin
    for each neighboring process j do
        Send ROLLBACK(i, SENT_i→j(ckpt_i)) msg;
    wait for ROLLBACK msg from every neighbor;
    for every ROLLBACK(j, c) received from j, i does the following:
        if RCVD_i←j(ckpt_i) > c then /* orphan messages exist */
        begin
            find the latest event e such that RCVD_i←j(e) = c;
            ckpt_i := e;
        end;
end;
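The loop above can be simulated in a few lines. This is a sketch under stated assumptions rather than the published algorithm verbatim: rounds are synchronous, every processor is a neighbor of every other, each logged event stores cumulative message counts, and the search accepts the latest event with a count less than or equal to c. The function `recover` and the example data are hypothetical.

```python
def recover(sent, rcvd, ckpt):
    """Return the final recovery point (event index) of each processor.

    sent[i][e][j]: messages i had sent to j after its event number e.
    rcvd[i][e][j]: messages i had received from j after its event number e.
    ckpt[i]:       index of the event i initially restarts from.
    """
    ckpt = dict(ckpt)
    procs = list(ckpt)
    for _ in range(len(procs)):
        # Every processor announces how many messages it has sent to each neighbor.
        announced = {(i, j): sent[i][ckpt[i]][j]
                     for i in procs for j in procs if i != j}
        for i in procs:
            for j in procs:
                if i == j:
                    continue
                c = announced[(j, i)]
                # Orphan messages: step back to the latest event consistent with c.
                while rcvd[i][ckpt[i]][j] > c:
                    ckpt[i] -= 1
    return ckpt


# Event 0 is the initial state of each processor.
sent = {
    "X": [{"Y": 0, "Z": 0}, {"Y": 0, "Z": 0}, {"Y": 0, "Z": 1}],
    "Y": [{"X": 1, "Z": 0}, {"X": 2, "Z": 0}],
    "Z": [{"X": 0, "Y": 1}, {"X": 0, "Y": 1}],
}
rcvd = {
    "X": [{"Y": 0, "Z": 0}, {"Y": 1, "Z": 0}, {"Y": 2, "Z": 0}],
    "Y": [{"X": 0, "Z": 0}, {"X": 0, "Z": 1}],
    "Z": [{"X": 0, "Y": 0}, {"X": 1, "Y": 0}],
}

# Y failed and lost its event 1; X and Z start from their latest events.
final = recover(sent, rcvd, {"X": 2, "Y": 0, "Z": 1})
```

In this made-up run X first drops the second message from Y, which Y no longer remembers sending. That removes X's own message to Z, so Z rolls back in the following round.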

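33 Example
Y fails and restarts from checkpoint Y1. Event ey2 is available in its log. X and Z take part in the recovery.
[Figure: processes X, Y, Z with events ex0 to ex3, ey0 to ey3, ez0 to ez3, checkpoints X1, Y1, Z1, and the failure of Y]

34 First iteration
X sets ckpt_X to eX3 and sends RollBack(X,2) to Y and RollBack(X,0) to Z.
Y sets ckpt_Y to eY2 and sends RollBack(Y,2) to X and RollBack(Y,1) to Z.
Z sets ckpt_Z to eZ2 and sends RollBack(Z,0) to X and RollBack(Z,1) to Y.

At X: RCVD_X←Y(ckpt_X) = 3 > 2, so ckpt_X is set to eX2, because RCVD_X←Y(eX2) = 2 ≤ 2.
At Z: RCVD_Z←Y(ckpt_Z) = 2 > 1, so ckpt_Z = eZ1, using the value received from Y.
Y does not need to roll back.

35 Second iteration
Y sends RollBack(Y,2) to X and RollBack(Y,1) to Z.
X sends RollBack(X,0) to Z and RollBack(X,1) to Y.
Z sends RollBack(Z,1) to Y and RollBack(Z,0) to X.

The ckpt values of X, Y, Z stay at eX2, eY2, eZ1. The set {eX2, eY2, eZ1} is the set of recovery points.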
