Recovery in distributed systems. Chapter 12 of Singhal's book, Advanced Operating Systems, Sharif University of Technology. Goal: restore the system to its normal state. Changes made by the failed process must be undone, and the resources allocated to it must be reclaimed.
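Recovery in Distributed Systems
Chapter 12 of Singhal's book
Advanced Operating Systems
Sharif University of Technology

Recovery in distributed systems
Goal: restore the system to its normal state.
Changes made by the failed process must be undone.
Resources allocated to the failed process must be reclaimed.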
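
Failure Recovery
Forward Error Recovery: when the error and the damage it caused can be assessed accurately, the error is removed from the process state and the process moves forward.
Backward Error Recovery: the process is restored to a previous error-free state.
Backward error recovery carries a performance penalty.

Backward Error Recovery (B.E.R)
Recovery Points: states saved during execution to which the process can later be restored.
Log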
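[Figure: system model with a CPU, stable storage, and secondary storage.]

BER approach: operation-based
All modifications to the process state are recorded in enough detail that a previous state can be restored by undoing them.
Update-in-place: every update is applied to the object itself and is also recorded as a log entry in stable storage.

Operations in the operation-based approach:
do: performs the action and writes a log record.
undo: undoes the action performed by a do.
redo: redoes the action described by the log record of a do.
WAL (write-ahead log) protocol: an object is updated only after its undo log has been recorded; before the update is committed, both the undo log and the redo log must be recorded.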
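
The do, undo, and redo operations and the write-ahead rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from the slides: the `LoggedStore` class is hypothetical, an in-memory list stands in for stable storage, and `None` is used to mean "the object did not exist yet".

```python
class LoggedStore:
    """Hypothetical update-in-place store with an undo/redo log."""

    def __init__(self):
        self.data = {}   # the objects, updated in place
        self.log = []    # stands in for the log kept in stable storage

    def do(self, name, new_value):
        old_value = self.data.get(name)
        # Write-ahead rule: the log record (old and new value) is written
        # before the object itself is touched.
        self.log.append((name, old_value, new_value))
        self.data[name] = new_value

    def undo(self):
        # Walk the log backwards and restore the old values.
        for name, old_value, _ in reversed(self.log):
            if old_value is None:
                self.data.pop(name, None)
            else:
                self.data[name] = old_value

    def redo(self):
        # Walk the log forwards and reapply the new values.
        for name, _, new_value in self.log:
            self.data[name] = new_value


store = LoggedStore()
store.do("a", 1)
store.do("a", 2)
store.do("b", 5)
store.undo()   # back to the state before the first do
store.redo()   # forward again to the logged state
```

Keeping both the old and the new value in each record is what makes the same log usable for undo and for redo.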
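
BER approach: state-based (checkpointing)
checkpoint: a saved copy of the complete process state; taking one is called checkpointing.
rollback: restoring the process to the state saved in a checkpoint.
After a failure, the process rolls back to its most recent checkpoint.
Shadow paging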
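
Checkpoint and rollback for a single process can be illustrated as follows. This is a minimal sketch, assuming the whole process state fits in one Python object and that a deep copy is an acceptable stand-in for writing the state to stable storage; the `CheckpointedProcess` class is hypothetical.

```python
import copy


class CheckpointedProcess:
    """Hypothetical process whose full state can be saved and restored."""

    def __init__(self, state):
        self.state = state
        self.saved = copy.deepcopy(state)   # initial checkpoint

    def checkpoint(self):
        # Save a complete copy of the current state.
        self.saved = copy.deepcopy(self.state)

    def rollback(self):
        # Restore the state recorded by the most recent checkpoint.
        self.state = copy.deepcopy(self.saved)


p = CheckpointedProcess({"counter": 0})
p.state["counter"] = 10
p.checkpoint()
p.state["counter"] = 99   # work done after the checkpoint
p.rollback()              # that work is lost, counter is 10 again
```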
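
Domino effect
If X fails, it rolls back to X3 and no other process is affected.
If Y fails after sending m and rolls back to Y2, the receipt of m is recorded at X but its sending is no longer recorded at Y (an orphan message!). X therefore has to roll back to X2 and Z to Z2, and in the end X, Y, and Z roll back to X1, Y1, and Z1.
[Figure: processes X, Y, Z with checkpoints X1, X2, X3, Y1, Y2, Z1, Z2 and message m.]

Lost messages
If X and Y roll back to X1 and Y1, message m is lost: its sending is already recorded at X, but its receipt is not recorded at Y.
[Figure: processes X and Y with checkpoints X1 and Y1 and message m.]

Livelock
Y fails before it receives n1. Y rolls back to Y1, where the sending of m1 is not recorded, so X has to roll back to X1.
After recovery, Y sends m2 and then receives n1, while X sends n2 and receives m2.
Because X rolled back, the sending of n1 is no longer recorded, so Y has to roll back again. That undoes the sending of m2 and forces X to roll back again, and so on.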
[Figure: livelock between X and Y, with checkpoints X1 and Y1, messages m1, n1, m2, n2, and the second rollback caused by n1.]
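
Synchronous checkpointing: Koo and Toueg
Assumptions: communication channels are FIFO.
Two kinds of checkpoints (C) are used: permanent and tentative.

Koo and Toueg: two-phase algorithm
Phase one: the initiator Pi takes a tentative checkpoint and asks the other processes to take tentative checkpoints. Each process tells Pi whether it succeeded. Pi decides to make the checkpoints permanent only if all of them succeeded.
Phase two: Pi informs the processes of its decision. Each process then either makes its tentative checkpoint permanent or discards it.

Koo and Toueg: not every process has to take a checkpoint!
Suppose X initiates checkpointing after receiving m. {X2, Y2, Z2} is a consistent set of checkpoints, but {X2, Y2, Z1} is consistent as well, so Z does not need to take a new checkpoint.
[Figure: processes X, Y, Z with checkpoints X1, X2, Y1, Y2, Z1, Z2 and message m.]

Every message m carries a label m.l. ⊥ denotes the smallest label and ⊤ the largest.
For two processes X and Y and a message m exchanged since the last checkpoint, the following are defined: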
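last-label-rcvdX[Y] = m.l if m is the last message X received from Y after X took its last permanent or tentative checkpoint; ⊥ (the smallest label) if there is no such message.

first-label-sentX[Y] = m.l if m is the first message X sent to Y after X took its last permanent or tentative checkpoint; ⊥ if there is no such message.

When X asks Y to take a tentative checkpoint, it sends last-label-rcvdX[Y] with the request. Y takes a tentative checkpoint only if
last-label-rcvdX[Y] ≥ first-label-sentY[X] > ⊥

The condition means that X's checkpoint records the receipt of a message Y sent after Y's last checkpoint, so Y has to record the sending as well. Otherwise Y does not need a checkpoint.

ckpt-cohortX = {Y | last-label-rcvdX[Y] > ⊥}
X sends checkpoint requests only to the processes in ckpt-cohortX.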
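
The cohort definition and the test applied by the receiver can be written down directly. The sketch below is an illustration only: it assumes labels are positive integers, encodes ⊥ as 0, and assumes the comparison lost from the slide is "last-label-rcvd ≥ first-label-sent > ⊥". The function names are hypothetical.

```python
BOTTOM = 0  # assumed encoding of the smallest label; real labels are 1, 2, 3, ...


def ckpt_cohort(last_label_rcvd):
    """Processes from which X received a message since X's last checkpoint.

    last_label_rcvd maps a process name to last-label-rcvd_X[that process].
    """
    return {y for y, label in last_label_rcvd.items() if label > BOTTOM}


def must_take_checkpoint(last_label_rcvd_x_y, first_label_sent_y_x):
    """Test applied by Y when X asks it to take a tentative checkpoint."""
    return last_label_rcvd_x_y >= first_label_sent_y_x > BOTTOM


# X received messages up to label 3 from Y and nothing from Z.
cohort = ckpt_cohort({"Y": 3, "Z": BOTTOM})
# Y's first message to X after Y's last checkpoint had label 2.
y_checkpoints = must_take_checkpoint(3, 2)
```

With these values the cohort contains only Y, and Y has to take a tentative checkpoint because X has recorded a message that Y sent after its own last checkpoint.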

The Checkpoint Algorithm

Initial state at all processes p:
    For all processes q do first-label-sentp[q] := ⊥;
    OK-to-take-ckptp := yes if p is willing to take a checkpoint, no otherwise;

At initiator process Pi:
    For all processes p ∈ ckpt-cohortPi do
        Send Take-a-tentative-ckpt(Pi, last-label-rcvdPi[p]) message;
    If all processes replied yes then
        For all processes p ∈ ckpt-cohortPi do
            Send Make-tentative-ckpt-permanent;
    else
        For all processes p ∈ ckpt-cohortPi do
            Send Undo-tentative-ckpt.

The Algorithm, Continued

At all processes p:
Upon receiving Take-a-tentative-ckpt(q, last-label-rcvdq[p]) message from q do
Begin
    If OK-to-take-ckptp = yes AND last-label-rcvdq[p] ≥ first-label-sentp[q] > ⊥ then
    Begin
        Take a tentative checkpoint;
        For all processes r ∈ ckpt-cohortp do
            Send Take-a-tentative-ckpt(p, last-label-rcvdp[r]) message;
        If all processes r ∈ ckpt-cohortp replied yes then
            OK-to-take-ckptp := yes
        else
            OK-to-take-ckptp := no
    End;
    Send (p, OK-to-take-ckptp) to q;
End;

The Algorithm, Continued

At all processes p:
Upon receiving Make-tentative-ckpt-permanent message do
Begin
    Make tentative checkpoint permanent;
    For all processes r ∈ ckpt-cohortp do
        Send Make-tentative-ckpt-permanent message;
End;

Upon receiving Undo-tentative-ckpt message do
Begin
    Undo tentative checkpoint;
    For all processes r ∈ ckpt-cohortp do
        Send Undo-tentative-ckpt message;
End;
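
The two phases of the checkpoint algorithm can be traced with a small single-threaded simulation. It is a sketch under several assumptions, not the slides' implementation: message passing is replaced by direct function calls, labels are positive integers with ⊥ encoded as 0, the receiver's test is taken to be "last-label-rcvd ≥ first-label-sent > ⊥", and the `procs` dictionary layout and all function names are hypothetical.

```python
BOTTOM = 0  # assumed encoding of the smallest label


def handle_request(p, q, label_from_q, procs, tentative):
    """p handles Take-a-tentative-ckpt(q, last-label-rcvd_q[p]); returns its reply."""
    if not procs[p]["willing"]:
        return False                                   # p answers "no"
    first_sent = procs[p]["first_label_sent"].get(q, BOTTOM)
    if p in tentative or not (label_from_q >= first_sent > BOTTOM):
        return True                                    # no (new) checkpoint needed
    return tentative_round(p, procs, tentative)


def tentative_round(p, procs, tentative):
    """p takes a tentative checkpoint and asks its ckpt-cohort to follow."""
    tentative.add(p)
    replies = [handle_request(r, p, label, procs, tentative)
               for r, label in procs[p]["last_label_rcvd"].items()
               if label > BOTTOM]                      # r is in ckpt-cohort_p
    return all(replies)


def run_checkpoint(initiator, procs):
    """Return the processes whose tentative checkpoints become permanent."""
    tentative = set()
    if tentative_round(initiator, procs, tentative):   # phase one
        return tentative                               # phase two: make permanent
    return set()                                       # phase two: undo


procs = {
    "X": {"willing": True, "last_label_rcvd": {"Y": 2}, "first_label_sent": {}},
    "Y": {"willing": True, "last_label_rcvd": {}, "first_label_sent": {"X": 1}},
    "Z": {"willing": True, "last_label_rcvd": {}, "first_label_sent": {}},
}
permanent = run_checkpoint("X", procs)
```

In this example X has received a message from Y only, so Z is never asked and takes no checkpoint, which is the saving the label bookkeeping is meant to provide.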
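
Rollback-Recovery
Phase one: the initiator (Pi) checks whether all processes are willing to restart from their previous checkpoints (C). A process may reply "no" if it is already taking part in a checkpointing (C) or recovery (R) run.
Phase two: Pi sends its decision to the other processes, and they act on it.

Rollback-Recovery, Continued
As with checkpointing, the number of processes that roll back can be kept to a minimum; the figure illustrates this for X and Z.
[Figure: processes X, Y, Z with checkpoints X1, X2, Y1, Y2, Z1, Z2.]

Rollback-Recovery, Continued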
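Last-Label-SentX[Y] = m.l if m is the last message X sent to Y before X took its latest permanent checkpoint.

When X asks Y to restart from its permanent checkpoint (C), it sends Last-Label-SentX[Y] with the request. Y restarts from its permanent checkpoint only if
Last-Label-RcvdY[X] > Last-Label-SentX[Y]
In that case Y has received a message from X whose sending is undone when X rolls back.

roll-cohortX = {Y | X can send msgs to Y}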

The Recovery Algorithm

Initial state at process p:
    Resume-executionp := true;
    For all processes q do last-label-rcvdp[q] := ⊤;    (⊤ is the largest label value)
    Willing-to-rollp := yes if p is willing to roll back, no otherwise;

At initiator process Pi:
    For all processes p ∈ roll-cohortPi do
        Send Prepare-to-rollback(Pi, last-label-sentPi[p]) message;
    If all processes replied yes then
        For all processes p ∈ roll-cohortPi do
            Send Roll-back message;
    else
        For all processes p ∈ roll-cohortPi do
            Send Donot-roll-back message;

The Algorithm, Continued

At all processes p:
Upon receiving Prepare-to-rollback(q, last-label-sentq[p]) message from q do
Begin
    If willing-to-rollp = yes AND last-label-rcvdp[q] > last-label-sentq[p] AND resume-executionp = true then
    Begin
        Resume-executionp := false;
        For all processes r ∈ roll-cohortp do
            Send Prepare-to-rollback(p, last-label-sentp[r]) message;
        If all processes r ∈ roll-cohortp replied yes then
            willing-to-rollp := yes
        else
            willing-to-rollp := no
    End;
    Send (p, willing-to-rollp) message to q;
End;

The Algorithm, Continued

Upon receiving Roll-back message AND if resume-executionp = false do
Begin
    Restart from p's permanent checkpoint;
    For all processes r ∈ roll-cohortp do
        Send Roll-back message;
End;

Upon receiving Donot-roll-back message do
Begin
    Resume execution;
    For all processes r ∈ roll-cohortp do
        Send Donot-roll-back message;
End;
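
The recovery algorithm has the same two-phase shape as the checkpoint algorithm, which the following simulation tries to show. It is a sketch under stated assumptions: messages are replaced by direct function calls, labels are integers, ⊤ is encoded as infinity, and the `procs` layout and function names are hypothetical rather than taken from the slides.

```python
TOP = float("inf")  # assumed encoding of the largest label


def handle_prepare(p, q, last_sent_q_p, procs, stopped):
    """p handles Prepare-to-rollback(q, last-label-sent_q[p]); returns its reply."""
    state = procs[p]
    if not state["willing"]:
        return False                                   # p answers "no"
    rcvd = state["last_label_rcvd"].get(q, TOP)        # TOP is the initial value
    if p in stopped or not (rcvd > last_sent_q_p):
        return True                                    # p keeps its current state
    return prepare_round(p, procs, stopped)


def prepare_round(p, procs, stopped):
    """p suspends execution and asks its roll-cohort to prepare as well."""
    stopped.add(p)                                     # resume-execution := false
    replies = [handle_prepare(r, p, procs[p]["last_label_sent"][r], procs, stopped)
               for r in procs[p]["roll_cohort"]]
    return all(replies)


def run_recovery(initiator, procs):
    """Return the processes that restart from their permanent checkpoints."""
    stopped = set()
    if prepare_round(initiator, procs, stopped):       # phase one
        return stopped                                 # phase two: Roll-back
    return set()                                       # phase two: Donot-roll-back


procs = {
    "X": {"willing": True, "last_label_rcvd": {},
          "last_label_sent": {"Y": 1, "Z": 2}, "roll_cohort": ["Y", "Z"]},
    "Y": {"willing": True, "last_label_rcvd": {"X": 3},
          "last_label_sent": {}, "roll_cohort": []},
    "Z": {"willing": True, "last_label_rcvd": {"X": 2},
          "last_label_sent": {}, "roll_cohort": []},
}
restarted = run_recovery("X", procs)
```

Here Y has received a message (label 3) that X sent after its permanent checkpoint, so Y rolls back with X, while Z has received nothing newer than label 2 and keeps running.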
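
Async Checkpointing & Recovery
Processes take checkpoints (C) independently, without coordinating with one another.
After a failure, a consistent set of checkpoints has to be found during recovery.
To limit how much computation is undone by a rollback, received messages are written to a log so that they can be replayed (redo) afterwards.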
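
Juang & Venkatesan: checkpointing with a log
The computation is event-driven: a process waits for a message, and the arrival of the message fires an event in which the process handles the message, changes state, and sends zero or more messages.
Each event is logged as a triplet {s, m, msg-sent}: the state s before the event, the message m that triggered it, and the set of messages sent.

Juang & Venkatesan: Notations
RCVDi←j(ckpti): the number of messages process i has received from process j, according to ckpti.
SENTi→j(ckpti): the number of messages process i has sent to process j, according to ckpti.
Processes roll back until no process has received more messages than were sent to it.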
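
Juang & Venkatesan: example
If Y rolls back to eY1, then according to that state Y has sent fewer messages to X than X has received from Y. X therefore has to roll back to eX2 to be consistent with Y. Z has to roll back for the same reason.
[Figure: processes X, Y, Z with events ex0 to ex3, ey0 to ey3, and ez0 to ez3.]

The algorithm
A processor that restarts after a failure (fail) informs the other processors.
At processor i:
    if i is recovering after a failure then ckpti := latest event recorded in the log on stable storage
    else ckpti := latest event that took place at i (in the volatile or the stable log)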

for k := 1 to N do   (* N is the number of processors *)
begin
    for each neighboring processor j do
        send ROLLBACK(i, SENTi→j(ckpti)) message;
    wait for a ROLLBACK message from every neighbor;
    for every ROLLBACK(j, c) message received from a neighbor j, i does the following:
        if RCVDi←j(ckpti) > c then   (* i has recorded more messages from j than j has sent *)
        begin
            find the latest event e such that RCVDi←j(e) = c;
            ckpti := e;
        end;
end;
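
Example
Y fails and restarts from Y1; event ey2 is in its log, so Y recovers the state of ey2. X and Z take part in the recovery.
[Figure: processes X, Y, Z with events ex0 to ex3, ey0 to ey3, ez0 to ez3, checkpoints X1, Y1, Z1, and the failure of Y.]

First iteration:
X sets ckptX := eX3 and sends RollBack(X,2) to Y and RollBack(X,0) to Z.
Y sets ckptY := eY2 and sends RollBack(Y,2) to X and RollBack(Y,1) to Z.
Z sets ckptZ := eZ2 and sends RollBack(Z,0) to X and RollBack(Z,1) to Y.

At X: RCVDX←Y(ckptX) = 3 > 2, so X sets ckptX := eX2, because RCVDX←Y(eX2) = 2 ≤ 2.
At Z: RCVDZ←Y(ckptZ) = 2 > 1, so Z sets ckptZ := eZ1.
Y does not need to roll back any further.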
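
Second iteration:
Y sends RollBack(Y,2) to X and RollBack(Y,1) to Z.
X sends RollBack(X,0) to Z and RollBack(X,1) to Y.
Z sends RollBack(Z,1) to Y and RollBack(Z,0) to X.

The ckpt values of X, Y, and Z stay at eX2, eY2, and eZ1. The set {eX2, eY2, eZ1} is the recovery line; further iterations do not change it.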
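
The example can be replayed with a short simulation of the rollback loop. This is a sketch, not the authors' code. The SENT and RCVD values at eX3, eX2, eY2, eZ2, and eZ1 are the ones quoted in the example; every other count in `LOGS` is an assumption made only to complete the logs. The sketch also assumes that every processor is a neighbor of every other, and it picks the latest event with RCVD ≤ c, which coincides with RCVD = c when the counts grow by at most one per event.

```python
def recovery_line(logs, start):
    """Run N rounds of ROLLBACK exchange and return the final ckpt per process.

    logs[p][e] holds the cumulative "sent" and "rcvd" counts of process p
    as recorded at its e-th event; start[p] is the index of p's initial ckpt.
    """
    ckpt = dict(start)
    procs = list(logs)
    for _ in range(len(procs)):
        # Each sender j tells each receiver i how many messages it has sent
        # to i according to its current ckpt: (receiver, sender, count).
        rollback_msgs = [(i, j, logs[j][ckpt[j]]["sent"].get(i, 0))
                         for j in procs for i in procs if i != j]
        for i, j, c in rollback_msgs:
            def rcvd(event):
                return logs[i][event]["rcvd"].get(j, 0)
            if rcvd(ckpt[i]) > c:
                # i has recorded more receipts from j than j has sent:
                # roll back to the latest event that is consistent with c.
                ckpt[i] = max(e for e in range(ckpt[i] + 1) if rcvd(e) <= c)
    return ckpt


LOGS = {
    "X": [{"sent": {}, "rcvd": {}},                      # ex0
          {"sent": {"Y": 1}, "rcvd": {"Y": 1}},          # ex1
          {"sent": {"Y": 1}, "rcvd": {"Y": 2}},          # ex2
          {"sent": {"Y": 2}, "rcvd": {"Y": 3}}],         # ex3
    "Y": [{"sent": {"X": 1}, "rcvd": {}},                # ey0
          {"sent": {"X": 2, "Z": 1}, "rcvd": {"X": 1}},  # ey1
          {"sent": {"X": 2, "Z": 1}, "rcvd": {"X": 1, "Z": 1}}],  # ey2
    "Z": [{"sent": {}, "rcvd": {}},                      # ez0
          {"sent": {"Y": 1}, "rcvd": {"Y": 1}},          # ez1
          {"sent": {"Y": 1}, "rcvd": {"Y": 2}}],         # ez2
}
line = recovery_line(LOGS, {"X": 3, "Y": 2, "Z": 2})
```

With these logs the loop settles on event indices 2, 2, and 1 for X, Y, and Z, that is eX2, eY2, and eZ1, matching the recovery line of the example.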