Analysis of High Availability Features in OpenDaylight


May 19, 2015, PIOLINK, Inc., SDN Development Team, Baek Seung-hoon ([email protected])


© PIOLINK, Inc. SDN No.1

Contents

▪ High Availability in OpenDaylight
▪ Raft Algorithm
▪ 2-Node Clustering
▪ Clustering Scenario
▪ Summary
▪ Reference


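High Availability in OpenDaylight

▪ ODL (OpenDaylight) supports HA through clustering
  – HA is provided by a clustering implementation built on the Akka framework, with distributed processing coordinated by the Raft algorithm
  – With only 2 nodes, HA is provided by a modified Raft algorithm (2-node clustering)

▪ HA (High Availability): providing uninterrupted service
  • When a server fails, another server takes over its work
  • While a production server is being upgraded, another server stands in for it

▪ Akka: a toolkit for building concurrent and distributed programs based on the actor model
  • The actor model is an asynchronous concurrency model in which actors share information only by exchanging messages
  • Actors have the following five properties:
    1. Actors do not share state
    2. Actors share resources only through messages (no shared memory)
    3. Actors communicate asynchronously (avoiding lock-related synchronization problems such as deadlock)
    4. An actor stores incoming messages in a queue and processes them one at a time, in order
    5. An actor is a kind of lightweight process (its resources are isolated)
  – Akka provides Java and Scala APIs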

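Properties 1 to 4 can be illustrated without Akka itself. The sketch below is a hypothetical Python asyncio analogue, not Akka's Java or Scala API: `CounterActor`, `tell`, and the `(kind, reply_to)` message tuples are illustrative assumptions. The counter is touched only by the actor's own message loop, and the caller learns its value only through a reply.

```python
import asyncio


class CounterActor:
    """Toy actor: private state, a FIFO mailbox, one message at a time."""

    def __init__(self) -> None:
        self._count = 0                       # private state, never shared directly
        self._mailbox: asyncio.Queue = asyncio.Queue()

    def tell(self, message: tuple) -> None:
        """Asynchronous send: enqueue the message and return immediately."""
        self._mailbox.put_nowait(message)

    async def run(self) -> None:
        """Process mailbox messages strictly in arrival order."""
        while True:
            kind, reply_to = await self._mailbox.get()
            if kind == "inc":
                self._count += 1
            elif kind == "get":
                reply_to.set_result(self._count)   # reply by message, not shared memory
            elif kind == "stop":
                return


async def main() -> int:
    actor = CounterActor()
    task = asyncio.create_task(actor.run())
    for _ in range(3):
        actor.tell(("inc", None))
    reply = asyncio.get_running_loop().create_future()
    actor.tell(("get", reply))
    count = await reply
    actor.tell(("stop", None))
    await task
    return count


if __name__ == "__main__":
    print(asyncio.run(main()))   # prints 3
```

Because `tell` only enqueues, the sender never blocks on the actor, and the three increments are guaranteed to be processed before the `get` that follows them.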


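Raft Algorithm

▪ The Raft algorithm
  – A consensus algorithm that provides reliability for distributed systems
  – Its primary design goal is understandability: making the algorithm easy for people to understand
  – Raft runs on any number of servers, but tolerating even one failure requires at least 3; an odd number of servers is recommended
  – Uses a simple Leader election method
  – Relies on a strong Leader
    • Only the Leader communicates with clients
    • Every server's log is synchronized to the Leader's log
  – A Raft server is always in one of three states
    • Leader, Follower, Candidate

Figure: results of the Raft user study (left: quiz scores, right: survey results)

Table: Raft server failure tolerance
Servers   Failures tolerated
1         0
2         0
3         1
4         1
5         2

ODL modifies the algorithm to support 2 nodes (2-node clustering).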

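The failure-tolerance table follows from the majority rule: a cluster of n servers needs floor(n/2) + 1 of them to agree, so it can lose the rest. The short sketch below (the function names are illustrative) reproduces the table and shows why 2 servers tolerate no failure, which is the case ODL's 2-node clustering has to work around.

```python
def majority(n: int) -> int:
    """Smallest number of servers that is a strict majority of n servers."""
    return n // 2 + 1


def failures_tolerated(n: int) -> int:
    """How many servers may fail while the rest can still form a majority."""
    return n - majority(n)          # equivalently (n - 1) // 2


if __name__ == "__main__":
    print("servers  majority  failures tolerated")
    for n in range(1, 6):
        print(f"{n:>7}  {majority(n):>8}  {failures_tolerated(n):>18}")
```

Going from 3 to 4 servers raises the majority from 2 to 3 without raising the tolerance, which is why odd cluster sizes are recommended.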

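Raft Algorithm

▪ Leader Election
  – The process of choosing a new Leader when there is no current Leader
  – If a Follower's internal timer expires without a heartbeat from the Leader, an election begins
    • When its timer expires, the Follower increments term by 1 and becomes a Candidate (the term number identifies each election)
    • The Candidate votes for itself, asks the other servers for their votes, and waits for their replies
  – The election ends in one of the following ways
    • The Candidate receives votes from a majority of servers and becomes Leader
    • Another server is elected Leader, and the Candidate returns to Follower
    • The election times out, and the Candidate increments term again and starts a new election

Figure: Raft server state transitions
  • Start → Follower
  • Follower → Candidate: heartbeat timeout, election begins
  • Candidate → Candidate: election timeout, new election
  • Candidate → Leader: votes received from a majority
  • Candidate → Follower: another Leader has been elected
  • Leader → Follower: a Leader with a higher term is discovered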

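The transitions in the figure can be written down as a small state machine. The sketch below is a hypothetical Python model, not ODL or Akka code; `RaftNode` and its handler names are illustrative, and networking, timers, and the log checks that Raft applies to vote requests are omitted.

```python
from enum import Enum


class State(Enum):
    FOLLOWER = "Follower"
    CANDIDATE = "Candidate"
    LEADER = "Leader"


class RaftNode:
    """Hypothetical model of one server's state and term transitions."""

    def __init__(self, cluster_size: int) -> None:
        self.cluster_size = cluster_size
        self.state = State.FOLLOWER          # every server starts as a Follower
        self.term = 0
        self.votes = 0

    def on_timeout(self) -> None:
        """Heartbeat timeout (Follower) or election timeout (Candidate)."""
        if self.state is not State.LEADER:
            self.term += 1                   # the new term identifies this election
            self.state = State.CANDIDATE
            self.votes = 1                   # a Candidate votes for itself

    def on_vote_granted(self, term: int) -> None:
        if self.state is State.CANDIDATE and term == self.term:
            self.votes += 1
            if self.votes > self.cluster_size // 2:   # strict majority
                self.state = State.LEADER

    def on_leader_message(self, term: int) -> None:
        """Heartbeat or other message from a (possibly new) Leader."""
        if term > self.term or (term == self.term and self.state is State.CANDIDATE):
            self.term = term
            self.state = State.FOLLOWER      # another Leader exists: step down
            self.votes = 0


if __name__ == "__main__":
    node = RaftNode(cluster_size=5)
    node.on_timeout()                        # Follower -> Candidate, term 1
    node.on_vote_granted(term=1)             # 2 of 5 votes
    node.on_vote_granted(term=1)             # 3 of 5 votes -> Leader
    print(node.state.value, node.term)       # Leader 1
    node.on_leader_message(term=2)           # higher-term Leader discovered
    print(node.state.value, node.term)       # Follower 2
```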

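Raft Algorithm

▪ Log Replication
  – Before applying a request received from a client, the Leader replicates it to the other servers
  – Once the entry has been replicated on a majority of servers, the Leader commits it
    • Committing guarantees that the entry is durably stored and hands the client request to the state machine
  – After committing, the Leader executes the requested operation through its state machine
  – Log entry = term + command + index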

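The commit rule can be sketched as a function over the Leader's per-Follower bookkeeping. The names `match_index` and `commit_index` follow the Raft paper's terminology, but the function itself is a hypothetical simplification; the current-term check reflects Raft's rule that a Leader commits by counting replicas only for entries from its own term.

```python
from typing import List


def commit_index(leader_last: int, match_index: List[int], log_terms: List[int],
                 current_term: int, current_commit: int = 0) -> int:
    """Highest log index that is stored on a majority of servers.

    leader_last:  index of the Leader's last log entry (indexes start at 1)
    match_index:  for each Follower, the highest index known to be replicated
    log_terms:    log_terms[i - 1] is the term of the Leader's entry at index i
    """
    replicated = sorted([leader_last] + match_index, reverse=True)
    majority = len(replicated) // 2 + 1
    candidate = replicated[majority - 1]     # held by at least `majority` servers
    # Only entries of the current term are committed by counting replicas.
    while candidate > current_commit and log_terms[candidate - 1] != current_term:
        candidate -= 1
    return max(candidate, current_commit)


if __name__ == "__main__":
    # Leader + 2 Followers; only one Follower has stored the entry yet.
    print(commit_index(1, [1, 0], log_terms=[1], current_term=1))
    # 5 servers: index 3 is held by the Leader and two Followers.
    print(commit_index(5, [5, 3, 2, 1], log_terms=[1, 1, 2, 2, 2], current_term=2))
```

The first call returns 1: with three servers, the Leader plus one Follower already form a majority, so the Leader does not have to wait for every Follower before committing.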


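Raft Algorithm

▪ Normal Operation
  – In normal operation only a Leader and Followers exist (no election takes place)
  – The Leader periodically sends heartbeats to all Followers
    • A Follower that receives a heartbeat resets its internal timer to a new random timeout and replies
  – Handling client requests
    • Client requests are always processed by the Leader
      – A Follower that receives a request redirects it to the Leader
    • The Leader replicates the client request to the Followers
    • Once the entry is stored on a majority of servers (the Leader counts itself), the Leader commits it
    • Committed data is processed by the state machine
    • The Leader keeps retrying until every Follower has replicated the entry

Figure: handling a client request (C = Client, L = Leader, F = Follower)
  1. C sends the request "Set 7" to L, and L forwards "Set 7" to the Fs (L: data 7, Fs: data 0)
  2. The Fs store the entry and report success (all servers: data 7)
  3. L commits the entry and sends the response to C (all servers: data 7)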

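The Follower side of normal operation (random timer reset on each heartbeat, redirecting clients to the Leader) fits in a small class. This is a hypothetical sketch: `Follower` and its methods are illustrative names, time is passed in explicitly instead of using real timers, and the 150 to 300 ms range is an assumed default taken from the example interval in the Raft paper.

```python
import random
from typing import Optional


class Follower:
    """Hypothetical Follower: randomized election timer plus client redirects."""

    def __init__(self, rng: random.Random, leader_id: Optional[str] = None,
                 min_timeout: float = 0.150, max_timeout: float = 0.300) -> None:
        self.rng = rng
        self.leader_id = leader_id
        self.min_timeout = min_timeout       # seconds (assumed range)
        self.max_timeout = max_timeout
        self.deadline = 0.0
        self.reset_timer(now=0.0)

    def reset_timer(self, now: float) -> None:
        # Drawing a fresh random timeout makes it unlikely that several
        # Followers time out and start elections at the same moment.
        self.deadline = now + self.rng.uniform(self.min_timeout, self.max_timeout)

    def on_heartbeat(self, now: float, leader_id: str) -> str:
        self.leader_id = leader_id
        self.reset_timer(now)
        return "ack"                          # reply to the Leader

    def election_due(self, now: float) -> bool:
        return now >= self.deadline           # no heartbeat in time: become Candidate

    def on_client_request(self, command: str) -> str:
        # Only the Leader processes client requests such as "Set 7".
        return f"redirect {command!r} to {self.leader_id}"


if __name__ == "__main__":
    f = Follower(random.Random(42))
    f.on_heartbeat(now=1.0, leader_id="L")
    print(f.election_due(1.1), f.election_due(1.31))   # False True
    print(f.on_client_request("Set 7"))                # redirect 'Set 7' to L
```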

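Raft Algorithm

▪ Data Recovery
  – The Leader uses nextIndex and term to check what a Follower's log contains
    • nextIndex is the index of the next log entry the Leader will send to that Follower
  – How the logs are synchronized
    • When a Follower has lost entries (a): the Leader's entries are replicated after the Follower's last stored entry
    • When a Follower holds entries that do not match the Leader's (b): all conflicting entries are deleted, then the Leader's entries are replicated

Figure: Follower log repair for cases (a) and (b); in (b) the conflicting entries are deleted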

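Cases (a) and (b) both fall out of one consistency check. The following is a simplified, single-process Python sketch assuming logs are lists of `(term, command)` tuples; `append_entries` and `repair` are hypothetical names modeled on Raft's AppendEntries RPC, and persistence, commit indexes, and networking are left out.

```python
from typing import List, Tuple

Entry = Tuple[int, str]   # (term, command); log index = list position + 1


def append_entries(log: List[Entry], prev_index: int, prev_term: int,
                   entries: List[Entry]) -> bool:
    """Follower side of a simplified AppendEntries consistency check."""
    # Reject unless the Follower holds an entry at prev_index with prev_term.
    if prev_index > len(log) or (prev_index > 0 and log[prev_index - 1][0] != prev_term):
        return False
    for offset, entry in enumerate(entries):
        pos = prev_index + offset             # 0-based list slot for this entry
        if pos < len(log) and log[pos][0] != entry[0]:
            del log[pos:]                     # case (b): drop the conflicting suffix
        if pos >= len(log):
            log.append(entry)                 # case (a): fill in missing entries
    return True


def repair(leader_log: List[Entry], follower_log: List[Entry]) -> int:
    """Leader side: decrease next_index until the Follower accepts, then copy."""
    next_index = len(leader_log) + 1
    while True:
        prev_index = next_index - 1
        prev_term = leader_log[prev_index - 1][0] if prev_index > 0 else 0
        if append_entries(follower_log, prev_index, prev_term, leader_log[prev_index:]):
            return next_index                 # first index the Follower had to receive
        next_index -= 1


if __name__ == "__main__":
    leader = [(1, "x=1"), (1, "y=2"), (2, "x=3")]
    lost = [(1, "x=1")]                                       # (a) missing entries
    stray = [(1, "x=1"), (1, "y=2"), (3, "z=9"), (3, "z=8")]  # (b) conflicting entries
    for follower in (lost, stray):
        repair(leader, follower)
        print(follower == leader)                             # True
```

Entries that already match are left untouched; only a term mismatch triggers deletion, so the Follower never discards data the Leader also holds.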

2-Node Clustering

▪ ODL plans to support HA with 2 nodes by modifying the Raft algorithm (standard Raft needs at least 3 nodes to tolerate a node failure)
  – There is strong demand for 2-node clustering such as AA (Active-Active) or AP (Active-Passive)

▪ Because HA is strongly affected by network topology, the topology in the figure below is recommended
  ✓ Primary Controller (Full Primary): Leader for all devices
  ✓ Partial Primary: Leader for one part of a partitioned network
  ✓ Configured Primary: the Primary chosen by the administrator
  ✓ Secondary Controller: backup controller
  ✓ Network Partition Detection: an external agent that detects and reports network partitions

Figure: 2-node network topology model recommended by ODL
  • Controller A (Full Primary, Configured Primary): Leader
  • Controller B (Secondary): Follower
  • Aggregation switches: Switch X, Switch Y
  • Access switches: Switch X1, Switch X2, Switch Y1, Switch Y2


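2-Node Clustering

▪ Active-Active / Active-Passive controller behavior
  – Active-Active
    • Normal operation: one controller acts as Primary
    • Primary failure or link error: the Secondary takes over as Primary
    • Full network partition: each controller independently manages its own part of the partitioned network
  – Active-Passive
    • Normal operation: one controller acts as Primary
    • Primary failure or link error: the Secondary takes over as Primary
    • Full network partition: the part of the network not connected to the Configured Primary is left unmanaged

▪ Configuration options
  – configurePrimary: selects which of the two controllers is Primary (the Configured Primary)
  – failbackToPrimary (if TRUE): the Configured Primary resumes the Primary role after it recovers
  – networkPartitionDetectionEnabled (if TRUE): the network partition state is monitored externally and reported
  – activeActiveDeployment (if TRUE): the Secondary and the Configured Primary can both act as Primary at the same time (Active-Active)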

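How `activeActiveDeployment` changes the outcome of a full partition can be modeled in a few lines. This is not ODL's configuration format or API: only the four option names come from the slide, while `TwoNodeConfig`, `managers_during_partition`, and the `SIDES` mapping (A reaches only the Switch X side, B only the Switch Y side, as in the figures) are hypothetical. The field names keep the slide's camelCase spelling rather than Python's usual snake_case.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Assumption (from the topology figures): after a full partition, Controller A
# only reaches the Switch X side and Controller B only reaches the Switch Y side.
SIDES = {"A": "Switch X", "B": "Switch Y"}


@dataclass
class TwoNodeConfig:
    """Hypothetical holder for the four options; not ODL's config format."""
    configurePrimary: str = "A"
    failbackToPrimary: bool = True
    networkPartitionDetectionEnabled: bool = True
    activeActiveDeployment: bool = False


def managers_during_partition(cfg: TwoNodeConfig) -> Dict[str, Optional[str]]:
    """Map each side of a full network partition to the controller managing it."""
    managers: Dict[str, Optional[str]] = {}
    for controller, side in SIDES.items():
        if cfg.activeActiveDeployment or controller == cfg.configurePrimary:
            managers[side] = controller       # AA, or AP's Configured Primary
        else:
            managers[side] = None             # AP: this side is left unmanaged
    return managers


if __name__ == "__main__":
    print(managers_during_partition(TwoNodeConfig(activeActiveDeployment=True)))
    print(managers_during_partition(TwoNodeConfig(activeActiveDeployment=False)))
```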


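2-Node Clustering

▪ Features modified to support 2-node clustering

▪ Leader Election
  – Raft: the Leader is chosen through an election in which a majority of servers agree
  – 2-Node Clustering: if the Leader goes down, only one controller remains, so the election step is skipped

▪ Multiple Leaders / Data Sync Rules
  – Raft: multiple Leaders are not allowed (at most one Leader per term)
  – 2-Node Clustering (condition: network partition in AA mode): each controller becomes the Leader of its part of the partitioned network; after recovery, the Secondary controller's data is overwritten with the Configured Primary's data

▪ Leader Handoff
  – Raft: the Leader does not change during normal operation
  – 2-Node Clustering (condition: failbackToPrimary is TRUE): the Configured Primary receives the Primary role when it starts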

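The Leader Handoff and Data Sync rows reduce to two small rules. The functions below are hypothetical illustrations of those rules, not ODL code, and the flow names and values in the usage lines are made-up sample data.

```python
from typing import Dict


def primary_after_restart(configured_primary: str, current_primary: str,
                          failback_to_primary: bool) -> str:
    """Leader handoff rule: who is Primary once the Configured Primary runs again."""
    return configured_primary if failback_to_primary else current_primary


def resync_after_partition(primary_data: Dict[str, int],
                           secondary_data: Dict[str, int]) -> Dict[str, int]:
    """AA data sync rule: overwrite the Secondary's data with the Configured
    Primary's. Anything written only on the Secondary side is discarded."""
    secondary_data.clear()
    secondary_data.update(primary_data)
    return secondary_data


if __name__ == "__main__":
    print(primary_after_restart("A", "B", failback_to_primary=True))    # A
    print(primary_after_restart("A", "B", failback_to_primary=False))   # B
    a_side = {"flow-x1": 1, "flow-x2": 2}
    b_side = {"flow-x1": 1, "flow-y1": 7}     # flow-y1 was added only on B's side
    print(resync_after_partition(a_side, b_side))   # {'flow-x1': 1, 'flow-x2': 2}
```

The second rule makes the trade-off visible: Active-Active keeps both sides of a partition managed, but changes made only on the Secondary's side are lost when the partition heals.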

Clustering Scenario

▪ 2-Node (AA)

Figure panels: Controllers A and B, aggregation Switches X and Y, access Switches X1, X2, Y1, Y2; Controller A is the Configured Primary.

Case 1: network partition
  – Controllers: A (Partial Primary, Leader), B (Partial Primary, Leader)
  – Failover: because of the network partition, each controller manages its own part of the network
  – Failback: Controller A manages the whole network

Case: Controller A (Secondary), Controller B (Full Primary)
  – Failover: Controller B manages the whole network
  – Failback: depends on the failbackToPrimary option
    • failbackToPrimary == TRUE: Controller A manages the whole network
    • failbackToPrimary == FALSE: Controller B manages the whole network


Clustering Scenario

▪ 2-Node (AA)

Case: both controllers shown as Leader
  – Failover: Controller A manages the whole network
  – Failback: Controller A manages the whole network

Case: Controller A (Fail-Stop), Controller B (Full Primary, Leader)
  – Failover: Controller B manages the whole network
  – Failback: depends on the failbackToPrimary option


Clustering Scenario

▪ 2-Node (AA)

Case: Controller B (Fail-Stop); Controller A is Leader
  – Failover: Controller A manages the whole network
  – Failback: Controller A manages the whole network

Case: Controller A unavailable (N/A) and the network partitioned; Controller B (Partial Primary) is Leader
  – Failover: Controller B manages only the Switch Y part of the network
  – Failback:
    1. Only Controller A recovers: management is split between the two controllers
    2. Only the partition recovers: Controller B manages the whole network
    3. Controller A and the partition recover together: depends on the failbackToPrimary option


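Clustering Scenario

▪ 2-Node (AA)

Case: Controller B (Fail-Stop) and the network partitioned; Controller A is Leader
  – Failover: Controller A manages only the Switch X part of the network
  – Failback:
    1. Only Controller B recovers: management is split between the two controllers
    2. Only the partition recovers: Controller A manages the whole network
    3. Controller B and the partition recover together: Controller A manages the whole network

Case: Controller A unavailable (N/A) and Controller B's link down
  – Failover: no device can be managed
  – Failback:
    1. Only Controller A recovers: Controller A manages the whole network
    2. Only Controller B's link recovers: Controller B manages the whole network
    3. Controller A and Controller B's link recover together: depends on the failbackToPrimary option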


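Clustering Scenario

▪ 2-Node (AA)

Case: Controller A (Primary) with its link down; Controller B unavailable (N/A)
  – Failback:
    1. Only Controller B recovers: Controller B manages the whole network
    2. Only Controller A's link recovers: Controller A manages the whole network
    3. Controller B and Controller A's link recover together: Controller A manages the whole network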


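Clustering Scenario

▪ 2-Node (AP): handled the same as the AA scenarios except for cases 1 and 6

Case 1: network partition; Controller A (Configured Primary) is Leader, Controller B remains a Candidate
  – Failover: only Controller A's part of the network is managed
  – Failback: Controller A manages the whole network

Case 6: Controller A unavailable (N/A) and the network partitioned; Controller B (Partial Primary) remains a Candidate
  – Failback:
    1. Only Controller A recovers: only Controller A's part of the network is managed
    2. Only the partition recovers: Controller B manages the whole network
    3. Controller A and the partition recover together: Controller A manages the whole network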


Clustering Scenario

▪ 3 or more nodes: Leader server failure (servers A to E)
  1. Normal operation: A is Leader (term 0); B, C, D, and E are Followers (term 0)
  2. Leader failure! A fails and stops sending heartbeats
  3. Timeout! B's timer expires, so B becomes a Candidate with term 1 and starts a Leader election (election(B)); B wins and becomes Leader with term 1
  4. Server recovery! A comes back still believing it is Leader with term 0; when it receives a heartbeat from B carrying the higher term 1, A becomes a Follower with term 1

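Step 4 relies on Raft's term rule: a message carrying a higher term makes the receiver adopt that term and become a Follower, while a message with a lower term is rejected. The minimal sketch below uses a hypothetical `Server` class with string states to replay A's recovery; it models only heartbeats, not the log or votes.

```python
from dataclasses import dataclass


@dataclass
class Server:
    """Hypothetical server holding only what the term rule needs."""
    name: str
    state: str = "Follower"
    term: int = 0

    def on_heartbeat(self, leader_term: int) -> bool:
        """Return True if the heartbeat is accepted, False if it is stale."""
        if leader_term < self.term:
            return False                      # stale Leader; the reply carries our term
        if leader_term > self.term:
            self.term = leader_term           # a newer term always wins
            self.state = "Follower"
        elif self.state == "Candidate":
            self.state = "Follower"           # same term: a Leader already exists
        return True


if __name__ == "__main__":
    a = Server("A", state="Leader", term=0)   # recovered old Leader
    b = Server("B", state="Leader", term=1)   # Leader elected while A was down
    print(b.on_heartbeat(a.term), b.state)    # False Leader
    print(a.on_heartbeat(b.term), a.state, a.term)  # True Follower 1
```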

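Clustering Scenario

▪ 3 or more nodes: when servers cannot communicate with each other (C = Client, L = Leader, F = Follower)
  1. Normal operation: one Leader and four Followers, all with term 0; clients send requests to the Leader
  2. Network partition! The Leader and one Follower are cut off from the other three Followers (all still at term 0)
  3. New Leader election in the partitioned network! The three-server side elects a new Leader with term 1; the old Leader and its Follower stay at term 0, and the old Leader cannot commit new requests because it cannot reach a majority
  4. After the network recovers, the server with the higher term remains Leader: the old Leader sees term 1, steps down to Follower, and all servers use term 1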

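Which side of a partition can keep working is decided purely by the majority rule. The hypothetical helper below assumes every server is up and belongs to exactly one side; it shows that the 3-server side in the scenario above can elect a Leader, and that a plain 2-node Raft cluster split in half leaves neither side able to, which is the gap ODL's modified 2-node clustering addresses.

```python
from typing import List


def majority_sides(side_sizes: List[int]) -> List[bool]:
    """For each side of a partition: can it elect a Leader and commit entries?"""
    cluster_size = sum(side_sizes)
    return [size > cluster_size // 2 for size in side_sizes]


if __name__ == "__main__":
    print(majority_sides([2, 3]))   # [False, True]: the scenario above
    print(majority_sides([1, 1]))   # [False, False]: plain Raft with 2 nodes
```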

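Clustering Scenario

▪ 3 or more nodes: election timeout (servers A to D)
  1. A and B time out at about the same time and both start elections for term 1 (election(A), election(B)); D is still a Follower with term 0
  2. Election timeout! The votes are split, so neither Candidate reaches a majority and no Leader is elected in term 1
  3. B's randomized timer expires first, so B starts a new election (election(B)) with term 2
  4. B receives votes from a majority and becomes Leader with term 2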

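The split vote works because each server grants at most one vote per term. The simulation below is a hypothetical sketch: `Voter` and `simulate_split_vote` are illustrative names, the order in which vote requests arrive is an assumption chosen to reproduce a 2-2 split among four servers, and Raft's check that the Candidate's log is up to date is omitted.

```python
from typing import Dict, Optional


class Voter:
    """Hypothetical voter state: at most one vote per term."""

    def __init__(self) -> None:
        self.term = 0
        self.voted_for: Optional[str] = None

    def request_vote(self, candidate: str, term: int) -> bool:
        if term < self.term:
            return False                      # request from an old term
        if term > self.term:
            self.term = term                  # new term: previous vote no longer counts
            self.voted_for = None
        if self.voted_for in (None, candidate):
            self.voted_for = candidate
            return True
        return False


def simulate_split_vote() -> Dict[str, int]:
    servers = {name: Voter() for name in "ABCD"}
    # Term 1: A and B time out together; C hears A first, D hears B first.
    requests = [("A", "A"), ("B", "B"), ("C", "A"), ("D", "B"), ("C", "B"), ("D", "A")]
    term1 = {"A": 0, "B": 0}
    for voter, candidate in requests:
        term1[candidate] += servers[voter].request_vote(candidate, 1)
    # Term 2: B's randomized election timeout fires first, so B asks everyone.
    term2_b = sum(servers[name].request_vote("B", 2) for name in "ABCD")
    return {"term1_A": term1["A"], "term1_B": term1["B"], "term2_B": term2_b}


if __name__ == "__main__":
    majority = 4 // 2 + 1
    result = simulate_split_vote()
    print(result)                              # {'term1_A': 2, 'term1_B': 2, 'term2_B': 4}
    print(result["term2_B"] >= majority)       # True
```

With four servers a majority is 3, so the 2-2 split in term 1 elects nobody; randomized timeouts make it likely that one Candidate starts the next term alone and wins.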

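Summary

▪ ODL (OpenDaylight) supports HA through clustering
  – HA is provided by a clustering implementation built on the Akka framework together with the Raft algorithm
  – With only 2 nodes, HA is provided by a modified Raft algorithm

▪ Akka is a toolkit for building concurrent and distributed programs based on the actor model
  • Actors form an asynchronous concurrent system that shares information only through messages

▪ Raft is a consensus algorithm that provides reliability for distributed systems
  – Raft's primary goal is to be easy for people to understand
  – A Raft server is in one of three states: Leader, Follower, or Candidate
  – When there is no Leader, an election chooses exactly one Leader for the term
    • The Leader periodically sends heartbeats to the Followers
    • A Follower that stops receiving heartbeats becomes a Candidate and starts an election
    • A Candidate that obtains the agreement of a majority of servers becomes Leader
  – Client requests pass through the Leader, are replicated to the other servers, and are then processed by the state machine

▪ ODL modifies the Raft algorithm to provide HA with 2 nodes
  – Multiple Leaders can exist: Active-Active
  – Configurable Leader handoff: after recovery, the Configured Primary controller becomes Leader again (when failbackToPrimary is TRUE)
  – The Leader election step is skipped: when the Leader fails, only one server remains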


Reference

▪ https://wiki.opendaylight.org/view/OpenDaylight_Controller:MD-SAL:Architecture:Clustering
▪ https://wiki.opendaylight.org/view/OpenDaylight_Controller:MD-SAL:Architecture:Clustering:2-Node
▪ https://wiki.opendaylight.org/view/OpenDaylight_Controller:MD-SAL:Architecture:Clustering:2-Node:Failure_Modes
▪ https://raftconsensus.github.io
▪ http://thesecretlivesofdata.com
▪ http://www.brocade.com
▪ https://www.inocybe.com


Thank you.

PIOLINK, Inc.
IT Castle Building 1, Suite 401, 98 Gasan digital 2-ro (550-1 Gasan-dong), Geumcheon-gu, Seoul

TEL: 02-2025-6900 FAX: 02-2025-6901 www.PIOLINK.com
