Page 1: Networking Laboratory

Networking Laboratory

Sungkyunkwan University

Copyright 2000-2016 Networking Laboratory

Recovering Device Drivers

Michael M. Swift, Muthukaruppan Annamalai, Brian N. Bershad, and Henry M. Levy

University of Washington, USA

Symposium on Operating Systems Design and Implementation, 2004

2016-05-16

Daecheon Kim, HeeJin Kim, Mincheol Cha

Page 2: Networking Laboratory

Operating Systems Design, Networking Laboratory, 2/44

Contents

¢ Introduction

¢ Shadow driver

¢ Evaluation

¢ Conclusion

Page 3: Networking Laboratory

Introduction

¢ Most system failures are caused by device driver failures.
► Reducing driver failures improves overall system reliability
¢ In existing failure-isolation systems, a driver failure leaves the kernel protected, but applications still see errors
► The failed driver is not kept in the kernel; the driver is reset to its initial state
► Application state tied to the failed driver is lost
► Running applications receive incorrect values
¢ Proposal: a mechanism that keeps attached applications unaffected even when a device driver fails
► Shadow drivers

Page 4: Networking Laboratory

Why do device drivers fail?

¢ Most device driver failures are caused by unexpected inputs or events.
► Deterministic
« Triggered by I/O requests or configuration settings
« Cannot be recovered with generic tools
► Transient
« Triggered when additional input arrives from the device
► Fail-stop
« The failed driver is stopped before it can affect the OS, the device, or applications

Page 5: Networking Laboratory

Shadow Drivers (1/6): What are shadow drivers?

¢ Shadow driver
► Uses taps
► Monitors communication between the kernel and the driver
► When a driver fails, the shadow driver temporarily takes over the failed driver's role while recovering the failed driver
¢ Three services are needed for the implementation
► Isolation service
► Redirection mechanism
► Object tracking service
→ Nooks is used to provide them

Page 6: Networking Laboratory

Shadow Drivers (2/6): Shadow drivers

¢ Passive mode
► The normal operating mode
► Monitors communication between the kernel and the device driver and stores a copy
► Records driver configuration information

Page 7: Networking Laboratory

Shadow Drivers (3/6): Shadow drivers

¢ Active mode
► The mode used when the driver has failed
► Acts as the failed driver and intercepts calls from the kernel
► Impersonates the kernel in order to recover the failed driver
« Intercepts calls from the driver so that the driver can restart
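The two modes can be illustrated with a small sketch of a tap that sits between the kernel and the driver. This is only an illustration under stated assumptions: the class names `Tap`, `ShadowDriver`, and `Driver`, and the string requests, are hypothetical and do not come from the paper or from Nooks.

```python
from enum import Enum


class Mode(Enum):
    PASSIVE = "passive"
    ACTIVE = "active"


class Driver:
    """Stand-in for a real device driver (hypothetical)."""

    def handle(self, request):
        return f"driver handled {request}"


class ShadowDriver:
    """Hypothetical shadow: observes calls in passive mode, answers them in active mode."""

    def __init__(self):
        self.observed = []   # copies of kernel-to-driver calls seen in passive mode
        self.deferred = []   # calls received while the real driver is being recovered

    def observe(self, request):
        self.observed.append(request)

    def handle(self, request):
        self.deferred.append(request)
        return f"shadow queued {request}"


class Tap:
    """Sits between kernel and driver and routes each call according to the mode."""

    def __init__(self, driver, shadow):
        self.driver = driver
        self.shadow = shadow
        self.mode = Mode.PASSIVE

    def call(self, request):
        if self.mode is Mode.PASSIVE:
            self.shadow.observe(request)        # shadow only watches
            return self.driver.handle(request)  # real driver does the work
        return self.shadow.handle(request)      # shadow stands in for the driver


tap = Tap(Driver(), ShadowDriver())
print(tap.call("read"))    # passive: served by the real driver
tap.mode = Mode.ACTIVE     # a driver failure was detected
print(tap.call("write"))   # active: served by the shadow
```

The point of the design is that the kernel keeps calling the same entry point in both modes; only the tap's routing changes.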

Page 8: Networking Laboratory

Shadow Drivers (4/6): How shadow drivers operate

¢ Passive-mode operation
► I/O requests
« Connection-oriented driver: stores the state of each active connection
« Request-oriented driver: keeps a log of queued (pending) commands
► Monitors communication between the kernel and the device driver and stores a copy
« Data exchanged in real time is not copied
► Records driver configuration and parameter values
« Only a persistent log is kept

Page 9: Networking Laboratory

Shadow Drivers (5/6): How shadow drivers operate

¢ Active-mode operation
► Stops the failed driver
► Reinitializes the driver
► Restores the driver to the state it was in before the failure
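The three recovery steps can be sketched as follows. This is a minimal illustration, assuming the shadow kept an ordered configuration log and a list of pending requests during passive mode; `RestartableDriver`, `recover`, and the sample configuration keys are hypothetical names chosen for the example.

```python
class RestartableDriver:
    """Hypothetical driver whose state can be wiped and rebuilt."""

    def __init__(self):
        self.running = False
        self.config = {}
        self.completed = []

    def stop(self):
        self.running = False

    def initialize(self):
        self.config = {}      # back to the initial state
        self.running = True

    def configure(self, key, value):
        self.config[key] = value

    def submit(self, request):
        self.completed.append(request)


def recover(driver, config_log, pending_requests):
    """Stop, reinitialize, then restore the pre-failure state from the shadow's log."""
    driver.stop()                          # 1. halt the failed driver
    driver.initialize()                    # 2. reinitialize it
    for key, value in config_log:          # 3a. replay recorded configuration in order
        driver.configure(key, value)
    for request in pending_requests:       # 3b. resubmit requests that were pending
        driver.submit(request)
    return driver


driver = recover(RestartableDriver(),
                 config_log=[("mtu", 1500), ("promiscuous", False)],
                 pending_requests=["send#7", "send#8"])
print(driver.config, driver.completed)
```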

Page 10: Networking Laboratory

Shadow Drivers (6/6): Architecture of the driver recovery system

Page 11: Networking Laboratory

Evaluation (1/4)

¢ Test environment

[Drivers used] [Applications attached to the drivers]

Page 12: Networking Laboratory

Evaluation (2/4): Performance

► Compares the overhead of the existing failure-isolation environment with that of the environment with shadow drivers added

Page 13: Networking Laboratory

Evaluation (3/4): Fault tolerance

► Compares whether attached applications also fail when the driver fails

Page 14: Networking Laboratory

Evaluation (4/4): Limitations

► Compares how often non-fail-stop failures occur

Page 15: Networking Laboratory

Conclusion (1/3)

¢ Rx
► On a software failure, rolls back to a checkpoint and re-executes
► Recovers programs from many kinds of bugs

Page 16: Networking Laboratory

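Conclusion (2/3)

¢ Limitations of shadow drivers
► Cannot detect errors in the data exchanged between the driver and the kernel
► Cannot detect incorrect parameter values
► Many wrappers are written by hand, so they may themselves contain faults
► Cannot prevent a driver from writing to arbitrary memory

→ Still useful for improving kernel reliability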

Page 17: Networking Laboratory

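Conclusion (3/3)

¢ Today...
► Even with the use of a HAL, drivers still have the highest number of faults, but the fault rate is highest in Arch
► Fault patterns shift as research progresses, so periodic source-code studies are needed to set research priorities
► Tools that can find every kind of fault that may occur still need to be developed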

Page 18: Networking Laboratory


IX : A Protected Dataplane Operating System for High Throughput and Low Latency

2016-05-16

Mincheol Cha / Daecheon Kim / HeeJin Kim

Sungkyunkwan University

Page 19: Networking Laboratory


Contents

¢ Introduction

¢ Terminology

¢ IX

¢ Evaluation

¢ Discussion & Conclusion

Page 20: Networking Laboratory

Introduction: Datacenter application challenges (1/2)

¢ Microsecond tail latency
► Needed to support rich interactions among a large number of services
► What matters is the latency distribution of RPC requests
► Ideally, each service node must keep its 99th-percentile request latency tightly bounded
¢ High packet rates
► The requests and replies that make up datacenter applications are smaller than one might expect
► An application suffers if it cannot handle large connection counts
► High packet rates must be sustained under many concurrent connections and high connection churn
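The emphasis on the 99th percentile can be made concrete with a short calculation. The sketch below assumes, purely for illustration, that a user request fans out to several service nodes whose latencies are independent and that each node exceeds its 99th-percentile latency 1% of the time; the function name and the fan-out values are hypothetical.

```python
def fraction_hitting_tail(fan_out, per_node_tail=0.01):
    """Probability that at least one of `fan_out` independent calls lands in the slow tail."""
    return 1.0 - (1.0 - per_node_tail) ** fan_out


for n in (1, 10, 100):
    print(n, round(fraction_hitting_tail(n), 3))
```

Under these assumptions the share of requests delayed by at least one slow node grows quickly with fan-out, which is why the tail of each node's latency matters more than its average.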

Page 21: Networking Laboratory

Introduction: Datacenter application challenges (2/2)

¢ Protection
► Security issues arise when a kernel-based or hypervisor-based networking stack is used
► Multiple services share a server, so isolation between applications is required
¢ Resource efficiency
► The load on datacenter applications varies considerably because of diurnal patterns and user traffic
► Each service node should meet its packet-rate and tail-latency requirements with as few resources as possible

Page 22: Networking Laboratory

Introduction: The Hardware – Mismatch between OS and hardware

¢ A wealth of hardware resources, but a poor OS
► Low latency and high packet rates are required
► An OS that can use hardware resources efficiently needs to be developed

Page 23: Networking Laboratory

Introduction: The Hardware – Alternative Approaches

¢ User-space networking stacks
► Move the networking stack into user space to reduce kernel overhead
► Optimize packet processing by avoiding kernel complexity
► Trade off packet rate against latency
¢ Alternatives to TCP
► Use UDP
► Reliable communication is then handled at user level
¢ Alternatives to the POSIX API
► MegaPipe: reduces software overhead and increases packet rate
► Still has the problems of a kernel-based networking stack
¢ OS enhancements
► Tuning kernel-based stacks is attractive in terms of ease of deployment
► Not optimized for epoll

Page 24: Networking Laboratory

Terminology: Modern system (1/3)

¢ Zero-copy API: reduces copying across context switches and sends data directly to the network

Page 25: Networking Laboratory

Terminology: Modern system (2/3)

¢ mTCP: a high-performance user-level TCP stack for multicore systems

Page 26: Networking Laboratory

Terminology: Modern system (3/3)

¢ Hierarchical ring (protection ring): protects data and functionality from faults and malicious behavior

Page 27: Networking Laboratory

IX: Design of IX

¢ Separate the control plane from the dataplane
► Control plane: resource configuration, provisioning, and so on (simple kernel mode)
► Dataplane: runs the networking stack and the application logic
¢ Run to completion (RTC) with adaptive batching
► Improves data and instruction cache locality
¢ Zero-copy API with explicit flow control
¢ Flow-consistent, synchronization-free processing
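Run to completion with adaptive batching can be sketched as a loop that takes whatever is already queued, up to a cap, and finishes processing it before looking at the queue again. The sketch assumes that "adaptive" means the batch size follows the current queue depth and that the loop never waits to fill a batch; `MAX_BATCH`, `process`, and the packet strings are illustrative names and values, not IX's actual code.

```python
from collections import deque

MAX_BATCH = 64  # assumed upper bound on batch size (illustrative value)


def process(packet):
    """Placeholder for protocol processing plus application logic."""
    return packet.upper()


def run_to_completion(rx_queue, max_batch=MAX_BATCH):
    """One loop iteration: take what is queued (up to max_batch), finish it all."""
    batch_size = min(len(rx_queue), max_batch)   # adapts to load; never waits to fill up
    batch = [rx_queue.popleft() for _ in range(batch_size)]
    return [process(p) for p in batch]           # run each packet to completion


light_load = deque(["a"])
heavy_load = deque(f"p{i}" for i in range(100))
print(len(run_to_completion(light_load)))   # batch of 1 under light load
print(len(run_to_completion(heavy_load)))   # capped at MAX_BATCH under heavy load
```

Under light load the batch degenerates to a single packet, so batching adds no waiting time; under heavy load the cap keeps the working set bounded, which is what helps cache locality.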

Page 28: Networking Laboratory

IX: IX implementation

¢ The IX dataplane operating system
► Separation of the control plane and the dataplane
► Run to completion (RTC) with adaptive batching (network stack)

[Figure labels: Dataplane, Controlplane]

Page 29: Networking Laboratory

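IX: IX Parts, Hierarchical rings

¢ Rings in IX
► VMX root, ring 0: full Linux kernel and Dune
« Plays a role similar to the hypervisor in a virtualized system
« Dune lets the dataplane run as a special-purpose OS in VMX non-root ring 0 (like a guest kernel in a virtualized system)
► VMX non-root, ring 3: applications
¢ What this arrangement provides
1. Gives the dataplane direct access to page tables, exceptions, and so on
2. Gives the dataplane direct access to the NIC
3. Provides full three-way protection among the control plane, the dataplanes, and untrusted application code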

Page 30: Networking Laboratory

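IX: IX Parts, Control plane

¢ The control plane in IX
► IX control plane = full Linux kernel + IXCP (a user-level program)
► Full Linux kernel
1. Initializes PCIe devices (LAN cards and so on)
2. Provides the basic resource-allocation mechanisms (cores, memory, network)
► IXCP
1. Monitors resource usage and dataplane behavior
2. Implements the resource-allocation policy
► Implementing the resource-allocation policy requires
* Understanding the difficult tradeoff between the dataplane and energy proportionality
* Understanding neighboring applications (whose load changes over time)
► Control plane to dataplane: resources are allocated in a coarse-grained manner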

Page 31: Networking Laboratory

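IX: IX Parts, Dataplane (1/4)

¢ Threads in the dataplane
► IX dataplane
1. High-performance network I/O
2. Runs a single application (similar to a library OS), with memory isolation
3. Provides a number of kernel-level services
► Supports a single, multithreaded application; operates as a single address-space OS
► Thread support
« Elastic threads: run the dataplane and handle network I/O
« Background threads
« Both kinds of thread can issue POSIX system calls
► Elastic threads
« Do not make blocking calls (delayed packet processing would adversely affect network behavior)
« Get exclusive use of a hardware thread or core allocated to the dataplane, for high performance with predictable latency
► Background threads
« Share an allocated hardware thread
► The number of elastic threads can be adjusted when the control plane allocates more resources (similar to an exokernel)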

Page 32: Networking Laboratory

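IX: IX Parts, Dataplane (2/4)

¢ Memory management in the dataplane
► Hot-path data objects are allocated from per-hardware-thread memory pools
► Each memory pool
1. Is organized as arrays of identically sized objects
2. Is provisioned in page-sized blocks
► Benefits
1. Lower internal memory-management complexity
2. Higher memory efficiency
* Unallocated objects are tracked in a free list, and allocation routines take objects from it directly (allocation is immediate)
* A single address space is maintained
* Kernel pages are protected by the supervisor bit
* Swappable memory is not supported (to avoid performance variability)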
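The pool layout described above can be sketched as a fixed-size object pool backed by a free list. This is a simplified illustration, assuming one pool per hardware thread so that no locking is needed; `MemoryPool`, `objects_per_block`, and the object size are hypothetical, and a real dataplane would carve objects out of page-sized blocks rather than create Python `bytearray` objects.

```python
class MemoryPool:
    """Fixed-size object pool with a free list (one pool per hardware thread is assumed)."""

    def __init__(self, object_size, objects_per_block=4):
        self.object_size = object_size
        self.objects_per_block = objects_per_block
        self.free_list = []
        self.blocks = 0

    def _grow(self):
        # Provision one more block and put all of its slots on the free list.
        self.blocks += 1
        self.free_list.extend(bytearray(self.object_size)
                              for _ in range(self.objects_per_block))

    def alloc(self):
        if not self.free_list:
            self._grow()
        return self.free_list.pop()    # O(1): no search, no size classes

    def free(self, obj):
        self.free_list.append(obj)     # O(1): the slot is immediately reusable


pool = MemoryPool(object_size=64)
a = pool.alloc()
pool.free(a)
b = pool.alloc()
print(a is b, pool.blocks)
```

Because every object in a pool has the same size, allocation and release reduce to a list pop and append, which keeps the hot path short.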

Page 33: Networking Laboratory

IX: IX Parts, Dataplane (3/4)

¢ Remaining dataplane functions
¢ Virtual address translation is managed with nested paging
► Unlike contemporary OSes, IX uses only 2 MB large pages
► Reasons:
1. Lower address translation overhead
2. Physical memory is relatively abundant in modern servers
¢ Mbuf: the buffer object used to carry packets
¢ Hierarchical timing wheel: implements timers, for example for TCP retransmission after a failure
► Supports high-resolution timeouts (improves performance under TCP incast congestion)
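The idea behind a timing wheel can be shown with a single-level version; a hierarchical wheel stacks several such wheels at coarser resolutions to cover longer timeouts. The sketch below is an assumption-laden simplification, not IX's implementation: `TimingWheel`, the slot count, and the callback are illustrative.

```python
class TimingWheel:
    """Single-level timing wheel; a hierarchical wheel stacks several of these."""

    def __init__(self, slots=8):
        self.slots = [[] for _ in range(slots)]
        self.now = 0  # current tick

    def schedule(self, delay_ticks, callback):
        if not 0 < delay_ticks < len(self.slots):
            raise ValueError("delay must fit within one rotation of the wheel")
        slot = (self.now + delay_ticks) % len(self.slots)
        self.slots[slot].append(callback)

    def tick(self):
        """Advance one tick and fire every timer that expires on it."""
        self.now += 1
        slot = self.now % len(self.slots)
        expired, self.slots[slot] = self.slots[slot], []
        for callback in expired:
            callback()


fired = []
wheel = TimingWheel()
wheel.schedule(3, lambda: fired.append("retransmit segment"))
for _ in range(3):
    wheel.tick()
print(fired)
```

Scheduling and expiry are both constant-time operations on a slot, which is what makes fine-grained timeouts affordable when many connections each hold a timer.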

Page 34: Networking Laboratory

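IX: IX Parts, Dataplane (4/4)

¢ The dataplane today
► Uses Dune and VT-x virtualization
► Can be ported to ARM, SPARC, Power, and others, since they also support virtualization
► Uses NICs based on the Intel 82599 chipset (support for additional drivers is easy to add)
► Consists of 39K SLOC
► RFC-compliant: also supports UDP, ARP, and ICMP
► Uses lwIP: originally built for embedded systems (memory-efficient), modified here for multicore scalability and fine-grained timer management
► Code breakdown
« DPDK (Intel NIC device driver): 41%
« lwIP: 26%
« Dune library: 16%
« New code: 7K SLOC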

Page 35: Networking Laboratory

IX: Structure of dataplane batched system calls

¢ The IX dataplane system call and event condition API
► Elastic threads issue batched system calls and consume event conditions
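The interaction between batched system calls and event conditions can be sketched as a loop in which the application hands the dataplane a whole list of calls in one crossing and receives a list of events back. Everything named here is hypothetical: `dataplane_run_io`, `elastic_thread_step`, and the `send`, `sent`, and `recv` labels are illustrative stand-ins, not the real IX interface.

```python
def dataplane_run_io(batched_calls):
    """Hypothetical dataplane entry point: one crossing handles a whole batch.

    Takes a list of (name, argument) system calls and returns a list of
    (name, argument) event conditions for the application to consume.
    """
    events = []
    for name, arg in batched_calls:
        if name == "send":
            events.append(("sent", arg))     # report completion of the send
    events.append(("recv", "request-2"))     # pretend a new request arrived
    return events


def elastic_thread_step(pending_calls):
    """One iteration of an elastic thread: submit a batch, react to the events."""
    events = dataplane_run_io(pending_calls)
    next_calls = []
    for name, arg in events:
        if name == "recv":
            next_calls.append(("send", f"reply to {arg}"))  # queued for the next batch
    return events, next_calls


events, next_calls = elastic_thread_step([("send", "reply to request-1")])
print(events)
print(next_calls)
```

Batching in both directions amortizes the cost of crossing between application and dataplane over many operations.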

Page 36: Networking Laboratory

Evaluation: Experimental methodology and general evaluation

¢ NetPIPE performance: varying message sizes and system software configurations
¢ Comparison of IX with Linux and mTCP

Specification
► Quanta/Cumulus 48x10GbE switch
► Mix of Xeon E5-2637
► Xeon E5-2665 server
► 256 GB of DRAM
► 8 cores and 16 hyperthreads
► Intel x520 10GbE NICs

Page 37: Networking Laboratory

Evaluation: Latency and Single-flow Bandwidth (1/4)

¢ Multicore Scalability for Short Connections

Page 38: Networking Laboratory

Evaluation: Latency and Single-flow Bandwidth (2/4)

¢ N round-trips per connection

Page 39: Networking Laboratory

Evaluation: Latency and Single-flow Bandwidth (3/4)

¢ Different message sizes s (n=1)

Page 40: Networking Laboratory

Evaluation: Latency and Single-flow Bandwidth (4/4)

¢ Connection scalability

Page 41: Networking Laboratory

Evaluation: Throughput and Scalability

¢ Two memcached workloads

Page 42: Networking Laboratory

Discussion & Conclusion: Discussion

¢ What makes IX fast?

¢ Difficulties of adaptive batching

¢ Limitations of the current prototype

Page 43: Networking Laboratory

Discussion & Conclusion: Discussion

¢ IX vs Arrakis
► Arrakis: like IX, builds a per-application network stack (but does away with batched handling in normal operation)

[Figure labels: IX, Arrakis]

Page 44: Networking Laboratory

Discussion & Conclusion: Conclusion

¢ Is IX really a good OS?

¢ Resource-constrained issues

¢ Potential uses of IX

