Networking Laboratory
Sungkyunkwan University
Copyright 2000-2016 Networking Laboratory
Recovering Device Drivers
Michael M. Swift, Muthukaruppan Annamalai, Brian N. Bershad, and Henry M. Levy
University of Washington, USA
Symposium on Operating Systems Design and Implementation (OSDI), 2004
2016-05-16
Daecheon Kim, HeeJin Kim, Mincheol Cha
Operating Systems Design, Networking Laboratory 2/44
Contents
¢ Introduction
¢ Shadow driver
¢ Evaluation
¢ Conclusion
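Introduction
¢ Device driver failures are the cause of most system failures.
► Reducing driver failures increases overall system reliability
¢ In existing failure-isolation systems, the kernel is protected when a driver fails, but applications still see errors
► The failed driver is not restored as it was; it is restarted in its initial state
► Application state tied to the failed driver is lost
► Running applications may receive incorrect values
¢ Proposal: a mechanism that keeps applications using a device driver unaffected even when the driver fails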
► Shadow Drivers
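Why Do Device Drivers Fail?
¢ Most device driver failures are caused by unexpected inputs or events.
► Deterministic
« Triggered by particular I/O requests or configuration settings
« Cannot be recovered by generic tools
► Transient
« Occur when additional input arrives from the device
► Fail-stop
« The failed driver is stopped before it can affect the OS, the device, or applications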
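Shadow Drivers (1/6): What Is a Shadow Driver?
¢ Shadow driver
► Uses taps
► Monitors communication between the kernel and the driver
► When a driver fails, the shadow driver temporarily stands in for the failed driver while recovering it
¢ Three services are required for the implementation
► Isolation service
► Redirection mechanism
► Object tracking service
→ All three are provided by Nooks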
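The tap-and-redirect idea can be sketched in a few lines of Python. This is a minimal, hypothetical model and not the Nooks or shadow-driver code, which runs in C inside the Linux kernel. A tap sits on a kernel-to-driver call. In passive mode it forwards the call to the driver and lets the shadow observe the reply. Once the driver has failed, it sends the call to the shadow instead. The class names, the `DriverFault` exception used as the failure signal, and the method names are all assumptions made for illustration.

```python
class DriverFault(Exception):
    """Hypothetical failure signal (stands in for Nooks fault isolation)."""


class Driver:
    """Toy driver: fails on a 'bad' request to simulate a fail-stop fault."""
    def handle(self, request):
        if request == "bad":
            raise DriverFault(request)
        return f"driver:{request}"


class ShadowDriver:
    """Toy shadow: observes calls in passive mode, answers them in active mode."""
    def __init__(self):
        self.observed = []

    def observe(self, request, reply):
        self.observed.append((request, reply))

    def handle(self, request):
        # While recovery is in progress, the shadow answers on the driver's behalf.
        return f"shadow:{request}"


class Tap:
    """Interposes on kernel->driver calls and redirects them by mode."""
    def __init__(self, driver, shadow):
        self.driver = driver
        self.shadow = shadow
        self.active = False  # False = passive mode, True = active mode

    def call(self, request):
        if self.active:
            return self.shadow.handle(request)
        try:
            reply = self.driver.handle(request)
        except DriverFault:
            # Failure detected: switch to active mode and let the shadow respond.
            self.active = True
            return self.shadow.handle(request)
        self.shadow.observe(request, reply)
        return reply


tap = Tap(Driver(), ShadowDriver())
print(tap.call("read"))   # driver:read  (passive mode)
print(tap.call("bad"))    # shadow:bad   (failure, switch to active mode)
print(tap.call("write"))  # shadow:write (still active)
```

The kernel only ever talks to the tap. Swapping the target between driver and shadow is therefore invisible to the caller, which is what keeps applications unaffected during recovery.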
Shadow Drivers (2/6): Shadow Drivers
¢ Passive mode
► Normal operating mode
► Monitors communication between the kernel and the device driver and stores copies
► Records driver configuration information
Shadow Drivers (3/6): Shadow Drivers
¢ Active mode
► Mode used when the driver fails
► Stands in for the failed driver by intercepting calls from the kernel
► Impersonates the kernel to recover the failed driver
« Intercepts calls from the driver so that the driver can restart
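Shadow Drivers (4/6): Shadow Driver Operation
¢ Passive-mode operation
► I/O requests
« Connection-oriented drivers: the state of each active connection is stored
« Request-oriented drivers: a log of pending incoming requests is stored
► Monitors kernel-driver communication and stores copies
« Data exchanged in real time is not copied
► Records driver configuration and argument values
« Only persistent state is logged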
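The logging rules above can be sketched as a small Python class. This is a hypothetical model and not the paper's implementation. The method names such as `on_submit` and `on_complete` are assumptions, and the real shadow drivers are written in C, one per driver class.

```python
class ShadowLog:
    """Hypothetical passive-mode log kept by a shadow driver.

    It keeps only the state needed to rebuild the driver: persistent
    configuration, open connections, and requests that have not completed.
    Data buffers passing through the driver are deliberately not copied.
    """

    def __init__(self):
        self.config = {}          # latest value of each persistent setting
        self.connections = set()  # connection-oriented drivers: open connections
        self.pending = {}         # request-oriented drivers: id -> request args

    def on_configure(self, key, value):
        self.config[key] = value  # a later call overrides an earlier one

    def on_open(self, conn_id):
        self.connections.add(conn_id)

    def on_close(self, conn_id):
        self.connections.discard(conn_id)

    def on_submit(self, req_id, args):
        self.pending[req_id] = args

    def on_complete(self, req_id):
        self.pending.pop(req_id, None)  # completed requests need no replay

    def on_data(self, buffer):
        pass  # data in flight is not copied into the log


log = ShadowLog()
log.on_configure("mtu", 1500)
log.on_configure("mtu", 9000)
log.on_open("sock-1")
log.on_open("sock-2")
log.on_close("sock-1")
log.on_submit(7, ("read", 4096))
log.on_submit(8, ("write", 512))
log.on_complete(7)
log.on_data(b"payload")
print(log.config, log.connections, log.pending)
# {'mtu': 9000} {'sock-2'} {8: ('write', 512)}
```

The log stays small because entries are dropped as soon as they stop mattering: closed connections, completed requests, and overridden settings.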
Shadow Drivers (5/6): Shadow Driver Operation
¢ Active-mode operation
► Stop the failed driver
► Reinitialize the driver
► Restore the driver to its state before the failure
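The three active-mode steps can be sketched as a replay of the passive-mode log. This is a minimal Python model under stated assumptions. `ToyDriver` and `recover` are hypothetical names, and the log is passed as plain dicts and sets. The replay order (configuration, then connections, then requests, each sorted) is also an assumption. The real system performs these steps against kernel driver interfaces.

```python
class ToyDriver:
    """Hypothetical driver whose state is lost whenever it is restarted."""
    def __init__(self):
        self.config = {}
        self.connections = set()
        self.queue = []

    def configure(self, key, value):
        self.config[key] = value

    def open(self, conn_id):
        self.connections.add(conn_id)

    def submit(self, req_id, args):
        self.queue.append((req_id, args))


def recover(config, connections, pending):
    """Sketch of active-mode recovery from a shadow driver's log.

    Step 1 (stopping the failed driver) is assumed to be done already by
    the isolation layer; this function performs steps 2 and 3.
    """
    driver = ToyDriver()                    # step 2: fresh instance, initial state
    for key, value in config.items():       # step 3: restore persistent configuration
        driver.configure(key, value)
    for conn_id in sorted(connections):     # reopen live connections
        driver.open(conn_id)
    for req_id, args in sorted(pending.items()):  # resubmit unfinished requests
        driver.submit(req_id, args)
    return driver


new = recover({"mtu": 9000}, {"sock-2"}, {8: ("write", 512)})
print(new.config, new.connections, new.queue)
# {'mtu': 9000} {'sock-2'} [(8, ('write', 512))]
```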
Shadow Drivers (6/6): Driver Recovery System Architecture
Evaluation (1/4)
¢ Test environment
[Table: drivers used, and the applications that use each driver]
Evaluation (2/4): Performance
► Compares the overhead of the existing failure-isolation setup with that of the same setup plus shadow drivers
Evaluation (3/4): Fault Tolerance
► Checks whether applications using a driver also fail when the driver fails
Evaluation (4/4): Limitations
► Measures how often non-fail-stop failures occur
Conclusion (1/3)
¢ Rx
► On a software failure, rolls back to a checkpoint and re-executes
► Recovers programs from many kinds of bugs
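The checkpoint-and-retry loop can be sketched as follows. This is a hypothetical Python illustration only. Rx itself checkpoints a running process and changes the execution environment before re-executing (for example, its memory allocation behavior), so that the same bug is less likely to recur. Here the callable `step` receives the attempt number so a caller could vary its behavior per retry. The names `run_with_rollback` and `flaky_step` are assumptions.

```python
import copy


def run_with_rollback(state, step, max_retries=3):
    """Checkpoint `state`, run `step`, and roll back and re-execute on failure.

    The checkpoint is a deep copy; every attempt starts from a fresh copy of
    it, so a failed attempt leaves no partial updates behind.
    """
    checkpoint = copy.deepcopy(state)
    for attempt in range(max_retries + 1):
        working = copy.deepcopy(checkpoint)  # roll back to the checkpoint
        try:
            step(working, attempt)
            return working
        except Exception:
            continue  # discard partial updates and re-execute
    raise RuntimeError("all re-executions failed")


def flaky_step(s, attempt):
    s["log"].append(attempt)
    if attempt < 2:
        raise ValueError("transient failure")
    s["done"] = True


result = run_with_rollback({"log": []}, flaky_step)
print(result)  # {'log': [2], 'done': True}
```

Only the successful attempt's updates survive, because each failed attempt worked on a copy that is thrown away.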
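Conclusion (2/3)
¢ Limitations of shadow drivers
► Cannot detect errors in the data exchanged between the driver and the kernel
► Cannot detect incorrect argument values
► Many wrappers are written by hand, so they may themselves contain faults
► Cannot stop a driver from writing to arbitrary memory
→ Still, useful for improving kernel reliability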
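Conclusion (3/3)
¢ Today...
► Although a HAL is now used, drivers still contain the largest number of faults, but arch code has the highest fault rate
► Fault patterns change as research progresses, so the source code should be studied regularly to set research priorities
► Tools that can find every kind of fault that may occur are needed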
IX : A Protected Dataplane Operating System for High Throughput and Low Latency
2016-05-16
Mincheol Cha / Daecheon Kim / HeeJin Kim
Sungkyunkwan University
Contents
¢ Introduction
¢ Terminology
¢ IX
¢ Evaluation
¢ Discussion & Conclusion
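Introduction: Problems of Datacenter Applications (1/2)
¢ Microsecond tail latency
► Needed for rich interactions among large numbers of services
► Latency distribution of RPC requests
→ Ideally, each service node keeps a tight bound on its 99th-percentile request latency
¢ High packet rates
► The requests and replies that make up datacenter applications are often small
► Failing to handle large connection counts can hurt the application
→ High packet rates must be sustained despite large numbers of concurrent connections and high connection churn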
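Tail latency is usually reported as a high percentile of the latency distribution. The short Python sketch below uses the nearest-rank definition, one of several common percentile definitions, and made-up sample values. It shows why the 99th percentile can sit far from the median: two slow requests out of a hundred leave the median unchanged but set the tail.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need samples and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical latencies in microseconds: 98 fast requests and 2 slow ones.
latencies_us = [20] * 98 + [500, 900]
print(percentile(latencies_us, 50))  # 20
print(percentile(latencies_us, 99))  # 500
```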
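Introduction: Problems of Datacenter Applications (2/2)
¢ Protection
► Protection concerns arise when kernel-based or hypervisor-based networking stacks are used
→ Multiple services sharing a server need isolation between applications
¢ Resource efficiency
► The load of datacenter applications varies substantially with diurnal patterns and user traffic
→ Each service node should meet its packet-rate and tail-latency requirements with as few resources as possible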
Introduction: The Hardware/OS Mismatch
¢ Abundant hardware resources, but a poorly matched OS
► Low latency and high packet rates are required
→ An OS that can use hardware resources efficiently is needed
Introduction: Alternative Approaches
¢ User-space networking stacks
► Move the networking stack into user space to reduce kernel overhead
► Optimize packet processing by avoiding kernel complexity
→ Trade off packet rate against latency
¢ Alternatives to TCP
► Use UDP
→ Reliable delivery is handled at user level
¢ Alternatives to the POSIX API
► MegaPipe: lower software overhead and higher packet rates
→ Still inherits the problems of kernel-based networking stacks
¢ OS enhancements
► Tuning kernel-based stacks is easy to deploy
→ Not optimized for epoll
Terminology: Modern Systems (1/3)
¢ Zero-copy API: avoids copying data across the user/kernel boundary and sends data directly to the network
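The difference between a copying interface and a zero-copy interface can be shown in user-space Python with `memoryview`. This is only an analogy under stated assumptions. `rx_buffer` is a hypothetical stand-in for a packet buffer. A real zero-copy network API maps NIC buffers into the application; it does not share a Python object.

```python
# Hypothetical receive buffer standing in for a NIC packet buffer.
rx_buffer = bytearray(b"HDR:hello world")

# Copying API: the application gets its own copy of the payload.
copied = bytes(rx_buffer[4:])

# Zero-copy style: the application gets a read-only view into the same memory.
view = memoryview(rx_buffer)[4:].toreadonly()

rx_buffer[4:9] = b"HELLO"  # the "stack" reuses the buffer in place (same size)
print(bytes(view))         # b'HELLO world'  (the view sees the shared memory)
print(copied)              # b'hello world'  (the copy is unaffected)
```

The shared view is also why a zero-copy API needs a rule for when the stack may reuse a buffer. IX pairs its zero-copy API with explicit flow control.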
Terminology: Modern Systems (2/3)
¢ mTCP: a high-performance user-level TCP stack for multicore systems
Terminology: Modern Systems (3/3)
¢ Hierarchical rings (protection rings): protect data and functionality from faults and malicious behavior
IX: Design of IX
¢ Separate the control plane from the dataplane
► Control plane: resource configuration, provisioning, and so on (runs in kernel mode)
► Dataplane: runs the networking stack and application logic
¢ Run to completion (RTC) with adaptive batching
► Improves data and instruction cache locality
¢ Zero-copy API enabled by explicit flow control
¢ Flow-consistent, synchronization-free processing
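Flow-consistent processing can be sketched as hashing each flow's addresses and ports to a receive queue, with one queue per elastic thread. This is a minimal Python sketch under stated assumptions. `queue_for_flow` is a hypothetical name, and IPv4 addresses are passed as integers. CRC32 replaces the Toeplitz hash that NIC receive-side scaling (RSS) actually uses.

```python
import struct
import zlib


def queue_for_flow(src_ip, dst_ip, src_port, dst_port, n_queues):
    """Map a TCP/UDP 4-tuple to a receive queue.

    The mapping depends only on the flow's tuple, so every packet of a flow
    lands on the same queue (and core), and per-flow state needs no locks.
    """
    key = struct.pack("!IIHH", src_ip, dst_ip, src_port, dst_port)
    return zlib.crc32(key) % n_queues


flow = (0x0A000001, 0x0A000002, 40000, 11211)  # 10.0.0.1:40000 -> 10.0.0.2:11211
queues = {queue_for_flow(*flow, n_queues=4) for _ in range(1000)}
print(len(queues))  # 1: all packets of the flow go to one queue
```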
IX: IX Implementation
¢ The IX dataplane operating system
► Control plane and dataplane are separated
► Run to completion with adaptive batching (network stack)
[Figure: dataplane and control plane]
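Adaptive batching can be sketched as a loop that drains only what is already queued, up to a bound, and runs each packet to completion. This Python sketch is hypothetical. `rtc_iteration`, the `deque` receive queue, and the bound of 64 are illustration choices, not IX's code.

```python
from collections import deque


def rtc_iteration(rx_queue, max_batch, process):
    """One run-to-completion iteration with adaptive batching.

    Take whatever is already queued, up to `max_batch`, and run each packet
    to completion via `process` before returning. The loop never waits for
    a batch to fill, so batches stay small under light load (low latency)
    and grow toward `max_batch` under heavy load (higher throughput).
    """
    batch = []
    while rx_queue and len(batch) < max_batch:
        batch.append(rx_queue.popleft())
    for packet in batch:
        process(packet)
    return len(batch)


done = []
light = deque(range(2))    # light load: only 2 packets waiting
heavy = deque(range(200))  # heavy load: 200 packets waiting
print(rtc_iteration(light, 64, done.append))  # 2
print(rtc_iteration(heavy, 64, done.append))  # 64
```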
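IX: IX Parts, Hierarchical Rings
¢ Rings in IX
► Full Linux kernel with Dune: VMX root ring 0
« Similar to the hypervisor in a virtualized system
► Dataplane: VMX non-root ring 0
« Dune lets the dataplane run here as a special-purpose OS (like a guest kernel in a virtualized system)
► Applications: VMX non-root ring 3
► What Dune provides
« 1. Direct access for the dataplane to page tables, exceptions, etc.
« 2. Direct access from the dataplane to the NIC
« 3. Full three-way protection among the control plane, the dataplanes, and untrusted application code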
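IX: IX Parts, Control Plane
¢ The control plane in IX
► Full Linux kernel
« 1. Initializes PCIe devices (LAN cards, etc.)
« 2. Provides the basic resource-allocation mechanisms (cores, memory, network)
► IXCP (user-level program)
« 1. Monitors resource usage and dataplane performance
« 2. Implements resource-allocation policies
► Implementing resource-allocation policies requires understanding
« The difficult trade-off between dataplane performance and energy proportionality
« Co-located applications (whose load varies over time)
► The control plane allocates resources to the dataplane in a coarse-grained manner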
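The kind of coarse-grained decision the control plane makes can be sketched as a core-count policy. The function below is a hypothetical illustration, not IXCP's actual policy. The per-core capacity, the 80% headroom, and the core bounds are all assumed parameters.

```python
import math


def cores_for_load(load_pps, per_core_pps, min_cores=1, max_cores=8, headroom=0.8):
    """Hypothetical coarse-grained policy: how many cores to give a dataplane.

    Each core is planned to run at most `headroom` of its capacity, which
    leaves slack for tail latency. Cores beyond the returned count could be
    handed to co-located applications or idled to save energy.
    """
    usable = per_core_pps * headroom
    needed = math.ceil(load_pps / usable)
    return max(min_cores, min(max_cores, needed))


for load in (100_000, 1_500_000, 20_000_000):
    print(load, cores_for_load(load, per_core_pps=1_000_000))
```

Raising `headroom` packs more load onto each core and improves energy proportionality, at the cost of tail latency. This is the trade-off the control plane has to balance.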
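IX: IX Parts, Dataplane (1/4)
¢ Threads in the dataplane
► IX dataplane
« 1. High-performance network I/O
« 2. Runs a single application (similar to a library OS), with memory isolation
« 3. Provides a number of kernel-level services
► A single-address-space OS that supports single-threaded and multithreaded applications
► Thread support
« Elastic threads: run the dataplane and consume network I/O
« Background threads
« Both kinds of thread can issue POSIX system calls
► Elastic threads never make blocking calls (delayed packet processing would hurt network behavior)
► Elastic threads have exclusive use of the hardware threads or cores allocated to the dataplane, to achieve high performance with predictable latency
► Background threads share the allocated hardware threads
► When the control plane allocates more resources, the number of elastic threads can be adjusted (similar to an exokernel)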
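IX: IX Parts, Dataplane (2/4)
¢ Memory management in the dataplane
► Each hardware thread allocates data objects from memory pools
► Used on the hot path
« 1. Each pool is an array of equally sized objects
« 2. Pools are provisioned in page-sized blocks
► Benefits
« 1. Lower internal memory-management complexity
« 2. Better memory efficiency
► Free (unallocated) objects are tracked in a list, so allocation routines can hand out a free object immediately
► A single address space is maintained
► Kernel pages are protected by the supervisor bit
► Swappable memory is not supported (to avoid performance variability)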
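A pool of the kind described above can be sketched in Python. This is an illustration under stated assumptions, not IX's allocator, which is written in C. The 4 KiB page size, the `(block, slot)` handles, and the class and method names are hypothetical. Giving each hardware thread its own pool avoids sharing pool state across threads.

```python
PAGE_SIZE = 4096  # assumed page size for provisioning blocks


class ObjectPool:
    """Fixed-size object pool with a free list (hypothetical sketch).

    Memory is provisioned in page-sized blocks, each split into equally
    sized slots. Free slots are kept in a list, so allocation and release
    are constant-time pops and appends with no searching.
    """

    def __init__(self, obj_size):
        if not 0 < obj_size <= PAGE_SIZE:
            raise ValueError("object must fit in one page")
        self.obj_size = obj_size
        self.per_block = PAGE_SIZE // obj_size
        self.blocks = []  # each block models one page of memory
        self.free = []    # (block index, slot index) of free objects

    def _grow(self):
        block_id = len(self.blocks)
        self.blocks.append(bytearray(PAGE_SIZE))
        # Push in reverse so that slot 0 is handed out first.
        self.free.extend((block_id, slot) for slot in reversed(range(self.per_block)))

    def alloc(self):
        if not self.free:
            self._grow()
        return self.free.pop()

    def release(self, handle):
        self.free.append(handle)


pool = ObjectPool(obj_size=2048)  # 2 objects per 4 KiB block
a, b, c = pool.alloc(), pool.alloc(), pool.alloc()
print(a, b, c, len(pool.blocks))  # (0, 0) (0, 1) (1, 0) 2
```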
IX: IX Parts, Dataplane (3/4)
¢ Remaining dataplane functions
¢ Virtual address translation management, using nested paging
► Like contemporary OSes, uses 2 MB large pages
► Benefits
« 1. Lower address-translation overhead
« 2. Physical memory is relatively abundant in modern servers
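The translation-overhead benefit of 2 MB pages comes down to simple arithmetic. The Python below assumes a 64-entry TLB purely for illustration; real TLB sizes vary by CPU and are often different for large pages. It compares how much memory a full TLB covers, and how many leaf page-table entries are needed to map 1 GiB.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

TLB_ENTRIES = 64  # assumed TLB size, for illustration only


def tlb_reach(page_size):
    """Bytes of memory covered by a full TLB at a given page size."""
    return TLB_ENTRIES * page_size


def pages_to_map(region, page_size):
    """Number of leaf page-table entries needed to map `region` bytes."""
    return -(-region // page_size)  # ceiling division


for name, size in (("4 KiB", 4 * KIB), ("2 MiB", 2 * MIB)):
    print(name, tlb_reach(size) // KIB, "KiB reach,", pages_to_map(GIB, size), "pages per GiB")
# 4 KiB 256 KiB reach, 262144 pages per GiB
# 2 MiB 131072 KiB reach, 512 pages per GiB
```

With 2 MB pages the same TLB covers 512 times more memory, which cuts TLB misses and page walks.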
¢ Mbufs: used to carry packets
¢ Hierarchical timing wheel: implements timers such as TCP retransmission timeouts
► Supports high-resolution timeouts (improves performance under TCP incast congestion)
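A hierarchical timing wheel keeps timer insertion and expiry cheap. It hashes timers into buckets by expiry time and uses coarser wheels for far-off deadlines. The Python below is a minimal two-level sketch under stated assumptions: integer ticks, delays shorter than `slots * slots` ticks, and hypothetical class and method names. It does not reproduce IX's C implementation or its tick resolution.

```python
class TimingWheel:
    """Two-level hierarchical timing wheel (hypothetical sketch).

    Level 0 has `slots` buckets of 1 tick each; level 1 has `slots` buckets
    of `slots` ticks each. Each timer costs constant work to add, plus at
    most one cascade from level 1 down to level 0 before it fires.
    """

    def __init__(self, slots=64):
        self.slots = slots
        self.now = 0
        self.level0 = [[] for _ in range(slots)]
        self.level1 = [[] for _ in range(slots)]

    def add(self, delay, callback):
        if not 0 < delay < self.slots * self.slots:
            raise ValueError("delay out of range for two levels")
        self._place(self.now + delay, callback)

    def _place(self, expires, callback):
        if expires - self.now < self.slots:
            self.level0[expires % self.slots].append((expires, callback))
        else:
            self.level1[(expires // self.slots) % self.slots].append((expires, callback))

    def tick(self):
        """Advance time by one tick and run every timer that expires now."""
        self.now += 1
        if self.now % self.slots == 0:
            # New level-0 rotation: move the matching coarse bucket down to level 0.
            idx = (self.now // self.slots) % self.slots
            bucket, self.level1[idx] = self.level1[idx], []
            for expires, callback in bucket:
                self._place(expires, callback)
        idx = self.now % self.slots
        due, self.level0[idx] = self.level0[idx], []
        for expires, callback in due:
            callback(expires)


fired = []
wheel = TimingWheel(slots=8)
wheel.add(3, fired.append)   # fits in level 0
wheel.add(20, fired.append)  # goes to level 1, cascades at tick 16
for _ in range(25):
    wheel.tick()
print(fired)  # [3, 20]
```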
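IX: IX Parts, Dataplane (4/4)
¢ The current dataplane
► Uses Dune and VT-x virtualization
► Can be ported to ARM, SPARC, Power, etc., given their virtualization support
► Uses NICs based on the Intel 82599 chipset (additional drivers can be supported easily)
► Consists of 39K SLOC
► RFC-compliant; also supports UDP, ARP, and ICMP
► Uses lwIP, originally designed for embedded systems (memory-efficient), modified for multicore scalability and fine-grained timer management
[Figure: code breakdown: Intel NIC device driver from DPDK 41%, lwIP 26%, Dune library 16%, new code 7K SLOC]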
IX: Structure of Dataplane Batched System Calls
¢ The IX dataplane system call and event condition API
► Elastic threads issue batched system calls and consume event conditions
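The batched interface can be sketched as one call that carries an array of requests down and returns an array of event conditions up. The cost of crossing into the dataplane is then paid once per batch rather than once per operation. This is a minimal Python model under stated assumptions. `Dataplane`, `run_batch`, and every call and event name are hypothetical stand-ins, not the IX API.

```python
from dataclasses import dataclass, field


@dataclass
class Dataplane:
    """Toy dataplane: executes a batch of calls, returns a batch of events."""
    pending_rx: list = field(default_factory=list)  # packets not yet delivered

    def run_batch(self, calls):
        events = []
        for name, arg in calls:  # one boundary crossing for the whole batch
            if name == "send":
                events.append(("sent", len(arg)))
            elif name == "recv_done":
                pass  # app returns receive buffers; nothing to report
        events.extend(("recv", pkt) for pkt in self.pending_rx)
        self.pending_rx.clear()
        return events


dp = Dataplane(pending_rx=[b"req1", b"req2"])
calls = [("send", b"reply-a"), ("send", b"reply-b"), ("recv_done", None)]
print(dp.run_batch(calls))
# [('sent', 7), ('sent', 7), ('recv', b'req1'), ('recv', b'req2')]
```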
Evaluation: Experimental Methodology and General Evaluation
¢ NetPIPE performance: varying message sizes and system software configurations
¢ IX compared with Linux and mTCP
Specifications
► Quanta/Cumulus 48x10GbE switch
► Mix of Xeon E5-2637 and Xeon E5-2665 servers
► 256 GB of DRAM
► 8 cores and 16 hyperthreads
► Intel X520 10GbE NICs
Evaluation: Latency and Single-Flow Bandwidth (1/4)
¢ Multicore Scalability for Short Connections
Evaluation: Latency and Single-Flow Bandwidth (2/4)
¢ N round-trips per connection
Evaluation: Latency and Single-Flow Bandwidth (3/4)
¢ Different message sizes s (n=1)
Evaluation: Latency and Single-Flow Bandwidth (4/4)
¢ Connection scalability
Evaluation: Throughput and Scalability
¢ Two memcached workloads
Discussion & Conclusion: Discussion
¢ What makes IX fast?
¢ Challenges of adaptive batching
¢ Limitations of the current prototype
Discussion & Conclusion: Discussion
¢ IX vs Arrakis
► Arrakis: like IX, builds a per-application network stack (but does not perform batching in normal operation)
[Table: IX vs. Arrakis comparison]
Discussion & Conclusion: Conclusion
¢ Is IX really a good OS?
¢ Resource constrained issue
¢ Potential uses of IX