OpenStack
Open Source Consulting
김호진(khoj@), 염진영(jyy@), 김익한(ihkim@)
2
1. Open-Source-Based OpenStack Cloud Systems
2. OpenStack Technology Overview and Trends
3. The OpenStack Community Development Process
4. Approaches to OpenStack HA
5. OpenStack SDN Development Trends
6. Neutron OVS-DPDK Acceleration and Implementation
1. Open-Source-Based OpenStack Cloud Systems
4
IT business renaissance - data analytics / mobile / IoT
All powered by two common things: cloud technology and open source
OpenStack creates a ubiquitous Infrastructure-as-a-Service platform out of open source compute/storage/network
https://www.openstack.org/summit/tokyo-2015/videos/presentation/hybrid-is-the-road-to-a-fast-and-furious-OpenStack-public-cloud-future
5
A cloud system built on OpenStack
IBM cloud and open technologies
(Diagram labels: OAuth, OSLC, Infrastructure as a Service, Platform as a Service, Software as a Service, API economy, cloud operating environment, software-defined environment, TOSCA)
Source: http://www.ibm.com/developerworks/cloud/library/cl-open-architecture/
6
Cloud essentials
Source: http://www.slideshare.net/mirantis/overview-43176920
7
CLOUD
OpenStack currently covers the areas below among the many domains of cloud
PaaS, SaaS, and Service Provider cloud solutions - CCRA (Cloud Computing Reference Architecture) by IBM
Source: CCRA 4.0, IBM
• Cloud Enabled Data Center
• Platform-as-a-Service (PaaS) adoption pattern
• Software-as-a-Service (SaaS)
• Cloud Service Providers
• Mobile
• Analytics
• Government Cloud
8
CLOUD
The 11 stages of IaaS services required at the IaaS level
9
What is OpenStack?
The OpenStack Mission:
Aims to produce the ubiquitous Open Source Cloud Computing platform that will meet the needs of public and private clouds regardless of size, by being simple to implement and massively scalable.
10
Why OpenStack?
Control and flexibility
Industry standard
Proven software
Compatible and connected
2. OpenStack Technology Overview and Trends
12
Release TimeLine
13
OpenStack Liberty
14
Liberty Is First OpenStack Release Under 'Big Tent' Model
'Big Tent' Model
15
OpenStack core service
Core services
https://www.openstack.org/summit/tokyo-2015/videos/presentation/lets-talk-roadmaps-OpenStack-style
16
Project Navigator
https://www.openstack.org/software/project-navigator http://www.openstack.org/software/sample-configs
17
Aligns with the OpenStack Mission: to produce the ubiquitous Open Source Cloud Computing platform that will meet the needs of public and private clouds regardless of size, by being simple to implement and massively scalable.
Follows the OpenStack Way
- Open Source,
- Open Community,
- Open Development, and
- Open Design.
Strives for interoperability with other OpenStack components - API services should support at least Keystone
Subjects itself to the governance of the Technical Committee(TC)
Liberty Is First OpenStack Release Under 'Big Tent' Model
18
From Integrated release to Big Tent
19
Tags were introduced so that all OpenStack projects can be navigated and distinguished.
Tagging the Big Tent
20
Liberty New Features
Manageability
Scalability
Extensibility
• Common library adoption
• Better configuration management
• Role-based access control (RBAC) for Heat and Neutron to fine-tune security settings at all levels of the network and API
• Magnum/Kuryr for container support
• Initial version of Nova Cells v2 provides an updated model to support very large and multi-location compute deployments
• Improvements in scale and performance across Nova, Horizon, Neutron, and Cinder
• Stronger support for OpenStack as the integration engine with the new Big Tent model
• Magnum's first full release supports Kubernetes, Mesos, and Docker Swarm
• Extensible Nova scheduler
• NFV improvements for QoS policies and LBaaS
• Open source platform: virtual machines, containers, and bare metal instances
21
Orchestration component
● The template-driven engine enables automated deployment of infrastructure and applications
Heat
http://www.slideshare.net/Cloudenablers/OpenStack-heat-how-autoscaling-works-52244215
Autoscaling workflow Heat Architecture
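For illustration, a minimal HOT template sketch that Heat could deploy (the image, flavor, and network names here are placeholder assumptions):

heat_template_version: 2015-10-15
description: Boot a single server (minimal example)
resources:
  server:
    type: OS::Nova::Server
    properties:
      image: cirros          # placeholder image name
      flavor: m1.tiny        # placeholder flavor
      networks:
        - network: private   # placeholder network name

# heat stack-create example-stack -f server.yaml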
22
Open Source Database as a Service
Trove
23
Data processing service (Elastic MapReduce)
Sahara
24
Provisions bare metal machines through OpenStack using PXE boot and IPMI
Ironic
http://www.slideshare.net/enigmadragon/ironic
25
Ironic
http://ja.community.dell.com/techcenter/b/weblog/archive/2015/03/16/OpenStack-ironic-part1
26
Multi-tenant cloud messaging and notification service for OpenStack, aimed at web developers
zaqar
27
The OpenStack Shared File Service Program
A project to build shared file systems within OpenStack
Shared volumes can be switched active/inactive
Manila
Contributed mostly by NetApp and Mirantis
28
A project to provide DNSaaS
Composed of the REST API, Central, and Sink components
Designate
29
provides secure storage, provisioning and management of secret data
Cryptography for Managing Secrets in the Cloud
Barbican
30
Six core projects and 14 Big Tent projects
Mitaka Roadmap
Nova
Neutron
Cinder
Glance
Kolla
Heat Keystone Ceilometer
Swift Oslo
Horizon Ironic Manila
Sahara Magnum
Kuryr Designate Triple O
Trove
OpenStack Client
3. The OpenStack Community Development Process (Focused on Trends)
32
Contributing to OpenStack
An example of advertising that a company contributes to OpenStack
https://www.openstack.org/summit/tokyo-2015/videos/presentation/hybrid-is-the-road-to-a-fast-and-furious-OpenStack-public-cloud-future
33
The OpenStack Foundation
Technical Committee
The OpenStack Technical Committee consists of 13 members elected from among current contributors; it sets the technical direction of OpenStack and handles cross-project issues.
Board of Directors
The Board of Directors is responsible for strategic and financial oversight of the OpenStack Foundation's resources and staff.
The chairman is Alan Clark, a director at SUSE; the vice-chairman is Lew Tucker, CTO at Cisco.
User Committee
The committee is led by one member appointed by the Board, one appointed by the TC, and one additional member.
They represent the requirements of the diverse user community.
They organize and work through the Product Work Group.
34
PTL stands for Project Team Lead.
Each official project team elects a leader who has the final say in that project team.
The PTL works closely with the Release team.
PTLs are newly elected every six months; the current model is "PTLs+5", in which five additional members are chosen.
PTL
35
OpenStack from a developer's perspective
- Open source software
- 642 projects so far (as of November 19, 2015)
- Developed in Python
- Managed with git
- Code hosted on github.com
Development status and setting up a development environment for OpenStack
http://www.slideshare.net/BharathKobagana/how-to-contribute-open-stack-part-1
OpenStack Deployment
Manual installation
Automated installation
- Packstack (RDO, Red Hat / CentOS)
- Devstack (all environments)
Upstream: the current release of OpenStack
Downstream: previous releases of OpenStack
Packaging from source code
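For a quick development environment, a minimal DevStack run looks roughly like this (a sketch; the passwords are placeholders and stack.sh should run as a non-root user):

$ git clone https://git.openstack.org/openstack-dev/devstack
$ cd devstack
$ cat > local.conf <<'EOF'
[[local|localrc]]
ADMIN_PASSWORD=secret
DATABASE_PASSWORD=secret
RABBIT_PASSWORD=secret
SERVICE_PASSWORD=secret
EOF
$ ./stack.sh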
36
Release name
Austin / --- / Liberty / Mitaka (alphabetical)
Release number format: YYYY.N.M (e.g., 2015.1 for Kilo)
Release cycle: six months (N)
Three interim releases, one every two months (M)
Distributed as an integrated release
OpenStack basics
37
OpenStack Development Process
release process
http://www.slideshare.net/rainya/sa-OpenStackers-april-2014-meetup-so-you-want-to-be-an-OpenStack-contributor
Maintenance for Previous Releases
New features accepted -> feature freeze
Planning: community designs, discusses, & targets release cycle
Implementation: community makes changes to OpenStack code & creates new functionality
Pre-Release: community focuses on bug fixes, docs, & testing
Final Release
Juno Design Summit: May 13-16, '14, Atlanta, GA
Icehouse Release: 4/17/2014 release date
38
Mailing Lists
12 major list types (General, Announcements, Future Dev, Operators, QA, Foundation, Security, Community, Translation, Sub-teams, User Committee, and Language-Specific)
Sign up to as many as you want
Lurk or participate - up to you
Tip: Set up filters to manage the influx
Internet Relay Chat (IRC)
Centered on the Freenode network
42 different channels - join, lurk, participate
Ask OpenStack
Design Summit
Meetups
OpenStack Glossary
OpenStack communication
39
Join - Prove You Are You
Create a Launchpad account
Upload SSH keys
Join the Foundation (free)
Verify it works on review.openstack.org
Work Files - Get Your Git On
Clone an OpenStack repo
Set up git-review
Edit, test locally
Submit a patch
Get it reviewed
Review - Extra Eyes
Add inline comments
Click Review - OR -
Get a local copy of the patch
Patch your patch
Run tests, edit
Push it back to review.openstack.org
How to Contribute
Just 3 simple steps!
procedure
Account Setup
- Authenticate with your Launchpad account
Join the OpenStack Foundation
Sign in to review.openstack.org
Sign the appropriate Individual Contributor License Agreement(CLA)
Installing git-review
Accessing Gerrit over HTTPS
Starting Work on a New Project
Uploading your change for review
- git review
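Putting the steps together, a typical Gerrit submission with git-review looks like this sketch (the project and branch names are placeholders):

$ git clone https://github.com/openstack/ceilometer.git
$ cd ceilometer
$ git review -s                # verify the Gerrit remote is configured
$ git checkout -b bug/12345    # topic branch for the change
  (edit code, run tests locally)
$ git commit -a                # write a descriptive commit message
$ git review                   # upload the change to review.openstack.org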
40
OpenStack usage levels: user level
administrator level
developer level
Types of contribution: answering questions
Patches
Bugs
Blueprint review
Code development
Development types: refactoring existing code
Backporting
Bug fixes
Implementing new features
OpenStack development resources
OpenStack portals
Questions
- ask.openstack.org
Bugs / blueprints
- https://launchpad.net/OpenStack
Reviewing system
- review.openstack.org
Tracking of contribution
- stackalytics.com
OpenStack code
- github.com/OpenStack
41
OpenStack project Architecture
Developer guide
42
OpenStack project Architecture
Signing up for accounts: LaunchPad
• Bug tracking
• Blueprints
• Groups
• Mailing lists (sometimes)
After signing up, join the projects you want
Logging in with Ubuntu One also grants access to https://review.openstack.org
http://www.slideshare.net/rackspace/open-stack-development
43
• Freenode.net
• #OpenStack
• #OpenStack-dev
• #OpenStack-infra
• #OpenStack-meeting
• #OpenStack-meeting-alt
IRC channels
44
For a bug
For a bug/blueprint
For a blueprint
45
A telecom-carrier project requested monitoring of public IP ranges in Ceilometer
[On collecting floating IP packet volumes in OpenStack Ceilometer]
Conclusion: not implemented in the current Kilo version; upstream development as a Contrail plugin is being pursued.
case > Meters for floating IP traffic statistics
http://stackalytics.com/?release=all&project_type=all&metric=bpd&user_id=meghb
46
Request the new feature through a blueprint
case > Meters for floating IP traffic statistics
Update the files in review (git review -v)
The review proceeds automatically
47
case > Meters for floating IP traffic statistics
Files updated in review; reviewed at https://review.openstack.org/#/c/166489/
48
commit message
Unlike GitHub, inline comments are visible only after clicking the Review button.
case > Meters for floating IP traffic statistics
49
case > Meters for floating IP traffic statistics
reviewed https://review.openstack.org/#/c/166491/
Failed the gate test
Developer actions: "Abandon", "Work in Progress", "Rebase Change"
50
4. Approaches to OpenStack HA
4.1. OpenStack HA Concepts
4.2. OpenStack Controller Node HA
4.3. OpenStack Network Node HA
4.4. OSC Test Environment
4.1 OpenStack HA Concepts
52
OpenStack Architecture
Image source: http://docs.openstack.org/kilo/install-guide/install/yum/content/ch_overview.html
53
OpenStack Architecture
Image source: http://docs.openstack.org/kilo/install-guide/install/yum/content/ch_overview.html
54
Implementing HA
High-Availability Cluster
Fails services over to another node when a service node stops working
Eliminates SPOF (Single Point Of Failure)
Solutions: VCS, Keepalived, Pacemaker, and other HA solutions
DB Replication Cluster
Servers in the cluster must be able to read/write the same data simultaneously
Multi-master clustering
Replication cluster solutions: MySQL or MariaDB Galera, Oracle RAC
Load-Balancing Cluster
Distributes request load across cluster nodes to handle high-volume network service requests
Solutions: L4/L7 switches, HAProxy
4.2. OpenStack Controller Node HA
56
Services requiring HA on the OpenStack controller
OpenStack APIs (nova, cinder, horizon, glance, keystone, neutron, etc)
RabbitMQ
MariaDB
57
OpenStack controller HA solutions
OpenStack APIs (nova, cinder, horizon, glance, keystone, etc)
- High-Availability Cluster + Load-Balancing Cluster
- Solution: Keepalived + HAProxy or Pacemaker + HAProxy
RabbitMQ
- High-Availability Cluster or Load-Balancing Cluster
- Keepalived or Pacemaker or RabbitMQ Replication Cluster
MariaDB
- DB Replication Cluster
- Galera Clustering
58
Pacemaker
Image source: http://clusterlabs.org/
Traditional system-level HA
Cluster Resource Manager
Corosync for cluster communication (UDP-based)
Monitors and controls resources:
- Floating Virtual IP Address (VIP)
- SystemD/LSB/OCF-based resource monitoring
- Cloned Services (Active/Active)
STONITH - fence with power management
- A mechanism to guarantee data consistency and integrity
Applied to HAProxy and RabbitMQ in active-passive mode (see the pcs sketch below)
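As a hedged sketch (the addresses and resource names are placeholders), a VIP plus a cloned HAProxy resource can be created with pcs roughly like this:

# pcs resource create vip ocf:heartbeat:IPaddr2 ip=192.168.0.100 cidr_netmask=24 op monitor interval=30s
# pcs resource create haproxy systemd:haproxy op monitor interval=60s
# pcs resource clone haproxy
# pcs constraint colocation add vip with haproxy-clone INFINITY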
59
Pacemaker OpenStack Service
Image source: http://www.slideshare.net/arthurberezin/deep-dive-into-highly-available-open-stack-architecture-OpenStack-summit-vancouver-2015?next_slideshow=1
Virtual IP(VIP)
SystemD Cloned Resource
STONITH Fencing
60
Keepalived OpenStack Service
Software-level HA
HA implemented with the VRRP (Virtual Router Redundancy Protocol)
Applied to HAProxy and RabbitMQ in active-passive mode (see the keepalived.conf sketch below)
Image source: http://www.slideshare.net/kamesh001/high-available-for-OpenStack
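A minimal keepalived.conf sketch for the VIP (interface name, router ID, and addresses are placeholders; the backup node would use state BACKUP and a lower priority):

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 101
    advert_int 1
    virtual_ipaddress {
        192.168.0.100/24
    }
}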
61
HAProxy Load Balancer
Provides L4/L7 switching and load balancing (round robin, stick-table)
Health checking
Failure detection
Applied to OpenStack APIs, RabbitMQ, and MariaDB in active-active mode (a config sketch follows)
Image source: http://www.slideshare.net/arthurberezin/deep-dive-into-highly-available-open-stack-architecture-OpenStack-summit-vancouver-2015?next_slideshow=1
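For example, a hedged haproxy.cfg fragment that load-balances the Keystone public API across two controllers (addresses, port, and timings are placeholders):

listen keystone_public
    bind 192.168.0.100:5000
    balance roundrobin
    option tcpka
    server controller1 192.168.0.11:5000 check inter 2000 rise 2 fall 5
    server controller2 192.168.0.12:5000 check inter 2000 rise 2 fall 5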
62
DB Replication Clustering Galera
A multi-master cluster with synchronous replication
Uses InnoDB
Replicates data between nodes using the Galera library (sketch below)
Image source: http://www.osci.kr
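A minimal my.cnf sketch for a three-node Galera cluster (library path, cluster name, and addresses are placeholder assumptions):

[mysqld]
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_name="openstack_db"
wsrep_cluster_address="gcomm://192.168.0.11,192.168.0.12,192.168.0.13"
wsrep_node_address="192.168.0.11"
wsrep_sst_method=rsync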
63
RabbitMQ Clustering
RabbitMQ Clustering with Mirrored Queues
Queues are mirrored across multiple nodes
Queue declarations and the messages in each queue are copied to other nodes, so a single node failure is survivable (see the sketch below)
Image source: http://www.slideshare.net/arthurberezin/deep-dive-into-highly-available-open-stack-architecture-OpenStack-summit-vancouver-2015?next_slideshow=1
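A hedged sketch of joining a node to the cluster and mirroring all queues (the node name is a placeholder):

# rabbitmqctl stop_app
# rabbitmqctl join_cluster rabbit@controller1
# rabbitmqctl start_app
# rabbitmqctl set_policy ha-all '^' '{"ha-mode":"all"}'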
64
OpenStack controller HA option 1
Rackspace case
RabbitMQ, DB, VIP - Keepalived
APIs(Keystone, Glance, nova etc) - HAProxy
65
OpenStack controller HA option 2
Red Hat OpenStack Platform, openstack.org Community
VIP - Pacemaker
DB, APIs Load-Balance - HAProxy
DB Data Replication - Galera Cluster
RabbitMQ - RabbitMQ Cluster
66
Keepalived + HAProxy 1
Keepalived provides high availability using VRRP + VIP
HAProxy distributes service load with its load-balancer function
Failover without problems
Keepalived
Image source: http://www.slideshare.net/kamesh001/high-available-for-OpenStack
67
Keepalived + HAProxy 2
Keepalived provides high availability using VRRP + VIP
HAProxy distributes service load with its load-balancer function
RabbitMQ clustering + Galera
Keepalived
Image source: http://www.slideshare.net/kamesh001/high-available-for-OpenStack
68
Keepalived + HAProxy, but...
System-level faults?
The cluster may fail to detect system hangs, OOM conditions, and other anomalies
When a system goes down, the root cause is hard to determine
And...
69
Keepalived + HAProxy, but...
System-level faults?
The cluster may fail to detect system hangs, OOM conditions, and other anomalies
When a system goes down, the root cause is hard to determine
And... split brain
Image source: http://m.itwide.com/m_solution/mccs_03.asp
70
Pacemaker + HAProxy
Pacemaker + HAProxy Architecture
Pacemaker
71
Pacemaker + HAProxy
Kernel panic, kernel hang, etc.
System fault...
Pacemaker
72
Pacemaker + HAProxy
Kernel panic, kernel hang, etc.
System fault...
Reboot the failed node using STONITH
Pacemaker
73
How Pacemaker + HAProxy works
Each component interacts with the others through the VIP
When a compute or network node requests DB access, it works as shown below
74
Scale-Out
Add controller nodes as needed
Scale out HAProxy nodes and service nodes separately
75
Collapsed Architecture Scaling
Image source: http://www.slideshare.net/DavidVossel/pacemaker-OpenStacks-pid-1
4.3. OpenStack Network Node HA
77
Neutron
The component that implements complex cloud network environments in OpenStack
Implemented on an SDN basis
Supports Open vSwitch, Linux Bridge, Linux network namespaces, VXLAN, VLAN, and GRE
Multi-tenant network support
Provides load balancing, firewall, VPN, and more
Offers a variety of plugins
78
Legacy Neutron
Network node provides:
IP forwarding
– Inter-subnet (east-west): traffic between VMs
– Floating IP (north-south): traffic between the external network and VMs
– Default SNAT (north-south): traffic from VMs to the external network
Metadata Agent
– Access to the Nova metadata service
Issues:
Degraded performance
Limited scalability
SPOF (Single Point of Failure)
Reference: http://www.slideshare.net/vivekkonnect/OpenStack-kilosummitdvrarchitecture20140506mastergroup?qid=74211292-5ccb-4c08-881b-f76b7f06a8d3&v=default&b=&from_search=1
79
Network node HA services
DHCP Agent
L3 Agent
Metadata Proxy, dnsmasq
OVS, OVS Agent, Metadata Agent
80
Network node HA option 1
DHCP Agent
- neutron.conf configuration
L3 Agent
- Keepalived: VRRP (Virtual Router Redundancy Protocol)
Metadata Proxy, dnsmasq
- Default configuration
- check_child_processes_action = respawn
- check_child_processes_period = 0
OVS, OVS Agent, Metadata Agent
- Pacemaker, Keepalived
81
Purpose of Keepalived for the Neutron L3 agent
Reduces the SPOF (Single Point Of Failure) of the L3 agent
Uses VRRP (Virtual Router Redundancy Protocol)
- Multiple routers are grouped and given one virtual IP address; when the router designated as master fails, a backup router in the VRRP group is automatically promoted to master (a configuration sketch follows below)
Reference: http://www.slideshare.net/rootfs32/20150818-jun-leeOpenStack-juno-release
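A hedged neutron.conf sketch for enabling VRRP-based HA routers (the values shown are the commonly cited Juno/Kilo-era settings):

# vim /etc/neutron/neutron.conf
[DEFAULT]
l3_ha = True                    # create new routers as HA routers
max_l3_agents_per_router = 3    # L3 agents hosting each HA router
min_l3_agents_per_router = 2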
82
DHCP agent configuration
Network node configuration
- # vim /etc/neutron/neutron.conf
- dhcp_agents_per_network = X (set to the number of nodes; default = 1)
83
Neutron L3 Agent VRRP
Keepalived VRRP (Virtual Router Redundancy Protocol) HA
Source: http://docs.openstack.org/ha-guide/networking-ha-l3.html
84
Neutron L3 Agent with VRRP Demo
https://www.youtube.com/watch?v=2-VFTN0lO5k
4.3 OpenStack Network Node HA (DVR)
86
Network node HA option 2
DHCP Agent (network node)
- neutron.conf configuration
L3 Agent (compute node)
- DVR (Distributed Virtual Router)
dnsmasq (network node)
- Default configuration
- check_child_processes_action = respawn
- check_child_processes_period = 0
OVS, OVS Agent (network + compute nodes)
- Pacemaker, Keepalived
87
Purpose of DVR
The Neutron (Quantum) project was created to replace Nova-Network for cloud networking
The multi-host feature provided by Nova-Network is not available in Neutron
- Multi-host feature: lets the physical server hosting a VM send traffic to the external network directly, without passing through another physical server
Provides high availability for the Neutron virtual router (config sketch below)
- Solves the traffic concentration problem on the network node
- Distributes the virtual routers that were concentrated on the network node across the compute nodes
- Routing that was handled on the network node is handled on each compute node
Reference: http://www.slideshare.net/rootfs32/20150818-jun-leeOpenStack-juno-release
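A hedged configuration sketch for DVR (file names follow the Juno-era layout; exact paths and section names may differ by distribution):

# vim /etc/neutron/neutron.conf (controller)
router_distributed = True

# vim /etc/neutron/l3_agent.ini (network node)
agent_mode = dvr_snat

# vim /etc/neutron/l3_agent.ini (compute nodes)
agent_mode = dvr

# vim /etc/neutron/plugins/ml2/ml2_conf.ini ([agent] section for the OVS agent)
l2_population = True
enable_distributed_routing = True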
88
Neutron with DVR
Compute node:
IP forwarding
– Inter-subnet (east-west): traffic between VMs
– Floating IP (north-south): traffic between the external network and VMs
– Default SNAT (north-south): traffic from VMs to the external network
Metadata Agent
– Access to the Nova metadata service
Advantages:
Floating IP traffic and east-west traffic between VMs flow directly through each compute node
Improved network performance
On failure, only the services on the affected node are impacted
Disadvantages:
Default SNAT: still has to traverse the network node (SPOF)
Public IPs must be allocated to compute nodes
Uses compute node resources for packet control
89
Packet flow
North-south without floating IP
North-south with floating IP
Reference: http://www.slideshare.net/janghoonsim/open-stack-networking-juno-l3-ha-dvr
4.4 OSC Test Environment
91
OSC Infra HA environment
Composed of seven OS hosts
Controller Node x 2 (HAProxy, Pacemaker)
- Network lines: MGMT, HA
- APIs, MariaDB (Galera), RabbitMQ (cluster), Pacemaker
Network Node x 1
- Network lines: External, Tunnel, MGMT
- Neutron L3 Agent, DHCP, DVR
Block Node x 2
- Network lines: Storage, MGMT, HA (optional)
- Ceph storage
Compute Node x 2
- Network lines: Storage, MGMT, Tunnel, External
- Nova, DVR, L3 Agent
92
OSC Infra HA architecture
5. OpenStack SDN Development Trends
94
SDN-related development in OpenStack
Distributed SNAT/DHCP
Kuryr
Network acceleration
95
1 SNAT per router - simple, but too many public IPs
1 SNAT per compute node - fewer public IPs, but a security issue (different tenants share it)
Distributed L3 SNAT agent
https://www.openstack.org/summit/tokyo-2015/videos/presentation/network-node-is-not-needed-anymore-completed-distributed-virtual-router
96
Distributed L3 SNAT agent
1 SNAT per compute node without the security concern - an SNAT is created per vRouter and implemented on the compute node
Double NAT - IP translation at the physical router (requires changing floating IPs)
BGP - costly and requires new development
https://www.openstack.org/summit/tokyo-2015/videos/presentation/network-node-is-not-needed-anymore-completed-distributed-virtual-router
97
Distributed DHCP agent
HA implementation
Distributed DHCP implementation: https://blueprints.launchpad.net/neutron/+spec/distributed-dhcp
https://www.openstack.org/summit/tokyo-2015/videos/presentation/network-node-is-not-needed-anymore-completed-distributed-virtual-router
98
KURYR
Libnetwork was split out of the Docker Engine project
Enables network communication between VMs and containers
Uses network namespaces, iptables rules for NAT, veth pairs, Linux bridges, and VXLAN
A Big Tent OpenStack project: http://docs.openstack.org/developer/kuryr/
6. OVS-DPDK Acceleration and Implementation
100
SDN(Software Defined Network)
Problems with today's network architectures: difficult to change, difficult to cope with traffic growth, difficult to roll out new services, and continuous maintenance costs. SDN aims to solve these through flexible, programmable management.
Software Defined Networks (SDN) Architecture
Source: http://aitpowersurge.co.uk
SDN (Software Defined Networking): separates a network device's control plane from its data plane and exposes open APIs for defining functions, so that network paths can be set up and controlled programmatically
NFV (Network Function Virtualization): virtualizes L4-L7 functions (router, firewall, LB, CDN, DPI, etc.), moving them from hardware to software
101
Infrastructure using virtual switches
Open source: Open vSwitch, Snabb Switch, Lagopus
Commercial: VMware ESXi, Wind River Titanium, and 6WIND's 6WINDGate
102
Network packet processing on traditional servers
10Mbps-100Mbps NICs: handled by the kernel network stack
1Gbps NICs: TCP checksum and segmentation processing offloaded to the NIC
Packet-processing routines between applications are handled in separate address spaces
Because the kernel owns all network packet processing, developers gain freedom
Transmit: app -> socket -> kernel -> NIC
Receive: NIC -> kernel -> socket -> app
Kernel overhead:
- System calls
- Context switching on blocking I/O
- Data copying from kernel to user space
- Interrupt handling in kernel
https://www.usenix.org/sites/default/files/conference/protected-files/nsdi14_slides_hwang.pdf
103
Network virtualization
104
Data plane acceleration (DPA)
Network virtualization
10Gbps NICs: exceed what the kernel network stack can process
Technique: SR-IOV (Single Root I/O Virtualization) passthrough
Alternatives:
OpenOnload: processes packets in user space
netmap: resource pre-allocation, batched packet processing, shared metadata and memory buffers
PF_RING: DNA (Direct NIC Access)
PacketShader
DPDK
ODP (Open Data Plane)
https://www.usenix.org/sites/default/files/conference/protected-files/nsdi14_slides_hwang.pdf
105
Intel DPDK
DPDK (Data Plane Development Kit)
A set of "user space" software libraries and drivers for writing applications that accelerate packet processing on Intel-architecture systems
With Intel DPDK, users can write applications that process network packets directly instead of relying on the Linux kernel
Open source project
Uses PMDs (Poll Mode Drivers) on Intel NICs to process network packets in user space
EAL (Environment Abstraction Layer): bypasses the kernel and accesses the NIC directly
Performance scales with the number of CPU cores (run-to-completion)
DPDK device drivers: vfio, uio (igb_uio)
vhost-user, vhost-cuse (supported since DPDK 2.1)
Hugepages
Supported NICs: http://dpdk.org/doc/nics
106
DPDK Core Components
Ring Manager: implements lockless queues for parallel processing (librte_ring)
Memory Pool Manager: creates pools in hugepage memory, built on rings (librte_mempool)
Buffer Manager: stores control message buffers and packet buffers (librte_mbuf)
Packet Flow Classification: generates hash values from network packet headers so packets can be placed in the same flow (librte_hash, librte_lpm)
Poll Mode Driver
107
Kernel Network Stack & DPDK
https://www.usenix.org/sites/default/files/conference/protected-files/nsdi14_slides_hwang.pdf
108
DPDK Installed
Header
Libraries
modules
109
EAL (Environment Abstraction Layer)
An abstraction layer that provides system control libraries on Intel-architecture systems (memory space, PCI devices, timers, consoles, etc.)
Provides interfaces
CPU core configuration (allocates cores for user-space packet processing)
Bus configuration
Ex) set ovs-vswitchd --dpdk -c 0x1 -n 4 -- unix:"$DB_SOCK" => use core 0 and 4 memory channels
111
PMD(Poll Mode Driver)
A thread that rapidly passes packets between the receiving process and the application by polling
Accesses RX/TX descriptors directly to process packets quickly (PMD receive API, PMD transmit API)
Run-to-completion model (synchronous) / pipeline model (asynchronous)
Supports 1, 10, and 40GbE plus the paravirtualized virtio Ethernet driver
Because it polls in a loop, CPU usage on the assigned core is 100% (see the testpmd sketch below)
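To observe PMD behavior, DPDK's bundled testpmd can be run as in this hedged sketch (core mask and channel count are placeholders; the polling core will sit at 100% CPU):

# ${RTE_SDK}/${RTE_TARGET}/app/testpmd -c 0x3 -n 4 -- -i
testpmd> start
testpmd> show port stats all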
112
Open vSwitch
First released in 2009
A software-defined virtual switch stack project
Switching for Linux host-based applications
Apache 2.0 license
OpenFlow support
Used by OpenStack, OpenNebula, and OpenDaylight
Supports Linux-based hypervisors
L2/L3 forwarding, ACLs, VXLAN, etc.
http://www.slideshare.net/rkawkxms/open-vswitch-46148631?related=1
113
Open vSwitch 2.4 for DPDK
Released on August 22, 2015
Intel ONP (Open Network Platform) project
Official DPDK support (Intel contribution)
DPDK vhost-cuse (user space character device)
DPDK vhost-user (user space socket server)
OpenDaylight/OpenStack recognize DPDK ports
DPDK tunneling support (VXLAN, GRE)
User space link bonding
IVSHMEM (zero-copy)
vhost performance improvements
Datapath performance improvements
114
DPDK with OVS
115
SR-IOV & OVS
116
Performance
117
DPDK development trends
Released as open source in 2011
DPDK + Open vSwitch, with Intel participating as the driving force
Participating vendors: 6WIND (WindGate DPDK, WindGate OVS), Tieto (IP stack), Wind River, Radisys (40Gbps NIC DPDK support), and others
Intel-led OpenStack + OVS (DPDK) implementation project published (December 15, 2014) https://github.com/openstack/networking-ovs-dpdk https://download.01.org/packet-processing/ONPS1.5/Intel_ONP_Server_Release_1.5_Reference_Architecture_Guide_Rev1.2.pdf
Intel appears set to keep investing in DPDK support for Open vSwitch (many contributions to Open vSwitch 2.4) http://www.intel.com/content/www/us/en/communications/open-vswitch-enables-sdn-and-nfv-transformation-paper.html?wapkw=dpdk#
However...
118
DPDK in OpenStack
119
dpdk-2.1.0.tar.gz
openvswitch-2.4.0.tar.gz
qemu-2.4.1.tar.bz2
Installation on CentOS 7
[Prerequisites]
# yum install kernel-devel-$(uname -r)
# yum install autoconf automake libtool openssl openssl-devel fuse fuse-devel
# yum install wget git pciutils

# mkdir /root/openvswitch_dpdk
# cd /root/openvswitch_dpdk

[Download]
# wget http://dpdk.org/browse/dpdk/snapshot/dpdk-2.1.0.tar.gz
# tar xfz dpdk-2.1.0.tar.gz -C /usr/src/
# export DPDK_DIR=/usr/src/dpdk-2.1.0
# cd $DPDK_DIR
# vi config/common_linuxapp
------------------------------------------------
-CONFIG_RTE_BUILD_COMBINE_LIBS=n
+CONFIG_RTE_BUILD_COMBINE_LIBS=y
------------------------------------------------

[Configure and install (for inter-VM shared memory)]
# make config T=x86_64-native-linuxapp-gcc
# make install T=x86_64-native-linuxapp-gcc
120
Installation on CentOS 7
[Hugepages setting]
# vi /etc/default/grub
----------------------------------------------------------------------------------------------------------------------------------------------
+GRUB_CMDLINE_LINUX_DEFAULT="iommu=pt intel_iommu=on default_hugepagesz=1G hugepagesz=1G hugepages=1"
----------------------------------------------------------------------------------------------------------------------------------------------
# grub2-mkconfig --output=/boot/grub2/grub.cfg

[If your system boots via EFI, edit the EFI grub.cfg instead]
# vi /boot/efi/EFI/centos/grub.cfg
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
+linuxefi /vmlinuz-3.10.0-229.20.1.el7.x86_64 root=/dev/mapper/centos-root ro rd.lvm.lv=centos/root rd.lvm.lv=centos/swap crashkernel=auto rhgb quiet LANG=en_US.UTF-8 systemd.debug default_hugepagesz=1G hugepagesz=1G hugepages=2 hugepagesz=2M hugepages=2048 iommu=pt intel_iommu=on isolcpus=0-1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------

[Reboot the system]
# init 6

[Check whether the CPU supports 1G hugepages]
# cat /proc/cpuinfo | grep pdpe

[Find the Intel-based NIC]
# lspci -nn | grep -i ethernet
# ls -al /sys/class/net/

[Load igb_uio and bind the NIC (assumes RTE_SDK=/usr/src/dpdk-2.1.0 and RTE_TARGET=x86_64-native-linuxapp-gcc)]
# ifconfig eno2 down
# modprobe uio
# insmod ${RTE_SDK}/${RTE_TARGET}/kmod/igb_uio.ko
# ${RTE_SDK}/tools/dpdk_nic_bind.py --bind=igb_uio 06:00.0
# ${RTE_SDK}/tools/dpdk_nic_bind.py --status
121
Installation on CentOS 7
[Open vSwitch installation]
# cd /root/openvswitch_dpdk
# wget http://openvswitch.org/releases/openvswitch-2.4.0.tar.gz
# tar xfz openvswitch-2.4.0.tar.gz
# cd openvswitch-2.4.0
# ./boot.sh
# ./configure --prefix=/usr --sysconfdir=/etc --localstatedir=/var --with-dpdk=${RTE_SDK}/${RTE_TARGET} --enable-ssl CFLAGS="-g -O2"
# make CFLAGS='-O3 -march=native'
# make install

# vi /usr/share/openvswitch/scripts/ovs-ctl
------------------------------------------------------------------------------
Line 218:
- set ovs-vswitchd unix:"$DB_SOCK"
+ set ovs-vswitchd --dpdk -c 0x1 -n 4 -- unix:"$DB_SOCK"
------------------------------------------------------------------------------

# vi /usr/lib/systemd/system/openvswitch.service
-------------------------------------------------------------------------------
[Unit]
Description=Open vSwitch
After=syslog.target network.target openvswitch-nonetwork.service
Requires=openvswitch-nonetwork.service

[Service]
Type=oneshot
ExecStart=/bin/true
ExecStop=/bin/true
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
-------------------------------------------------------------------------------
122
Installation on CentOS 7
# vi /usr/lib/systemd/system/openvswitch-nonetwork.service
-----------------------------------------------------------------------------------------------------
[Unit]
Description=Open vSwitch Internal Unit
After=syslog.target
PartOf=openvswitch.service
Wants=openvswitch.service

[Service]
Type=oneshot
RemainAfterExit=yes
EnvironmentFile=-/etc/sysconfig/openvswitch
ExecStart=/usr/share/openvswitch/scripts/ovs-ctl start --system-id=random $OPTIONS
ExecStop=/usr/share/openvswitch/scripts/ovs-ctl stop
------------------------------------------------------------------------------------------------------
# systemctl start openvswitch

[Add DPDK interfaces to Open vSwitch]
# ovs-vsctl add-br ovsbr0 -- set bridge ovsbr0 datapath_type=netdev
# ovs-vsctl add-port ovsbr0 dpdk0 -- set Interface dpdk0 type=dpdk
# ovs-vsctl add-port ovsbr0 vhost-user1 -- set Interface vhost-user1 type=dpdkvhostuser
# ovs-vsctl add-port ovsbr0 vhost-user2 -- set Interface vhost-user2 type=dpdkvhostuser
# ovs-ofctl show ovsbr0
[QEMU installation]
# wget http://wiki.qemu-project.org/download/qemu-2.4.1.tar.bz2
# tar xf qemu-2.4.1.tar.bz2
# cd qemu-2.4.1/
# mkdir bin && cd bin
# ../configure --target-list=x86_64-softmmu --enable-debug --extra-cflags='-g'
# make
123
Installation on CentOS 7
[root@localhost ~]# ip a
5: eno2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 6c:ae:8b:26:72:ba brd ff:ff:ff:ff:ff:ff
7: eno3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 6c:ae:8b:26:72:bb brd ff:ff:ff:ff:ff:ff

[root@localhost ~]# ls -al /sys/class/net/ | grep -i 'eno2\|eno3'
lrwxrwxrwx 1 root root 0 Nov 24 23:22 eno2 -> ../../devices/pci0000:00/0000:00:1c.0/0000:06:00.0/net/eno2
lrwxrwxrwx 1 root root 0 Nov 24 23:22 eno3 -> ../../devices/pci0000:00/0000:00:1c.0/0000:06:00.1/net/eno3

[root@localhost ~]# ${RTE_SDK}/tools/dpdk_nic_bind.py --status
Network devices using DPDK-compatible driver
============================================
<none>

Network devices using kernel driver
===================================
0000:06:00.0 'I350 Gigabit Network Connection' if=eno2 drv=igb unused=igb_uio *Active*
0000:06:00.1 'I350 Gigabit Network Connection' if=eno3 drv=igb unused=igb_uio *Active*

[root@localhost ~]# ifconfig eno2 down
[root@localhost ~]# ifconfig eno3 down
[root@localhost ~]# ${RTE_SDK}/tools/dpdk_nic_bind.py --bind=igb_uio 06:00.0
[root@localhost ~]# ${RTE_SDK}/tools/dpdk_nic_bind.py --bind=igb_uio 06:00.1
[root@localhost ~]# ${RTE_SDK}/tools/dpdk_nic_bind.py --status
Network devices using DPDK-compatible driver
============================================
0000:06:00.0 'I350 Gigabit Network Connection' drv=igb_uio unused=
0000:06:00.1 'I350 Gigabit Network Connection' drv=igb_uio unused=
124
Installation on CentOS 7
[root@localhost ~]# systemctl start openvswitch
[root@localhost ~]# ovs-vsctl add-br ovsbr0 -- set bridge ovsbr0 datapath_type=netdev
[root@localhost ~]# ovs-vsctl add-port ovsbr0 dpdk0 -- set Interface dpdk0 type=dpdk
[root@localhost ~]# ovs-vsctl add-port ovsbr0 vhost-user1 -- set Interface vhost-user1 type=dpdkvhostuser
[root@localhost ~]# ovs-vsctl add-port ovsbr0 vhost-user2 -- set Interface vhost-user2 type=dpdkvhostuser

[root@localhost ~]# tail -n 100 /var/log/openvswitch/ovs-vswitchd.log
2015-11-24T14:35:32.787Z|00018|dpdk|INFO|Port 0: 6c:ae:8b:26:72:ba
2015-11-24T14:35:32.625Z|00011|dpdk|INFO|Port 1: 6c:ae:8b:26:72:bb

[root@localhost ~]# ovs-vsctl show
bbb07857-5b13-41ba-b43b-427d6f8eff9c
    Bridge "ovsbr0"
        Port "dpdk0"
            Interface "dpdk0"
                type: dpdk
        Port "vhost-user1"
            Interface "vhost-user1"
                type: dpdkvhostuser
        Port "ovsbr0"
            Interface "ovsbr0"
                type: internal
        Port "dpdk1"
            Interface "dpdk1"
                type: dpdk
        Port "vhost-user2"
            Interface "vhost-user2"
                type: dpdkvhostuser
    ovs_version: "2.4.0"
125
[Run VMs with a DPDK vhost-user device]
# cd /root/openvswitch_dpdk/qemu-2.4.1/bin/x86_64-softmmu/
# ./qemu-system-x86_64 -enable-kvm -m 1024 -smp 2 \
  -chardev socket,id=char0,path=/var/run/openvswitch/vhost-user1 \
  -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \
  -device virtio-net-pci,netdev=mynet1,mac=52:54:00:02:d9:09 \
  -object memory-backend-file,id=mem,size=1024M,mem-path=/dev/hugepages,share=on \
  -numa node,memdev=mem -mem-prealloc \
  -net user,hostfwd=tcp::10022-:22 -net nic \
  /vmdata/jyy-test.qcow2
# ./qemu-system-x86_64 -enable-kvm -m 1024 -smp 2 \
  -chardev socket,id=char1,path=/var/run/openvswitch/vhost-user2 \
  -netdev type=vhost-user,id=mynet2,chardev=char1,vhostforce \
  -device virtio-net-pci,netdev=mynet2,mac=52:54:00:02:d9:10 \
  -object memory-backend-file,id=mem,size=1024M,mem-path=/dev/hugepages,share=on \
  -numa node,memdev=mem -mem-prealloc \
  -net user,hostfwd=tcp::10032-:22 -net nic \
  /vmdata/jyy-test2.qcow2
126
OPEN
SHARE
CONTRIBUTE
ADOPT
REUSE