China Internet Operations Summit Forum (中国互联网运维高峰论坛)
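From "Routing" Back to "Switching" -- The Evolution of Data Center Networks
Liu Yang (刘洋)
Technical Director, Internet Service Provider Business Unit, Cisco China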
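The Troubles of "Switching"
• Physical connection hierarchy
• Spanning Tree, Layer 2 multipathing, network convergence
• Unicast flooding, loops, broadcast storms
The Good Life After "Routing"
• ECMP (Equal Cost Multi Path);
• Smooth scale-out;
• Fast convergence;
• Broadcast storm prevention;
Troubles
• Cluster size
• Subnet address planning
• Routing control plane
• Virtual machines
• Open platforms, cloud computing
• Price
• Dumb Big Flat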
From "Routing" Back to "Switching" -- Switching Networks for Large Data Centers
• Turn your network into a Fabric!
• Key technologies: FabricPath / TRILL
FabricPath
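FabricPath's Innovations for Layer 2 Switching
• Multiple paths between switches forward traffic simultaneously via ECMP (Equal Cost Multi Path); Spanning Tree is eliminated
• Smooth scale-out, as in a routed network;
• Fast convergence;
• Broadcast storm prevention (TTL);
• Preserves the existing Layer 2 network
• Conversational MAC address learning
• Lower cost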
FabricPath Design Goals
Diagram comparing Switching, Spanning Tree Protocol (STP), Routing, and FabricPath
Switching
• Minimal Configuration
• Plug & Play
• Auto Discovery
• Auto Learning
• Flat Addressing
Spanning Tree Protocol (STP)
• Slow Convergence
• Single Path
• Edge-to-Root Rigid Design
• Single Multicast Tree
• Constrained Scalability
Routing
• Configuration Intense
• Configured Learning
• Configured Discovery
• Plan & Play
• Fast Convergence
• Multiple Paths
• Load Balancing
• Multiple Multicast Trees
• Hierarchical Forwarding
• Any-to-any Flexible Design
• Highly Scalable
FabricPath Encapsulation: 16-Byte MAC-in-MAC Header
Classical Ethernet Frame: DMAC | SMAC | 802.1Q | Etype | Payload | CRC
Cisco FabricPath Frame: Outer DA (48) | Outer SA (48) | FP Tag (32) | Original CE Frame (DMAC | SMAC | 802.1Q | Etype | Payload) | CRC (new)
Outer DA / Outer SA (48 bits each): Endnode ID (5:0), 6 bits | U/L, 1 bit | I/G, 1 bit | Endnode ID (7:6), 2 bits | RSVD, 1 bit | OOO/DL, 1 bit | Switch ID, 12 bits | Sub-Switch ID, 8 bits | Port ID, 16 bits
FP Tag (32 bits): Etype, 16 bits | Ftag, 10 bits | TTL, 6 bits
Switch ID – Unique number identifying each FabricPath switch
Sub-Switch ID – Identifies devices/hosts connected via VPC+
Port ID – Identifies the destination or source interface
Ftag (Forwarding tag) – Unique number identifying topology and/or multidestination distribution tree
TTL – Decremented at each switch hop to prevent frames looping infinitely
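The field widths listed on this slide can be checked by packing them into 16 bytes. The following is a minimal sketch, not a wire-accurate encoder: it assumes the fields are laid out most-significant-bit first in the order listed, the Etype value `0x8903` is an assumption rather than something stated on the slide, and the helper names (`pack_header`, `unpack_header`, `decrement_ttl`) are hypothetical.

```python
# Field layouts as (name, width in bits), in the order listed on the slide.
# Assumption: fields are packed most-significant-bit first in this order.
ADDR_LAYOUT = [("endnode_lo", 6), ("ul", 1), ("ig", 1), ("endnode_hi", 2),
               ("rsvd", 1), ("ooo_dl", 1), ("switch_id", 12),
               ("sub_switch_id", 8), ("port_id", 16)]      # 48 bits in total
TAG_LAYOUT = [("etype", 16), ("ftag", 10), ("ttl", 6)]     # 32 bits in total


def pack_bits(fields, layout):
    """Pack a dict of field values into one integer; missing fields are 0."""
    value = 0
    for name, width in layout:
        v = fields.get(name, 0)
        if not 0 <= v < (1 << width):
            raise ValueError(f"{name}={v} does not fit in {width} bits")
        value = (value << width) | v
    return value


def unpack_bits(value, layout):
    """Inverse of pack_bits: split an integer back into named fields."""
    fields = {}
    for name, width in reversed(layout):
        fields[name] = value & ((1 << width) - 1)
        value >>= width
    return fields


def pack_header(outer_da, outer_sa, tag):
    """Build the 16-byte header: Outer DA (6) + Outer SA (6) + FP Tag (4)."""
    return (pack_bits(outer_da, ADDR_LAYOUT).to_bytes(6, "big")
            + pack_bits(outer_sa, ADDR_LAYOUT).to_bytes(6, "big")
            + pack_bits(tag, TAG_LAYOUT).to_bytes(4, "big"))


def unpack_header(header):
    """Return (outer_da, outer_sa, tag) dicts from a 16-byte header."""
    if len(header) != 16:
        raise ValueError("FabricPath header must be 16 bytes")
    return (unpack_bits(int.from_bytes(header[0:6], "big"), ADDR_LAYOUT),
            unpack_bits(int.from_bytes(header[6:12], "big"), ADDR_LAYOUT),
            unpack_bits(int.from_bytes(header[12:16], "big"), TAG_LAYOUT))


def decrement_ttl(header):
    """Decrement TTL at a hop; return None when the frame should be dropped.

    Assumption: a frame whose TTL would reach 0 is dropped, not forwarded.
    """
    da, sa, tag = unpack_header(header)
    if tag["ttl"] <= 1:
        return None
    tag["ttl"] -= 1
    return pack_header(da, sa, tag)


# Example: S100 sends to S200 on tree 1 (0x8903 is an assumed Etype value).
header = pack_header({"switch_id": 200}, {"switch_id": 100},
                     {"etype": 0x8903, "ftag": 1, "ttl": 32})
```

The 6-bit TTL caps a frame at 63 hops, which is what bounds a loop even before the control plane reconverges.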
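FabricPath Control Plane: L2 IS-IS
L2 IS-IS replaces STP as the control plane protocol
Introduces a link-state protocol to provide ECMP in a Layer 2 environment
Exchanges Switch ID reachability and builds the forwarding topology
Improves failure detection, network convergence, and high availability
Minimal IS-IS knowledge required; no manual configuration by the user
Preserves the plug-and-play behavior of Layer 2
Diagram labels: STP (STP BPDU); FabricPath (FabricPath IS-IS)
A few key reasons:
Maintains only reachability information between devices and needs no IP address information; it is not an L3 protocol but a protocol innovation for carrying MAC address reachability in an L2 environment
Easily extensible: custom TLVs can be used to carry information
SPF-capable: strong topology construction and convergence
Diagram labels: FabricPath Port; CE Port; L2 Fabric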
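FabricPath Data Plane
Diagram: MAC A attaches through a CE interface to the ingress FabricPath switch S10; MAC B attaches through a CE interface to the egress FabricPath switch S20; S10 and S20 connect through FabricPath interfaces across the FabricPath core
Frame on the CE interfaces: DMAC→B, SMAC→A, Payload
Frame inside the FabricPath core: DSID→20, SSID→10, DMAC→B, SMAC→A, Payload
The ingress FabricPath switch determines the destination Switch ID and adds the FabricPath header
The destination Switch ID is the basis for the forwarding (routing) decision
The core does not need to learn or look up end-host MAC addresses
The egress FabricPath switch removes the FabricPath header and forwards the frame to the CE device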
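The four steps above can be sketched as a small model. This is an illustration under stated assumptions, not Cisco's implementation: the class and function names are hypothetical, the port names `e1/1` and `e2/1` and the core link `L2` are made up for the example, and the header is reduced to the DSID/SSID pair shown on the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CEFrame:
    """Classical Ethernet frame as seen on a CE interface."""
    dmac: str
    smac: str
    payload: bytes


@dataclass(frozen=True)
class FPFrame:
    """CE frame wrapped in a simplified FabricPath header."""
    dsid: int          # destination Switch ID
    ssid: int          # source Switch ID
    inner: CEFrame


class EdgeSwitch:
    def __init__(self, switch_id, mac_table):
        self.switch_id = switch_id
        # MAC -> local port name (str) or remote Switch ID (int)
        self.mac_table = dict(mac_table)

    def ingress(self, frame: CEFrame) -> Optional[FPFrame]:
        """Look up the destination Switch ID and add the FabricPath header."""
        target = self.mac_table.get(frame.dmac)
        if not isinstance(target, int):
            return None  # local or unknown destination: not modeled here
        return FPFrame(dsid=target, ssid=self.switch_id, inner=frame)

    def egress(self, frame: FPFrame):
        """Strip the FabricPath header and pick the CE output port."""
        if frame.dsid != self.switch_id:
            raise ValueError("frame is not addressed to this switch")
        return self.mac_table.get(frame.inner.dmac), frame.inner


def core_next_hop(routing_table, frame: FPFrame):
    """Core switches forward on DSID only; no end-host MAC lookup."""
    return routing_table[frame.dsid]


# Example matching the slide: A behind S10 sends to B behind S20.
s10 = EdgeSwitch(10, {"A": "e1/1", "B": 20})
s20 = EdgeSwitch(20, {"B": "e2/1", "A": 10})
fp = s10.ingress(CEFrame(dmac="B", smac="A", payload=b"hello"))
```

Note that `core_next_hop` never touches `inner`, which mirrors the point that the core carries no end-host MAC state.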
FabricPath MAC Forwarding Table
Edge switches maintain both a MAC address table and a Switch ID table
Ingress switch uses MAC table to determine destination Switch ID
Egress switch uses MAC table to determine output switchport
Local MACs point to switchports
Remote MACs point to Switch IDs
Diagram labels: S10, S20, S30, S40; S100, S101, S200; FabricPath; MAC A, MAC C, MAC D, MAC B
FabricPath MAC Table on S100
MAC   IF/SID
A     e1/1
B     e1/2
C     S101
D     S200
FabricPath Routing Table
FabricPath IS-IS manages the Switch ID (routing) table
All FabricPath-enabled switches are automatically assigned a Switch ID (no user configuration required)
Algorithm computes shortest (best) paths to each Switch ID based on link metrics
Equal-cost paths supported between FabricPath switches
Diagram labels: S10, S20, S30, S40; S100, S101, S200; FabricPath; links L1, L2, L3, L4
FabricPath Routing Table on S100
Switch  IF
S10     L1
S20     L2
S30     L3
S40     L4
S101    L1, L2, L3, L4
…       …
S200    L1, L2, L3, L4
One ‘best’ path to S10 (via L1)
Four equal-cost paths to S101
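The S100 table above can be reproduced with a shortest-path computation that keeps every equal-cost first hop. The sketch below uses a plain Dijkstra variant as a stand-in for the SPF run of FabricPath IS-IS. It assumes a metric of 1 on every link and the wiring implied by the tables in this deck (S100 on L1-L4, S101 on L5-L8, S200 on L9-L12); `build_routing_table` is a hypothetical name.

```python
import heapq
from collections import defaultdict

# link name -> (switch, switch, metric); metric 1 everywhere is an assumption.
LINKS = {
    "L1": ("S100", "S10", 1), "L2": ("S100", "S20", 1),
    "L3": ("S100", "S30", 1), "L4": ("S100", "S40", 1),
    "L5": ("S101", "S10", 1), "L6": ("S101", "S20", 1),
    "L7": ("S101", "S30", 1), "L8": ("S101", "S40", 1),
    "L9": ("S200", "S10", 1), "L10": ("S200", "S20", 1),
    "L11": ("S200", "S30", 1), "L12": ("S200", "S40", 1),
}


def build_routing_table(links, source):
    """Return {switch: [first-hop links]} keeping all equal-cost paths.

    Requires strictly positive metrics, so every equal-cost predecessor of a
    node is finalized before the node itself is popped from the heap.
    """
    adj = defaultdict(list)
    for name, (a, b, metric) in links.items():
        adj[a].append((b, name, metric))
        adj[b].append((a, name, metric))

    dist = {source: 0}
    first_hops = {source: set()}
    heap = [(0, source)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue            # stale heap entry
        done.add(node)
        for nbr, link, metric in adj[node]:
            nd = d + metric
            # Leaving the source, the first hop is the link itself;
            # elsewhere it is inherited from the node we came through.
            hops = {link} if node == source else first_hops[node]
            if nbr not in dist or nd < dist[nbr]:
                dist[nbr] = nd
                first_hops[nbr] = set(hops)
                heapq.heappush(heap, (nd, nbr))
            elif nd == dist[nbr]:
                first_hops[nbr] |= hops   # another equal-cost path
    return {sw: sorted(h) for sw, h in first_hops.items() if sw != source}


table = build_routing_table(LINKS, "S100")
```

With these inputs, `table["S10"]` holds the single link L1 and `table["S101"]` holds all four uplinks, matching the "one best path" and "four equal-cost paths" notes on the slide.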
Building the FabricPath Routing Table Entries
Diagram labels: S10, S20, S30, S40; S100, S101, S200; FabricPath; MAC A, MAC C, MAC D, MAC B; links L1-L4, L5-L8, L9-L12
Routing table on S100
Switch  IF
S10     L1
S20     L2
S30     L3
S40     L4
S101    L1, L2, L3, L4
…       …
S200    L1, L2, L3, L4
Routing table on S10
Switch  IF
S20     L1, L5, L9
S30     L1, L5, L9
S40     L1, L5, L9
S100    L1
S101    L5
…       …
S200    L9
Routing table on S40
Switch  IF
S10     L4, L8, L12
S20     L4, L8, L12
S30     L4, L8, L12
S100    L4
S101    L8
…       …
S200    L12
Routing table on S200
Switch  IF
S10     L9
S20     L10
S30     L11
S40     L12
S100    L9, L10, L11, L12
S101    L9, L10, L11, L12
…       …
Putting It All Together – Host A to Host B (1) Broadcast ARP Request
Diagram labels: S10, S20, S30, S40; S100, S101, S200; FabricPath; Root for Tree 1; Root for Tree 2; MAC A; MAC B; links L1-L4, L5-L8, L9-L12
Frame from Host A (Broadcast): DMAC→FF, SMAC→A, Payload
Frame inside the fabric: DSID→FF, Ftag→1, SSID→100, DMAC→FF, SMAC→A, Payload
Frame delivered beyond S200: DMAC→FF, SMAC→A, Payload
Multidestination Trees on Switch 100
Tree  IF
1     L1,L2,L3,L4
2     L4
Multidestination Trees on Switch 10
Tree  IF
1     L1,L5,L9
2     L9
Multidestination Trees on Switch 200
Tree  IF
1     L9
2     L9,L10,L11,L12
FabricPath MAC Table on S100
MAC   IF/SID
A     e1/1 (local)
FabricPath MAC Table on S200
MAC   IF/SID
(no entries)
Learn MACs of directly-connected devices unconditionally
Don’t learn MACs in flood frames
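The tree tables on this slide determine where a flooded frame is replicated. A minimal sketch of that lookup follows, assuming a switch sends a multidestination frame out of every interface of the selected tree except the one it arrived on. The `TREES` data is copied from the slide; `flood_out_interfaces` is a hypothetical name, and delivery to local CE ports is not modeled.

```python
# Multidestination tree tables copied from the slide: switch -> Ftag -> links.
TREES = {
    "S100": {1: ["L1", "L2", "L3", "L4"], 2: ["L4"]},
    "S10":  {1: ["L1", "L5", "L9"],       2: ["L9"]},
    "S200": {1: ["L9"],                   2: ["L9", "L10", "L11", "L12"]},
}


def flood_out_interfaces(switch, ftag, in_interface=None):
    """FabricPath links on which a flooded frame is forwarded.

    Assumption: the frame goes out on every link of the tree selected by
    Ftag, excluding the link it was received on.
    """
    return [link for link in TREES[switch][ftag] if link != in_interface]


# ARP request from Host A: S100 is the ingress switch and uses tree 1.
from_s100 = flood_out_interfaces("S100", ftag=1)
at_s10 = flood_out_interfaces("S10", ftag=1, in_interface="L1")
at_s200 = flood_out_interfaces("S200", ftag=1, in_interface="L9")
```

In this sketch S200 has no further FabricPath links to forward on for tree 1, so the frame only leaves through its CE ports, as in the diagram.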
Putting It All Together – Host A to Host B (2) Unicast ARP Reply
Diagram labels: S10, S20, S30, S40; S100, S101, S200; FabricPath; MAC A; MAC B; links L1-L4, L5-L8, L9-L12
Frame from Host B: DMAC→A, SMAC→B, Payload
S200 MAC table lookup: A → miss (Unknown)
Frame inside the fabric: DSID→MC1, Ftag→1, SSID→200, DMAC→A, SMAC→B, Payload
Frame delivered to Host A: DMAC→A, SMAC→B, Payload
Multidestination Trees on Switch 100
Tree  IF
1     L1,L2,L3,L4
2     L4
Multidestination Trees on Switch 10
Tree  IF
1     L1,L5,L9
2     L9
Multidestination Trees on Switch 200
Tree  IF
1     L9
2     L9,L10,L11,L12
FabricPath MAC Table on S200
MAC   IF/SID
B     e12/2 (local)
FabricPath MAC Table on S100 (before the reply arrives)
MAC   IF/SID
A     e1/1 (local)
S100 MAC table lookup: A → hit. If DMAC is known, then learn remote MAC
FabricPath MAC Table on S100 (after the reply arrives)
MAC   IF/SID
A     e1/1 (local)
B     S200 (remote)
Putting It All Together – Host A to Host B (3) Unicast Data
Diagram labels: S10, S20, S30, S40; S100, S101, S200; FabricPath; MAC A; MAC B; links L1-L4, L5-L8, L9-L12
Frame from Host A: DMAC→B, SMAC→A, Payload
S100 MAC table lookup: B → S200
Frame inside the fabric: DSID→200, Ftag→1, SSID→100, DMAC→B, SMAC→A, Payload
Frame delivered to Host B: DMAC→B, SMAC→A, Payload
FabricPath Routing Table on S100
Switch  IF
S10     L1
S20     L2
S30     L3
S40     L4
S101    L1, L2, L3, L4
…       …
S200    L1, L2, L3, L4
S100 routing lookup: S200 → L1, L2, L3, L4; one link is selected by Hash
FabricPath Routing Table on S30
Switch  IF
…       …
S200    L11
FabricPath Routing Table on S30
Switch  IF
…       …
S200    –
FabricPath MAC Table on S200
MAC   IF/SID
A     S100 (remote)
B     e12/2 (local)
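The "Hash" step picks one of the four equal-cost links for a given flow. The slides do not say which fields are hashed or which function is used, so the sketch below is purely illustrative: it hashes the inner source and destination MAC with CRC-32 so that all frames of one conversation stay on the same link. The function name `pick_link` and the choice of hash inputs are assumptions.

```python
import zlib


def pick_link(links, dmac, smac):
    """Choose one equal-cost link deterministically for a flow.

    Assumption: the flow key is the inner SMAC/DMAC pair. Real switches
    may hash other fields (VLAN, IP addresses, L4 ports).
    """
    if not links:
        raise ValueError("no path to destination switch")
    key = f"{smac}>{dmac}".encode()
    return links[zlib.crc32(key) % len(links)]


# S100 forwarding A -> B toward S200 over its four equal-cost uplinks.
uplinks = ["L1", "L2", "L3", "L4"]
chosen = pick_link(uplinks, dmac="B", smac="A")
```

Because the choice depends only on the flow key, frames of one conversation are not reordered across links, while different conversations spread over all four.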
Conversation-Based MAC Learning
Diagram labels: FabricPath Core; S100 (MAC A), S200 (MAC B), S300 (MAC C)
FabricPath MAC Table on S100
MAC   IF/SID
A     e1/1 (local)
B     S200 (remote)
FabricPath MAC Table on S200
MAC   IF/SID
A     S100 (remote)
B     e12/1 (local)
C     S300 (remote)
FabricPath MAC Table on S300
MAC   IF/SID
B     S200 (remote)
C     e7/10 (local)
Conversational MAC Learning
Diagram labels: STP Domain; L2 Fabric; 500 MACs (four switches); 250 MACs (four switches)
ALL MACs need to be learned on EVERY switch
Large L2 domains and virtualization present challenges to MAC table scalability
Local MAC: source-MAC learning happens only for traffic received on CE ports
Remote MAC: the source MAC of traffic received on FabricPath ports is learned only if the destination MAC is already known as local
Diagram labels: S11; hosts A, B, C; L2 Fabric
MAC   IF
C     3/1
A     S11
MAC   IF
B     2/1
MAC   IF
(no entries)
Optimized resource utilization – learning only the MAC addresses required
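The two learning rules can be replayed against the Host A to Host B walkthrough. The following is a minimal sketch, assuming the table is a plain dictionary and ignoring aging and VLANs. The class and method names are hypothetical; the port names come from the slides.

```python
class FabricPathEdge:
    """Edge switch applying the two conversational-learning rules."""

    def __init__(self, switch_id):
        self.switch_id = switch_id
        self.mac_table = {}   # MAC -> ("local", port) or ("remote", switch ID)

    def receive_on_ce_port(self, port, smac):
        # Local MAC rule: always learn the source MAC seen on a CE port.
        self.mac_table[smac] = ("local", port)

    def receive_on_fabricpath_port(self, ssid, smac, dmac):
        # Remote MAC rule: learn the source MAC only when the destination
        # MAC is already known as local. Returns True if it was learned.
        entry = self.mac_table.get(dmac)
        if entry is not None and entry[0] == "local":
            self.mac_table[smac] = ("remote", ssid)
            return True
        return False


s100, s101, s200 = (FabricPathEdge(n) for n in ("S100", "S101", "S200"))

# (1) Broadcast ARP request from A: flooded, so no remote switch learns A.
s100.receive_on_ce_port("e1/1", "A")
step1 = [sw.receive_on_fabricpath_port("S100", "A", "FF") for sw in (s101, s200)]

# (2) Unicast ARP reply from B to A: flooded as unknown unicast by S200;
#     only S100 knows A as local, so only S100 learns B.
s200.receive_on_ce_port("e12/2", "B")
step2 = [sw.receive_on_fabricpath_port("S200", "B", "A") for sw in (s100, s101)]

# (3) Unicast data from A to B: S200 knows B as local, so it now learns A.
step3 = s200.receive_on_fabricpath_port("S100", "A", "B")
```

S101 sees both flooded frames but ends with an empty table, which is the resource saving the slide refers to.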
Architectural Approach for MSDC
CLOS
• Same node type used in all roles (Spine and Edge)
• Fine-grained redundancy
• Additional density provided through density of node or additional layers
Scale-Up Spine, Scale-Out Leaf
• High-density spine node
• Smaller fixed leaf
• Fewer control planes than pure Clos
Lean Core, Smart Edge
• Layer-1.5 spine (dumb core)
• Intelligent edge
Building a General-Purpose Network Switching Platform with FabricPath
Diagram labels: POD 1 (VLANs 100-199); POD 2 (VLANs 200-299); POD 3 (VLANs 300-399); PODS 1-3 (VLANs 100-399)
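A General-Purpose Network Switching Platform for Large-Scale Data Centers -- How the Network Supports Flexible Service Deployment
Modularity: easy to scale
Consistent network bandwidth and latency: independent of where a server sits
Rapid service deployment: compute resources can be moved and reallocated flexibly
Any service on any server, at any time!!!
Scalability: service and cluster growth is no longer constrained by the network
Server efficiency: servers can be reused
Manageability: plug and play, minimal configuration, little manual intervention
Reliability: the impact of a single point of failure on the overall service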
From "Routing" Back to "Switching" -- Switching Networks for Small and Medium Data Centers
• Turn your network into a Switch
• Key technologies: remote extension modules, FEX as ToR
Nexus 7000/5000 Virtualized chassis
Nexus 5000 + Nexus 2000 Fabric Extender = Virtualized chassis
FEX Terminology
FEX can be connected to a parent switch in three ways:
• single attached without any vPC running on the parent switch
• single attached with vPC running on the parent switch
• dual attached in vPC mode
Diagram labels: Parent switch; vPC Primary; vPC Secondary; vPC 1; vPC 2; Fabric Links; NIFs; HIFs
FEX Inner Functioning: Inband Management Model
The Fabric Extender is discovered by the switch using an L2 Satellite Discovery Protocol (SDP) that runs on the uplink port of the Fabric Extender
The core switch checks the software image
The core switch pushes programming data to the Fabric Extender
Diagram labels: N5k01; 1,2,3,4; 1-48 GigE; software image, configuration
• Flattened architecture
• Flexible application deployment across a larger area
• Line-rate network
Data Center-Wide Scalability at Layer 2
Thank you