VL2: A Scalable and Flexible Data Center Network
Networking the Cloud. Presenter: b97901184, 姜慧如 (third-year, Electrical Engineering)
"Shared" Data Center for the Cloud
Data centers hold tens to hundreds of thousands of servers, concurrently supporting a large number of distinct services.
Economies of scale: dynamically reallocate servers among services as workload patterns change.
High utilization is needed. Agility!
Agility is Important in Data Centers
Agility: any machine is able to play any role. "Any service, any server."
But… utilization tends to be low. Ugly secret: 30% utilization is considered good.
-- Uneven application fit: each server has CPU, memory, and disk; most applications exhaust one resource, stranding the others.
-- Long provisioning timescales: new servers are purchased quarterly at best.
-- Uncertainty in demand: demand for a new service can spike quickly.
-- Risk management: not having spare servers to meet demand brings failure just when success is at hand.
-- Session state and storage constraints: if only the world were stateless servers…
How to Achieve Agility?
Workload management -- means for rapidly installing a service's code on a server (virtual machines, disk images).
Storage management -- means for a server to access persistent data (distributed filesystems).
Network -- means for communicating with other servers, regardless of where they are in the data center.
But: today's data center networks prevent agility in several ways.
Conventional DCN Problems
Static network assignment leads to fragmentation of resources.
Poor server-to-server connectivity.
Traffic flows affect each other, giving poor reliability and utilization.
[Figure: conventional tree topology with core routers (CR) at the top, aggregation routers (AR) below, switches (S) next, and servers (A) at the leaves. Oversubscription grows toward the root: 1:5 at the ToR uplinks, 1:80 at the aggregation layer, 1:240 at the core. One service "wants more" capacity while another has spare servers, but the topology strands them.]
Conventional DCN Problems (cont.)
Scale is achieved by assigning servers topologically related IP addresses and dividing servers among VLANs.
-- Limited utility of VMs: a VM cannot migrate out of its original VLAN while keeping the same IP address.
-- Fragmentation of the address space.
-- Reconfiguration is needed when servers are reassigned to different services.
Virtual Layer 2 Switch
The Illusion of a Huge L2 Switch
1. L2 semantics
2. Uniform high capacity
3. Performance isolation
[Figure: the physical tree of core routers (CR), aggregation routers (AR), and switches (S) is abstracted as a single huge virtual layer-2 switch with every server (A) plugged directly into it.]
Network Objectives
Developers want a mental model where all their servers, and only their servers, are plugged into an Ethernet switch.
1. Layer-2 semantics
-- Flat addressing, so any server can have any IP address.
-- Server configuration is the same as in a LAN.
-- A VM keeps the same IP address even after migration.
2. Uniform high capacity
-- Capacity between servers limited only by their NICs.
-- No need to consider topology when adding servers.
3. Performance isolation
-- Traffic of one service should be unaffected by others.
VL2 Goals and Solutions

Objective | Approach | Solution
1. Layer-2 semantics | Employ flat addressing | Name-location separation & resolution service
2. Uniform high capacity between servers | Guarantee bandwidth for hose-model traffic | Flow-based random traffic indirection (Valiant LB)
3. Performance isolation | Enforce hose model using existing mechanisms only | TCP

"Hose" model: each node has only ingress/egress bandwidth constraints.
Reminder: Layer 2 vs. Layer 3
Ethernet switching (layer 2):
-- Cheaper switch equipment.
-- Fixed addresses and auto-configuration.
-- Seamless mobility, migration, and failover.
IP routing (layer 3):
-- Scalability through hierarchical addressing.
-- Efficiency through shortest-path routing.
-- Multipath routing through equal-cost multipath (ECMP).
So, as in enterprises, data centers often connect layer-2 islands by IP routers.
Measurements and Implications of DCN
Data-center traffic analysis: DC traffic != Internet traffic.
-- The ratio of traffic between servers to traffic entering/leaving the data center is 4:1.
-- Demand for bandwidth between servers is growing faster; the network is the bottleneck of computation.
Flow distribution analysis:
-- The majority of flows are small; the biggest flows are around 100 MB.
-- The distribution of internal flows is simpler and more uniform.
-- More than 50% of the time a server has about 10 concurrent flows; at least 5% of the time it has more than 80 concurrent flows.
Measurements and Implications of DCN (cont.)
Traffic matrix analysis:
-- Traffic patterns summarize poorly.
-- Traffic patterns are unstable over time.
Failure characteristics:
-- Duration of networking-equipment failures: 95% < 1 min, 98% < 1 hr, 99.6% < 1 day; 0.09% last > 10 days.
-- No obvious way to eliminate all failures from the top of the hierarchy.
VL2's Methods
Flat addressing: allows service instances (e.g., virtual machines) to be placed anywhere in the network.
Valiant Load Balancing: (randomly) spreads network traffic uniformly across network paths.
End-system-based address resolution: scales to large server pools without introducing complexity to the network control plane.
Virtual Layer Two Networking (VL2) design principles:
-- Randomizing to cope with volatility: use Valiant Load Balancing (VLB) to do destination-independent traffic spreading across multiple intermediate nodes.
-- Building on proven networking technology: use IP routing and forwarding technologies available in commodity switches.
-- Separating names from locators: use a directory system to maintain the mapping between names and locations.
-- Embracing end systems: a VL2 agent runs on each server.
VL2 Topology: Clos Network

[Figure: a Clos network. A layer of intermediate (Int) switches is fully connected to a layer of K aggregation (Aggr) switches with D ports each; ToR switches sit below, each serving 20 servers, for a total of 20*(DK/4) servers.]
This offers huge aggregate capacity and many paths at modest cost.
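The slide's capacity formula can be sketched directly. This is an illustrative helper (the function name is an assumption, not from the paper), implementing the stated 20*(DK/4) server count:

```python
# Hypothetical sketch of the slide's Clos sizing formula:
# K aggregation switches with D ports each support DK/4 ToRs,
# and each ToR hosts 20 servers.

def vl2_server_count(k_aggr: int, d_ports: int) -> int:
    """Servers supported: 20 * (D*K / 4), per the slide."""
    tors = (d_ports * k_aggr) // 4  # aggregation ports are split between up and down links
    return 20 * tors

# Example: 144 aggregation switches with 144 ports each.
print(vl2_server_count(144, 144))  # 103680 servers
```

With modest port counts the topology already scales to over a hundred thousand servers, which is why commodity switches suffice.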
Use Randomization to cope with Volatility
Valiant Load Balancing: Indirection
[Figure: sender x's traffic is encapsulated to an anycast address (I_ANY) shared by all intermediate switches; the network picks one at random for the up path, then forwards the packet down to the destination's ToR (T3 for y, T5 for z).]
Copes with arbitrary traffic matrices with very little overhead.
[ECMP + IP Anycast]
-- Harness huge bisection bandwidth.
-- Obviate esoteric traffic engineering or optimization.
-- Ensure robustness to failures.
-- Work with switch mechanisms available today.
Requirements: 1. must spread traffic; 2. must ensure destination independence. Both are met by Equal-Cost Multi-Path (ECMP) forwarding.
Random Traffic Spreading over Multiple Paths
[Figure: the same Clos diagram, highlighting the links used for up paths versus the links used for down paths as traffic from x is spread across the intermediate switches on its way to y and z.]
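The flow-based spreading above can be sketched in a few lines. This is a hedged illustration, not VL2's actual switch logic: the switch names and the 5-tuple format are assumptions, but the idea matches ECMP-style hashing, where each flow deterministically maps to one intermediate switch, independent of its destination:

```python
import hashlib

# Illustrative set of intermediate switches (names are made up).
INTERMEDIATES = ["Int1", "Int2", "Int3", "Int4"]

def pick_intermediate(flow_5tuple: tuple) -> str:
    """Hash a flow's 5-tuple to one intermediate switch.

    Deterministic per flow (so packets of one flow stay on one path,
    avoiding reordering), but uniform across many flows.
    """
    digest = hashlib.sha256(repr(flow_5tuple).encode()).digest()
    return INTERMEDIATES[digest[0] % len(INTERMEDIATES)]

flow = ("10.0.0.1", 5000, "10.0.1.9", 80, "tcp")
print(pick_intermediate(flow))  # same flow always maps to the same intermediate
```

Keeping the choice per-flow rather than per-packet is what lets VLB coexist with TCP: spreading happens across flows, not within one.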
Separating Names from Locations
How smart servers use dumb switches: encapsulation.
Commodity switches have simple forwarding primitives; the complexity of computing the headers is moved to the servers.
VL2 Directory System
[Figure: the directory system consists of a small number of replicated state machine (RSM) servers and a larger number of directory servers (DS) that serve agents.]
Lookup: 1. an agent sends a lookup to a directory server; 2. the DS replies from its cached mappings.
Update: 1. an update reaches a directory server; 2. the DS sets it on an RSM server; 3. the RSM replicates the update to its peers; 4.-5. acks flow back to the DS and the updater; 6. the new mapping is disseminated to the other directory servers.
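The two-tier split above can be sketched as a toy model. This is a minimal sketch under assumptions (class and field names are invented; real directory servers are distributed processes, not objects): writes go through the small, strongly consistent RSM tier, while reads are served from the larger read-optimized DS tier:

```python
class RSM:
    """Toy stand-in for one replica of the replicated state machine tier."""
    def __init__(self):
        self.store = {}  # authoritative AA -> LA mapping

    def replicate(self, aa, la):
        self.store[aa] = la  # in reality: consensus among RSM replicas


class DirectoryServer:
    """Toy read-optimized directory server with a local cache."""
    def __init__(self, rsm_replicas):
        self.cache = {}
        self.rsm_replicas = rsm_replicas

    def lookup(self, aa):
        # Fast path: lookups are answered from the cache.
        return self.cache.get(aa)

    def update(self, aa, la):
        # Slow path: updates are written through the RSM tier first,
        # then the new mapping is disseminated to the cache.
        for rsm in self.rsm_replicas:
            rsm.replicate(aa, la)
        self.cache[aa] = la


rsms = [RSM() for _ in range(3)]
ds = DirectoryServer(rsms)
ds.update("AA-x", "ToR2")
print(ds.lookup("AA-x"))  # ToR2
```

The design choice mirrors the workload: lookups vastly outnumber updates, so the read path is cheap and replicated widely, while the rare writes pay for consistency.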
VL2 Addressing and Routing

[Figure: server x sends to y and z. Servers use flat application addresses (AAs); switches run link-state routing over locator addresses (LAs) and maintain only the switch-level topology. The directory service maps each AA to the LA of its ToR (x → ToR2, y → ToR3, z → ToR4); the sending agent performs a lookup, receives the response, and encapsulates the payload with the destination ToR's LA.]
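Putting the pieces together, the sending-side agent's job can be sketched as follows. This is an illustrative model only (the directory contents, switch names, and packet representation are assumptions): resolve the destination AA to its ToR's LA, pick a random intermediate for VLB, and wrap the payload in the corresponding outer headers:

```python
import random

# Illustrative directory contents: AA -> LA of the hosting ToR.
DIRECTORY = {"x": "ToR2", "y": "ToR3", "z": "ToR4"}
INTERMEDIATES = ["Int1", "Int2", "Int3"]

def encapsulate(payload: bytes, dst_aa: str) -> dict:
    """Model of the VL2 agent's send path: resolve, then double-encapsulate."""
    dst_tor = DIRECTORY[dst_aa]                  # name -> location resolution
    intermediate = random.choice(INTERMEDIATES)  # VLB indirection hop
    return {
        "outer": intermediate,  # first hop: bounce off a random intermediate
        "inner": dst_tor,       # second header: destination ToR's LA
        "dst_aa": dst_aa,       # flat name, untouched by the network
        "payload": payload,
    }

pkt = encapsulate(b"hello", "z")
print(pkt["inner"])  # ToR4
```

Because switches forward only on LAs, a VM can migrate (changing its ToR mapping in the directory) while its AA, and thus its IP address as seen by applications, stays the same.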
Embracing End Systems
Data-center OSes are already heavily modified for VMs, storage clouds, etc.
No change to applications or to clients outside the data center.
Evaluation
Uniform high capacity, tested with an all-to-all data shuffle stress test:
-- 75 servers, each delivering 500 MB to every other server.
-- Maximal achievable goodput is 62.3 Gbps.
-- VL2 achieves an aggregate goodput of 58.8 Gbps, a network efficiency of 58.8/62.3 = 94%.
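The efficiency figure is just the ratio of measured to achievable goodput, which is easy to verify:

```python
# Reported numbers from the evaluation: 58.8 Gbps aggregate goodput
# against a 62.3 Gbps achievable maximum.
efficiency = 58.8 / 62.3
print(f"{efficiency:.0%}")  # 94%
```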
Evaluation (cont.)
Performance isolation, tested with two types of services:
-- Service one: 18 servers each perform a single long-lived TCP transfer the whole time.
-- Service two (first experiment): 19 servers each start an 8 GB TCP transfer every 2 seconds.
-- Service two (second experiment): 19 servers burst short TCP connections.
Evaluation (cont.)
Convergence after link failures:
-- 75 servers running an all-to-all data shuffle.
-- Links between intermediate and aggregation switches are disconnected.