Network and TCP performance relationship workshop


Slide in TWNIC 14th OPM TWNOG Workshop. Date: July 2, 2010.


TWNOG WORKSHOP 2010/7/2, Taipei

Common Network Operations Problems: Causes and Troubleshooting Techniques

Exploring the Relationship Between the Network and TCP Performance

智匯亞洲有限公司 許至凱, CCIE/JNCIE

kaeatforum [at] gmail.com


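Objectives

• Audience: network equipment operators and operations staff
• Understand which network environment factors affect TCP performance, so that network operations can be linked to network application performance and used as a reference for improving the network environment.
– Understand how TCP works
– Which network events affect TCP performance when they occur?
– Countermeasures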


Agenda

• TCP Briefing
• TCP Performance Factors
• Network Event Impact
• Improvement – Network approach
• Improvement – Appliance approach
• Reference


TCP Briefing

• TCP/IP stack in a computer system
– Linux

[Figure: Linux TCP/IP stack, top to bottom]
Application
Socket Layer (net/socket.c)
Inet Layer (net/ipv4/af_inet.c)
TCP Layer (net/ipv4/tcp.c) | UDP Layer (net/ipv4/udp.c)
IP Layer (various ip files in net/ipv4)
Ethernet Device Driver | Other Drivers
Ethernet Card | Parallel/Serial/Other Interface Drivers


TCP Briefing

• TCP/IP stack in a computer system
– Windows

[Figure: Windows TCP/IP stack, top to bottom]
User mode: Windows Sockets Applications; Windows Sockets
Kernel mode:
AFD | WSK Clients | NetBT and other TDI clients
WSK | TDI
TDX
TCP/IP Stack (Tcpip.sys): TCP, UDP, RAW; IPv4, IPv6; 802.3, PPP, 802.11, Loopback, IPv4 Tunnel
NDIS


TCP Briefing

• TCP/IP position in computer and network environment


TCP Briefing

• TCP header format (RFC793)


TCP Briefing

• TCP header format (updated by RFC3168)


TCP Performance Factors

• TCP Performance Factors
– Monitoring Tools
– Flow control
– Congestion control


TCP Performance Factors

– Measurement tools
• Monitoring tools
– tcpdump
» On Windows platform: Wireshark
– tcpstat
• Benchmarking tools
– ttcp
– Netperf
– NetPIPE
– DBS (Distributed Benchmark System)


TCP Performance Factors

– Flow control
• Sliding Window (window size = 6 in the example)

[Figure: segments 0 to 13 shown at Step 1 through Step 4 along a time axis. Legend: ACK received; awaiting ACK; sendable range; non-sendable range]
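The bookkeeping behind the sliding-window figure can be sketched in a few lines of Python. This is only an illustration, assuming segments are numbered from 0 and the window is a fixed count of segments (6 in the slide). The function name `window_state` and the four state labels are hypothetical and do not come from any TCP implementation.

```python
def window_state(total, window, acked, sent):
    """Classify every segment number as seen by the sender.

    total  : number of segments in the transfer (14 in the slide, 0..13)
    window : window size in segments (6 in the slide)
    acked  : segments 0..acked-1 have been acknowledged
    sent   : segments acked..sent-1 are sent and still waiting for an ACK
    """
    # First segment number that lies outside the current window.
    limit = min(acked + window, total)
    state = {}
    for seq in range(total):
        if seq < acked:
            state[seq] = "acked"
        elif seq < sent:
            state[seq] = "awaiting_ack"
        elif seq < limit:
            state[seq] = "sendable"
        else:
            state[seq] = "blocked"
    return state


if __name__ == "__main__":
    # Two segments acknowledged, three more in flight, window of 6.
    for seq, label in window_state(14, 6, acked=2, sent=5).items():
        print(seq, label)
```

Each arriving ACK raises `acked`, which moves `limit` to the right and turns previously blocked segments into sendable ones; that movement is the "slide".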


TCP Performance Factors

– Flow control
• Window Size Adjustment
– “Receiver window size field” in TCP header


TCP Performance Factors

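– Congestion Control
• Flow control lets the receiver regulate incoming traffic and prevents buffer overflow at the receiver
– The sender's window size is adjusted through the AdvertisedWindow
– It cannot reflect the state of the network path
» It cannot prevent buffer-overflow-like conditions in the networks along the path
• To detect possible network congestion, TCP uses congestion control.
– Adjustment is made through the CongestionWindow (cwnd)
• Congestion control consists mainly of four algorithms (RFC5681):
– Slow start
– Congestion avoidance
– Fast retransmit
– Fast recovery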
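How the two windows combine can be shown with a small Python sketch. RFC5681 states that the minimum of cwnd and the receiver's advertised window governs transmission; the function name `usable_window` and the choice to count in whole segments rather than bytes are assumptions made here for brevity.

```python
def usable_window(cwnd, rwnd, in_flight):
    """Return how many more segments the sender may transmit right now.

    cwnd      : congestion window (sender's estimate of network capacity)
    rwnd      : advertised window (receiver's free buffer space)
    in_flight : segments sent but not yet acknowledged
    All values are counted in segments, which is a simplification.
    """
    effective = min(cwnd, rwnd)          # the tighter limit wins
    return max(effective - in_flight, 0)  # never negative


if __name__ == "__main__":
    print(usable_window(cwnd=10, rwnd=4, in_flight=1))   # receiver-limited
    print(usable_window(cwnd=2, rwnd=20, in_flight=2))   # network-limited
```

In the first call the receiver is the bottleneck (flow control); in the second the network estimate is (congestion control).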


TCP Performance Factors

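• Slow start
– When a TCP connection is first established, it uses a small window size and increases it gradually as ACKs arrive.
» The initial value of cwnd is 1
» The purpose is to probe the available network bandwidth
– cwnd is increased by 1 for each ACK received
» As a result, after each round-trip time (RTT), cwnd is twice its value in the previous RTT
» Exponential growth
– To keep cwnd from growing too fast, once cwnd exceeds the "slow start threshold" (ssthresh), it increases by only 1 per RTT
» Linear growth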
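The growth pattern described above can be traced with a short Python sketch. It works at RTT granularity and counts cwnd in segments. Two simplifications are assumed here and are not from the slide: no packet is ever lost, and the doubling step is capped at ssthresh so that the switch to linear growth happens exactly at the threshold. The function name `cwnd_per_rtt` is hypothetical.

```python
def cwnd_per_rtt(rtts, ssthresh, initial_cwnd=1):
    """Return the cwnd value at the start of each RTT, assuming no loss."""
    cwnd = initial_cwnd
    history = []
    for _ in range(rtts):
        history.append(cwnd)
        if cwnd < ssthresh:
            # Slow start: one extra segment per ACK doubles cwnd per RTT.
            cwnd = min(cwnd * 2, ssthresh)
        else:
            # Past the threshold: grow by one segment per RTT.
            cwnd += 1
    return history


if __name__ == "__main__":
    print(cwnd_per_rtt(rtts=7, ssthresh=8))  # [1, 2, 4, 8, 9, 10, 11]
```

The first four values show the exponential phase and the remaining ones the linear phase.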


TCP Performance Factors

• Congestion avoidance
– In this phase:
» cwnd > ssthresh
» cwnd + 1 for each RTT
– When packet loss occurs:
» ssthresh -> cwnd/2
» cwnd -> 1
» packet retransmission
– Once packet loss occurs, TCP performance is severely affected.
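Why a single loss is so costly can be estimated with a rough Python sketch: apply the loss reaction from the slide, then count how many RTTs pass before cwnd climbs back to its old value. The function names are hypothetical, cwnd is counted in segments, the lower bound of 2 on ssthresh follows the floor in RFC5681, and capping the doubling step at ssthresh is an assumption made to keep the model simple.

```python
def on_packet_loss(cwnd):
    """Return (new_cwnd, new_ssthresh) after a loss, as on the slide."""
    new_ssthresh = max(cwnd // 2, 2)  # ssthresh -> cwnd/2, floor of 2 assumed
    return 1, new_ssthresh            # cwnd -> 1


def rtts_to_recover(cwnd_before_loss):
    """Count RTTs until cwnd is back at its pre-loss value (no further loss)."""
    cwnd, ssthresh = on_packet_loss(cwnd_before_loss)
    rtts = 0
    while cwnd < cwnd_before_loss:
        if cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)  # slow start
        else:
            cwnd += 1                       # congestion avoidance
        rtts += 1
    return rtts


if __name__ == "__main__":
    print(on_packet_loss(16))   # (1, 8)
    print(rtts_to_recover(16))  # 11
```

With a window of 16 segments, one loss costs 11 RTTs of reduced sending rate in this model, and the cost grows with the window size and with the RTT of the path.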


TCP Performance Factors

• Slow start & Congestion avoidance characteristic


TCP Performance Factors

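• Fast retransmit (Tahoe)
– Slow start + congestion avoidance still apply
– The sender retransmits the packet as soon as it receives 3 duplicate ACKs
» This avoids the severe TCP performance drop caused by having to adjust ssthresh/cwnd after a sender timeout
• Fast recovery (Reno)
– First apply fast retransmit
» Enter congestion avoidance upon receiving duplicate ACKs
– Then perform fast recovery
» ssthresh -> cwnd/2
» Retransmit the packet
» cwnd -> ssthresh + 3
• NewReno, SACK, Vegas…
– All improve performance at the TCP endpoints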
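The duplicate-ACK trigger and the Reno window adjustment listed above can be sketched as a small Python class. It is a deliberately reduced model: the class name `RenoSender` and its attributes are hypothetical, cwnd is counted in segments, and the later stages of fast recovery (window inflation for further duplicate ACKs and deflation when new data is acknowledged) are left out.

```python
DUP_ACK_THRESHOLD = 3  # number of duplicate ACKs that triggers a retransmit


class RenoSender:
    """Tracks duplicate ACKs and applies the slide's fast-recovery update."""

    def __init__(self, cwnd, ssthresh):
        self.cwnd = cwnd
        self.ssthresh = ssthresh
        self.last_ack = None
        self.dup_acks = 0
        self.retransmitted = []  # sequence numbers that were resent

    def on_ack(self, ack_no):
        if ack_no != self.last_ack:
            # New cumulative ACK: reset the duplicate counter.
            self.last_ack = ack_no
            self.dup_acks = 0
            return
        self.dup_acks += 1
        if self.dup_acks == DUP_ACK_THRESHOLD:
            self.ssthresh = max(self.cwnd // 2, 2)  # ssthresh -> cwnd/2
            self.retransmitted.append(ack_no)       # resend the missing segment
            self.cwnd = self.ssthresh + 3           # cwnd -> ssthresh + 3


if __name__ == "__main__":
    sender = RenoSender(cwnd=16, ssthresh=32)
    for ack in [1, 2, 3, 3, 3, 3]:  # ACK 3 arrives once, then three duplicates
        sender.on_ack(ack)
    print(sender.retransmitted, sender.ssthresh, sender.cwnd)  # [3] 8 11
```

Compared with a timeout, cwnd drops to 11 rather than 1, which is the benefit the slide attributes to fast recovery.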


Network Event Impact

• Packet loss
– Under TCP congestion control, packet loss triggers TCP retransmission
• No matter how well TCP congestion control performs, packet loss always degrades TCP performance


Network Event Impact

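• Packet out-of-order
– When packets arrive out of order, TCP can reassemble them, but if TCP fast recovery kicks in, resources may be wasted
• After receiving duplicate ACKs, Reno starts retransmitting packets and does not stop until it receives a Partial ACK.
– If a packet is merely late rather than lost, the sender will inevitably retransmit packets that did not need retransmission, wasting resources.
• To improve on Reno's efficiency, NewReno stops retransmitting lost packets only after it receives the Final ACK.
– NewReno may retransmit more duplicate packets than Reno.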
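A short Python sketch shows how reordering alone, without any loss, can produce enough duplicate ACKs to trigger an unnecessary retransmission. It assumes a receiver that sends one cumulative ACK per arriving segment and numbers segments by whole integers; the function names are hypothetical.

```python
def cumulative_acks(arrivals, first_seq=1):
    """Return the ACK (next expected segment) sent after each arrival."""
    received = set()
    next_expected = first_seq
    acks = []
    for seq in arrivals:
        received.add(seq)
        # Advance past every segment that is now contiguous.
        while next_expected in received:
            next_expected += 1
        acks.append(next_expected)
    return acks


def max_duplicate_acks(acks):
    """Return the longest run of repeats of the same ACK number."""
    best = run = 0
    previous = None
    for ack in acks:
        run = run + 1 if ack == previous else 0
        best = max(best, run)
        previous = ack
    return best


if __name__ == "__main__":
    # Segment 2 is only delayed, it arrives after 3, 4 and 5.
    acks = cumulative_acks([1, 3, 4, 5, 2])
    print(acks)                      # [2, 2, 2, 2, 6]
    print(max_duplicate_acks(acks))  # 3
```

Three duplicate ACKs reach the sender before the late segment is acknowledged, so a sender using fast retransmit would resend segment 2 even though it was never lost.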


Improvement – Network approach

• Reduce packet loss
– Packet loss has a large impact on TCP performance; every source of packet loss in the network should be eliminated as far as possible.
– Layer 1, layer 2 error
• Unqualified physical media
– CRC, P3 error etc…
– Layer 3
• Router/Switch hardware or software error
– Congestion
– Reduce congestion impact by QoS deployment
– Avoid packet drop for highly sensitive TCP applications


Improvement – Network approach

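– Packet forward process without QoS
• Tail-drop
– When a link is congested, the hardware queue of the network device fills up; once it cannot hold any more outgoing packets, newly arriving packets are simply dropped.
– The hardware queue cannot judge packet priority, so once the queue is full it drops packets indiscriminately.
» This situation is called Tail-drop
– Tail-drop should be avoided as far as possible.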
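Tail-drop behaviour is easy to reproduce with a bounded FIFO in Python. The class name `TailDropQueue` and the `(priority, id)` packet tuples are hypothetical; the point of the sketch is only that the queue never looks at the priority before dropping.

```python
from collections import deque


class TailDropQueue:
    """Fixed-size FIFO that drops any packet arriving while it is full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()
        self.dropped = []

    def enqueue(self, packet):
        if len(self.queue) >= self.capacity:
            # Queue full: the new arrival is dropped, whatever its priority.
            self.dropped.append(packet)
            return False
        self.queue.append(packet)
        return True


if __name__ == "__main__":
    q = TailDropQueue(capacity=2)
    for packet in [("low", 1), ("low", 2), ("high", 3)]:
        q.enqueue(packet)
    print(list(q.queue))  # [('low', 1), ('low', 2)]
    print(q.dropped)      # [('high', 3)]
```

The high-priority packet is lost simply because it arrived last, which is the indiscriminate behaviour the slide warns about.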


Improvement – Network approach

– Packet forward process with QoS
• Packets of different priority are first placed in separate logical queues and then moved into the h/w queue. Before the h/w queue fills up, some packets held in the low-priority queue are proactively dropped, preventing Tail-drop.
– RED – Random Early Detection
– WRED – Weighted Random Early Detection
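The early-drop idea can be sketched in Python as a drop probability that rises linearly with the average queue depth, which is the classic RED formulation. The threshold values, the `WRED_PROFILES` table and the function names are illustrative assumptions, not settings from any particular device.

```python
import random


def red_drop_probability(avg_queue, min_th, max_th, max_p):
    """Drop probability for a given average queue depth under RED."""
    if avg_queue < min_th:
        return 0.0  # queue is short: never drop
    if avg_queue >= max_th:
        return 1.0  # queue is too long: drop everything
    # In between, the probability ramps linearly from 0 up to max_p.
    return max_p * (avg_queue - min_th) / (max_th - min_th)


# Hypothetical WRED profiles: (min_th, max_th, max_p) per priority class.
# Low-priority traffic starts being dropped earlier and more aggressively.
WRED_PROFILES = {
    "high": (30, 40, 0.05),
    "low": (20, 40, 0.10),
}


def wred_should_drop(avg_queue, priority, rng=random):
    """Randomly decide whether to drop a packet of the given priority."""
    min_th, max_th, max_p = WRED_PROFILES[priority]
    return rng.random() < red_drop_probability(avg_queue, min_th, max_th, max_p)


if __name__ == "__main__":
    for depth in (10, 25, 35, 45):
        low = red_drop_probability(depth, *WRED_PROFILES["low"])
        high = red_drop_probability(depth, *WRED_PROFILES["high"])
        print(depth, low, high)
```

Because drops start before the queue is full and hit low-priority traffic first, a few TCP senders back off early instead of many flows losing packets at once to tail-drop.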


Improvement – Network approach

• Reduce out-of-order packets
– Avoid sending the same TCP session over different paths
• Per-packet load-sharing
– Load-sharing by destination IP only
• Per-flow load-sharing
– Load-sharing by IP packet hash value. Hash index includes:
» Source IP, Destination IP
» Protocol
» Source Port, Destination Port
– Packets with the same hash value take the same next-hop interface, which prevents packets from arriving out of order.
– TCP implementations with Selective Acknowledgements
• RFC2018
• RFC2883
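Per-flow load-sharing can be illustrated with a small Python sketch that hashes the five fields listed above and uses the result to pick a next hop. The use of CRC32 as the hash, the field separator, and the interface names are assumptions for illustration; real routers use their own vendor-specific hash functions.

```python
import zlib


def pick_next_hop(src_ip, dst_ip, protocol, src_port, dst_port, next_hops):
    """Choose a next hop from the flow's 5-tuple, the same one every time."""
    key = f"{src_ip}|{dst_ip}|{protocol}|{src_port}|{dst_port}".encode()
    # CRC32 is deterministic, so every packet of a flow maps to one index.
    return next_hops[zlib.crc32(key) % len(next_hops)]


if __name__ == "__main__":
    hops = ["ge-0/0/0", "ge-0/0/1"]  # hypothetical interface names
    flow = ("192.0.2.10", "198.51.100.20", "tcp", 40000, 80)
    print({pick_next_hop(*flow, hops) for _ in range(5)})  # a single interface
```

All packets of one TCP session follow one path and so keep their order, while different sessions can still be spread across the available links.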


Improvement – Appliance approach

• The operating system has to handle TCP session processing
– It is CPU/memory dependent
• A large number of TCP sessions consumes system resources such as CPU cycles and memory, leaving less CPU/memory for the real service processes
• Reduce system resource consumption in TCP session handling
– TCP Offload
– TCP Optimization


Improvement – Appliance approach

• TCP Offload
– Migrate TCP handling out of the kernel
• Use dedicated hardware to handle TCP
• Save system resources for real service processes
– TOE (TCP Offload Engine) NIC
• Handle TCP/IP on NIC


Improvement – Appliance approach

• TCP Offload
– NIC w/o TOE and NIC w/ TOE comparison


Improvement – Appliance approach

• TCP Offload
– TOE is widely deployed in iSCSI environments

• iSCSI:


Improvement – Appliance approach

• TCP Optimization
– Move the handling of large numbers of TCP sessions off the server
– For any TCP session, a 3-way handshake and a 4-way handshake are necessary
• 3-way handshake for TCP connection establishment
• 4-way handshake for TCP connection termination
– Reducing the number of TCP connections reduces connection “overhead”
• Deploy dedicated hardware in front of the servers
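The connection overhead mentioned above can be put into numbers with a back-of-the-envelope Python sketch. It simply counts 3 segments for setup and 4 for teardown per connection; piggybacked ACKs, retransmissions and per-connection memory are ignored, and the connection counts in the example are made-up figures.

```python
HANDSHAKE_SEGMENTS = 3  # SYN, SYN+ACK, ACK
TEARDOWN_SEGMENTS = 4   # FIN, ACK, FIN, ACK


def control_segments(connections):
    """Segments spent only on opening and closing the given connections."""
    return connections * (HANDSHAKE_SEGMENTS + TEARDOWN_SEGMENTS)


if __name__ == "__main__":
    # Hypothetical load: 1000 client connections reaching the server directly,
    # versus the same requests carried over 10 connections kept by a proxy.
    print(control_segments(1000))  # 7000
    print(control_segments(10))    # 70
```

Under these assumptions the server handles a small fraction of the setup and teardown traffic once a front-end device terminates the client connections.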


Improvement – Appliance approach

• TCP Optimization
– Regular TCP connection

[Figure: message sequence between Client and Server. Labels: SYN, SYN+ACK, ACK, GET, Data (×3), FIN, ACK, FIN, ACK]


Improvement – Appliance approach

• TCP Optimization
– Reduce the number of server TCP connections
• Only ONE 3-way handshake is necessary in the early stage

[Figure: message sequence among Client, TCP Proxy, and Server. Labels: SYN, SYN+ACK, ACK, GET, Data (×3), GET, Data (×3), FIN, ACK, FIN, ACK]


Improvement – Appliance approach

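• TCP Optimization
– In practice, it is rarely deployed solely to improve TCP performance
• It is usually combined with other functions
• L4~L7 load-balance
– Because the client's end-to-end TCP connection terminates on the TCP Proxy, further functions can be added
• SSL acceleration
• Reverse cache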


Reference

• Books
– High-Speed Networks and Internets – Performance and Quality of Service, 2nd Ed.
• By William Stallings; Prentice Hall
– High Performance TCP/IP Networking – Concepts, Issues and Solutions
• By Mahbub Hassan and Raj Jain; Pearson Prentice Hall
– TCP/IP Illustrated, Volume 1
• By W. Richard Stevens; Addison Wesley
• Articles
– TCP Performance
• By Geoff Huston; The Internet Protocol Journal - Volume 3, No. 2
– A very good “sliding window” description
• http://www.it.uu.se/edu/course/homepage/datakom/civinght04/schema/sliding_window.pps


Q & A
