Global Observatory for Advanced Network Operations APAN Hawaii meeting, January 2004 Yoshinori Kitatsuji, Jin Tanaka & Kazunori Konishi APAN Tokyo XP

Global Observatory for Advanced Network Operations

APAN Hawaii meeting, January 2004

Yoshinori Kitatsuji, Jin Tanaka & Kazunori Konishi

APAN Tokyo XP

Outline

Background

Lessons learned from high performance experiments

Comparison between APAN JP NOC pages and Abilene NOC/Observatory

Discussion on Global Observatory

Background

High-performance demonstrations are now held throughout the year.
• Some demonstrations are starting to be run without prior notification.

Network engineers have accumulated know-how for provisioning and troubleshooting.

Operators struggle with the deployment of new services and technologies.
• Measurement of new items such as IGP stability
• Performance experiments such as those with SLAC
• Conventional services must still be maintained

Advanced technology and know-how need to be shared among operators in preparation for new, more advanced services.

Introducing new tools with advanced functions enables operators to take on new services more easily.

Visibility into the collected data leads, as a result, to smooth and stable operation.

Experiences of High Performance Experiment Support

Osaka University: HD over IPv6 transmission to NPACI, SC2003 and OptIPuter

• Osaka Univ., JGNv6, Tokyo XP, Abilene and SCInet/SDSC, 100+ Mbit/s

University of Washington: HD over IP transmission at the APAN Busan meeting, Aug 2003

• Tyco DC, IEEAF-WIDE, Tokyo XP, JGN, Genkai XP and KOREN, 200+ Mbit/s

National Institute of Advanced Industrial Science and Technology (AIST), Japan: SC2003 Bandwidth Challenge

• Tokyo XP/SINET, Abilene and SCInet, 3.8 Gbit/s

University of Tokyo: SC2003 Bandwidth Challenge

• Tokyo Univ., Tokyo XP, WIDE, Tyco DC (Seattle), NTT Communications, Abilene, SCInet, 7.8 Gbit/s

SURFNET/TransLIGHT/APAN Tokyo XP: Performance testing has just started.

• The bottleneck is 1 Gbit/s between Japan and the Netherlands.

Lessons Learned from Experiments

Importance of understanding availability along the path
• Performance tests should be performed hop by hop.
• A resource-sharing technique is required.

Effects of multiple TCP connections and rate control at source stations
• A dynamic rate control mechanism is one of the next key technology items.

The VLAN ID assignment policy applied to HPREN connections has not been discussed yet.
• Connections among three continents are expected to accelerate VLAN ID consumption.
• A VLAN for Tokyo XP/TransLIGHT/SURFnet has already been assigned.

The importance of managing used ports and assigned VLANs was pointed out.
• Much time was spent establishing the VLAN between Tokyo and the Netherlands.
• It seems that few people can administer the VLANs, and the connected networks are not clearly identified.
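
The point about managing used ports and assigned VLANs could be supported by even a very small shared registry. The following is a minimal sketch, not a tool described in these slides: the class name, its fields and any example values are hypothetical, and it only tracks which VLAN IDs are taken and which ports carry them.

```python
from dataclasses import dataclass, field


@dataclass
class VlanRegistry:
    """Hypothetical shared record of assigned VLAN IDs and the ports carrying them."""
    assignments: dict = field(default_factory=dict)  # vlan_id -> purpose
    ports: dict = field(default_factory=dict)        # vlan_id -> set of (switch, port)

    def assign(self, vlan_id, purpose):
        # IEEE 802.1Q leaves IDs 1..4094 usable (0 and 4095 are reserved).
        if not 1 <= vlan_id <= 4094:
            raise ValueError(f"VLAN ID {vlan_id} is outside 1..4094")
        if vlan_id in self.assignments:
            raise ValueError(f"VLAN ID {vlan_id} is already assigned")
        self.assignments[vlan_id] = purpose
        self.ports[vlan_id] = set()

    def attach_port(self, vlan_id, switch, port):
        # A port may only be attached to a VLAN that has been registered.
        if vlan_id not in self.assignments:
            raise KeyError(f"VLAN ID {vlan_id} has not been assigned")
        self.ports[vlan_id].add((switch, port))

    def ports_for(self, vlan_id):
        # Sorted so that reports are stable from run to run.
        return sorted(self.ports.get(vlan_id, ()))

    def free_ids(self, low, high):
        # Unassigned IDs in the inclusive range low..high.
        return [v for v in range(low, high + 1) if v not in self.assignments]
```

Keeping the purpose next to each ID addresses the "connected networks are not clarified" problem, and rejecting duplicate assignments makes ID consumption visible before it becomes a conflict.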

Comparison between APAN-JP NOC page and Abilene NOC/Observatory-1

Network Monitoring

Function | APAN-JP NOC | Abilene NOC/Observatory | Our impression: useful for troubleshooting | Our impression: useful for operation
--- | --- | --- | --- | ---
View current alerts on network | - | Alertmon Network Monitor | ☆☆☆ | ☆
View geographical network usage data | - | Animated Traffic Map | ☆☆ | ☆☆
Show traffic graph of router interface | All Links Traffic Graphs by MRTG | RRD Connector Graphs (5-minute avg) | ☆☆ | ☆☆
Show high-performance traffic graph of router interface | http://mrtg.jp.apan.net/cricket/router-interfaces/ & Traffic Report by RRD | RRD Connector Graphs (1-minute avg) | ☆☆☆ | ☆☆☆
Aggregate traffic graphs of the whole network or routers | - | Aggregate Traffic on Abilene | ☆ | ☆☆
View router CPU utilization | CPU Utilization by MRTG | Contained in Visible Network Toolset | ☆☆☆ | ☆☆
View router memory utilization | Memory Utilization by MRTG | Contained in Visible Network Toolset | ☆☆ | ☆☆
View router temperature measurement | Temperature Measure by MRTG | Contained in Visible Network Toolset | ☆ | ☆☆

Comparison between APAN-JP NOC page and Abilene NOC/Observatory-2

Network Monitoring

Function | APAN-JP NOC | Abilene NOC/Observatory | Our impression: useful for troubleshooting | Our impression: useful for operation
--- | --- | --- | --- | ---
Collect & analyze XML network data | - | Visible Network Toolset | ☆☆☆ | ☆☆☆
Archive text reports of daily aggregate statistics | - | Daily Connector Statistics | ☆☆ | ☆☆
View recent outage and availability reports | - | Weekly Reports | ☆☆ | ☆☆☆
View daily reports generated from the Netflow data collected from routers | - | Netflow Reports | ☆☆☆ | ☆☆
View errors and discards on the link | Error Packets & Discard Frame | Animated Traffic Map | ☆☆ | ☆☆
Show daily snapshots of the BGP table and BGP events | - | Ixia IxTraffic BGP tables | ☆☆ | ☆☆
Show time-series graph of the number of BGP routes per peer | Number of BGP routes | - | ☆☆☆ | ☆☆
View the VLAN information | VLAN View | - | ☆☆ | ☆☆☆

View traffic of the network displayed on the map

Excels at conveying the traffic usage of the whole network. APAN JP is considering introducing this tool in the near future.

Requirements
• Hardware: Traffic Graph Server (Ready)
• Library: GD Library (Ready)
• Software: http://tseg.uits.iu.edu/dist/wxmap/index.html

Collect and analyze XML network data

Useful for remote troubleshooting from other networks.

Requirements
• Know-how of JUNOScript
• Hardware: WWW Server with a large hard disk (Ready)
• Software: http://sourceforge.net/projects/visiblebackbone/

View daily reports generated from the Netflow data collected from routers

Useful for routing troubleshooting, traffic analysis and DoS detection. APAN JP is considering implementing this with cflowd.

Requirements
• Software: cflowd http://www.caida.org/tools/measurement/cflowd/
• Router configuration
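
As an illustration of what a daily flow report might compute, the sketch below ranks destinations by estimated traffic volume, which is one simple signal for traffic analysis and DoS detection. It is an assumption-laden example, not cflowd's output format or API: flow records are taken to be already exported as plain `(src, dst, sampled_bytes)` tuples, and the function name and `sampling_rate` parameter are hypothetical.

```python
from collections import Counter


def top_destinations(flows, sampling_rate=1, n=5):
    """Rank destinations by estimated bytes received.

    flows: iterable of (src, dst, sampled_bytes) tuples (assumed record layout).
    sampling_rate: 1-in-N packet sampling factor used to scale the estimate.
    """
    totals = Counter()
    for _src, dst, sampled_bytes in flows:
        # Scale sampled bytes back up to an estimate of the real volume.
        totals[dst] += sampled_bytes * sampling_rate
    return totals.most_common(n)
```

A destination that suddenly dominates this ranking from many distinct sources is a candidate for closer inspection; the estimate is only as good as the sampling behind it.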

Comparison between APAN-JP NOC page and Abilene NOC/Observatory-3

Operation Tools

Function | APAN-JP NOC | Abilene NOC/Observatory | Our impression: useful for troubleshooting | Our impression: useful for operation
--- | --- | --- | --- | ---
Get the result of operational commands to a router from the web page | Traceroute Service | Core Node Router Proxy | ☆☆☆ | ☆☆☆
Visualize the tree structure of a mroute | - | Multicast Route Viewer | ☆☆ | ☆☆
Operation manual search engine | A necessary manual can be searched with a keyword from about 150 manuals (secure page) | ? | ☆☆☆ | ☆☆
Show the DC power usage polled from power controllers | - | Rack Power Draws | ☆☆ | ☆☆☆
Ticket system for managing problems and maintenance | Request Tracker (secure page) | Helpdesk System | ☆ | ☆☆☆
Check the allocation of IP addresses | Automatic check by ICMP (secure page) | ? | ☆ | ☆☆

Get the result of operational commands to a router from the web page

It allows APAN participants to execute “show” commands on multiple routers in APAN Tokyo XP via a web interface. This enables a more complete diagnosis of a problem without contacting the NOC.

Requirements
• Hardware: WWW server (Ready)
• Software: routerproxy http://sourceforge.net/projects/routerproxy/
• Router configuration
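
A web front end of this kind is only safe if it forwards read-only commands. The check below is a minimal sketch of such a filter, not the routerproxy implementation: the allowed prefix list and the permitted character set are assumptions chosen for illustration.

```python
import re

# Assumption: only read-only "show" commands are exposed to participants.
ALLOWED_PREFIXES = ("show ",)

# Conservative character set; rejects separators such as ';' and '|'.
_SAFE_CHARS = re.compile(r"[A-Za-z0-9 ._:/-]+")


def is_permitted(command):
    """Return True if the command may be forwarded to a router."""
    normalized = " ".join(command.split())  # collapse runs of whitespace
    if not _SAFE_CHARS.fullmatch(normalized):
        return False
    return normalized.lower().startswith(ALLOWED_PREFIXES)
```

For example, `is_permitted("show bgp summary")` is accepted, while `is_permitted("show version; request system reboot")` is rejected because of the semicolon.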

Show the DC power usage polled from power controllers

Gives an overview of power usage per rack, so there is no need to measure the power usage of each rack on a regular schedule.

Requirements
• Hardware: WWW Server (Ready), SNMP-capable AC/DC power controller
• Software: NET-SNMP (Ready)

Comparison between APAN-JP NOC page and Abilene NOC/Observatory-4

Advanced Services

Function | APAN-JP NOC | Abilene NOC/Observatory | Our impression: useful for troubleshooting | Our impression: useful for operation
--- | --- | --- | --- | ---
Monitor various aspects of multicast at the router level | - | MANTRA Multicast Monitoring | ☆☆ | ☆☆☆
Test multicast connectivity between multiple hosts | - | Multicast Beacon Server | ☆☆☆ | ☆☆☆
Display the amount of IPv6 traffic to and from the tunnel router | - | IPv6 Traffic Graphs | ☆☆ | ☆☆
Determine the Provider Independent (PI) address based on the location | - | Provider Independent Addressing | ☆ | ☆☆

Test of multicast connectivity between multiple hosts

Provides measurement data, gathered via RTP, for the current multicast traffic in a group.

Requirements
• Hardware: WWW Server
• Perl Module: Net-RTP-0.4, Net::Domain
• Software: Multicast Beacon http://dast.nlanr.net/Projects/Beacon/index.html

Comparison between APAN-JP NOC page and Abilene NOC/Observatory-5

Observatory

Function | APAN-JP NOC | Abilene NOC/Observatory | Our impression: useful for troubleshooting | Our impression: useful for operation
--- | --- | --- | --- | ---
Provide one-way latency information for each link | - | Abilene Latency Tables | ☆☆☆ | ☆☆
Provide one-way latency statistics | - | Abilene Worst Ten Performing Latency Measurements | ☆☆ | ☆☆
Provide throughput information for each link | - | Abilene Throughput Tables | ☆☆☆ | ☆☆☆
Provide throughput statistics | - | Abilene Worst Ten Performing Throughput Measurements | ☆☆ | ☆☆
Provide router node data | - | Connection Technologies Table | ☆☆ | ☆☆
NTP service | Stratum 2 Server | Stratum 2 Server | ☆ | ☆☆☆

Provide one-way latency information for each link

The network performance of each link can be checked. This is vital when troubleshooting delay-related problems.

Requirements
• NTP on endpoints
• Hardware: FreeBSD or Linux server with OWAMP (One-way Active Measurement Protocol)
• OWAMP: http://e2epi.internet2.edu/owamp/download/
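
The NTP requirement follows from how a one-way delay is computed: it is the receiver's timestamp minus the sender's, so any offset between the two clocks appears directly as measurement error. The sketch below illustrates this arithmetic only; it is not OWAMP code, and the function names and the per-clock error bound are assumptions.

```python
def one_way_delay_ms(send_ts, recv_ts):
    """One-way delay in milliseconds from timestamps given in seconds."""
    return (recv_ts - send_ts) * 1000.0


def delay_bounds_ms(send_ts, recv_ts, max_clock_error_ms):
    """Range that contains the true delay.

    max_clock_error_ms: assumed bound on each endpoint's clock error.
    Both clocks can be off, so the combined uncertainty is twice that bound.
    """
    measured = one_way_delay_ms(send_ts, recv_ts)
    uncertainty = 2.0 * max_clock_error_ms
    return measured - uncertainty, measured + uncertainty
```

With a measured delay of 60 ms and clocks assumed accurate to within 1 ms each, the true delay lies between 58 ms and 62 ms. On short links the uncertainty can be comparable to the delay itself, which is why synchronization quality matters.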

Provide throughput information for each link

Draws the throughput statistics and ranking from the data reported by Iperf: Throughput (Mbps) / Jitter (%) / Packet Loss (%), Date & Time.

Requirements
• Hardware: WWW Server (Ready)
• Software: Iperf http://dast.nlanr.net/Projects/Iperf/
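
A "worst ten" ranking like the one in the Abilene tables can be produced by sorting the collected results by throughput. The following is a minimal sketch under the assumption that the Iperf reports have already been parsed into records; the `Measurement` type and its field names are hypothetical and do not reflect Iperf's output format.

```python
from typing import NamedTuple


class Measurement(NamedTuple):
    """Hypothetical parsed result of one throughput test."""
    link: str
    throughput_mbps: float
    loss_pct: float
    timestamp: str


def worst_performing(measurements, n=10):
    """Return the n measurements with the lowest throughput, worst first."""
    return sorted(measurements, key=lambda m: m.throughput_mbps)[:n]
```

Sorting is stable, so measurements with equal throughput keep their input order, which keeps the ranking reproducible between report runs.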

Discussion: Global Observatory for Advanced Researches over HPRENs

We would like to take on the challenge of a program that supports the collection and dissemination of network data over HPRENs.

With observatory servers such as the Abilene Observatory, the NOC gives network engineers a view of the operational data associated with a global-scale network, and gives research communities insight into the fundamental properties of basic network protocols.

With rack space and basic operational services provided by the NOC, advanced research projects can collect data and open it to other researchers. The NOC will provide accounts on measurement devices and ways to export data.

The NOC manages resources for operations, advanced research and conventional services.

Issues towards global operations and/or sharing measurement data among HPRENs

Non-uniform networks are connected
• High-load performance tests should be considered.

Long distance
• Long latency.
• Time synchronization accuracy tends to be at NTP level.

Time difference
• The burden of troubleshooting depends on how much management information is opened to the community.
• The troubleshooting escalation level for nighttime operators should be specified at every site.

Scalability
• Automated operation of the tools will become more important as the number of measurement and monitoring tools increases.
• Software update methods and software version management will also become more troublesome.

Requirements for Global Observatory

Basic data collection
• The comparison between APAN-JP NOC and Abilene NOC/Observatory is described in the preceding pages.

Advanced features to be developed for advanced researchers
• TCP performance test scheduler
• High-resolution flow-based traffic grapher for TCP/UDP performance tests

Preparations to accept applications for co-located projects

Observatory servers
• Cooperate with the Abilene Observatory project.

Preparation for co-located projects
• Available rack space and power, and accounts on related routers and switches.
• Application: form, organizing an evaluation team.

Drafting an Acceptable Co-location Policy (example)
• Accounts should be provided to NOC operators and engineers.
• Permission to export data to specified researchers.

TCP Performance Test Scheduler

High-performance TCP transmission experiments from multiple groups and organizations are increasing.
• SLAC, CRL, NASA, Indiana University, demonstrations

Experiments are getting congested.

Some experiments require their own TCP stack.
• A dedicated machine is required for this kind of experiment.

Traffic generated by each experiment can easily fill up the link between the end hosts.

Multiple experiments running at the same time degrade each other's performance.

Architecture of TCP Performance Scheduler

[Figure: architecture diagram. Labels: Scheduler; routers (R); Host A; Host B; performance machine with its own TCP stack (iperf, netperf, etc.); Iperf test path. The numbered arrows in the diagram correspond to the steps below.]

1. At startup, the Scheduler collects topology data via SNMP and the routing information base from routers via OSPF LSAs, and builds the topology tree.

2. The iperf wrapper program on the host sends a ticket to the Scheduler to request a performance test.

3. The Scheduler checks the queued tickets and responds with the time at which the test can start. At this stage the ticket's test time is registered as a reservation.

4. Given the proposed start time, the host responds whether to run or quit.

5. The Scheduler updates the reservation ticket based on the wrapper program's response. If the wrapper program requests to continue, ticket registration is completed; otherwise the reservation is discarded.

6. The wrapper program runs the iperf test.
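
The ticket exchange in steps 2 to 5 can be sketched as follows. This is an illustrative model only, under several assumptions not stated in the slides: each test path is given as a set of link names (so topology collection in step 1 is skipped), two tests conflict when they share at least one link and overlap in time, and the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # compare tickets by identity, not by field values
class Ticket:
    links: frozenset
    start: float
    end: float
    confirmed: bool = False


class Scheduler:
    def __init__(self):
        self.tickets = []

    def request(self, links, earliest, duration):
        """Steps 2-3: find the earliest conflict-free start and reserve it."""
        links = frozenset(links)
        start = earliest
        # Only reservations sharing a link with the new test can conflict.
        sharing = sorted((t for t in self.tickets if t.links & links),
                         key=lambda t: t.start)
        for t in sharing:
            if start + duration <= t.start:
                break  # the test fits in the gap before this reservation
            if t.end > start:
                start = t.end  # overlap: wait until this reservation ends
        ticket = Ticket(links, start, start + duration)
        self.tickets.append(ticket)  # tentative reservation (step 3)
        return ticket

    def respond(self, ticket, run):
        """Steps 4-5: complete the registration, or discard the ticket."""
        if run:
            ticket.confirmed = True
        else:
            self.tickets.remove(ticket)
```

Holding a tentative reservation between `request` and `respond` prevents a second requester from being offered the same slot while the first host is still deciding. Tests on disjoint links are allowed to run concurrently, which is the reason the Scheduler needs the topology in the first place.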

High Resolution Flow Base Traffic Grapher

Why flow measurement?
• Those running performance tests want traffic viewers that show per-flow traffic variation at intermediate routers.
• Data collected by Netflow/cflowd shows only the traffic tendency, because high-speed routers analyze 1/100 to 1/1000 sampled packets instead of checking every packet. That is not enough!

Why high resolution?
• It is easy for users to get the average speed of a TCP performance test at the end stations, but hard to understand the variation in speed over short periods, such as at millisecond granularity.
• Control of variable burst traffic might help to troubleshoot TCP performance degradation.
• AIST and the University of Tokyo won awards in the SC2003 Bandwidth Challenge.

Multiple flows should be measured at the same time over multiple links.

Measurement performance should be investigated.
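
To make the difference between an end-station average and a millisecond-granularity view concrete, the sketch below bins captured packets per flow into fixed time slots and reports the rate in each slot. It is only an illustration: packet records are assumed to be already captured as `(timestamp_us, flow_id, size_bytes)` tuples, no capture API is shown, and the function name is hypothetical.

```python
from collections import defaultdict


def rate_series_mbps(packets, bin_us=1000):
    """Per-flow traffic rate in fixed time bins.

    packets: iterable of (timestamp_us, flow_id, size_bytes), with integer
             microsecond timestamps (assumed record layout).
    bin_us:  bin width in microseconds; 1000 gives 1 ms resolution.
    Returns {flow_id: {bin_index: rate in Mbit/s}}.
    """
    bytes_per_bin = defaultdict(lambda: defaultdict(int))
    for ts_us, flow_id, size_bytes in packets:
        bytes_per_bin[flow_id][ts_us // bin_us] += size_bytes
    # Bits per microsecond is numerically equal to Mbit/s.
    return {
        flow_id: {b: n * 8 / bin_us for b, n in bins.items()}
        for flow_id, bins in bytes_per_bin.items()
    }
```

A flow that averages a modest rate over a second may still show individual 1 ms bins near line rate, which is the burst behavior that an end-station average hides. Note that this requires every packet, not a 1-in-N sample.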

Requirements of High Resolution Flow Base Traffic Grapher

Devices will be installed on multiple links
• Users and operators want to compare performance over multiple paths.

Time synchronization between the devices.

Topology recognition is important.

Operation of the multiple devices is the key.

On-demand measurement
• To respond to requests from multiple users without a heavy burden, measurements should cover short periods (e.g. 10-second or 1-minute graphs).

Gigabit Ethernet will be ready soon, but SONET/SDH…
• Development with OC48/OC192MON equipped with a DAG card sold by Endace.

Architecture of High Resolution Flow Base Traffic Grapher

[Figure: architecture diagram. Labels: Manager; routers (R); measurement devices; Host A; Host B; performance machine with its own TCP stack (iperf, netperf, etc.); Iperf test path. The numbered arrows in the diagram correspond to the steps below.]

1. At startup, the Manager program, which controls the measurement devices, collects topology data via SNMP and the routing information base from routers via OSPF LSAs. The Manager builds the topology tree.

2. The user requests a measurement from the Manager via the web interface (HTTP), giving the start and end of the path, port numbers, start time and period.

3. The Manager analyzes the topology from the path information and requests the measurement devices to start measuring flows at the specified time.

4. The measurement devices return the traffic flow graphs to the Manager.

5. The user views the traffic graph page constructed by the Manager in a web browser.
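
Step 3, in which the Manager turns a path request into a set of measurement devices, can be sketched as below. This is a simplified illustration with explicit assumptions: the topology is a plain adjacency dictionary, a hop-count breadth-first search stands in for the route the Manager would derive from OSPF data (which uses link costs), and the node and device names are hypothetical.

```python
from collections import deque


def find_path(topology, src, dst):
    """Shortest path by hop count; returns a list of nodes or None."""
    previous = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:  # walk back to the source
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in topology.get(node, ()):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


def devices_on_path(path, devices):
    """Measurement devices attached to links along the path, in path order.

    devices: {frozenset({node_a, node_b}): device_name} (assumed mapping).
    """
    links = [frozenset(pair) for pair in zip(path, path[1:])]
    return [devices[link] for link in links if link in devices]
```

Only the devices returned here need to be scheduled for the requested start time and period, which keeps on-demand measurements cheap when many devices are deployed.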