1
Peer-to-Peer Communication Protocols and Applications: Seed-Teacher Training Course (點對點通訊協定及應用種子教師研習課程)
Part IV: Infrastructure of P2P & P2P in Mobile Environment
Shiao-Li Tsao (曹孝櫟), Dept. of Computer Science, National Chiao Tung University
2
Outline
Structure of this course
Introduction to P2P
Infrastructure of P2P
P2P in Mobile Environment
3
Course Outline (課程綱要)
Introduction to P2P
– Introduction (what, why)
– Survey of P2P networks (commercial, freeware, research)
– Issues of P2P (infrastructure, search, routing, download)
Infrastructure of P2P
– Centralized (Napster)
– Unstructured (Gnutella)
– Structured (Chord, CAN, Pastry)
– Hybrid (unstructured + structured, KaZaa, BT)
– Hierarchical
Performance issues of P2P (improving P2P performance)
– Neighbor selection
– Infrastructure maintenance overhead
– Routing (proximity)
– Searching (keyword, semantic content search)
– Download
– Mobile issues
– Replication (cache)
– Hot-spot and free-rider issues
Applications of P2P
– File sharing
– Storage
– Video streaming (live, VOD, P2PTV)
– VoIP over P2P (Skype, P2PSIP)
– Wireless (structured or MANET)
– Semantic content search
Performance analysis of P2P
– Simulation tool: PeerSim
– Analytical models
Implementation of P2P
– JXTA
4
Part I: Introduction to P2P
5
Why P2P?
"Sometimes" we prefer P2P
– Even while an infrastructure exists
– Because it is natural and convenient
We share resources, information, …
We help each other …
Although it is
– Less reliable
– Less secure
– Less efficient
6
Why P2P?
What do we mean by "an infrastructure"?
– Thanks to IC/IT technologies
– Thanks to broadband access technologies and the Internet infrastructure
– Thanks to P2P mechanisms
7
Why P2P?
P2P today: P2P has come to dominate Internet traffic
Source: CacheLogic.
In 2006, P2P accounted for more than 60% of Internet traffic
8
What’re P2P Technologies?
What we can share
– Share resource directories: file directories, phone books, …
– Share information: messages, presence, NAT, …
– Share content: files, MP3s, stored and live video, …
– Share computation power: Grid, …
– Share physical devices: virtual hard disks, …
9
What're P2P Technologies? (Cont.)
How we can share information and resources
– Search
– Retrieval
No one technology fits all applications; it really depends on
– Characteristics of resources (size, real-time, stored/live, …)
– User behaviors (community, access pattern, …)
10
What're P2P Technologies? (Cont.)
Operations in P2P systems consist of three phases
– Peer discovery (bootstrap): well-known nodes, cached peers, broadcasting, …
– Resource discovery (search): locate a resource given its identifier
– Communication or data transfer: direct communication, NAT/firewall traversal, ALM
11
Part II: Infrastructure of P2P
12
Part II-1: Centralized P2P
13
Centralized Index Model (1/3)
Utilizes a central directory for object location, ID assignment, etc.
For file-sharing P2P, locations are queried from the central servers, and files are then downloaded directly from peers
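The three steps of this model (upload index, query, direct download) can be sketched in a few lines of Python. The class and method names here are illustrative assumptions, not part of any real protocol:

```python
class CentralIndex:
    """Toy central directory for a Napster-style file-sharing system."""

    def __init__(self):
        self.index = {}  # filename -> set of peer addresses holding it

    def publish(self, peer, files):
        # Step 1: a joining peer uploads its file list to the directory.
        for f in files:
            self.index.setdefault(f, set()).add(peer)

    def query(self, filename):
        # Step 2: location inquiries go to the central server; the actual
        # download (step 3) then happens directly between the peers.
        return sorted(self.index.get(filename, set()))

server = CentralIndex()
server.publish("peerA:6699", ["song.mp3", "talk.pdf"])
server.publish("peerB:6699", ["song.mp3"])
print(server.query("song.mp3"))   # ['peerA:6699', 'peerB:6699']
```

Note how the server stores only metadata; the content itself never passes through it, which is exactly why search is cheap but the directory is a single point of failure.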
14
Centralized Index Model (2/3)
Figure: centralized repository R; (1) peer A uploads its index, (2) peer B queries R, (3) B downloads directly from A
15
Centralized Index Model (3/3)
Benefits
– Simplicity
– Efficient search
– Limited bandwidth usage
Drawbacks
– Unreliable (single point of failure)
– Performance bottleneck
– Scalability limits (must scale the central directory)
– Vulnerable to DoS attacks
– Copyright infringement
16
Case Study - Napster
Why
– It was difficult to find and download music over networks
– Users wanted to share music with friends
How
– A program that allowed computer users to share and swap files, specifically music, through a centralized file server
Napster: the first popular P2P file-sharing application
17
Napster: System Overview
A large cluster of dedicated central servers maintains an index of shared files
The centralized servers also monitor the state of each peer and keep track of metadata
– The metadata is returned with the results of a query
Each peer maintains a connection to one of the central servers
18
Napster Operation (1/4)
File list and IP address are uploaded
napster.com
19
Napster Operation (2/4)
napster.com
result
query
User requests search at server
20
Napster Operation (3/4)
napster.com
User pings hosts that apparently have the data, looking for the best transfer rate
21
Napster Operation (4/4)
napster.com
User chooses to initiate a file exchange directly
22
Napster: Summary
Napster is not a pure P2P system, but it was the first to raise important issues for the P2P community
Hybrid decentralized unstructured system
– File transfer is decentralized, but locating content is centralized
– A combination of client/server and P2P approaches
The Napster protocol is proprietary
– Stanford University senior David Weekly posted the protocol in 2000
– Napster requested that he remove it, but Weekly created the OpenNap project instead
Napster introduced two major problems
– Unreliability: the central indexing server represents a single point of failure
– Legal responsibility for sharing music files
23
Part II-2: Unstructured P2P
24
Outline
Introduction
Flooded requests model
– Case study: Gnutella
Supernode model
– Case study: Kazaa
25
Introduction
Blindly flood a query through the network, among peers or among supernodes
Flooded requests model (case study: Gnutella)
Supernode model, i.e., hierarchical (case study: FastTrack and Kazaa)
26
Flooded Requests Model (1/2)
Each request is flooded (broadcast) to directly connected peers, which then flood their neighbors
– Until the request is answered or a certain scope (TTL limit) is reached
Benefits
– Highly decentralized
– Reliability and fault tolerance
Drawbacks
– Excessive query traffic
– Not scalable
– May fail to find content that is actually in the system
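A TTL-bounded flood can be sketched as a breadth-first traversal. The graph shape, names, and message counting below are illustrative assumptions, not the Gnutella wire format; the sketch does show why a too-small TTL produces the "content exists but is not found" drawback:

```python
from collections import deque

def flood_search(graph, start, holders, ttl):
    """Flood a request from `start`; stop at the TTL horizon.

    Returns (node that answered or None, number of messages sent).
    """
    visited = {start}
    frontier = deque([(start, ttl)])
    messages = 0
    while frontier:
        node, t = frontier.popleft()
        if node in holders:
            return node, messages          # request answered
        if t == 0:
            continue                       # TTL expired on this branch
        for nb in graph[node]:
            if nb not in visited:          # suppress duplicate requests
                visited.add(nb)
                messages += 1
                frontier.append((nb, t - 1))
    return None, messages                  # out of TTL reach: a false miss

chain = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(flood_search(chain, "A", {"D"}, 1))  # (None, 1): exists, TTL too small
print(flood_search(chain, "A", {"D"}, 3))  # ('D', 3)
```

In a denser topology the message count grows roughly with the number of nodes inside the TTL horizon, which is the excessive-traffic drawback listed above.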
27
Flooded Requests Model (2/2)
Request!
…found
28
Case Study - Gnutella
Napster introduced two major problems
– Unreliability: the central indexing server represents a single point of failure
– Legal responsibility for sharing music files
Gnutella
– Fully distributed peer-to-peer protocol
– Reliability and fault-tolerance properties
– Flooding raises questions of cost and scalability
29
Gnutella: System Overview
Open, decentralized P2P search protocol
Builds, at the application level, a virtual network with its own routing mechanisms
Peers self-organize into an application-level mesh
Each peer initiates a controlled flooding through the network by sending a query packet to all of its neighbors
– TTL is decremented on each hop
30
Gnutella: The Protocol
Peer discovery
– IRC (Internet Relay Chat), web pages, Ping-Pong messages
– Send GNUTELLA CONNECT to one known node address, then wait for GNUTELLA OK
Discovery of peers and searching for files are implemented by passing five descriptors (message types) between nodes
– Ping, Pong, Query, QueryHit, Push
Files are downloaded directly via an HTTP GET request
31
Gnutella: Protocol Message Types

Type: Ping
Description: Announces availability and probes for other servents
Contained information: none

Type: Pong
Description: Response to a Ping
Contained information: IP address and port of the responding servent; number and total KB of files shared

Type: Query
Description: Search request
Contained information: minimum network bandwidth of the responding servent; search criteria

Type: QueryHit
Description: Returned by servents that have the requested file
Contained information: IP address, port, and network bandwidth of the responding servent; number of results and the result set

Type: Push
Description: File download request for firewalled servents
Contained information: servent identifier; index of the requested file; IP address and port to send the file to
32
Gnutella: Connect operation
CONNECT
OK
33
Gnutella: Discovery operation
Ping
Ping
Ping
Ping
Pong
Pong
Pong
Pong
34
Example: Ping/Pong routing
Source: http://rfc-gnutella.sourceforge.net/developer/stable/index.html
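The reverse-path routing shown in the figure, where each servent remembers which neighbor a Ping arrived from and routes the matching Pong back along that hop, can be sketched as follows. Class and method names are illustrative, not the actual descriptor format:

```python
class Servent:
    def __init__(self, name):
        self.name = name
        self.neighbors = []
        self.seen = {}    # ping id -> neighbor it arrived from ('local' at origin)
        self.pongs = []   # responders whose Pongs terminated here

    def start_ping(self, msg_id, ttl):
        self.seen[msg_id] = "local"
        for nb in self.neighbors:
            nb.on_ping(msg_id, self, ttl)

    def on_ping(self, msg_id, came_from, ttl):
        if msg_id in self.seen:
            return                       # duplicate Ping: drop it
        self.seen[msg_id] = came_from
        self.on_pong(msg_id, self.name)  # answer along the reverse path
        if ttl > 1:                      # decrement TTL and keep flooding
            for nb in self.neighbors:
                if nb is not came_from:
                    nb.on_ping(msg_id, self, ttl - 1)

    def on_pong(self, msg_id, responder):
        back = self.seen[msg_id]
        if back == "local":
            self.pongs.append(responder)   # reached the originator
        else:
            back.on_pong(msg_id, responder)

a, b, c = Servent("a"), Servent("b"), Servent("c")
a.neighbors, b.neighbors, c.neighbors = [b], [a, c], [b]
a.start_ping("m1", ttl=2)
print(a.pongs)   # ['b', 'c']
```

Query/QueryHit routing works the same way: the QueryHit retraces the Query's path hop by hop, so intermediate servents never need a route to the originator.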
35
Gnutella: Search & Transfer operations
36
Example: Query/QueryHit/Push routing
Source: http://rfc-gnutella.sourceforge.net/developer/stable/index.html
37
Gnutella: Summary
Fully distributed peer-to-peer protocol
Reliability and fault-tolerance properties
Flooding raises questions of cost and scalability
The current Gnutella protocol cannot scale beyond a network size of a few thousand nodes without becoming fragmented
M. Portmann, P. Sookavatana, S. Ardon, and A. Seneviratne, “The cost of peer discovery and searching in the Gnutella peer-to-peer file sharing protocol,” in Proc. of ICON’01, Vol. 1, pp. 263-268, 2001.
38
Supernode Model (1/3)
Supernode acts both as a local central index for files shared by local peers and as an equal in a network of supernodes
Each peer is either designated as a supernode or assigned to a supernode
Supernodes are equal in search; all peers are equal in download
Examples: FastTrack and Kazaa
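The two-tier behavior, local index first, then a search among supernodes only, can be sketched as follows (an illustrative sketch with made-up names, not the FastTrack protocol):

```python
class Supernode:
    def __init__(self):
        self.local_index = {}   # filename -> peers in this supernode's group
        self.peer_supernodes = []  # equals in the supernode-level network

    def register(self, peer, files):
        # Ordinary peers report their shared files to their supernode only.
        for f in files:
            self.local_index.setdefault(f, []).append(peer)

    def search(self, filename, asked=None):
        # Check the local group, then flood among supernodes only; the
        # `asked` set suppresses duplicate visits, as in any flood.
        asked = asked if asked is not None else set()
        asked.add(id(self))
        hits = list(self.local_index.get(filename, []))
        for sn in self.peer_supernodes:
            if id(sn) not in asked:
                hits += sn.search(filename, asked)
        return hits

sn1, sn2 = Supernode(), Supernode()
sn1.peer_supernodes, sn2.peer_supernodes = [sn2], [sn1]
sn2.register("peerX", ["a.mp3"])
print(sn1.search("a.mp3"))   # ['peerX']
```

Only the supernodes exchange query traffic, which is why the model scales better than flat flooding while keeping downloads peer-to-peer.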
39
Supernode Model (2/3)
supernode
peer node
40
Supernode Model (3/3)
Benefits
– No single point of failure
Drawbacks
– A supernode may become overloaded or be attacked
– Copyright infringement
41
Part II-3: Structured P2P
42
Outline
Document routing model
Case study: Chord
43
Document Routing Model (1/4)
Each peer is assigned a random or hashed ID and knows a given number of peers
An ID is assigned to every shared document based on a hash function
A request goes to the peer whose ID is most similar to the document ID
44
Document Routing Model (2/4)
Figure: a lookup for file ID h(data) = 0008 is routed through peers with IDs 1000, 0200, and 0100 toward peer 0050, whose ID is most similar to the document ID (objects 0005…0008 are held at ID 0050xxxx)
45
Document Routing Model (3/4)
Benefits
– Scalability: more efficient searching, with logarithmic bounds to locate a document
– Fault tolerance
Drawbacks
– Routing-table maintenance
– Network partitioning may cause an islanding problem
46
Document Routing Model (4/4)
Hash table
– A data structure that efficiently maps keys onto values
Distributed hash table (DHT)
– A distributed, Internet-scale hash table
– Lookup, insertion, and deletion of (key, value) pairs
– Supports only exact-match search, rather than keyword search
DHT-based P2P systems
– CAN: S. Ratnasamy et al., UC Berkeley, 2001
– Chord: I. Stoica et al., MIT and Berkeley, 2001
– Tapestry: Ben Y. Zhao et al., UC Berkeley, 2001
47
Case Study - Chord
Chord is a distributed lookup protocol that efficiently finds the location of the node that stores a desired data item
Just one operation: given a key, it maps the key onto a node
In an N-node network, each node maintains information about only O(log N) other nodes, and a lookup requires only O(log N) messages
48
Chord: System Overview
m-bit key/node identifiers using the SHA-1 hash function (m must be large enough)
These identifiers are ordered on an identifier circle modulo 2^m
Chord ring: a one-dimensional circular key space
Key k is assigned to its successor: the first node whose identifier is equal to or follows k in the identifier space
Each node maintains
– A routing table with (at most) m entries, called the finger table
– The previous node on the identifier circle, called the predecessor
49
Example: Chord Ring with m=6
An identifier circle consisting of 10 nodes storing 5 keys
successor(K10) is N14
successor(K38) is N38
successor(K54) is N56
predecessor(N14) is N8
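The successor rule can be checked with a few lines of Python over this example ring (node IDs are taken from the figure; `bisect_left` finds the first ID at or after the key, and the index wraps past the highest ID):

```python
import bisect

RING = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]   # node IDs from the example
M = 6                                            # m = 6, so IDs are mod 64

def successor(k):
    """First node whose identifier is equal to or follows k mod 2**M."""
    i = bisect.bisect_left(RING, k % 2**M)
    return RING[i % len(RING)]   # wrap around past the highest ID

print(successor(10), successor(38), successor(54))   # 14 38 56
```

A key numerically above the last node (e.g. 58) wraps around to node 1, which is the circular part of "equal to or follows k".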
50
The Finger Table
The ith entry at node n contains the identity of the first node s that succeeds n by at least 2i-1 on the identifier circle, where 1 ≤ i ≤ m
ith finger s = successor(n+2i-1) modulo 2m
A table entry includes both Chord identifier and IP address of the relevant node
The first finger of node n is its immediate successor which also called the successor
51
Example: The Finger Table
N42 is the first node that succeeds (8 + 2^(6-1)) mod 2^6 = 40
N14 is the first node that succeeds (8 + 2^(1-1)) mod 2^6 = 9
Finger-table entries point to the first node greater than or equal to a distance 2^(i-1) away from the node, for 1 ≤ i ≤ m, modulo 2^m
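Using the same 10-node example ring, the finger rule s = successor((n + 2^(i-1)) mod 2^m) can be evaluated directly. A sketch: `RING` and `M` restate the figure's values, and the printed table reproduces N8's fingers from the example:

```python
import bisect

RING = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]   # example ring, m = 6
M = 6

def successor(k):
    i = bisect.bisect_left(RING, k % 2**M)
    return RING[i % len(RING)]

def finger_table(n):
    # ith entry: successor((n + 2**(i-1)) mod 2**M), for 1 <= i <= M
    return [successor((n + 2**(i - 1)) % 2**M) for i in range(1, M + 1)]

print(finger_table(8))   # [14, 14, 14, 21, 32, 42], as in the figure
```

Note the first few fingers collapse onto the same node (N14) because the ring is sparse near N8; the later fingers are what give lookups their long hops.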
52
Simple Key Location
Each node only knows how to contact its current successor node on the identifier circle
Lookup uses a number of messages linear in the number of nodes
Example: path taken by a query from node 8 for key 54
53
Chord Operation
Construct the Chord ring
– Decide the value of m
– Map identifiers (0 to 2^m − 1) to nodes or keys
– Identifiers are ordered on an identifier circle modulo 2^m
The first node joins the Chord ring
Node joins
Key lookup: determine the successor of the key
Chord maintenance
54
Scalable Key Location
Key lookup using the finger table
Node n calls find_successor(id) to find the successor node of an identifier id
– If id falls between n and its successor, the successor is the answer
– Otherwise, n searches its finger table for the node n' whose ID most immediately precedes id
The closer n' is to id, the more it will know about the identifier circle in the region of id
Theorem: the number of nodes that must be contacted to find a successor in an N-node network is O(log N)
55
Example: Lookup for Key 54
Finger table of N8: N8+1 → N14; N8+2 → N14; N8+4 → N14; N8+8 → N21; N8+16 → N32; N8+32 → N42
Finger table of N42: N42+1 → N48; N42+2 → N48; N42+4 → N48; N42+8 → N51; N42+16 → N1; N42+32 → N14
Finger table of N51: N51+1 → N56; N51+2 → N56; N51+4 → N56; N51+8 → N1; N51+16 → N8; N51+32 → N21
Distance to cover: 54 − 8 = 46 = 101110₂ = 2^5 + 2^3 + 2^2 + 2^1; the two long hops use the +2^5 and +2^3 fingers
N8 calls closest_preceding_node(K54) and forwards to N42; N42 calls closest_preceding_node(K54) and forwards to N51; N51 finds K54 in (51, 56] and returns its successor N56
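The hops in this example can be reproduced with a small iterative lookup. `RING` and the finger rule restate the figure; the function names follow the slide's pseudocode, but the loop itself is a sketch, not the full Chord node protocol:

```python
import bisect

RING = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]   # example ring, m = 6
M = 6

def successor(k):
    i = bisect.bisect_left(RING, k % 2**M)
    return RING[i % len(RING)]

def node_successor(n):
    return RING[(RING.index(n) + 1) % len(RING)]

def between_open(x, a, b):
    # x in (a, b) on the circle modulo 2**M
    return (a < x < b) if a < b else (x > a or x < b)

def fingers(n):
    return [successor((n + 2**(i - 1)) % 2**M) for i in range(1, M + 1)]

def closest_preceding_node(n, key):
    for f in reversed(fingers(n)):   # scan fingers farthest-first
        if between_open(f, n, key):
            return f
    return n

def find_successor(n, key):
    path = [n]
    while True:
        succ = node_successor(n)
        if between_open(key, n, succ) or key == succ:  # key in (n, succ]
            path.append(succ)
            return path
        nxt = closest_preceding_node(n, key)
        n = succ if nxt == n else nxt   # cannot get closer: hand to succ
        path.append(n)

print(find_successor(8, 54))   # [8, 42, 51, 56], matching the figure
```

Each hop at least halves the remaining distance on the circle, which is where the O(log N) bound in the theorem comes from.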
56
Node Joins
When node n first starts, it calls
– n.join(n'), which asks n' to find the immediate successor of n, or
– n.create() to create a new Chord network
57
Example: Chord Node Joins
Ring identifiers 0–15 (m = 4); when N14 joins, it takes over the keys less than 14 for which it is now the successor. Finger tables after N14 joins (the entries updated to N14 are those whose interval starts in (10, 14]):

N3:  N3+1 [4, 5) → N5;  N3+2 [5, 7) → N5;  N3+4 [7, 11) → N10;  N3+8 [11, 3) → N14
N5:  N5+1 [6, 7) → N10;  N5+2 [7, 9) → N10;  N5+4 [9, 13) → N10;  N5+8 [13, 5) → N14
N10: N10+1 [11, 12) → N14;  N10+2 [12, 14) → N14;  N10+4 [14, 2) → N14;  N10+8 [2, 10) → N3
N14: N14+1 [15, 0) → N3;  N14+2 [0, 2) → N3;  N14+4 [2, 6) → N3;  N14+8 [6, 14) → N3
58
Node Stabilization
Must ensure that each node's successor pointer is up to date
Each node periodically runs a stabilization protocol in the background, which updates Chord's finger tables and successor pointers
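The join-then-stabilize flow can be sketched with a minimal ring that keeps only successor and predecessor pointers, no finger tables. The method names follow the Chord paper's pseudocode; everything else is an illustrative assumption:

```python
def between(x, a, b):
    # x in (a, b] on the identifier circle
    return (a < x <= b) if a < b else (x > a or x <= b)

class Node:
    def __init__(self, nid):
        self.id = nid
        self.successor = self      # a new ring points at itself
        self.predecessor = None

    def find_successor(self, k):
        n = self
        while not between(k, n.id, n.successor.id):
            n = n.successor
        return n.successor

    def join(self, existing):
        self.successor = existing.find_successor(self.id)

    def stabilize(self):
        # Adopt any node that slipped in between us and our successor,
        # then tell the successor about ourselves.
        x = self.successor.predecessor
        if x is not None and between(x.id, self.id, self.successor.id):
            self.successor = x
        self.successor.notify(self)

    def notify(self, n):
        if self.predecessor is None or between(n.id, self.predecessor.id, self.id):
            self.predecessor = n

n8, n21, n42 = Node(8), Node(21), Node(42)
n21.join(n8)
n42.join(n8)
for _ in range(5):                 # periodic stabilization rounds
    for n in (n8, n21, n42):
        n.stabilize()
print([n8.successor.id, n21.successor.id, n42.successor.id])   # [21, 42, 8]
```

Right after the joins the pointers are inconsistent (both newcomers point at n8); a few stabilize/notify rounds are what repair the ring, which is exactly the role of the background protocol described above.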
59
Chord: Summary
Chord provides
– Efficiency: O(log n) messages per lookup
– Scalability: O(log n) state per node
– Robustness: survives massive failures
Chord consists of
– Consistent hashing
– Small routing tables: O(log n)
– A fast join/leave protocol
60
Part II-4: Hierarchical P2P
61
Outline
Introduction Hierarchical DHT-based P2P
62
Introduction
Traditional P2P systems organize peers into a flat overlay network
We will describe some hierarchical P2P systems
63
Hierarchical DHT-based P2P
64
Hierarchical DHT-based P2P
In a hierarchical DHT, peers are organized into groups, and each group has its own autonomous intra-group overlay network and lookup service
Advantages compared to flat overlay networks
– Exploiting heterogeneous peers: more stable peers in the top-level overlay
– Transparency: keys can move, and a group can change its intra-group lookup algorithm, without affecting the rest of the system
– Faster lookup time: the number of groups << the total number of peers
– Fewer messages in the wide area: most overlay-reconstruction messages happen inside groups
65
Hierarchical DHT Framework (1/3)
66
Hierarchical DHT Framework (2/3)
Hierarchical lookup service
– The querying peer sends a query message to one of the superpeers in its group
– The top-level overlay first determines the group responsible for the key
– The responsible group then uses its intra-group overlay to determine the specific peer that is responsible for the key
Intra-group lookup
– At the intra-group level, groups can use different overlays
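The two lookup steps can be sketched with plain integer IDs. The group IDs and member IDs below are made-up values, and both levels use the same successor rule here only for brevity; in a real hierarchical DHT each group could run a different overlay:

```python
def successor_of(ids, k):
    # First ID equal to or following k, wrapping around the ring.
    ids = sorted(ids)
    return next((i for i in ids if i >= k), ids[0])

def hierarchical_lookup(groups, key_id):
    # Step 1: the top-level overlay maps the key to the responsible group.
    gid = successor_of(list(groups), key_id)
    # Step 2: that group's intra-group overlay maps the key to a peer.
    return gid, successor_of(groups[gid], key_id)

groups = {10: [3, 12, 40], 30: [7, 33], 50: [20, 44]}   # group id -> members
print(hierarchical_lookup(groups, 25))   # (30, 33)
print(hierarchical_lookup(groups, 55))   # (10, 3): wraps around the ring
```

The top-level step runs over the (small) set of groups, which is why the lookup is faster than a flat overlay over all peers.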
67
Part III: Performance Issues of P2P
68
Part III-9: Mobile Issues
69
Introduction
P2P systems in mobile environments encounter several problems
– Heterogeneous node capability
– Limitations: wireless bandwidth and battery power
– Churn
  – Ordinary churn: due to peer joins, departures, and failures
  – Mobility churn: churn caused by peer mobility
70
Performance Comparison of DHT Under Churn
71
Performance Comparison of DHT Under Churn
Observation
– Protocols for DHTs incorporate features to achieve low latency in the face of churn, i.e., continuous changes in membership
– Most previous work evaluates protocols on static networks
This paper
– Formulates a unified framework for evaluating performance and cost
– Analyzes the effects of DHT parameters under churn
72
Framework
A cost-versus-performance framework
– Cost: the average number of bytes of messages sent
– Performance: the average lookup latency
Each protocol has many parameters that affect cost and performance
– There is no single best combination; the best points lie on the convex hull
– Which parameter settings cause performance to lie on the convex hull?
73
Chord (1/2)
Chord identifiers are structured on an identifier circle
A key k is assigned to k's successor
Chord can route either iteratively or recursively
A Chord node will
– Periodically ping all its fingers to check their liveness
– Stabilize its successor list separately, because it is critical for correctness but much cheaper than finger stabilization
74
Chord (2/2)
The successor-stabilization interval affects the success rate
– 72 s for above 99%
The finger-stabilization interval only affects performance (faster rates result in lower lookup latency)
There is no single best base value
75
Summary
The base and the stabilization interval have the most effect on DHT performance under churn, and they affect different protocols in different ways
With well-tuned parameters, all four DHTs have similar overall performance
76
Part IV: Applications of P2P
77
Part IV-6: Wireless
78
P2P over MANET
79
P2P over MANET
What is an ad hoc network?
– No infrastructure: no base stations, no fixed network infrastructure
What is a mobile ad hoc network (MANET)?
– Creates and maintains networks without central entities
– Two mobile nodes communicate with each other through intermediate nodes
– Multi-hop wireless communication
– Needs the support of dynamic routing protocols (network layer)
P2P protocols are usually not aware of the underlying MANET
– Additional and unnecessary network traffic
80
Routing in Peer-to-Peer Networks
Central indexing server: Napster
Flat routing (distributed flooding): Gnutella protocol v0.4
Hierarchical routing (SuperNode): KaZaA, FastTrack
Structured P2P employing a DHT: Chord, Pastry, Tapestry, CAN
81
Routing in Ad Hoc Networks (1/2)
Proactive routing protocols: table-driven (DSDV, CGSR)
Reactive routing protocols: on-demand (DSR, AODV)
Hybrid: Zone Routing Protocol
82
Routing in Ad Hoc Networks (2/2)
Proactive vs. Reactive Routing
Proactive routing protocols
– Continuously evaluate routes and try to maintain consistent, up-to-date routing information
– When a route is needed, one may be ready immediately
– Topology updates are broadcast immediately to all other nodes in the network
Reactive routing protocols
– Try to find a route to the destination only when it is necessary (on demand)
– Flood a route request through the network
83
Similarities between MANET and P2P Networks
No central entities
Flat network topology
– Except SuperNodes or cluster heads (CGSR)
Frequently changing topology
– Frequent log-ons and log-offs
– Terminal mobility in wireless networks
Network log-on
– The IP address or frequency range of the portal must be known
Flooding or broadcasting raises a scalability problem
84
Differences between MANET and P2P Networks
P2P and MANET operate on different network layers
Network structure
– The P2P overlay network is separated from the physical network
– In a MANET, the physical and logical network structures correspond
Connection between two nodes
– P2P: wired, direct links at the overlay layer
– MANET: wireless, indirect links over intermediate nodes
Available resources
– Fixed P2P terminals have nearly unlimited resources
– Nodes in a MANET are mobile and constrained by limited power and bandwidth
85
P2P Searching over MANET
We introduce five route-discovery approaches that integrate broadcast-based and DHT-based protocols
– File request messages at the application layer
– Network routing messages at the network layer
86
Approach 1: Broadcast over Broadcast
– Broadcast at the P2P overlay
– Broadcast at the network layer
– Easy implementation
– Complexity: O(n^2)
Figure: peers A and B, with overlay links distinct from physical links; the flooded route is not the shortest path
Drawbacks: scalability problem (double broadcasts), low energy efficiency, not the shortest path
87
Approach 2: Broadcast
– No P2P overlay
– Broadcast at the network layer
– Easy implementation
– Shortest path obtained
– Complexity: O(n)
Figure: peers A and B over physical links; the actual path is the shortest path
Drawbacks: still flooding, which places a heavy burden on bandwidth and power supply; cannot work for large networks
88
Approach 3: DHT over Broadcast
– DHT at the P2P overlay
– Broadcast at the network layer
– No broadcast at the P2P overlay
– Complexity: O(n log n)
Figure: peers A and B, with overlay links distinct from physical links; the route found is not the shortest path
Drawbacks: implementation complexity (routing tables); O(n) to find the route between every pair, O(log n) peer lookups in the P2P overlay
89
Approach 4: DHT over DHT
– DHT at the P2P overlay
– DHT at the network layer
– No broadcast at the P2P overlay
– Complexity: O((log n)^2)
Figure: the shortest path at the overlay vs. the actual physical path between peers A and B
– Implementation complexity in both networks
– Scalability improvement
– Better complexity and energy efficiency than the previous approaches
90
Approach 5: DHT
– No P2P overlay
– DHT at the network layer
– No broadcast at the P2P overlay
– Complexity: O(log n)
Figure: the shortest path at the network layer vs. the actual path between peers A and B
– Reduces the implementation complexity of DHT over DHT
91
Comparison of Approaches
Approach:          Broadcast over broadcast | Broadcast | DHT over broadcast | DHT over DHT | DHT
Routing:           O(n^2)                   | O(n)      | O(n log n)         | O((log n)^2) | O(log n)
Scalability:       Bad                      | Bad       | Bad                | Good         | Excellent
Maintenance:       Low                      | Low       | Medium             | High         | Medium
Energy efficiency: Low                      | Low       | Low                | Medium       | Medium
Shortest path:     No                       | Yes       | No                 | No           | No
Cross-layer:       No                       | Yes       | No                 | No           | Yes

1 < O(log n) < O((log n)^2) < O(n^(1/2)) < O(n) < O(n log n) < O(n^2)
92
Summary
Cross-layer designs (the Broadcast and DHT approaches) perform better
The Broadcast approach can be easily implemented for small MANETs
The DHT approach is scalable to large networks
– But its routing table and neighborhood table must be carefully maintained
93
Structured P2P Lookup Service in Mobile Networks
94
Structured P2P Lookup Service in Mobile Networks
The Hybrid Chord Protocol (HCP) addresses the frequent joins and leaves of nodes, and allows defining special interest groups
95
Architecture of HCP (1/3)
Static node
– Highly available, large-capacity, quasi-permanent
– Nodes become static nodes based on their uptime and on their hardware and networking capabilities
– All object references (info profiles) are stored at static nodes
Temporary node
– Does not store object references
– When a temporary node joins, all object references for the keys it would be responsible for remain with its closest static successor
96
Architecture of HCP (2/3)
Context spaces
– Every shared object is described by an info profile
– Every keyword indicates a relevant context space for this object
– Each static node stores a list for every keyword it is responsible for
– In this list, all info profiles that contain that keyword are collected
– These lists establish the context spaces
The sharing host sends the info profile to the three static nodes
– The first static successors of the hash values of these keywords
97
Architecture of HCP (3/3)
98
HCP Operations (1/2)
Node join
– Determine the position on the HCP ring by hashing, e.g., the IP address
– Set the predecessor, successor list, and finger-table entries according to the conventional Chord algorithm
– Set the static-successor list (its s closest static successors)
– New static nodes send a message to their closest static successor to get all info profiles they are responsible for
Node leave
– Deregister from the network
– Inform all static nodes that store info profiles owned by the leaving node
– Leaving static nodes transfer all stored info profiles to their closest static successor
99
HCP Operations (2/2)
Key insert
– Shared objects are described by info profiles
– Hash all keywords of each info profile
– Search for the nodes responsible for the info profile
– Contact the ascertained nodes to ask for their first static successor, and send the info profile to these static nodes
Key lookup
– Build the intersection of all context spaces that are relevant for the query
– If temporary nodes receive a request, they forward it to their closest static successor
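The "closest static successor" rule that both node join and key lookup rely on can be sketched as follows. The ring below is a made-up example with hand-picked IDs; real HCP identifiers come from hashing:

```python
def first_static_successor(ring, k):
    """ring: list of (node_id, is_static). Info profiles for key k are
    held by the first static node at or after k on the identifier circle."""
    nodes = sorted(ring)
    n = len(nodes)
    # Index of the plain successor of k (wrapping around the circle).
    start = next((i for i, (nid, _) in enumerate(nodes) if nid >= k), 0)
    for step in range(n):                 # walk forward past temporary nodes
        nid, is_static = nodes[(start + step) % n]
        if is_static:
            return nid
    raise RuntimeError("ring has no static node")

ring = [(5, True), (12, False), (20, True), (33, False)]
print(first_static_successor(ring, 10))   # 20: temporary node 12 is skipped
print(first_static_successor(ring, 21))   # 5: wraps past temporary node 33
```

Because temporary nodes are transparent to this rule, their joins and leaves never move object references, which is the source of HCP's signaling-traffic reduction.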
100
Summary of HCP
HCP reduces the signaling traffic of shifting object references
– But the maintenance traffic stays the same
The traffic-load reduction is proportional to 1/λ = T/T_Static
– Where each node is assumed to leave the network after an average session length T, x·n of the nodes are static (0 < x < 1), and T_Static = λ·T