P2P Search COP6731 Advanced Database Systems



Page 1: P2P Search COP6731 Advanced Database Systems. P2P Computing  Powerful personal computer Share computing resources P2P Computing  Advantages: Shared

P2P Search

COP6731 Advanced Database Systems

Page 2: P2P Computing

Powerful personal computers
Share computing resources

Advantages:
  Shared infrastructure costs
  Highly scalable
  No single point of failure (SPOF)
  Censorship resistance

Page 3: P2P Search Techniques

Centralized P2P systems, e.g., Napster, SETI@home
Decentralized and unstructured P2P systems, e.g., Gnutella
Hybrid (partially decentralized), e.g., Freenet
Structured P2P systems
  DHT systems (CAN, Chord, Pastry, Tapestry)
  Skip-list based systems

Page 4: Napster

MP3 file sharing with a centralized catalog
Peers hold the files
Napster Inc.'s servers hold the catalog
File transfer is P2P, using a proprietary protocol

Page 5: Napster: Publish a File

Users upload their IP address and the music titles they wish to share.

[Figure: the peer at 192.1.2.3 registers the entry (xyz.mp3, 192.1.2.3) with the central Napster server]

Page 6: Napster: Query for a File

Users search for peers from which to download the desired files.

[Figure: a peer asks the central Napster server "xyz.mp3?" and the server answers with the address 192.1.2.3]

Page 7: Napster: Transfer Requested File

File transfer is P2P, using a proprietary protocol.

[Figure: the requesting peer asks the peer at 192.1.2.3 for xyz.mp3 directly; the central Napster server is shown alongside]
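
The publish and query steps on pages 5 and 6 can be sketched as follows. This is only an illustration, not Napster's actual protocol (the slides describe it only as proprietary): the class name `CentralDirectory` and its methods are hypothetical, and the catalog is assumed to be a simple in-memory map from title to peer addresses.

```python
from collections import defaultdict


class CentralDirectory:
    """Hypothetical stand-in for the central catalog: title -> peer addresses."""

    def __init__(self):
        self._catalog = defaultdict(set)

    def publish(self, peer_ip, titles):
        """A peer uploads its IP address and the titles it wishes to share."""
        for title in titles:
            self._catalog[title].add(peer_ip)

    def query(self, title):
        """Return the addresses of all peers that registered the title."""
        return sorted(self._catalog.get(title, ()))


directory = CentralDirectory()
directory.publish("192.1.2.3", ["xyz.mp3"])
providers = directory.query("xyz.mp3")
# The requester would now contact one of `providers` directly for the file.
```

Only the lookup goes through the directory; the file itself moves between peers.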

Page 8: Disadvantage of Centralized Directory

Performance bottleneck
Single point of failure

Can we do it without a directory?

Page 9: Gnutella

No catalog
Pings the network to locate Gnutella peers
File requests are broadcast to peers (flooding, or breadth-first search)
When a provider is located, the file is transferred via HTTP

Page 10: Gnutella: Issue a Request

[Figure: a peer issues the request "xyz.mp3?"]

Page 11: Gnutella: Flood the Request

[Figure: the request is forwarded from peer to peer across the network]

Page 12: Gnutella: Reply with the File

[Figure: xyz.mp3 is returned to the requesting peer]

Page 13: Gnutella - Disadvantages

Network flooding generates unnecessary network traffic
Limiting the flood with a TTL means some files might not be found

Alternatives: use ultranodes (or supernodes), or use depth-first search, as in Freenet
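
Both disadvantages show up in a small simulation. The sketch below is an assumption-laden toy, not the Gnutella wire protocol: the overlay graph `overlay` and the function `flood_search` are hypothetical, the TTL is treated as the number of forwards a query may still make, and every forwarded copy is counted as one message (including copies sent back toward the sender).

```python
from collections import deque


def flood_search(graph, holders, start, ttl):
    """Breadth-first flood from `start`; return (providers found, messages sent)."""
    seen = {start}
    frontier = deque([(start, ttl)])
    found = set()
    messages = 0
    while frontier:
        peer, remaining = frontier.popleft()
        if peer in holders:
            found.add(peer)
        if remaining == 0:
            continue  # TTL exhausted: this copy of the query is not forwarded
        for neighbor in graph[peer]:
            messages += 1  # every forwarded copy costs one message
            if neighbor not in seen:  # duplicate copies are dropped on arrival
                seen.add(neighbor)
                frontier.append((neighbor, remaining - 1))
    return found, messages


# Hypothetical overlay: E holds the file and is three hops away from A.
overlay = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
short = flood_search(overlay, {"E"}, "A", ttl=2)
longer = flood_search(overlay, {"E"}, "A", ttl=3)
```

With a TTL of 2 the flood stops one hop short of E and the file is missed; a TTL of 3 finds it, at the price of more messages.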

Page 14: Morpheus, Kazaa

[Figure: three clusters of peers (A to I) below a supernode layer; each cluster has a center that holds the index for its cluster.
Query: "Who has file X?"
Reply: "Peer H has file X"
Download file X from Peer H]

Page 15: Using Ultranodes

Queries flood only the network of ultranodes
Other peer nodes are shielded from query traffic
Combines the benefits of centralized and decentralized search
Takes advantage of the heterogeneity in peer capabilities
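
A rough sketch of the two-tier idea, under stated assumptions: each ultranode keeps an index of the files held by the peers in its cluster, and a query is flooded across ultranodes only. The names `S1` to `S3`, the dictionaries, and `supernode_search` are all hypothetical; the cluster layout loosely follows the Morpheus/Kazaa figure, where peer H holds file X.

```python
from collections import deque


def supernode_search(indexes, links, entry, filename):
    """Flood a query across ultranodes only; return (ultranode, peer) hits."""
    seen = {entry}
    frontier = deque([entry])
    hits = []
    while frontier:
        node = frontier.popleft()
        # Each ultranode answers from its cluster index; ordinary peers see no query.
        for peer in indexes[node].get(filename, []):
            hits.append((node, peer))
        for nxt in links[node]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return hits


# Hypothetical layout: S1 indexes peers A-C, S2 indexes D-F, S3 indexes G-I.
indexes = {"S1": {}, "S2": {}, "S3": {"X": ["H"]}}
links = {"S1": ["S2", "S3"], "S2": ["S1", "S3"], "S3": ["S1", "S2"]}
hits = supernode_search(indexes, links, "S1", "X")
```

Within a cluster the search is centralized (one index lookup); across clusters it is decentralized (flooding among a small set of capable nodes).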

Page 16: Freenet - Depth-First Search

[Figure: peers A, B, C, D, E. The query "Who has file X?" is forwarded from peer to peer in depth-first order; file X is then downloaded from Peer E.]

Page 17: Freenet – File not Found

[Figure: peers A, B, C, D, E, F. One branch of the search ends in "NOT FOUND!" while another peer answers "I HAVE FILE X!"; file X is downloaded from Peer E.]

The requested file is not found because of a poor routing decision made at peer D.
In this case, the query backs out of the dead end and tries another peer in depth-first manner.
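
The back-out behaviour can be sketched as a depth-first search with backtracking. This is a simplification under assumptions: the topology `overlay` is invented for illustration (the figure's edges are not recoverable), and the order of each neighbor list stands in for the routing decision, so listing F before E at peer D plays the role of the poor choice. `dfs_search` is a hypothetical name.

```python
def dfs_search(graph, holders, start, max_hops):
    """Forward the query to one neighbor at a time, backing out of dead ends."""
    visited = set()
    path = []

    def visit(peer, hops_left):
        visited.add(peer)
        path.append(peer)
        if peer in holders:
            return True
        if hops_left > 0:
            for neighbor in graph[peer]:
                if neighbor not in visited and visit(neighbor, hops_left - 1):
                    return True
        path.pop()  # dead end: back out so the caller can try another neighbor
        return False

    return path if visit(start, max_hops) else None


# Hypothetical topology: D tries F first (a dead end), then E, which has file X.
overlay = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B", "F", "E"],
    "E": ["D"],
    "F": ["D"],
}
route_to_file = dfs_search(overlay, {"E"}, "A", max_hops=5)
```

The returned path lists only the peers on the successful route; the detour through F is popped when the query backs out.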

Page 18: Structured P2P Systems

DHT-based
  Chord / Pastry / Tapestry: hash-based into a single-dimensional space
  CAN: hash-based into a multi-dimensional space
  P-Grid: hash-based into a virtual binary search tree
Skip-list based
  Skip Graph / SkipNet
Index tree-based
  BATON

Page 19: DHT Design Goals

An “overlay” network with:
  Flexible mapping of keys to physical nodes
  Data independence
  Small network diameter
  Small degree (fan-out)
  Local routing decisions
  Robustness to churn
  Routing flexibility
  Proximity

A “storage” or “memory” mechanism with:
  No guarantees on persistence
  Maintenance via soft state

Page 20: Metrics

Searching/Lookup
  Number of hops in searching
  Number of messages
  Database-related metrics: total disk I/O, response time, accuracy
Maintenance
  Number of hops
  Number of messages

Page 21: How to Bound Search Space?

[Figure: network]

Work on placement!

Page 22: Basic Idea - Hashing

Objects have hash keys: object "y" has hash key H(y) and is published with Publish(H(y)).
Peer nodes also have hash keys in the same hash space: peer "x" has hash key H(x) and joins the P2P network with Join(H(x)).
Place each object on the peer with the closest hash key.
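
A minimal sketch of this placement rule, under assumptions the slides do not state: SHA-1 is used as H, the hash space is truncated to 32 bits for readability, and "closest" is taken as plain absolute distance between keys. The helper names `H` and `closest_peer` and the peer names are hypothetical.

```python
import hashlib

SPACE_BITS = 32  # assumption: a small space for readability


def H(name):
    """Map a peer or object name into the shared hash space."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << SPACE_BITS)


def closest_peer(peer_names, object_name):
    """Pick the peer whose hash key is nearest to the object's hash key."""
    target = H(object_name)
    return min(peer_names, key=lambda peer: abs(H(peer) - target))


peers = ["peer-x", "peer-u", "peer-v"]  # hypothetical peer names
home = closest_peer(peers, "y")         # the peer that should store object "y"
```

Because peers and objects share one hash function, any node can compute where an object belongs without asking a directory.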

Page 23: Viewed as a Distributed Hash Table

Hash table: 0 to 2^128 - 1

Each peer node is responsible for a range of the hash table, according to the peer's hash key.
Objects are placed on the peer with the closest key.
Note that peers are at the Internet edges.

[Figure: peer nodes at the edge of the Internet, each mapped to a range of the hash table]

Page 24: How to Find an Object?

Simplest idea: everyone knows everyone else! Then it takes one hop to find the object.
But we want to keep only a few entries!

[Figure: hash table from 0 to 2^128 - 1 and a peer node]

Page 25: Using Distributed Hash Table (DHT)

A peer only needs to know its logical neighbors
Search is based on multihop routing

[Figure: hash table from 0 to 2^128 - 1 and a peer node]
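
The trade-off between state and hops can be made concrete with a deliberately minimal ring. The sketch assumes things the slides leave open: peers are arranged on a ring in key order, each peer knows only its successor and its predecessor's key, and a peer owns the keys between its predecessor (exclusive) and itself (inclusive), as in Chord. The functions `owns` and `route` and the small integer keys are hypothetical.

```python
def owns(peer_ids, i, key):
    """True if peer i is responsible for `key` (range: predecessor < key <= self)."""
    pred, me = peer_ids[i - 1], peer_ids[i]  # index -1 wraps to the last peer
    if pred < me:
        return pred < key <= me
    return key > pred or key <= me  # this peer's range wraps around the ring


def route(peer_ids, start_index, key):
    """Forward the lookup successor by successor; return (owner, hops)."""
    i, hops = start_index, 0
    while not owns(peer_ids, i, key):
        i = (i + 1) % len(peer_ids)  # the only neighbor a peer knows
        hops += 1
    return peer_ids[i], hops


ring = [10, 40, 70, 100]  # sorted peer keys in a toy hash space
owner, hops = route(ring, 0, 65)
```

Each peer stores a constant number of entries, but a lookup may need a number of hops proportional to the number of peers. Practical DHT designs sit between the two extremes by keeping a few extra well-chosen entries per peer.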

Pages 26-28: DHT in action

[Figure: an overlay of nodes, each holding a local (K, V) table]

Operation: take key as input; route messages to node holding key

Pages 29-31: DHT in action: put()

[Figure: insert(K1,V1) is issued at one node and routed through the overlay; the pair (K1,V1) ends up stored at the node responsible for K1]

Operation: take key as input; route messages to node holding key

Pages 32-33: DHT in action: get()

[Figure: retrieve(K1) is issued at one node and routed through the overlay to the node holding K1]

Operation: take key as input; route messages to node holding key
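
The point of the put() and get() slides is that both operations reduce to the same step: find the node holding the key, then act on that node's local table. A toy sketch, with several assumptions: the class `TinyDHT` is hypothetical, multihop routing is collapsed into a single `lookup` call, keys are hashed with SHA-1 into a 16-bit space, and a key belongs to the first node whose hash is greater than or equal to the key's hash (wrapping around).

```python
import hashlib


class TinyDHT:
    """Toy DHT: every node keeps its own (K, V) table; lookup picks the node."""

    def __init__(self, node_names, bits=16):
        self.size = 1 << bits
        self.nodes = sorted((self._hash(name), name) for name in node_names)
        self.tables = {name: {} for name in node_names}

    def _hash(self, text):
        digest = hashlib.sha1(text.encode("utf-8")).digest()
        return int.from_bytes(digest, "big") % self.size

    def lookup(self, key):
        """Return the name of the node responsible for `key`."""
        h = self._hash(key)
        for node_hash, name in self.nodes:
            if node_hash >= h:
                return name
        return self.nodes[0][1]  # wrap around to the first node

    def put(self, key, value):
        self.tables[self.lookup(key)][key] = value

    def get(self, key):
        return self.tables[self.lookup(key)].get(key)


dht = TinyDHT(["n1", "n2", "n3"])  # hypothetical node names
dht.put("K1", "V1")
value = dht.get("K1")
```

Since put and get hash the key the same way, the retrieval lands on the node that stored the pair, whichever node issued the request.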

Page 34: CAN – Content Addressable Network

Each peer is responsible for one zone, i.e., it stores all (key, value) pairs of the zone
Each peer knows the neighbors of its zone
Peers are randomly assigned to zones at startup
Dimension-ordered multihop routing
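
Dimension-ordered routing can be illustrated on a simplified layout. The sketch assumes equal-size zones arranged as a 2-D grid without wraparound, which is more regular than a real CAN (where zones differ in size and the coordinate space wraps); `dimension_ordered_route` is a hypothetical helper, and zones are identified by their (column, row) position.

```python
def dimension_ordered_route(src, dst):
    """Return the zones visited when correcting x first, then y, one hop at a time."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:  # first dimension: move to the neighbor closer in x
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:  # second dimension: move to the neighbor closer in y
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


path = dimension_ordered_route((0, 0), (2, 1))
hops = len(path) - 1
```

Every hop goes to an adjacent zone, so each peer needs to know only its own neighbors to forward a message.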

Pages 35-39: CAN: Object Publishing

node I::publish(K,V)

(1) a = h_x(K), b = h_y(K), giving the point (x = a, y = b)
(2) route (K,V) -> J
(3) J stores (K,V)

[Figure: 2-D coordinate space divided into zones; node I publishes (K,V), which is routed to node J and stored there]

Page 40: CAN: Object Retrieval

node I::retrieve(K)

(1) a = h_x(K), b = h_y(K)
(2) route "retrieve(K)" to J, which is in charge of (a,b)

[Figure: node I's request is routed to node J, which holds (K,V)]
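
The publish and retrieve steps can be sketched together. This is an illustration under assumptions, not CAN's actual implementation: the two hash functions are built from SHA-1 with different prefixes, the coordinate space is a 65536 x 65536 integer square cut into a 4 x 4 grid of equal zones, and the routing from I to J (step 2) is skipped by looking up the owning zone directly. All names here (`hx`, `hy`, `zone_of`, `publish`, `retrieve`, `stores`) are hypothetical.

```python
import hashlib

COORD = 1 << 16  # assumption: each coordinate lies in [0, 65536)
GRID = 4         # assumption: a 4 x 4 grid of equal-size zones


def _h(key, salt):
    digest = hashlib.sha1((salt + key).encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % COORD


def hx(key):
    return _h(key, "x:")


def hy(key):
    return _h(key, "y:")


def zone_of(a, b):
    """Zone (column, row) whose owner is in charge of the point (a, b)."""
    return (a * GRID // COORD, b * GRID // COORD)


stores = {}  # zone -> that peer's local (K, V) table


def publish(key, value):
    a, b = hx(key), hy(key)                           # step (1)
    stores.setdefault(zone_of(a, b), {})[key] = value  # steps (2) and (3), routing omitted


def retrieve(key):
    a, b = hx(key), hy(key)                           # same point as at publish time
    return stores.get(zone_of(a, b), {}).get(key)


publish("K", "V")
result = retrieve("K")
```

Retrieval works because the requester recomputes the same point (a, b) from the key alone.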

Page 41: Some Research Topics

Content-based image retrieval in P2P
Location management in P2P
Security considerations for DHT
P2P backup
Wireless P2P