Upload
brittany-wiggins
View
216
Download
0
Embed Size (px)
Citation preview
A P2P-Based Architecture for A P2P-Based Architecture for Secure Software Delivery Using Secure Software Delivery Using Volunteer AssistanceVolunteer AssistancePurvi Shah, Jehan-François Pâris, Jeffrey Morgan and John SchettinoIEEE 8th International Conference on Peer-to-Peer Computing (P2P'08)
69621014 劉家賢69621020 黃義凱
OutlineOutlineIntroductionTrace analysisProposed mechanismEvaluationConclusions
2
OutlineOutlineIntroductionTrace analysisProposed mechanismEvaluationConclusions
3
IntroductionIntroductionA content delivery infrastructure
distributing and maintaining software packages in a large organization
Combines a conventional server and P2P technology
A novel load balancing mechanism is included in the system
4
Introduction (Cont.)Introduction (Cont.)Main contributions:
◦A trace-based analysis is used to find general principles and properties to devise a better system
◦A scalable CDN(Content Delivery Network) architecture for delivering software using volunteer nodes
◦An efficient mechanism for load balancing in the proposed design
5
OutlineOutlineIntroductionTrace analysisProposed mechanismEvaluationConclusions
6
Trace analysisTrace analysisTen days worth of logs
associated with a software delivery system supporting various Linux installations and distributing their updates in a corporate environment
7
Central software Central software repositoryrepository
8
Central software repository Central software repository (Cont.)(Cont.)The repository served system
software and updates for ten Linux distributions
9
Access patternsAccess patternsImage(.iso) downloads comprise
71% of total server workloadAs seen in Fig. 4, almost 2/3 of
the files uploaded by the server are smaller than 256KB
10
Access patterns (Cont.)Access patterns (Cont.)Flash crowds shows up at the
time of new package releasesMore than 1/3 of all customers
requests are received before the end of the day following the package release
11
Access patterns (Cont.)Access patterns (Cont.)The number drops by the third
day and is further reduced one week after the release◦Increasing the server capacity to
control flash crowds is not a practical solution
12
Number of identical filesNumber of identical filesThey found out 17% of files
larger than 1MB were identical and differed from other files only in name◦17% of these identical files were
source-code package by more than one Linux distribution
13
SimilaritySimilarityThose files may exist among
different versions and variants of the same source-code package
The majority of packages in the repository are compressed files but compressed in different tools, e.g. gzip and bzip
The lack of standard has some very favorable effects on exploiting file similarity using the tools such as rsync
14
Similarity(Cont.)Similarity(Cont.)Fig. 8 suggests that considerable
similarity exists among the uncompressed versions of the same software
They also observed that software has variants, that is, different packages for clients with different architectures and operating systems
Utilizing the similarity among these variants would greatly benefit any P2P solution as peers could find several more potential neighbors
15
Similarity(Cont.)Similarity(Cont.)
16
Synchronization workloadSynchronization workloadThe various departments within
the enterprise manage around forty edge nodes that maintain complete or partial mirrors of the software repository for serving updates to a small set of machines
The nodes spent average 1.81 hours daily synchronizing its repository with the server
17
Synchronization workload Synchronization workload (Cont.)(Cont.)
18
OutlineOutlineIntroductionTrace analysisProposed mechanismEvaluationConclusions
19
Software synchronization by Software synchronization by the edge nodesthe edge nodesSynchronization tool rsync + P2P =
PrsyncThis integration was feasible because
all edge nodes use the same rsync tool
Using BitTorrent among the edge nodes would allow us to further improve the efficiency and scalability of the synchronization system◦With PRsync, server now can reduce the
processing on the server repository for multiple edge nodes
20
Software synchronization by Software synchronization by the edge nodes (Cont.)the edge nodes (Cont.)PRsync separates content
delivery from synchronization:◦The first task is shared by the edge
nodes and the server while synchronization remains the sole responsibility of the server
◦This separation removes redundant processing at the server and permits the use of P2P protocol
21
Software delivery to the Software delivery to the customerscustomers
Since we are providing a delivery service, we shouldalso avoid consuming customer bandwidth if there is analternative way to obtain the required bandwidth
Download Tools
Software delivery to the Software delivery to the customerscustomers
The resultant system is not a pure P2P system, but aclient-server system based on P2P technology
System Design
Software delivery to the Software delivery to the customerscustomers
First step is to Collect information on the volunteer nodes and to find out which volunteer nodes have which files
Added therefore feedbacks that measure number of active connections maintained by each volunteer peer and will be retrieved at each tracker update interval connections
Tracker construct
Software delivery to the Software delivery to the customerscustomers
Load balance◦ Feedback-controlled load balancing
mechanism
◦ Peers to add information on their current workload to the messages they already send to the tracker
◦ Can identify the volunteer nodes that are currently overloaded and redirect fewer customers to such volunteer nodes
Major Issue
Software delivery to the Software delivery to the customerscustomers
Synchronization◦ Use either dedicated connections or a
private high-speed networks
Security◦ MD4 checksums
Other Issue
OutlineOutlineIntroductionTrace analysisProposed mechanismEvaluationConclusions
27
Software delivery to the Software delivery to the customerscustomers
JAVA based discrete-event General P2P Simulator (GPS)
Idealized performance of TCP
Request counting algorithm provided by the Apache load balancer
Simulation Environment
Software delivery to the Software delivery to the customerscustomers
EvaluationFour volunteer had a much high workload and the other four volunteer had a much light workload
Software delivery to the Software delivery to the customerscustomers
EvaluationSome unevenness in the response times
Software delivery to the Software delivery to the customerscustomers
Evaluation
Software delivery to the Software delivery to the customerscustomers
Evaluation
Software delivery to the Software delivery to the customerscustomers
Evaluation
Software delivery to the Software delivery to the customerscustomers
Evaluation
OutlineOutlineIntroductionTrace analysisProposed mechanismEvaluationConclusions
35
ConclusionsConclusionsOur proposal consists of
supplementing a conventional server with volunteer nodes that expand its scalability.
Our system includes a novel load balancing mechanism that considers both the synchronization workload and the customer-generated workload of the volunteer nodes.
Future worksFuture worksWe plan to study content placement
policies that can handle volatile volunteer nodes.
We plan to take into account the round trip time as an additional metric when performing load balancing.
As the volunteer nodes could be globally distributed, it is desirable to select the volunteer nodes that are near to the customer.
Future worksFuture worksWe have considered exploiting
similarity between different versions of a package.
Q & AQ & A
39