8/16/2019 PP16-lec5-arch4.ppt
Parallel Processing sp2016
lec#5
Dr M Shamim Baig
Explicitly Parallel Processor Architectures:
Task-level Parallelism
Elements of (Explicit) Parallel Architectures
• Processor configurations:
 - Instruction/Data Stream based
• Memory configurations:
 - Physical & Logical based
 - Access-Delay based
• Inter-processor communication:
 - Communication-Interface design
 - Data Exchange/Synch approach
Parallel Platforms: Memory (Physical vs Logical) Configurations
• Physical vs Logical Memory Config
 - Physical Memory config (SM, DM, CSM)
 - Logical Address Space config (SAS vs NSAS)
Combinations:
• CSM SAS (SMP, UMA)
• DM SAS (DSM, NUMA)
• DM NSAS (Multicomputers/Clusters)
Shared Memory (SM) Multiprocessors
• It is important to note the difference between Shared Memory & Shared Address Space.
• The former is a physical memory configuration, while the latter is the logical memory address view of the program.
• It is possible to provide a Shared Address Space using physically distributed memory.
• SM-multiprocessor systems are SAS-configured, with the physical memory configured either as CSM or as DM (DSM).
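The distinction between a shared address space and physically shared memory can be made concrete with a toy model. The Python sketch below shows one logical SAS laid over physically distributed per-node memories, as in the DSM case. It is illustrative only and does not describe how real DSM hardware works. `ToyDSM`, the node count, the words-per-node size and the contiguous block partitioning of addresses are all hypothetical choices.

```python
from typing import List, Tuple


class ToyDSM:
    """Hypothetical toy: one logical shared address space (SAS) whose words
    physically live in separate per-node memories (DM)."""

    def __init__(self, nodes: int, words_per_node: int) -> None:
        self.words_per_node = words_per_node
        # Physically distributed memory: one private list per node.
        self.memories: List[List[int]] = [[0] * words_per_node for _ in range(nodes)]

    def locate(self, addr: int) -> Tuple[int, int]:
        """Map a global (logical) word address to (home node, local offset),
        assuming simple contiguous block partitioning."""
        node, offset = divmod(addr, self.words_per_node)
        if addr < 0 or node >= len(self.memories):
            raise IndexError(f"address {addr} is outside the shared address space")
        return node, offset

    def write(self, addr: int, value: int) -> None:
        node, offset = self.locate(addr)
        self.memories[node][offset] = value

    def read(self, addr: int) -> int:
        node, offset = self.locate(addr)
        return self.memories[node][offset]


if __name__ == "__main__":
    dsm = ToyDSM(nodes=4, words_per_node=1024)
    dsm.write(3000, 42)        # every processor names the word by the same global address
    print(dsm.locate(3000))    # (2, 952): the word physically lives on node 2
    print(dsm.read(3000))      # 42
```

The program only ever sees global addresses, which is the logical (SAS) view. Where a word actually resides is decided by `locate`, which reflects the physical (DM) configuration.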
UMA vs NUMA
• SM-multiprocessors are further categorized, based on memory access delay, as UMA (uniform memory access) & NUMA (non-uniform memory access).
• A UMA system is based on the CSM SAS config, where each processor has the same delay for accessing any memory location.
• A NUMA system is based on the DM SAS (= DSM) config, where a processor may have different delays for accessing different memory locations.
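A simple way to compare the two access-delay formats is the average delay per access. This is a weighted mean of the local and remote delays, weighted by the fraction of accesses that are local. The sketch below computes that mean. The latencies are hypothetical cycle counts chosen only to show the trend and are not measurements.

```python
def avg_access_cycles(local_pct: int, t_local: int, t_remote: int) -> float:
    """Average delay per memory access when local_pct percent of accesses
    go to the processor's own memory and the rest go to other memories."""
    return (local_pct * t_local + (100 - local_pct) * t_remote) / 100


# Hypothetical latencies in clock cycles, chosen only to illustrate the idea.
UMA_DELAY = 100                      # UMA: same delay for every location
NUMA_LOCAL, NUMA_REMOTE = 20, 200    # NUMA: delay depends on where the data lives

for local_pct in (50, 80, 95):
    uma = avg_access_cycles(local_pct, UMA_DELAY, UMA_DELAY)
    numa = avg_access_cycles(local_pct, NUMA_LOCAL, NUMA_REMOTE)
    print(f"{local_pct}% local: UMA {uma:.0f} cycles, NUMA {numa:.0f} cycles")
```

With these made-up numbers, NUMA averages 110 cycles at 50% locality but 56 cycles at 80% locality, against a flat 100 cycles for UMA. In other words, NUMA pays off only when programs keep most accesses local.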
UMA & NUMA Arch Block Diagrams
Typical shared-address-space architectures: (a) Uniform-memory-access shared-address-space computer; (b) Uniform-memory-access shared-address-space computer with caches and memories; (c) Non-uniform-memory-access shared-address-space computer with local memory only.
[Figure: block diagrams (a)-(c); labels: UMA (CSM SAS), NUMA (DM SAS = DSM)]
Both are SM-multiprocessors, differing in memory access delay format.
Simplistic view of a small shared-memory Symmetric Multi Processor (SMP): CSM SAS, bus-based
Examples:
• Dual Pentiums
[Figure: SMP block diagram; labels: Shared memory, Processor/memory bus, Interface, I/O bus]
Multicomputer (Cluster) Platform
Complete computers (PE & DM, with NSAS) & an interconnection network interface at the I/O bus level.
[Figure: computers, each a processor with local memory, exchanging messages over an interconnection network]
• These platforms comprise a set of processors and their own (exclusive/distributed) memory.
• Instances of such a view come naturally from non-shared-address-space (NSAS) multicomputers, e.g. clustered workstations.
Data Exchange/Synch Approaches: Shared Data vs Message-Passing
• There are two primary approaches to data exchange/synch in parallel systems:
 - Shared Memory Model
 - Message-Passing Model
• SM-multiprocessors use the Shared-Data approach for data exchange/synch.
• Multicomputers (Clusters) use the Message-Passing approach for data exchange/synch.
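The two approaches can be contrasted on a small task: summing a list in parallel. The Python sketch below shows both styles. It is an analogy only. Python threads always share one address space, so the message-passing version merely follows the discipline of never touching shared variables and communicating through a queue. On a real multicomputer or cluster the workers would be separate processes on separate nodes, typically using a library such as MPI.

```python
import queue
import threading
from typing import List, Sequence


def shared_data_sum(chunks: Sequence[Sequence[int]]) -> int:
    """Shared-data style: workers update one shared variable, with a lock for synch."""
    total = 0
    lock = threading.Lock()

    def worker(chunk: Sequence[int]) -> None:
        nonlocal total
        partial = sum(chunk)
        with lock:                 # synchronize access to the shared data
            total += partial

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def message_passing_sum(chunks: Sequence[Sequence[int]]) -> int:
    """Message-passing style: workers share no variable; each sends its partial
    result as a message, and receiving the messages provides the synch."""
    mailbox: "queue.Queue[int]" = queue.Queue()

    def worker(chunk: Sequence[int]) -> None:
        mailbox.put(sum(chunk))    # "send"

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    # "receive" one message per worker; get() blocks until a message arrives
    result = sum(mailbox.get() for _ in chunks)
    for t in threads:
        t.join()
    return result


if __name__ == "__main__":
    data: List[range] = [range(0, 250), range(250, 500), range(500, 750), range(750, 1000)]
    print(shared_data_sum(data), message_passing_sum(data))
```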
Data Exchange/Synch Platforms: Shared-memory vs Message-Passing
• Shared memory platforms have low comm overhead & can support finer grain levels, while message passing platforms have more comm overhead & are therefore better suited to coarse grain levels.
• SM Multiprocessors are faster but have poor scalability.
• Message passing Multicomputer platforms are slower but have higher scalability.
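The grain-size argument can be made quantitative with a simple efficiency model. Suppose each task does `grain` cycles of useful work and pays one communication/synch overhead. Efficiency is then grain / (grain + overhead). The sketch below evaluates that model. The two overhead values are hypothetical and serve only to show the trend.

```python
def efficiency(grain_cycles: int, comm_overhead_cycles: int) -> float:
    """Fraction of time spent on useful computation when each task of
    grain_cycles of work pays one communication/synch overhead."""
    return grain_cycles / (grain_cycles + comm_overhead_cycles)


# Hypothetical per-exchange overheads in clock cycles (illustration only).
OVERHEAD = {"shared memory": 50, "message passing": 5_000}

for grain in (100, 10_000, 1_000_000):
    cells = ", ".join(f"{name} {efficiency(grain, cost):.2f}" for name, cost in OVERHEAD.items())
    print(f"grain {grain:>9} cycles: {cells}")
```

With these numbers, a 100-cycle grain runs at about 0.67 efficiency with shared memory but only about 0.02 with message passing. Message passing approaches shared-memory efficiency only at very coarse grains.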
Clusters as a Computing Platform
• Clusters: a network of computers became a very attractive alternative to expensive supercomputers for high-performance computing in the early 1990s.
• Several early projects, notably:
 - NASA Beowulf project
 - Berkeley NOW (network of workstations) project
Beowulf Clusters*
• A group of interconnected commodity computers achieving high performance at low cost.
• Typically using commodity interconnects (e.g. high-speed Ethernet) & OS (e.g. Linux).
* "Beowulf" comes from the name given to a NASA Goddard Space Flight Center cluster project.
Advantages of Cluster Computer (NOW-like)
• Processing nodes are high-performance PCs/workstations, readily available at low cost.
• Processing nodes are interconnected using high-performance LANs/SANs.
• Easily upgradable, incorporating the latest processors into the system as they become available.
• Easily scalable to bigger & more powerful systems.
• Existing software can be easily adapted for parallel execution on a cluster system.
Cluster Interconnects: LAN vs SAN
• LANs: Fast / Gigabit / 10-Gigabit Ethernet
• SANs: Myrinet … OS calls causing more processing delays
Vector/Array Data Processors
• Vector proc: 1D temporal parallelism using a pipelined arithmetic unit & vector chaining. Float add pipe: compare exp, align mant, add mant, normalize.
• Array proc: 1D spatial parallelism using an ALU array as SIMD.
• Systolic Array: combines 2-D spatial parallelism with a pipelined (computational) wavefront.
[Figure: block diagrams of vector/array & systolic processing]
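Temporal parallelism in a vector processor can be seen by tracing the 4-stage float-add pipeline listed above. The sketch assumes that each stage takes one clock cycle and that there are no stalls. It does not model vector chaining. Under those assumptions, n operand pairs finish in k + n - 1 cycles on a k-stage pipeline, versus k·n cycles unpipelined.

```python
from typing import List, Optional

# Stages of the floating-point add pipeline listed on the slide.
FP_ADD_STAGES = ["compare exp", "align mant", "add mant", "normalize"]


def pipeline_schedule(n_ops: int, n_stages: int = len(FP_ADD_STAGES)) -> List[List[Optional[int]]]:
    """Return one row per clock cycle; entry s is the index of the operand
    pair occupying stage s in that cycle (None if the stage is idle).
    Pair i enters stage 0 in cycle i, so it is in stage s in cycle i + s."""
    total_cycles = n_stages + n_ops - 1
    return [
        [cycle - s if 0 <= cycle - s < n_ops else None for s in range(n_stages)]
        for cycle in range(total_cycles)
    ]


if __name__ == "__main__":
    n = 8  # e.g. element-wise add of two 8-element vectors
    table = pipeline_schedule(n)
    print("cycle | " + " | ".join(FP_ADD_STAGES))
    for cycle, row in enumerate(table, start=1):
        print(f"{cycle:5} | " + " | ".join("-" if op is None else f"a{op}+b{op}" for op in row))
    k = len(FP_ADD_STAGES)
    print(f"pipelined: {len(table)} cycles, unpipelined: {k * n} cycles")
```

For 8 element pairs, the pipeline finishes in 11 cycles instead of 32. Once the pipeline is full, one result emerges per cycle.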
Summary: Parallel Platforms: Memory & Interconnect Configurations
• Memory Config (Physical vs Logical)
 - Physical Memory config (SM, DM, CSM)
 - Logical Address Space config (SAS vs NSAS)
 - Combinations:
 • CSM SAS (SMP, UMA)
 • DM SAS (DSM, NUMA)
 • DM NSAS (Multicomputers/Clusters)
• Interconnection Network:
 o Interface level: memory bus in SM-multiprocessors (UMA, NUMA) vs I/O bus (using NI) in multicomputers/clusters
 o Data Exchange/Synch: Shared Data model vs Message Passing model
Homework:
Self-assessed problems
Please mark your solution & note the marks you achieved!
Problems:
Explicit Parallel Architectures
Example Problem 1: Bus-based SM-Multiprocessor: Limit of Parallelism
Consider an SM-multiprocessor using 32-bit RISC processors running at 150 MHz, each carrying out one instruction per clock cycle. Assume 15% data-load & 10% data-store instructions, using a shared bus having 2 GB/sec BW. Compute the max number of processors that can be connected on the above bus for the following parallel configurations:
Example Problem 1 (cont'd): Bus-based SM-Multiprocessor: Limit of Parallelism
(a) SMP (without cache memory)
(b) SMP with cache memory having a hit ratio of 95% & memory write-through policy
(c) NUMA with program locality factor = 80%
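One way to set up the arithmetic for Example Problem 1 is shown below as a Python sketch. The approach is to compute each processor's bus demand in bytes/s and then divide the bus BW by that demand. Several readings are assumptions not stated in the problem:
- 2 GB/sec means 2·10^9 bytes/s.
- Each load/store or cache miss moves one 4-byte word, not a whole cache line.
- Write-through stores are not write-allocate.
- Instruction fetches and bus arbitration are ignored.

```python
# Worked sketch for Example Problem 1 (bus-limited SM-multiprocessor).
# ASSUMPTIONS: 2 GB/sec = 2 * 10**9 bytes/s; each bus transfer is one 32-bit word;
# instruction fetches and bus arbitration overhead are ignored.

BUS_BW = 2 * 10**9          # bytes per second
CLOCK_HZ = 150 * 10**6      # one instruction per clock cycle
WORD_BYTES = 4              # 32-bit processor
LOAD_PCT, STORE_PCT = 15, 10


def max_processors(bus_bytes_per_proc: int) -> int:
    """Largest processor count whose combined bus demand fits in BUS_BW."""
    return BUS_BW // bus_bytes_per_proc


# (a) SMP, no cache: every load and store goes over the shared bus.
demand_a = CLOCK_HZ * (LOAD_PCT + STORE_PCT) * WORD_BYTES // 100

# (b) SMP with cache, 95% hit ratio, write-through:
#     only load misses use the bus, but every store is written through.
HIT_PCT = 95
demand_b = (CLOCK_HZ * LOAD_PCT * (100 - HIT_PCT) * WORD_BYTES // 10_000
            + CLOCK_HZ * STORE_PCT * WORD_BYTES // 100)

# (c) NUMA, locality factor 80%: only the 20% remote data accesses use the bus.
LOCALITY_PCT = 80
demand_c = CLOCK_HZ * (LOAD_PCT + STORE_PCT) * (100 - LOCALITY_PCT) * WORD_BYTES // 10_000

for label, demand in (("(a)", demand_a), ("(b)", demand_b), ("(c)", demand_c)):
    print(f"{label} {demand / 1e6:.1f} MB/s per processor -> max {max_processors(demand)} processors")
```

Under these assumptions, the per-processor demands are 150, 64.5 and 30 MB/s. That gives at most 13, 31 and 66 processors for (a), (b) and (c). A different reading, such as 1 GB = 2^30 bytes or whole cache lines per miss, changes the numbers.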
[Figure: Bus-based interconnects (a) with no local caches; (b) with local memory/caches]
Since much of the data accessed by processors is local to the processor, a local memory can improve the performance of bus-based machines.
Example: SMP (SM & shared bus)
Example Problem 2: Message Passing Multicomputer: Local vs Remote memory data access delays
Consider a 64-node multicomputer in which each node comprises a 32-bit RISC processor with a 250 MHz clock rate & 8 MB local memory. A local memory access requires 4 clock cycles, remote comm initiation (setup overhead) takes 15 clock cycles & the interconnection network BW is 80 MB/sec. The total number of instructions executed is 200000. If memory data loads & stores are 15% & 10% of the instructions respectively, compute:
(a) Load/store time if all accesses are to local nodes
(b) Load/store time if 20% of accesses are to remote nodes
Note: Assume packet lengths are variable (depending on addr & data bytes) & communication protocol given…. Size of packet fields is in multiples of bytes.
Example Problem 2 (cont'd): Message Passing Multicomputer: Local vs Remote memory data access delays
[Figure: multicomputer; labels: Computers, Processor, Local memory, Messages, Interconnection network]
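A possible setup for Example Problem 2 is sketched below in Python. The problem leaves the communication protocol unspecified ("communication protocol given…"), so the packet model here is a hypothetical assumption: each remote access sends a 4-byte address plus a 4-byte data word with no header, and costs one setup overhead plus the packet transfer time. The model also assumes 1 MB = 10^6 bytes and ignores network hop latency and the memory access at the remote node. `fractions.Fraction` keeps the nanosecond arithmetic exact.

```python
from fractions import Fraction

# Values from the problem statement:
CLOCK_HZ = 250 * 10**6      # 250 MHz -> 4 ns per cycle
INSTRUCTIONS = 200_000
LOAD_PCT, STORE_PCT = 15, 10
LOCAL_CYCLES = 4            # local memory access
SETUP_CYCLES = 15           # remote comm initiation (setup overhead)
NET_BW = 80 * 10**6         # bytes/s, ASSUMING 1 MB = 10**6 bytes

# HYPOTHETICAL packet size: protocol details are not given, so assume each remote
# access transfers a 4-byte address plus a 4-byte data word, with no header.
PACKET_BYTES = 4 + 4


def cycles_to_ns(cycles: int) -> Fraction:
    return Fraction(cycles * 10**9, CLOCK_HZ)


def remote_access_ns(packet_bytes: int = PACKET_BYTES) -> Fraction:
    """Hypothetical model: one setup overhead plus packet transfer time at NET_BW.
    Ignores network hop latency and the memory access at the remote node."""
    return cycles_to_ns(SETUP_CYCLES) + Fraction(packet_bytes * 10**9, NET_BW)


def load_store_time_ns(remote_pct: int) -> Fraction:
    accesses = INSTRUCTIONS * (LOAD_PCT + STORE_PCT) // 100   # 50,000 loads + stores
    remote = accesses * remote_pct // 100
    local = accesses - remote
    return local * cycles_to_ns(LOCAL_CYCLES) + remote * remote_access_ns()


if __name__ == "__main__":
    print(f"(a) all local:  {float(load_store_time_ns(0)) / 1000:.0f} us")
    print(f"(b) 20% remote: {float(load_store_time_ns(20)) / 1000:.0f} us")
```

Under this packet assumption, a local access costs 16 ns and a remote one costs 60 + 100 = 160 ns. That gives 800 µs for (a) and 2240 µs for (b). A real protocol with headers or separate request/reply packets would raise the remote cost further.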