Parallel Programming
Pingpeng Yuan



Parallel Programming
- What
- Why
- How
- Goal
- Exam

What is Parallel Programming?
Coordinating multiple processing elements to solve a problem.

Parallelism - A simplistic understanding
- Multiple tasks at once.
- Distribute work into multiple execution units.
- Two approaches: data parallelism, and functional (or control) parallelism.
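The two approaches named above can be contrasted in a minimal Python sketch. Everything in it is an illustrative assumption, not material from the slides: the helper names `square`, `data_parallel` and `functional_parallel`, the sample data, and the choice of a thread pool. In CPython, threads show the structure of the two approaches but will not speed up CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    """The single operation applied to every data element."""
    return x * x


def data_parallel(values, workers=4):
    # Data parallelism: the SAME operation runs on DIFFERENT pieces of data.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(square, values))


def functional_parallel(values):
    # Functional (control) parallelism: DIFFERENT operations run at the same
    # time, here on the same input.
    with ThreadPoolExecutor(max_workers=2) as pool:
        total_future = pool.submit(sum, values)
        largest_future = pool.submit(max, values)
        return total_future.result(), largest_future.result()


data = list(range(8))  # hypothetical sample input
squares = data_parallel(data)
total, largest = functional_parallel(data)
print(squares, total, largest)
```

In the first function the work is split by data element; in the second it is split by task.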

Fundamentals of Parallel Processing, 16/12/2008, Ashish Agrawal, IIT Kanpur

Why
- Technology trend
- Application needs

[Figure: Human Architecture! Growth Performance. Labels: Age, Growth, Vertical, Horizontal]

[Figure: Computational Power Improvement. Labels: No. of Processors, C.P.I., Multiprocessor, Uniprocessor]

General Technology Trends
- Microprocessor performance increases 50% - 100% per year
- Clock frequency doubles every 3 years
- Transistor count quadruples every 3 years

Clock Frequency Growth Rate (Intel family)
- 30% per year

Intel Many Integrated Core (MIC)

32 core version of MIC:

Tilera's 100 cores (June 2011)
- Tilera has introduced a range of processors (64-bit Gx family: 36 cores, 64 cores and 100 cores), aiming to take on Intel in servers that handle high-throughput web applications
- 64-bit cores running up to 1.5 GHz
- Manufactured in 40 nm technology

Top500


Paradigm Change in HPC

GPU Architecture
NVIDIA Fermi, 512 Processing Elements (PEs)

NVIDIA planned to put 512 PEs into a single GPU, but the GTX480 turned out to have 480 PEs.

The Gap Between CPU and GPU

ref: Tesla GPU Computing Brochure
This gap is narrowed by multi-core CPUs.

GPU Will Top the List in Nov 2010

Transistor Count Growth Rate (Intel family)
Transistor count grows much faster than clock rate
- 40% per year, order of magnitude more contribution in 2 decades

How to Use More Transistors
- Improve single-threaded performance via architecture: not keeping up with the potential given by technology
- Use transistors for memory structures to improve data locality
- Use parallelism: instruction-level, thread-level

Similar Story for Storage (Transistor Count)

Trends in DRAM Capabilities
- DRAM densities to double every 3 years
- Projections for DRAM densities revised downwards over time
- Current densities at 4 Gb/die
- DRAM data rates to double every 4-5 years
- Projections for DRAM data rates revised upwards over time
- Current data rates at 2.2 Gb/s

Similar Story for Storage
1980-95: 1000x; 50%; 3% (only 2x from 1980-95); 2x; cache

Memory hierarchy

Level | Capacity | Access time
CPU registers | 100 bytes | < 1 ns
L1 cache | 32 KB | 1 ns
L2 cache | 256 KB | 4 ns
Primary Memory | 1 GB | 60 ns
Secondary Storage | 1 TB | 10 ms
Tertiary Storage | 1 PB | 1 s - 1 hr

Similar Story for Storage

Disk trends
Disks too: Parallel disks plus caching
Disk capacity, 1975-1989
- doubled every 3+ years
- 25% improvement each year
- factor of 10 every decade
- Still exponential, but far less rapid than processor performance
Disk capacity, 1990-recently
- doubling every 12 months
- 100% improvement each year
- factor of 1000 every decade
- Capacity growth 10x as fast as processor performance!
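The per-year and per-decade figures for disk capacity are linked by compound growth, which can be checked with a short Python sketch. The function names `growth_over` and `annual_rate_from_doubling` are hypothetical helpers introduced here for illustration; the input rates are the ones quoted above.

```python
def growth_over(years, annual_rate):
    """Total growth factor after compounding annual_rate for `years` years."""
    return (1.0 + annual_rate) ** years


def annual_rate_from_doubling(doubling_years):
    """Annual growth rate implied by doubling every `doubling_years` years."""
    return 2.0 ** (1.0 / doubling_years) - 1.0


# 1975-1989: 25% improvement each year, compounded over a decade.
slow_decade = growth_over(10, 0.25)

# 1990 onward: 100% improvement each year (doubling every 12 months).
fast_decade = growth_over(10, 1.00)

# Doubling every 3 years, expressed as an annual rate.
rate_3yr = annual_rate_from_doubling(3)

print(f"25%/year over 10 years:  x{slow_decade:.1f}")
print(f"100%/year over 10 years: x{fast_decade:.0f}")
print(f"doubling every 3 years:  {rate_3yr:.1%} per year")
```

The results come out near a factor of 10 per decade for the earlier period and a factor of about 1000 (2^10 = 1024) for the later one, and doubling every 3 years corresponds to roughly 26% per year.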

Disk trends
- Only a few years ago, we purchased disks by the megabyte
- Today, 1 GB (a billion bytes) costs $1 / $0.50 / $0.05 from Dell => 1 TB costs $1K / $500 / $50, 1 PB costs $1M / $500K / $50K
- Technology is amazing
- Flying a 747 6" above the ground
- Reading/writing a strip of postage stamps

Commodity computer systems
1946 to 2003: General-purpose computing: Serial. 5 KHz to 4 GHz.

2004: General-purpose computing goes parallel. Clock frequency growth flat. #Transistors/chip 1980 to 2011: 29K to 30B! #cores: ~d^(y-2003)

If you want your program to run significantly faster, you're going to have to parallelize it.

Drivers of Parallel Computing: Application needs
ref: http://www.nvidia.com/object/tesla_computing_solutions.html
GPU can achieve 10x performance over CPU.

Applications of Parallel Processing


Why Do We Need Parallel Processing?
Reasonable running time = fraction of an hour to several hours (10^3 - 10^4 s).
In this time, a TIPS/TFLOPS machine can perform 10^15 - 10^16 operations.

Example 1: Southern oceans heat modeling (10-minute iterations)
300 GFLOP per iteration × 300 000 iterations per 6 yrs = 10^16 FLOP
4096 E-W regions, 1024 N-S regions, 12 layers in depth

Example 2: Fluid dynamics calculations (1000 × 1000 × 1000 lattice)
10^9 lattice points × 1000 FLOP/point × 10 000 time steps = 10^16 FLOP

Example 3: Monte Carlo simulation of nuclear reactor
10^11 particles to track (for 1000 escapes) × 10^4 FLOP/particle = 10^15 FLOP

Decentralized supercomputing (from Mathworld News, 2006/4/7): A grid of tens of thousands of networked computers discovers 2^30 402 457 - 1, the 43rd Mersenne prime, as the largest known prime (9 152 052 digits).
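The operation counts for Examples 2 and 3 translate into running times once a machine speed is fixed. The Python sketch below redoes that arithmetic. The helper `runtime_seconds` and the 1 GFLOPS comparison machine are assumptions added for illustration; the 1 TFLOPS rate is the TFLOPS machine mentioned above.

```python
def runtime_seconds(total_flop, flop_per_second):
    """Idealized running time: total work divided by sustained speed."""
    return total_flop / flop_per_second


# Example 2: 10^9 lattice points x 1000 FLOP/point x 10 000 time steps
fluid_flop = 10**9 * 1000 * 10_000

# Example 3: 10^11 particles x 10^4 FLOP/particle
reactor_flop = 10**11 * 10**4

TFLOPS = 10**12  # the TFLOPS machine from the slide
GFLOPS = 10**9   # assumed slower machine, for comparison only

fluid_on_tflops = runtime_seconds(fluid_flop, TFLOPS)
fluid_on_gflops = runtime_seconds(fluid_flop, GFLOPS)
reactor_on_tflops = runtime_seconds(reactor_flop, TFLOPS)

print(f"Fluid dynamics at 1 TFLOPS:  {fluid_on_tflops / 3600:.1f} hours")
print(f"Fluid dynamics at 1 GFLOPS:  {fluid_on_gflops / 86400:.0f} days")
print(f"Reactor simulation at 1 TFLOPS: {reactor_on_tflops / 60:.0f} minutes")
```

At 1 TFLOPS both examples fall inside the "fraction of an hour to several hours" window; at the assumed 1 GFLOPS the fluid dynamics run would take months.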


IDC: 2.7 ZB in 2012; 35 ZB in 2020

What Makes it Big Data?
VOLUME, VELOCITY, VARIETY, VALUE

[Figure labels: SOCIAL, BLOG, SMART METER]

VARIETY + COMPLEXITY, VELOCITY + DENSITY: big data, real-time data (Facebook, tweets, weblogs).

Numbers
How much data is there in the world?
- 800 Terabytes, 2000
- 160 Exabytes, 2006
- 500 Exabytes (Internet), 2009
- 2.7 Zettabytes, 2012
- 35 Zettabytes by 2020
How much data is generated in ONE day?
- 7 TB, Twitter
- 10 TB, Facebook

Big data: The next frontier for innovation, competition, and productivity. McKinsey Global Institute, 2011

Big Data Use Cases

Sector | Today's Challenge | New Data | What's Possible
Healthcare | Expensive office visits | Remote patient monitoring | Preventive care, reduced hospitalization
Manufacturing | In-person support | Product sensors | Automated diagnosis, support
Location-Based Services | Based on home zip code | Real time location data | Geo-advertising, traffic, local search
Public Sector | Standardized services | Citizen surveys | Tailored services, cost reductions
Retail | One size fits all marketing | Social media | Sentiment analysis, segmentation

How

Parallel Programming
- Parallel Architectures
- Parallel Algorithms
- Parallel Programming

Most people in the research community agree that there are at least two kinds of parallel programmers that will be important to the future of computing:
- Programmers who understand how to write software but are naive about parallelization and mapping to architecture
- Programmers who are knowledgeable about parallelization and mapping to architecture, and so can achieve high performance

Goal
324: +42442 + 1 doc +20801 doc