Programming the Cell Processor. S. E. Kireev. Summer School on Parallel Programming, Novosibirsk, 28 August 2009

Topics: comparison of modern processors; implementations of Cell-based systems; Roadrunner, the most powerful supercomputer in the world at the time of the talk, built on Opteron and Cell processors.

  • Cell-based systems. Roadrunner: the most powerful supercomputer in the world, built on Opteron and Cell processors

  • Cell Cell

  • Cell-based systems
    IBM: BladeCenter QS21 (2 Cell B.E., 3.2 GHz); BladeCenter QS22 (2 PowerXCell 8i, 3.2 GHz); 2 PowerXCell 8i, 4.0 GHz

    Mercury Computer Systems: Dual Cell Based System 2 (2 Cell B.E., 3.2 GHz); Dual Cell Based Blade 2 (2 Cell B.E., 3.2 GHz); PCI Express Cell Accelerator Board 2 (1 Cell B.E., 2.8 GHz)

    Fixstars: GigaAccel 180 Accelerator Board (1 PowerXCell 8i, 2.8 GHz)

    T-Platforms: PeakCell S (2 PowerXCell 8i, 3.2 GHz); PeakCell W (2 PowerXCell 8i, 3.2 GHz); PeakCell YPS (4 PowerXCell 8i, 3.2 GHz)

  • Sony PlayStation 3: 1 Cell B.E. at 3.2 GHz (1 PPE, 6 SPEs), 256 MB of memory

  • Cell SPE

  • [Diagram: four L2 caches and an FSB, cores labelled 0 and 1; annotations "-100" and "-30"]

  • Cell components: Power Processor Element (PPE), Synergistic Processor Element (SPE), Element Interconnect Bus (EIB)

  • Cell: 1 PPE (PowerPC, 2 hardware threads)

    8 SPEs, each with a 256 KB local store and 128 registers; all elements are connected by the EIB

  • Parallelism in Cell. Thread level: 2 PPE hardware threads, 8 SPEs

    Data level: SIMD instructions on the SPE, VMX on the PPE

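  • Memory access: the PPE reads and writes main memory directly (read / write); an SPE moves data between main memory and its local store by DMA (put / get) and reads and writes its own local store directly (read / write)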

  • SPE

  • :

  • SPE SPE ( ) LS SPE SPE

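  • [Timing diagram. Sequential execution: Load 1, Count 1, Store 1, Load 2, Count 2, Store 2. Overlapped execution: the Load, Count and Store steps of neighbouring blocks 1 to 4 run at the same time. Load: transfer of data into the SPE; Count: computation on the SPE; Store: transfer of results out of the SPE]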
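The overlap of Load, Count and Store shown in the timing diagram is usually obtained with two buffers. The following is a minimal sketch in portable C, not code from the slides: the names `BLOCK`, `load_block`, `store_block`, `count_block` and `process_double_buffered` are hypothetical, and plain `memcpy` stands in for the asynchronous DMA that a real SPE program would issue, so only the order of the steps is illustrated, not the actual concurrency.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK 4   /* elements per block; hypothetical value chosen for the sketch */

/* Stand-ins for DMA. On an SPE these would be asynchronous transfers;
   here they are ordinary synchronous copies. */
static void load_block(float *ls, const float *mem, size_t blk)
{
    memcpy(ls, mem + blk * BLOCK, BLOCK * sizeof(float));
}

static void store_block(float *mem, const float *ls, size_t blk)
{
    memcpy(mem + blk * BLOCK, ls, BLOCK * sizeof(float));
}

/* "Count" step: the computation on one block held in local memory. */
static void count_block(float *ls)
{
    for (size_t i = 0; i < BLOCK; i++)
        ls[i] *= 2.0f;
}

/* Process nblocks blocks using two buffers: the load of block b+1 is
   requested before block b is computed, so that with real asynchronous
   DMA the transfer and the computation would overlap. */
void process_double_buffered(float *out, const float *in, size_t nblocks)
{
    float buf[2][BLOCK];
    size_t cur = 0;

    if (nblocks == 0)
        return;
    assert(in != NULL && out != NULL);

    load_block(buf[cur], in, 0);                    /* first load, nothing to overlap with */
    for (size_t b = 0; b < nblocks; b++) {
        if (b + 1 < nblocks)
            load_block(buf[1 - cur], in, b + 1);    /* request the next block early */
        count_block(buf[cur]);                      /* compute on the current block */
        store_block(out, buf[cur], b);              /* write the results back */
        cur = 1 - cur;                              /* swap the roles of the buffers */
    }
}
```

Calling `process_double_buffered(out, in, 2)` on eight input values doubles each of them. On real hardware the two buffers would also need separate DMA tags so that the program can wait for each transfer independently.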

  • Cell Cell :

    SPE

    SPE

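  • Ways to program Cell: as an SMP system, shared memory + threads (OpenMP, PThreads, ...)

    As a cluster: distributed memory + MPI

    With models specific to Cell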

  • Cell as an SMP system: shared memory + threads (OpenMP, PThreads, ...). [Diagram: threads 0, 1, 2, 3 on a 4-core SMP]

  • Cell as a cluster: distributed memory + MPI. [Diagram: cluster]

  • Cell Cell PPE SPE PPESPE SPE

  • Cell: function offload model
    PPE: { matrix a, b, c; multiply(a, b, c); }
    SPE: multiply() { } mul() { }

  • [Diagram: task queue model. PPE: main() { AddTask(); } puts Task items into a queue; each of four SPEs runs work() { } on tasks taken from the queue]

  • [Diagram: pipeline model. PPE: main() { make_input(); get_output(); }; four SPEs run step1() { }, step2() { }, step3() { }, step4() { }, turning input data into output data]

  • [Diagrams: one PPE and eight SPEs]

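  • libspe2. LibSPE is the basic library for working with SPEs on Cell. It covers: running SPE programs from the PPE, communication between PPE threads and SPE threads, and callback functions in the PPE thread that are invoked from SPE threads.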

  • libspe2: Hello, World!
    PPE program (ppu_prog.c):
      #include <libspe2.h>
      extern spe_program_handle_t spu_hello;   /* SPE program embedded at link time */
      int main ()
      {
        unsigned int entry = SPE_DEFAULT_ENTRY;
        spe_context_ptr_t spe;

        spe = spe_context_create (0, NULL);     /* create an SPE context */
        spe_program_load (spe, &spu_hello);     /* load the SPE program into it */
        /* run the SPE program; 10 and 20 arrive on the SPE as argp and envp */
        spe_context_run (spe, &entry, 0, (void *) 10, (void *) 20, NULL);
        spe_context_destroy (spe);

        return 0;
      }

    SPE program (spu_prog.c):
      #include <stdio.h>
      int main (unsigned long long spe, unsigned long long argp, unsigned long long envp)
      {
        printf("Hello, World! (%llu,%llu)\n", argp, envp);
        return 0;
      }

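  • libspe2: Hello, World! Building a program for Cell:
    1. spu-gcc -o spu_prog spu_prog.c
       compiles spu_prog.c into the SPE executable spu_prog
    2. ppu-embedspu spu_hello spu_prog spu_prog.o
       wraps the SPE executable into the PPE object file spu_prog.o, which exports the handle spu_hello
    3. ppu-gcc -o prog ppu_prog.c spu_prog.o -lspe2
       builds prog, a single executable that contains both the PPE and the SPE code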

  • libspe2: Hello, World! Using several SPEs on Cell: the PPE program creates one thread per SPE. [Diagram: on the PPE, main() { } calls pthread_create() for three thread_func() { } threads; each thread calls run() and starts main() { } on its own SPE]

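  • libspe2: Hello, World! PPE program with several threads:
      #include <libspe2.h>
      #include <pthread.h>
      #define NTHREADS 40
      extern spe_program_handle_t spu_hello;

      /* Each PPE thread creates, runs and destroys one SPE context. */
      void *thread_func (void *data)
      {
        unsigned int entry = SPE_DEFAULT_ENTRY;
        spe_context_ptr_t spe;
        spe = spe_context_create (0,NULL);
        spe_program_load (spe, &spu_hello);
        spe_context_run (spe, &entry, 0, (void *)data, (void *)NTHREADS, NULL);
        spe_context_destroy (spe);
        return 0;
      }

      int main ()
      {
        pthread_t tid [NTHREADS];
        unsigned long i;
        /* The source text breaks off after "for (i=0;i"; the loops below are an assumed completion. */
        for (i=0;i<NTHREADS;i++)
          pthread_create (&tid[i], NULL, thread_func, (void *) i);
        for (i=0;i<NTHREADS;i++)
          pthread_join (tid[i], NULL);
        return 0;
      }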

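  • Communication mechanisms
    DMA transfers of up to 16 KB each: Get (into the SPE local store), Put (out of the SPE local store)
    Mailboxes, queues of 32-bit messages: SPE in (4 entries), SPE out (1 entry), SPE out interrupt (1 entry)
    Signals: 32-bit values sent to an SPE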

  • DMA- PPE, SPE SPESPESPESPE

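  • DMA transfers: data must be aligned on 16 bytes. The transfer size is 1, 2, 4, 8 or 16 bytes, or 16*N bytes. Each DMA command carries a 5-bit tag that identifies a group of DMA commands; completion is awaited per tag group.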
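The size and alignment rules for DMA can be written down as a small check. This is a sketch under assumptions, not part of any Cell library: the function names `dma_size_ok` and `dma_transfer_ok` are hypothetical, the 16 KB limit is taken from the slide on communication mechanisms, and the rule that transfers shorter than 16 bytes need natural alignment and equal low four address bits on both sides follows the Cell programming documentation as the editor understands it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DMA_MAX_SIZE (16u * 1024u)   /* upper limit of a single transfer: 16 KB */

/* Valid sizes: 1, 2, 4, 8 bytes, or a non-zero multiple of 16 up to 16 KB. */
bool dma_size_ok(size_t size)
{
    if (size == 1 || size == 2 || size == 4 || size == 8)
        return true;
    return size != 0 && size % 16 == 0 && size <= DMA_MAX_SIZE;
}

/* Check one transfer between a local-store address and an effective
   (main memory) address. Both are passed as plain integers. */
bool dma_transfer_ok(uintptr_t ls_addr, uint64_t ea, size_t size)
{
    if (!dma_size_ok(size))
        return false;

    /* Small transfers need natural alignment, larger ones 16-byte alignment. */
    size_t align = size < 16 ? size : 16;
    if (ls_addr % align != 0 || ea % align != 0)
        return false;

    /* Both addresses must sit at the same offset within a 16-byte quadword. */
    return (ls_addr & 15u) == (ea & 15u);
}
```

For example, `dma_transfer_ok(0x100, 0x2000, 128)` holds, while a 24-byte transfer is rejected because 24 is neither one of the small sizes nor a multiple of 16.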

  • Ordering of DMA commands. Besides the plain get and put commands there are ordered variants. Barrier: getb, putb. Fence: getf, putf

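  • libspe 2.0: DMA on the SPE side

    #include <spu_mfcio.h>
    // GET: from main memory (PPE side) into the SPE local store
    void get (void *dest_lsa, unsigned long long sour_ea, unsigned long size)
    {
      int tag=mfc_tag_reserve(), mask=1<<tag;
      // The source text breaks off after "mask=1"; the rest is an assumed completion.
      mfc_get (dest_lsa, sour_ea, size, tag, 0, 0);   // start the transfer
      mfc_write_tag_mask (mask);                      // select the tag group to wait for
      mfc_read_tag_status_all ();                     // block until the transfer has finished
      mfc_tag_release (tag);
    }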

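  • Mailbox: 4-byte messages
    PPE to SPE (in): spe_in_mbox_write() on the PPE, spu_read_in_mbox() on the SPE
    SPE to PPE (out): spu_write_out_mbox() on the SPE, spe_out_mbox_read() on the PPE
    Status calls on the PPE: spe_in_mbox_status(), spe_out_mbox_status()
    Status calls on the SPE: spu_stat_in_mbox(), spu_stat_out_mbox()

    Signal: 4-byte values
    PPE to SPE (sig1): spe_signal_write() on the PPE; spu_read_signal1(), spu_stat_signal1() on the SPE
    PPE to SPE (sig2): spe_signal_write() on the PPE; spu_read_signal2(), spu_stat_signal2() on the SPE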
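The behaviour of the inbound mailbox (a queue of four 32-bit entries) can be pictured with a small software model. This is only an illustration in portable C with hypothetical names (`in_mbox_t`, `model_in_mbox_*`); it does not call libspe2 and ignores blocking, which on real hardware makes the SPE stall when it reads an empty mailbox.

```c
#include <assert.h>
#include <stdint.h>

#define IN_MBOX_DEPTH 4   /* the SPE inbound mailbox holds four entries */

/* Hypothetical model of the inbound mailbox as a ring buffer. */
typedef struct {
    uint32_t slot[IN_MBOX_DEPTH];
    unsigned head;    /* index of the oldest message */
    unsigned count;   /* number of messages currently queued */
} in_mbox_t;

/* Writer-side view: how many more messages fit. */
unsigned model_in_mbox_free(const in_mbox_t *m)
{
    return IN_MBOX_DEPTH - m->count;
}

/* Reader-side view: how many messages wait to be read. */
unsigned model_in_mbox_pending(const in_mbox_t *m)
{
    return m->count;
}

/* Non-blocking write: returns 1 if the message was queued, 0 if full. */
int model_in_mbox_write(in_mbox_t *m, uint32_t msg)
{
    if (m->count == IN_MBOX_DEPTH)
        return 0;
    m->slot[(m->head + m->count) % IN_MBOX_DEPTH] = msg;
    m->count++;
    return 1;
}

/* Read the oldest message. The model asserts instead of stalling. */
uint32_t model_in_mbox_read(in_mbox_t *m)
{
    assert(m->count > 0);
    uint32_t msg = m->slot[m->head];
    m->head = (m->head + 1) % IN_MBOX_DEPTH;
    m->count--;
    return msg;
}
```

A writer that checks `model_in_mbox_free()` before each write never loses a message, which mirrors the usual pattern of polling a status call before writing to the mailbox.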

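  • libspe 2.0: Ping-pong
    PPE side:
      while ( spe_out_mbox_status(spe) == 0 );             // wait until the SPE has posted a message
      spe_out_mbox_read(spe ,&data ,1);                     // read one message from the outbound mailbox
      data++;                                               // change the value
      spe_signal_write(spe, SPE_SIG_NOTIFY_REG_1, data);    // send it back through signal register 1

    SPE side:
      spu_write_out_mbox(data);                             // send data to the PPE
      data=spu_read_signal1();                              // wait for the reply in signal register 1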

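  • Access to the local store of another SPE. The local store of an SPE can be mapped into the address space with spe_ls_area_get(), after which another SPE can reach it. [Diagram: two SPEs]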

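  • Access to the mailboxes and signals of another SPE. The problem state area of an SPE is obtained with spe_ps_area_get(); another SPE then accesses it with put / get. [Diagram: two SPEs]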

  • SPE SPE 16 . . . , , :

  • SPE memory: local store of 256 KB; loads and stores work on 16-byte quadwords aligned on a 16-byte boundary

  • Data alignment:
      unsigned char buffer[1024] __attribute__ ((aligned(16)));
    Vector data:
      vector float vf1 = { 1.0, 2.0, 3.0, 4.0 };
      vector float vf2[2] = {{1.0, 2.0, 3.0, 4.0}, {5.0, 6.0, 7.0, 8.0}};
      vector float vf3[2] = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0};
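Whether a buffer really is aligned on 16 bytes can be checked at run time. The sketch below uses the same `aligned(16)` attribute as the declaration above and compiles with GCC or Clang on an ordinary host; the helper names `is_aligned16`, `require_aligned16` and `round_up16` are hypothetical and not part of the Cell SDK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Buffer aligned on a 16-byte boundary, as in the declaration above. */
unsigned char buffer[1024] __attribute__ ((aligned(16)));

/* Non-zero if p lies on a 16-byte boundary. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Return p unchanged, aborting in debug builds if it is misaligned. */
const void *require_aligned16(const void *p)
{
    assert(is_aligned16(p));
    return p;
}

/* Round a byte count up to the next multiple of 16. */
size_t round_up16(size_t n)
{
    return (n + 15u) & ~(size_t)15u;
}
```

`round_up16` is handy when a buffer size has to be padded so that it can be moved in 16-byte units.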

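  • Kinds of intrinsics:
    specific: map to a single instruction, for example d = si_to_int(a);
    generic: map to one or more instructions depending on the argument types, for example c = spu_add (a, b);
    composite (built-ins): sequences of intrinsics, for example DMA transfers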

  • Arithmetic: d = spu_add(a,b); d = spu_sub(a, b); d = spu_madd(a, b, c); d = spu_mul(a, b);
    Logical: d = spu_and(a, b); d = spu_or(a, b); d = spu_eqv(a, b);
    Mathematical functions: the simdmath library

  • Scalars and vectors: d = spu_insert(s, v, n); d = spu_splats(s); d = spu_promote(s, n); s = spu_extract(v, n);
    Type conversion: d = spu_convtf(a, scale); d = spu_convts(a, scale); d = spu_extend(a);

  • Shifts and rotates: d = spu_rl(a, count); d = spu_sl(a, count);
    Selection and permutation: d = spu_sel(a, b, pattern); d = spu_shuffle(a, b, pattern);
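To try out the lane-by-lane meaning of a few of these intrinsics without Cell hardware, the GCC/Clang vector extensions can serve as a stand-in. This is an approximation under assumptions: `v4f`, `splats`, `madd` and `dot4n` are hypothetical names that imitate `vector float`, `spu_splats` and `spu_madd`; they are not the SPU intrinsics themselves.

```c
#include <assert.h>

/* Four floats in 16 bytes: a host-side stand-in for the SPU "vector float". */
typedef float v4f __attribute__ ((vector_size(16)));

/* Like spu_splats(s): copy a scalar into all four lanes. */
static v4f splats(float s)
{
    v4f v = { s, s, s, s };
    return v;
}

/* Like spu_madd(a, b, c): a * b + c in every lane. */
static v4f madd(v4f a, v4f b, v4f c)
{
    return a * b + c;
}

/* Dot product of two arrays whose length n is a multiple of 4. */
float dot4n(const float *x, const float *y, int n)
{
    assert(n % 4 == 0);
    v4f acc = splats(0.0f);
    for (int i = 0; i < n; i += 4) {
        v4f a = { x[i], x[i + 1], x[i + 2], x[i + 3] };
        v4f b = { y[i], y[i + 1], y[i + 2], y[i + 3] };
        acc = madd(a, b, acc);             /* four multiply-adds per iteration */
    }
    /* Horizontal sum of the four partial results. */
    return acc[0] + acc[1] + acc[2] + acc[3];
}
```

The four lanes accumulate independent partial sums, and only the final horizontal sum mixes them. The same structure is what makes SIMD code on the SPE efficient.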

  • Example:
      void mulv (float *a, float *b, float *c, int n)
      {
        int i, j, k;
        vector float *bv = (vector float *) b;
        vector float *cv = (vector float *) c;
        vector float s, t;

        s = spu_splats(0.0f);
        for (i=0; i

  • :for (i=0; i
  • :int a[n];short int b[n];for (i=0; i
  • The SPE works with 32-bit scalars. Prefer int to short int, for example for loop counters. short int i,k;for (i=0; i
  • Software for Cell
    From IBM (IBM Cell SDK):
      libspe 2.0
      Libraries: SIMD Math Library, MASS Library, FFT, Game math, Image Processing, Matrix, Vector, Multi-precision math, BLAS, LAPACK, Monte-Carlo
      Sync Library
      Software managed cache
      Compilers (OpenMP): xlc, xlf
      DaCS: Data Communication and Synchronization library
      ALF: Accelerated Library Framework

    From other vendors: BSC Cell Superscalar; Mercury Computer Systems MultiCore Framework; RapidMind; Gedae

  • IBM Cell Broadband Engine resource center, http://www.ibm.com/developerworks/power/cell/documents.html
    Cell Developer's Corner, http://www.power.org/resources/devcorner/cellcorner
    STI Center of Competence for the Cell Broadband Engine Processor, http://sti.cc.gatech.edu
    Barcelona Supercomputing Center: Cell Superscalar, www.bsc.es/cellsuperscalar
    RapidMind Development Platform, www.rapidmind.net
    Mercury Computer Systems: MultiCore Framework, http://www.mc.com/software/multicore_framework.aspx