  • Guide to Setting Up a CUDA Project, Ngo Quoc Vinh

    Kyoto Japan 2008

    1. Required hardware and software

    Definition:

    CUDA stands for Compute Unified Device Architecture, a software and hardware architecture for developing computations on the GPU. On a multitasking system, using the GPU for computation (CUDA programming) and for graphics can happen at the same time.

    Required software:

    CUDA SDK version 2.0 can be used on Windows XP, 32-bit or 64-bit.

    On Windows you need Microsoft Visual C++ 2005 to write a CUDA project.

    Required hardware: to write a CUDA program, besides the supporting software you also need hardware on which the program can actually run (not just in emulation).

    The hardware that NVIDIA supports for CUDA programming is listed at http://www.nvidia.com/object/cuda_learn_products.html (the setup described here uses a single graphics card; if you have them, you can use more than one).

    References:

    CUDA-related material in English:

    http://forums.nvidia.com/index.php?showtopic=36286

    CUDA-related material in Japanese:

    http://www.nvidia.co.jp/object/cuda_home_jp.html

    2. Installing the CUDA driver, CUDA Toolkit, and CUDA SDK

    For a CUDA program to run on Windows XP you need the supporting libraries. These libraries are included in the SDK provided by NVIDIA.

    Downloading the CUDA driver

    Download the driver that matches your card model from http://www.nvidia.com/object/cuda_get.html#windows. If you read Japanese, you can download it from:

    http://www.nvidia.co.jp/object/cuda_get_jp.html#windows

    On this page, select NVIDIA driver for Microsoft Windows XP with CUDA support (174.55). If your OS is 32-bit Windows, choose x86 under Architecture download (Figure 1).

    Figure 1

    1. An NVIDIA Driver Download page then appears; click the text "click here to download" (Figure 2).

    Figure 2

    The File Download dialog appears; click Save (Figure 3).

    Figure 3

    2. The Save As dialog asks where you want to save the driver file. Choose the destination path and click Save (Figure 4). Wait until the download completes.

    Figure 4

    Installing the CUDA driver

    After the download finishes, double-click the downloaded *.exe file (in this example the file is 169.21_forceware_winxp_32bit_english_whql, for a GeForce 8800 GT, operating system Windows XP, language English (US)).

    Select I accept the terms in the license agreement and click Next.

    The installer asks where you want to install. I recommend keeping the default, c:\NVIDIA\Win2k\169.21\English (Figure 5).

    Figure 5

    Click Next; the installer loads the files to be installed.

    Click Next; the installation runs automatically (Figure 6).

    Figure 6

    Wait a moment. When the installation has finished, click Finish to restart the computer.

    Downloading the SDK and Toolkit files

    After installing the driver for the card, you need to install the programming tools for CUDA.

    Download the two files NVIDIA_CUDA_Toolkit_1.0.exe and NVIDIA_CUDA_SDK_1.0.exe from http://www.nvidia.com/object/cuda_get.html#windows, choosing the version that matches your OS (for 32-bit choose the x86 architecture; for 64-bit choose x86-64) (Figure 4.1). Both files are downloaded in exactly the same way.

    After you click the architecture type, a new page appears; click "click here to download" for the file. To download the file, follow steps 1, 2, 3 of section 4.1.1.

    Installing the CUDA Toolkit

    NVIDIA_CUDA_Toolkit_1.0.exe contains the tools and libraries that support CUDA programming, together with the programming guides.

    How to install the Toolkit file

    After downloading NVIDIA_CUDA_Toolkit_1.0.exe (or a newer version), double-click the file to install it. Double-clicking starts the InstallShield Wizard.

    Click Next to begin the installation. Select I accept the terms of the license agreement and click Next (Figure 4.10). The installer then asks where you want to install (I recommend keeping the default, C:\CUDA).

    Figure 7

    Click Next to continue to the next step. After going through the steps of the InstallShield Wizard, click Install to install the software. Wait a few minutes; when the installation has finished, click Finish.

    Installing the SDK

    NVIDIA_CUDA_SDK_1.0.exe is NVIDIA's SDK. Once installed, it contains sample projects, which are very useful for your own study.

    1. After downloading NVIDIA_CUDA_SDK_1.0.exe (or a newer version), double-click the file to install it. Double-clicking starts the InstallShield Wizard.

    2. Click Next to begin the installation. Select I accept the terms of the license agreement and click Next. This step is the same as step 2 of installing the Toolkit.

    3. The installer asks for some information about you. Enter your name in the Name text box, your company or organization in the Organization text box, and your email address in Email (Optional) (Figure 8).

    Figure 8

    4. Click Next to continue. The installer asks where you want to install (I recommend keeping the default, C:\Program Files\NVIDIA Corporation\NVIDIA CUDA SDK). Click Next to continue to the next step.

    5. After going through the steps of the InstallShield Wizard, click Install to install the software. Wait a few minutes; when the installation has finished, click Finish.

    Most of the CUDA samples that NVIDIA provides run on Visual C++, so you need Microsoft Visual C++. You can use Microsoft Visual Studio C++ Express, which is free.

    After the installation is complete, you can open the deviceQuery sample that NVIDIA provides in C:\Program Files\NVIDIA Corporation\NVIDIA CUDA SDK\bin\win32\Release and try running it. If it succeeds, the program displays the configuration of your GPU card and the message TEST

    3. Installing the Visual Profiler

    The Visual Profiler is provided by NVIDIA to analyze and evaluate a CUDA program.

    Download the Visual Profiler from http://www.nvidia.com/object/cuda_get.html#windows. On this page you will find the text Cuda Visual Profiler in the Cuda for Windows table (Figure 4.9).

    Downloading this program is similar to steps 7, 8, 9 of section 4.1.1.

    Then extract the file CudaVisualProfiler_0.2_beta_windows.zip.

    After extraction, a CudaVisualProfiler folder appears, containing two folders, bin and Projects.

    The Projects folder holds the information of a CUDA project after it has been analyzed.

    The bin folder contains the *.dll files and cudaprof.exe, which is the Cuda Visual Profiler program.

    Run the Cuda Visual Profiler by double-clicking cudaprof.exe (Figure 9).

    Figure 9

    The program runs without being installed.

    4. Enabling syntax highlighting for CUDA files (*.cu)

    A CUDA source file has the extension *.cu. If you open such a file in Microsoft Visual C++, it is displayed as plain text, which is hard to read because variables and keywords are all black. To make programs easier to read, NVIDIA provides a file that plugs into Microsoft Visual C++ so that a *.cu file is displayed like a *.cpp file.

    1. Go to C:\Program Files\NVIDIA CUDA SDK\doc\syntax_highlighting\visual_studio_8 and copy the file usertype.dat to C:\Program Files\Microsoft Visual Studio 8\Common7\IDE.

    2. Open the menu Tools -> Options; in the Options dialog go to Text Editor -> File Extension (Figure 4.15). On the right side of the dialog, type cu (the extension of CUDA programs) in the Extension: text box.

    3. In the Editor: list box, choose Microsoft Visual C++ (the environment in which CUDA files are edited). Then click Apply and click OK. Restart Microsoft Visual Studio to finish (Figure 10).

    Figure 10

    You have now set up highlighting for *.cu files, which makes programs clearer and easier to read.

    5. Setting up a CUDA project in Microsoft Visual C++ 2005

    The previous parts covered the installations needed for a CUDA project. Section 4.4 showed how to run one of the samples shipped with the NVIDIA SDK; you can run any of the other NVIDIA SDK samples in the same way.

    This part shows how to create a CUDA project of your own.

    For simplicity, we create a console project.

    Open Microsoft Visual C++ and go to File -> New -> Project. In the New Project dialog, go to Visual C++ -> Win32 and choose Win32 Console Application. You can name the project CudaStep1 and the solution CudaProgram. Then click OK -> Next -> Finish. At this point you have a console project, but it is not yet a CUDA project (Figure 11).

    Figure 11

    In the Solution Explorer window, right-click Header Files -> Add -> New Item; the Add New Item - CudaStep1 dialog appears. Go to Visual C++ -> Code, choose Header File (.h), name it CudaHeader.h, and click Add (Figure 12). This file will hold the configuration of the CUDA program and the prototypes of the kernel functions you will write.


    Figure 12
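    The contents of CudaHeader.h are not shown in this guide. As a rough sketch only, assuming the constants used by the later listings (MATRIXSIZE, XTHREADS, YTHREADS, XBLOCKS, YBLOCKS) are defined in this file, it might look like the following. Only the element count of 32 comes from the text; the values 8 and 4 are assumptions chosen so that 4 blocks of 8 threads cover the 32 elements, and the prototype is an assumed declaration of the wrapper that the .cpp file calls.

```cpp
// CudaHeader.h (hypothetical contents; the original file is not shown in the guide)
#ifndef CUDA_HEADER_H
#define CUDA_HEADER_H

// Number of elements in the sample matrix (from the text).
#define MATRIXSIZE 32

// Assumed launch configuration: XBLOCKS * XTHREADS must equal MATRIXSIZE,
// because the kernel computes its index as blockIdx.x * XTHREADS + threadIdx.x.
#define XTHREADS 8   // threads per block along x (assumption)
#define YTHREADS 1   // threads per block along y (assumption)
#define XBLOCKS  4   // blocks per grid along x (assumption)
#define YBLOCKS  1   // blocks per grid along y (assumption)

// Host-side wrapper implemented in CudaFunction.cu (assumed declaration).
void CudaProcessing(float *hostData);

#endif // CUDA_HEADER_H
```

    Keeping YTHREADS and YBLOCKS at 1 matters with this kernel: it only uses the x indices, so any extra threads along y would update the same elements a second time.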

    Next, create a file for the source code of the CUDA program; this file has the extension .cu. As in step 2, in the Solution Explorer window right-click Header Files -> Add -> New Item; the Add New Item - CudaStep1 dialog appears. Go to Visual C++ -> Utility, choose Text File (.txt), name it CudaFunction.cu, and click Add.

    You have now created a CUDA project, but the program does not do anything yet because you have not written any code. To illustrate how a program works, we use a small program that processes a matrix of 32 elements. The CUDA function increases the value of each element by 1.

    Open CudaStep1.cpp and create a matrix to use as sample data. You can copy the following program:

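    // CudaStep1.cpp : Defines the entry point for the console application.

    #include "stdafx.h"
    #include <iostream>
    #include "CudaHeader.h"

    using std::cout;
    using std::cin;

    // prototype of the display function that prints the matrix to the screen
    void display(float *matrix, int col, int row);

    // main function
    int _tmain(int argc, _TCHAR* argv[])
    {
        // create the array and assign the initial values
        float matrix[32];
        for (int i = 0; i < 32; i++) {
            matrix[i] = 9;
        }

        // display the unprocessed matrix
        // NOTE: the source listing is cut off after "cout". Everything below is reconstructed
        // from the code explanation; CudaProcessing is assumed to be declared in CudaHeader.h.
        display(matrix, 32, 1);   // assumed layout: 32 columns, 1 row

        // pass the matrix to the device for processing, then display the result
        CudaProcessing(matrix);
        display(matrix, 32, 1);

        char wait;                // keeps the console window open
        cin >> wait;
        return 0;
    }

    // print the matrix, one row per line
    void display(float *matrix, int col, int row)
    {
        for (int i = 0; i < col * row; i++)
            cout << matrix[i] << ((i + 1) % col == 0 ? "\n" : " ");
    }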
    Open CudaFunction.cu and write the computation functions. This file contains two functions: CudaProcessing, which transfers the data between Host and Device and calls the kernel, and CudaProcessingKernel, which does the computation.

    #include "CudaHeader.h"
    // The names of the four other headers included here did not survive in the
    // source; cuda_runtime.h is an assumption.
    #include <cuda_runtime.h>

    // prototype of the kernel function
    extern "C"
    __global__ void CudaProcessingKernel(float *data);

    /* Copies the data from the Host to the Device, calls the kernel, then copies the computed data back to the Host. */
    void CudaProcessing(float *hostData)
    {
        // allocate memory on the Device for the data received from the Host
        float *deviceData;
        int size = sizeof(float) * MATRIXSIZE;
        cudaMalloc((void**)&deviceData, size);

        // copy the data from Host memory to Device memory
        cudaMemcpy(deviceData, hostData, size, cudaMemcpyHostToDevice);

        // number of threads per block
        dim3 dimBlock(XTHREADS, YTHREADS);

        // number of blocks per grid
        dim3 dimGrid(XBLOCKS, YBLOCKS);

        // launch the kernel with this grid and block configuration
        CudaProcessingKernel<<<dimGrid, dimBlock>>>(deviceData);

        // after the computation, copy the data back to Host memory
        cudaMemcpy(hostData, deviceData, size, cudaMemcpyDeviceToHost);

        // free the temporary memory on the Device
        cudaFree(deviceData);
    }

    __global__ void CudaProcessingKernel(float *data) // kernel function
    {
        // index of the block within the grid
        int bx = blockIdx.x;

        // index of the thread within the block
        int tx = threadIdx.x;

        // index of the thread within the grid
        int tid = bx * XTHREADS + tx;

        // compute: increase the element by 1
        data[tid] = data[tid] + 1;

        // synchronize the threads of the block
        __syncthreads();
    }

    Building the program: you can build this program for Win32 or Win64, in release, debug, emurelease, or emudebug mode, depending on your machine's configuration and the mode you want. At this point, however, the build reports an error because the compiler cannot compile CudaFunction.cu.

    Download the build rule file cuda_build_rule.zip from http://forums.nvidia.com/index.php?showtopic=30273 (the download works as in steps 8, 9 of section 4.1.1). In Solution Explorer, right-click the CudaStep1 project and choose Custom Build Rules; the Visual C++ Build Rule Files dialog appears. Click Find Existing, select the cuda file (the CUDA build rules file obtained by extracting cuda_build_rule.zip), and click Open (Figure 13).

    Figure 13

    Back in the Visual C++ Build Rule Files dialog, check CUDA to tell the compiler to use this build rule for CUDA files (*.cu).

    Because the program needs to link against libraries, go to Solution Explorer, right-click the CudaStep1 project, and choose Properties; the CudaStep1 Property Pages dialog appears. Go to Configuration Properties -> C/C++ -> General. On the right, in Additional Include Directories, enter the path $(CUDA_INC_PATH);./;../../common/inc;"C:/Program Files/NVIDIA Corporation/NVIDIA CUDA SDK/common/inc" so that the CUDA headers can be found (Figure 14).

    Figure 14

    Go to Configuration Properties -> Linker -> General. On the right, in Additional Library Directories, enter the paths that contain the CUDA library files: $(CUDA_INC_PATH);./;../../common/lib;"C:/Program Files/NVIDIA Corporation/NVIDIA CUDA SDK/common/lib";"C:/CUDA/lib"

    Go to Configuration Properties -> Linker -> Input. On the right, in Additional Dependencies, enter the names of the libraries the program needs. In this case we use two libraries: cudart.lib cutil32.lib (Figure 15).

    Figure 15

    Now build and run the program; the results appear in the console window.

    Code explanation:

    CudaStep1.cpp contains the main() function and the display function display(). main() creates a matrix and initializes every element to 9, then displays this unprocessed matrix on the screen.

    Next, the main program calls the device computation function CudaProcessing() and passes it the matrix to compute on.

    After the computation has finished, main() displays the result on the screen.

    CudaFunction.cu contains two functions.

    CudaProcessing() copies the data from Host memory to Device memory and calls the kernel; when the computation ends, the data is returned to the Host.

    CudaProcessingKernel() does the computation. The index of each element of the matrix corresponds to the index of one thread in the grid, which is determined by the index tid.
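    To make the role of tid concrete, the mapping can be imitated on the CPU. The sketch below is plain C++, not CUDA: two nested loops stand in for the grid launch and visit every (block, thread) pair once. The values XTHREADS = 8 and XBLOCKS = 4 are assumptions (the guide does not show them), and kernelBody and runOnCpu are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed launch configuration (not shown in the guide): 4 blocks of 8 threads.
const int XTHREADS = 8;
const int XBLOCKS = 4;

// What a single GPU thread does, given its block index and thread index.
void kernelBody(float *data, int bx, int tx)
{
    int tid = bx * XTHREADS + tx;   // position of the thread within the grid
    data[tid] = data[tid] + 1;      // each thread updates exactly one element
}

// CPU stand-in for the kernel launch: visits every (block, thread) pair once.
// The input must hold at least XBLOCKS * XTHREADS elements.
std::vector<float> runOnCpu(std::vector<float> data)
{
    assert(data.size() >= static_cast<std::size_t>(XBLOCKS * XTHREADS));
    for (int bx = 0; bx < XBLOCKS; ++bx) {
        for (int tx = 0; tx < XTHREADS; ++tx) {
            kernelBody(&data[0], bx, tx);
        }
    }
    return data;
}
```

    Calling runOnCpu on a vector of 32 elements set to 9 returns 32 elements set to 10, because every tid from 0 to 31 is produced exactly once. On the GPU the same pairs are processed in parallel rather than in loop order.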

    6. How a CUDA program works

    We use CUDA in the hope that the program runs faster thanks to parallel processing. Above all, then, we need to eliminate the factors that slow a program down.

    A CUDA program follows the SIMD model (single instruction, multiple data), so the main influences on its speed are non-uniform access and memory conflicts while reading and storing data. In those cases the accesses are carried out in the safe way, one after another, which turns a parallel SIMD program into a serial one.

    The size of the data type is very important for uniform (coalesced) data access: the element size must be 4, 8, or 16 bytes.
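    A quick way to see which element types meet this size rule is to check sizeof on the host. The sketch below is plain C++ under stated assumptions: hasListedSize, Rgb, and Rgba are hypothetical names used only for illustration, and the sizes in the comments are the typical ones for 4-byte floats.

```cpp
#include <cstddef>

// Returns true when an element size is one of those the text lists: 4, 8, or 16 bytes.
bool hasListedSize(std::size_t bytes)
{
    return bytes == 4 || bytes == 8 || bytes == 16;
}

// Hypothetical element types, for illustration only.
struct Rgb  { float r, g, b; };      // three floats: typically 12 bytes, not in the list
struct Rgba { float r, g, b, a; };   // four floats: typically 16 bytes, in the list
```

    For example, hasListedSize(sizeof(float)) and hasListedSize(sizeof(Rgba)) are true, while hasListedSize(sizeof(Rgb)) is false. Padding a 12-byte element to 16 bytes is one common way to satisfy the size rule, at the cost of some memory.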

    In addition, if the number of computation instructions is large, copy the data from global memory into shared memory to limit frequent accesses to global memory, which slow the program down (accessing global memory takes much longer than accessing shared memory).

    The pattern of a CUDA program usually uses two functions: one for transferring the data, and one, usually called the kernel function, for processing the data.

    // Pattern only: Type, N, M, and TILE are placeholders for the element type,
    // the element counts, and the number of elements staged per block.

    // prototype of the kernel function
    __global__ void CudaProcessingKernel(Type *data, Type *result);

    // function that handles the data transfer
    void DataFunction(Type *hostData)
    {
        // allocate memory on the Device for the data coming from the Host
        Type *deviceData;
        int size = sizeof(Type) * N;          // N: number of elements in deviceData
        cudaMalloc((void**)&deviceData, size);

        // copy the data from Host memory to Device memory
        cudaMemcpy(deviceData, hostData, size, cudaMemcpyHostToDevice);

        // allocate memory for the data produced by the computation
        Type *resultData;
        int resultSize = sizeof(Type) * M;    // M: number of elements in resultData
        cudaMalloc((void**)&resultData, resultSize);

        // number of threads per block
        dim3 dimBlock(XTHREADS, YTHREADS);

        // number of blocks per grid
        dim3 dimGrid(XBLOCKS, YBLOCKS);

        // launch the kernel
        CudaProcessingKernel<<<dimGrid, dimBlock>>>(deviceData, resultData);

        // after the computation, copy the data back to Host memory
        cudaMemcpy(hostData, resultData, resultSize, cudaMemcpyDeviceToHost);

        // free the temporary memory on the Device
        cudaFree(deviceData);
        cudaFree(resultData);
    }

    // function that computes on the data
    __global__ void CudaProcessingKernel(Type *data, Type *result)
    {
        // index of the block within the grid
        int bx = blockIdx.x;
        int by = blockIdx.y;

        // index of the thread within the block
        int tx = threadIdx.x;
        int ty = threadIdx.y;

        // copy the data from global memory to shared memory
        __shared__ Type sharedData[TILE];
        __shared__ Type sharedResult[TILE];

        // synchronize to make sure the data has been copied to shared memory
        __syncthreads();

        // compute on the data, based on the thread indices

        // synchronize the threads to make sure the computation has finished
        __syncthreads();
    }

    To understand how a CUDA program works, we first need to agree on a few concepts.

    Host: the tasks and the hardware and software structures handled by the CPU.

    Device: the tasks and the hardware and software structures handled by the GPU.

    Figure 16

    The way a program works can be described as follows:

    1) The data to be computed always starts in Host memory, so the first step is to transfer it from Host memory to Device memory.

    2) The Device then calls its own functions to compute on the data.

    After the computation has finished, the data must be returned to Host memory.
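    The three steps above can be imitated in plain C++ without a GPU, which may help when reasoning about which buffer holds which data. This is only an analogy under stated assumptions: a second host buffer stands in for Device memory, std::memcpy stands in for cudaMemcpy, and roundTrip is a hypothetical helper name.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Imitates the Host -> Device -> Host round trip with two host buffers.
std::vector<float> roundTrip(const std::vector<float> &hostData)
{
    std::vector<float> result(hostData.size());
    if (hostData.empty()) {
        return result;
    }
    const std::size_t bytes = hostData.size() * sizeof(float);

    // Step 1: Host -> "Device" (a separate buffer stands in for GPU memory)
    std::vector<float> deviceData(hostData.size());
    std::memcpy(&deviceData[0], &hostData[0], bytes);

    // Step 2: the "Device" computes on its own copy of the data
    for (std::size_t i = 0; i < deviceData.size(); ++i) {
        deviceData[i] += 1.0f;
    }

    // Step 3: "Device" -> Host
    std::memcpy(&result[0], &deviceData[0], bytes);
    return result;
}
```

    The caller's data is untouched until the copy back in step 3, which mirrors the fact that the Host only sees the results after the Device-to-Host transfer.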

    7. Evaluating a CUDA program with the Cuda Visual Profiler

    Section 4.3 described how to install the Visual Profiler. We now use it to observe a CUDA project.

    Run the Visual Profiler by double-clicking cudaprof.exe.

    1) Create a new project from the menu File -> New or from the toolbar. In the New Project dialog that appears, enter the name of the project and the path where it should be saved (in this example the name is CudaStep1Test) (Figure 17).

    Figure 17

    2) Click OK; the Session settings dialog appears. Under Launch, choose the executable of the CUDA project that you built successfully (in this example CudaStep1.exe), then click Start to run it (Figure 18).

    Figure 18

    Note: the Visual Profiler displays an error message because CudaStep1.exe does not terminate. To solve this, remove the line cin >> wait; from CudaStep1.cpp and rebuild the program. You can then use the Visual Profiler to observe CudaStep1.exe (Figure 19).

    Figure 19

    If it succeeds, the program displays a table with the figures needed to evaluate a CUDA project. The data collected by the analysis can be saved to an Excel file.