
Advanced Topics in Applied Mathematical Engineering, Lecture 9: The Fast Fourier Transform and Its Parallelization


DESCRIPTION

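Advanced Topics in Applied Mathematical Engineering, Lecture 9: The Fast Fourier Transform and Its Parallelization. June 29, 2006. Department of Computational Science and Engineering, Yusaku Yamamoto. Goals of this lecture (1): learn the principles and basic techniques of the FFT. The FFT has a wide range of applications, such as signal processing and the solution of partial differential equations, and many vendor-supplied libraries and freeware implementations exist. However, the FFT comes in many variants depending on the application, for example FFTs of real data and FFTs for distributed memory, and the type of FFT you want is not necessarily available in a library. The aim is to understand the principles and basic techniques of the FFT and to acquire the ability to modify existing software as needed.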

Citation preview

  • Lecture 9, June 29, 2006

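  • Goals of this lecture (1): learn the principles and basic techniques of the FFT. The FFT has a wide range of applications, such as signal processing and the solution of partial differential equations, and many vendor-supplied libraries and freeware implementations exist. However, the FFT comes in many variants depending on the application (FFTs of real data, FFTs for distributed memory, and so on), and the variant you need is not necessarily available in a library.

    The aim is to understand the principles and basic techniques of the FFT well enough to modify existing software when necessary.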

  • (2)FFT

  • FFT

  • FFTCooley-Tukey FFTStockham FFTFFTFFTFFTFFTFFT

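  • I. Principles and basic techniques of the FFT: discrete Fourier transform, principle of the FFT, Cooley-Tukey FFT, Stockham FFT, FFT and inverse FFT, two-dimensional FFT, two-dimensional formulation of the one-dimensional FFT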

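  • 1.1 Discrete Fourier transform (1): definition of the DFT. The discrete Fourier transform (Discrete Fourier Transform, DFT) maps N data $a_0, a_1, \ldots, a_{N-1}$ to $c_0, c_1, \ldots, c_{N-1}$:

    $$c_k = \sum_{j=0}^{N-1} a_j\,\omega_N^{jk}, \qquad \omega_N = \exp(2\pi i/N), \qquad k = 0, 1, \ldots, N-1.$$

    Computing the DFT directly from this definition costs $O(N^2)$ operations.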

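  • 1.1 Discrete Fourier transform (2): meaning of the DFT. Sample a function $f$ on $[0, 2\pi]$ at N points $x_n$, with $f(x_n) = a_n$, and expand it in the functions $\exp(ikx)$, $k = 0, 1, \ldots, N-1$. Up to a factor of N, the expansion coefficients $c_n$ and the samples $a_n$ are related by a DFT.

    The FFT is a fast algorithm for computing this transform.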

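  • 1.2 Principle of the FFT (1): reducing a DFT to smaller DFTs. Let N be even and split the N-point DFT into its even-indexed and odd-indexed inputs:

    $N' = N/2$, $e_j = a_{2j}$, $o_j = a_{2j+1}$, so that

    $$c_k = \sum_{j=0}^{N'-1} e_j\,\omega_{N'}^{jk} + \omega_N^{k} \sum_{j=0}^{N'-1} o_j\,\omega_{N'}^{jk}, \qquad \omega_N = \exp(2\pi i/N). \qquad (*)$$

    Both sums are periodic in k with period N/2, and $\exp(2\pi i(k+N/2)/N) = -\exp(2\pi i k/N)$, so $c_{k+N/2}$ is obtained from the same two sums with the sign of the second term reversed.

    An N-point DFT thus reduces to two N/2-point DFTs plus multiplications by $\exp(2\pi i k/N)$. Applying (*) recursively gives the fast Fourier transform (FFT; Fast Fourier Transform).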

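  • 1.2 Principle of the FFT (2): operation count of the FFT. Let T(N) be the number of real operations needed to compute an N-point DFT by the FFT. Multiplying by $\exp(2\pi i k/N)$ and combining the two half-size results costs 2N real operations for the N complex additions and 3N for the N/2 complex multiplications, 5N in total.

    $T(1) = 0$

    $T(N) = 2\,T(N/2) + 5N$, and therefore $T(N) = 5N \log_2 N$: the FFT computes an N-point DFT in $5N \log_2 N$ operations.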
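The recursion (*) and the recurrence for T(N) can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the names `fft_recursive` and `operation_count` are hypothetical, and the sign convention $\omega_N = \exp(2\pi i/N)$ is taken from the slides.

```python
import cmath

def fft_recursive(a):
    """Radix-2 FFT of a list whose length is a power of two.

    Computes c_k = sum_j a_j * w**(j*k) with w = exp(2*pi*i/N).
    """
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    even = fft_recursive(a[0::2])   # N/2-point DFT of e_j = a_{2j}
    odd = fft_recursive(a[1::2])    # N/2-point DFT of o_j = a_{2j+1}
    half = n // 2
    c = [0j] * n
    for k in range(half):
        t = cmath.exp(2j * cmath.pi * k / n) * odd[k]
        c[k] = even[k] + t
        # exp(2*pi*i*(k+N/2)/N) = -exp(2*pi*i*k/N)
        c[k + half] = even[k] - t
    return c

def operation_count(n):
    """Evaluate T(1) = 0, T(N) = 2*T(N/2) + 5*N for N a power of two."""
    return 0 if n == 1 else 2 * operation_count(n // 2) + 5 * n
```

For N a power of two, `operation_count(N)` agrees with the closed form $5N \log_2 N$.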

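  • 1.3 Cooley-Tukey FFT (1): recursive structure. By (*), an N-point DFT is computed from two N/2-point DFTs; applying (*) repeatedly down to 1-point DFTs yields $c_0, c_1, \ldots, c_{N-1}$.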

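  • 1.3 Cooley-Tukey FFT (2): the Cooley-Tukey FFT evaluates the N/2-point DFTs in (*) stage by stage, starting from the smallest ones (Cooley & Tukey, 1965).

    Figure: data flow of the Cooley-Tukey FFT for N = 8, stages 0 to 3. Inputs: $a_0, a_4, a_2, a_6, a_1, a_5, a_3, a_7$; outputs: $c_0, c_1, \ldots, c_7$.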

  • 1.3 Cooley-Tukey FFT (3): features of the Cooley-Tukey FFT. The results of stage L+1 can overwrite the data of stage L, so the FFT needs no work array (in-place FFT).

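  • 1.3 Cooley-Tukey FFT (4): input ordering of the Cooley-Tukey FFT. Write the index j of $a_j$ in binary as $j = (j_{p-1} \cdots j_1 j_0)_2$, where $p = \log_2 N$, and let $i = (i_{p-1} \cdots i_1 i_0)_2$ be the position at which $a_j$ is stored. Then

    $i_{p-1} = j_0$, $i_{p-2} = j_1$, ..., $i_0 = j_{p-1}$, that is, the input $\{a_j\}$ must be permuted into bit-reversed order.

    Figure: data flow for N = 8, annotated with the bit fixed at each level: $j_0 = 0 \Rightarrow i_{p-1} = 0$, $j_0 = 1 \Rightarrow i_{p-1} = 1$; $j_1 = 0 \Rightarrow i_{p-2} = 0$, $j_1 = 1 \Rightarrow i_{p-2} = 1$; ...; $j_{p-1} = 0 \Rightarrow i_0 = 0$, $j_{p-1} = 1 \Rightarrow i_0 = 1$.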
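The bit-reversal permutation and the in-place butterfly stages can be sketched as follows. This is an illustrative sketch under the convention $\omega_N = \exp(2\pi i/N)$; the function names `bit_reverse` and `fft_inplace` are hypothetical, and the loop organization is one common choice rather than necessarily the one used in the lecture.

```python
import cmath

def bit_reverse(j, p):
    """Reverse the p-bit binary representation of j."""
    i = 0
    for _ in range(p):
        i = (i << 1) | (j & 1)
        j >>= 1
    return i

def fft_inplace(a):
    """In-place radix-2 Cooley-Tukey FFT; len(a) must be a power of two."""
    n = len(a)
    p = n.bit_length() - 1
    if n < 1 or (1 << p) != n:
        raise ValueError("length must be a power of two")
    # Permute the input into bit-reversed order (swap each pair once).
    for j in range(n):
        i = bit_reverse(j, p)
        if i > j:
            a[i], a[j] = a[j], a[i]
    # p butterfly stages; each stage overwrites the data of the previous one.
    size = 2
    while size <= n:
        half = size // 2
        for start in range(0, n, size):
            for k in range(half):
                t = cmath.exp(2j * cmath.pi * k / size) * a[start + k + half]
                u = a[start + k]
                a[start + k] = u + t
                a[start + k + half] = u - t
        size *= 2
    return a
```

For N = 8 the permutation places the inputs in the order $a_0, a_4, a_2, a_6, a_1, a_5, a_3, a_7$, and no second array is allocated.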

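  • 1.4 Stockham FFT (1): a self-sorting FFT.

    Define $X_L(j, k)$ with $\alpha_L = 2^L$ and $\beta_L = 2^{p-L-1}$. $X_L$ is a $2\beta_L \times \alpha_L$ array, and $X_L(j, *)$ is the $\alpha_L$-point DFT of $a_j, a_{j+2\beta_L}, \ldots, a_{j+2(\alpha_L-1)\beta_L}$.

    For L = 0, $X_0(j, 0) = a_j$; for L = p, $X_p(0, k) = c_k$. Computing $X_0 \to X_1 \to X_2 \to \cdots \to X_p$ therefore turns $X_0(j, 0)$ into $X_p(0, k)$ with no reordering step: a self-sorting FFT.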

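  • 1.4 Stockham FFT (2): applying (*) to the DFTs $X_L(j, k)$ gives the recurrence from $X_L$ to $X_{L+1}$, where $\alpha_L = 2^L$, $\beta_L = 2^{p-L-1}$ and $\omega_N = \exp(2\pi i/N)$:

    $$X_{L+1}(j, k) = X_L(j, k) + X_L(j+\beta_L, k)\,\omega_N^{k\beta_L}$$
    $$X_{L+1}(j, k+\alpha_L) = X_L(j, k) - X_L(j+\beta_L, k)\,\omega_N^{k\beta_L}$$

    for $j = 0, 1, \ldots, \beta_L-1$ and $k = 0, 1, \ldots, \alpha_L-1$.

    The Stockham FFT computes $X_0 \to X_1 \to \cdots \to X_{p-1} \to X_p$ with this recurrence.

    Features of the Stockham FFT: it is self-sorting, but it is not in-place and needs a work array of size N.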

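  • 1.4 Stockham FFT (3): algorithm of the Stockham FFT.

    The iterations of the DO 20 and DO 30 loops are independent of one another, so the two loops can be interchanged.

    DO 10 L = 0, p-1
      alpha_L = 2^L
      beta_L = 2^(p-L-1)
      DO 20 k = 0, alpha_L - 1
        DO 30 j = 0, beta_L - 1
          X_{L+1}(j, k)         = X_L(j, k) + X_L(j+beta_L, k) * omega_N^(k*beta_L)
          X_{L+1}(j, k+alpha_L) = X_L(j, k) - X_L(j+beta_L, k) * omega_N^(k*beta_L)
    30  CONTINUE
    20 CONTINUE
    10 CONTINUE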
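The loop nest above translates almost line by line into Python. In this sketch the array $X_L$ is stored as a list of rows `x[j][k]`, which is a choice made only for readability; the name `stockham_fft` is hypothetical and $\omega_N = \exp(2\pi i/N)$ is assumed.

```python
import cmath

def stockham_fft(a):
    """Self-sorting radix-2 Stockham FFT; len(a) must be a power of two."""
    n = len(a)
    p = n.bit_length() - 1
    if n < 1 or (1 << p) != n:
        raise ValueError("length must be a power of two")
    # X_0(j, 0) = a_j: a (2*beta_0) x alpha_0 = N x 1 array.
    x = [[complex(v)] for v in a]
    for L in range(p):
        alpha = 1 << L
        beta = 1 << (p - L - 1)
        # X_{L+1} is a beta x (2*alpha) array: this is the work array.
        y = [[0j] * (2 * alpha) for _ in range(beta)]
        for k in range(alpha):
            w = cmath.exp(2j * cmath.pi * k * beta / n)   # omega_N^(k*beta_L)
            for j in range(beta):
                t = x[j + beta][k] * w
                y[j][k] = x[j][k] + t
                y[j][k + alpha] = x[j][k] - t
        x = y
    # X_p(0, k) = c_k, already in natural order.
    return x[0]
```

No bit-reversal step appears anywhere; the price is the second array `y` allocated at every stage.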

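  • 1.4 Stockham FFT (4): computation of stage L+1.

    Figure: arrays $X_L$ ($2\beta_L \times \alpha_L$) and $X_{L+1}$ ($\beta_L \times 2\alpha_L$) for an example with N = 8; the elements $X_L(j, k)$ and $X_L(j+\beta_L, k)$ are combined into $X_{L+1}(j, k)$ and $X_{L+1}(j, k+\alpha_L)$.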

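  • 1.5 FFT and inverse FFT (1): computing the inverse FFT, method (I). The inverse DFT is the DFT of 1.1 (1) with $\omega_N = \exp(2\pi i/N)$ replaced by its complex conjugate $\omega_N^*$ and the result multiplied by 1/N.

    It can therefore be computed with a forward FFT routine: (1) take the complex conjugate $c_k^*$ of $c_k$; (2) apply the FFT to $c_k^*$; (3) take the complex conjugate of the result; (4) multiply by 1/N.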
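The four steps of method (I) can be checked with a short sketch. A direct $O(N^2)$ DFT stands in for the forward FFT here so that the block stays self-contained; `dft` and `inverse_via_forward` are hypothetical names, and any forward routine with the same convention $\omega_N = \exp(2\pi i/N)$ could be passed in instead.

```python
import cmath

def dft(a):
    """Forward transform c_k = sum_j a_j * w**(j*k), w = exp(2*pi*i/N)."""
    n = len(a)
    return [sum(a[m] * cmath.exp(2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]

def inverse_via_forward(c, forward=dft):
    """Inverse transform computed with a forward transform only."""
    n = len(c)
    conj_in = [complex(v).conjugate() for v in c]   # (1) conjugate the input
    y = forward(conj_in)                            # (2) forward transform
    return [v.conjugate() / n for v in y]           # (3) conjugate, (4) scale by 1/N
```

Applying `inverse_via_forward` to `dft(a)` returns `a` up to rounding error.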

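  • 1.5 FFT and inverse FFT (2): computing the inverse FFT, method (II). Run the Stockham FFT backwards: solve the recurrence for $X_L$ in terms of $X_{L+1}$ and let L decrease from p-1 to 0.

    Inverse FFT:

    DO 10 L = p-1, 0, -1
      alpha_L = 2^L
      beta_L = 2^(p-L-1)
      DO 20 k = 0, alpha_L - 1
        DO 30 j = 0, beta_L - 1
          X_L(j, k)        = (X_{L+1}(j, k) + X_{L+1}(j, k+alpha_L)) / 2
          X_L(j+beta_L, k) = (X_{L+1}(j, k) - X_{L+1}(j, k+alpha_L)) * omega_N^(-k*beta_L) / 2
    30  CONTINUE
    20 CONTINUE
    10 CONTINUE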
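Method (II) can be sketched by inverting each butterfly of the Stockham recurrence. As before this is only an illustration: `stockham_inverse` is a hypothetical name, the arrays are stored as lists of rows, and $\omega_N = \exp(2\pi i/N)$ is assumed.

```python
import cmath

def stockham_inverse(c):
    """Inverse FFT obtained by running the Stockham stages backwards."""
    n = len(c)
    p = n.bit_length() - 1
    if n < 1 or (1 << p) != n:
        raise ValueError("length must be a power of two")
    # X_p(0, k) = c_k: a 1 x N array.
    x = [[complex(v) for v in c]]
    for L in range(p - 1, -1, -1):
        alpha = 1 << L
        beta = 1 << (p - L - 1)
        y = [[0j] * alpha for _ in range(2 * beta)]       # X_L
        for k in range(alpha):
            w = cmath.exp(-2j * cmath.pi * k * beta / n)  # omega_N^(-k*beta_L)
            for j in range(beta):
                s = x[j][k]
                d = x[j][k + alpha]
                y[j][k] = (s + d) / 2
                y[j + beta][k] = (s - d) * w / 2
        x = y
    # X_0(j, 0) = a_j.
    return [row[0] for row in x]
```

The division by 2 at each of the p stages accumulates to the overall factor 1/N.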

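  • 1.5 FFT and inverse FFT (3): relation between the two. The inverse DFT differs from the DFT only in that $\omega_N$ is replaced by $\omega_N^*$ and the result is scaled by 1/N. Method (II) has the same structure: its stages use $\omega_N^*$ in place of $\omega_N$, so an inverse FFT program can be obtained from an FFT program with the same techniques.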

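  • 1.6 Two-dimensional FFT. The two-dimensional DFT transforms $N_x \times N_y$ data $\{a_{j_x, j_y}\}$ into $\{c_{k_x, k_y}\}$.

    It is computed as $N_y$ sets of $N_x$-point FFTs followed by $N_x$ sets of $N_y$-point FFTs, for a total of $5 N_x N_y \log_2 (N_x N_y)$ operations.

    A two-dimensional FFT thus reduces to one-dimensional FFTs applied first in the x direction and then in the y direction.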
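The row-column procedure can be sketched as below. A direct DFT is used as the one-dimensional kernel purely to keep the example short; in practice it would be replaced by an FFT. The names `dft` and `fft2d` and the storage layout `a[jy][jx]` are assumptions of this sketch.

```python
import cmath

def dft(a):
    """One-dimensional transform with w = exp(2*pi*i/N) (stand-in for an FFT)."""
    n = len(a)
    return [sum(a[m] * cmath.exp(2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]

def fft2d(a):
    """Two-dimensional transform of a[jy][jx], returned as c[ky][kx]."""
    ny, nx = len(a), len(a[0])
    # Ny sets of Nx-point transforms (x direction).
    rows = [dft(row) for row in a]
    # Nx sets of Ny-point transforms (y direction).
    c = [[0j] * nx for _ in range(ny)]
    for kx in range(nx):
        col = dft([rows[jy][kx] for jy in range(ny)])
        for ky in range(ny):
            c[ky][kx] = col[ky]
    return c
```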

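  • 1.7 Two-dimensional formulation of the one-dimensional FFT (1). Let N = lm and write the indices j and k of the DFT as

    $j = rl + s$ ($s = 0, \ldots, l-1$; $r = 0, \ldots, m-1$), $k = pm + q$ ($q = 0, \ldots, m-1$; $p = 0, \ldots, l-1$).

    With $\omega_N = \exp(2\pi i/N)$, the N-point DFT then becomes

    $$c_{pm+q} = \sum_{s=0}^{l-1} \left[ \left( \sum_{r=0}^{m-1} a_{rl+s}\,\omega_m^{rq} \right) \omega_N^{sq} \right] \omega_l^{sp},$$

    that is, l sets of m-point DFTs followed by m sets of l-point DFTs.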

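  • 1.7 Two-dimensional formulation of the one-dimensional FFT (2). With N = lm, the DFT is computed in three steps: (1) l sets of m-point FFTs; (2) multiplication of the element with indices s and q by $\exp(2\pi i q s / N)$; (3) m sets of l-point FFTs.

    Figure: the 16 elements 0 to 15 arranged as a two-dimensional array (0 4 8 12 / 1 5 9 13 / 2 6 10 14 / 3 7 11 15), showing the l sets of m-point FFTs applied to $a_j = a_{rl+s}$ followed by the m sets of l-point FFTs that produce $c_k = c_{pm+q}$.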
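The three steps can be verified numerically with a small sketch. A direct DFT again stands in for the short FFTs, and `dft` and `dft_split` are hypothetical names; the index conventions $j = rl + s$ and $k = pm + q$ follow the slides.

```python
import cmath

def dft(a):
    """Direct transform with w = exp(2*pi*i/N) (stand-in for a short FFT)."""
    n = len(a)
    return [sum(a[t] * cmath.exp(2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]

def dft_split(a, l, m):
    """N-point transform, N = l*m, via l m-point and m l-point transforms."""
    n = l * m
    if len(a) != n:
        raise ValueError("len(a) must equal l*m")
    # (1) l sets of m-point transforms over r, one for each s: b[s][q].
    b = [dft([a[r * l + s] for r in range(m)]) for s in range(l)]
    # (2) twiddle factors exp(2*pi*i*q*s/N).
    for s in range(l):
        for q in range(m):
            b[s][q] *= cmath.exp(2j * cmath.pi * q * s / n)
    # (3) m sets of l-point transforms over s, one for each q.
    c = [0j] * n
    for q in range(m):
        col = dft([b[s][q] for s in range(l)])
        for p in range(l):
            c[p * m + q] = col[p]
    return c
```

Neither l nor m has to be a power of two; for example `dft_split(a, 3, 4)` handles N = 12.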

  • II. Techniques for high performance

  • 2.1 Example machines: NEC SX-7, VPP5000, SR2201, SR8000, Intel Pentium 4

    01215SR8000

  • 2.2 Example processors: Intel Pentium III, IBM Power 4, AMD Athlon, DEC Alpha

    8128KMB

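  • 2.3 Loop length in the Stockham FFT. The length $\beta_L = 2^{p-L-1}$ of the innermost DO 30 loop shrinks as N/2, N/4, ..., 1 as L increases.

    Remedy: when $\alpha_L > \beta_L$, interchange the DO 20 and DO 30 loops. Since $\alpha_L \beta_L = N/2$, the innermost loop length is then always at least $O(N^{1/2})$.

    DO 10 L = 0, p-1
      alpha_L = 2^L
      beta_L = 2^(p-L-1)
      DO 20 k = 0, alpha_L - 1
        DO 30 j = 0, beta_L - 1
          X_{L+1}(j, k)         = X_L(j, k) + X_L(j+beta_L, k) * omega_N^(k*beta_L)
          X_{L+1}(j, k+alpha_L) = X_L(j, k) - X_L(j+beta_L, k) * omega_N^(k*beta_L)
    30  CONTINUE
    20 CONTINUE
    10 CONTINUE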
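The effect of the loop interchange on the innermost loop length can be tabulated with a few lines of Python. The helper name `innermost_loop_lengths` is hypothetical, and the rule "put the longer of the two loops innermost" is the reading of the slide assumed here.

```python
def innermost_loop_lengths(p):
    """Innermost loop length at each stage L for N = 2**p, when the longer
    of the k loop (length alpha_L) and the j loop (length beta_L) is
    placed innermost."""
    lengths = []
    for L in range(p):
        alpha = 1 << L
        beta = 1 << (p - L - 1)
        lengths.append(max(alpha, beta))
    return lengths
```

Because $\alpha_L \beta_L = N/2$ at every stage, the larger of the two is never smaller than $(N/2)^{1/2}$.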

  • 2.4kStockham FFT XL (j, k) XL (j +L, k) L*L = N/2 XL NA NA*L LNAXL (j+L, k)XL (j, k)LXL+1 (j, k+L)XL+1 (j, k)LL

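  • 2.5 Higher-radix FFT (1). In the Stockham FFT, two consecutive stages can be merged so that $X_{L+2}$ is computed directly from $X_L$, which reduces the memory traffic per floating-point operation.

    Figure: $X_L(j, k) \to X_{L+1}(j, k) \to X_{L+2}(j, k)$, with stages L and L+1 merged into one.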

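  • 2.5 Higher-radix FFT (2): the two stages written out, with $\alpha_L = 2^L$, $\beta_L = 2^{p-L-2}$ and $\omega_N = \exp(2\pi i/N)$.

    Stage L:

    $$X_{L+1}(j, k) = X_L(j, k) + X_L(j+2\beta_L, k)\,\omega_N^{2k\beta_L}$$
    $$X_{L+1}(j, k+\alpha_L) = X_L(j, k) - X_L(j+2\beta_L, k)\,\omega_N^{2k\beta_L}$$
    $$X_{L+1}(j+\beta_L, k) = X_L(j+\beta_L, k) + X_L(j+3\beta_L, k)\,\omega_N^{2k\beta_L}$$
    $$X_{L+1}(j+\beta_L, k+\alpha_L) = X_L(j+\beta_L, k) - X_L(j+3\beta_L, k)\,\omega_N^{2k\beta_L}$$

    Stage L+1:

    $$X_{L+2}(j, k) = X_{L+1}(j, k) + X_{L+1}(j+\beta_L, k)\,\omega_N^{k\beta_L}$$
    $$X_{L+2}(j, k+2\alpha_L) = X_{L+1}(j, k) - X_{L+1}(j+\beta_L, k)\,\omega_N^{k\beta_L}$$
    $$X_{L+2}(j, k+\alpha_L) = X_{L+1}(j, k+\alpha_L) + X_{L+1}(j+\beta_L, k+\alpha_L)\,\omega_N^{(k+\alpha_L)\beta_L}$$
    $$X_{L+2}(j, k+3\alpha_L) = X_{L+1}(j, k+\alpha_L) - X_{L+1}(j+\beta_L, k+\alpha_L)\,\omega_N^{(k+\alpha_L)\beta_L}$$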

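  • 2.5 Higher-radix FFT (3): the multiplications of stage L+1 are folded into stage L. With $\alpha_L = 2^L$, $\beta_L = 2^{p-L-2}$ and $\omega_N = \exp(2\pi i/N)$, define $Y_{L+1}(j+\beta_L, k') = X_{L+1}(j+\beta_L, k')\,\omega_N^{k'\beta_L}$ for $k' = k$ and $k' = k+\alpha_L$.

    Stage L:

    $$X_{L+1}(j, k) = X_L(j, k) + X_L(j+2\beta_L, k)\,\omega_N^{2k\beta_L}$$
    $$X_{L+1}(j, k+\alpha_L) = X_L(j, k) - X_L(j+2\beta_L, k)\,\omega_N^{2k\beta_L}$$
    $$Y_{L+1}(j+\beta_L, k) = X_L(j+\beta_L, k)\,\omega_N^{k\beta_L} + X_L(j+3\beta_L, k)\,\omega_N^{3k\beta_L}$$
    $$Y_{L+1}(j+\beta_L, k+\alpha_L) = X_L(j+\beta_L, k)\,\omega_N^{(k+\alpha_L)\beta_L} - X_L(j+3\beta_L, k)\,\omega_N^{(3k+\alpha_L)\beta_L} = i\,X_L(j+\beta_L, k)\,\omega_N^{k\beta_L} - i\,X_L(j+3\beta_L, k)\,\omega_N^{3k\beta_L}$$

    Stage L+1:

    $$X_{L+2}(j, k) = X_{L+1}(j, k) + Y_{L+1}(j+\beta_L, k)$$
    $$X_{L+2}(j, k+2\alpha_L) = X_{L+1}(j, k) - Y_{L+1}(j+\beta_L, k)$$
    $$X_{L+2}(j, k+\alpha_L) = X_{L+1}(j, k+\alpha_L) + Y_{L+1}(j+\beta_L, k+\alpha_L)$$
    $$X_{L+2}(j, k+3\alpha_L) = X_{L+1}(j, k+\alpha_L) - Y_{L+1}(j+\beta_L, k+\alpha_L)$$

    Here $\omega_N^{\alpha_L \beta_L} = \exp(\pi i/2) = i$, and multiplication by $i$ costs no floating-point operations.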
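The merged two-stage butterfly can be turned into a complete transform for lengths that are powers of 4. The sketch below follows the formulas above term by term; the name `stockham_fft_radix4`, the list-of-rows storage and the restriction to $N = 4^q$ are assumptions made for brevity.

```python
import cmath

def stockham_fft_radix4(a):
    """Stockham FFT with stages L and L+1 merged; len(a) must be a power of 4."""
    n = len(a)
    p = n.bit_length() - 1
    if n < 1 or (1 << p) != n or p % 2 != 0:
        raise ValueError("length must be a power of 4")
    x = [[complex(v)] for v in a]                 # X_0(j, 0) = a_j
    for L in range(0, p, 2):
        alpha = 1 << L
        beta = 1 << (p - L - 2)
        y = [[0j] * (4 * alpha) for _ in range(beta)]   # X_{L+2}
        for k in range(alpha):
            w1 = cmath.exp(2j * cmath.pi * k * beta / n)  # omega_N^(k*beta_L)
            w2 = w1 * w1                                   # omega_N^(2k*beta_L)
            w3 = w2 * w1                                   # omega_N^(3k*beta_L)
            for j in range(beta):
                x0 = x[j][k]
                x1 = x[j + beta][k] * w1
                x2 = x[j + 2 * beta][k] * w2
                x3 = x[j + 3 * beta][k] * w3
                t0 = x0 + x2             # X_{L+1}(j, k)
                t1 = x0 - x2             # X_{L+1}(j, k+alpha_L)
                y0 = x1 + x3             # Y_{L+1}(j+beta_L, k)
                y1 = 1j * (x1 - x3)      # Y_{L+1}(j+beta_L, k+alpha_L)
                y[j][k] = t0 + y0
                y[j][k + 2 * alpha] = t0 - y0
                y[j][k + alpha] = t1 + y1
                y[j][k + 3 * alpha] = t1 - y1
        x = y
    return x[0]
```

Each butterfly reads four elements and writes four, using three complex multiplications and eight complex additions; the product with `1j` is written as a multiplication here but amounts to swapping real and imaginary parts.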

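  • 2.5 Higher-radix FFT (4): operation counts and Byte/Flop of higher-radix butterflies.

    Radix 2: 6 additions / 4 multiplications; 4 loads / 4 stores; 6.4 Byte/Flop.

    Radix 4: 22 additions / 12 multiplications; 8 loads / 8 stores; 3.76 Byte/Flop (the same computation with radix 2: 24 / 16).

    Radix 8: 66 additions / 32 multiplications; 16 loads / 16 stores; 2.61 Byte/Flop (the same computation with radix 2: 72 / 48).

    Counts are real operations and 8-byte real words per butterfly. A higher radix lowers both the operation count and the Byte/Flop ratio.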
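The Byte/Flop figures follow directly from the operation and memory-access counts, as the short calculation below shows. The 8-byte word size (double precision) is an assumption that reproduces the quoted values, and both function names are hypothetical.

```python
def bytes_per_flop(adds, mults, loads, stores, word_bytes=8):
    """Memory traffic in bytes per floating-point operation of one butterfly."""
    return word_bytes * (loads + stores) / (adds + mults)

def radix2_equivalent(radix):
    """(additions, multiplications) when the same butterfly is computed
    with radix-2 butterflies of 6 additions and 4 multiplications each."""
    stages = radix.bit_length() - 1            # log2(radix)
    butterflies = stages * (radix // 2)
    return 6 * butterflies, 4 * butterflies
```

For example, `bytes_per_flop(22, 12, 8, 8)` is 128/34, which rounds to 3.76, and `radix2_equivalent(4)` gives the 24 additions and 16 multiplications quoted for comparison.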

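  • 2.6 Cache blocking (1). Each stage of the Stockham FFT reads and writes all N elements of $X_L(j, k)$. If the cache holds M data with M < N, every stage therefore needs O(N) main-memory accesses, and the whole FFT needs $O(N \log_2 N)$.

    Blocked FFT: assume $M = N^{1/2}$ and use the decomposition of 1.7 (2) to compute the N-point FFT as (1) $M = N^{1/2}$ sets of M-point FFTs; (2) multiplication of the element with indices s and q by $\exp(2\pi i q s / N)$; (3) M sets of M-point FFTs.

    In steps (1) and (3) each M-point FFT fits in the cache, so steps (1), (2) and (3) each need only O(N) main-memory accesses, and the whole FFT needs O(N).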

  • 2.6 Cache blocking (2): example with N = 16, M = 4

    N M r r1FFT MFFT

  • III. Parallelization: parallel two-dimensional FFT, parallel one-dimensional FFT

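  • 3.1 Parallel two-dimensional FFT (1). Distribute the data among the processors along the y direction and perform the FFTs in the x direction locally; redistribute the data with an all-to-all exchange (all-to-all broadcast) so that they are distributed along x; then perform the FFTs in the y direction locally.

    With p processors, each processor performs $5 N_x N_y \log_2 (N_x N_y) / p$ operations, sends $N_x N_y / p$ data, and takes part in p-1 communications.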

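  • 3.1 Parallel two-dimensional FFT (2).

    Figure: FFTs in the x direction, redistribution of the data, then FFTs in the y direction, with the data distributed over processors PU0, PU1, PU2 and PU3.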

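  • 3.2 Parallel one-dimensional FFT (1). Write $N = N_x N_y$ and compute the N-point FFT as: (1) $N_y$ sets of $N_x$-point FFTs, with the data distributed along y; (2) multiplication by $\exp(2\pi i\,k_x j_y / N)$; (3) redistribution by all-to-all broadcast; (4) $N_x$ sets of $N_y$-point FFTs, with the data distributed along x.

    Figure: the 16 elements 0 to 15 arranged as a two-dimensional array (0 4 8 12 / 1 5 9 13 / 2 6 10 14 / 3 7 11 15), showing the $N_y$ sets of $N_x$-point FFTs and the $N_x$ sets of $N_y$-point FFTs that map $a_j = a_{j_x N_y + j_y}$ to $c_k = c_{k_y N_x + k_x}$.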

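  • 3.2 Parallel one-dimensional FFT (2). With p processors, each processor performs $5 N \log_2 N / p$ operations, sends $N / p$ data, and takes part in p-1 communications.

    Example: SR8000 (8 GFLOPS, 0.0625 Gword/s), $N = 2^{30}$.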

    FFTFFTCooley-Tukey FFTXy
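The balance between arithmetic and communication in the example above can be estimated with a simple model. Reading 8 GFLOPS as the arithmetic rate of one processor and 0.0625 Gword/s as its transfer rate is an assumption of this sketch, as is ignoring message start-up costs; `fft_times` is a hypothetical name.

```python
import math

def fft_times(n, p, flops_per_s, words_per_s):
    """Per-processor arithmetic and communication time (seconds) for an
    n-point parallel FFT on p processors, using 5*n*log2(n)/p operations
    and n/p transferred words."""
    compute = 5 * n * math.log2(n) / p / flops_per_s
    comm = (n / p) / words_per_s
    return compute, comm

# Example figures: N = 2**30, 8 GFLOPS, 0.0625 Gword/s, 16 processors.
compute, comm = fft_times(2 ** 30, 16, 8e9, 0.0625e9)
ratio = comm / compute   # independent of p in this model
```

In this model both times scale as 1/p, so their ratio depends only on the machine balance and on $\log_2 N$; with the figures above the communication time is of the same order as the arithmetic time.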