
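  • Key Technologies for Drone Vision: Neuromorphic Intelligent Vision Chips

    Prof. Kea-Tiong Tang (鄭桂忠), Department of Electrical Engineering, National Tsing Hua University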

  • EN I AC Neuromorphic and Biomedical Engineering Lab PI: Prof. Kea-Tiong (Samuel) Tang, [email protected]

    Outline

    • Smart Agriculture: opportunities and challenges

    • Neuromorphic and AI algorithms

    • Neuromorphic sensor (Processing-In-Sensor)

    • Neuromorphic architecture (Computing-In-Memory)

    • Summary

  • Crop Farming Challenges

    Climate

    Soil quality

    Insect pests

    Microorganisms

  • Traditional Crop Farming Solution

    Spraying pesticide

    • Pesticide pollution
    • Labor-intensive: requires many farm workers
    • Food crisis

  • Smart Agriculture for Crop Farming

    AIoT Tractor

    • Multi-camera image capture
    • Object detection on a workstation computer
    • Precise spraying of pesticides and fertilizers

    Drone AI System

    • Evaluate the farmland situation
    • Monitor the crops
    • Organize the farming

  • Neuromorphic and AI algorithms

  • Drone Obstacle Avoidance System

  • DJI Drone Obstacle Avoidance System

    Current technology: obstacle avoidance based on radar

    • Sensor: radar
    • Power consumption: 12 W

    Proposed technology: obstacle avoidance based on a neuromorphic algorithm

    • Sensor: camera
    • Power consumption: 20 mW to 200 mW (FPV camera)
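
As a quick sanity check on the sensor power figures quoted on this slide, the short sketch below computes how many times more power the radar draws than the FPV camera. It covers sensor power only; the energy spent on processing the camera frames is not on the slide and is not modeled here. The constant and function names are illustrative.

```python
# Sensor power figures as quoted on the slide.
RADAR_POWER_W = 12.0
CAMERA_POWER_W_MIN = 0.020   # 20 mW
CAMERA_POWER_W_MAX = 0.200   # 200 mW


def power_ratio(reference_w: float, candidate_w: float) -> float:
    """Return how many times more power the reference sensor draws."""
    return reference_w / candidate_w


ratio_low = power_ratio(RADAR_POWER_W, CAMERA_POWER_W_MAX)    # camera at 200 mW
ratio_high = power_ratio(RADAR_POWER_W, CAMERA_POWER_W_MIN)   # camera at 20 mW
print(f"Radar draws roughly {ratio_low:.0f}x to {ratio_high:.0f}x the camera's power")
```

On these numbers the radar draws roughly 60 to 600 times the power of the camera sensor.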

  • Linear motion and parabolic motion

    Blue box: local motion detection result. Green box: object motion prediction result.

  • Prediction Demonstration (Block Matching and Centroid Velocity)

    [C.-C. Lo, NTHU, unpublished]
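
The slide names two ingredients, block matching and centroid velocity, without giving details, and the underlying work is unpublished. The sketch below is therefore only a generic textbook illustration, not the NTHU method: an exhaustive sum-of-absolute-differences (SAD) block search, plus a constant-velocity extrapolation of an object's centroid. All function names, block sizes, and the synthetic frames are assumptions.

```python
import numpy as np


def block_match(prev, curr, top, left, block=8, search=4):
    """Return the (dy, dx) shift of a block from prev that minimizes SAD in curr."""
    ref = prev[top:top + block, left:left + block].astype(np.int32)
    best_shift, best_sad = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            # Skip candidate blocks that fall outside the frame.
            if y < 0 or x < 0 or y + block > curr.shape[0] or x + block > curr.shape[1]:
                continue
            cand = curr[y:y + block, x:x + block].astype(np.int32)
            sad = int(np.abs(ref - cand).sum())
            if best_sad is None or sad < best_sad:
                best_sad, best_shift = sad, (dy, dx)
    return best_shift


def centroid(mask):
    """Centroid (row, col) of the nonzero pixels of a mask."""
    ys, xs = np.nonzero(mask)
    return np.array([ys.mean(), xs.mean()])


def predict_centroid(c_prev, c_curr, steps=1):
    """Extrapolate the centroid assuming constant velocity between frames."""
    velocity = c_curr - c_prev
    return c_curr + steps * velocity


# Synthetic example: a 6x6 bright square that moves by (2, 3) pixels.
prev_frame = np.zeros((32, 32), dtype=np.uint8)
curr_frame = np.zeros((32, 32), dtype=np.uint8)
prev_frame[10:16, 8:14] = 255
curr_frame[12:18, 11:17] = 255

shift = block_match(prev_frame, curr_frame, top=10, left=8, block=6, search=4)
predicted = predict_centroid(centroid(prev_frame), centroid(curr_frame))
print("estimated shift:", shift, "predicted next centroid:", predicted)
```

A constant-velocity model fits linear motion; for parabolic motion an acceleration term would also be needed, which this sketch omits.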

  • Drone Farmland Detection and Segmentation

    • Farmland detection and segmentation
    • Obstacle labeling
    • Flight path planning and mission scheduling

    Current technology: algorithm running on a laptop

    Proposed technology: algorithm running on the drone

  • Instance Detection and Segmentation

    Faster R-CNN

    FCN

    He, Kaiming, et al. "Mask R-CNN." Proceedings of the IEEE International Conference on Computer Vision. 2017.

  • The More Accuracy You Want, the More Operations and Parameters You Need

    Canziani, Alfredo, Adam Paszke, and Eugenio Culurciello. "An analysis of deep neural network models for practical applications." arXiv preprint arXiv:1605.07678 (2016).

  • Reuther, Albert, et al. "Survey and Benchmarking of Machine Learning Accelerators." arXiv preprint arXiv:1908.11348 (2019).

  • Future Drone System

    • Farmland detection and segmentation
    • Obstacle detection and avoidance system
    • Flight path planning and mission scheduling
    • Lightweight battery and low takeoff weight

    More powerful neuromorphic models are needed! Low-power, cost-aware AI chips are needed!

    Processing data in the sensor and in memory could be the solution!

  • Neuromorphic sensor (Processing-In-Sensor)

  • Schematic of the Mammalian Retina

    (1) Rods

    (2) Cones

    (3) horizontal cells

    (4) bipolar cells

    (5) amacrine cells

    (6) retinal ganglion cells

    Ref: H. Wässle, "Parallel processing in the mammalian retina," Nature Reviews Neuroscience, Vol. 5, pp. 1-11, October 2004.

  • The Visual Language

    Multiple representations of the visual world

    Werblin, UC Berkeley

  • Why We Need Application-Driven CIS

    Example tasks: motion detection, facial detection, dynamic images, feature descriptors, feature extraction

    An application-driven CMOS image sensor (CIS) can process specific tasks in real time, which gives low power and low latency.

  • Processing-In-Sensor

    Digital still camera + general-purpose ISP

    • HDR
    • Noise reduction
    • Color correction

    Application-driven CIS + AI CNN processor

    • Well-defined application
    • Simplified software module
    • Reduced hardware complexity

    Adapts to low-power edge devices

    Figure labels: raw image; AI and deep learning

  • Processing-In-Sensor (cont.)

    • Digital still camera with ISP
      • Fixed/full-resolution digitization
      • High-resolution data transfer
      • High-capacity frame buffer
      • High-complexity ISP

    • Application-driven CIS with AI CNN processor
      • High-speed feature extraction
      • Down-resolution digitization
      • Low bandwidth, latency, and power

    An application-driven CIS needs processing-in-sensor (PIS): higher energy efficiency for edge devices!

  • NTHU PIS Design

    • Convolutional CIS (C2IS)

    • Real-time feature extraction

  • NTHU PIS Architecture

    • Real-time in-column convolution
    • Programmable 3x3 kernel with 4-bit weights
    • Tunable-resolution ADC
    • Normal linear-response image

    • Goal and advantage
      • Improve system power efficiency
      • Reduce data transfer and latency
      • First-stage feature extraction

    Application: real-time feature extraction as the front end of a CNN processor

    [C.-C. Hsieh, NTHU, ASSCC 2019]
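
To make the "3x3 kernel with 4-bit weights" and "tunable-resolution ADC" items concrete, here is a purely numerical sketch of what such a sensor front end computes. It does not model the in-column analog circuit. It assumes signed 4-bit weights in [-8, 7] (the slide does not say whether the weights are signed), a Sobel-style kernel chosen only for illustration, and a uniform quantizer standing in for the ADC. All names are hypothetical.

```python
import numpy as np


def conv3x3_valid(image, kernel):
    """3x3 sliding-window weighted sum, no padding, stride 1."""
    h, w = image.shape
    out = np.zeros((h - 2, w - 2), dtype=np.int64)
    for dy in range(3):
        for dx in range(3):
            window = image[dy:dy + h - 2, dx:dx + w - 2].astype(np.int64)
            out += int(kernel[dy, dx]) * window
    return out


def quantize_adc(values, bits, full_scale):
    """Uniformly quantize values in [-full_scale, full_scale] to 2**bits codes."""
    levels = 2 ** bits - 1
    clipped = np.clip(values, -full_scale, full_scale)
    return np.round((clipped + full_scale) / (2 * full_scale) * levels).astype(np.int64)


# Illustrative vertical-edge kernel; every weight fits an assumed signed 4-bit range.
kernel = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]])
assert kernel.min() >= -8 and kernel.max() <= 7

# Synthetic 6x6 image with a vertical edge between columns 2 and 3.
image = np.zeros((6, 6), dtype=np.uint8)
image[:, 3:] = 100

features = conv3x3_valid(image, kernel)
full_scale = 4 * 255  # largest response of this kernel on 8-bit pixels
codes_4bit = quantize_adc(features, bits=4, full_scale=full_scale)
codes_8bit = quantize_adc(features, bits=8, full_scale=full_scale)
print(features[0], codes_4bit[0], codes_8bit[0])
```

Lowering `bits` shrinks the amount of data leaving the sensor at the cost of coarser feature values, which is the trade-off a tunable-resolution ADC exposes.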

  • Chip Specification

    • Process: TSMC 0.18 µm
    • Operation voltage: 0.5 V
    • Pixel: 7.6 µm
    • Chip area: 1.9 × 2.3 mm²

    [C.-C. Hsieh, NTHU, ASSCC 2019]

  • Live Demonstration

  • Neuromorphic architecture (Computing-In-Memory)

  • Weight and Processing in Neural Systems

    1. Weights are accessed in every processing step!

    2. Weights are stored very close to the processing unit!

  • Memory Access Energy Is Growing

    • Model size has increased over the years for higher accuracy, so more memory accesses are needed
    • The energy efficiency of memory has saturated recently, so the energy for memory access is becoming difficult to reduce

    Source: X. Xu, Nature Electronics, 2018

  • Memory Wall in von Neumann Architecture

    • Data moves across memory layers and the system bus: nonvolatile memory (NVM/SSD) → DRAM → SRAM (on-chip) → PE
    • NVM usually requires long read/write latency
    • Result: long latency, high power consumption, high hardware cost!
    • High-bandwidth memory is required
    • A beyond-von-Neumann (new) architecture is required

    Figure label: von Neumann "bottleneck"

    Source: M.-F. Chang, ISSCC 2018 Tutorial & 31.4

  • Inference Architectures between PE and Memory

    • Digital (SRAM bank with a separate ALU / digital processing)
      • Data access energy and latency dominate

    • Near-memory (SRAM bank with digital processing)
      • Computation is still digital
      • Eliminates data transfer costs
      • Memory read energy dominates

    • Deep in-memory (SRAM bank with mixed-signal processing)
      • Memory access and computation are combined
      • Mixed-signal computation
      • Significant energy and latency reduction

    Figure axes: energy efficiency, transferred data

  • Concept of Nonvolatile Computing-In-Memory

    Input/WL (IN)   Weight/MC (W)   Product (IN×W)   I_MC
    0               0 (HRS)         0                0
    0               +1 (LRS)        0                0
    1               +1 (LRS)        +1               I_LRS
    1               0 (HRS)         0                I_HRS

    Concept:

    • Input: WLs (IN)
    • Weight: cell data (W)
    • Multiply: WLs × cell data
    • Accumulation: BL current
    • Analog-to-digital out

    Accumulation at BL:

    $$ I_{BL}[j] = \sum_{i=0}^{N} I_{MC}[i, j] $$

    WL ON:

    $$ I_{MC} = I_{LRS} \ (W_{ij} = 1), \qquad I_{MC} = I_{HRS} \ (W_{ij} = 0) $$

    WL OFF:

    $$ I_{MC} \approx 0 $$

    Block diagram labels: WL driver, Time, reference generator, analog-to-digital out, write control, cell arrays, WL[0] ... WL[i] ... WL[n], BL[0] ... BL[v-1], BL[v], SL[0] ... SL[v-1], SL[v]

    [W.-H. Chen, NTHU, ISSCC 2018]
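
The truth table and the bitline accumulation on this slide can be checked numerically. The sketch below is a behavioral model only: a cell contributes I_LRS or I_HRS when its wordline is on and nothing when it is off, and the bitline sums its column. The current values `I_LRS` and `I_HRS` are made-up illustrative numbers, not figures from the ISSCC 2018 chip, and the function names are hypothetical.

```python
import numpy as np

I_LRS = 10e-6    # assumed low-resistance-state cell current in amperes (illustrative)
I_HRS = 0.5e-6   # assumed high-resistance-state cell current in amperes (illustrative)


def cell_currents(inputs, weights):
    """I_MC[i, j]: zero if WL i is off, else I_LRS (W=1) or I_HRS (W=0)."""
    wl_on = inputs[:, None].astype(bool)
    on_current = np.where(weights.astype(bool), I_LRS, I_HRS)
    return np.where(wl_on, on_current, 0.0)


def bitline_currents(inputs, weights):
    """I_BL[j] = sum over i of I_MC[i, j]."""
    return cell_currents(inputs, weights).sum(axis=0)


inputs = np.array([1, 0, 1, 1])      # wordline inputs IN[i]
weights = np.array([[1, 0],          # binary cell data W[i, j]
                    [1, 1],
                    [0, 1],
                    [1, 0]])

ideal_mac = inputs @ weights         # digital reference: sum of IN[i] * W[i, j]
i_bl = bitline_currents(inputs, weights)

# Each active HRS cell still leaks I_HRS, so remove that offset before decoding.
active_rows = int(inputs.sum())
decoded = (i_bl - active_rows * I_HRS) / (I_LRS - I_HRS)
print(ideal_mac, i_bl, decoded)
```

The HRS term in the model shows why the bitline current depends on how many wordlines are active, not only on the MAC value, so the sensing reference has to account for the input pattern.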

  • Nonvolatile Computing-In-Memory (nvCIM)

    • Stores the weights at power-off, which suppresses data movement across memory layers
    • Parallel in-memory multiply-and-accumulate (MAC), which reduces the amount of intermediate data
    • Potentially low energy, low cost, and high performance!

    $$ I_{BL}[j] = \sum_{i=0}^{n} IN_i \times W_{i,j} $$

    Figure label: nvCIM macro

  • Recent CIM Silicon Development Status

    Timeline: 2016 to 2019

    ReRAM-CIM

    • 32×32 RRAM FCN, binary/ternary, 3b out @ VLSI17 (THU & NTHU)
    • 16Mb RRAM logic @ IEDM17 (NTHU)
    • 1Mb RRAM CNN, binary/ternary, 3b out @ ISSCC18 (NTHU)
    • 224b SLC RRAM @ ISSCC18 (Stanford)
    • 2Mb RRAM FCN, MLC cell + binary out @ VLSI18 (Panasonic)
    • 1Mb RRAM 2b IN / 3b W CIM @ ISSCC19

    SRAM-CIM

    • 6T-SRAM classifier @ VLSI16 (Princeton)
    • 4+2T SRAM @ VLSI17 (Princeton)
    • BRein Memory @ VLSI17 (Hokkaido)
    • 10T CSRAM BWN @ ISSCC18 (MIT)
    • Classifier SVM @ ISSCC18 (UIUC)
    • DSC6T SRAM BNN @ ISSCC18 (NTHU)
    • XNOR-SRAM @ VLSI18 (Columbia)
    • 4b T8T SRAM CNN @ ISSCC19 (NTHU)
    • AI-Accelerator + T-SRAM @ ISSCC19 (THU + NTHU)
    • Sandwich RAM BWN @ ISSCC19 (Southeast Univ.)
    • Time-based SRAM + accelerator @ ISSCC19 (Minnesota)
    • Compute SRAM + accelerator @ ISSCC19 (Michigan)

    PCM-CIM

    • 3Mb PCM CIM @ Nat. Commun. 17 (IBM)
    • 3Mb PCM CIM @ Nat. Electronics 18 (IBM)
    • 10×3 crossbar FCN, PCM 8b W CIM @ IEDM18 (IBM)

  • CIM for Neural Networks - 1

    • Input-aware reference current generation (increases read margin)
    • Small-offset multi-level current sense amplifier (DR-CSA)
    • Customized ternary-bits model compression algorithm [BioCAS 2018]

    [W.-H. Chen, K.-T. Tang & M.F. Chang, et al., NTHU, ISSCC 2018 #31.4]
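
The slide mentions a customized ternary model compression algorithm but gives no details, so the sketch below does not reproduce it. It shows a generic, commonly used threshold-based ternarization instead, mapping each weight to -1, 0, or +1 with a per-tensor scale. The threshold factor of 0.7 times the mean absolute weight is a heuristic assumed here for illustration, and all names are hypothetical.

```python
import numpy as np


def ternarize(weights, threshold_scale=0.7):
    """Map float weights to {-1, 0, +1} plus one scale factor (generic heuristic)."""
    delta = threshold_scale * np.mean(np.abs(weights))
    ternary = np.zeros_like(weights, dtype=np.int8)
    ternary[weights > delta] = 1
    ternary[weights < -delta] = -1
    kept = ternary != 0
    # Scale = mean magnitude of the weights that survived the threshold.
    scale = float(np.abs(weights[kept]).mean()) if kept.any() else 0.0
    return ternary, scale


weights = np.array([0.9, -0.05, -0.8, 0.1, 0.4, -0.6])
ternary, scale = ternarize(weights)
approx = scale * ternary   # what the ternary model uses in place of the originals
print(ternary, scale, approx)
```

Ternary weights suit a binary-cell CIM array because each weight needs at most a +1 cell or a -1 cell, and zeros contribute no intended current.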

  • CIM for Neural Networks - 2

    • Multi-bit computation ReRAM macro
    • Serial-Input Non-Weighted Product (SINWP) structure
    • Down-Scaling Weighted Current Translator (DSWCT)
    • Triple-Margin Current-mode Sense Amplifier (TMCSA)
    • Multi-bit compact model

    Figure labels: positive-weight group and negative-weight group, each with SINWP and DSWCT; comparator + PN-ISUB; TMCSA with IREF; outputs DOUT[2:0] and DOUTSIGN; 2-bit input IN[0], IN[1] with CLK; WL[0] ... WL[n]; BL[0] ... BL[n]; SL[0] ... SL[n]; YMUX; MCM and MCL cells with bitline currents IBL_MSB[0] ... IBL_MSB[n] and IBL_LSB[0] ... IBL_LSB[n]

    [K.T. Tang & M.F. Chang, NTHU, ISSCC 2019]
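
The arithmetic behind a multi-bit macro with serial inputs and separate positive-weight and negative-weight groups can be illustrated in a few lines. The sketch below is an integer-only model under stated assumptions: inputs are 2-bit unsigned values fed one bit per cycle, weights are small signed integers split into two non-negative arrays, and the place-value weighting and subtraction happen digitally. In the actual macro these steps are done with currents (the DSWCT and PN-ISUB labels on the slide), which is not modeled here. Function names are hypothetical.

```python
import numpy as np


def split_signed(weights):
    """Split signed weights into non-negative positive and negative groups."""
    positive = np.where(weights > 0, weights, 0)
    negative = np.where(weights < 0, -weights, 0)
    return positive, negative


def bit_serial_mac(inputs, weights, input_bits=2):
    """Compute inputs @ weights one input bit plane per cycle."""
    positive, negative = split_signed(weights)
    acc_pos = np.zeros(weights.shape[1], dtype=np.int64)
    acc_neg = np.zeros(weights.shape[1], dtype=np.int64)
    for k in range(input_bits):
        bit_plane = (inputs >> k) & 1            # 0/1 wordline pattern for bit k
        acc_pos += (bit_plane @ positive) << k   # weight partial sum by 2**k
        acc_neg += (bit_plane @ negative) << k
    return acc_pos - acc_neg                     # positive group minus negative group


inputs = np.array([3, 1, 0, 2])                  # 2-bit unsigned inputs
weights = np.array([[2, -1],
                    [-3, 3],
                    [1, 1],
                    [0, -2]])                    # small signed weights (assumed range)

result = bit_serial_mac(inputs, weights)
print(result, inputs @ weights)
```

Because every wordline only ever sees a 0 or 1 in a given cycle, the array itself performs the same binary multiply as in the single-bit case, and the multi-bit precision comes from how the partial sums are combined.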

  • Algorithm Deployment

    • ResNet-18 for the CIFAR-10 dataset
    • Optimization according to the ADC output bit limitation

    [K.T. Tang & M.F. Chang, NTHU, ISSCC 2019]

  • MNIST Demonstration

    Demo

    [K.T. Tang & M.F. Chang, NTHU, ISSCC 2018]

  • CIFAR-10 Demonstration (ResNet-18)

  • Future Trend: Integrating CIM in AI-ASIC

    • Ongoing/near-term (von Neumann architecture)
      • Digital neural-network-based designs
      • GPU + high-bandwidth memory
      • ASIC + novel accelerators
      • CPU + storage (memory) suffers a latency/energy bottleneck due to data movement between the ALU and memory over the bus (the von Neumann "bottleneck")

    • Mid-term (next generation)
      • Near-memory computing (NMC)
      • In-memory computing (IMC/CIM): memory provides storage + computing
      • 10~1000x energy reduction!

  • New Scenario of Edge AI Application Device

    • High system robustness and flexibility
    • High-resolution digitization
    • Low power consumption and high energy efficiency
    • High-speed image processing and computing
    • Breaking the memory wall and speeding up MVMs

    Figure labels: application-driven CIS (axes: energy efficiency, resolution); Processing-In-Sensor (PIS); CIM-based DLA; SoC blocks: bus fabric, CPU, SRAM, GPIO, UART, SPI, I2C, DMA, CIS In IF, OSD Ctrl, DDR4 Ctrl, DDR4 PHY, Display Ctrl, Output Writer Ctrl

    [K.-T. Tang, NTHU, VLSI-Symposium 2019]

  • Summary

    • Processing-In-Sensor
      • Processing data in the sensor can achieve low power and low latency.
      • However, the achievable computational complexity is limited, so it suits only well-defined, application-driven architectures.

    • Computing-In-Memory
      • Integrating in-memory computing can break the von Neumann bottleneck and achieve high energy efficiency.
      • A multi-bit CIM macro achieves higher accuracy but raises more hardware issues and needs smarter circuit design.

    • Next-generation AI-ASIC
      • In addition to hardware improvement, device and fabrication technology (e.g., 3D stacking) and software development are also key to success.

  • Vision of the Future

    • The drone can work autonomously
    • A low-power, cost-aware AI chip can be deployed on the drone
    • All the algorithms the drone needs can run on the drone, online and in real time

  • Acknowledgement

    • ITRI project
    • NTHU-ITRI project
    • Moonshot project, MOST
    • Competitive team project, NTHU