Huawei Noah’s Ark Lab · Yunhe Wang
AI on the Edge — Discussion on the Gap Between Industry and Academia

Source: valser.org/webinar/slide/slides/20200603/模型压缩... · 2020.6.3


  • Huawei Noah’s Ark Lab

    Yunhe Wang

    AI on the Edge — Discussion on the Gap Between Industry and Academia

  • ABOUT ME

    Enthusiasm

    Programmer

    PKUer

    Researcher

    Yunhe Wang · www.wangyunhe.site

    [email protected]

    [Han et al., NIPS 2015]

    [Han et al., ICLR 2016 Best Paper Award]

    • It is surprising to see that over 90% of the pre-trained parameters in AlexNet and VGGNet are redundant.
    • Techniques from visual compression, e.g. quantization and Huffman encoding, transfer successfully to network compression.
    • Compressed networks can match the performance of the original baselines after fine-tuning.
    • However, this does not directly yield a considerable speed-up on mainstream hardware. (A minimal pruning-and-quantization sketch follows this slide.)

    Restrictions for using AI on the edge.

    Deep Model Compression
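The recipe described above (prune redundant weights, cluster the survivors into a small shared codebook, Huffman-encode the result) can be sketched roughly as follows. This is a minimal NumPy illustration rather than the authors' implementation; the sparsity level, codebook size, and the simple 1-D k-means are illustrative assumptions.

```python
import numpy as np

def prune_and_quantize(weights, sparsity=0.9, n_clusters=32):
    """Toy sketch of magnitude pruning followed by weight sharing
    (k-means-style quantization), in the spirit of Deep Compression.
    `sparsity` and `n_clusters` are illustrative choices, not the paper's."""
    flat = weights.flatten()

    # 1. Magnitude pruning: zero out the smallest |w| until `sparsity` is reached.
    threshold = np.quantile(np.abs(flat), sparsity)
    mask = np.abs(flat) > threshold
    pruned = flat * mask

    # 2. Weight sharing: cluster the surviving weights into a small codebook.
    survivors = pruned[mask]
    codebook = np.linspace(survivors.min(), survivors.max(), n_clusters)
    for _ in range(10):  # a few iterations of simple 1-D k-means
        idx = np.argmin(np.abs(survivors[:, None] - codebook[None, :]), axis=1)
        for k in range(n_clusters):
            if np.any(idx == k):
                codebook[k] = survivors[idx == k].mean()

    # 3. Only the sparse positions + per-weight cluster ids need to be stored,
    #    and the ids can then be Huffman-encoded.
    quantized = pruned.copy()
    quantized[mask] = codebook[idx]
    return quantized.reshape(weights.shape), mask.reshape(weights.shape), codebook

w = np.random.randn(256, 256).astype(np.float32)
w_q, mask, codebook = prune_and_quantize(w)
print(f"kept {mask.mean():.1%} of weights, {len(codebook)} shared values")
```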

  • CNNpack: Packing Convolutional Neural Networks in the Frequency Domain (NIPS 2016)

    Statistics of the compressed models:

                                 AlexNet    VGGNet-16    ResNet-50
    rc (compression ratio)       39×        46×          12×
    rs (speed-up ratio)          25×        9.4×         4.4×
    Top-1 error                  41.6%      29.7%        25.2%
    Top-5 error                  19.2%      10.4%        7.8%

    [Figure: CNNpack pipeline. The original filters are projected onto DCT bases, sparsified by l1-shrinkage, quantized by K-means clustering (e.g. 0.499, 0.498, 0.501, 0.502, 0.500 → 0.5), and stored with Huffman encoding and CSR storage; the feature maps of the layer are then recovered as a weighted combination of DCT feature maps computed from the input data.]

    [Bar charts: model size and multiplications before → after CNNpack.
    Memory (MB): AlexNet 232 → 5.9, VGGNet-16 572 → 12.4, ResNet-50 95 → 7.9.
    Multiplications: AlexNet 7e8 → 3e7, VGGNet-16 2e10 → 2.1e9, ResNet-50 3.8e9 → 8.5e8.]
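A rough sketch of the frequency-domain idea behind CNNpack, not its actual implementation: project a filter onto DCT bases, keep only the largest coefficients (the l1-shrinkage and clustering steps are collapsed into simple thresholding here), and reconstruct. The value of `keep_ratio` and the use of SciPy's `dctn`/`idctn` are my assumptions.

```python
import numpy as np
from scipy.fft import dctn, idctn

def compress_filter_dct(filt, keep_ratio=0.1):
    """Toy frequency-domain compression of a single d x d conv filter:
    DCT, keep only the largest coefficients, inverse DCT.
    `keep_ratio` is an illustrative choice, not a value from CNNpack."""
    coeffs = dctn(filt, norm="ortho")            # project onto DCT bases
    k = max(1, int(keep_ratio * coeffs.size))    # number of coefficients to keep
    threshold = np.sort(np.abs(coeffs).ravel())[-k]
    # crude stand-in for l1-shrinkage + clustering: zero out small coefficients
    sparse = np.where(np.abs(coeffs) >= threshold, coeffs, 0.0)
    return idctn(sparse, norm="ortho"), sparse   # reconstructed filter + sparse coefficients

filt = np.random.randn(7, 7)
recon, sparse = compress_filter_dct(filt)
print("nonzero DCT coefficients:", np.count_nonzero(sparse), "/", sparse.size)
```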

  • [Figure: input images are fed to both the teacher network and the student network; a discriminator (the assistant) compares teacher features and student features in the feature space.]

    We suggest developing a teaching assistant (discriminator) network to identify the difference between the features generated by the student and the teacher network:

    \mathcal{L}_{GAN} = \frac{1}{n}\sum_{i=1}^{n} H\!\left(o_S^i, y^i\right) + \lambda\,\frac{1}{n}\sum_{i=1}^{n}\Big[-\log\big(D(z_T^i)\big) + \log\big(1 - D(z_S^i)\big)\Big],

    where H is the cross-entropy between the student's outputs o_S^i and the labels y^i, and D is the discriminator applied to the teacher features z_T^i and the student features z_S^i.

    Adversarial Learning of Portable Student Networks (AAAI 2018)
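A minimal PyTorch-style sketch of the loss above (my paraphrase, not the paper's code). The names `student_logits`, `labels`, `feat_t`, `feat_s`, the weight `lam`, and the discriminator `D` (assumed to end in a sigmoid) are placeholders supplied by the surrounding training loop.

```python
import torch
import torch.nn.functional as F

def adversarial_distillation_loss(student_logits, labels, feat_t, feat_s, D, lam=0.1):
    """Sketch of L_GAN above: cross-entropy on the student's predictions plus an
    adversarial term defined on teacher/student features.
    `lam` is an illustrative weight, not a value from the paper."""
    ce = F.cross_entropy(student_logits, labels)          # (1/n) sum_i H(o_S^i, y^i)
    adv = (-torch.log(D(feat_t) + 1e-8)                   # -log D(z_T^i)
           + torch.log(1 - D(feat_s) + 1e-8)).mean()      # +log(1 - D(z_S^i))
    return ce + lam * adv
```

In a standard GAN-style setup the discriminator itself would be trained with the opposite objective, so it keeps learning to tell teacher features from student features while the student learns to fool it.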

  • Visualization results of different networks trained on the MNIST dataset, where features of a specific category in every sub-figure are represented in the same color: (a) features of the original teacher network; (b) features of the student network learned using the standard back-propagation strategy; (c) features of the student network learned using the proposed method with a teaching assistant.

    (a) accuracy = 99.2% (b) accuracy = 97.2% (c) accuracy = 99.1%

    Adversarial Learning of Portable Student Networks (AAAI 2018)

  • An illustration of the evolution of LeNet on the MNIST dataset. Each dot represents an individual in the population, and the thirty best individuals are shown in each evolutionary iteration. The fitness of the individuals gradually improves with an increasing number of iterations, implying that the network becomes more compact while maintaining the same accuracy.

    [Figure: original filters, filters remaining after evolutionary pruning, and retrained filters.]

    Towards Evolutionary Compression (SIGKDD 2018)
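A toy sketch of how such an evolutionary search over pruned networks might look; the binary-mask encoding, fitness definition, and hyper-parameters are illustrative assumptions, not the paper's setup. `evaluate` is a hypothetical callback that fine-tunes and scores a pruned network.

```python
import numpy as np

def evolve_filter_masks(evaluate, n_filters, pop_size=30, iters=50, lam=0.1):
    """Toy evolutionary search over binary filter masks (1 = keep the filter).
    `evaluate(mask)` is assumed to return the validation accuracy of the pruned,
    briefly fine-tuned network; `lam` trades accuracy against compression."""
    pop = (np.random.rand(pop_size, n_filters) > 0.5).astype(np.uint8)
    for _ in range(iters):
        # Fitness rewards accuracy and penalizes the fraction of filters kept.
        fitness = np.array([evaluate(m) - lam * m.mean() for m in pop])
        parents = pop[np.argsort(fitness)[-pop_size // 2:]]   # keep the best half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[np.random.randint(len(parents), size=2)]
            cut = np.random.randint(1, n_filters)              # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            flip = np.random.rand(n_filters) < 0.02             # small mutation rate
            children.append(np.where(flip, 1 - child, child))
        pop = np.vstack([parents, children])
    best = pop[np.argmax([evaluate(m) - lam * m.mean() for m in pop])]
    return best
```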

  • The two generators in CycleGAN are compressed simultaneously:

    Statistics of compressed generators

    P30 Pro Latency: 6.8s -> 2.1s

    Co-Evolutionary Compression for GANs (ICCV 2019)

    [Figure: co-evolutionary compression. Population A (pruned variants of Generator A) and Population B (pruned variants of Generator B) are evolved in parallel over iterations 1 … T, with a best Gen A and Gen B selected at each iteration. Qualitative comparison columns: Input, Baseline, ThiNet, Ours.]
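A speculative sketch of the co-evolution idea, under the assumption that each population is scored jointly with the current best individual of the other population (e.g. via a cycle-consistency score provided by a hypothetical `eval_pair`); this is not the paper's algorithm.

```python
import numpy as np

def coevolve(eval_pair, n_a, n_b, pop_size=16, iters=20):
    """Toy co-evolution of two pruning masks (one per CycleGAN generator).
    `eval_pair(mask_a, mask_b)` is assumed to return a joint quality score
    of the two pruned generators. All hyper-parameters are illustrative."""
    pop_a = (np.random.rand(pop_size, n_a) > 0.5).astype(np.uint8)
    pop_b = (np.random.rand(pop_size, n_b) > 0.5).astype(np.uint8)
    best_a, best_b = pop_a[0], pop_b[0]
    for _ in range(iters):
        # Each population is scored against the current best of the other one.
        fit_a = [eval_pair(m, best_b) - 0.1 * m.mean() for m in pop_a]
        fit_b = [eval_pair(best_a, m) - 0.1 * m.mean() for m in pop_b]
        best_a = pop_a[int(np.argmax(fit_a))]
        best_b = pop_b[int(np.argmax(fit_b))]
        # Refill each population by mutating its current best individual.
        mutate = lambda m: np.where(np.random.rand(m.size) < 0.05, 1 - m, m)
        pop_a = np.stack([best_a] + [mutate(best_a) for _ in range(pop_size - 1)])
        pop_b = np.stack([best_b] + [mutate(best_b) for _ in range(pop_size - 1)])
    return best_a, best_b
```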

  • [Figure: generative network distillation. Random signals are fed to a generator to produce generated images, which are used to distill the teacher network into the student network.]

    A generator is introduced to approximate the training data (see the sketch below).

    DAFL: Data-Free Learning of Student Networks (ICCV 2019)

    How can we provide a perfect model optimization service on the cloud?

    Privacy-Related AI Applications

    Entertainment APP
    Face ID
    Voice assistant
    Fingerprint

    Original and Generated Face Images

    Accuracy of the data-free student: 98.20% on MNIST, 92.22% on CIFAR-10, 74.47% on CIFAR-100
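A simplified sketch of one data-free distillation step in the spirit of the slide: noise goes into a generator, the frozen teacher labels the generated images, and the student is trained to match it. DAFL additionally trains the generator with its own losses, which are omitted here; `optimizer` is assumed to hold the student's parameters.

```python
import torch
import torch.nn.functional as F

def data_free_distillation_step(generator, teacher, student, optimizer,
                                batch=64, z_dim=100):
    """One toy student update without any real training data.
    The generator is assumed to map (batch, z_dim) noise to images."""
    z = torch.randn(batch, z_dim)              # random signals
    fake_images = generator(z)                 # generated images
    with torch.no_grad():
        t_logits = teacher(fake_images)        # frozen teacher acts as a labeler
    s_logits = student(fake_images)
    # KL divergence between teacher and student predictions on synthetic data.
    loss = F.kl_div(F.log_softmax(s_logits, dim=1),
                    F.softmax(t_logits, dim=1),
                    reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```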

  • AdderNet: Do We Really Need Multiplications in Deep Learning? (CVPR 2020)

    Using additions instead of multiplications in deep learning can significantly reduce the energy consumption and area cost of chips.
    https://media.nips.cc/Conferences/2015/tutorialslides/Dally-NIPS-Tutorial-2015.pdf
    http://eecs.oregonstate.edu/research/vlsi/teaching/ECE471_WIN15/mark_horowitz_ISSCC_2014.pdf
    http://eyeriss.mit.edu/2019_neurips_tutorial.pdf

    [Figure: feature visualization on MNIST — Adder Network vs. Convolutional Network.]

    Feature calculation in adder neural network:

    Feature calculation in convolutional neural network:
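The two formulas these lines refer to appear as images in the original slides; they are reconstructed below from the AdderNet paper as I recall it, so treat the exact notation as an assumption. The adder network replaces cross-correlation with a negative ℓ1-distance between the input patch and the filter.

```latex
% Adder network: negative l1-distance between the input patch and filter F
Y(m,n,t) = -\sum_{i=0}^{d}\sum_{j=0}^{d}\sum_{k=0}^{c_{in}}
           \bigl|\, X(m+i,\, n+j,\, k) - F(i,\, j,\, k,\, t) \,\bigr|

% Convolutional network: cross-correlation between the input patch and filter F
Y(m,n,t) = \sum_{i=0}^{d}\sum_{j=0}^{d}\sum_{k=0}^{c_{in}}
           X(m+i,\, n+j,\, k)\cdot F(i,\, j,\, k,\, t)
```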

    Validations on ImageNet

  • Huawei HDC 2020: Real-time Video Style Transfer

    Inference time: about 630 ms (original model) vs. 60 ms (optimized model)

    Huawei Atlas 200 AI Accelerator Module

    The key approaches used for completing this task:

    1. Model Distillation: remove the optical flow module in the original network

    2. Filter Pruning: reduce the computational complexity of the video generator

    3. Operator Optimization: automatically select suitable operators on the Atlas 200

    https://developer.huaweicloud.com/exhibition/Atlas_neural_style.html

  • Discussions – Edge Computing

    Four reasons to move deep learning workloads from the cloud down onto the device:

    1. Privacy & security: if your data can't leave the premises where it’s captured

    2. Latency: if you need a real-time response, as in a robotics workload or a self-driving car

    3. Reliability: the network link up to the cloud might not always be reliable

    4. Cost: if the channel used to send the data up to the cloud is costly

    [Figure: running a Deep Neural Network on the server/cloud (✓ fast, ✓ large memory, ✓ ample energy resources) vs. on a mobile device (small memory, slow, limited energy resources).]

    Github Link

    Zhihu (知乎)

  • Thank You!

    Contact me: [email protected], [email protected]
    www.wangyunhe.site