AI on the Edge: Discussion on the Gap Between Industry and Academia
Yunhe Wang
Huawei Noah’s Ark Lab
ABOUT ME
Enthusiasm
Programmer
PKUer
Researcher
Yunhe Wang, www.wangyunhe.site
[Han et al., NIPS 2015]
[Han et al., ICLR 2016, best paper award]
• It is surprising to see that over 90% of the pre-trained parameters in AlexNet and VGGNet are redundant.
• Techniques from visual data compression, e.g. quantization and Huffman encoding, transfer successfully (see the sketch after this list).
• After fine-tuning, compressed networks achieve the same performance as the original baselines.
• These methods cannot directly obtain a considerable speed-up on mainstream hardware.
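To make the quantization step concrete, here is a minimal sketch of k-means weight sharing in the spirit of Deep Compression; the function name, the 16-cluster (4-bit) codebook, and the random layer are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of weight quantization via k-means weight sharing.
# All names and settings here are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans

def quantize_weights(weights: np.ndarray, n_clusters: int = 16):
    """Cluster weights into n_clusters shared values (a 4-bit codebook here).

    Storing small centroid indices instead of float32 values is what makes
    the subsequent Huffman coding effective: frequent indices get short codes.
    """
    flat = weights.reshape(-1, 1)
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(flat)
    codebook = km.cluster_centers_.flatten()      # shared weight values
    indices = km.labels_.reshape(weights.shape)   # per-weight codebook index
    return codebook, indices

# Example: quantize a random "layer" and measure the reconstruction error.
w = np.random.randn(64, 64).astype(np.float32)
codebook, idx = quantize_weights(w, n_clusters=16)
w_hat = codebook[idx]
print("mean abs error:", np.abs(w - w_hat).mean())
```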
Restrictions on deploying AI on the edge.
Deep Model Compression
CNNpack: Packing Convolutional Neural Networks in the Frequency Domain (NIPS 2016)
Compressed                AlexNet   VGGNet-16   ResNet-50
rc (compression ratio)    39x       46x         12x
rs (speed-up ratio)       25x       9.4x        4.4x
Top-1 error               41.6%     29.7%       25.2%
Top-5 error               19.2%     10.4%       7.8%
[Figure: the CNNpack pipeline. Input data is transformed with DCT bases into DCT feature maps, which are combined, via weighted combination, into the feature maps of the layer. The DCT coefficients of the filters are grouped by k-means clustering (e.g. 0.499, 0.498, 0.501, 0.502, 0.500 all map to the shared value 0.5) and stored with Huffman coding and CSR storage. Stages shown: original filters, l1-shrinkage, quantization, compression.]
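As a rough illustration of the frequency-domain idea (a sketch under assumed details such as the shrinkage threshold; not the CNNpack implementation), each filter can be transformed with a 2-D DCT, small coefficients shrunk to zero, and the survivors kept in sparse storage:

```python
# Illustrative sketch of frequency-domain filter compression: 2-D DCT,
# l1-style shrinkage of small coefficients, and sparse (CSR) storage.
# The threshold and filter shape are assumptions, not CNNpack's settings.
import numpy as np
from scipy.fftpack import dct, idct
from scipy.sparse import csr_matrix

def dct2(x):
    return dct(dct(x, axis=0, norm="ortho"), axis=1, norm="ortho")

def idct2(x):
    return idct(idct(x, axis=0, norm="ortho"), axis=1, norm="ortho")

filt = np.random.randn(7, 7)          # one convolutional filter
coef = dct2(filt)                     # energy concentrates in few coefficients
coef[np.abs(coef) < 0.5] = 0.0        # shrinkage: drop small frequency parts
sparse_coef = csr_matrix(coef)        # CSR storage of surviving coefficients
recon = idct2(sparse_coef.toarray())  # approximate filter recovered at inference
print("kept coefficients:", sparse_coef.nnz, "/", filt.size)
print("reconstruction error:", np.abs(filt - recon).mean())
```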
[Chart: memory footprint (MB) before and after CNNpack compression. AlexNet: 232 to 5.9; VGGNet-16: 572 to 12.4; ResNet-50: 95 to 7.9.]
[Chart: number of multiplications before and after CNNpack compression. AlexNet: 7e8 to 3e7; VGGNet-16: 2e10 to 2.1e9; ResNet-50: 3.8e9 to 8.5e8.]
[Figure: input images are fed to both the teacher network and the student network; a discriminator (the assistant) operates on the feature space to distinguish teacher features from student features.]
We suggest developing a teaching-assistant network to identify the difference between the features generated by the student and teacher networks:

$$\mathcal{L}_{GAN} = \frac{1}{n}\sum_{i=1}^{n} H(o_S^i, y^i) + \lambda \cdot \frac{1}{n}\sum_{i=1}^{n}\Big[-\log\big(D(z_T^i)\big) + \log\big(1 - D(z_S^i)\big)\Big],$$

where $H$ is the cross-entropy between the student's output $o_S^i$ and the label $y^i$, $D$ is the discriminator (assistant), and $z_T^i$ and $z_S^i$ denote the teacher and student features.
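A minimal PyTorch-style sketch of this objective, written in the standard alternating GAN form; the function boundaries, the balance weight lam, and the epsilon terms are assumptions for illustration, not the paper's code:

```python
# Illustrative sketch of adversarial distillation: the discriminator tries to
# tell teacher features from student features, while the student is trained
# to fit the labels and to fool the discriminator. Names are assumptions.
import torch
import torch.nn.functional as F

def student_loss(student_logits, labels, d_student, lam=0.1):
    """Cross-entropy on labels + GAN term pushing student features
    toward the teacher's feature distribution."""
    ce = F.cross_entropy(student_logits, labels)
    # The student wants D(z_S) -> 1, i.e. features that look like the teacher's.
    gan = -torch.log(d_student + 1e-8).mean()
    return ce + lam * gan

def discriminator_loss(d_teacher, d_student):
    """Discriminator labels teacher features as real, student features as fake."""
    real = -torch.log(d_teacher + 1e-8).mean()
    fake = -torch.log(1.0 - d_student + 1e-8).mean()
    return real + fake
```

Here `d_teacher` and `d_student` stand for the discriminator outputs D(z_T) and D(z_S) in (0, 1); the two losses are minimized in alternation, as in a standard GAN.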
Adversarial Learning of Portable Student Networks (AAAI 2018)
Visualization results of different networks trained on the MNIST dataset, where features of a specific category in every sub-figure are represented in the same color: (a) features of the original teacher network; (b) features of the student network learned using the standard back-propagation strategy; (c) features of the student network learned using the proposed method with a teaching assistant.
(a) accuracy = 99.2%; (b) accuracy = 97.2%; (c) accuracy = 99.1%
Adversarial Learning of Portable Student Networks (AAAI 2018)
An illustration of the evolution of LeNet on the MNIST dataset. Each dot represents an individual in the population, and the thirty best individuals are shown in each evolutionary iteration. The fitness of the individuals gradually improves as the number of iterations increases, implying that the network becomes more compact while retaining the same accuracy.
[Figure: the original filters, the filters remaining after evolutionary selection, and the retrained filters.]
Towards Evolutionary Compression (SIGKDD 2018)
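As a toy sketch of the evolutionary idea (not the paper's algorithm): each individual is a binary mask over a layer's filters, and fitness trades accuracy off against the number of kept filters. The population size, the crossover scheme, and the hypothetical evaluate_accuracy stand-in are all illustrative assumptions.

```python
# Toy sketch of evolutionary filter selection: individuals are binary masks
# over a layer's filters; fitness rewards accuracy and penalizes size.
# evaluate_accuracy() is a hypothetical stand-in for fine-tune-and-test.
import random

N_FILTERS, POP, GENS = 64, 30, 50

def fitness(mask, evaluate_accuracy, alpha=0.5):
    # Higher accuracy and fewer kept filters both increase fitness.
    kept = sum(mask) / len(mask)
    return evaluate_accuracy(mask) - alpha * kept

def evolve(evaluate_accuracy):
    pop = [[random.randint(0, 1) for _ in range(N_FILTERS)] for _ in range(POP)]
    for _ in range(GENS):
        pop.sort(key=lambda m: fitness(m, evaluate_accuracy), reverse=True)
        survivors = pop[:POP // 2]                     # keep the fittest half
        children = []
        for _ in range(POP - len(survivors)):
            a, b = random.sample(survivors, 2)
            cut = random.randrange(N_FILTERS)          # one-point crossover
            child = a[:cut] + b[cut:]
            i = random.randrange(N_FILTERS)            # point mutation
            child[i] ^= 1
            children.append(child)
        pop = survivors + children
    return max(pop, key=lambda m: fitness(m, evaluate_accuracy))
```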
The two generators in CycleGAN are compressed simultaneously:
Statistics of compressed generators
P30 Pro latency: 6.8 s -> 2.1 s
Co-Evolutionary Compression for GANs (ICCV 2019)
[Figure: co-evolutionary compression of the two CycleGAN generators. Population A (candidates for Generator A) and Population B (candidates for Generator B) evolve simultaneously over iterations 1, 2, ..., T.]
[Figure: qualitative comparison of generated images. Columns: Input, Baseline, ThiNet, Ours.]
Generative Network Distillation
[Figure: random signals are fed into a generator to produce generated images, which are then used to distill the teacher network into the student network.]
A generator is introduced to approximate the training data.
DAFL: Data-Free Learning of Student Networks (ICCV 2019)
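A compressed sketch of a DAFL-style training loop; the one-hot and information-entropy terms follow the paper's idea, but the loss weight, the omitted activation-loss term, and all names here are assumptions for illustration:

```python
# Illustrative sketch of data-free distillation: a generator makes pseudo-
# images that the fixed teacher responds to confidently; the student then
# mimics the teacher on those images. Weights and names are assumptions.
import torch
import torch.nn.functional as F

def generator_step(generator, teacher, opt_g, batch=64, dim=100, a=0.1):
    z = torch.randn(batch, dim)
    images = generator(z)
    logits = teacher(images)
    probs = F.softmax(logits, dim=1)
    # One-hot loss: the teacher should be confident on generated images.
    pseudo_labels = logits.argmax(dim=1)
    loss_onehot = F.cross_entropy(logits, pseudo_labels)
    # Information-entropy loss: classes should be used evenly across the batch.
    mean_probs = probs.mean(dim=0)
    loss_entropy = (mean_probs * torch.log(mean_probs + 1e-8)).sum()
    loss = loss_onehot + a * loss_entropy
    opt_g.zero_grad(); loss.backward(); opt_g.step()
    return images.detach()

def student_step(student, teacher, images, opt_s, T=4.0):
    # Standard knowledge distillation on the generated images.
    with torch.no_grad():
        t_logits = teacher(images)
    s_logits = student(images)
    loss = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                    F.softmax(t_logits / T, dim=1), reduction="batchmean")
    opt_s.zero_grad(); loss.backward(); opt_s.step()
```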
How can we provide a perfect model-optimization service on the cloud?
Privacy-Related AI Applications
Entertainment apps
Face ID
Voice assistant
Fingerprint
Original and Generated Face Images
Accuracy: 98.20% on MNIST; 92.22% on CIFAR-10; 74.47% on CIFAR-100.
AdderNet: Do We Really Need Multiplications in Deep Learning? (CVPR 2020)
Using additions instead of multiplications in deep learning can significantly reduce the energy consumption and area cost of chips.
https://media.nips.cc/Conferences/2015/tutorialslides/Dally-NIPS-Tutorial-2015.pdf
http://eecs.oregonstate.edu/research/vlsi/teaching/ECE471_WIN15/mark_horowitz_ISSCC_2014.pdf
http://eyeriss.mit.edu/2019_neurips_tutorial.pdf
Feature Visualization on MNIST
[Figure: visualization of the features learned on MNIST by the adder network and by the convolutional network.]
Feature calculation in a convolutional neural network vs. in an adder neural network:
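Following the formulation in the AdderNet paper, with input $X$, filters $F$, and output $Y$:

$$Y(m,n,t) = \sum_{i}\sum_{j}\sum_{k} X(m+i,\,n+j,\,k) \times F(i,j,k,t) \qquad \text{(convolution)}$$

$$Y(m,n,t) = -\sum_{i}\sum_{j}\sum_{k} \big|\,X(m+i,\,n+j,\,k) - F(i,j,k,t)\,\big| \qquad \text{(adder)}$$

The adder output is the negative l1-distance between the input patch and the filter, so it is computed with additions and subtractions only.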
Validation on ImageNet
Huawei HDC 2020: Real-time Video Style Transfer
Inference time: about 630 ms before optimization vs. 60 ms after.
Huawei Atlas 200 AI Accelerator Module
The key approaches used to complete this task:
1. Model Distillation: remove the optical-flow module from the original network.
2. Filter Pruning: reduce the computational complexity of the video generator (see the sketch after this list).
3. Operator Optimization: automatically select suitable operators on the Atlas 200.
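A minimal sketch of one common filter-pruning recipe, l1-norm ranking; the slide does not state which criterion was used, so the criterion and keep ratio below are assumptions:

```python
# Illustrative l1-norm filter pruning for a conv layer: rank output filters
# by the l1-norm of their weights and keep the strongest fraction.
import torch
import torch.nn as nn

def prune_conv_filters(conv: nn.Conv2d, keep_ratio: float = 0.5) -> nn.Conv2d:
    w = conv.weight.data                               # (out, in, kH, kW)
    scores = w.abs().sum(dim=(1, 2, 3))                # l1-norm per output filter
    n_keep = max(1, int(conv.out_channels * keep_ratio))
    keep = scores.topk(n_keep).indices.sort().values   # indices of kept filters
    pruned = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                       stride=conv.stride, padding=conv.padding,
                       bias=conv.bias is not None)
    pruned.weight.data = w[keep].clone()
    if conv.bias is not None:
        pruned.bias.data = conv.bias.data[keep].clone()
    # In a real network, the next layer's input channels must be pruned to
    # match, and the model is fine-tuned afterwards.
    return pruned

# Example: halve the filters of a 64-filter layer, then fine-tune as usual.
layer = nn.Conv2d(32, 64, 3, padding=1)
print(prune_conv_filters(layer))                       # Conv2d(32, 32, ...)
```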
https://developer.huaweicloud.com/exhibition/Atlas_neural_style.html
Discussions – Edge Computing
Four reasons to move deep learning workloads from the cloud down onto the device:
1. Privacy & security: your data cannot leave the premises where it is captured.
2. Latency: you need a real-time response, as with a robotics workload or a self-driving car.
3. Reliability: the network connection to the cloud might not always be reliable.
4. Cost: sending the data up to the cloud can be costly.
Server/Cloud:
✓ fast
✓ large memory
✓ free energy resource

Mobile device:
• small memory
• slow
• limited energy resource

[Figure: deploying a deep neural network on the server/cloud vs. on a mobile device.]
GitHub Link
Zhihu (知乎)
Thank You!
Contact me: [email protected], [email protected]
http://www.wangyunhe.site