How Microsoft Had Made Deep Learning Red-Hot in IT Industry
Zhijie Yan, Microsoft Research Asia
USTC visit, May 6, 2014
Self Introduction
@MSRA: 鄢志杰 (Zhijie Yan), 996
- Studied at USTC from 1999 to 2008
- Graduate student: studied in the iFlytek speech lab from 2003 to 2008, supervised by Prof. Renhua Wang
- Intern: worked at MSR Asia from 2005 to 2006
- Visiting scholar: visited Georgia Tech in 2007
- FTE: has worked at MSR Asia since 2008
Research interests: speech, deep learning, large-scale machine learning
In Today’s Talk
- Deep learning has become very hot in the past few years
- How Microsoft made deep learning hot in the IT industry
- Deep learning basics
- Why Microsoft can turn all these ideas into reality
- Further reading materials
How Hot is Deep Learning
“This announcement comes on the heels of a $600,000 gift Google awarded Professor Hinton’s research group to support further work in the area of neural nets.” – U. of T. website
Microsoft Had Made Deep Learning Hot in the IT Industry
- Initial attempts by the University of Toronto showed promising results using DL for speech recognition on the TIMIT phone recognition task
- Prof. Hinton’s student visited MSR as an intern, and good results were obtained on the Microsoft Bing voice search task
- MSR Asia and Redmond collaborated and got amazing results on the Switchboard task, which shocked the whole industry
Microsoft Had Made Deep Learning Hot in IT Industry
*figure borrowed from MSR principal researcher Li DENG
Microsoft Had Made Deep Learning Hot in the IT Industry
- Followed by others, and results were confirmed on various speech recognition tasks: Google / IBM / Apple / Nuance / Baidu / iFlytek
- Continuously advanced by MSR and others
- Expanding to solve more and more problems: image processing, natural language processing, search, …
Deep Learning From Speech to Image
ILSVRC-2012 competition on ImageNet
Classification task: classify an image into 1 of the 1,000 classes within your 5 bets (example classes: airliner, lifeboat, school bus)

Institution               Error rate (%)
University of Amsterdam   29.6
XRCE/INRIA                27.1
Oxford                    27.0
ISI                       26.2
SuperVision               16.4
Deep Learning Basics
Deep learning = deep neural networks: a multi-layer perceptron (MLP) with a deep structure (many hidden layers)
[Figures: a shallow MLP (input layer, one hidden layer, output layer; weight matrices W0, W1) versus a deep MLP (input layer, three hidden layers, output layer; weight matrices W0-W3)]
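The deep MLP in the diagrams can be sketched in a few lines of NumPy. This is an illustrative forward pass only, with made-up layer sizes; ReLU is used for the hidden layers here, though sigmoid units were the norm at the time of this talk:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def mlp_forward(x, weights, biases):
    """Forward pass through a deep MLP: an affine transform plus a
    nonlinearity per hidden layer, softmax at the output layer."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(h @ W + b)                           # hidden layers
    return softmax(h @ weights[-1] + biases[-1])      # class posteriors

# Example: 4 hidden layers (a "deep" structure), 10 output classes
rng = np.random.default_rng(0)
sizes = [39, 512, 512, 512, 512, 10]   # input dim -> hidden dims -> classes
weights = [rng.normal(0, 0.1, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

y = mlp_forward(rng.normal(size=(1, 39)), weights, biases)
print(y.shape)   # (1, 10)
print(y.sum())   # posteriors sum to 1
```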
Deep Learning Basics
Sounds not new at all? Sounds familiar, like something you learned in class?
- Things that have not changed over the years: network topology / activation functions / …; backpropagation (BP)
- Things that changed recently: data → big data; general-purpose computing on graphics processing units (GPGPU); "a bag of tricks" accumulated over the years
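The unchanged backpropagation algorithm can be illustrated with a tiny NumPy sketch: a single sigmoid hidden layer, a linear output with squared-error loss, and plain gradient descent. All sizes and the learning rate are illustrative:

```python
import numpy as np

# Backpropagation (BP) for one sigmoid hidden layer and a linear output
# trained with squared error -- the classic update rule.
rng = np.random.default_rng(1)
x = rng.normal(size=(5, 3))              # 5 samples, 3 input features
t = rng.normal(size=(5, 2))              # regression targets
W1 = rng.normal(0, 0.5, (3, 4))
W2 = rng.normal(0, 0.5, (4, 2))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss():
    return 0.5 * np.sum((sigmoid(x @ W1) @ W2 - t) ** 2)

loss_before = loss()
for _ in range(200):
    h = sigmoid(x @ W1)                  # forward pass
    err = h @ W2 - t                     # dE/dy for E = 0.5*||y - t||^2
    gW2 = h.T @ err                      # gradient w.r.t. W2
    dh = (err @ W2.T) * h * (1 - h)      # chain rule through the sigmoid
    gW1 = x.T @ dh                       # gradient w.r.t. W1
    W1 -= 0.05 * gW1
    W2 -= 0.05 * gW2
loss_after = loss()
print(loss_before, loss_after)           # loss drops after BP training
```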
E.g. Deep Neural Network for Speech Recognition
Three key components that make DNN-HMM work:
- Tied tri-phones as the basic units for HMM states
- Many layers of nonlinear feature transformation
- A long window of frames
*figure borrowed from MSR senior researcher Dong YU
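The "long window of frames" component can be sketched as simple frame splicing: each acoustic frame is stacked with its neighbours before being fed to the DNN. This is a generic illustration (the context size and padding scheme are assumptions, not the exact windowing used in the deck):

```python
import numpy as np

def splice_frames(feats, context=5):
    """Stack each frame with its +/- `context` neighbours (edges padded
    by repeating the first/last frame), giving the DNN a long window of
    acoustic frames instead of a single frame."""
    T, D = feats.shape
    padded = np.concatenate([np.repeat(feats[:1], context, axis=0),
                             feats,
                             np.repeat(feats[-1:], context, axis=0)])
    return np.stack([padded[t:t + 2 * context + 1].ravel()
                     for t in range(T)])

frames = np.random.default_rng(2).normal(size=(100, 39))  # 100 MFCC-like frames
spliced = splice_frames(frames, context=5)
print(spliced.shape)  # (100, 429) = 39 * 11 frames per window
```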
E.g. Deep Neural Network for Image Classification The ILSVRC-2012 winning solution
*figure copied from Krizhevsky, et al., “ImageNet Classification with Deep Convolutional Neural Networks”
Scale Out Deep Learning
- Training speed was a major problem for DL
- A speech recognition model trained on 1,800 hours of data (~650,000,000 vector frames) took 2 weeks using 1 GPU
- An image classification model trained on ~1,000,000 images took 1 week using 2 GPUs*
- How to scale out if 10x or 100x training data becomes available?
*Krizhevsky, et al., “ImageNet Classification with Deep Convolutional Neural Networks”
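One common way to scale out is synchronous data-parallel training: each worker computes a gradient on its own shard of the minibatch, and the gradients are averaged before every update. The toy NumPy simulation below (workers simulated sequentially, linear regression instead of a DNN) is a generic illustration of the idea, not the specific method used at MSR:

```python
import numpy as np

# Toy synchronous data-parallel SGD: K "workers" each compute a local
# gradient on their shard; gradients are averaged (an all-reduce) and a
# single shared update is applied. Model: linear least squares.
rng = np.random.default_rng(3)
w = np.zeros(10)
X = rng.normal(size=(4096, 10))
y = X @ rng.normal(size=10) + 0.01 * rng.normal(size=4096)

K = 4  # number of workers (e.g. GPUs)
for step in range(100):
    shards = np.array_split(rng.permutation(4096)[:512], K)
    grads = []
    for idx in shards:                        # each worker: local gradient
        err = X[idx] @ w - y[idx]
        grads.append(X[idx].T @ err / len(idx))
    w -= 0.05 * np.mean(grads, axis=0)        # average, then update

mse = np.mean((X @ w - y) ** 2)
print(mse)  # shrinks as training proceeds
```

Synchronous averaging keeps the result equivalent to large-minibatch training; the communication cost of the all-reduce is what real multi-GPU systems must hide or reduce.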
DNN-GMM-HMM
Joint work with USTC-MSRA Ph.D. program student, Jian XU (许健, 0510)
The “DNN-GMM-HMM” approach for speech recognition*
- DNN as a hierarchical nonlinear feature extractor, trained using a subset of the training data
- GMM-HMM as the acoustic model, trained using the full data
*Z.-J. Yan, Q. Huo, and J. Xu, “A scalable approach to using DNN-derived features in GMM-HMM based acoustic modeling for LVCSR”
DNN-GMM-HMM
Pipeline: DNN-derived features → PCA → HLDA → tied-state WE-RDLT → MMI sequence training → CMLLR unsupervised adaptation
GMM-HMM modeling of DNN-derived features: combine the best of both worlds
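The front of this pipeline can be sketched as follows: hidden-layer activations of a DNN serve as nonlinear features, which PCA then decorrelates and reduces before GMM-HMM modelling. The DNN here is randomly initialized and purely illustrative (standing in for a network trained on the data subset), and HLDA and the later stages are omitted:

```python
import numpy as np

# Sketch: DNN-derived features + PCA dimensionality reduction.
rng = np.random.default_rng(4)

def dnn_hidden(x, weights):
    """Forward through the hidden layers only; the activations of the
    last hidden layer are used as features."""
    h = x
    for W in weights:
        h = 1.0 / (1.0 + np.exp(-(h @ W)))   # sigmoid hidden layers
    return h

# Illustrative random weights; in practice the DNN is trained on a subset
weights = [rng.normal(0, 0.2, (429, 256)), rng.normal(0, 0.2, (256, 256))]
frames = rng.normal(size=(1000, 429))        # spliced input frames
feats = dnn_hidden(frames, weights)          # DNN-derived features

# PCA: project onto the top directions of variance
mu = feats.mean(axis=0)
cov = np.cov(feats - mu, rowvar=False)
vals, vecs = np.linalg.eigh(cov)
proj = vecs[:, np.argsort(vals)[::-1][:39]]  # keep top 39 dimensions
reduced = (feats - mu) @ proj                # input to the GMM-HMM stage
print(reduced.shape)  # (1000, 39)
```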
Experimental Results
- 300hr DNN (18k states, 7 hidden layers) + 2,000hr GMM-HMM (18k states)*
- Training time reduced from 2 weeks to 3-5 days

System               Word Error Rate (%)
DNN-HMM (CE)         15.4
DNN-GMM-HMM (RDLT)   14.7
DNN-GMM-HMM (MMI)    13.8
DNN-GMM-HMM (UA)     13.1

(10% relative WER reduction at MMI, 15% at UA, vs. the DNN-HMM (CE) baseline)
*Z.-J. Yan, Q. Huo, and J. Xu, “A scalable approach to using DNN-derived features in GMM-HMM based acoustic modeling for LVCSR”
A New Optimization Method
Joint work with USTC-MSRA Ph.D. program student, Kai Chen (陈凯, 0700)
- Using 20 GPUs, the time needed to train a 1,800-hour acoustic model is cut from 2 weeks to 12 hours, without accuracy loss
- The magic is to be published
- We believe the scalability issue in DNN training for speech recognition is now solved!
Why Microsoft Can Do All These Good Things
- Research: bridge the gap between academia and industry via our intern and visiting scholar programs; scale out from toy problems to real-world, industry-scale applications
- Product teams: solve practical issues and deploy technologies to serve users worldwide via our services
- All together: we continuously improve our work towards larger scale, higher accuracy, and more challenging tasks
- Finally: we have big data + a world-leading computational infrastructure
If You Want to Know More About Deep Learning
- Neural networks for machine learning: https://class.coursera.org/neuralnets-2012-001
- Prof. Hinton’s homepage: http://www.cs.toronto.edu/~hinton/
- DeepLearning.net: http://deeplearning.net/
- Open-source:
  - Kaldi (speech): http://kaldi.sourceforge.net/
  - cuda-convnet (image): http://code.google.com/p/cuda-convnet/
Thanks!