How Microsoft Had Made Deep Learning Red-Hot in IT Industry
Zhijie Yan, Microsoft Research Asia
USTC visit, May 6, 2014
Self Introduction
@MSRA: 鄢志杰 (Zhijie Yan), 996
- Studied at USTC from 1999 to 2008
- Graduate student: studied in the iFlytek speech lab from 2003 to 2008, supervised by Prof. Renhua Wang
- Intern: worked at MSR Asia from 2005 to 2006
- Visiting scholar: visited Georgia Tech in 2007
- FTE: has worked at MSR Asia since 2008
Research interests: speech, deep learning, large-scale machine learning
In Today’s Talk
- Deep learning has become very hot in the past few years
- How Microsoft made deep learning hot in the IT industry
- Deep learning basics
- Why Microsoft can turn all these ideas into reality
- Further reading materials
How Hot is Deep Learning
“This announcement comes on the heels of a $600,000 gift Google awarded Professor Hinton’s research group to support further work in the area of neural nets.” – U. of T. website
Microsoft Had Made Deep Learning Hot in the IT Industry
- Initial attempts by the University of Toronto showed promising results using DL for speech recognition on the TIMIT phone recognition task
- Prof. Hinton’s student visited MSR as an intern, and good results were obtained on the Microsoft Bing voice search task
- MSR Asia and Redmond collaborated and got amazing results on the Switchboard task, which shocked the whole industry
Microsoft Had Made Deep Learning Hot in IT Industry
*figure borrowed from MSR principal researcher Li DENG
Microsoft Had Made Deep Learning Hot in the IT Industry
- Followed by others, and results were confirmed on various speech recognition tasks: Google / IBM / Apple / Nuance / Baidu / iFlytek
- Continuously advanced by MSR and others
- Expanding to solve more and more problems: image processing, natural language processing, search, …
Deep Learning From Speech to Image
ILSVRC-2012 competition on ImageNet
Classification task: classify an image into 1 of the 1,000 classes within your 5 bets (example classes: airliner, lifeboat, school bus)

Institution               Error rate (%)
University of Amsterdam   29.6
XRCE/INRIA                27.1
Oxford                    27.0
ISI                       26.2
SuperVision               16.4
Deep Learning Basics
Deep learning = deep neural networks: a multi-layer perceptron (MLP) with a deep structure (many hidden layers)
[Figures: a shallow MLP (input layer, one hidden layer, output layer; weight matrices W0, W1) versus a deep MLP (input layer, three hidden layers, output layer; weight matrices W0-W3)]
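The deep MLP in the diagrams can be sketched in a few lines of NumPy. This is an illustrative forward pass only, with made-up layer sizes; ReLU is used for the hidden layers here, though sigmoid units were the norm at the time of this talk:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def mlp_forward(x, weights, biases):
    """Forward pass through a deep MLP: an affine transform plus a
    nonlinearity per hidden layer, softmax at the output layer."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(h @ W + b)                           # hidden layers
    return softmax(h @ weights[-1] + biases[-1])      # class posteriors

# Example: 4 hidden layers (a "deep" structure), 10 output classes
rng = np.random.default_rng(0)
sizes = [39, 512, 512, 512, 512, 10]   # input dim -> hidden dims -> classes
weights = [rng.normal(0, 0.1, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

y = mlp_forward(rng.normal(size=(1, 39)), weights, biases)
print(y.shape)   # (1, 10)
print(y.sum())   # posteriors sum to 1
```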
Deep Learning Basics
Sounds not new at all? Sounds familiar, like something you learned in class?
- Things that have not changed over the years: network topology / activation functions / …; backpropagation (BP)
- Things that changed recently: data → big data; general-purpose computing on graphics processing units (GPGPU); "a bag of tricks" accumulated over the years
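The unchanged backpropagation algorithm can be illustrated with a tiny NumPy sketch: a single sigmoid hidden layer, a linear output with squared-error loss, and plain gradient descent. All sizes and the learning rate are illustrative:

```python
import numpy as np

# Backpropagation (BP) for one sigmoid hidden layer and a linear output
# trained with squared error -- the classic update rule.
rng = np.random.default_rng(1)
x = rng.normal(size=(5, 3))              # 5 samples, 3 input features
t = rng.normal(size=(5, 2))              # regression targets
W1 = rng.normal(0, 0.5, (3, 4))
W2 = rng.normal(0, 0.5, (4, 2))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss():
    return 0.5 * np.sum((sigmoid(x @ W1) @ W2 - t) ** 2)

loss_before = loss()
for _ in range(200):
    h = sigmoid(x @ W1)                  # forward pass
    err = h @ W2 - t                     # dE/dy for E = 0.5*||y - t||^2
    gW2 = h.T @ err                      # gradient w.r.t. W2
    dh = (err @ W2.T) * h * (1 - h)      # chain rule through the sigmoid
    gW1 = x.T @ dh                       # gradient w.r.t. W1
    W1 -= 0.05 * gW1
    W2 -= 0.05 * gW2
loss_after = loss()
print(loss_before, loss_after)           # loss drops after BP training
```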
E.g. Deep Neural Network for Speech Recognition
Three key components that make DNN-HMM work:
- Tied tri-phones as the basic units for HMM states
- Many layers of nonlinear feature transformation
- A long window of frames
*figure borrowed from MSR senior researcher Dong YU
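The "long window of frames" component can be sketched as simple frame splicing: each acoustic frame is stacked with its neighbours before being fed to the DNN. This is a generic illustration (the context size and padding scheme are assumptions, not the exact windowing used in the deck):

```python
import numpy as np

def splice_frames(feats, context=5):
    """Stack each frame with its +/- `context` neighbours (edges padded
    by repeating the first/last frame), giving the DNN a long window of
    acoustic frames instead of a single frame."""
    T, D = feats.shape
    padded = np.concatenate([np.repeat(feats[:1], context, axis=0),
                             feats,
                             np.repeat(feats[-1:], context, axis=0)])
    return np.stack([padded[t:t + 2 * context + 1].ravel()
                     for t in range(T)])

frames = np.random.default_rng(2).normal(size=(100, 39))  # 100 MFCC-like frames
spliced = splice_frames(frames, context=5)
print(spliced.shape)  # (100, 429) = 39 * 11 frames per window
```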
E.g. Deep Neural Network for Image Classification The ILSVRC-2012 winning solution
*figure copied from Krizhevsky, et al., “ImageNet Classification with Deep Convolutional Neural Networks”
Scale Out Deep Learning
- Training speed was a major problem for DL
- A speech recognition model trained on 1,800 hours of data (~650,000,000 vector frames) took 2 weeks using 1 GPU
- An image classification model trained on ~1,000,000 images took 1 week using 2 GPUs*
- How to scale out if 10x or 100x training data becomes available?
*Krizhevsky, et al., “ImageNet Classification with Deep Convolutional Neural Networks”
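One common way to scale out is synchronous data-parallel training: each worker computes a gradient on its own shard of the minibatch, and the gradients are averaged before every update. The toy NumPy simulation below (workers simulated sequentially, linear regression instead of a DNN) is a generic illustration of the idea, not the specific method used at MSR:

```python
import numpy as np

# Toy synchronous data-parallel SGD: K "workers" each compute a local
# gradient on their shard; gradients are averaged (an all-reduce) and a
# single shared update is applied. Model: linear least squares.
rng = np.random.default_rng(3)
w = np.zeros(10)
X = rng.normal(size=(4096, 10))
y = X @ rng.normal(size=10) + 0.01 * rng.normal(size=4096)

K = 4  # number of workers (e.g. GPUs)
for step in range(100):
    shards = np.array_split(rng.permutation(4096)[:512], K)
    grads = []
    for idx in shards:                        # each worker: local gradient
        err = X[idx] @ w - y[idx]
        grads.append(X[idx].T @ err / len(idx))
    w -= 0.05 * np.mean(grads, axis=0)        # average, then update

mse = np.mean((X @ w - y) ** 2)
print(mse)  # shrinks as training proceeds
```

Synchronous averaging keeps the result equivalent to large-minibatch training; the communication cost of the all-reduce is what real multi-GPU systems must hide or reduce.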
DNN-GMM-HMM
Joint work with USTC-MSRA Ph.D. program student, Jian XU (许健, 0510)
The “DNN-GMM-HMM” approach for speech recognition*
- DNN as a hierarchical nonlinear feature extractor, trained using a subset of the training data
- GMM-HMM as the acoustic model, trained using the full data
*Z.-J. Yan, Q. Huo, and J. Xu, “A scalable approach to using DNN-derived features in GMM-HMM based acoustic modeling for LVCSR”
DNN-GMM-HMM
Pipeline: DNN-derived features → PCA → HLDA → tied-state WE-RDLT → MMI sequence training → CMLLR unsupervised adaptation
GMM-HMM modeling of DNN-derived features: combine the best of both worlds
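The front of this pipeline can be sketched as follows: hidden-layer activations of a DNN serve as nonlinear features, which PCA then decorrelates and reduces before GMM-HMM modelling. The DNN here is randomly initialized and purely illustrative (standing in for a network trained on the data subset), and HLDA and the later stages are omitted:

```python
import numpy as np

# Sketch: DNN-derived features + PCA dimensionality reduction.
rng = np.random.default_rng(4)

def dnn_hidden(x, weights):
    """Forward through the hidden layers only; the activations of the
    last hidden layer are used as features."""
    h = x
    for W in weights:
        h = 1.0 / (1.0 + np.exp(-(h @ W)))   # sigmoid hidden layers
    return h

# Illustrative random weights; in practice the DNN is trained on a subset
weights = [rng.normal(0, 0.2, (429, 256)), rng.normal(0, 0.2, (256, 256))]
frames = rng.normal(size=(1000, 429))        # spliced input frames
feats = dnn_hidden(frames, weights)          # DNN-derived features

# PCA: project onto the top directions of variance
mu = feats.mean(axis=0)
cov = np.cov(feats - mu, rowvar=False)
vals, vecs = np.linalg.eigh(cov)
proj = vecs[:, np.argsort(vals)[::-1][:39]]  # keep top 39 dimensions
reduced = (feats - mu) @ proj                # input to the GMM-HMM stage
print(reduced.shape)  # (1000, 39)
```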
Experimental Results
- 300hr DNN (18k states, 7 hidden layers) + 2,000hr GMM-HMM (18k states)*
- Training time reduced from 2 weeks to 3-5 days

System               Word Error Rate (%)
DNN-HMM (CE)         15.4
DNN-GMM-HMM (RDLT)   14.7
DNN-GMM-HMM (MMI)    13.8
DNN-GMM-HMM (UA)     13.1

(10% relative WER reduction at MMI, 15% at UA, vs. the DNN-HMM (CE) baseline)
*Z.-J. Yan, Q. Huo, and J. Xu, “A scalable approach to using DNN-derived features in GMM-HMM based acoustic modeling for LVCSR”
A New Optimization Method
Joint work with USTC-MSRA Ph.D. program student, Kai Chen (陈凯, 0700)
- Using 20 GPUs, the time needed to train a 1,800-hour acoustic model is cut from 2 weeks to 12 hours, without accuracy loss
- The magic is to be published
- We believe the scalability issue in DNN training for speech recognition is now solved!
Why Microsoft Can Do All These Good Things
- Research: bridge the gap between academia and industry via our intern and visiting scholar programs; scale out from toy problems to real-world, industry-scale applications
- Product teams: solve practical issues and deploy technologies to serve users worldwide via our services
- All together: we continuously improve our work towards larger scale, higher accuracy, and more challenging tasks
- Finally: we have big data + a world-leading computational infrastructure
If You Want to Know More About Deep Learning
- Neural networks for machine learning: https://class.coursera.org/neuralnets-2012-001
- Prof. Hinton’s homepage: http://www.cs.toronto.edu/~hinton/
- DeepLearning.net: http://deeplearning.net/
- Open-source:
  - Kaldi (speech): http://kaldi.sourceforge.net/
  - cuda-convnet (image): http://code.google.com/p/cuda-convnet/
Thanks!