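Advances in Natural Language Processing. Dr. Ming Zhou, Assistant Managing Director of Microsoft Research Asia and President-Elect of the Association for Computational Linguistics. December 2018, Chongqing University.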


Page 2

Introduction to Microsoft Research Asia

• Microsoft's fundamental research arm in the Asia-Pacific region
• Founded on Nov. 5, 1998
• 5 research focus areas
• Technologies transferred into all major Microsoft products
• 4,000+ papers published, including 50+ best papers
• "The World's Hottest Computer Lab" -- MIT TR

Page 3

Main research areas of Microsoft Research Asia

Natural user interface, Multimedia, Machine learning, Big data

Page 4

Advances in artificial intelligence

Self-driving, Personal assistant, Surveillance detection, Translation, Medical diagnostics, Game, Art

Image recognition, Speech recognition, Natural language, Generative model, Reinforcement learning

Page 5

Data, algorithms, and computing power are the three pillars of artificial intelligence

• Large datasets (14M images)
• Computing power (RDMA)
• Algorithms and frameworks

Page 6

Microsoft AI reaches human parity on representative benchmarks

• Object recognition (ImageNet): 96%
• Speech recognition (Switchboard): 94.9%
• Reading comprehension (SQuAD): 88.5%
• Machine translation (WMT-2017): 69%

Page 7

• Creative intelligence
• Cognitive intelligence: language, knowledge, reasoning
• Perceptual intelligence: hearing, vision, touch
• Computational intelligence: memory, computation

Page 9

Automatically describing video in text

• A car is running
• A man is cutting a piece of meat
• A man is performing on a stage
• A man is riding a bike
• A man is singing
• A panda is walking
• A woman is riding a horse
• A man is flying in a field

Page 10

Microsoft Cognitive Services: intelligent APIs for understanding the world

Vision, Speech, Language, Knowledge, Search

Page 11

"Using computers to process and understand human language, so that they gain the human abilities of listening, speaking, reading, and writing, is one of the most critical core technologies of the future." -- Bill Gates

"Natural language processing is the jewel in the crown of artificial intelligence." -- Bill Gates

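Page 12

The difficulty of language understanding

剩女和剩男产生的原因有两个:一是谁都看不上,二是谁都看不上。

(There are two reasons why people remain single: one is that they find nobody good enough, the other is that nobody finds them good enough. In Chinese the two clauses are the identical string 谁都看不上, which can be read either way.)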

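Page 13

Language intelligence (natural language understanding)

NLP fundamental technologies

• Lexical representation and analysis
• Phrase representation and analysis
• Syntactic and semantic representation and analysis
• Discourse representation and analysis

NLP core technologies

• Information retrieval
• Machine translation
• Question answering
• Chat and dialogue
• Knowledge engineering
• Language generation
• Recommender systems
• Information extraction

NLP+

• Search engines
• Intelligent customer service
• Business intelligence
• Voice assistants

Supporting elements

• Machine learning, big data, user profiling, domain knowledge, cloud computing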

Page 14

Important and wide-ranging applications

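Page 15

History of NLP

• 1940 ~ 1954: invention of the electronic computer, foundations of the theory of intelligence
    • Key figures: Chomsky, Backus, Weaver, Shannon
• 1954 ~ 1970: formal rule systems, logic theory, the perceptron
    • Key figures: Minsky, Rosenblatt
• 1970 ~ 1980: HMM-based speech recognition, semantic and discourse modeling
    • Key figures: Frederick Jelinek, Martin Kay
• 1980 ~ 1991: construction of large-scale rule-based knowledge bases
    • Representative systems: WordNet (1985), HPSG (1987), CYC (1984)
• 1991 ~ 2008: widespread use of statistical modeling and machine learning
    • Representative methods: SVM, MaxEnt, PCFG, PageRank
    • Typical applications: statistical machine translation, the IBM Watson question answering system, web search
• 2008 ~ 2017: big data and deep learning
    • Representative technologies: word embeddings, neural machine translation, machine reading, dialogue systems

Source: The Economist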

Page 17

Syntactic parsing

[Figure: dependency parse of the question "what is the name of Justin Bieber brother". POS tags: WP VBZ DT NN IN NNP NNP NN. Dependency labels shown: root, attr, nsubj, det, prep, pobj, nn, nn.]

Page 18

Semantic analysis

• Context-independent, single-turn analysis
• Context-dependent, multi-turn analysis

Page 19

Information extraction

Handspring's other board members are Dubinsky and chief product officer Jeff Hawkins, both Handspring co-founders; John Doerr, general partner at Kleiner, Perkins, Caufield & Byers; Bruce Dunlevie, managing member with Benchmark Capital; Mitchell Kertzman, CEO of Liberate Inc.; and Kim Clark, dean of Harvard Business School.

IE output (Person-Affiliation):

NAME            TITLE           ORGANIZATION
Dubinsky        board member    Handspring
Jeff Hawkins    board member    Handspring
John Doerr      board member    Handspring
Kim Clark       board member    Handspring
Dubinsky        co-founder      Handspring
Kim Clark       dean            Harvard…
…

Page 20

Example: resume information extraction

Input resume:

Adam Wang (Male)
XXXX Company of Beijing, Beijing City, 100007
1364-110-XXX
[email protected]

Education Background
From Sept. 2000 to Apr. 2003, I got master degree from University of XXX in computer software engineering major. From Sept. 1996 to July. 2000, I got bachelor degree from School of XXX of Xi’an in computer science and technology major.

Experience
From March 2003 to now, Software Engineer, XXXX Company of Beijing
From June 2001 to March 2003, Software Engineer, Research Center of XXX Company
From Sept. 2000 to May 2001, Software Engineer, National Lab. Of XXX University

Interests
Reading, music, and jogging

Resume blocks identified: Personal Information / Personal detailed info, Education / Educational detailed info, Research Experience, Interests

Extracted fields:

<Name>Adam Wang</Name>
<Gender>Male</Gender>
<Address>XXXX Company of Beijing, Beijing City</Address>
<ZipCode>100007</ZipCode>
<Mobile>1364-110-XXX</Mobile>
<Email>[email protected]</Email>
<GradSchool>University of XXX</GradSchool>
<Major>Computer Software Engineering</Major>
<Degree>Master</Degree>
<GradSchool>School of XXX of Xi’an</GradSchool>
<Major>Computer Science and Technology</Major>
<Degree>Bachelor</Degree>
<Interests>Reading, music, and jogging</Interests>
<Experience>From March 2003 to now, Software Engineer, XXXX Company of Beijing From June 2001 to March 2003, Software Engineer, Research Center of XXX Company From Sept. 2000 to May 2001, Software Engineer, National Lab. Of XXX University</Experience>
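
The mapping from resume header text to tagged fields can be illustrated with a minimal Python sketch. It uses hand-written regular expressions as a stand-in; the slide does not say how the actual system extracts fields, and the pattern names and the `extract_fields` and `to_tagged_text` helpers are assumptions made for illustration only.

```python
import re

# Hypothetical patterns for the header of the sample resume. A production
# extractor would more likely learn these fields than hard-code them.
NAME_GENDER = re.compile(
    r"^(?P<name>[A-Z][a-z]+(?: [A-Z][a-z]+)+) \((?P<gender>Male|Female)\)", re.M
)
ZIP_CODE = re.compile(r"\b\d{6}\b")
MOBILE = re.compile(r"\b\d{4}-\d{3}-[\dX]{3}\b")


def extract_fields(resume_text: str) -> dict:
    """Return the header fields that the hand-written patterns can find."""
    fields = {}
    match = NAME_GENDER.search(resume_text)
    if match:
        fields["Name"] = match.group("name")
        fields["Gender"] = match.group("gender")
    match = ZIP_CODE.search(resume_text)
    if match:
        fields["ZipCode"] = match.group(0)
    match = MOBILE.search(resume_text)
    if match:
        fields["Mobile"] = match.group(0)
    return fields


def to_tagged_text(fields: dict) -> str:
    """Render the fields in the <Tag>value</Tag> style shown on the slide."""
    return "\n".join(f"<{tag}>{value}</{tag}>" for tag, value in fields.items())


sample = "Adam Wang (Male)\nXXXX Company of Beijing, Beijing City, 100007\n1364-110-XXX\n"
fields = extract_fields(sample)
print(to_tagged_text(fields))
```

Free-text sections such as education and experience need more than fixed patterns, which is why this sketch stops at the header.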

Page 21

Knowledge graph

• Entity: a node in the knowledge graph
• Predicate: an edge connecting two entities
• CVT (Compound Value Type): not a real entity node; it is used to collect the multiple attributes of one event
• Fact: a triple made up of a predicate and the two entities it connects. Event: a group of entities connected through a CVT node.
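
The distinction between a plain fact triple and an event grouped under a CVT node can be sketched in Python as follows. The `KnowledgeGraph` class, the predicate names, and the `cvt:marriage_1` identifier are hypothetical and chosen only for illustration; they are not taken from any particular knowledge base schema.

```python
from collections import defaultdict


class KnowledgeGraph:
    """A tiny in-memory triple store: (subject, predicate) -> set of objects."""

    def __init__(self):
        self._objects = defaultdict(set)

    def add_fact(self, subject, predicate, obj):
        """Store one fact triple."""
        self._objects[(subject, predicate)].add(obj)

    def add_event(self, subject, predicate, cvt_id, attributes):
        """Attach an event to `subject` through a CVT node that groups its attributes."""
        self.add_fact(subject, predicate, cvt_id)
        for attribute, value in attributes.items():
            self.add_fact(cvt_id, attribute, value)

    def objects(self, subject, predicate):
        """Return a copy of all objects linked from `subject` by `predicate`."""
        return set(self._objects.get((subject, predicate), ()))


kg = KnowledgeGraph()
# A simple fact: one predicate connecting two entities.
kg.add_fact("Barack Obama", "placeOfBirth", "Honolulu")
# An event: "cvt:marriage_1" is not a real-world entity, it only groups attributes.
kg.add_event("Barack Obama", "marriage", "cvt:marriage_1",
             {"spouse": "Michelle Obama", "startYear": 1992})

# Reaching the spouse takes two hops: entity -> CVT node -> attribute value.
spouses = {
    spouse
    for cvt in kg.objects("Barack Obama", "marriage")
    for spouse in kg.objects(cvt, "spouse")
}
print(spouses)
```

The two-hop lookup is the practical consequence of the CVT design: a query about an event attribute has to pass through the intermediate node.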

Page 22

Question answering systems

• Simple-Relation Question
• Multi-Constraint Question
• Multi-Hop Question

Page 23

[Figure: question answering over multiple sources, from Query to Response. Sources: Knowledge Graph, Web Table, Web Document, Image/Video. QA types: KBQA, TableQA, DocQA, CommunityQA, VisionQA. Answer units: Entity, Table (Cell), Paragraph/Sentence/Phrase. Other components: Question Generation, Human-in-the-Loop (HI).]

Page 24

Honolulu

Michelle Obama

Page 25

Question: Where was the president of the United States born?

Grammar actions used in the derivation:

A1: S → set
A4: set → find(set, r1)
A4: set → find(set, r2)
A15: set → {e}
A16: e → United States
A17: r1 → placeOfBirth
A17: r2 → isPresidentOf

Resulting logical form: find(find({United States}, isPresidentOf), placeOfBirth)

[Figure: an encoder-decoder generates the derivation as a sequence of actions, starting from S and ending with an end symbol.]

Guo et al. Dialog-to-Action: Conversational Question Answering Over a Large-Scale Knowledge Base. NIPS, 2018.
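
To make the logical form concrete, here is a minimal Python sketch that evaluates find(find({United States}, isPresidentOf), placeOfBirth) against a toy knowledge base. The triples are illustrative, and the bidirectional behavior of `find` is a simplification assumed here, not a statement about the cited paper's implementation.

```python
# Toy knowledge base. The triples and predicate names are illustrative assumptions.
TRIPLES = [
    ("Donald Trump", "isPresidentOf", "United States"),
    ("Donald Trump", "placeOfBirth", "New York City"),
]


def find(entities, predicate, triples=TRIPLES):
    """Return the entities linked by `predicate` to any member of `entities`.

    Edges are followed in both directions, which is a simplification made here.
    """
    linked = set()
    for subject, pred, obj in triples:
        if pred != predicate:
            continue
        if subject in entities:
            linked.add(obj)
        if obj in entities:
            linked.add(subject)
    return linked


# Inner call: who is linked to the United States by isPresidentOf?
president = find({"United States"}, "isPresidentOf")
# Outer call: what is linked to that entity by placeOfBirth?
answer = find(president, "placeOfBirth")
print(answer)
```

Each A4 action corresponds to one `find` call, so a two-hop question becomes two nested calls.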

Page 26

Previous question: Where was the president of the United States born?
Previous answer: New York City
Current question: Where did he graduate from?

Dialog memory

• Entity: {United States, tag=utterance}, {New York City, tag=answer}
• Predicate: {isPresidentOf}, {placeOfBirth}
• Action subsequence:
    set → A4 A15 e_US r_pres
    set → A4 A15
    set → A4 A4 A15 e_US r_pres r_bth
    set → A4 A4 A15

Derivation for the current question:

A1: S → set
A4: set → find(set, r1)
A17: r1 → graduateFrom
A4: set → find(set, r2)
A15: set → {e}
A16: e → United States
A17: r2 → isPresidentOf

[Figure: the decoder copies the subsequence A4 A15 e_US r_pres from the dialog memory (shown as action A19, a replicated action sequence with instantiation) and completes it with r_grad.]

Guo et al. Dialog-to-Action: Conversational Question Answering Over a Large-Scale Knowledge Base. NIPS, 2018.
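
The reuse of a remembered action subsequence can be sketched in Python as follows. This is an illustration under assumptions: the interpreter handles only A1, A4, and A15; entity and relation tokens such as `e_US` and `r_pres` stand in for the instantiation actions A16 and A17; the copy is done by plain list slicing rather than a learned copy action; and the toy triples are illustrative.

```python
# Toy knowledge base and symbol tables (illustrative assumptions).
TRIPLES = [
    ("Donald Trump", "isPresidentOf", "United States"),
    ("Donald Trump", "placeOfBirth", "New York City"),
    ("Donald Trump", "graduateFrom", "University of Pennsylvania"),
]
ENTITIES = {"e_US": "United States"}
RELATIONS = {"r_pres": "isPresidentOf", "r_bth": "placeOfBirth", "r_grad": "graduateFrom"}


def find(entities, predicate):
    """Return entities linked by `predicate` to any member of `entities` (either direction)."""
    linked = set()
    for subject, pred, obj in TRIPLES:
        if pred == predicate:
            if subject in entities:
                linked.add(obj)
            if obj in entities:
                linked.add(subject)
    return linked


def execute(actions):
    """Execute a top-down, leftmost action sequence that starts with A1 (S -> set)."""
    tokens = iter(actions)
    if next(tokens) != "A1":
        raise ValueError("sequence must start with A1")
    return _expand_set(tokens)


def _expand_set(tokens):
    action = next(tokens)
    if action == "A4":  # set -> find(set, r): expand the inner set, then read r
        inner = _expand_set(tokens)
        return find(inner, RELATIONS[next(tokens)])
    if action == "A15":  # set -> {e}
        return {ENTITIES[next(tokens)]}
    raise ValueError(f"unsupported action: {action}")


previous = ["A1", "A4", "A4", "A15", "e_US", "r_pres", "r_bth"]
# The dialog memory keeps the subsequence for "the president of the United States".
memory = previous[2:6]
# "Where did he graduate from?": reuse the remembered subsequence for "he".
current = ["A1", "A4"] + memory + ["r_grad"]

print(execute(previous), execute(current))
```

Copying the subsequence resolves the pronoun "he" without re-deriving the entity from the current utterance, which does not mention it.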

Page 27

Automatic summarization

Single-document summarization and sentence summarization

Input: the Sri Lankan government on Wednesday announced the closure of government schools with immediate effect as a military campaign against Tamil separatists escalated in the north of the country.
Summary: Sri Lanka closes schools as war escalates

Multi-document summarization / news aggregation

North Korea denies that US sanctions drove its denuclearization pledge …
North Korea has denied claims that US-led sanctions are what encouraged it to seek international peace talks, and experts say it was arrogance to ever think that was the case. According to the state-run news agency …
Story from: Business Insider | Reuters | Voice of America

1. Qingyu Zhou, Nan Yang, Furu Wei, Shaohan Huang, Ming Zhou and Tiejun Zhao. Neural Document Summarization by Jointly Learning to Score and Select Sentences. Proc. ACL 2018.
2. Qingyu Zhou, Nan Yang, Furu Wei and Ming Zhou. Selective Encoding for Abstractive Sentence Summarization. Proc. ACL 2017.
3. Pengjie Ren, Zhumin Chen, Zhaochun Ren, Furu Wei, Jun Ma and Maarten de Rijke. Leveraging Contextual Sentence Relations for Extractive Summarization Using a Neural Attention Model. Proc. SIGIR 2017.

Page 28

Automatic news report generation

Inputs: structured data, unstructured data, image set. Output: football match report.

• Event alignment
    • Time
    • Action
    • Player
• Document planning and generation
    • Template-based methods
    • Neural network-based methods
    • System combination
• Image filtering and selection
    • Image filtering
    • Alignment along the timeline
    • Determining image positions

Page 29

Neural machine translation

Chatbots

Reading comprehension

Creation


Page 31

Neural machine translation (NMT)

𝒆 = (Economic, growth, has, slowed, down, in, recent, years, .)
𝒇 = (近, 几年, 经济, 发展, 变, 慢, 了, .)

[Figure: the Encoder maps 𝒆 to a vector (-0.2 0.9 -0.1 0.5 0.7 0.0 0.2), from which the Decoder generates 𝒇.]

Page 32

Neural machine translation

[Figure: RNN encoder-decoder. Encoder Recurrent Neural Network: Source Word 𝒘𝒊, Source State 𝒔𝒊. Decoder Recurrent Neural Network: Recurrent State 𝒛𝒊, Word Sample 𝒖𝒊.]

𝒆 = (Economic, growth, has, slowed, down, in, recent, years, .)
𝒇 = (近, 几年, 经济, 发展, 变, 慢, 了, .)

Sutskever et al., NIPS, 2014

Page 33

Attention model

[Figure: attention-based encoder-decoder. Encoder: Left-to-Right and Right-to-Left passes produce the Source Vectors 𝒉𝒋. Attention: Attention Weights combine the source vectors into the Internal Semantic vector 𝒄𝒊. Decoder: Recurrent State 𝒛𝒊, Word Sample 𝒖𝒊.]

𝒆 = (Economic, growth, has, slowed, down, in, recent, years, .)
𝒇 = (近, 几年, …

Bahdanau et al., ICLR, 2015

Page 34

Attention model

[Figure: the same model one decoding step later. The Attention Weights scale (⨀) the Source Vectors 𝒉𝒋, which are summed (⊕) into 𝒄𝒊; after 近, 几年, the decoder generates 经济, with 发展, 变, 慢, 了, . still to come.]

𝒆 = (Economic, growth, has, slowed, down, in, recent, years, .)

Bahdanau et al., ICLR, 2015
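
One decoding step of the attention computation (weights over the source vectors, then a weighted sum) can be sketched in plain Python. As a simplifying assumption, the score here is a dot product between the decoder state and each source vector; the cited model uses a small learned network for scoring. The function names and toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, source_vectors):
    """Return (attention weights, context vector c_i) for one decoding step."""
    # Score each source vector h_j against the decoder state z_i (dot product).
    scores = [
        sum(z_k * h_k for z_k, h_k in zip(decoder_state, h_j))
        for h_j in source_vectors
    ]
    weights = softmax(scores)
    dim = len(source_vectors[0])
    # c_i is the attention-weighted sum of the source vectors.
    context = [
        sum(w * h_j[d] for w, h_j in zip(weights, source_vectors))
        for d in range(dim)
    ]
    return weights, context


source_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy h_j
decoder_state = [2.0, 0.0]                              # toy z_i
weights, context = attend(decoder_state, source_vectors)
print(weights, context)
```

Because the weights sum to one, the context vector always stays inside the span of the source vectors, whichever scoring function is used.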

Page 35

Transformer-based translation model

• Encoder (N layers)
    • Self-attention layer
    • Feed-forward nonlinear layer
• Decoder (N layers)
    • Self-attention layer
    • Attention-to-source layer
    • Feed-forward nonlinear layer
• Multi-head attention
    • Models different information at different positions

Encoder: Input Embedding + Positional Encoding → Multi-Head Self-Attention + Residual → Feed Forward + Residual

Decoder: Output Embedding + Positional Encoding → Multi-Head Self-Attention + Residual → Encoder-Decoder Attention + Residual → Feed Forward + Residual → Softmax
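
The "self-attention + residual" sublayer named on this slide can be sketched in plain Python. The sketch assumes a single head and uses the inputs directly as queries, keys, and values (no learned projections, no layer normalization), so it shows only the data flow, not a full Transformer layer.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_with_residual(x):
    """Single-head scaled dot-product self-attention followed by a residual add.

    Queries, keys, and values are the inputs themselves (an assumption made
    to keep the sketch short; real layers apply learned projections first).
    """
    dim = len(x[0])
    scale = math.sqrt(dim)
    output = []
    for query in x:
        # Every position attends to every position, including itself.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in x]
        weights = softmax(scores)
        attended = [
            sum(w * value[d] for w, value in zip(weights, x)) for d in range(dim)
        ]
        # Residual connection: add the sublayer input back to its output.
        output.append([q + a for q, a in zip(query, attended)])
    return output


hidden = self_attention_with_residual([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(hidden)
```

The output has the same shape as the input, which is what allows N such layers to be stacked.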

Page 36

Transformer encoder (two-layer example)

[Figure: two stacked encoder layers. Each layer consists of a self-attention layer and a nonlinear layer, each wrapped in a residual connection; the top layer outputs the final hidden states.]

Page 37

Transformer decoder (single-layer example)

[Figure: one decoder layer. It consists of a self-attention layer, an attention-to-source layer that reads the source-language hidden states, and a nonlinear layer, each wrapped in a residual connection.]

Page 38

Back-translation training (Sennrich et al., ACL 2016)

• Data: bilingual data (x^(n), y^(n)) and target-language monolingual data y^(t)
• The reverse model NMT_0(y → x) translates y^(t) into synthetic source sentences x′^(t)
• The synthetic pairs (y^(t), x′^(t)) are added to the bilingual data to train NMT_0(x → y)
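
The data flow of back-translation can be sketched in Python. The word-for-word lexicon below is a hypothetical stand-in for a trained reverse NMT model, and the sentences are toy data; only the pairing of real target sentences with synthetic sources is the point.

```python
# Hypothetical word-for-word lexicon standing in for a trained NMT(y -> x) model.
TOY_LEXICON = {"经济": "economy", "发展": "growth", "变": "becomes", "慢": "slow"}


def reverse_model(target_sentence):
    """Toy stand-in for NMT(y -> x): translate a target sentence word by word."""
    return " ".join(TOY_LEXICON.get(word, "<unk>") for word in target_sentence.split())


def back_translate(target_monolingual, translate):
    """Pair each real target sentence y with a synthetic source sentence x'."""
    return [(translate(y), y) for y in target_monolingual]


bilingual = [("economy growth", "经济 发展")]   # genuine (x, y) pairs
target_monolingual = ["经济 变 慢"]              # target-language sentences only
synthetic = back_translate(target_monolingual, reverse_model)
training_data = bilingual + synthetic            # would be used to train NMT(x -> y)
print(training_data)
```

The target side of every synthetic pair is genuine text, so the forward model still learns to produce fluent output even when the synthetic source is noisy.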

Page 39

Dual learning (He et al., NIPS, 2016)

• Data: bilingual data (x^(n), y^(n)), source-language monolingual data x^(s), and target-language monolingual data y^(t)
• NMT_0(x → y) and NMT_0(y → x) are trained on the bilingual data
• Source side: x^(s) is translated into y′^(s), which NMT_0(y → x) translates back into x′^(s); a loss scores the reconstruction
• Target side: y^(t) is translated into x′^(t), which NMT_0(x → y) translates back; a loss compares the result with y^(t)

Page 40

Semi-supervised joint training (Zhang et al., AAAI 2018)

• Data: bilingual data (x^(n), y^(n)), source-language monolingual data x^(s), and target-language monolingual data y^(t)
• Iteration 0: NMT_0(x → y) and NMT_0(y → x) translate the monolingual data, giving the synthetic pairs (x^(s), y′^(t)) and (y^(t), x′^(t))
• Iteration 1: NMT_1(x → y) and NMT_1(y → x) translate again, giving (x^(s), y″^(t)) and (y^(t), x″^(t))
• Iteration 2: NMT_2(x → y) and NMT_2(y → x)

Page 41

Deliberation networks (Xia et al., NIPS, 2017)

Page 42

Bidirectional translation agreement decoding (Zhang et al., 2018)

• Data: bilingual data (x^(n), y^(n)); both models re-translate the source side x^(n) of the bilingual data
• Iteration 0: train NMT_0(x → y, left-to-right) and NMT_0(x → y, right-to-left); translating x^(n) gives the pairs (x^(n), y′^(n))
• Iteration 1: fine-tune to NMT_1(x → y, left-to-right) and NMT_1(x → y, right-to-left); translating x^(n) gives the pairs (x^(n), y″^(n))
• Iteration 2: fine-tune to NMT_2(x → y, left-to-right) and NMT_2(x → y, right-to-left)

Page 43

Microsoft Human-Parity MT systems

First to reach human parity on the WMT-2017 test set

Achieving Human Parity on Automatic Chinese to English News Translation, Hany Hassan et al, https://arxiv.org/pdf/1803.05567.pdf

Page 44

Key techniques

[Chart: BLEU (%) per system]

• Transformer Baseline: 24.2
• Back Translation: 25.57
• Sogou (Ensemble): 26.38
• Dual Learning: 26.51
• Agreement Regularization: 26.91
• Deliberation Nets: 27.40
• Joint Training: 27.71
• System Combination: 28.46
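
The chart is easier to read as differences from the Transformer baseline. The short Python sketch below computes them from the values on the slide; it makes no claim about whether the techniques were applied independently or cumulatively, which the slide does not state.

```python
# BLEU (%) values as listed on the slide.
BLEU = {
    "Transformer Baseline": 24.2,
    "Back Translation": 25.57,
    "Sogou (Ensemble)": 26.38,
    "Dual Learning": 26.51,
    "Agreement Regularization": 26.91,
    "Deliberation Nets": 27.40,
    "Joint Training": 27.71,
    "System Combination": 28.46,
}

baseline = BLEU["Transformer Baseline"]
# Difference from the baseline, rounded to the chart's two decimals.
gains = {name: round(score - baseline, 2) for name, score in BLEU.items()}

for name, gain in sorted(gains.items(), key=lambda item: item[1]):
    print(f"{name:26s} +{gain:.2f}")
```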

Page 45

News sentence translation examples

• Sampled from WMT2017 Chinese-English task

Source input: 他 的 职业 生涯 如 过 山 车 一般 。
NMT output: It has been a rollercoaster ride .
Human reference: His career is like a roller coaster.

Source input: 有线索人士 请 拨打 旧金山 警察局 举报 电话 415-575-4444 。
NMT output: For clues, call the San Francisco Police Department at 415-575-4444.
Human reference: Anyone with information is asked to call the SFPD Tip Line at 415-575-4444 .

Source input: 霍夫 施泰特尔 表示 : " 这将由检察官来确定 " 。
NMT output: That 's what the prosecutor must determine , " said Hofstetter .
Human reference: Mr Hoff Steitel said: "It will be up to the prosecutors to determine.

Page 47

Sign language translation (in collaboration with the Chinese Academy of Sciences)

• Sentence: 父母生了我们三个孩子 (Our parents had the three of us children)
• Sign sequence: 父母 下 子女 三 (parents, down, children, three)
• Word sequence: 父母 生了 我们 三个 孩子 (parents, gave birth to, us, three, children)
• Other words shown: 长辈 (elders), 爸妈 (mom and dad), 晚辈 (younger generation), 孩子 (children), 三种 (three kinds)

Page 48

Neural machine translation

Chatbots

Reading comprehension

Creation

Page 49

[Figure: chatbot response generation]

• Inputs: user context, XiaoIce context, current user input
• Stages: read, memorize, distill, generate reply
• Additional signals: user profile, dialogue emotion
• Model components: Encoder, LSTM, Attention Model, Decoder

Page 51

Microsoft XiaoIce: launched in five countries (China, Japan, the US, India, and Indonesia)

• 2014: China, 小冰 (XiaoIce)
• 2015: Japan, りんな (Rinna)
• 2016: US, Zo
• 2017: India, Ruuh
• 2017: Indonesia, Rinna

Page 52

Dunhuang official-account customer service system (Dunhuang XiaoIce)

Zhao Yan, et al, DocChat: An Information Retrieval Approach for Chatbot Engines Using Unstructured Documents, ACL 2016

Page 53

Neural machine translation

Chatbots

Reading comprehension

Creation

Page 54

Machine reading comprehension

Read a document (passage) and then answer questions about it: Passage (P) + Question (Q) → Answer (A)

P: Tesla later approached Morgan to ask for more funds to build a more powerful transmitter. When asked where all the money had gone, Tesla responded by saying that he was affected by the Panic of 1901, which he (Morgan) had caused. Morgan was shocked by the reminder of his part in the stock market crash and by Tesla’s breach of contract by asking for more funds.

Q: On what did Tesla blame for the loss of the initial money?

A: Panic of 1901

Page 55

ImageNet style competition for machine reading comprehension

Best Resource Paper in EMNLP 2016

Each example consists of a query, a passage, and an answer.

Dataset                                 # of <question, passage> pairs
Training                                87,599
Dev                                     10,570
Test (not available to participants)    > 10K

Page 56

Progress in reading comprehension (exact answers)

Best System EM Scores on SQuAD Machine Reading Comprehension Dataset (Dec. 6, 2016-Jan. 26, 2018)

System                   Date          EM
MSRA                     2016.12.6     74.520
MSRA                     2017.1.20     75.860
MSRA                     2017.3.7      76.920
MSRA                     2017.7.2      77.688
iFLYTEK                  2017.7.25     77.845
Salesforce               2017.8.16     78.706
Microsoft Business AI    2017.9.20     78.842
MSRA                     2017.10.13    78.926
iFLYTEK/HIT              2017.10.17    79.083
AI2                      2017.11.17    81.003
MSRA                     2017.11.21    81.685
MSRA                     2017.12.18    82.136
MSRA                     2018.1.3      82.650
Alibaba iDST             2018.1.5      82.440
iFLYTEK/HIT              2018.1.22     82.5

Human EM Performance: 82.304

Surpass Human EM [2018.1.3]
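
The EM (exact match) figures count a prediction as correct only if it equals a reference answer after light normalization. The Python sketch below assumes the normalization commonly used for SQuAD-style evaluation (lowercasing, removing punctuation and the articles a/an/the, collapsing whitespace); the function names are hypothetical and this is not the official scoring script.

```python
import re
import string


def normalize_answer(text):
    """Lowercase, drop punctuation and English articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, references):
    """Return 1.0 if the prediction matches any reference after normalization."""
    normalized = normalize_answer(prediction)
    return float(any(normalized == normalize_answer(ref) for ref in references))


def em_score(predictions, references):
    """Average exact match over a dataset, as a percentage."""
    matches = [exact_match(p, refs) for p, refs in zip(predictions, references)]
    return 100.0 * sum(matches) / len(matches)


predictions = ["the Panic of 1901.", "Morgan"]
references = [["Panic of 1901"], ["Tesla"]]
print(em_score(predictions, references))
```

EM is strict by design: a prediction with one extra content word scores zero, which is why leaderboards usually report a token-overlap F1 alongside it.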

Page 58

Neural machine translation

Chatbots

Reading comprehension

Creation

Page 59

Our journey in computer-generated creative content

• 2005: couplets
• 2012: character riddles
• 2014: classical poetry
• 2016/2017: lyrics
• 2016/2017: composition/music
• 2016/2017: modern poetry

Page 60

Couplet generation: problem definition

FS: 海 (hai)   阔 (kuo)   凭 (ping)   鱼 (yu)    跃 (yue)
    sea        wide       allow       fish       jump
    |          |          |           |          |
SS: 天 (tian)  高 (gao)   任 (ren)    鸟 (niao)  飞 (fei)
    sky        high       permit      bird       fly

Page 61

Repetition of pronunciations (音韵联)

FS: 风吹荞动桥未动
SS: 水使舟流洲不流

风 (wind) -- 水 (water)
吹 (blow) -- 使 (make)
荞 (buckwheat) -- 舟 (ship)
动 (wave) -- 流 (go)
桥 (bridge) -- 洲 (island)
未 (not) -- 不 (not)
动 (wave) -- 流 (go)

Page 62

Decomposition of characters (拆字联)

FS: 有子有女方称好
SS: 缺鱼缺羊敢叫鲜

有 (have) -- 缺 (lack)
子 (son) -- 鱼 (fish)
有 (have) -- 缺 (lack)
女 (daughter) -- 羊 (mutton)
方 (so) -- 敢 (dare)
称 (call) -- 叫 (call)
好 (good) -- 鲜 (fresh)

• 好 = 女 + 子
• 鲜 = 鱼 + 羊

Page 63

Person name (人名联) and palindrome (回文联)

FS: 板桥造桥板
SS: 东坡居坡东

板桥 (Banqiao) -- 东坡 (Dongpo)
造 (produce) -- 居 (live)
桥 (bridge) -- 坡 (mountain)
板 (board) -- 东 (east)

• Banqiao (板桥) and Dongpo (东坡) are famous men of letters
• Each line reads the same from top to bottom as from bottom to top

Page 64

FS input → Phrase-based log-linear model → N-best candidates → Linguistic filters → Ranking SVM model → SS output

Page 65

FS: 海 阔 凭 鱼 跃 (sea wide allow fish jump)

Candidate characters and phrases:

• 山 (hill), 天 (sky)
• 高 (high), 深 (deep)
• 任 (permit), 倚 (depend)
• 虫 (insect), 鸟 (bird), 虎 (tiger)
• 飞 (fly), 舞 (dance), 鸣 (tweedle)
• 天 高 (sky high), 山 高 (hill high), 鸟 飞 (bird fly), 虎 啸 (tiger roar)

SMT decoding: 山高任鸟飞, 天高任鸟鸣, 天高任鸟飞, 山高靠虎啸, 山高任虎啸, 山深任鸟飞, 天高任花香, ……

Linguistic filtering: 山高任鸟飞, 天高任鸟鸣, 天高任鸟飞, 山深任鸟飞, 天高任花香, 天高任鸟舞, 山高任花香, ……

Reranking: 天高任鸟飞, 山高任鸟飞, 天高任鸟鸣, 天高任鸟舞, 山深任鸟飞, 山高任花香, 天高任花香, ……
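
The three stages (candidate generation, linguistic filtering, reranking) can be sketched in Python. Everything beyond the candidate characters on the slide is an assumption: the filters shown here (equal length, no character shared with the first sentence, matching repetition pattern) are plausible couplet constraints rather than the system's actual filter set, and the scoring weights are invented to stand in for the ranking model.

```python
from itertools import product


def repetition_pattern(sentence):
    """Map each character to the index of its first occurrence in the sentence."""
    return tuple(sentence.index(ch) for ch in sentence)


def passes_filters(first, second):
    """Hypothetical linguistic filters for a candidate second sentence."""
    return (
        len(first) == len(second)
        and not set(first) & set(second)
        and repetition_pattern(first) == repetition_pattern(second)
    )


def generate_candidates(options):
    """Enumerate every combination of per-position candidate characters."""
    return ["".join(chars) for chars in product(*options)]


# Invented weights standing in for a trained ranking model.
WEIGHTS = {"天高": 2.0, "山高": 1.0, "鸟飞": 2.0, "任": 0.5}


def score(candidate):
    """Sum the weights of the known substrings that occur in the candidate."""
    return sum(weight for piece, weight in WEIGHTS.items() if piece in candidate)


first_sentence = "海阔凭鱼跃"
options = [["山", "天"], ["高", "深"], ["任", "倚"], ["虫", "鸟", "虎"], ["飞", "舞", "鸣"]]

candidates = generate_candidates(options)
filtered = [c for c in candidates if passes_filters(first_sentence, c)]
ranked = sorted(filtered, key=score, reverse=True)
print(ranked[:3])
```

Filtering before reranking keeps the ranking model from spending capacity on candidates that break hard constraints.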

Page 66

Features

Mutual information (MI) score of a sentence $S = s_1 \dots s_n$:

$$\mathrm{MI}(S) = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} I(s_i, s_j) = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \log \frac{p(s_i, s_j)}{p(s_i)\, p(s_j)}$$
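
As a worked instance of the mutual information feature, the Python sketch below sums the pointwise mutual information over all character pairs (s_i, s_j) with i < j. The probabilities are hypothetical values chosen only to exercise the formula; in practice they would be estimated from a corpus.

```python
import math
from itertools import combinations


def mi_score(sentence, p_single, p_pair):
    """Sum log(p(s_i, s_j) / (p(s_i) * p(s_j))) over all pairs with i < j."""
    return sum(
        math.log(p_pair[(a, b)] / (p_single[a] * p_single[b]))
        for a, b in combinations(sentence, 2)
    )


# Hypothetical probabilities, not corpus estimates.
p_single = {"天": 0.1, "高": 0.2}
p_pair = {("天", "高"): 0.04}

# The pair co-occurs twice as often as independence predicts, so the score is log 2.
score = mi_score("天高", p_single, p_pair)
print(score)
```

A positive score means the characters co-occur more often than chance, which is the property the feature rewards in a candidate sentence.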

Page 69

http://duilian.msra.cn

http://video.sina.com.cn/v/b/10937201-1452530713.html

Page 70

Poem titles: 感归 (Thoughts on Returning), 春兴 (Spring Mood), 从军北征 (Marching North with the Army), 望洞庭 (Gazing at Lake Dongting)

Page 73

回忆 Memories

把你写过的日记 The diary you wrote

埋藏在我心底 Buried in my heart

写下我所有的记忆 Write down all my memories

把回忆留给自己 Leave the memories to myself

把你写在我心里 Write you up in my heart

写满了岁月的痕迹 Filled with the traces of years

不是因为我知道 Not because I know

让我想念你的微笑 Let me miss your smile

让我听见你的心跳 Let me hear your heartbeat

自从遇见你那一秒 Since the second I met you

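Page 74

《机智过人》 is a large-scale science challenge show jointly produced by China Central Television's general channel (CCTV-1) and the Chinese Academy of Sciences.

It is China's first science challenge show focused on intelligent technology. It is an in-depth collaboration between China's scientific and media communities, a showcase for the world's top artificial intelligence researchers and technology projects, and it marks a new height for the national strategy of "revitalizing the country through science and education".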

Page 76

Future research directions

• Personalized services through user profiling
• Insight into the mechanisms of AI through explainable learning
• Higher learning efficiency by combining knowledge with deep learning
• Domain adaptation through transfer learning
• Self-evolution through reinforcement learning
• Full use of unlabeled data through unsupervised learning

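Page 77

Over the next 5-10 years, NLP technology will mature

• Spoken-language machine translation becomes ubiquitous
• Natural language conversation (chat, question answering, dialogue) becomes practical
• Intelligent customer service combined with human agents greatly improves efficiency
• Automatic writing of poems, news, novels, and pop songs becomes popular
• NLP drives the adoption of voice assistants, the Internet of Things, smart hardware, and smart homes
• Together with other AI technologies, NLP is widely applied in vertical domains such as finance, law, education, and healthcare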
