Introduction to Natural Language Processing and Speech
Computer Science Research Practicum, Fall 2012
Andrew Rosenberg
Slide 3
Artificial Intelligence
AI is no longer a single subdiscipline in computer science. It spans:
Natural Language Processing
Speech/Spoken Language Processing
Robotics
Logic/Planning
Cognitive Radio
Machine Learning
Slide 4
Artificial Intelligence
What is intelligence?
How does computer science make intelligent tools, systems, and algorithms?
Does computer science theory contribute to the definition of intelligence?
Slide 5
Language and Speech
What is the relationship between language and intelligence/thought/cognition?
Slide 6
Language and Speech
Most people consider language to be the most direct access to cognition and thought.
Language is core to Artificial Intelligence.
Slide 7
Natural Language Processing
Information Retrieval (search)
Information Extraction
Knowledge Base Population
Summarization
Question Answering
Named Entity Recognition
Named Entity Linking, Co-reference Resolution
Parsing
Sentiment Analysis
Slide 8
Information Retrieval
Input: Query
Output: Relevant Documents
Simplest approach: identify every document that contains the word or words in the query.
What about related words? "run" is related to "running", "runs", and "marathon".
How do you rank for relevance?
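
A minimal sketch of the simplest approach above, assuming a toy three-document collection invented for illustration. It builds an inverted index, returns the documents that contain every query word, and ranks them with a basic TF-IDF score. The names `tokenize`, `build_index`, and `search` are hypothetical, and TF-IDF is just one common ranking choice, not something the slide prescribes.

```python
import math
from collections import Counter, defaultdict

# Toy collection, invented for illustration.
DOCS = {
    "d1": "she runs a marathon every spring",
    "d2": "the marathon route runs along the river and the marathon ends downtown",
    "d3": "he is running late for the train",
}

def tokenize(text):
    """Lowercase and split on whitespace (no stemming, on purpose)."""
    return text.lower().split()

def build_index(docs):
    """Map each word to a Counter of {doc_id: term frequency}."""
    index = defaultdict(Counter)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word][doc_id] += 1
    return index

def search(query, index, n_docs):
    """Return ids of documents containing every query word, best score first."""
    words = tokenize(query)
    if not words or any(w not in index for w in words):
        return []
    matches = set.intersection(*(set(index[w]) for w in words))
    scores = {
        doc_id: sum(
            index[w][doc_id] * math.log(n_docs / len(index[w])) for w in words
        )
        for doc_id in matches
    }
    # Sort by descending score; break ties by document id.
    return sorted(matches, key=lambda d: (-scores[d], d))

INDEX = build_index(DOCS)
print(search("marathon", INDEX, len(DOCS)))
print(search("run", INDEX, len(DOCS)))
```

The exact-match query "run" returns nothing even though "runs" and "running" occur in the collection, which is the related-words problem raised above.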
Slide 9
Information Extraction
Identify specific information from a single document or set of documents.
Who works for what organization?
Who was born when? Died when?
Who did what to whom?
This is *very* complex. Domain-specific systems are developed.
How many different ways are there to say the same thing?
Slide 10
Named Entity Recognition and Linking
"Bo Obama is Fat. POTUS says so."
"The President called his dog fat."
"Mr. Obama, speaking to an interviewer, said that The White House dog needs to go on a diet."
Can you recognize that Bo Obama, POTUS, The President, Mr. Obama, and The White House are all ENTITIES?
How do you recognize that POTUS, The President, Mr. Obama, and him all refer to the same person?
Slide 11
Parsing
Understanding grammatical structure from text.
An important step in some relation extraction, question answering, etc.
Slide 12
Sentiment Analysis
Can you tell the difference between a positive review and a negative one?
Some reviews come with labels.
Some labels have no reviews.
Some reviews have no stars.
Slide 13
Spoken Language Processing
Automatic Speech Recognition
Rich Transcription
Speaker Recognition
Speech Synthesis
Text Normalization
Discourse and Dialog
Turn taking
Emotion Recognition
Slide 14
Speech Recognition
Converting speech to text.
Acoustic Modeling: speech to phoneme.
Pronunciation Modeling: how are words pronounced?
Language Modeling: what sequences of words are most common?
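
To make the language modeling question concrete, here is a minimal sketch of a bigram model, assuming a three-sentence corpus invented for illustration. It estimates how likely a word is given the previous word by simple counting (maximum likelihood, no smoothing). The names `train_bigrams` and `bigram_prob` are hypothetical; real recognizers use far larger corpora and smoothed or neural models.

```python
from collections import Counter, defaultdict

# Tiny corpus, invented for illustration.
SENTENCES = [
    "you can catch the orange line",
    "you can transfer at park street",
    "you can catch the red line",
]

def train_bigrams(sentences):
    """Count how often each word follows each previous word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # <s> and </s> mark the sentence start and end.
        words = ["<s>"] + sentence.split() + ["</s>"]
        for prev, word in zip(words, words[1:]):
            counts[prev][word] += 1
    return counts

def bigram_prob(counts, prev, word):
    """Maximum-likelihood estimate of P(word | prev); 0.0 if prev is unseen."""
    total = sum(counts[prev].values())
    return counts[prev][word] / total if total else 0.0

COUNTS = train_bigrams(SENTENCES)
print(bigram_prob(COUNTS, "can", "catch"))
print(bigram_prob(COUNTS, "the", "orange"))
```

In this corpus "catch" follows "can" in two of three cases, so the model prefers "can catch" over "can transfer" when the acoustics are ambiguous.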
Slide 15
Rich Transcription
ALSO FROM NORTH STATION I THINK THE ORANGE LINE RUNS BY THERE TOO SO
YOU CAN ALSO CATCH THE ORANGE LINE AND THEN INSTEAD OF TRANSFERRING UM
I YOU KNOW THE MAP IS REALLY OBVIOUS ABOUT THIS BUT INSTEAD OF
TRANSFERRING AT PARK STREET YOU CAN TRANSFER AT UH WHATS THE STATION
NAME DOWNTOWN CROSSING UM AND THATLL GET YOU BACK TO THE RED LINE JUST
AS EASILY
Slide 16
Rich Transcription
Also, from the North Station... (I think the Orange Line runs by there
too so you can also catch the Orange Line...) And then instead of
transferring (um I- you know, the map is really obvious about this
but) Instead of transferring at Park Street, you can transfer at (uh
what's the station name) Downtown Crossing and (um) that'll get you
back to the Red Line just as easily.
Slide 17
Speaker/Author Recognition
What makes one speaker or author distinguishable from another?
Email hacks, chat transcripts, anonymous authors.
What are the acoustics that distinguish two speakers?
Spectral qualities
Prosodic qualities
Lexical, syntactic, and content usage
Slide 18
Speech Synthesis
Generating speech from text.
There are tools like Festival, HTS, and Mary TTS that make this relatively easy.
Unit Selection: use a corpus of a single speaker and paste together small slices of speech to make new words.
Watson: http://www.youtube.com/watch?v=WFR3lOm_xhE
Parametric Synthesis: learn the spectral shape of different speech sounds, and synthesize them from oscillators and additive noise.
Mary TTS Web client: http://mary.dfki.de:59125/
Slide 19
Discourse and Dialog
How do you accomplish some task through discourse?
Understanding the semantics of a user turn.
Generating an appropriate prompt.
Dialog/task planning.
Semantic frame filling.
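
As a rough illustration of semantic frame filling, the sketch below assumes a transit-directions task with two slots, `origin` and `destination`, and a short list of known station names. Everything here (the slot names, `STATIONS`, `PROMPTS`, `fill_frame`, `next_prompt`) is hypothetical. Real dialog systems use trained language-understanding components rather than substring matching.

```python
# Hypothetical task: the frame is complete once both slots are filled.
SLOTS = ("origin", "destination")
STATIONS = ["north station", "park street", "downtown crossing"]
PROMPTS = {
    "origin": "Where are you leaving from?",
    "destination": "Where would you like to go?",
}

def fill_frame(frame, utterance):
    """Copy any slot values found in the user turn into the frame."""
    text = utterance.lower()
    for station in STATIONS:
        # The preposition before the station name decides the slot.
        if f"from {station}" in text:
            frame["origin"] = station
        if f"to {station}" in text:
            frame["destination"] = station
    return frame

def next_prompt(frame):
    """Ask for the first missing slot, or confirm once the frame is full."""
    for slot in SLOTS:
        if slot not in frame:
            return PROMPTS[slot]
    return f"Finding a route from {frame['origin']} to {frame['destination']}."

frame = {}
for turn in ["I want to go to Park Street", "from North Station"]:
    fill_frame(frame, turn)
    print(next_prompt(frame))
```

Each user turn fills whatever slots it can, and the system's next prompt is driven by what is still missing from the frame.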
Slide 20
Emotion Recognition
What are the acoustic properties of emotion expression?
Loudness, speaking rate, pitch, hesitation, etc.
This type of analysis can extend to other speaker states:
Intoxication, Sleepiness, Age, Gender, Personality Factors, Deception
Three Hundred Twelve. Three Thousand Twelve.
Slide 21
Corpus Analysis
A corpus is a body of linguistic material.
Corpora (plural of corpus) are generally shared across research groups.
This allows for reproducible findings and division of labor.
Describing phenomena is an important first step in most research.
What is the distribution of ratings?
What are the correlations between features and labels?
Are there errors in the annotation?
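
The first two descriptive questions above can be sketched in a few lines, assuming a tiny corpus of star-rated reviews invented for illustration. The feature used here (how often one word occurs in a review) and the names `rating_distribution`, `pearson`, and `count_word` are hypothetical choices. Pearson correlation is one of several reasonable measures.

```python
import math
from collections import Counter

# Hypothetical annotated corpus: (review text, star rating).
CORPUS = [
    ("great phone , great battery", 5),
    ("good value", 4),
    ("it works", 3),
    ("bad screen and bad sound", 1),
    ("great camera but bad battery", 3),
]

def rating_distribution(corpus):
    """How many reviews received each star rating?"""
    return Counter(rating for _, rating in corpus)

def pearson(xs, ys):
    """Pearson correlation; assumes neither list is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def count_word(text, word):
    """Feature: number of times a word occurs in a review."""
    return text.split().count(word)

ratings = [rating for _, rating in CORPUS]
print(rating_distribution(CORPUS))
for word in ("great", "bad"):
    feature = [count_word(text, word) for text, _ in CORPUS]
    print(word, round(pearson(feature, ratings), 3))
```

On this toy data the count of "great" correlates positively with the rating and the count of "bad" correlates negatively, which is the kind of pattern worth checking before training any model.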
Slide 22
Some famous corpora
Penn Treebank: parse trees and part of speech
ACE and KBP: information extraction
Switchboard: conversational telephone speech
TIMIT: phonetic transcription
Boston Radio News Corpus: prosodic annotation
Slide 23
The standard approach
Identify labeled training data.
Decide what to label. What is a data point?
Extract features based on the entity.
Train a supervised classifier (machine learning).
Evaluate: cross-validation or a held-out test set.
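
The steps above can be sketched end to end with the standard library alone. The sketch assumes eight hand-made data points whose two features are counts of positive and negative words per review. The data, the nearest-centroid classifier, and the names `train_centroids`, `predict`, and `cross_validate` are illustrative choices, not a method the slides specify.

```python
# Hypothetical labeled data: (features, label). The features are
# [count of positive words, count of negative words] per review.
DATA = [
    ([3, 0], "pos"), ([2, 1], "pos"), ([4, 1], "pos"), ([3, 1], "pos"),
    ([0, 3], "neg"), ([1, 2], "neg"), ([1, 4], "neg"), ([0, 2], "neg"),
]

def train_centroids(examples):
    """Nearest-centroid classifier: average the feature vectors per label."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(centroids, features):
    """Return the label whose centroid is closest to the feature vector."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

def cross_validate(data, k):
    """Mean accuracy over k folds; fold i holds out examples i, i+k, i+2k, ..."""
    accuracies = []
    for i in range(k):
        test = data[i::k]
        train = [ex for j, ex in enumerate(data) if j % k != i]
        centroids = train_centroids(train)
        correct = sum(predict(centroids, f) == y for f, y in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / k

print(cross_validate(DATA, k=4))
```

Because every example is held out exactly once, the reported accuracy is never measured on data the classifier was trained on. That separation is the point of cross-validation or a held-out test set.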
Slide 24
How does machine learning fit in?
Automatically identifying patterns in data.
Automatically making decisions based on data.
Hypothesis:
Data    Learning Algorithm    Behavior
Data    Programmer or Expert    Behavior
Slide 25
Challenges
Conversational text
Social media: Facebook, Twitter, reddit
Email
Chat/IM
Spoken Dialog Systems
Text Dialog Systems
Sentiment Analysis
Reviews
Collaborative Filtering
Natural Language Generation
Slide 26
Publicly available web data
Social media: Twitter, Google Plus, forums, etc.
Reviews: Amazon, TripAdvisor, etc.
Wikipedia: find missing links in Wikipedia; find potentially incorrect information in Wikipedia.
YouTube videos, SoundCloud songs: can you classify topics? Music genres?
Slide 27
Use of web technologies
The feedback loop: the use of the tool provides information that can be used to improve the tool.
The use of the product provides training data:
Which search results are best?
Which ads are useful?
Which recommendations are correct?
Slide 28
Feedback in Google
Rank the top hits in response to a query.
When someone clicks on a link, boost its ranking/relevance.
Same for ads.
UI/UX experiments.
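
The click-feedback idea can be sketched as a ranker that adds a small boost per click to an initial relevance score. This is only an illustration of the loop described above, not how any real search engine works. The class `ClickRanker`, the example URLs, the base scores, and the 0.1 boost are all assumptions.

```python
from collections import defaultdict

class ClickRanker:
    """Rank results by a base relevance score plus a click-based boost."""

    def __init__(self, base_scores, boost=0.1):
        self.base_scores = dict(base_scores)  # hypothetical initial relevance
        self.boost = boost                    # assumed per-click increment
        self.clicks = defaultdict(int)

    def record_click(self, url):
        """Feedback: a user clicked this result."""
        self.clicks[url] += 1

    def score(self, url):
        return self.base_scores[url] + self.boost * self.clicks[url]

    def ranked(self):
        """Results from highest to lowest score; ties broken by URL."""
        return sorted(self.base_scores, key=lambda u: (-self.score(u), u))

ranker = ClickRanker({"a.example": 0.9, "b.example": 0.8, "c.example": 0.5})
print(ranker.ranked())
ranker.record_click("b.example")
ranker.record_click("b.example")
print(ranker.ranked())
```

After two clicks the second result overtakes the first, so ordinary use of the tool has changed what the next user sees.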
Slide 29
Feedback in Amazon
Try to give users an offer. If they take it, increase its value.
Slide 30
Feedback in Netflix
Suggestions for people like you.
How do you group people?
How do you group movies?
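
One simple answer to "how do you group people" is to compare users by the ratings they have given. The sketch below assumes three users and a handful of ratings invented for illustration. It uses cosine similarity to find the most similar user and suggests that user's movies. The names `cosine`, `most_similar`, and `suggest` are hypothetical, and production recommenders are considerably more elaborate.

```python
import math

# Hypothetical ratings: user -> {movie: stars}.
RATINGS = {
    "ann":  {"Alien": 5, "Heat": 4, "Up": 1},
    "bob":  {"Alien": 4, "Heat": 5, "Brazil": 5},
    "cara": {"Up": 5, "Alien": 1, "Frozen": 4},
}

def cosine(a, b):
    """Cosine similarity of two rating dicts; unrated movies count as 0."""
    dot = sum(a[m] * b[m] for m in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if dot else 0.0

def most_similar(user, ratings):
    """The other user whose ratings look most like this user's."""
    others = [u for u in ratings if u != user]
    return max(others, key=lambda u: cosine(ratings[user], ratings[u]))

def suggest(user, ratings):
    """Movies the most similar user rated that this user has not seen."""
    neighbor = ratings[most_similar(user, ratings)]
    unseen = {m: s for m, s in neighbor.items() if m not in ratings[user]}
    return sorted(unseen, key=lambda m: (-unseen[m], m))

print(most_similar("ann", RATINGS), suggest("ann", RATINGS))
```

Grouping movies works the same way with the table transposed: two movies are similar when the same users rate them alike.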
Slide 31
Project ideas
Look at the most recent conferences in NLP and Speech:
ICASSP, Interspeech, ASRU
ACL, EMNLP, NAACL-HLT, COLING
Also, journals:
Computational Linguistics
Computer Speech and Language
IEEE Transactions on Audio, Speech, and Language Processing
Consider real-world problems and applications.