School of Data Science, Fudan University
DATA130006 Text Management and Analysis
Sentiment Analysis
Zhongyu Wei (魏忠钰)
October 11th, 2017
Adapted from Stanford U124
Outline
§ What is sentiment analysis?
Positive or negative movie review?
§ unbelievably disappointing
§ Full of zany characters and richly applied satire, and some great plot twists
§ this is the greatest screwball comedy ever filmed
§ It was pathetic. The worst part about it was the boxing scenes.
Google Product Search
Bing Shopping
Twitter sentiment versus Gallup Poll of Consumer Confidence
Brendan O'Connor, Ramnath Balasubramanyan, Bryan R. Routledge, and Noah A. Smith. 2010. From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series. In ICWSM-2010.
Twitter sentiment:
Johan Bollen, Huina Mao, Xiaojun Zeng. 2011. Twitter mood predicts the stock market. Journal of Computational Science 2:1, 1-8. 10.1016/j.jocs.2010.12.007.
Target Sentiment on Twitter
§ Twitter Sentiment App
§ Alec Go, Richa Bhayani, Lei Huang. 2009. Twitter Sentiment Classification using Distant Supervision
Sentiment analysis has many other names
§ Opinion extraction
§ Opinion mining
§ Sentiment mining
§ Subjectivity analysis
Why sentiment analysis?
§ Movie: is this review positive or negative?
§ Products: what do people think about the new iPhone?
§ Public sentiment: how is consumer confidence? Is despair increasing?
§ Politics: what do people think about this candidate or issue?
§ Prediction: predict election outcomes or market trends from sentiment
Scherer Typology of Affective States
§ Emotion: brief organically synchronized … evaluation of a major event
  § angry, sad, joyful, fearful, ashamed, proud, elated
§ Mood: diffuse non-caused low-intensity long-duration change in subjective feeling
  § cheerful, gloomy, irritable, listless, depressed, buoyant
§ Interpersonal stances: affective stance toward another person in a specific interaction
  § friendly, flirtatious, distant, cold, warm, supportive, contemptuous
§ Attitudes: enduring, affectively colored beliefs, dispositions towards objects or persons
  § liking, loving, hating, valuing, desiring
§ Personality traits: stable personality dispositions and typical behavior tendencies
  § nervous, anxious, reckless, morose, hostile, jealous
Sentiment Analysis
§ Sentiment analysis is the detection of attitudes: "enduring, affectively colored beliefs, dispositions towards objects or persons"
1. Holder (source) of attitude
2. Target (aspect) of attitude
3. Type of attitude
  § From a set of types
    § Like, love, hate, value, desire, etc.
  § Or (more commonly) simple weighted polarity:
    § positive, negative, neutral, together with strength
4. Text containing the attitude
  § Sentence or entire document
Sentiment Analysis
§ Simplest task:
  § Is the attitude of this text positive or negative?
§ More complex:
  § Rank the attitude of this text from 1 to 5
§ Advanced:
  § Detect the target, source, or complex attitude types
Outline
§ What is sentiment analysis?
§ A Baseline Algorithm
Sentiment Classification in Movie Reviews
Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP-2002, 79-86.
Bo Pang and Lillian Lee. 2004. A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts. ACL, 271-278.
§ Polarity detection:
  § Is an IMDB movie review positive or negative?
§ Data: Polarity Data 2.0:
  § http://www.cs.cornell.edu/people/pabo/movie-review-data
IMDB data in the Pang and Lee database
✓ (positive review)
when _star wars_ came out some twenty years ago, the image of traveling throughout the stars has become a commonplace image. […]
when han solo goes light speed, the stars change to bright lines, going towards the viewer in lines that converge at an invisible point.
cool.
_october sky_ offers a much simpler image – that of a single white dot, traveling horizontally across the night sky. [...]

✗ (negative review)
"snake eyes" is the most aggravating kind of movie: the kind that shows so much potential then becomes unbelievably disappointing. it's not just because this is a brian depalma film, and since he's a great director and one who's films are always greeted with at least some fanfare. and it's not even because this was a film starring nicolas cage and since he gives a brauvara performance, this film is hardly worth his talents.
Baseline Algorithm (adapted from Pang and Lee)
§ Tokenization
§ Feature Extraction
§ Classification using different classifiers
  § Naïve Bayes
  § MaxEnt
  § SVM
Sentiment Tokenization Issues
§ Deal with HTML and XML markup
§ Twitter mark-up (names, hash tags)
§ Capitalization (preserve for words in all caps)
§ Phone numbers, dates
§ Emoticons
§ Useful code:
  § Christopher Potts sentiment tokenizer
  § Brendan O'Connor twitter tokenizer

Potts emoticons:
[<>]?                       # optional hat/brow
[:;=8]                      # eyes
[\-o\*\']?                  # optional nose
[\)\]\(\[dDpP/\:\}\{@\|\\]  # mouth
|                           #### reverse orientation
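The emoticon pattern above can be tried out with Python's `re` module in verbose mode. This is a minimal sketch rather than the tokenizer itself: the slide elides the reverse-orientation branch, so the mirrored half below is an assumption (the same character classes in the opposite order), and `find_emoticons` is a hypothetical helper name.

```python
import re

# Emoticon pattern in the style shown on the slide.
# The second alternative (reverse orientation) is an assumed mirror image
# of the first, since the slide does not spell it out.
EMOTICON = re.compile(r"""
    (?:
      [<>]?                        # optional hat/brow
      [:;=8]                       # eyes
      [\-o\*\']?                   # optional nose
      [\)\]\(\[dDpP/\:\}\{@\|\\]   # mouth
      |                            # reverse orientation
      [\)\]\(\[dDpP/\:\}\{@\|\\]   # mouth
      [\-o\*\']?                   # optional nose
      [:;=8]                       # eyes
      [<>]?                        # optional hat/brow
    )""", re.VERBOSE)

def find_emoticons(text):
    """Return every emoticon-like substring found in text, left to right."""
    return EMOTICON.findall(text)

print(find_emoticons("great movie :-) but the ending :("))  # [':-)', ':(']
```

Because the pattern is not anchored to token boundaries, it can also fire inside ordinary words or numbers; a full tokenizer would combine it with patterns for words, URLs, and so on.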
Extracting Features for Sentiment Classification
§ How to handle negation
  § I didn't like this movie
  vs
  § I really like this movie
§ Which words to use?
  § Only adjectives
  § All words
    § All words turns out to work better, at least on this data
Negation
Das, Sanjiv and Mike Chen. 2001. Yahoo! for Amazon: Extracting market sentiment from stock message boards. In Proceedings of the Asia Pacific Finance Association Annual Conference (APFA).
Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP-2002, 79-86.
Add NOT_ to every word between negation and following punctuation:
didn’t like this movie , but I
didn’t NOT_like NOT_this NOT_movie but I
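The NOT_ marking rule can be sketched in a few lines of Python. The list of negation words and the punctuation set below are assumptions chosen for illustration (the slide does not enumerate them), and `mark_negation` is a hypothetical helper name. Unlike the slide's example, this sketch keeps the punctuation token in the output.

```python
import re

# Assumed negation cues: "not", "no", "never", and any word ending in n't.
NEGATION = re.compile(r"^(?:not|no|never|\w+n['’]t)$", re.IGNORECASE)
# Assumed clause-ending punctuation that closes the negation scope.
PUNCT = re.compile(r"^[.,:;!?]$")

def mark_negation(tokens):
    """Prefix NOT_ to every token between a negation word and the next punctuation."""
    out, negated = [], False
    for tok in tokens:
        if PUNCT.match(tok):
            negated = False          # punctuation ends the negated span
            out.append(tok)
        elif negated:
            out.append("NOT_" + tok)
        else:
            out.append(tok)
            if NEGATION.match(tok):
                negated = True       # start marking from the next token
    return out

print(mark_negation("didn't like this movie , but I".split()))
# ["didn't", 'NOT_like', 'NOT_this', 'NOT_movie', ',', 'but', 'I']
```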
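Reminder: Naïve Bayes
$$\hat{P}(w \mid c) = \frac{\mathrm{count}(w,c) + 1}{\mathrm{count}(c) + |V|}$$
$$c_{NB} = \operatorname*{argmax}_{c_j \in C} \; P(c_j) \prod_{i \in \mathrm{positions}} P(w_i \mid c_j)$$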
Binarized (Boolean feature) Multinomial Naïve Bayes
§ Intuition:
  § For sentiment (and probably for other text classification domains)
  § Word occurrence may matter more than word frequency
    § The occurrence of the word fantastic tells us a lot
    § The fact that it occurs 5 times may not tell us much more.
§ Boolean Multinomial Naïve Bayes
  § Clips all the word counts in each document at 1
Boolean Multinomial Naïve Bayes: Learning
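§ From training corpus, extract Vocabulary
§ Calculate P(c_j) terms
  § For each c_j in C do
    docs_j ← all docs with class = c_j
    P(c_j) ← |docs_j| / |total # of documents|
§ Calculate P(w_k | c_j) terms
  § Remove duplicates in each doc:
    § For each word type w in doc_j
      § Retain only a single instance of w
  • Text_j ← single doc containing all docs_j
  • For each word w_k in Vocabulary
    n_k ← # of occurrences of w_k in Text_j
    $$P(w_k \mid c_j) \leftarrow \frac{n_k + \alpha}{n + \alpha\,|\mathrm{Vocabulary}|}$$
    (n is the total number of word tokens in Text_j)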
Boolean Multinomial Naïve Bayes on a test document d
§ First remove all duplicate words from d
§ Then compute NB using the same equation:
$$c_{NB} = \operatorname*{argmax}_{c_j \in C} \; P(c_j) \prod_{i \in \mathrm{positions}} P(w_i \mid c_j)$$
Normal vs. Boolean Multinomial NB
Normal
            Doc  Words                                Class
Training    1    Chinese Beijing Chinese              c
            2    Chinese Chinese Shanghai             c
            3    Chinese Macao                        c
            4    Tokyo Japan Chinese                  j
Test        5    Chinese Chinese Chinese Tokyo Japan  ?

Boolean
            Doc  Words                                Class
Training    1    Chinese Beijing                      c
            2    Chinese Shanghai                     c
            3    Chinese Macao                        c
            4    Tokyo Japan Chinese                  j
Test        5    Chinese Tokyo Japan                  ?
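The two tables can be worked through with a small multinomial Naïve Bayes sketch. It assumes add-one smoothing (alpha = 1) as in the reminder slide and works in log space; the function names `train_nb` and `classify` are hypothetical, not from the lecture.

```python
import math
from collections import Counter

def train_nb(docs, boolean=False, alpha=1.0):
    """docs: list of (tokens, label). Returns (log priors, log likelihoods, vocabulary)."""
    vocab = {w for tokens, _ in docs for w in tokens}
    labels = Counter(label for _, label in docs)
    counts = {c: Counter() for c in labels}
    for tokens, label in docs:
        # Boolean variant: clip each word's count at 1 within a document.
        counts[label].update(set(tokens) if boolean else tokens)
    log_prior = {c: math.log(n / len(docs)) for c, n in labels.items()}
    log_lik = {}
    for c in labels:
        total = sum(counts[c].values())
        log_lik[c] = {w: math.log((counts[c][w] + alpha) / (total + alpha * len(vocab)))
                      for w in vocab}
    return log_prior, log_lik, vocab

def classify(tokens, model, boolean=False):
    """Return the class with the highest log posterior score."""
    log_prior, log_lik, vocab = model
    if boolean:
        tokens = set(tokens)              # remove duplicates from the test document
    tokens = [w for w in tokens if w in vocab]
    scores = {c: log_prior[c] + sum(log_lik[c][w] for w in tokens) for c in log_prior}
    return max(scores, key=scores.get)

train = [("Chinese Beijing Chinese".split(), "c"),
         ("Chinese Chinese Shanghai".split(), "c"),
         ("Chinese Macao".split(), "c"),
         ("Tokyo Japan Chinese".split(), "j")]
test = "Chinese Chinese Chinese Tokyo Japan".split()

print(classify(test, train_nb(train)))                              # c
print(classify(test, train_nb(train, boolean=True), boolean=True))  # j
```

On this toy data the two variants happen to disagree: with full counts the three occurrences of Chinese pull the document toward class c, while after clipping Tokyo and Japan carry relatively more weight and class j wins.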
Binarized (Boolean feature) Multinomial Naïve Bayes
§ Binary seems to work better than full word counts
§ Other possibility: log(freq(w))
B. Pang, L. Lee, and S. Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP-2002, 79-86.
V. Metsis, I. Androutsopoulos, G. Paliouras. 2006. Spam Filtering with Naive Bayes – Which Naive Bayes? CEAS 2006, Third Conference on Email and Anti-Spam.
K.-M. Schneider. 2004. On word frequency information and negative evidence in Naive Bayes text classification. ICANLP, 474-485.
J.D. Rennie, L. Shih, J. Teevan. 2003. Tackling the poor assumptions of naive bayes text classifiers. ICML 2003.
Cross-Validation
§ Break up data into 10 folds
  § (Equal positive and negative inside each fold?)
§ For each fold
  § Choose the fold as a temporary test set
  § Train on 9 folds, compute performance on the test fold
§ Report average performance of the 10 runs
[Figure: cross-validation iterations 1 to 5; in each iteration a different fold is the Test set and the remaining folds are Training data.]
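The cross-validation loop can be sketched with the standard library alone. The names `k_fold_indices`, `cross_validate`, `train_fn`, and `eval_fn` are hypothetical placeholders for whatever classifier and metric are being tested. This sketch shuffles and splits without stratifying, so it does not guarantee equal positive and negative counts inside each fold.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(examples, train_fn, eval_fn, k=10, seed=0):
    """Train on k-1 folds, evaluate on the held-out fold, and average the k scores."""
    folds = k_fold_indices(len(examples), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        model = train_fn([examples[j] for j in train_idx])
        scores.append(eval_fn(model, [examples[j] for j in test_idx]))
    return sum(scores) / len(scores)
```

Here `train_fn` takes a list of training examples and returns a model, and `eval_fn` takes a model and a list of test examples and returns a number such as accuracy.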
Thwarted Expectations and Ordering Effects
§ "This film should be brilliant. It sounds like a great plot, the actors are first grade, and the supporting cast is good as well, and Stallone is attempting to deliver a good performance. However, it can't hold up."
§ Well as usual Keanu Reeves is nothing special, but surprisingly, the very talented Laurence Fishbourne is not so good either, I was surprised.
Outline
§ What is sentiment analysis?
§ A Baseline Algorithm
§ Sentiment Lexicons
The General Inquirer
Philip J. Stone, Dexter C Dunphy, Marshall S. Smith, Daniel M. Ogilvie. 1966. The General Inquirer: A Computer Approach to Content Analysis. MIT Press.
§ Homepage: http://www.wjh.harvard.edu/~inquirer
§ List of Categories: http://www.wjh.harvard.edu/~inquirer/homecat.htm
§ Spreadsheet: http://www.wjh.harvard.edu/~inquirer/inquirerbasic.xls
§ Categories:
  § Positiv (1915 words) and Negativ (2291 words)
  § Strong vs Weak, Active vs Passive, Overstated versus Understated
  § Pleasure, Pain, Virtue, Vice, Motivation, Cognitive Orientation, etc.
§ Free for Research Use
LIWC (Linguistic Inquiry and Word Count)
Pennebaker, J.W., Booth, R.J., & Francis, M.E. (2007). Linguistic Inquiry and Word Count: LIWC 2007. Austin, TX.
§ Homepage: http://www.liwc.net/
§ 2300 words, >70 classes
§ Affective Processes
  § negative emotion (bad, weird, hate, problem, tough)
  § positive emotion (love, nice, sweet)
§ Cognitive Processes
  § Tentative (maybe, perhaps, guess)
§ Pronouns, Negation (no, never), Quantifiers (few, many)
§ Not free
MPQA Subjectivity Cues Lexicon
Theresa Wilson, Janyce Wiebe, and Paul Hoffmann (2005). Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis. Proc. of HLT-EMNLP-2005.
Riloff and Wiebe (2003). Learning extraction patterns for subjective expressions. EMNLP-2003.
§ Homepage: http://www.cs.pitt.edu/mpqa/subj_lexicon.html
§ 6885 words from 8221 lemmas
  § 2718 positive
  § 4912 negative
§ Each word annotated for intensity (strong, weak)
§ GNU GPL
Bing Liu Opinion Lexicon
Minqing Hu and Bing Liu. Mining and Summarizing Customer Reviews. ACM SIGKDD-2004.
• Bing Liu's Page on Opinion Mining
• http://www.cs.uic.edu/~liub/FBS/opinion-lexicon-English.rar
• 6786 words
  • 2006 positive
  • 4783 negative
SentiWordNet
Stefano Baccianella, Andrea Esuli, and Fabrizio Sebastiani. 2010. SENTIWORDNET 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining. LREC-2010.
§ Homepage: http://sentiwordnet.isti.cnr.it/
§ All WordNet synsets automatically annotated for degrees of positivity, negativity, and neutrality/objectiveness
§ [estimable(J,3)] "may be computed or estimated"
  Pos 0  Neg 0  Obj 1
§ [estimable(J,1)] "deserving of respect or high regard"
  Pos .75  Neg 0  Obj .25
Disagreements between polarity lexicons
Christopher Potts, Sentiment Tutorial, 2011

                  Opinion Lexicon  General Inquirer  SentiWordNet     LIWC
MPQA              33/5402 (0.6%)   49/2867 (2%)      1127/4214 (27%)  12/363 (3%)
Opinion Lexicon   -                32/2411 (1%)      1004/3994 (25%)  9/403 (2%)
General Inquirer  -                -                 520/2306 (23%)   1/204 (0.5%)
SentiWordNet      -                -                 -                174/694 (25%)
Analyzing the polarity of each word in IMDB
Potts, Christopher. 2011. On the negativity of negation. SALT 20, 636-659.
§ How likely is each word to appear in each sentiment class?
  § Count("bad") in 1-star, 2-star, 3-star, etc.
  § But can't use raw counts:
  § Instead, likelihood:
$$P(w \mid c) = \frac{f(w,c)}{\sum_{w \in c} f(w,c)}$$
§ Make them comparable between words
  § Scaled likelihood:
$$\frac{P(w \mid c)}{P(w)}$$
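A small sketch of the likelihood and scaled likelihood computations, using made-up counts. The slide does not define P(w); this sketch assumes it is the word's relative frequency over all classes pooled together. The function name `scaled_likelihood` and the toy counts are hypothetical.

```python
from collections import Counter

def scaled_likelihood(counts):
    """counts: dict mapping class -> Counter of word frequencies f(w, c).

    Returns dict class -> {word: P(w|c) / P(w)}.
    """
    class_totals = {c: sum(f.values()) for c, f in counts.items()}
    grand_total = sum(class_totals.values())
    word_totals = Counter()
    for f in counts.values():
        word_totals.update(f)
    result = {}
    for c, f in counts.items():
        result[c] = {}
        for w, n in word_totals.items():
            p_w_given_c = f[w] / class_totals[c]   # likelihood P(w|c)
            p_w = n / grand_total                  # assumed definition of P(w)
            result[c][w] = p_w_given_c / p_w
    return result

# Toy counts for a 1-star class and a 10-star class (illustrative numbers only).
toy = {1: Counter(bad=8, good=2), 10: Counter(bad=1, good=9)}
scores = scaled_likelihood(toy)
print(scores[1]["bad"] > scores[10]["bad"])  # True
```

Values above 1 mean the word is more frequent in that class than in the corpus overall, which is what makes the curves comparable between frequent and rare words.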
Analyzing the polarity of each word in IMDB
Potts, Christopher. 2011. On the negativity of negation. SALT 20, 636-659.
[Figure: one panel per word, plotted against Rating (1 to 10); y-axis labeled Pr(c|w) and annotated as scaled likelihood P(w|c)/P(w). POS panels: good (883,417 tokens), amazing (103,509 tokens), great (648,110 tokens), awesome (47,142 tokens). NEG panels: good (20,447 tokens), depress(ed/ing) (18,498 tokens), bad (368,273 tokens), terrible (55,492 tokens).]
Other sentiment feature: Logical negation
Potts, Christopher. 2011. On the negativity of negation. SALT 20, 636-659.
§ Is logical negation (no, not) associated with negative sentiment?
§ Potts experiment:
  § Count negation (not, n't, no, never) in online reviews
  § Regress against the review rating
Outline
§ What is sentiment analysis?
§ A Baseline Algorithm
§ Sentiment Lexicons
§ Learning Sentiment Lexicons
Semi-supervised learning of lexicons
§ Use a small amount of information
  § A few labeled examples
  § A few hand-built patterns
§ To bootstrap a lexicon
Hatzivassiloglou and McKeown intuition for identifying word polarity
Vasileios Hatzivassiloglou and Kathleen R. McKeown. 1997. Predicting the Semantic Orientation of Adjectives. ACL, 174–181.
§ Adjectives conjoined by "and" have same polarity
  § Fair and legitimate, corrupt and brutal
  § *fair and brutal, *corrupt and legitimate
§ Adjectives conjoined by "but" do not
  § fair but brutal
Hatzivassiloglou & McKeown 1997: Step 1
§ Label seed set of 1336 adjectives (all occurring more than 20 times in a 21-million-word WSJ corpus)
  § 657 positive
    § adequate central clever famous intelligent remarkable reputed sensitive slender thriving …
  § 679 negative
    § contagious drunken ignorant lanky listless primitive strident troublesome unresolved unsuspecting …
Hatzivassiloglou & McKeown 1997: Step 2
§ Expand seed set to conjoined adjectives
nice, helpful
nice, classy
Hatzivassiloglou & McKeown 1997:Step 3
§ Supervised classifier assigns "polarity similarity" to each word pair, resulting in graph:
[Figure: graph whose nodes are the adjectives classy, nice, helpful, fair, brutal, irrational, corrupt.]
Hatzivassiloglou & McKeown 1997: Step 4
§ Clustering for partitioning the graph into two
[Figure: the same graph (classy, nice, helpful, fair, brutal, irrational, corrupt) partitioned into a + cluster and a - cluster.]
Output polarity lexicon
§ Positive
  § bold decisive disturbing generous good honest important large mature patient peaceful positive proud sound stimulating straightforward strange talented vigorous witty …
§ Negative
  § ambiguous cautious cynical evasive harmful hypocritical inefficient insecure irrational irresponsible minor outspoken pleasant reckless risky selfish tedious unsupported vulnerable wasteful …
Turney Algorithm
Turney (2002): Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews
1. Extract a phrasal lexicon from reviews
2. Learn polarity of each phrase
3. Rate a review by the average polarity of its phrases
Extract two-word phrases with adjectives
First Word        Second Word           Third Word (not extracted)
JJ                NN or NNS             anything
RB, RBR, or RBS   JJ                    not NN nor NNS
JJ                JJ                    not NN nor NNS
NN or NNS         JJ                    not NN nor NNS
RB, RBR, or RBS   VB, VBD, VBN, or VBG  anything
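The pattern table can be applied to text that has already been POS-tagged. The sketch below assumes the input is a list of (word, Penn Treebank tag) pairs produced by some tagger; the function name `extract_phrases` is hypothetical, and a phrase at the end of the sequence is treated as having no third word.

```python
NOUN = {"NN", "NNS"}
ADJ = {"JJ"}
ADV = {"RB", "RBR", "RBS"}
VERB = {"VB", "VBD", "VBN", "VBG"}

def extract_phrases(tagged):
    """tagged: list of (word, POS tag) pairs. Returns the extracted two-word phrases."""
    phrases = []
    for i in range(len(tagged) - 1):
        (w1, t1), (w2, t2) = tagged[i], tagged[i + 1]
        # Tag of the third word, or None when the phrase ends the sequence.
        t3 = tagged[i + 2][1] if i + 2 < len(tagged) else None
        third_not_noun = t3 not in NOUN
        if ((t1 in ADJ and t2 in NOUN) or                       # JJ + NN/NNS
                (t1 in ADV and t2 in ADJ and third_not_noun) or   # RB* + JJ
                (t1 in ADJ and t2 in ADJ and third_not_noun) or   # JJ + JJ
                (t1 in NOUN and t2 in ADJ and third_not_noun) or  # NN/NNS + JJ
                (t1 in ADV and t2 in VERB)):                      # RB* + VB*
            phrases.append(f"{w1} {w2}")
    return phrases

sentence = [("the", "DT"), ("online", "JJ"), ("service", "NN"), ("is", "VBZ"),
            ("very", "RB"), ("handy", "JJ"), (".", ".")]
print(extract_phrases(sentence))  # ['online service', 'very handy']
```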
How to measure polarity of a phrase?
§ Positive phrases co-occur more with "excellent"
§ Negative phrases co-occur more with "poor"
§ But how to measure co-occurrence?
Pointwise Mutual Information
§ Mutual information between 2 random variables X and Y
$$I(X,Y) = \sum_{x} \sum_{y} P(x,y) \log_2 \frac{P(x,y)}{P(x)P(y)}$$
§ Pointwise mutual information:
  § How much more do events x and y co-occur than if they were independent?
$$\mathrm{PMI}(X,Y) = \log_2 \frac{P(x,y)}{P(x)P(y)}$$
Pointwise Mutual Information
§ Pointwise mutual information:
  § How much more do events x and y co-occur than if they were independent?
$$\mathrm{PMI}(X,Y) = \log_2 \frac{P(x,y)}{P(x)P(y)}$$
§ PMI between two words:
  § How much more do two words co-occur than if they were independent?
$$\mathrm{PMI}(word_1, word_2) = \log_2 \frac{P(word_1, word_2)}{P(word_1)P(word_2)}$$
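As a quick numeric check of the PMI definition, the sketch below estimates the three probabilities from raw counts over a fixed number of observations. The function name `pmi` and the counts are illustrative assumptions.

```python
import math

def pmi(count_xy, count_x, count_y, total):
    """PMI in bits, with probabilities estimated as count / total."""
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))

# x and y co-occur exactly as often as independence predicts: PMI is 0.
print(pmi(10, 20, 50, 100))
# x and y co-occur twice as often as independence predicts: PMI is 1 bit.
print(pmi(20, 20, 50, 100))
```

A positive value means the pair co-occurs more often than chance, a negative value less often.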
How to Estimate Pointwise Mutual Information
§ Query search engine (Altavista)
  § P(word) estimated by hits(word)/N
  § P(word1, word2) estimated by hits(word1 NEAR word2)/N^2
$$\mathrm{PMI}(word_1, word_2) = \log_2 \frac{\mathrm{hits}(word_1 \;\mathrm{NEAR}\; word_2)}{\mathrm{hits}(word_1)\,\mathrm{hits}(word_2)}$$
Does phrase appear more with “poor” or “excellent”?
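$$
\begin{aligned}
\mathrm{Polarity}(phrase) &= \mathrm{PMI}(phrase, \text{"excellent"}) - \mathrm{PMI}(phrase, \text{"poor"}) \\
&= \log_2 \frac{\mathrm{hits}(phrase \;\mathrm{NEAR}\; \text{"excellent"})}{\mathrm{hits}(phrase)\,\mathrm{hits}(\text{"excellent"})} - \log_2 \frac{\mathrm{hits}(phrase \;\mathrm{NEAR}\; \text{"poor"})}{\mathrm{hits}(phrase)\,\mathrm{hits}(\text{"poor"})} \\
&= \log_2 \left( \frac{\mathrm{hits}(phrase \;\mathrm{NEAR}\; \text{"excellent"})}{\mathrm{hits}(phrase)\,\mathrm{hits}(\text{"excellent"})} \cdot \frac{\mathrm{hits}(phrase)\,\mathrm{hits}(\text{"poor"})}{\mathrm{hits}(phrase \;\mathrm{NEAR}\; \text{"poor"})} \right) \\
&= \log_2 \left( \frac{\mathrm{hits}(phrase \;\mathrm{NEAR}\; \text{"excellent"})\,\mathrm{hits}(\text{"poor"})}{\mathrm{hits}(phrase \;\mathrm{NEAR}\; \text{"poor"})\,\mathrm{hits}(\text{"excellent"})} \right)
\end{aligned}
$$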
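The final form of the polarity formula needs only four hit counts. The sketch below assumes those counts are already available (no search engine is queried), and the small `smooth` constant added to the NEAR counts to avoid division by zero is an assumption of this sketch, not something stated on the slide. The names `polarity` and `rate_review` are hypothetical.

```python
import math

def polarity(hits_near_excellent, hits_near_poor, hits_excellent, hits_poor, smooth=0.01):
    """Polarity(phrase) = PMI(phrase, "excellent") - PMI(phrase, "poor"), in bits."""
    numerator = (hits_near_excellent + smooth) * hits_poor
    denominator = (hits_near_poor + smooth) * hits_excellent
    return math.log2(numerator / denominator)

def rate_review(phrase_polarities):
    """Rate a review by the average polarity of its extracted phrases."""
    average = sum(phrase_polarities) / len(phrase_polarities)
    return "thumbs up" if average > 0 else "thumbs down"

# A phrase seen four times as often near "excellent" as near "poor",
# with both anchor words equally frequent overall: polarity is 2 bits.
print(polarity(400, 100, 1000, 1000, smooth=0))
```

Note that hits(phrase) cancels out of the formula, so the phrase's own frequency is never needed.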
Phrases from a thumbs-up review
Phrase                  POS tags  Polarity
online service          JJ NN     2.8
online experience       JJ NN     2.3
direct deposit          JJ NN     1.3
local branch            JJ NN     0.42
…
low fees                JJ NNS    0.33
true service            JJ NN     -0.73
other bank              JJ NN     -0.85
inconveniently located  JJ NN     -1.5
Average                           0.32
Phrases from a thumbs-down review
Phrase               POS tags  Polarity
direct deposits      JJ NNS    5.8
online web           JJ NN     1.9
very handy           RB JJ     1.4
…
virtual monopoly     JJ NN     -2.0
lesser evil          RBR JJ    -2.3
other problems       JJ NNS    -2.8
low funds            JJ NNS    -6.8
unethical practices  JJ NNS    -8.5
Average                        -1.2
Results of Turney algorithm
§ 410 reviews from Epinions
  § 170 (41%) negative
  § 240 (59%) positive
§ Majority class baseline: 59%
§ Turney algorithm: 74%
§ Phrases rather than words
§ Learns domain-specific information
Using WordNet to learn polarity
§ WordNet: online thesaurus (covered in later lecture).
§ Create positive ("good") and negative seed-words ("terrible")
§ Find Synonyms and Antonyms
  § Positive Set: Add synonyms of positive words ("well") and antonyms of negative words
  § Negative Set: Add synonyms of negative words ("awful") and antonyms of positive words ("evil")
§ Repeat, following chains of synonyms
§ Filter
S.M. Kim and E. Hovy. 2004. Determining the sentiment of opinions. COLING 2004.
M. Hu and B. Liu. 2004. Mining and summarizing customer reviews. In Proceedings of KDD.
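The seed-expansion loop can be sketched without WordNet itself by standing in two small dictionaries for the synonym and antonym lookups. Everything here is illustrative: the toy `synonyms` and `antonyms` tables, the function name `expand_seeds`, the fixed number of rounds, and the filter rule (dropping words that land in both sets) are all assumptions of this sketch.

```python
def expand_seeds(pos_seeds, neg_seeds, synonyms, antonyms, rounds=2):
    """Grow positive and negative word sets by following synonym and antonym links."""
    pos, neg = set(pos_seeds), set(neg_seeds)
    for _ in range(rounds):
        new_pos, new_neg = set(pos), set(neg)
        for w in pos:
            new_pos.update(synonyms.get(w, ()))   # synonyms keep polarity
            new_neg.update(antonyms.get(w, ()))   # antonyms flip polarity
        for w in neg:
            new_neg.update(synonyms.get(w, ()))
            new_pos.update(antonyms.get(w, ()))
        pos, neg = new_pos, new_neg
    # Assumed filter step: discard words that ended up in both sets.
    ambiguous = pos & neg
    return pos - ambiguous, neg - ambiguous

# Toy thesaurus standing in for WordNet lookups (hypothetical entries).
synonyms = {"good": ["well", "fine"], "terrible": ["awful"], "awful": ["dreadful"]}
antonyms = {"good": ["evil"], "terrible": ["wonderful"]}

positive, negative = expand_seeds(["good"], ["terrible"], synonyms, antonyms)
print(sorted(positive))  # ['fine', 'good', 'well', 'wonderful']
print(sorted(negative))  # ['awful', 'dreadful', 'evil', 'terrible']
```

The second round is what picks up "dreadful": it is a synonym of "awful", which only entered the negative set in the first round.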
Summary on Learning Lexicons
§ Advantages:
  § Can be domain-specific
  § Can be more robust (more words)
§ Intuition
  § Start with a seed set of words ('good', 'poor')
  § Find other words that have similar polarity:
    § Using "and" and "but"
    § Using words that occur nearby in the same document
    § Using WordNet synonyms and antonyms
  § Use seeds and semi-supervised learning to induce lexicons
Outline
§ What is sentiment analysis?
§ A Baseline Algorithm
§ Sentiment Lexicons
§ Learning Sentiment Lexicons
§ Other Sentiment Tasks
Finding sentiment of a sentence
§ Important for finding aspects or attributes
  § Target of sentiment
§ The food was great but the service was awful
Finding aspect/attribute/target of sentiment
M. Hu and B. Liu. 2004. Mining and summarizing customer reviews. In Proceedings of KDD.
S. Blair-Goldensohn, K. Hannan, R. McDonald, T. Neylon, G. Reis, and J. Reynar. 2008. Building a Sentiment Summarizer for Local Service Reviews. WWW Workshop.
§ Frequent phrases + rules
  § Find all highly frequent phrases across reviews ("fish tacos")
  § Filter by rules like "occurs right after sentiment word"
    § "…great fish tacos" means fish tacos a likely aspect

Business type      Frequent aspects
Casino             casino, buffet, pool, resort, beds
Children's Barber  haircut, job, experience, kids
Greek Restaurant   food, wine, service, appetizer, lamb
Department Store   selection, department, sales, shop, clothing
Finding aspect/attribute/target of sentiment
§ The aspect name may not be in the sentence
§ For restaurants/hotels, aspects are well-understood
§ Supervised classification
  § Hand-label a small corpus of restaurant review sentences with aspect
    § food, décor, service, value, NONE
  § Train a classifier to assign an aspect to a sentence
    § "Given this sentence, is the aspect food, décor, service, value, or NONE"
Putting it all together: Finding sentiment for aspects
S. Blair-Goldensohn, K. Hannan, R. McDonald, T. Neylon, G. Reis, and J. Reynar. 2008. Building a Sentiment Summarizer for Local Service Reviews. WWW Workshop.
[Figure: pipeline from Reviews to Final Summary through the components Text Extractor, Sentiment Classifier, Aspect Extractor, and Aggregator, passing Sentences & Phrases between stages.]
Results of Blair-Goldensohn et al. method
Rooms (3/5 stars, 41 comments)
(+) The room was clean and everything worked fine – even the water pressure ...
(+) We went because of the free room and was pleasantly pleased ...
(-) … the worst hotel I had ever stayed at ...
Service (3/5 stars, 31 comments)
(+) Upon checking out another couple was checking early due to a problem ...
(+) Every single hotel staff member treated us great and answered every ...
(-) The food is cold and the service gives new meaning to SLOW.
Dining (3/5 stars, 18 comments)
(+) our favorite place to stay in biloxi. the food is great also the service ...
(+) Offer of free buffet for joining the Play
Baseline methods assume classes have equal frequencies!
§ If not balanced (common in the real world)
  § can't use accuracies as an evaluation
  § need to use F-scores
§ Severe imbalance can also degrade classifier performance
§ Two common solutions:
  § Resampling in training
    § Random undersampling
  § Cost-sensitive learning
    § Penalize SVM more for misclassification of the rare thing
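Both points, why accuracy misleads on imbalanced data and how random undersampling works, can be illustrated with a short sketch. The helper names `undersample` and `f1_score` and the 90/10 toy split are assumptions made for illustration; the F-score here is the balanced F1 for a single chosen class.

```python
import random

def undersample(examples, seed=0):
    """examples: list of (features, label). Randomly cut every class down to the rarest class size."""
    by_label = {}
    for ex in examples:
        by_label.setdefault(ex[1], []).append(ex)
    n = min(len(group) for group in by_label.values())
    rng = random.Random(seed)
    balanced = [ex for group in by_label.values() for ex in rng.sample(group, n)]
    rng.shuffle(balanced)
    return balanced

def f1_score(gold, predicted, positive):
    """F1 for the class named by `positive`."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, predicted))
    fp = sum(g != positive and p == positive for g, p in zip(gold, predicted))
    fn = sum(g == positive and p != positive for g, p in zip(gold, predicted))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy imbalanced data: 90 negative and 10 positive examples.
gold = ["neg"] * 90 + ["pos"] * 10
always_neg = ["neg"] * 100
accuracy = sum(g == p for g, p in zip(gold, always_neg)) / len(gold)
print(accuracy)                             # 0.9
print(f1_score(gold, always_neg, "pos"))    # 0.0
```

A classifier that always answers "neg" scores 90% accuracy here yet never finds a single positive example, which the F-score on the rare class makes visible.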
How to deal with 7 stars?
Bo Pang and Lillian Lee. 2005. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. ACL, 115–124.
1. Map to binary
2. Use linear or ordinal regression
  • Or specialized models like metric labeling
Summary on Sentiment
§ Generally modeled as classification or regression task
  § predict a binary or ordinal label
§ Features:
  § Negation is important
  § Using all words (in naïve bayes) works well for some tasks
  § Finding subsets of words may help in other tasks
    § Hand-built polarity lexicons
    § Use seeds and semi-supervised learning to induce lexicons
Scherer Typology of Affective States
§ Emotion: brief organically synchronized … evaluation of a major event
  § angry, sad, joyful, fearful, ashamed, proud, elated
§ Mood: diffuse non-caused low-intensity long-duration change in subjective feeling
  § cheerful, gloomy, irritable, listless, depressed, buoyant
§ Interpersonal stances: affective stance toward another person in a specific interaction
  § friendly, flirtatious, distant, cold, warm, supportive, contemptuous
§ Attitudes: enduring, affectively colored beliefs, dispositions towards objects or persons
  § liking, loving, hating, valuing, desiring
§ Personality traits: stable personality dispositions and typical behavior tendencies
  § nervous, anxious, reckless, morose, hostile, jealous
Computational work on other affective states
§ Emotion:
  § Detecting annoyed callers to dialogue system
  § Detecting confused/frustrated versus confident students
§ Mood:
  § Finding traumatized or depressed writers
§ Interpersonal stances:
  § Detection of flirtation or friendliness in conversations
§ Personality traits:
  § Detection of extroverts
Detection of Friendliness
Ranganath, Jurafsky, McFarland
§ Friendly speakers use collaborative conversational style
  § Laughter
  § Less use of negative emotional words
  § More sympathy
    § That's too bad I'm sorry to hear that
  § More agreement
    § I think so too
  § Fewer hedges
    § kind of sort of a little …