School of Data Science, Fudan University (复旦大学大数据学院)
DATA130006 Text Management and Analysis
Sentiment Analysis
魏忠钰
October 11th, 2017
Adapted from Stanford U124

Outline

§ What is sentiment analysis?

Positive or negative movie review?

§ unbelievably disappointing
§ Full of zany characters and richly applied satire, and some great plot twists
§ this is the greatest screwball comedy ever filmed
§ It was pathetic. The worst part about it was the boxing scenes.

Google Product Search

Bing Shopping

Twitter sentiment versus Gallup Poll of Consumer Confidence

Brendan O'Connor, Ramnath Balasubramanyan, Bryan R. Routledge, and Noah A. Smith. 2010. From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series. In ICWSM-2010

Twitter sentiment:

Johan Bollen, Huina Mao, Xiaojun Zeng. 2011. Twitter mood predicts the stock market, Journal of Computational Science 2:1, 1-8. 10.1016/j.jocs.2010.12.007.

Target Sentiment on Twitter

§ Twitter Sentiment App
§ Alec Go, Richa Bhayani, Lei Huang. 2009. Twitter Sentiment Classification using Distant Supervision

Sentiment analysis has many other names

§ Opinion extraction
§ Opinion mining
§ Sentiment mining
§ Subjectivity analysis

Why sentiment analysis?

§ Movie: is this review positive or negative?
§ Products: what do people think about the new iPhone?
§ Public sentiment: how is consumer confidence? Is despair increasing?
§ Politics: what do people think about this candidate or issue?
§ Prediction: predict election outcomes or market trends from sentiment

Scherer Typology of Affective States

§ Emotion: brief organically synchronized … evaluation of a major event
  § angry, sad, joyful, fearful, ashamed, proud, elated
§ Mood: diffuse non-caused low-intensity long-duration change in subjective feeling
  § cheerful, gloomy, irritable, listless, depressed, buoyant
§ Interpersonal stances: affective stance toward another person in a specific interaction
  § friendly, flirtatious, distant, cold, warm, supportive, contemptuous
§ Attitudes: enduring, affectively colored beliefs, dispositions towards objects or persons
  § liking, loving, hating, valuing, desiring
§ Personality traits: stable personality dispositions and typical behavior tendencies
  § nervous, anxious, reckless, morose, hostile, jealous

Sentiment Analysis

§ Sentiment analysis is the detection of attitudes: “enduring, affectively colored beliefs, dispositions towards objects or persons”
  1. Holder (source) of attitude
  2. Target (aspect) of attitude
  3. Type of attitude
     § From a set of types
       § Like, love, hate, value, desire, etc.
     § Or (more commonly) simple weighted polarity:
       § positive, negative, neutral, together with strength
  4. Text containing the attitude
     § Sentence or entire document

Sentiment Analysis

§ Simplest task:
  § Is the attitude of this text positive or negative?
§ More complex:
  § Rank the attitude of this text from 1 to 5
§ Advanced:
  § Detect the target, source, or complex attitude types

Outline

§ What is sentiment analysis?
§ A Baseline Algorithm

Sentiment Classification in Movie Reviews

Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP-2002, 79-86.
Bo Pang and Lillian Lee. 2004. A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts. ACL, 271-278

§ Polarity detection:
  § Is an IMDB movie review positive or negative?
§ Data: Polarity Data 2.0:
  § http://www.cs.cornell.edu/people/pabo/movie-review-data

IMDB data in the Pang and Lee database

✓ when _star wars_ came out some twenty years ago, the image of traveling throughout the stars has become a commonplace image. […]
when han solo goes light speed, the stars change to bright lines, going towards the viewer in lines that converge at an invisible point.
cool.
_october sky_ offers a much simpler image – that of a single white dot, traveling horizontally across the night sky. [...]

✗ “snake eyes” is the most aggravating kind of movie: the kind that shows so much potential then becomes unbelievably disappointing.
it’s not just because this is a brian depalma film, and since he’s a great director and one who’s films are always greeted with at least some fanfare.
and it’s not even because this was a film starring nicolas cage and since he gives a brauvara performance, this film is hardly worth his talents.

Baseline Algorithm (adapted from Pang and Lee)

§ Tokenization
§ Feature Extraction
§ Classification using different classifiers
  § Naïve Bayes
  § MaxEnt
  § SVM

Sentiment Tokenization Issues

§ Deal with HTML and XML markup
§ Twitter mark-up (names, hashtags)
§ Capitalization (preserve for words in all caps)
§ Phone numbers, dates
§ Emoticons
§ Useful code:
  § Christopher Potts sentiment tokenizer
  § Brendan O’Connor twitter tokenizer

Potts emoticons:

    [<>]?                       # optional hat/brow
    [:;=8]                      # eyes
    [\-o\*\']?                  # optional nose
    [\)\]\(\[dDpP/\:\}\{@\|\\]  # mouth
    |                           #### reverse orientation
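
The emoticon pattern can be tried out directly. What follows is a minimal sketch, assuming Python's `re` module in verbose mode. The reverse-orientation branch is written as the mirror image of the forward branch, which is an assumption because the slide only shows the forward parts, and `find_emoticons` is a hypothetical helper name.

```python
import re

# Forward branch: hat/brow, eyes, nose, mouth. The reverse branch
# (an assumption here) is simply the same parts in the opposite order.
EMOTICON_RE = re.compile(
    r"""
    (?:
      [<>]?                        # optional hat/brow
      [:;=8]                       # eyes
      [\-o\*\']?                   # optional nose
      [\)\]\(\[dDpP/\:\}\{@\|\\]   # mouth
      |                            # reverse orientation
      [\)\]\(\[dDpP/\:\}\{@\|\\]   # mouth
      [\-o\*\']?                   # optional nose
      [:;=8]                       # eyes
      [<>]?                        # optional hat/brow
    )
    """,
    re.VERBOSE,
)


def find_emoticons(text):
    """Return all emoticon-like substrings in left-to-right order."""
    return EMOTICON_RE.findall(text)


print(find_emoticons("great movie :-) but the ending :("))
```

A real tokenizer would combine this with patterns for URLs, hashtags, and ordinary words, so that emoticons survive as single tokens instead of being split into punctuation.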

Extracting Features for Sentiment Classification

§ How to handle negation
  § I didn’t like this movie
    vs
  § I really like this movie
§ Which words to use?
  § Only adjectives
  § All words
    § All words turns out to work better, at least on this data

Negation

Das, Sanjiv and Mike Chen. 2001. Yahoo! for Amazon: Extracting market sentiment from stock message boards. In Proceedings of the Asia Pacific Finance Association Annual Conference (APFA).
Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP-2002, 79-86.

Add NOT_ to every word between negation and following punctuation:

didn’t like this movie , but I

didn’t NOT_like NOT_this NOT_movie but I
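
The NOT_ rule can be sketched in a few lines. This is an illustrative sketch, not the exact procedure of the cited papers: the list of negation triggers, the punctuation set, and the name `mark_negation` are assumptions, and the input is assumed to be already tokenized. Unlike the slide's example output, the sketch keeps the punctuation token.

```python
import re

# Assumed negation triggers: not, no, never, and any token ending in n't.
NEGATION_RE = re.compile(r"^(?:not|no|never|n't|\w+n't)$", re.IGNORECASE)
# Assumed scope-ending punctuation.
PUNCT_RE = re.compile(r"^[.,:;!?]+$")


def mark_negation(tokens):
    """Prefix NOT_ to every token between a negation and the next punctuation."""
    out = []
    negated = False
    for tok in tokens:
        if PUNCT_RE.match(tok):
            negated = False          # punctuation closes the negation scope
            out.append(tok)
        elif negated:
            out.append("NOT_" + tok)
        else:
            out.append(tok)
            # Normalize a curly apostrophe before testing for a negation word.
            if NEGATION_RE.match(tok.replace("\u2019", "'")):
                negated = True
    return out


print(mark_negation("didn't like this movie , but I".split()))
```

The marked tokens then act as separate features, so that `like` and `NOT_like` get their own class-conditional probabilities.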

Reminder: Naïve Bayes

\[ \hat{P}(w \mid c) = \frac{\mathrm{count}(w,c) + 1}{\mathrm{count}(c) + |V|} \]

\[ c_{NB} = \operatorname*{argmax}_{c_j \in C} \; P(c_j) \prod_{i \in \mathrm{positions}} P(w_i \mid c_j) \]

Binarized (Boolean feature) Multinomial Naïve Bayes

§ Intuition:
  § For sentiment (and probably for other text classification domains)
  § Word occurrence may matter more than word frequency
    § The occurrence of the word fantastic tells us a lot
    § The fact that it occurs 5 times may not tell us much more.
§ Boolean Multinomial Naïve Bayes
  § Clips all the word counts in each document at 1

Boolean Multinomial Naïve Bayes: Learning

§ From training corpus, extract Vocabulary
§ Calculate P(c_j) terms
  § For each c_j in C do
      docs_j ← all docs with class = c_j
      P(c_j) ← |docs_j| / |total # documents|
§ Calculate P(w_k | c_j) terms
  § Remove duplicates in each doc:
    § For each word type w in doc_j
      § Retain only a single instance of w
  • Text_j ← single doc containing all docs_j
  • For each word w_k in Vocabulary
      n_k ← # of occurrences of w_k in Text_j
      P(w_k | c_j) ← (n_k + α) / (n + α |Vocabulary|)
    where n is the total number of word tokens in Text_j

Boolean Multinomial Naïve Bayes on a test document d

§ First remove all duplicate words from d
§ Then compute NB using the same equation:

\[ c_{NB} = \operatorname*{argmax}_{c_j \in C} \; P(c_j) \prod_{i \in \mathrm{positions}} P(w_i \mid c_j) \]

Normal vs. Boolean Multinomial NB

Normal    Doc  Words                                Class
Training  1    Chinese Beijing Chinese              c
          2    Chinese Chinese Shanghai             c
          3    Chinese Macao                        c
          4    Tokyo Japan Chinese                  j
Test      5    Chinese Chinese Chinese Tokyo Japan  ?

Boolean   Doc  Words                                Class
Training  1    Chinese Beijing                      c
          2    Chinese Shanghai                     c
          3    Chinese Macao                        c
          4    Tokyo Japan Chinese                  j
Test      5    Chinese Tokyo Japan                  ?
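
The two tables can be worked through numerically. The sketch below is one possible implementation, assuming add-one smoothing (α = 1) and exact fractions so the arithmetic is easy to check by hand. The function names `train_nb` and `score` are hypothetical, and real systems would sum log probabilities instead of multiplying.

```python
from collections import Counter
from fractions import Fraction


def train_nb(docs, binarize=False, alpha=1):
    """docs: list of (word_list, label). Returns (priors, likelihoods)."""
    vocab = sorted({w for words, _ in docs for w in words})
    class_docs = Counter(label for _, label in docs)
    word_counts = {c: Counter() for c in class_docs}
    for words, label in docs:
        # Boolean variant: count each word type at most once per document.
        word_counts[label].update(set(words) if binarize else words)
    priors = {c: Fraction(n, len(docs)) for c, n in class_docs.items()}
    likelihood = {}
    for c, counts in word_counts.items():
        total = sum(counts.values())
        likelihood[c] = {
            w: Fraction(counts[w] + alpha, total + alpha * len(vocab))
            for w in vocab
        }
    return priors, likelihood


def score(words, priors, likelihood, binarize=False):
    """Unnormalized P(c) * prod P(w|c) for each class."""
    if binarize:
        words = list(dict.fromkeys(words))  # drop duplicate words from d
    scores = {}
    for c in priors:
        p = priors[c]
        for w in words:
            if w in likelihood[c]:          # ignore out-of-vocabulary words
                p *= likelihood[c][w]
        scores[c] = p
    return scores


TRAIN = [
    ("Chinese Beijing Chinese".split(), "c"),
    ("Chinese Chinese Shanghai".split(), "c"),
    ("Chinese Macao".split(), "c"),
    ("Tokyo Japan Chinese".split(), "j"),
]
TEST = "Chinese Chinese Chinese Tokyo Japan".split()

normal = score(TEST, *train_nb(TRAIN))
boolean = score(TEST, *train_nb(TRAIN, binarize=True), binarize=True)
print("normal :", normal)
print("boolean:", boolean)
```

Under these assumptions the two variants disagree on this toy example: with full counts the repeated "Chinese" pushes the test document toward class c, while the binarized model scores class j higher.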

Binarized (Boolean feature) Multinomial Naïve Bayes

§ Binary seems to work better than full word counts
§ Other possibility: log(freq(w))

B. Pang, L. Lee, and S. Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP-2002, 79-86.
V. Metsis, I. Androutsopoulos, G. Paliouras. 2006. Spam Filtering with Naive Bayes – Which Naive Bayes? CEAS 2006 - Third Conference on Email and Anti-Spam.
K.-M. Schneider. 2004. On word frequency information and negative evidence in Naive Bayes text classification. ICANLP, 474-485.
JD Rennie, L Shih, J Teevan. 2003. Tackling the poor assumptions of naive bayes text classifiers. ICML 2003

Cross-Validation

§ Break up data into 10 folds
  § (Equal positive and negative inside each fold?)
§ For each fold
  § Choose the fold as a temporary test set
  § Train on 9 folds, compute performance on the test fold
§ Report average performance of the 10 runs

[Figure: iterations 1 to 5, each holding out a different fold as the Test set and using the remaining folds for Training]
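
The procedure above can be sketched without any external library. This is a minimal sketch, assuming that folds are stratified so each one holds roughly equal shares of every class (the question raised in the second bullet). The names `stratified_folds` and `cross_validate`, and the `train_fn` / `eval_fn` callbacks, are hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Split item indices into k folds with classes spread evenly."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for pos, idx in enumerate(idxs):
            folds[pos % k].append(idx)   # deal indices out round-robin
    return folds


def cross_validate(items, labels, train_fn, eval_fn, k=10, seed=0):
    """Train on k-1 folds, evaluate on the held-out fold, average the scores."""
    folds = stratified_folds(labels, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = train_fn([items[j] for j in train_idx],
                         [labels[j] for j in train_idx])
        scores.append(eval_fn(model,
                              [items[j] for j in test_idx],
                              [labels[j] for j in test_idx]))
    return sum(scores) / len(scores)
```

Here `train_fn(items, labels)` is expected to return a model and `eval_fn(model, items, labels)` a number such as accuracy; any classifier can be plugged in through these two callbacks.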

Thwarted Expectations and Ordering Effects

§ “This film should be brilliant. It sounds like a great plot, the actors are first grade, and the supporting cast is good as well, and Stallone is attempting to deliver a good performance. However, it can’t hold up.”
§ Well as usual Keanu Reeves is nothing special, but surprisingly, the very talented Laurence Fishbourne is not so good either, I was surprised.

Outline

§ What is sentiment analysis?
§ A Baseline Algorithm
§ Sentiment Lexicons

The General Inquirer

Philip J. Stone, Dexter C Dunphy, Marshall S. Smith, Daniel M. Ogilvie. 1966. The General Inquirer: A Computer Approach to Content Analysis. MIT Press

§ Home page: http://www.wjh.harvard.edu/~inquirer
§ List of Categories: http://www.wjh.harvard.edu/~inquirer/homecat.htm
§ Spreadsheet: http://www.wjh.harvard.edu/~inquirer/inquirerbasic.xls
§ Categories:
  § Positiv (1915 words) and Negativ (2291 words)
  § Strong vs Weak, Active vs Passive, Overstated versus Understated
  § Pleasure, Pain, Virtue, Vice, Motivation, Cognitive Orientation, etc
§ Free for Research Use

LIWC (Linguistic Inquiry and Word Count)

Pennebaker, J.W., Booth, R.J., & Francis, M.E. (2007). Linguistic Inquiry and Word Count: LIWC 2007. Austin, TX

§ Home page: http://www.liwc.net/
§ 2300 words, >70 classes
§ Affective Processes
  § negative emotion (bad, weird, hate, problem, tough)
  § positive emotion (love, nice, sweet)
§ Cognitive Processes
  § Tentative (maybe, perhaps, guess)
§ Pronouns, Negation (no, never), Quantifiers (few, many)
§ Not free

MPQA Subjectivity Cues Lexicon

Theresa Wilson, Janyce Wiebe, and Paul Hoffmann (2005). Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis. Proc. of HLT-EMNLP-2005.
Riloff and Wiebe (2003). Learning extraction patterns for subjective expressions. EMNLP-2003.

§ Home page: http://www.cs.pitt.edu/mpqa/subj_lexicon.html
§ 6885 words from 8221 lemmas
  § 2718 positive
  § 4912 negative
§ Each word annotated for intensity (strong, weak)
§ GNU GPL

Bing Liu Opinion Lexicon

Minqing Hu and Bing Liu. Mining and Summarizing Customer Reviews. ACM SIGKDD-2004.

• Bing Liu's Page on Opinion Mining
• http://www.cs.uic.edu/~liub/FBS/opinion-lexicon-English.rar
• 6786 words
  • 2006 positive
  • 4783 negative

SentiWordNet

Stefano Baccianella, Andrea Esuli, and Fabrizio Sebastiani. 2010. SENTIWORDNET 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining. LREC-2010

§ Home page: http://sentiwordnet.isti.cnr.it/
§ All WordNet synsets automatically annotated for degrees of positivity, negativity, and neutrality/objectiveness
§ [estimable(J,3)] “may be computed or estimated”
    Pos 0    Neg 0    Obj 1
§ [estimable(J,1)] “deserving of respect or high regard”
    Pos .75  Neg 0    Obj .25

Disagreements between polarity lexicons

Christopher Potts, Sentiment Tutorial, 2011

                  Opinion Lexicon  General Inquirer  SentiWordNet     LIWC
MPQA              33/5402 (0.6%)   49/2867 (2%)      1127/4214 (27%)  12/363 (3%)
Opinion Lexicon                    32/2411 (1%)      1004/3994 (25%)  9/403 (2%)
General Inquirer                                     520/2306 (23%)   1/204 (0.5%)
SentiWordNet                                                          174/694 (25%)

Analyzing the polarity of each word in IMDB

Potts, Christopher. 2011. On the negativity of negation. SALT 20, 636-659.

§ How likely is each word to appear in each sentiment class?
  § Count(“bad”) in 1-star, 2-star, 3-star, etc.
  § But can’t use raw counts:
  § Instead, likelihood:

\[ P(w \mid c) = \frac{f(w,c)}{\sum_{w \in c} f(w,c)} \]

§ Make them comparable between words
  § Scaled likelihood:

\[ \frac{P(w \mid c)}{P(w)} \]
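
The two quantities can be computed from a table of per-class word counts. The sketch below is illustrative only: the counts are made-up toy numbers (not IMDB data), and estimating P(w) as the word's share of all tokens across classes is an assumption about how the denominator is obtained.

```python
def scaled_likelihood(counts, word):
    """counts: {class: {word: frequency}}. Returns {class: P(w|c) / P(w)}."""
    total_all = sum(sum(c.values()) for c in counts.values())
    word_all = sum(c.get(word, 0) for c in counts.values())
    p_w = word_all / total_all                      # P(w) over the whole corpus
    return {
        cls: (c.get(word, 0) / sum(c.values())) / p_w   # P(w|c) / P(w)
        for cls, c in counts.items()
    }


# Hypothetical toy counts for a 1-star and a 10-star class.
counts = {
    1: {"bad": 30, "good": 10, "film": 60},
    10: {"bad": 5, "good": 35, "film": 60},
}
scaled = scaled_likelihood(counts, "bad")
print(scaled)
```

A value above 1 means the word is over-represented in that rating class relative to the corpus as a whole, which is what makes curves for different words comparable.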

Analyzing the polarity of each word in IMDB

Potts, Christopher. 2011. On the negativity of negation. SALT 20, 636-659.

[Figure: scaled likelihood P(w|c)/P(w) (axis label Pr(c|w)) plotted against Rating, 1 to 10. POS panels: good (883,417 tokens), amazing (103,509 tokens), great (648,110 tokens), awesome (47,142 tokens). NEG panels: good (20,447 tokens), depress(ed/ing) (18,498 tokens), bad (368,273 tokens), terrible (55,492 tokens).]

Other sentiment feature: Logical negation

Potts, Christopher. 2011. On the negativity of negation. SALT 20, 636-659.

§ Is logical negation (no, not) associated with negative sentiment?
§ Potts experiment:
  § Count negation (not, n’t, no, never) in online reviews
  § Regress against the review rating

Outline

§ What is sentiment analysis?
§ A Baseline Algorithm
§ Sentiment Lexicons
§ Learning Sentiment Lexicons

Semi-supervised learning of lexicons

§ Use a small amount of information
  § A few labeled examples
  § A few hand-built patterns
§ To bootstrap a lexicon

Hatzivassiloglou and McKeown intuition for identifying word polarity

Vasileios Hatzivassiloglou and Kathleen R. McKeown. 1997. Predicting the Semantic Orientation of Adjectives. ACL, 174–181

§ Adjectives conjoined by “and” have same polarity
  § Fair and legitimate, corrupt and brutal
  § *fair and brutal, *corrupt and legitimate
§ Adjectives conjoined by “but” do not
  § fair but brutal

Hatzivassiloglou & McKeown 1997: Step 1

§ Label seed set of 1336 adjectives (all >20 in 21 million word WSJ corpus)
  § 657 positive
    § adequate central clever famous intelligent remarkable reputed sensitive slender thriving…
  § 679 negative
    § contagious drunken ignorant lanky listless primitive strident troublesome unresolved unsuspecting…

Hatzivassiloglou & McKeown 1997: Step 2

§ Expand seed set to conjoined adjectives

[Figure: example conjoined pairs: nice, helpful; nice, classy]

Hatzivassiloglou & McKeown 1997: Step 3

§ Supervised classifier assigns “polarity similarity” to each word pair, resulting in graph:

[Figure: graph over the words classy, nice, helpful, fair, brutal, irrational, corrupt]

Hatzivassiloglou & McKeown 1997: Step 4

§ Clustering for partitioning the graph into two

[Figure: the same graph over classy, nice, helpful, fair, brutal, irrational, corrupt, partitioned into a + cluster and a - cluster]

Output polarity lexicon

§ Positive
  § bold decisive disturbing generous good honest important large mature patient peaceful positive proud sound stimulating straightforward strange talented vigorous witty…
§ Negative
  § ambiguous cautious cynical evasive harmful hypocritical inefficient insecure irrational irresponsible minor outspoken pleasant reckless risky selfish tedious unsupported vulnerable wasteful…

Turney Algorithm

Turney (2002): Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews

1. Extract a phrasal lexicon from reviews
2. Learn polarity of each phrase
3. Rate a review by the average polarity of its phrases

Extract two-word phrases with adjectives

First Word         Second Word          Third Word (not extracted)
JJ                 NN or NNS            anything
RB, RBR, RBS       JJ                   Not NN nor NNS
JJ                 JJ                   Not NN nor NNS
NN or NNS          JJ                   Not NN nor NNS
RB, RBR, or RBS    VB, VBD, VBN, VBG    anything
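
The five tag patterns in the table translate into a short filter over POS-tagged tokens. This is a sketch under the assumption that the input is already tagged with Penn Treebank tags as `(word, tag)` pairs; `extract_phrases` is a hypothetical name, and the tagger itself is out of scope here.

```python
NOUN = {"NN", "NNS"}
ADV = {"RB", "RBR", "RBS"}
VERB = {"VB", "VBD", "VBN", "VBG"}


def extract_phrases(tagged):
    """Return two-word phrases whose tags match one of the five patterns."""
    phrases = []
    for i in range(len(tagged) - 1):
        (w1, t1), (w2, t2) = tagged[i], tagged[i + 1]
        # The third word is only inspected, never extracted.
        t3 = tagged[i + 2][1] if i + 2 < len(tagged) else None
        third_not_noun = t3 not in NOUN
        if (
            (t1 == "JJ" and t2 in NOUN)                         # JJ + NN/NNS
            or (t1 in ADV and t2 == "JJ" and third_not_noun)    # RB* + JJ
            or (t1 == "JJ" and t2 == "JJ" and third_not_noun)   # JJ + JJ
            or (t1 in NOUN and t2 == "JJ" and third_not_noun)   # NN/NNS + JJ
            or (t1 in ADV and t2 in VERB)                       # RB* + VB*
        ):
            phrases.append(f"{w1} {w2}")
    return phrases


tagged = [("the", "DT"), ("online", "JJ"), ("service", "NN"), ("is", "VBZ"),
          ("very", "RB"), ("handy", "JJ"), (".", ".")]
print(extract_phrases(tagged))
```

The third-word condition keeps an adverb-adjective pair such as "very low" from being extracted when it is really the start of a longer noun phrase like "very low fees".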

How to measure polarity of a phrase?

§ Positive phrases co-occur more with “excellent”
§ Negative phrases co-occur more with “poor”
§ But how to measure co-occurrence?

Pointwise Mutual Information

§ Mutual information between 2 random variables X and Y

\[ I(X,Y) = \sum_{x} \sum_{y} P(x,y) \log_2 \frac{P(x,y)}{P(x)P(y)} \]

§ Pointwise mutual information:
  § How much more do events x and y co-occur than if they were independent?

\[ \mathrm{PMI}(X,Y) = \log_2 \frac{P(x,y)}{P(x)P(y)} \]

§ PMI between two words:
  § How much more do two words co-occur than if they were independent?

\[ \mathrm{PMI}(word_1, word_2) = \log_2 \frac{P(word_1, word_2)}{P(word_1)P(word_2)} \]

How to Estimate Pointwise Mutual Information

§ Query search engine (Altavista)
  § P(word) estimated by hits(word)/N
  § P(word1, word2) by hits(word1 NEAR word2)/N^2

\[ \mathrm{PMI}(word_1, word_2) = \log_2 \frac{\mathrm{hits}(word_1 \ \mathrm{NEAR} \ word_2)}{\mathrm{hits}(word_1)\,\mathrm{hits}(word_2)} \]

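Does phrase appear more with “poor” or “excellent”?

\[
\begin{aligned}
\mathrm{Polarity}(phrase) &= \mathrm{PMI}(phrase, \text{"excellent"}) - \mathrm{PMI}(phrase, \text{"poor"}) \\
&= \log_2 \frac{\mathrm{hits}(phrase \ \mathrm{NEAR} \ \text{"excellent"})}{\mathrm{hits}(phrase)\,\mathrm{hits}(\text{"excellent"})} - \log_2 \frac{\mathrm{hits}(phrase \ \mathrm{NEAR} \ \text{"poor"})}{\mathrm{hits}(phrase)\,\mathrm{hits}(\text{"poor"})} \\
&= \log_2 \left( \frac{\mathrm{hits}(phrase \ \mathrm{NEAR} \ \text{"excellent"})}{\mathrm{hits}(phrase)\,\mathrm{hits}(\text{"excellent"})} \cdot \frac{\mathrm{hits}(phrase)\,\mathrm{hits}(\text{"poor"})}{\mathrm{hits}(phrase \ \mathrm{NEAR} \ \text{"poor"})} \right) \\
&= \log_2 \left( \frac{\mathrm{hits}(phrase \ \mathrm{NEAR} \ \text{"excellent"})\,\mathrm{hits}(\text{"poor"})}{\mathrm{hits}(phrase \ \mathrm{NEAR} \ \text{"poor"})\,\mathrm{hits}(\text{"excellent"})} \right)
\end{aligned}
\]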
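
The final form of the polarity formula needs only four hit counts. The sketch below assumes those counts are already available (the search engine query itself is not shown). The small `smoothing` constant added to the NEAR counts is an assumption to avoid division by zero, and `turney_polarity` and `rate_review` are hypothetical names.

```python
import math


def turney_polarity(near_excellent, near_poor, hits_excellent, hits_poor,
                    smoothing=0.01):
    """log2 of [hits(phrase NEAR excellent) * hits(poor)] over
    [hits(phrase NEAR poor) * hits(excellent)]."""
    numerator = (near_excellent + smoothing) * hits_poor
    denominator = (near_poor + smoothing) * hits_excellent
    return math.log2(numerator / denominator)


def rate_review(phrase_polarities):
    """Label a review by the average polarity of its extracted phrases."""
    avg = sum(phrase_polarities) / len(phrase_polarities)
    return ("thumbs up" if avg > 0 else "thumbs down"), avg


# Hypothetical counts: the phrase appears near "excellent" four times
# as often as near "poor", and both seed words are equally frequent.
print(turney_polarity(8, 2, 1000, 1000, smoothing=0))
print(rate_review([2.0, -1.0]))
```

Because hits(phrase) cancels in the derivation, the phrase's own frequency never has to be queried.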

Phrases from a thumbs-up review

Phrase                   POS tags   Polarity
online service           JJ NN      2.8
online experience        JJ NN      2.3
direct deposit           JJ NN      1.3
local branch             JJ NN      0.42
…
low fees                 JJ NNS     0.33
true service             JJ NN      -0.73
other bank               JJ NN      -0.85
inconveniently located   RB VBN     -1.5
Average                             0.32

Phrases from a thumbs-down review

Phrase                   POS tags   Polarity
direct deposits          JJ NNS     5.8
online web               JJ NN      1.9
very handy               RB JJ      1.4
…
virtual monopoly         JJ NN      -2.0
lesser evil              RBR JJ     -2.3
other problems           JJ NNS     -2.8
low funds                JJ NNS     -6.8
unethical practices      JJ NNS     -8.5
Average                             -1.2

Results of Turney algorithm

§ 410 reviews from Epinions
  § 170 (41%) negative
  § 240 (59%) positive
§ Majority class baseline: 59%
§ Turney algorithm: 74%
§ Phrases rather than words
§ Learns domain-specific information

Using WordNet to learn polarity

§ WordNet: online thesaurus (covered in later lecture).
§ Create positive (“good”) and negative seed-words (“terrible”)
§ Find Synonyms and Antonyms
  § Positive Set: Add synonyms of positive words (“well”) and antonyms of negative words
  § Negative Set: Add synonyms of negative words (“awful”) and antonyms of positive words (“evil”)
§ Repeat, following chains of synonyms
§ Filter

S.M. Kim and E. Hovy. 2004. Determining the sentiment of opinions. COLING 2004
M. Hu and B. Liu. Mining and summarizing customer reviews. In Proceedings of KDD, 2004
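
The expansion loop can be illustrated with a toy thesaurus in place of WordNet. Everything in this sketch is an assumption made for illustration: the `synonyms` and `antonyms` dictionaries are hand-written stand-ins, `expand_lexicon` is a hypothetical name, and the final "filter" step is interpreted here as dropping words that end up in both sets, which the slide does not specify.

```python
def expand_lexicon(pos_seeds, neg_seeds, synonyms, antonyms, rounds=2):
    """Grow positive/negative word sets by following synonym and antonym links."""
    pos, neg = set(pos_seeds), set(neg_seeds)
    for _ in range(rounds):
        # Positive set: synonyms of positive words, antonyms of negative words.
        new_pos = ({s for w in pos for s in synonyms.get(w, ())}
                   | {a for w in neg for a in antonyms.get(w, ())})
        # Negative set: synonyms of negative words, antonyms of positive words.
        new_neg = ({s for w in neg for s in synonyms.get(w, ())}
                   | {a for w in pos for a in antonyms.get(w, ())})
        pos |= new_pos
        neg |= new_neg
    # Assumed filter: discard words that were reached from both polarities.
    ambiguous = pos & neg
    return pos - ambiguous, neg - ambiguous


# Hypothetical toy thesaurus.
synonyms = {"good": ["well", "fine"], "fine": ["ok"], "terrible": ["awful"]}
antonyms = {"good": ["evil"], "terrible": ["wonderful"]}

positive, negative = expand_lexicon(["good"], ["terrible"], synonyms, antonyms)
print(sorted(positive), sorted(negative))
```

With a real thesaurus the number of rounds matters: long synonym chains drift in meaning, which is one reason a filtering step is needed.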

Summary on Learning Lexicons

§ Advantages:
  § Can be domain-specific
  § Can be more robust (more words)
§ Intuition
  § Start with a seed set of words (‘good’, ‘poor’)
  § Find other words that have similar polarity:
    § Using “and” and “but”
    § Using words that occur nearby in the same document
    § Using WordNet synonyms and antonyms
§ Use seeds and semi-supervised learning to induce lexicons

Outline

§ What is sentiment analysis?
§ A Baseline Algorithm
§ Sentiment Lexicons
§ Learning Sentiment Lexicons
§ Other Sentiment Tasks

Finding sentiment of a sentence

§ Important for finding aspects or attributes
  § Target of sentiment
§ The food was great but the service was awful

Finding aspect/attribute/target of sentiment

M. Hu and B. Liu. 2004. Mining and summarizing customer reviews. In Proceedings of KDD.
S. Blair-Goldensohn, K. Hannan, R. McDonald, T. Neylon, G. Reis, and J. Reynar. 2008. Building a Sentiment Summarizer for Local Service Reviews. WWW Workshop.

§ Frequent phrases + rules
  § Find all highly frequent phrases across reviews (“fish tacos”)
  § Filter by rules like “occurs right after sentiment word”
    § “…great fish tacos” means fish tacos is a likely aspect

Casino              casino, buffet, pool, resort, beds
Children’s Barber   haircut, job, experience, kids
Greek Restaurant    food, wine, service, appetizer, lamb
Department Store    selection, department, sales, shop, clothing

Finding aspect/attribute/target of sentiment

§ The aspect name may not be in the sentence
§ For restaurants/hotels, aspects are well-understood
§ Supervised classification
  § Hand-label a small corpus of restaurant review sentences with aspect
    § food, décor, service, value, NONE
  § Train a classifier to assign an aspect to a sentence
    § “Given this sentence, is the aspect food, décor, service, value, or NONE”

Putting it all together: Finding sentiment for aspects

S. Blair-Goldensohn, K. Hannan, R. McDonald, T. Neylon, G. Reis, and J. Reynar. 2008. Building a Sentiment Summarizer for Local Service Reviews. WWW Workshop

[Figure: pipeline. Reviews → Text Extractor → Sentences & Phrases → Sentiment Classifier → Sentences & Phrases → Aspect Extractor → Sentences & Phrases → Aggregator → Final Summary]

Results of Blair-Goldensohn et al. method

Rooms (3/5 stars, 41 comments)
(+) The room was clean and everything worked fine – even the water pressure ...
(+) We went because of the free room and was pleasantly pleased ...
(-) … the worst hotel I had ever stayed at ...

Service (3/5 stars, 31 comments)
(+) Upon checking out another couple was checking early due to a problem ...
(+) Every single hotel staff member treated us great and answered every ...
(-) The food is cold and the service gives new meaning to SLOW.

Dining (3/5 stars, 18 comments)
(+) our favorite place to stay in biloxi. the food is great also the service ...
(+) Offer of free buffet for joining the Play

Baseline methods assume classes have equal frequencies!

§ If not balanced (common in the real world)
  § can’t use accuracies as an evaluation
  § need to use F-scores
§ Severe imbalance also can degrade classifier performance
§ Two common solutions:
  1. Resampling in training
     § Random undersampling
  2. Cost-sensitive learning
     § Penalize SVM more for misclassification of the rare thing
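
A small worked example shows why accuracy is misleading on imbalanced data. The sketch below uses made-up labels (95 positive reviews, 5 negative) and a classifier that always predicts the majority class; `precision_recall_f1` is a hypothetical helper name.

```python
def precision_recall_f1(gold, pred, positive):
    """Precision, recall, and F1 for one class treated as the target."""
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical imbalanced test set and a majority-class "classifier".
gold = ["neg"] * 5 + ["pos"] * 95
pred = ["pos"] * 100

accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
print("accuracy:", accuracy)
print("F1 for the rare class:", precision_recall_f1(gold, pred, "neg")[2])
```

The always-positive predictor looks strong by accuracy but never finds a single negative review, which the F-score for the rare class exposes.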

How to deal with 7 stars?

Bo Pang and Lillian Lee. 2005. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. ACL, 115–124

1. Map to binary
2. Use linear or ordinal regression
   • Or specialized models like metric labeling

Summary on Sentiment

§ Generally modeled as classification or regression task
  § predict a binary or ordinal label
§ Features:
  § Negation is important
  § Using all words (in naïve bayes) works well for some tasks
  § Finding subsets of words may help in other tasks
    § Hand-built polarity lexicons
    § Use seeds and semi-supervised learning to induce lexicons

Computational work on other affective states

§ Emotion:
  § Detecting annoyed callers to dialogue system
  § Detecting confused/frustrated versus confident students
§ Mood:
  § Finding traumatized or depressed writers
§ Interpersonal stances:
  § Detection of flirtation or friendliness in conversations
§ Personality traits:
  § Detection of extroverts

Detection of Friendliness

Ranganath, Jurafsky, McFarland

§ Friendly speakers use collaborative conversational style
  § Laughter
  § Less use of negative emotional words
  § More sympathy
    § That’s too bad I’m sorry to hear that
  § More agreement
    § I think so too
  § Fewer hedges
    § kind of sort of a little …