Building Sentiment Resources On Chinese Reviews Zhang Haochen

Self Introduction
  • Zhang Haochen (张昊辰)
  • Ph.D. student, THUIR, Tsinghua University, China
  • Football, cooking

Overview
  • Introduction
  • Related work
  • Issue description
  • Approach
      • Prototype design
      • Pre-processing
      • Feature extraction
      • Opinion extraction
      • Polarity classification
  • Evaluation
  • Conclusion
  • Future work

Introduction
  • Content: factual vs. subjective
  • UGC in Web 2.0
      • Reviews on entities: product, movie, news
      • Opinionated information: tweet, BBS
  • Applications
      • E-commerce
      • Public opinion
      • Recommendation
  • Fact: "WING is a group about IR and NLP."
  • Opinion: "WING is such a fantastic group."

Related work
  • Typical tasks (Pang, 2008):
      • Extraction: feature / aspect, opinion
      • Classification: subjective, polarity
      • Summarization
      • Search and comparison
  • Approaches:
      • syntax-based
      • supervised vs. unsupervised
      • bootstrap / propagation
      • a training set is difficult to build

Issue description
  • I/O
      • Input: reviews of a particular domain / product
      • Output: sentiment dictionary for that domain / product
  • Corpus
      • Chinese: segmentation, POS tagging
      • Internet: spam, OOV, oral language
  • Difficulties
      • Noise
      • Various patterns
      • Oral language and OOV
  • Solution
      • Syntax-based + OOV
      • Pruning

Prototype design

Pre-processing
  • Cross validation between parsing tool 1 and parsing tool 2
      • [Table: example POS tag sequences from parsing tool 1 and parsing tool 2 and the cross-validated result; tags include /v, /n, /mq, /d, /a and unknown /z]
  • Filter noise from the POS tagging results
      • If A is a subset of B, take A
      • For completely unmatched tags, annotate as unknown (/z)
      • Same segmentation but different tags: annotate as unknown (/z)
  • Remove redundant sentences
  • Remove sentences with too many punctuation marks

Feature extraction
  • Specific patterns
      • more than nouns: verbs and morphemes are involved
      • frequency greater than a given threshold
      • more noise
  • Verbal stop words
      • verb as part of a phrase
      • verb as predicate

  • Verb as part of feature words vs. verb as predicate (POS tag sequence: gloss)
      • /n /v /d /a: design
      • /v /n /a: find
      • /v /a: operation
      • /v /d /a: feel
      • /v /v /n: image stabilization
      • /n /v /v: easy to use
  • Phrase patterns

      Pattern   Tags     Example
      V + N     /V|/N    photoreceptor
      N + V     /N|/V    color rendition
      V + V     /V|/V    lens hood
      G + G     /G|/G    acutance

Feature extraction: OOV
  • Context entropy gain: whether B should compose a phrase with A
  • Mutual information: whether AB should be composed
  • Applied iteratively
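The mutual-information half of the OOV step can be sketched as follows. This is a minimal illustration, not the presenter's implementation: it assumes the corpus is already segmented into token lists, scores adjacent pairs with pointwise mutual information, and merges pairs above a threshold over a few rounds. The function names, the `threshold`, `min_count`, `max_rounds` and `sep` parameters are all hypothetical, and the context entropy gain criterion is left out.

```python
import math
from collections import Counter


def count_bigrams(sentences):
    """Count every adjacent token pair in the segmented corpus."""
    return Counter(pair for sent in sentences for pair in zip(sent, sent[1:]))


def pmi_table(sentences):
    """Pointwise mutual information (base 2) for every adjacent token pair."""
    unigrams = Counter(tok for sent in sentences for tok in sent)
    bigrams = count_bigrams(sentences)
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    table = {}
    for (a, b), count in bigrams.items():
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        table[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return table


def merge_pairs(sentences, pairs, sep=""):
    """Rewrite each sentence, joining adjacent tokens whose pair is in `pairs`."""
    merged = []
    for sent in sentences:
        out, i = [], 0
        while i < len(sent):
            if i + 1 < len(sent) and (sent[i], sent[i + 1]) in pairs:
                out.append(sent[i] + sep + sent[i + 1])
                i += 2
            else:
                out.append(sent[i])
                i += 1
        merged.append(out)
    return merged


def compose_phrases(sentences, threshold=1.0, min_count=2, max_rounds=3, sep=""):
    """Iteratively merge frequent, high-PMI pairs into candidate phrases."""
    for _ in range(max_rounds):
        bigrams = count_bigrams(sentences)
        scores = pmi_table(sentences)
        keep = {
            pair
            for pair, score in scores.items()
            if score >= threshold and bigrams[pair] >= min_count
        }
        if not keep:
            break  # nothing left to compose
        sentences = merge_pairs(sentences, keep, sep=sep)
    return sentences
```

For Chinese text the default `sep=""` joins tokens directly; with English glosses, `sep=" "` keeps the merged phrase readable. Repeating the merge lets a two-token phrase from one round combine with a neighbour in the next, which is one plausible reading of "iteratively" on the slide.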

Feature extraction
  • Co-occurrence frequency with adjectives
  • Sectional threshold
  • Filter common words with a background corpus (from SogouT, 20M in size)
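The slides do not say how the background corpus is used, so the following is only one possible sketch of the filtering idea: keep a candidate feature when it is much more frequent in the domain reviews than in the background corpus. The function name, the relative-frequency ratio test, the add-one smoothing and the `min_ratio` value are all assumptions made for illustration.

```python
from collections import Counter


def filter_common_words(candidates, domain_counts, background_counts, min_ratio=5.0):
    """Keep candidates whose domain frequency clearly exceeds their background frequency.

    domain_counts / background_counts map a word to its raw count in the
    review corpus and in the background corpus respectively.
    """
    domain_total = sum(domain_counts.values())
    background_total = sum(background_counts.values())
    kept = []
    for word in candidates:
        domain_rate = domain_counts.get(word, 0) / domain_total
        # Add-one smoothing (an assumption) so words unseen in the
        # background corpus do not cause a division by zero.
        background_rate = (background_counts.get(word, 0) + 1) / (background_total + 1)
        if domain_rate / background_rate >= min_ratio:
            kept.append(word)
    return kept


# Toy counts, invented purely for the example.
domain = Counter({"lens": 50, "thing": 40, "shutter": 10})
background = Counter({"thing": 20000, "lens": 20, "day": 79980})
features = filter_common_words(["lens", "thing", "shutter"], domain, background)
```

In this toy example "thing" is dropped because it is common in the background corpus as well, while "lens" and "shutter" survive as domain-specific candidates.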

Opinion extraction: syntax-based
  • Adjacent adjectives
  • Ignore adverbs
  • Within a specific window
  • Contributes about 70% of the final results

Opinion extraction: OOV
  • Assumption: F + adv. + O + func.
  • adv. and func. sets
      • between F and Adj.
      • between Adj. and Punc.
  • Phrases between adv. and func.
  • Pruning
      • frequency
      • co-occurrence with features
  • adv. examples: more and more; also very; be to use; not so much
  • func. examples: etc.
  • OOV examples: easy to start up; good value for money

Polarity classification
  • Feature-opinion pair vs. opinion alone
      • high: ?
      • high price: negative
  • Initialize with the polarity of words
      • HowNet
      • Tsinghua
      • NTU Sentiment Dictionary
  • Classify iteratively
      • Classify unlabeled FO pairs using adjacent FO pairs in the same sentence
      • Classify FO pairs across the entire corpus
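The iterative polarity step can be sketched roughly as below. This is an illustration under stated assumptions rather than the presenter's algorithm: seed labels come from a small word-polarity dictionary, an unlabeled feature-opinion (FO) pair collects votes from labeled neighbouring pairs in the same sentence, and votes are pooled over the whole corpus before a label is assigned. It assumes neighbouring pairs tend to share polarity and ignores adversative words such as "but". The function name, data layout and `max_rounds` are hypothetical.

```python
from collections import Counter, defaultdict


def classify_pairs(sentences, seed_polarity, max_rounds=10):
    """Label FO pairs as +1 / -1 by propagating from seed opinion words.

    sentences: list of sentences, each a list of (feature, opinion) pairs
               in the order they appear.
    seed_polarity: dict mapping an opinion word to +1 or -1.
    Returns a dict mapping each pair that could be labeled to +1 or -1.
    """
    labels = {}
    # Seed step: pairs whose opinion word has a known dictionary polarity.
    for sent in sentences:
        for pair in sent:
            if pair[1] in seed_polarity:
                labels[pair] = seed_polarity[pair[1]]

    for _ in range(max_rounds):
        # Sentence-level evidence: labeled neighbours vote for unlabeled pairs.
        votes = defaultdict(Counter)
        for sent in sentences:
            for i, pair in enumerate(sent):
                if pair in labels:
                    continue
                for j in (i - 1, i + 1):
                    if 0 <= j < len(sent) and sent[j] in labels:
                        votes[pair][labels[sent[j]]] += 1

        # Corpus-level decision: majority vote over all sentences, ties skipped.
        new_labels = {}
        for pair, counter in votes.items():
            (top, top_n), *rest = counter.most_common()
            if not rest or rest[0][1] < top_n:
                new_labels[pair] = top
        if not new_labels:
            break  # no further pairs can be labeled
        labels.update(new_labels)
    return labels


# Toy corpus: "high" has no seed polarity, so ("price", "high") must be inferred.
corpus = [
    [("picture", "sharp"), ("price", "high")],
    [("price", "high"), ("battery", "poor")],
    [("battery", "poor"), ("price", "high")],
]
result = classify_pairs(corpus, {"sharp": 1, "poor": -1})
```

Here ("price", "high") receives one positive and two negative votes across the corpus and is labeled negative, which mirrors the "high price: negative" example on the slide.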

Evaluation
  • Reviews in the camera domain
      • 100,000+ sentences
      • 769 feature phrases
      • 806 opinion phrases
      • 8640 feature-opinion pairs
          • 5745 positive
          • 315 neutral
          • 1948 negative
          • 632 unknown (treated as neutral in final results)
  • Performance
      • Feature extraction

                  Precision   Recall   F-measure
          Macro   0.785       0.512    0.620
          Micro   0.833       0.877    0.854

      • Opinion extraction: accuracy 0.881, coverage 0.594
      • Polarity classification: accuracy 0.894

Conclusion
  • A Chinese corpus differs from an English corpus and is more troublesome to process.
  • The syntax-based method proves simple yet effective for extracting explicit features and opinions from a well-expressed corpus.
  • The syntax-based method may perform badly on an oral-style corpus.

Future work
  • A more accurate and proper model
      • draw on approaches from other AI research work
      • apply learning methods
  • Implicit features and opinions
  • Cross-domain application

Q & A

Thank you