Semester Work Summary

魏巍
Email: zauri@ruc.edu.cn


Outline

• Introduction to Opinion Mining

• My work:
  – Product feature word extraction
  – Product opinion word extraction
  – Opinion word orientation identification

• Conclusion and Future Work


· Opinion mining


Introduction to Opinion Mining

• Why opinion mining?
  – There is more and more user-generated content (user-generated media), such as BBS posts and blogs.
  – It is hard to find out people's opinions on a specific thing or topic.

• Opinion granularity (level):
  – Document level: genre classification (subjective or objective)
  – Sentence level
  – Feature (word) level: an object, such as a product, has attributes


Problem definition (feature-based opinion mining)

• Object: a product, person, entity, event, etc.

• Feature: explicit or implicit
  – Explicit: “The battery life of this camera is too short.” (feature: battery life)
  – Implicit: “It’s really too large.” (implied feature: size)

• Opinion: adjectives near the feature
  – “The battery life of this camera is too short.” (opinion: short)
  – “It’s really too large.” (opinion: large)


Feature-based opinion mining

• The goal is to form a table like this from someone's reviews:

         Feature
         Att1   Att2   Att3   …   Attn
Pos
Neg
Neu


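Objective

User reviews: canon XX
  R1: ------------
  R2: ------------
  R3: ------------
  R4: ------------

Example sentences:
  “这款相机的电池寿命很短。” (“The battery life of this camera is short.”)
  “这个相机镜头很大。” (“The lens of this camera is large.”)

Step 1: Feature & opinion extraction (extract <feature, opinion> pairs)
  <电池寿命 (battery life), 短 (short)>
  <镜头 (lens), 大 (large)>

Step 2: Opinion orientation identification
  <电池寿命 (battery life), 短 (short)>: negative
  <镜头 (lens), 大 (large)>: positive

Resulting summary:

  Feature                   Positive   Negative   Neutral
  电池寿命 (battery life)    40%        30%        30%
  镜头 (lens)                60%        20%        20%
  …                         …          …          …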
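The two steps on this slide end in a per-feature summary of positive, negative, and neutral shares. A minimal C++ sketch of that final aggregation, assuming every extracted pair already carries an orientation label. The names `Orientation`, `LabeledPair`, `Summary`, and `summarize` are illustrative and do not come from the original work.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical label for one <feature, opinion> pair.
enum class Orientation { Positive, Negative, Neutral };

struct LabeledPair {
    std::string feature;      // e.g. "电池寿命" (battery life)
    std::string opinion;      // e.g. "短" (short)
    Orientation orientation;  // result of Step 2
};

// Share of positive / negative / neutral opinions for one feature.
struct Summary {
    double pos = 0.0, neg = 0.0, neu = 0.0;
};

// Count the labels per feature, then normalize the counts to fractions.
std::map<std::string, Summary> summarize(const std::vector<LabeledPair>& pairs) {
    std::map<std::string, Summary> table;
    for (const LabeledPair& p : pairs) {
        Summary& s = table[p.feature];
        switch (p.orientation) {
            case Orientation::Positive: s.pos += 1.0; break;
            case Orientation::Negative: s.neg += 1.0; break;
            case Orientation::Neutral:  s.neu += 1.0; break;
        }
    }
    for (auto& entry : table) {
        Summary& s = entry.second;
        const double total = s.pos + s.neg + s.neu;  // at least 1 for every stored feature
        s.pos /= total;
        s.neg /= total;
        s.neu /= total;
    }
    return table;
}
```

Multiplying each fraction by 100 gives percentages in the style of the summary table.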


· My work

1. Feature & opinion extraction


Feature and opinion word extraction

[Flow diagram. Inputs: query product's reviews (Qr), relevant reviews (Rr), irrelevant reviews (Ir). Stages: candidate feature extraction, feature pruning, syntax pattern extraction, pattern matching, opinion word extraction. Outputs: general features and specific features.]


Candidate feature generation

• An N-gram method is used to extract single nouns and noun phrases.
  – a. “我/r 觉得/v 清洁/a 效果/n 显著/a” (“I feel the cleaning ability is remarkable”)
  – b. “泡沫/n 相当/d 丰富/a” (“The foam is very abundant”)

• This step yields a candidate feature list. For each unit in the list, we keep the following data structure:

struct unit {
    string word;
    int frq;
    int sen_num;
    int sen_id[MAX];
    int rel_num;     // how many relevant reviews contain this word
    int irrel_num;   // how many irrelevant reviews contain this word
    int op_sen_num;  // how many sentences have adjectives near this word
    …
};
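The slides do not spell out how the N-grams are formed. As a rough sketch, one could treat every n-gram that ends in a noun as a candidate, which reproduces both “效果” and “清洁效果” from example a. That rule, and the names `Token`, `parseTagged`, `addCandidates`, and `maxN`, are assumptions made for illustration.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// One POS-tagged token, e.g. word = "效果", tag = "n".
struct Token {
    std::string word;
    std::string tag;
};

// Parse whitespace-separated "word/tag" items, e.g. "泡沫/n 相当/d 丰富/a".
std::vector<Token> parseTagged(const std::string& sentence) {
    std::vector<Token> tokens;
    std::istringstream in(sentence);
    std::string item;
    while (in >> item) {
        const std::size_t slash = item.rfind('/');
        if (slash == std::string::npos) continue;  // skip items without a tag
        tokens.push_back({item.substr(0, slash), item.substr(slash + 1)});
    }
    return tokens;
}

// Assumption: a candidate is any n-gram (1 <= n <= maxN) whose last token is a noun.
// `frq` maps each candidate string to its frequency (the `frq` field of `unit`).
void addCandidates(const std::vector<Token>& tokens, std::size_t maxN,
                   std::map<std::string, int>& frq) {
    for (std::size_t end = 0; end < tokens.size(); ++end) {
        if (tokens[end].tag != "n") continue;
        std::string gram;
        for (std::size_t n = 1; n <= maxN && n <= end + 1; ++n) {
            gram = tokens[end + 1 - n].word + gram;  // grow the n-gram to the left
            ++frq[gram];
        }
    }
}
```

With `maxN = 2`, sentence a yields “效果” and “清洁效果”, and sentence b yields “泡沫”. A larger `maxN` also admits noisy candidates, which is what the pruning step is for.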


Prune & divide the feature list

• Pruning rules:
  – Rule 1: eliminate candidate features that match certain combinations of POS tags (e.g., “效果/很/好” has the tags “n/d/a”).
  – Rule 2: eliminate candidate features according to the word’s rel_num and irrel_num values, and divide the features into a general feature list and a specific feature list.
  – Rule 3: eliminate candidate features according to the proportion of sentences containing the feature word that have an adjective nearby (op_sen_num/sen_num).
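Rule 3 can be read as a threshold on op_sen_num/sen_num. The sketch below assumes that candidates with a low ratio are the ones eliminated and that sen_num counts the sentences containing the word; the cut-off `minRatio` and the names `Unit` and `pruneByOpinionRatio` are hypothetical, since the slides give no threshold.

```cpp
#include <string>
#include <vector>

// Subset of the `unit` fields that rule 3 needs.
struct Unit {
    std::string word;
    int sen_num;     // sentences containing this word
    int op_sen_num;  // of those, sentences with an adjective near the word
};

// Keep a candidate only if op_sen_num / sen_num >= minRatio.
// `minRatio` is a hypothetical threshold, not a value from the slides.
std::vector<Unit> pruneByOpinionRatio(const std::vector<Unit>& candidates, double minRatio) {
    std::vector<Unit> kept;
    for (const Unit& u : candidates) {
        if (u.sen_num <= 0) continue;  // no supporting sentences, drop the candidate
        const double ratio = static_cast<double>(u.op_sen_num) / u.sen_num;
        if (ratio >= minRatio) kept.push_back(u);
    }
    return kept;
}
```

The intuition is that a real product feature is usually described by an adjective, so words that rarely have one nearby are unlikely to be features.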


Syntax pattern extraction & matching

• We believe that consumers tend to use the same expression patterns (syntax patterns) for different product features.

• E.g.:
  a. “泡沫/n 相当/d 丰富/a” (“The foam is very abundant”)
     pattern: feature + 相当/d + adjective
  b. “很/d 便宜/a 的/u 价格/n” (“The price is very low”)
     pattern: 很/d + adjective + 的/u + feature

• We keep a pattern list and use these patterns to find new features.
  E.g., pattern b (很/d + adjective + 的/u + feature) matches “很/d 耐用/a 的/u 电池/n” (“a very durable battery”), which gives the new feature “电池” (battery).
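One way to picture the matching step is to store each pattern as a sequence of slots and slide it over a tagged sentence. The sketch below is an assumption about how such a matcher could look, not the implementation behind the slides; `Token`, `SlotKind`, `Slot`, `slotMatches`, and `findFeatures` are illustrative names, and the feature slot is assumed to accept only nouns.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct Token {
    std::string word;
    std::string tag;
};

// One pattern slot: a literal token, any adjective, or the feature position.
enum class SlotKind { Literal, Adjective, Feature };

struct Slot {
    SlotKind kind;
    std::string word;  // used only by Literal slots
    std::string tag;   // used only by Literal slots
};

bool slotMatches(const Slot& slot, const Token& tok) {
    switch (slot.kind) {
        case SlotKind::Literal:   return tok.word == slot.word && tok.tag == slot.tag;
        case SlotKind::Adjective: return tok.tag == "a";
        case SlotKind::Feature:   return tok.tag == "n";  // assumption: features are nouns
    }
    return false;
}

// Slide the pattern over the sentence and collect the word that fills the
// Feature slot of every full match.
std::vector<std::string> findFeatures(const std::vector<Slot>& pattern,
                                      const std::vector<Token>& sentence) {
    std::vector<std::string> features;
    if (pattern.empty() || pattern.size() > sentence.size()) return features;
    for (std::size_t start = 0; start + pattern.size() <= sentence.size(); ++start) {
        std::string feature;
        bool ok = true;
        for (std::size_t i = 0; i < pattern.size() && ok; ++i) {
            const Token& tok = sentence[start + i];
            ok = slotMatches(pattern[i], tok);
            if (ok && pattern[i].kind == SlotKind::Feature) feature = tok.word;
        }
        if (ok && !feature.empty()) features.push_back(feature);
    }
    return features;
}
```

Pattern b would be written as the four slots Literal(很/d), Adjective, Literal(的/u), Feature; applied to “很/d 耐用/a 的/u 电池/n” it returns “电池”.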


Opinion word extraction

• Some reviews contain an opinion but no explicit feature, so we separate feature extraction and opinion extraction into two steps.
  – Implicit feature: “It’s really too large.” (size)

[Flow diagram: Review → Features extraction / Opinion extraction → Merge]


Experiment

• We failed to apply the method of Liu et al. [2004].
  – That method keeps only the noun segments of each sentence to generate feature words.

• We use N-grams instead.

         Pruning + relevant/irrelevant reviews    FOXS
         Recall    Prec                           Recall    Prec
Data1    0.84      0.75                           0.92      0.76
Data2    0.8       0.53                           0.8       0.45
Data3    0.73      0.55                           0.93      0.6
Data4    0.74      0.85                           0.86      0.82
Data5    0.59      0.81                           0.82      0.82
Avg.     0.74      0.69                           0.87      0.69

         Recall    Precision
Data1    0.84      0.48
Data2    0.5       0.5
Data3    0.73      0.37
Data4    0.74      0.5
Data5    0.76      0.6
Avg.     0.71      0.49
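The slides do not say how recall and precision were computed. Assuming the standard definitions against a manually labeled set of correct features, the figures could be obtained as in this sketch; `Score`, `evaluate`, and the gold set are assumptions for illustration.

```cpp
#include <set>
#include <string>

struct Score {
    double recall;
    double precision;
};

// recall    = |extracted ∩ gold| / |gold|
// precision = |extracted ∩ gold| / |extracted|
Score evaluate(const std::set<std::string>& extracted, const std::set<std::string>& gold) {
    int hits = 0;
    for (const std::string& f : extracted) {
        if (gold.count(f) > 0) ++hits;
    }
    Score s;
    s.recall = gold.empty() ? 0.0 : static_cast<double>(hits) / gold.size();
    s.precision = extracted.empty() ? 0.0 : static_cast<double>(hits) / extracted.size();
    return s;
}
```

Under these definitions, extracting many candidates raises recall but lowers precision unless pruning removes the wrong ones.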


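Supplementary

• Try to tackle implicit features.

Reviews:
  Ri: “真是太贵了。” (“It’s really too expensive.”)
  Rn: “感觉价格太贵了。” (“I feel the price is too expensive.”)
  ……

Opinion words that signal implicit features: {贵 (expensive), 大 (large), 高 (high), …}

Example: 贵 (expensive) is associated with the explicit features 价格, 价钱 (price), as in Rn, so Ri is taken to talk about “价格” (price).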
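One possible reading of this idea is to learn, from reviews with explicit features, which feature each opinion word most often describes, and to reuse that feature when the opinion word appears alone. The sketch below follows that reading; it is an assumption rather than the method in the slides, and `ImplicitFeatureGuesser`, `observe`, and `guess` are hypothetical names.

```cpp
#include <map>
#include <string>

// Learns opinion word -> feature counts from explicit <feature, opinion> pairs
// and guesses the implied feature for an opinion word that appears alone.
class ImplicitFeatureGuesser {
public:
    // Record one explicit pair, e.g. <价格, 贵> from "感觉价格太贵了。".
    void observe(const std::string& feature, const std::string& opinion) {
        ++counts_[opinion][feature];
    }

    // Return the feature seen most often with `opinion`, or "" if none was seen.
    std::string guess(const std::string& opinion) const {
        const auto it = counts_.find(opinion);
        if (it == counts_.end()) return "";
        std::string best;
        int bestCount = 0;
        for (const auto& entry : it->second) {
            if (entry.second > bestCount) {
                best = entry.first;
                bestCount = entry.second;
            }
        }
        return best;
    }

private:
    std::map<std::string, std::map<std::string, int>> counts_;
};
```

After observing <价格, 贵> from Rn, `guess("贵")` returns “价格”, which is the feature Ri is assumed to talk about.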


· My work

2. Opinion orientation identification


Opinion orientation identification

• Methods for English:
  – Based on WordNet
  – A seed list: positive and negative lists
  – Context-dependent opinions: context rules

[Diagram: starting from a seed list, words w1, w2, w3, w4, … are assigned to the positive and negative lists.]


Opinion orientation identification (cont.)

• We can’t use WordNet for Chinese.

• What we can use now:
  – Positive sentiment word seed list (Pset), provided by HowNet
  – Negative sentiment word seed list (Nset), provided by HowNet
  – Context-related sentiment word list (CRset) (we assume the whole set is available)
  – Conjunction word set
  – Some heuristic rules (Liu et al. [2008]), e.g., for “and”, “but”, etc.
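The conjunction heuristic can be sketched as follows, assuming the usual reading that words joined by an “and”-type conjunction share an orientation while a “but”-type conjunction reverses it. `Polarity`, `lookup`, `flip`, and `inferFromConjunction` are illustrative names; the seed sets stand in for Pset and Nset.

```cpp
#include <set>
#include <string>

enum class Polarity { Positive, Negative, Unknown };

// Look a word up in the two seed lists (Pset and Nset in the slides).
Polarity lookup(const std::string& word,
                const std::set<std::string>& pset,
                const std::set<std::string>& nset) {
    if (pset.count(word) > 0) return Polarity::Positive;
    if (nset.count(word) > 0) return Polarity::Negative;
    return Polarity::Unknown;
}

Polarity flip(Polarity p) {
    if (p == Polarity::Positive) return Polarity::Negative;
    if (p == Polarity::Negative) return Polarity::Positive;
    return Polarity::Unknown;
}

// Infer the polarity of an unknown word from a known neighbor joined to it
// by a conjunction. Assumption: "and"-type conjunctions keep the polarity,
// "but"-type (adversative) conjunctions reverse it.
Polarity inferFromConjunction(Polarity known, bool isAdversative) {
    return isAdversative ? flip(known) : known;
}
```

A word whose neighbor is itself unknown stays unknown, which is one reason a small seed list leaves many words unresolved.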


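Opinion orientation identification (cont.)

[Diagram: a review S1 split into clauses by punctuation, with features f1, f2 and opinion words opw1, opw2 marked.]

1. Check the type of each opinion word (opw) in every sentence:
   – Positive list: <good, excellent, …>
   – Negative list: <bad, dirty, …>
   – Context-dependent list: <large, small, …>
   – Unknown word list: <a, b, c, …>

2. For every pair <f1, opw1>, …, <fi, opwi>, …, <fn, opwn> whose opinion word is in the context-dependent (C-d) list or among the unknown words: save the pair <fi, opwi>, then add it to the p-list or n-list.

3. Use syntactic and other rules to determine the orientation.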


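Opinion orientation identification (cont.)

• Analysis of the results so far:
  – The results are not very satisfactory: many words in the unknown opinion list still have no polarity assigned.
  – This is related to the size of the initial seed word list.
  – To be continued…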


· Future work


Future work

• Implicit feature identification.

• Improving opinion orientation identification.


Thank you! And Happy Dragon Boat Festival!

Q & A