OM, CRF and KAF Rubén Izquierdo Beviá CLTL tutorial 11-July-2013

CLTL presentation: training an opinion mining system from KAF files using CRF



OM, CRF and KAF

Rubén Izquierdo Beviá
CLTL tutorial
11-July-2013


OM, CRF and KAF

1. OM: Develop an Opinion Miner tool
2. CRF: Using supervised Machine Learning (CRF)
3. KAF: Using KAF files as input


Opinion Miner

Detecting and extracting fine-grained opinions in text.

Opinion elements:
  Expression: the actual subjective statement
  Holder: mentions of whom the opinion is from
  Target: what the opinion is about

Example (labelled on the slide with HOLDER, TARGET and EXPRESSION):
  My wife said that the room was really dirty.


CRF: Conditional Random Fields

Statistical modeling method
  Obtains a conditional probability distribution over sequences
  Suitable for segmenting and labeling structured data (sequences, trees…)
  Expressions, holders and targets are sequences

Many different packages:
  Mallet (http://mallet.cs.umass.edu)
  CRFSuite (http://www.chokkan.org/software/crfsuite)

Most used input format:
  Sequential data
  One token per line, represented by features
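For reference, the conditional distribution over label sequences is usually written as below for a linear-chain CRF. This is the standard textbook formulation, not something stated on the slides. The notation is introduced here: x is the token sequence, y the label sequence of length T, f_k are feature functions and \lambda_k their learned weights.

```latex
p(y \mid x) = \frac{1}{Z(x)} \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, x, t) \Big),
\qquad
Z(x) = \sum_{y'} \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y'_{t-1}, y'_t, x, t) \Big)
```

Each "feat=val" item attached to a token in the input files plays the role of an observation that the feature functions f_k can fire on.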


KAF

KAF modified for OpeNER
Different layers for different information
All the features are extracted from the KAF
No external linguistic processors are called


First steps

Define which will be our “output classes”
  Target, Holder, Positive, Negative
Define which features will represent each token
  Token, lemma, POS, polarity, entity, and the bi/tri-grams around
Study the input format of your selected CRF package (CRFSuite in my case)


CRFSuite input format

One file with all data
Sequences separated by empty lines
One token per line with the format: CLASS [TAB] FEATS

CLASS: O | B-class | I-class
  O: no class
  B-class: the first element of a sequence of type “class”
  I-class: element inside of a sequence of type “class”

FEATS: feat1=val1 [TAB] feat2=val2 …

Example line:
  B-NP a=He b=reckons c=the d=He|reckons e=P
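The B-/I-/O scheme above can be sketched in a few lines of Python. This is a minimal illustration, not the script used in the tutorial: the function names `bio_labels` and `crfsuite_line` are made up here, and the spans are the chunks of the sentence used in the chunking example of these slides.

```python
def bio_labels(n_tokens, spans):
    """Return one BIO label per token.

    spans is a list of (start, end, cls) over token indices, end exclusive.
    Tokens outside every span get the label "O".
    """
    labels = ["O"] * n_tokens
    for start, end, cls in spans:
        labels[start] = "B-" + cls
        for i in range(start + 1, end):
            labels[i] = "I-" + cls
    return labels


def crfsuite_line(label, feats):
    """Format one token as CLASS<TAB>feat1=val1<TAB>feat2=val2."""
    return "\t".join([label] + ["%s=%s" % (name, value) for name, value in feats])


tokens = "He reckons the current account".split()
spans = [(0, 1, "NP"), (1, 2, "VP"), (2, 5, "NP")]  # chunk spans for this sentence
labels = bio_labels(len(tokens), spans)
lines = [crfsuite_line(lab, [("t", tok)]) for lab, tok in zip(labels, tokens)]
print("\n".join(lines))
```

The same encoding applies to the opinion classes: a three-token target would be labelled B-Target, I-Target, I-Target.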


Simple Example

We want to train a chunker (chunks are also sequences)

Tagged data:
  He/PRP reckons/VBZ the/DT current/JJ account/NN

Chunks:
  [NP He] [VP reckons] [NP the current account]

Features per token: token (t), pos (p), previous token (pt), next token (nt), previous pos (pp), next pos (np)

Resulting CRF lines (one per token):
  B-NP t=He p=PRP pt=O nt=reckons pp=O np=VBZ
  B-VP t=reckons p=VBZ pt=He nt=the pp=PRP np=DT
  B-NP t=the p=DT pt=reckons nt=current pp=VBZ np=JJ
  I-NP t=current p=JJ pt=the nt=account pp=DT np=NN
  I-NP t=account p=NN pt=current nt=O pp=JJ np=O
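As a worked version of this example, the sketch below derives the six features for each token of the tagged sentence. It is an illustration only: `context_features` is a hypothetical helper, and it assumes that a missing neighbour (before the first token or after the last one) is padded with "O" for both the token and the POS.

```python
def context_features(tagged):
    """tagged is a list of (token, pos) pairs; returns one feature list per token."""
    rows = []
    for i, (tok, pos) in enumerate(tagged):
        # Pad with "O" when there is no previous or next token (assumption).
        prev_tok, prev_pos = tagged[i - 1] if i > 0 else ("O", "O")
        next_tok, next_pos = tagged[i + 1] if i + 1 < len(tagged) else ("O", "O")
        rows.append([
            ("t", tok), ("p", pos),
            ("pt", prev_tok), ("nt", next_tok),
            ("pp", prev_pos), ("np", next_pos),
        ])
    return rows


sentence = "He/PRP reckons/VBZ the/DT current/JJ account/NN"
tagged = [tuple(item.split("/")) for item in sentence.split()]
chunk_labels = ["B-NP", "B-VP", "B-NP", "I-NP", "I-NP"]

crf_lines = [
    "\t".join([label] + ["%s=%s" % pair for pair in feats])
    for label, feats in zip(chunk_labels, context_features(tagged))
]
print("\n".join(crf_lines))
```

Each printed line has the class first and then the tab-separated features, which is the layout CRFSuite reads.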


My approach

1. Obtain features for each single token
   1. Input: KAF
   2. Output: ‘TAB’ format
   3. Our own customized feature extractor
2. Generate the final set of features (context)
   1. Input: TAB format
   2. Output: ‘CRF’ format
   3. One existing Python script


KAF feature extractor

Python script that reads a KAF file and generates the ‘TAB’ format
KafParser + Python script
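A rough sketch of such an extractor is shown below. It does not use KafParser, whose API is not shown in the slides; it reads the XML directly with the standard library instead. Several things are assumptions made for illustration: the KAF snippet itself, the element and attribute names (`wf`/`wid`/`sent`, `term`/`lemma`/`pos`, a `sentiment` child with a `polarity` attribute), the column order of the TAB output, and the use of "-" for a missing value.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written KAF-like document, for illustration only.
SAMPLE_KAF = """<KAF>
  <text>
    <wf wid="w1" sent="1">The</wf>
    <wf wid="w2" sent="1">room</wf>
    <wf wid="w3" sent="1">was</wf>
    <wf wid="w4" sent="1">dirty</wf>
  </text>
  <terms>
    <term tid="t1" lemma="the" pos="D"><span><target id="w1"/></span></term>
    <term tid="t2" lemma="room" pos="N"><span><target id="w2"/></span></term>
    <term tid="t3" lemma="be" pos="V"><span><target id="w3"/></span></term>
    <term tid="t4" lemma="dirty" pos="G">
      <sentiment polarity="negative"/>
      <span><target id="w4"/></span>
    </term>
  </terms>
</KAF>"""


def kaf_to_tab(kaf_xml):
    """Return TAB text: one line per <wf>, columns wid, token, lemma, pos, polarity."""
    root = ET.fromstring(kaf_xml)
    # Map each word-form id to the information of the term that covers it.
    term_info = {}
    for term in root.iter("term"):
        sentiment = term.find("sentiment")
        polarity = sentiment.get("polarity", "-") if sentiment is not None else "-"
        info = (term.get("lemma", "-"), term.get("pos", "-"), polarity)
        for target in term.iter("target"):
            term_info[target.get("id")] = info

    rows, prev_sent = [], None
    for wf in root.iter("wf"):
        sent = wf.get("sent")
        if prev_sent is not None and sent != prev_sent:
            rows.append("")  # blank line between sentences
        prev_sent = sent
        lemma, pos, polarity = term_info.get(wf.get("wid"), ("-", "-", "-"))
        rows.append("\t".join([wf.get("wid"), wf.text or "", lemma, pos, polarity]))
    return "\n".join(rows)


tab_text = kaf_to_tab(SAMPLE_KAF)
print(tab_text)
```

For training data, the gold class of each token (taken from the annotated opinions) would be added as one more column.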


Converting to CRF

Python script:
  Specify the format of your tab file
  Specify the “templates” (features) for each token
  Run the script using the TAB and generate OUT
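The idea of a field specification plus templates can be sketched as follows. This is not the existing script mentioned in the slides. `FIELDS`, `TEMPLATES` and the function names are hypothetical, the column layout is an assumption, and out-of-range positions are padded with "O".

```python
# Column layout of the TAB file (assumed for this sketch).
FIELDS = ["token", "lemma", "pos", "label"]

# Each template is a tuple of (field, offset) pairs; several pairs form an n-gram.
TEMPLATES = [
    (("token", 0),),
    (("pos", 0),),
    (("token", -1),),
    (("token", 1),),
    (("token", 0), ("token", 1)),  # token bigram
]


def read_sequences(tab_text):
    """Split TAB text into sequences (blank-line separated) of field dicts."""
    sequences, current = [], []
    for line in tab_text.splitlines():
        if not line.strip():
            if current:
                sequences.append(current)
                current = []
            continue
        current.append(dict(zip(FIELDS, line.split("\t"))))
    if current:
        sequences.append(current)
    return sequences


def apply_templates(seq, i):
    """Build the feature strings for token i of one sequence."""
    feats = []
    for template in TEMPLATES:
        values = []
        for field, offset in template:
            j = i + offset
            values.append(seq[j][field] if 0 <= j < len(seq) else "O")
        name = "|".join("%s[%d]" % (field, offset) for field, offset in template)
        feats.append("%s=%s" % (name, "|".join(values)))
    return feats


def tab_to_crf(tab_text):
    """Convert TAB text to CRF text, one blank line after each sequence."""
    out = []
    for seq in read_sequences(tab_text):
        for i, row in enumerate(seq):
            out.append("\t".join([row["label"]] + apply_templates(seq, i)))
        out.append("")
    return "\n".join(out)


example_tab = "He\the\tPRP\tB-NP\nreckons\treckon\tVBZ\tB-VP\n"
crf_text = tab_to_crf(example_tab)
print(crf_text)
```

Adding a feature then only means adding a tuple to `TEMPLATES`, for example `(("pos", -1), ("pos", 0), ("pos", 1))` for a POS trigram.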


Extracting opinions

Training
1. Get all KAF files with annotations
2. Obtain TAB file for each file
3. Convert to CRF for each file
4. Create a single training file with all CRF files
5. Train the MODEL with crfsuite
   crfsuite learn -m my_model my_data.crf
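Steps 4 and 5 could be scripted roughly as below. This is a sketch under assumptions: the helper names and the `crf/` folder are hypothetical, and the `crfsuite` binary is assumed to be installed and on the PATH. The command is the one shown on the slide.

```python
import subprocess
from pathlib import Path


def merge_crf_files(paths, out_path):
    """Concatenate per-document CRF files, leaving a blank line after each one."""
    with open(out_path, "w", encoding="utf-8") as out:
        for path in paths:
            text = Path(path).read_text(encoding="utf-8").strip("\n")
            if text:
                out.write(text + "\n\n")


def learn_command(model_path, data_path):
    """Build the training command from the slide as an argument list."""
    return ["crfsuite", "learn", "-m", str(model_path), str(data_path)]


def train(crf_dir, data_path="my_data.crf", model_path="my_model"):
    """Merge every *.crf file found in crf_dir and train a model on the result."""
    crf_files = sorted(Path(crf_dir).glob("*.crf"))
    merge_crf_files(crf_files, data_path)
    subprocess.run(learn_command(model_path, data_path), check=True)
```

Calling `train("crf")` would then write my_data.crf and my_model in the current directory.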


Extracting opinions

Tagging one KAF file
1. Generate TAB file
   One line for each TOKEN (<wf>)
2. Convert to CRF
3. Tag with the trained model
   crfsuite tag -m my_model my_kaf.crf
4. Read and align output from crfsuite
5. Generate the KAF layer
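Reading and aligning the tagger output can be sketched as below, assuming the output holds one predicted label per line with blank lines between sequences. The function names, the token ids and the sample output are hypothetical. An I- label that does not continue a span of the same class is treated as O here, which is a design choice of this sketch.

```python
def align_labels(token_ids, tagger_output):
    """Pair token ids (in file order) with predicted labels, skipping blank lines."""
    labels = [line.strip() for line in tagger_output.splitlines() if line.strip()]
    if len(labels) != len(token_ids):
        raise ValueError("expected %d labels, got %d" % (len(token_ids), len(labels)))
    return list(zip(token_ids, labels))


def bio_to_spans(aligned):
    """Group B-/I- labels into (class, [token ids]) spans."""
    spans, current = [], None
    for token_id, label in aligned:
        if label.startswith("B-"):
            if current:
                spans.append(current)
            current = (label[2:], [token_id])
        elif label.startswith("I-") and current and current[0] == label[2:]:
            current[1].append(token_id)
        else:
            # "O", or an I- label with no matching open span.
            if current:
                spans.append(current)
            current = None
    if current:
        spans.append(current)
    return spans


token_ids = ["w1", "w2", "w3", "w4"]                 # ids from the TAB file
sample_output = "B-Target\nI-Target\nO\nB-Negative\n"  # made-up tagger output
spans = bio_to_spans(align_labels(token_ids, sample_output))
print(spans)
```

The resulting spans carry the token ids needed to write the opinion elements back into the KAF file.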


How to adapt this?

1. Adapt the KAF feature extractor (+++)
2. Adapt the TAB-CRF converter (+)
3. Train your model (+)
4. Adapt the CRF -> KAF de-converter (++)
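For step 4, one possible shape of the de-converter is sketched below: it turns already grouped opinions into an XML layer. Everything specific here is an assumption, since the slides do not show the layer: the element names (`opinions`, `opinion`, `opinion_holder`, `opinion_target`, `opinion_expression`), the `oid` and `polarity` attributes, the term ids, and the way holder, target and expression are paired into one opinion.

```python
import xml.etree.ElementTree as ET


def add_span(parent, tag, ids, **attrs):
    """Append <tag><span><target id=.../>...</span></tag> to parent."""
    node = ET.SubElement(parent, tag, attrs)
    span = ET.SubElement(node, "span")
    for term_id in ids:
        ET.SubElement(span, "target", {"id": term_id})
    return node


def build_opinion_layer(opinions):
    """opinions is a list of dicts with keys holder, target, expression, polarity."""
    layer = ET.Element("opinions")
    for number, opinion in enumerate(opinions, start=1):
        node = ET.SubElement(layer, "opinion", {"oid": "o%d" % number})
        if opinion.get("holder"):
            add_span(node, "opinion_holder", opinion["holder"])
        if opinion.get("target"):
            add_span(node, "opinion_target", opinion["target"])
        add_span(node, "opinion_expression", opinion["expression"],
                 polarity=opinion["polarity"])
    return layer


# Hypothetical term ids for one negative opinion.
layer = build_opinion_layer([
    {"holder": ["t1", "t2"], "target": ["t5", "t6"],
     "expression": ["t8", "t9"], "polarity": "negative"},
])
print(ET.tostring(layer, encoding="unicode"))
```

The returned element can be appended to the root of the parsed KAF document before writing the file back.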