OM, CRF and KAF
Rubén Izquierdo Beviá
CLTL tutorial, 11-July-2013

1. OM: develop an Opinion Miner tool
2. CRF: using supervised Machine Learning (CRF)
3. KAF: using KAF files as input

Opinion Miner
Detecting and extracting fine-grained opinions in text.
Opinion elements:
    Expression: the actual subjective statement
    Holder: mentions of whom the opinion is from
    Target: what the opinion is about
Example: My wife said that the room was really dirty.
[Figure: the example sentence annotated with EXPRESSION, TARGET and HOLDER spans]

CRF: Conditional Random Fields
Statistical modeling method
Obtains a conditional probability distribution over sequences
Suitable for segmenting and labeling structured data (sequences, trees…)
Expressions, holders and targets are sequences
Many different packages:
    Mallet (http://mallet.cs.umass.edu)
    CRFSuite (http://www.chokkan.org/software/crfsuite)
Most used input format:
    Sequential data
    One token per line, represented by features

KAF
KAF modified for OpeNER
Different layers for different information
All the features are extracted from the KAF
No external linguistics processors are called

First steps
Define which will be our “output classes”:
    Target, Holder, Positive, Negative
Define which features will represent each token:
    Token, lemma, pos, polarity, entity, and bi/tri-grams around
Study the input format of your selected CRF package (CRFSuite in my case)

CRFSuite input format
One file with all data
Sequences separated by empty lines
One token per line with the format:
    CLASS [TAB] FEATS
CLASS: O | B-class | I-class
    O: no class
    B-class: the first element of a sequence of type “class”
    I-class: an element inside a sequence of type “class”
FEATS: feat1=val1 [TAB] feat2=val2 …
Example:
    B-NP a=He b=reckons c=the d=He|reckons e=P
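
The CLASS column can be derived mechanically from labeled token spans. The following is a minimal sketch, assuming spans are given as (start, end, label) token offsets with an exclusive end; the function names are illustrative and not part of CRFSuite.

```python
def bio_tags(n_tokens, spans):
    """Return one BIO tag per token for non-overlapping (start, end, label) spans."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = "B-" + label          # first element of the sequence
        for i in range(start + 1, end):
            tags[i] = "I-" + label          # elements inside the sequence
    return tags


def crfsuite_line(tag, feats):
    """Format one token as CLASS [TAB] feat1=val1 [TAB] feat2=val2 ..."""
    return "\t".join([tag] + ["%s=%s" % (name, value) for name, value in feats])


tokens = ["He", "reckons", "the", "current", "account"]
spans = [(0, 1, "NP"), (1, 2, "VP"), (2, 5, "NP")]
tags = bio_tags(len(tokens), spans)
lines = [crfsuite_line(tag, [("t", tok)]) for tag, tok in zip(tags, tokens)]
print("\n".join(lines))
```

Tokens that fall outside every span keep the O tag, so the same function covers sentences with no opinion elements at all.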

Simple example
We want to train a chunker (chunks are also sequences).
Tagged data:
    He/PRP reckons/VBZ the/DT current/JJ account/NN
Chunks: NP (He), VP (reckons), NP (the current account)
Features per token: token (t), pos (p), previous token (pt), next token (nt), previous pos (pp), next pos (np)
Resulting lines:
    B-NP t=He p=PRP pt=O nt=reckons pp=O np=VBZ
    B-VP t=reckons p=VBZ pt=He nt=the pp=PRP np=DT
    B-NP t=the p=DT pt=reckons nt=current pp=VBZ np=JJ
    I-NP t=current p=JJ pt=the nt=account pp=DT np=NN
    I-NP t=account p=NN pt=current nt=O pp=JJ np=O
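
The feature lines for this example can be generated from the tagged sentence. This is a small sketch, assuming that positions before the first token and after the last token are padded with the value O; the function name is made up for illustration.

```python
def context_features(tagged):
    """tagged: list of (token, pos) pairs. Returns one feature list per token."""
    rows = []
    for i, (tok, pos) in enumerate(tagged):
        # Pad with "O" when there is no previous or next token.
        prev_tok, prev_pos = tagged[i - 1] if i > 0 else ("O", "O")
        next_tok, next_pos = tagged[i + 1] if i + 1 < len(tagged) else ("O", "O")
        rows.append([("t", tok), ("p", pos), ("pt", prev_tok),
                     ("nt", next_tok), ("pp", prev_pos), ("np", next_pos)])
    return rows


sentence = "He/PRP reckons/VBZ the/DT current/JJ account/NN"
tagged = [tuple(item.rsplit("/", 1)) for item in sentence.split()]
chunk_tags = ["B-NP", "B-VP", "B-NP", "I-NP", "I-NP"]

crf_lines = []
for tag, feats in zip(chunk_tags, context_features(tagged)):
    crf_lines.append("\t".join([tag] + ["%s=%s" % kv for kv in feats]))
print("\n".join(crf_lines))
```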

My approach
1. Obtain features for each single token
    1. Input: KAF
    2. Output: ‘TAB’ format
    3. Our own customized feature extractor
2. Generate the final set of features (context)
    1. Input: TAB format
    2. Output: ‘CRF’ format
    3. One existing Python script

KAF feature extractor
Python script that reads a KAF file and generates the ‘TAB’ format
KafParser + Python script
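
A rough idea of what such an extractor does is sketched below. It is not the author's script: it uses the standard library XML parser instead of KafParser (whose API is not shown here), the sample document is invented, and the column order of the TAB output (word id, token, lemma, pos, polarity) is an assumption. The element and attribute names (wf/wid, term/lemma/pos, span/target, sentiment/polarity) follow KAF as commonly documented and should be checked against your own files.

```python
import xml.etree.ElementTree as ET

# Invented toy document; real KAF files carry more layers and attributes.
KAF_SAMPLE = """<KAF>
  <text>
    <wf wid="w1" sent="1">the</wf>
    <wf wid="w2" sent="1">room</wf>
    <wf wid="w3" sent="1">was</wf>
    <wf wid="w4" sent="1">dirty</wf>
  </text>
  <terms>
    <term tid="t1" lemma="the" pos="D"><span><target id="w1"/></span></term>
    <term tid="t2" lemma="room" pos="N"><span><target id="w2"/></span></term>
    <term tid="t3" lemma="be" pos="V"><span><target id="w3"/></span></term>
    <term tid="t4" lemma="dirty" pos="G">
      <sentiment polarity="negative"/>
      <span><target id="w4"/></span>
    </term>
  </terms>
</KAF>"""


def kaf_to_tab(kaf_xml):
    """Return one tab-separated line per <wf> token (assumed column layout)."""
    root = ET.fromstring(kaf_xml)
    # Map each word-form id to the (lemma, pos, polarity) of the term spanning it.
    term_info = {}
    for term in root.iter("term"):
        sentiment = term.find("sentiment")
        polarity = sentiment.get("polarity", "O") if sentiment is not None else "O"
        info = (term.get("lemma", "O"), term.get("pos", "O"), polarity)
        for target in term.findall("span/target"):
            term_info[target.get("id")] = info
    rows = []
    for wf in root.iter("wf"):
        lemma, pos, polarity = term_info.get(wf.get("wid"), ("O", "O", "O"))
        rows.append("\t".join([wf.get("wid"), wf.text, lemma, pos, polarity]))
    return rows


rows = kaf_to_tab(KAF_SAMPLE)
print("\n".join(rows))
```

Keeping the word id in the first column makes it possible to map the tagger output back to the KAF tokens later on.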

Converting to CRF
Python script:
    Specify the format of your tab file
    Specify the “templates” (features) for each token
    Run the script on the TAB file to generate the output (OUT)
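
To illustrate how templates turn TAB columns into context features, here is a generic sketch. It is not the script used in the tutorial; the template notation (tuples of column name and relative offset) and the generated feature names are assumptions chosen for this example.

```python
def apply_templates(rows, templates):
    """rows: one dict per token, keyed by column name.
    templates: tuples of (column, offset) pairs; several pairs form an n-gram."""
    out = []
    for i in range(len(rows)):
        feats = []
        for template in templates:
            values = []
            for column, offset in template:
                j = i + offset
                if not 0 <= j < len(rows):
                    break  # template reaches outside the sequence: skip it
                values.append(rows[j][column])
            else:
                name = "|".join("%s[%d]" % (c, o) for c, o in template)
                feats.append("%s=%s" % (name, "|".join(values)))
        out.append(feats)
    return out


# "Format of the tab file": the name of each column, in order.
columns = ["token", "pos"]
tab_lines = ["He\tPRP", "reckons\tVBZ", "the\tDT"]
rows = [dict(zip(columns, line.split("\t"))) for line in tab_lines]

templates = [
    (("token", 0),),                  # current token
    (("pos", 0),),                    # current pos
    (("token", -1),),                 # previous token
    (("token", 0), ("token", 1)),     # bigram: current and next token
]
feats = apply_templates(rows, templates)
print(feats[0])
```

In this sketch a template that falls outside the sentence is dropped rather than padded; padding with a boundary value is an equally valid choice as long as training and tagging use the same convention.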

Extracting opinions: training
1. Get all KAF files with annotations
2. Obtain the TAB file for each file
3. Convert each TAB file to CRF
4. Create a single training file with all CRF files
5. Train the model with crfsuite:
    crfsuite learn -m my_model my_data.crf
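
Steps 4 and 5 can be scripted along these lines. This is only a sketch: the helper names and the temporary demo files are made up, and the training call assumes the crfsuite executable is available on the PATH.

```python
import os
import subprocess
import tempfile


def merge_crf_files(paths, out_path):
    """Concatenate CRF files, leaving an empty line after each file's content."""
    with open(out_path, "w", encoding="utf-8") as out:
        for path in paths:
            with open(path, encoding="utf-8") as handle:
                content = handle.read().strip("\n")
            if content:
                out.write(content + "\n\n")


def train_command(model_path, data_path):
    """Build the argument list for: crfsuite learn -m MODEL DATA"""
    return ["crfsuite", "learn", "-m", model_path, data_path]


def train(model_path, data_path):
    """Run the training; requires crfsuite to be installed."""
    subprocess.run(train_command(model_path, data_path), check=True)


# Demo with two tiny, invented CRF files.
workdir = tempfile.mkdtemp()
parts = []
for name, body in [("a.crf", "B-NP\tt=He\n"), ("b.crf", "O\tt=was\n\n")]:
    path = os.path.join(workdir, name)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(body)
    parts.append(path)

merged_path = os.path.join(workdir, "my_data.crf")
merge_crf_files(parts, merged_path)
with open(merged_path, encoding="utf-8") as handle:
    merged = handle.read()
```

The empty line written after each file keeps the last sequence of one file from running into the first sequence of the next.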

Extracting opinions: tagging one KAF file
1. Generate the TAB file: one line for each token (<wf>)
2. Convert to CRF
3. Tag with the trained model:
    crfsuite tag -m my_model my_kaf.crf
4. Read and align the output from crfsuite
5. Generate the KAF layer
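
The alignment step can be sketched as follows, assuming the tagger prints one predicted label per line with an empty line between sequences, and that the token ids are kept in the same order as the lines of the CRF file. The labels in the demo are invented for illustration and are not real tagger output.

```python
def read_label_sequences(output_text):
    """Split tagger output into one list of labels per sequence."""
    sequences, current = [], []
    for line in output_text.splitlines():
        label = line.strip()
        if label:
            current.append(label)
        elif current:
            sequences.append(current)
            current = []
    if current:
        sequences.append(current)
    return sequences


def bio_to_spans(token_ids, labels):
    """Group BIO labels into (class, [token ids]) spans."""
    spans = []
    current = None
    for tid, label in zip(token_ids, labels):
        if label.startswith("B-"):
            current = (label[2:], [tid])
            spans.append(current)
        elif label.startswith("I-") and current and current[0] == label[2:]:
            current[1].append(tid)
        else:
            # "O", or an I- label without a matching open span (treated as outside)
            current = None
    return spans


# My wife said that the room was really dirty .
token_ids = ["w%d" % i for i in range(1, 11)]
fake_output = "\n".join([
    "B-Holder", "I-Holder", "O", "O", "B-Target", "I-Target",
    "O", "B-Negative", "I-Negative", "O",
]) + "\n\n"

labels = read_label_sequences(fake_output)[0]
spans = bio_to_spans(token_ids, labels)
print(spans)
```

Each resulting span lists the token ids it covers, which is the information needed to write the holder, target and expression elements of the new KAF layer.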

How to adapt this?
1. Adapt the KAF feature extractor (+++)
2. Adapt the TAB-CRF converter (+)
3. Train your model (+)
4. Adapt the CRF -> KAF de-converter (++)