Segment-Level Sequence Tagging using Gated Recursive Semi-Markov Conditional Random Fields
Jingwei Zhuo1,2, Yong Cao2, Jun Zhu1, Bo Zhang1, Zaiqing Nie2
1 Dept. of Comp. Sci. & Tech., Tsinghua University, Beijing
2 Microsoft Research, Beijing
June 16, 2016
Outline
• Motivations
• Our Model
• Experiments
• Conclusion and Future Work
Motivations
• Sequence tagging problems
◦ Given a sentence $\mathbf{x} = (x_1, \dots, x_T)$, assign a tag to each word or to each set of words (i.e., segment).
◦ E.g., [NP He] [VP reckons] [NP the current account deficit] [VP will narrow] [PP to] [NP only 1.8 billion] [PP in] [NP September]
• Word-level modeling: Conditional Random Fields (CRFs)
◦ Represent tags as $\mathbf{y} = (y_1, \dots, y_T)$

$$p(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})} \exp\left( \sum_{t=1}^{T} \left[ F(y_t, \mathbf{x}) + A(y_{t-1}, y_t) \right] \right), \tag{1}$$

$$F(y_t, \mathbf{x}) = \mathbf{v}_{y_t}^{\top} \mathbf{f}(y_t, \mathbf{x}) + b_{y_t}, \tag{2}$$

◦ $F(y_t, \mathbf{x})$: tag score; $\mathbf{f}(y_t, \mathbf{x})$: features.
◦ $\mathbf{f}(y_t, \mathbf{x})$ can be hand-crafted or automatically extracted (neural networks).
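To make Eq. (1) concrete, here is a minimal sketch of how the log-probability of one tag sequence could be computed with the standard forward recursion for $Z(\mathbf{x})$. The function names, the nested-list layout of the scores, and the choice to give the first tag no incoming transition score are assumptions made for illustration; the slides do not specify them.

```python
import math
from itertools import product


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def crf_log_prob(tag_scores, transitions, tags):
    """Return log p(y | x) for a linear-chain CRF as in Eq. (1).

    tag_scores[t][y] plays the role of F(y_t = y, x).
    transitions[a][b] plays the role of A(a, b).
    Assumption: the first tag has no incoming transition score.
    """
    T, K = len(tag_scores), len(tag_scores[0])

    # Unnormalised score of the given tag sequence.
    score = tag_scores[0][tags[0]]
    for t in range(1, T):
        score += transitions[tags[t - 1]][tags[t]] + tag_scores[t][tags[t]]

    # Forward recursion: alpha[y] is the log-sum of scores of all
    # prefixes that end at the current position with tag y.
    alpha = list(tag_scores[0])
    for t in range(1, T):
        alpha = [
            tag_scores[t][b]
            + logsumexp([alpha[a] + transitions[a][b] for a in range(K)])
            for b in range(K)
        ]
    return score - logsumexp(alpha)


# Toy scores for a 3-word sentence and 2 tags (made-up numbers).
F = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.9]]
A = [[0.2, -0.1], [0.0, 0.4]]
total = sum(math.exp(crf_log_prob(F, A, list(y)))
            for y in product(range(2), repeat=3))
```

Summing the probabilities of all tag sequences, as in the last lines, is a quick sanity check that the normaliser is computed consistently with the path score.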
Motivations
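• Segment-level modeling: Semi-Markov CRFs (Semi-CRFs)
◦ Represent tags as $\mathbf{s} = (s_1, \dots, s_{|\mathbf{s}|})$, where $s_j = \langle h_j, d_j, y_j \rangle$ (start position, length, and tag of the $j$-th segment)

$$p(\mathbf{s} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})} \exp\left( \sum_{j=1}^{|\mathbf{s}|} \left[ F(s_j, \mathbf{x}) + A(y_{j-1}, y_j) \right] \right), \tag{3}$$

$$F(s_j, \mathbf{x}) = \mathbf{v}_{y_j}^{\top} \mathbf{f}(s_j, \mathbf{x}) + b_{y_j}, \tag{4}$$

◦ Pros: Modeling segments directly.
◦ Cons: Hard to design features; no automatic feature extractor.
• Can we combine the advantages of Semi-CRFs and neural networks?
◦ Fully leveraging segment-level information.
◦ Features for Semi-CRFs can be extracted automatically.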
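The normaliser $Z(\mathbf{x})$ in Eq. (3) sums over every segmentation and every tag assignment. One standard way to compute it is a semi-Markov forward recursion, sketched below. The callable `seg_score`, zero-based start positions, the `max_len` cap on segment length, and the absence of a transition score into the first segment are assumptions made for illustration.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def semicrf_log_partition(T, K, max_len, seg_score, transitions):
    """Return log Z(x) for a Semi-CRF over a sentence of T words.

    seg_score(h, d, y) plays the role of F(s, x) for the segment that
    starts at zero-based position h, spans d words, and has tag y.
    transitions[a][b] plays the role of A(a, b).
    Assumption: the first segment has no incoming transition score.
    """
    # alpha[t][y]: log-sum of scores of all segmentations of the first
    # t words whose last segment carries tag y.
    alpha = [[-math.inf] * K for _ in range(T + 1)]
    for t in range(1, T + 1):
        for y in range(K):
            terms = []
            for d in range(1, min(max_len, t) + 1):
                h = t - d  # start of the candidate last segment
                f = seg_score(h, d, y)
                if h == 0:
                    terms.append(f)
                else:
                    terms.extend(alpha[h][prev] + transitions[prev][y] + f
                                 for prev in range(K))
            alpha[t][y] = logsumexp(terms)
    return logsumexp(alpha[T])


# With one tag and all scores zero, Z counts the segmentations of T words:
# 2 ** (T - 1) when segments may be arbitrarily long.
log_z = semicrf_log_partition(4, 1, 4, lambda h, d, y: 0.0, [[0.0]])
```

The recursion visits each (end position, length, tag pair) once, so its cost grows with `T * max_len * K * K` rather than with the exponential number of segmentations.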
Our Model
• Key idea: extract features for all segments in one propagation using gated recursive convolutional neural networks (grConvs) (Cho et al., 2014)
Our Model
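• Local building block (Cho et al., 2014)

$$\mathbf{z}_k^{(d)} = \theta_L \circ \mathbf{z}_k^{(d-1)} + \theta_R \circ \mathbf{z}_{k+1}^{(d-1)} + \theta_M \circ \hat{\mathbf{z}}_k^{(d)}, \tag{5}$$

$$\hat{\mathbf{z}}_k^{(d)} = g\left( W_L \mathbf{z}_k^{(d-1)} + W_R \mathbf{z}_{k+1}^{(d-1)} + b_W \right), \tag{6}$$

◦ Intuition: sources for the construction of one segment $\mathbf{z}_k^{(d)}$
  • prefix $\mathbf{z}_k^{(d-1)}$
  • suffix $\mathbf{z}_{k+1}^{(d-1)}$
  • interaction of both, i.e., $\hat{\mathbf{z}}_k^{(d)}$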
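The building block in Eqs. (5) and (6) can be sketched in plain Python as follows. The slide does not say how the gates $\theta_L, \theta_R, \theta_M$ are computed or which nonlinearity $g$ is used, so this sketch assumes a per-dimension softmax over three logits obtained from a linear map of the two children (hypothetical parameters `GL`, `GR`, `bG`) and takes $g = \tanh$. All function and parameter names are illustrative.

```python
import math
import random


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def gr_conv_step(left, right, p):
    """Combine children z_k^(d-1) and z_{k+1}^(d-1) into z_k^(d)."""
    D = len(left)
    # Eq. (6): candidate vector mixing both children (g = tanh is assumed).
    hat = [math.tanh(a + b + c) for a, b, c in
           zip(matvec(p["WL"], left), matvec(p["WR"], right), p["bW"])]
    # Assumed gate parameterisation: 3*D logits from the two children.
    logits = [a + b + c for a, b, c in
              zip(matvec(p["GL"], left), matvec(p["GR"], right), p["bG"])]
    out = []
    for i in range(D):
        l, r, m = logits[i], logits[D + i], logits[2 * D + i]
        mx = max(l, r, m)
        el, er, em = math.exp(l - mx), math.exp(r - mx), math.exp(m - mx)
        # Eq. (5): gated mix of prefix, suffix, and their interaction.
        out.append((el * left[i] + er * right[i] + em * hat[i]) / (el + er + em))
    return out


def gr_conv_pyramid(words, p, max_len):
    """Return pyramid with pyramid[d][k] = z_k^(d); pyramid[0] is unused."""
    pyramid = [None, [list(w) for w in words]]
    for d in range(2, min(max_len, len(words)) + 1):
        prev = pyramid[d - 1]
        pyramid.append([gr_conv_step(prev[k], prev[k + 1], p)
                        for k in range(len(prev) - 1)])
    return pyramid


def random_params(D, seed=0):
    """Small random parameters, for demonstration only."""
    rng = random.Random(seed)
    mat = lambda rows: [[rng.uniform(-0.5, 0.5) for _ in range(D)] for _ in range(rows)]
    return {"WL": mat(D), "WR": mat(D), "bW": [0.0] * D,
            "GL": mat(3 * D), "GR": mat(3 * D), "bG": [0.0] * (3 * D)}


rng = random.Random(1)
words = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(5)]
pyramid = gr_conv_pyramid(words, random_params(3), max_len=3)
```

Here `pyramid[d][k]` holds the vector for the segment of length `d` starting at zero-based position `k`, so a single bottom-up pass yields a feature for every segment up to `max_len` words.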
Our Model
• Connection to Semi-CRFs
◦ For segment $s_j = \langle h_j, d_j, y_j \rangle$,
  • feature $\mathbf{f}(s_j, \mathbf{x}) = \mathbf{z}_{h_j}^{(d_j)}$,
  • tag score $F(s_j, \mathbf{x}) = \mathbf{v}_{y_j}^{(d_j)\top} \mathbf{f}(s_j, \mathbf{x}) + b_{y_j}$.
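As a small illustration of this connection, the tag score of a segment can be read off a precomputed feature pyramid with one dot product. The data layout below (a nested list `pyramid[d][h]`, a dictionary `v[d][y]` of length-specific weight vectors, zero-based start positions) and the toy numbers are assumptions made for the sketch.

```python
def segment_tag_score(pyramid, v, b, h, d, y):
    """Return F(s, x) = v_y^(d) . z_h^(d) + b_y for segment s = <h, d, y>.

    pyramid[d][h] is the feature vector of the segment of length d that
    starts at zero-based position h; v[d][y] is the weight vector for
    tag y at segment length d; b[y] is the bias of tag y.
    """
    feature = pyramid[d][h]
    weights = v[d][y]
    return sum(w * z for w, z in zip(weights, feature)) + b[y]


# Toy pyramid for a 3-word sentence with 2-dimensional features.
pyramid = [
    None,                                   # level 0 is unused
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],   # segments of length 1
    [[0.5, 0.5], [0.5, 1.0]],               # segments of length 2
]
v = {1: [[1.0, 2.0], [0.0, -1.0]], 2: [[2.0, 0.0], [1.0, 1.0]]}
b = [0.1, -0.1]

score_single = segment_tag_score(pyramid, v, b, h=0, d=1, y=0)
score_pair = segment_tag_score(pyramid, v, b, h=1, d=2, y=1)
```

A function of this shape could serve as the segment score $F(s_j, \mathbf{x})$ inside the Semi-CRF objective of Eq. (3).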
Experiments
• Settings
◦ Datasets
  • CONLL-2000 dataset (text chunking) and CONLL-2003 dataset (named entity recognition, NER)
◦ Compared models
  • Neural models: Senna (Collobert et al., 2011) and BI-LSTM-CRF (Huang et al., 2015)
  • Non-neural models: JESS-CM (Suzuki and Isozaki, 2008), etc.
◦ Hyperparameters

Hyperparameters    CONLL 2000    CONLL 2003
Segment length     15            10
Dropout            0.3           0.3
Learning rate      0.3           0.3
Epochs             15            20
Minibatches        10            10
Window width       2             2
Experiments
• Comparison with the state of the art.

Models                                       CONLL 2000    CONLL 2003
Ours
  grSemi-CRF (Random embeddings)             93.92         84.66
  grSemi-CRF (Senna embeddings)              95.01         89.44 (90.87)
Neural Models
  Senna (Random embeddings)                  90.33         81.47
  Senna (Senna embeddings)                   94.32         88.67 (89.59)
  BI-LSTM-CRF (Random embeddings)            94.13         84.26
  BI-LSTM-CRF (Senna embeddings)             94.46         88.83 (90.10)
Non-Neural Models
  JESS-CM (Suzuki and Isozaki, 2008), 15M    94.67         89.36
  JESS-CM (Suzuki and Isozaki, 2008), 1B     95.15         89.92
  Ratinov and Roth (2009)                    –             90.57
  Lin and Wu (2009)                          –             90.90
  Passos et al. (2014)                       –             90.90
Experiments
• Impact of external information
◦ Embeddings, Brown clusters and gazetteers

Input Features        CONLL 2000    CONLL 2003
None                  93.92         84.66
Brown(NYT)            94.18         86.57
Brown(RCV1)           94.05         88.22
Emb                   94.73         88.12
Gaz                   –             87.94
Emb + Brown(NYT)      95.01         88.86
Emb + Brown(RCV1)     94.87         89.44
Emb + Gaz             –             89.88
Brown(NYT) + Gaz      –             88.69
Brown(RCV1) + Gaz     –             89.82
All(NYT)              –             90.00
All(RCV1)             –             90.87
Experiments
• Visualization of segment-level features learnt on the CONLL 2003 dataset.

Queried Segments
  Filippo Inzaghi        AC Milan          Central African Republic         Asian Cup
Nearest Neighbour Results
  Pierluigi Casiraghi    FC Hansa          From Central African Republic    Scottish Cup
  Fabrizio Ravanelli     SC Freiburg       Southeast Asian Nations          European Cup
  Bogdan Stelea          FC Cologne        In Central African Republic      African Cup
  Francesco Totti        Aston Villa       The Central African Republic     World Cup
  Predrag Mijatovic      Red Cross         South African Breweries          UEFA Cup
  Fausto Pizzi           Yasuto Honda      Of Southeast Asian Nations       Europoean Cup
  Pierre Laigle          NAC Breda         New South Wales                  Asian Games
  Pavel Nedved           La Plagne         Central African Republic .       Europa Cup
  Anghel Iordanescu      Sporting Gijon    Papua New Guinea                 National League
  Zeljko Petrovic        NEC Nijmegen      Central Africa                   F.A. Cup
Conclusions and Future Work
• We proposed grSemi-CRF, a neural network-based model for segment-level sequence tagging that
◦ models segments explicitly as Semi-CRFs,
◦ extracts segment-level features automatically,
◦ and achieves high performance compared to CRF models.
• Future Work
◦ Exploring better ways to utilize unlabelled data, e.g., learning segment-level embeddings in an unsupervised way
◦ Extending to other sequence tagging tasks
Questions and Answers
• Motivations
◦ Segment-level sequence tagging
◦ CRFs and Semi-CRFs
• Our Model
◦ Feature extractor
◦ Connection to Semi-CRFs
• Experiments
◦ Comparison with state of the art
◦ Impact of external information
◦ Visualization of learnt features
• Conclusion and Future Work