Segment-Level Sequence Tagging using Gated Recursive Semi-Markov Conditional Random Fields

Jingwei Zhuo1,2, Yong Cao2, Jun Zhu1, Bo Zhang1, Zaiqing Nie2

1 Dept. of Comp. Sci. & Tech., Tsinghua University, Beijing
2 Microsoft Research, Beijing

June 16, 2016

Outline

• Motivations
• Our Model
• Experiments
• Conclusion and Future Work


Motivations

• Sequence tagging problems
  ◦ Given a sentence x = (x_1, ..., x_T), assign a tag to each word or to each set of words (i.e., segment).
  ◦ E.g., [NP He] [VP reckons] [NP the current account deficit] [VP will narrow] [PP to] [NP only 1.8 billion] [PP in] [NP September]
• Word-level modeling: Conditional Random Fields (CRFs)
  ◦ Represent the tags as y = (y_1, ..., y_T):

    $$p(y \mid x) = \frac{1}{Z(x)} \exp\Bigl( \sum_{t=1}^{T} \bigl( F(y_t, x) + A(y_{t-1}, y_t) \bigr) \Bigr), \tag{1}$$

    $$F(y_t, x) = v_{y_t}^{\top} f(y_t, x) + b_{y_t}, \tag{2}$$

  ◦ F(y_t, x): tag score; f(y_t, x): features.
  ◦ f(y_t, x) can be hand-crafted or extracted automatically (by neural networks).
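As a minimal sketch of Eq. (1), the snippet below evaluates p(y|x) for a toy linear-chain CRF and computes Z(x) with the forward algorithm. It assumes the tag scores F(y_t, x) have already been computed into a table; the names `tag_scores` and `transitions`, the made-up numbers, and the choice to give the first position no incoming transition score are illustrative assumptions, not details from the slides.

```python
import math
from itertools import product


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sequence_score(tags, tag_scores, transitions):
    """Unnormalised log-score: sum_t F(y_t, x) + A(y_{t-1}, y_t).

    Assumption: the first position has no incoming transition score.
    """
    score = tag_scores[0][tags[0]]
    for t in range(1, len(tags)):
        score += tag_scores[t][tags[t]] + transitions[tags[t - 1]][tags[t]]
    return score


def log_partition(tag_scores, transitions):
    """log Z(x) via the forward algorithm, O(T * K^2) for K tags."""
    num_tags = len(transitions)
    alpha = list(tag_scores[0])
    for t in range(1, len(tag_scores)):
        alpha = [
            tag_scores[t][k]
            + logsumexp([alpha[j] + transitions[j][k] for j in range(num_tags)])
            for k in range(num_tags)
        ]
    return logsumexp(alpha)


def log_prob(tags, tag_scores, transitions):
    """log p(y | x) as in Eq. (1)."""
    return sequence_score(tags, tag_scores, transitions) - log_partition(
        tag_scores, transitions
    )


# Toy input: T = 3 words, K = 2 tags, made-up scores.
tag_scores = [[1.0, 0.2], [0.3, 0.8], [0.5, 0.5]]  # tag_scores[t][k] = F(y_t = k, x)
transitions = [[0.1, -0.4], [0.6, 0.0]]            # transitions[j][k] = A(j, k)

# Sanity check: the probabilities of all 2^3 tag sequences sum to one.
total = sum(
    math.exp(log_prob(list(y), tag_scores, transitions))
    for y in product(range(2), repeat=3)
)
```

Working in log space keeps the recursion stable for long sentences; `total` serves only as a check that the forward recursion and the per-sequence score agree.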

Motivations

• Segment-level modeling: Semi-Markov CRFs (Semi-CRFs)
  ◦ Represent the tags as s = (s_1, ..., s_{|s|}), where s_j = ⟨h_j, d_j, y_j⟩ gives the start position, length and tag of the j-th segment:

    $$p(s \mid x) = \frac{1}{Z(x)} \exp\Bigl( \sum_{j=1}^{|s|} \bigl( F(s_j, x) + A(y_{j-1}, y_j) \bigr) \Bigr), \tag{3}$$

    $$F(s_j, x) = v_{y_j}^{\top} f(s_j, x) + b_{y_j}, \tag{4}$$

  ◦ Pros: models segments directly.
  ◦ Cons: features are hard to design; no automatic feature extractor.
• Can we combine the advantages of Semi-CRFs and neural networks?
  ◦ Fully leverage segment-level information.
  ◦ Extract features for Semi-CRFs automatically.
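The normaliser Z(x) in Eq. (3) sums over all segmentations rather than all tag sequences. The sketch below shows one common way to compute it, a semi-Markov forward recursion, and checks it against brute-force enumeration. It assumes a segment is a triple (h, d, y) of 0-based start position, length and tag, that segment lengths are capped at `max_len`, and that the first segment has no incoming transition score; `toy_seg_score` and all numbers are hypothetical.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def segmentation_score(segments, seg_score, transitions):
    """Unnormalised log-score of Eq. (3) for a list of (h, d, y) segments."""
    score, prev_tag = 0.0, None
    for h, d, y in segments:
        score += seg_score(h, d, y)
        if prev_tag is not None:  # assumption: no transition into the first segment
            score += transitions[prev_tag][y]
        prev_tag = y
    return score


def log_partition(T, num_tags, max_len, seg_score, transitions):
    """log Z(x) via the semi-Markov forward recursion, O(T * max_len * K^2)."""
    # alpha[t][y]: log-sum over segmentations of the first t words
    # whose last segment carries tag y.
    alpha = [[-math.inf] * num_tags for _ in range(T + 1)]
    for t in range(1, T + 1):
        for y in range(num_tags):
            terms = []
            for d in range(1, min(max_len, t) + 1):
                h = t - d  # 0-based start of the last segment
                f = seg_score(h, d, y)
                if h == 0:
                    terms.append(f)
                else:
                    terms.extend(
                        alpha[h][yp] + transitions[yp][y] + f
                        for yp in range(num_tags)
                    )
            alpha[t][y] = logsumexp(terms)
    return logsumexp(alpha[T])


def enumerate_segmentations(T, num_tags, max_len, start=0):
    """Yield every segmentation of T words (brute force, for checking only)."""
    if start == T:
        yield []
        return
    for d in range(1, min(max_len, T - start) + 1):
        for y in range(num_tags):
            for rest in enumerate_segmentations(T, num_tags, max_len, start + d):
                yield [(start, d, y)] + rest


def toy_seg_score(h, d, y):
    """Hypothetical stand-in for F(s_j, x)."""
    return 0.1 * h + 0.3 * d - 0.2 * y


toy_transitions = [[0.1, -0.4], [0.6, 0.0]]
log_z = log_partition(4, 2, 3, toy_seg_score, toy_transitions)
brute_force = logsumexp(
    [
        segmentation_score(s, toy_seg_score, toy_transitions)
        for s in enumerate_segmentations(4, 2, 3)
    ]
)
```

Capping the segment length keeps the recursion linear in the sentence length, which is presumably the role of a maximum segment length setting in a model of this kind.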


Our Model

• Key idea: extract features for all segments in a single propagation using gated recursive convolutional neural networks (grConvs) (Cho et al., 2014)


Our Model

• Local building block (Cho et al., 2014)

    $$z_k^{(d)} = \theta_L \circ z_k^{(d-1)} + \theta_R \circ z_{k+1}^{(d-1)} + \theta_M \circ \hat{z}_k^{(d)}, \tag{5}$$

    $$\hat{z}_k^{(d)} = g\bigl( W_L z_k^{(d-1)} + W_R z_{k+1}^{(d-1)} + b_W \bigr), \tag{6}$$

  ◦ Intuition: sources for the construction of one segment z_k^{(d)}
    • prefix z_k^{(d-1)}
    • suffix z_{k+1}^{(d-1)}
    • interaction of both, i.e., \hat{z}_k^{(d)}
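To make the building block concrete, here is a small pure-Python sketch that stacks it into a pyramid: layer d holds one vector per segment of length d, each built from the two overlapping segments of length d-1 beneath it. Several details are assumptions rather than statements from the slides: g is taken to be tanh, the gates θ_L, θ_R, θ_M are produced by a per-dimension softmax over learned logits (the hypothetical parameters `G_L`, `G_R`, `b_G`, in the spirit of Cho et al., 2014), the parameters are shared across layers, and layer 1 is simply the input vectors.

```python
import math
import random


def matvec(W, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def add(*vectors):
    """Element-wise sum of equally sized vectors."""
    return [sum(vals) for vals in zip(*vectors)]


def combine(left, right, p):
    """One application of Eqs. (5)-(6) to a prefix and a suffix vector."""
    D = len(left)
    # Eq. (6): candidate vector modelling the interaction (assumption: g = tanh).
    candidate = [
        math.tanh(a)
        for a in add(matvec(p["W_L"], left), matvec(p["W_R"], right), p["b_W"])
    ]
    # Assumed gating: 3*D logits, softmax-normalised per dimension so that
    # theta_L + theta_R + theta_M = 1 in every coordinate.
    logits = add(matvec(p["G_L"], left), matvec(p["G_R"], right), p["b_G"])
    out = []
    for i in range(D):
        l, r, m = logits[i], logits[D + i], logits[2 * D + i]
        top = max(l, r, m)
        e_l, e_r, e_m = math.exp(l - top), math.exp(r - top), math.exp(m - top)
        norm = e_l + e_r + e_m
        # Eq. (5): gated mixture of prefix, suffix and candidate.
        out.append((e_l * left[i] + e_r * right[i] + e_m * candidate[i]) / norm)
    return out


def build_pyramid(inputs, params, max_len):
    """pyramid[d][k] represents the segment of length d starting at 0-based k."""
    pyramid = {1: [list(v) for v in inputs]}
    for d in range(2, min(max_len, len(inputs)) + 1):
        prev = pyramid[d - 1]
        pyramid[d] = [combine(prev[k], prev[k + 1], params) for k in range(len(prev) - 1)]
    return pyramid


def random_params(D, seed=0):
    """Hypothetical random initialisation, for illustration only."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_L": mat(D, D), "W_R": mat(D, D), "b_W": [0.0] * D,
        "G_L": mat(3 * D, D), "G_R": mat(3 * D, D), "b_G": [0.0] * (3 * D),
    }


rng = random.Random(1)
inputs = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(5)]  # T = 5, D = 3
pyramid = build_pyramid(inputs, random_params(3), max_len=4)
```

A single bottom-up pass yields a vector for every segment up to `max_len`; with T = 5 and `max_len = 4` the pyramid holds 5 + 4 + 3 + 2 = 14 segment vectors.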


Our Model

• Connection to Semi-CRFs
  ◦ For segment s_j = ⟨h_j, d_j, y_j⟩,
    • feature $f(s_j, x) = z_{h_j}^{(d_j)}$,
    • tag score $F(s_j, x) = {v_{y_j}^{(d_j)}}^{\top} f(s_j, x) + b_{y_j}$.
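The lookup described above can be sketched in a few lines: the feature of a segment is the pyramid entry for its length and start position, and the tag score is a dot product with a weight vector that depends on both the segment length and the tag. The container layout (`pyramid[d][h]` with 0-based h, `v[d][y]`, `b[y]`) and all numbers are hypothetical, chosen only to make the example checkable by hand.

```python
def segment_tag_score(pyramid, v, b, h, d, y):
    """Tag score of segment <h, d, y>: v[d][y] . z + b[y], with z = pyramid[d][h]."""
    z = pyramid[d][h]  # feature vector of the segment of length d starting at h
    return sum(w * x for w, x in zip(v[d][y], z)) + b[y]


# Hand-made toy values: 3 words, 2-dimensional features, 2 tags.
pyramid = {
    1: [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],  # segments of length 1
    2: [[0.5, 0.5], [0.5, 1.0]],              # segments of length 2
}
v = {
    1: [[1.0, 2.0], [0.0, 1.0]],  # v[d][y]: one weight vector per length and tag
    2: [[2.0, 0.0], [1.0, 1.0]],
}
b = [0.1, -0.1]                   # b[y]: one bias per tag

# Segment covering words 1..2 (0-based start 1, length 2) with tag 1:
# (1.0 * 0.5 + 1.0 * 1.0) + (-0.1) = 1.4
score = segment_tag_score(pyramid, v, b, h=1, d=2, y=1)
```

Indexing the weights by segment length lets short and long segments be scored with different parameters even though their features come from the same network.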


Experiments

• Settings
  ◦ Datasets
    • CONLL-2000 dataset (text chunking) and CONLL-2003 dataset (named entity recognition, NER)
  ◦ Compared models
    • Neural models: Senna (Collobert et al., 2011) and BI-LSTM-CRF (Huang et al., 2015)
    • Non-neural models: JESS-CM (Suzuki and Isozaki, 2008), etc.
  ◦ Hyperparameters

    Hyperparameters   CONLL 2000   CONLL 2003
    Segment length    15           10
    Dropout           0.3          0.3
    Learning rate     0.3          0.3
    Epochs            15           20
    Minibatches       10           10
    Window width      2            2


Experiments

• Comparison with the state of the art.

    Models                                       CONLL 2000   CONLL 2003
    Ours
      grSemi-CRF (Random embeddings)             93.92        84.66
      grSemi-CRF (Senna embeddings)              95.01        89.44 (90.87)
    Neural Models
      Senna (Random embeddings)                  90.33        81.47
      Senna (Senna embeddings)                   94.32        88.67 (89.59)
      BI-LSTM-CRF (Random embeddings)            94.13        84.26
      BI-LSTM-CRF (Senna embeddings)             94.46        88.83 (90.10)
    Non-Neural Models
      JESS-CM (Suzuki and Isozaki, 2008), 15M    94.67        89.36
      JESS-CM (Suzuki and Isozaki, 2008), 1B     95.15        89.92
      Ratinov and Roth (2009)                    –            90.57
      Lin and Wu (2009)                          –            90.90
      Passos et al. (2014)                       –            90.90


Experiments

• Impact of external information
  ◦ Embeddings, Brown clusters and gazetteers

    Input Features        CONLL 2000   CONLL 2003
    None                  93.92        84.66
    Brown(NYT)            94.18        86.57
    Brown(RCV1)           94.05        88.22
    Emb                   94.73        88.12
    Gaz                   –            87.94
    Emb + Brown(NYT)      95.01        88.86
    Emb + Brown(RCV1)     94.87        89.44
    Emb + Gaz             –            89.88
    Brown(NYT) + Gaz      –            88.69
    Brown(RCV1) + Gaz     –            89.82
    All(NYT)              –            90.00
    All(RCV1)             –            90.87


Experiments

• Visualization of segment-level features learnt on the CONLL-2003 dataset.

    Queried segments            Filippo Inzaghi      AC Milan        Central African Republic      Asian Cup
    Nearest neighbour results   Pierluigi Casiraghi  FC Hansa        From Central African Republic Scottish Cup
                                Fabrizio Ravanelli   SC Freiburg     Southeast Asian Nations       European Cup
                                Bogdan Stelea        FC Cologne      In Central African Republic   African Cup
                                Francesco Totti      Aston Villa     The Central African Republic  World Cup
                                Predrag Mijatovic    Red Cross       South African Breweries       UEFA Cup
                                Fausto Pizzi         Yasuto Honda    Of Southeast Asian Nations    Europoean Cup
                                Pierre Laigle        NAC Breda       New South Wales               Asian Games
                                Pavel Nedved         La Plagne       Central African Republic .    Europa Cup
                                Anghel Iordanescu    Sporting Gijon  Papua New Guinea              National League
                                Zeljko Petrovic      NEC Nijmegen    Central Africa                F.A. Cup


Conclusions and Future Work

• We proposed grSemi-CRF, a neural network-based model for segment-level sequence tagging that
  ◦ models segments explicitly as Semi-CRFs,
  ◦ extracts segment-level features automatically,
  ◦ and achieves high performance compared to CRF models (e.g., 95.01 vs. 94.46 for BI-LSTM-CRF on CONLL 2000 with Senna embeddings).
• Future Work
  ◦ Exploring better ways to utilize unlabelled data, e.g., learning segment-level embeddings in an unsupervised way
  ◦ Extending to other sequence tagging tasks


Questions and Answers

• Motivations
  ◦ Segment-level sequence tagging
  ◦ CRFs and Semi-CRFs
• Our Model
  ◦ Feature extractor
  ◦ Connection to Semi-CRFs
• Experiments
  ◦ Comparison with state-of-the-art
  ◦ Impact of external information
  ◦ Visualization of learnt features
• Conclusion and Future Work
