Segment-Level Sequence Tagging using Gated Recursive Semi-Markov Conditional Random Fields

Jingwei Zhuo (1,2), Yong Cao (2), Jun Zhu (1), Bo Zhang (1), Zaiqing Nie (2)

(1) Dept. of Comp. Sci. & Tech., Tsinghua University, Beijing
(2) Microsoft Research, Beijing

June 16, 2016

Outline

• Motivations
• Our Model
• Experiments
• Conclusion and Future Work

Motivations

• Sequence tagging problems
  ◦ Given a sentence x = (x_1, ..., x_T), assign a tag to each word or to each group of consecutive words (i.e., a segment).
  ◦ E.g., [NP He] [VP reckons] [NP the current account deficit] [VP will narrow] [PP to] [NP only 1.8 billion] [PP in] [NP September]
• Word-level modeling: Conditional Random Fields (CRFs)
  ◦ Represent the tags as y = (y_1, ..., y_T):

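$$ p(y \mid x) = \frac{1}{Z(x)} \exp\left( \sum_{t=1}^{T} \left[ F(y_t, x) + A(y_{t-1}, y_t) \right] \right), \tag{1} $$

$$ F(y_t, x) = \mathbf{v}_{y_t}^{\top} \mathbf{f}(y_t, x) + b_{y_t}, \tag{2} $$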

  ◦ F(y_t, x): tag score; f(y_t, x): features; A(y_{t−1}, y_t): tag transition score; Z(x): normalization factor (partition function).
  ◦ f(y_t, x) can be hand-crafted or extracted automatically (e.g., by neural networks).
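The normalizer Z(x) in Eq. (1) sums over all K^T tag sequences (K tags, T words), yet the chain structure lets the forward algorithm compute it in O(T·K²) time. Below is a minimal Python sketch under stated assumptions: the tag scores F(y_t, x) and transition scores A are supplied directly as nested lists (hypothetical toy values, not the output of any feature extractor), and the first tag carries no transition term A(y_0, y_1). The brute-force enumeration exists only to check the recursion on a tiny input.

```python
import itertools
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a non-empty list."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def crf_log_partition(F, A):
    """log Z(x) of Eq. (1) via the forward algorithm, O(T * K^2).

    F[t][y]: tag score F(y_t = y, x) at position t (0-based), as in Eq. (2)
    A[p][y]: transition score A(p, y)
    Assumption: the first position has no transition term.
    """
    K = len(F[0])
    # alpha[y]: log-sum of the scores of all tag prefixes ending in tag y
    alpha = list(F[0])
    for t in range(1, len(F)):
        alpha = [F[t][y] + logsumexp([alpha[p] + A[p][y] for p in range(K)])
                 for y in range(K)]
    return logsumexp(alpha)


def crf_score(F, A, y):
    """Unnormalized log-score of one tagging y (the exponent in Eq. (1))."""
    s = F[0][y[0]]
    for t in range(1, len(y)):
        s += F[t][y[t]] + A[y[t - 1]][y[t]]
    return s


def crf_log_partition_brute(F, A):
    """Reference check: enumerate all K^T taggings (tiny inputs only)."""
    K = len(F[0])
    return logsumexp([crf_score(F, A, y)
                      for y in itertools.product(range(K), repeat=len(F))])


# Hypothetical toy scores: T = 4 words, K = 3 tags
F = [[0.5, -1.0, 0.2], [1.2, 0.3, -0.4], [0.0, 0.9, 0.1], [-0.3, 0.4, 0.8]]
A = [[0.1, -0.2, 0.0], [0.3, 0.2, -0.1], [-0.5, 0.4, 0.2]]
log_z = crf_log_partition(F, A)
assert abs(log_z - crf_log_partition_brute(F, A)) < 1e-9
print(math.exp(crf_score(F, A, (0, 0, 1, 2)) - log_z))  # p(y|x) of one tagging
```

Subtracting log Z(x) from the score of a single tagging, as in the last line, gives log p(y|x).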

Motivations

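• Segment-level modeling: Semi-Markov CRFs (Semi-CRFs)
  ◦ Represent the tags as a segmentation s = (s_1, ..., s_{|s|}), where s_j = 〈h_j, d_j, y_j〉 gives the starting position h_j, the length d_j and the tag y_j of the j-th segment: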

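$$ p(s \mid x) = \frac{1}{Z(x)} \exp\left( \sum_{j=1}^{|s|} \left[ F(s_j, x) + A(y_{j-1}, y_j) \right] \right), \tag{3} $$

$$ F(s_j, x) = \mathbf{v}_{y_j}^{\top} \mathbf{f}(s_j, x) + b_{y_j}, \tag{4} $$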

  ◦ Pros: models segments directly.
  ◦ Cons: segment-level features are hard to design by hand, and no automatic feature extractor is available.

• Can we combine the advantages of Semi-CRFs and neural networks?
  ◦ Fully leverage segment-level information.
  ◦ Extract features for Semi-CRFs automatically.
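For Eq. (3) the normalizer sums over segmentations as well as tags. A semi-Markov version of the forward recursion covers both by also looping over the length d of the last segment. This sketch assumes a maximum segment length L (a hypothetical parameter here; bounding the segment length is what keeps the cost at O(T·L·K²)), a caller-supplied `seg_score(h, d, y)` standing in for F(s_j, x) with a 0-based start h, and no transition term for the first segment. `toy_score` is an arbitrary placeholder, not a learned scorer.

```python
import itertools
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a non-empty list."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def semicrf_log_partition(T, K, L, seg_score, A):
    """log Z(x) of Eq. (3) by a semi-Markov forward pass, O(T * L * K^2).

    seg_score(h, d, y): F(s_j, x) for segment <h, d, y> (start h, length d, tag y)
    A[p][y]: transition score between consecutive segment tags
    L: maximum segment length (assumed)
    Assumption: the first segment has no transition term.
    """
    # alpha[i][y]: log-sum over segmentations of words 0..i-1 whose last
    # segment carries tag y
    alpha = [[-math.inf] * K for _ in range(T + 1)]
    for i in range(1, T + 1):
        for y in range(K):
            terms = []
            for d in range(1, min(L, i) + 1):
                h = i - d  # the last segment covers words h..i-1
                if h == 0:
                    terms.append(seg_score(h, d, y))
                else:
                    prev = logsumexp([alpha[h][p] + A[p][y] for p in range(K)])
                    terms.append(seg_score(h, d, y) + prev)
            alpha[i][y] = logsumexp(terms)
    return logsumexp(alpha[T])


def segmentations(T, L, h=0):
    """Yield every split of words h..T-1 into (start, length) pairs."""
    if h == T:
        yield []
        return
    for d in range(1, min(L, T - h) + 1):
        for rest in segmentations(T, L, h + d):
            yield [(h, d)] + rest


def semicrf_log_partition_brute(T, K, L, seg_score, A):
    """Reference check: enumerate every segmentation and tagging."""
    scores = []
    for segs in segmentations(T, L):
        for tags in itertools.product(range(K), repeat=len(segs)):
            s = sum(seg_score(h, d, y) for (h, d), y in zip(segs, tags))
            s += sum(A[tags[j - 1]][tags[j]] for j in range(1, len(tags)))
            scores.append(s)
    return logsumexp(scores)


def toy_score(h, d, y):
    # Hypothetical stand-in for F(s_j, x) of Eq. (4)
    return 0.5 * math.sin(h + 2 * d + 3 * y)


A = [[0.2, -0.1], [0.0, 0.3]]
exact = semicrf_log_partition(5, 2, 3, toy_score, A)
assert abs(exact - semicrf_log_partition_brute(5, 2, 3, toy_score, A)) < 1e-9
```

With L = 1 every segment is a single word, and the recursion reduces to the word-level forward algorithm for Eq. (1).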

Our Model

• Key idea: extract features for all segments in a single propagation, using gated recursive convolutional neural networks (grConvs) (Cho et al., 2014)
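One way to read "a single propagation" is as a pyramid: layer d holds one node per segment of length d, and each node is computed only from two adjacent nodes of layer d − 1 (its prefix and its suffix). The sketch below shows only this bookkeeping, assuming the pyramid is cut off at a hypothetical maximum length `max_len`; `combine` is a placeholder for the learned local building block, and here it merely records which words each node covers.

```python
def build_pyramid(bottom, max_len, combine):
    """Bottom-up pass producing one node per segment of length <= max_len.

    layers[d - 1][k] stands for z_k^(d), the segment of length d that starts
    at word k (0-based). It is built only from its prefix z_k^(d-1) and its
    suffix z_{k+1}^(d-1), so one sweep over the layers covers every segment.
    """
    layers = [list(bottom)]  # d = 1: one node per word
    for _ in range(2, min(max_len, len(bottom)) + 1):
        prev = layers[-1]
        layers.append([combine(prev[k], prev[k + 1])
                       for k in range(len(prev) - 1)])
    return layers


# Placeholder combine: track only (first word, last word) of each node
words = ["He", "reckons", "the", "current", "account", "deficit"]
spans = build_pyramid([(k, k) for k in range(len(words))], 4,
                      lambda prefix, suffix: (prefix[0], suffix[1]))
for d, layer in enumerate(spans, start=1):
    print(d, [" ".join(words[a:b + 1]) for a, b in layer])
```

For T words and maximum length L the pass yields one node for each of the T − d + 1 segments of every length d ≤ L; for the six words above with `max_len = 4` that is 6 + 5 + 4 + 3 = 18 nodes.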

Our Model

• Local building block (Cho et al., 2014)

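$$ \mathbf{z}^{(d)}_k = \boldsymbol{\theta}_L \circ \mathbf{z}^{(d-1)}_k + \boldsymbol{\theta}_R \circ \mathbf{z}^{(d-1)}_{k+1} + \boldsymbol{\theta}_M \circ \hat{\mathbf{z}}^{(d)}_k, \tag{5} $$

$$ \hat{\mathbf{z}}^{(d)}_k = g\left( \mathbf{W}_L \mathbf{z}^{(d-1)}_k + \mathbf{W}_R \mathbf{z}^{(d-1)}_{k+1} + \mathbf{b}_W \right), \tag{6} $$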

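  ◦ Intuition: z_k^(d), the segment of length d starting at word k, is built from three sources:
    • the prefix z_k^(d−1),
    • the suffix z_{k+1}^(d−1),
    • the interaction of both, i.e., ẑ_k^(d).
  ◦ ∘ denotes the element-wise product, θ_L, θ_R and θ_M are gating coefficients, and g is an element-wise nonlinearity.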
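Eqs. (5) and (6) for a single node can be sketched in plain Python. The slide does not state how θ_L, θ_R and θ_M are obtained; the sketch assumes, in the spirit of the grConv of Cho et al. (2014), that they are non-negative and sum to one, computed per dimension by a softmax over gate scores G_L[s]·z_prefix + G_R[s]·z_suffix + b_G[s] for the three sources s. The names G_L, G_R and b_G, the choice g = tanh, and the random initialization are all hypothetical.

```python
import math
import random


def matvec(W, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def vadd(*vectors):
    """Element-wise sum of equal-length vectors."""
    return [sum(xs) for xs in zip(*vectors)]


def gated_block(z_prefix, z_suffix, p, g=math.tanh):
    """z_k^(d) from its prefix z_k^(d-1) and suffix z_{k+1}^(d-1).

    Eq. (6): z_hat = g(W_L z_prefix + W_R z_suffix + b_W)
    Eq. (5): z = theta_L o z_prefix + theta_R o z_suffix + theta_M o z_hat
    Assumed gating: per dimension, a softmax over the three sources of
    G_L[s] z_prefix + G_R[s] z_suffix + b_G[s], for s in "L", "R", "M".
    """
    z_hat = [g(a) for a in vadd(matvec(p["W_L"], z_prefix),
                                matvec(p["W_R"], z_suffix), p["b_W"])]
    scores = {s: vadd(matvec(p["G_L"][s], z_prefix),
                      matvec(p["G_R"][s], z_suffix), p["b_G"][s])
              for s in "LRM"}
    z = []
    for i, sources in enumerate(zip(z_prefix, z_suffix, z_hat)):
        m = max(scores[s][i] for s in "LRM")  # shift for numerical stability
        weights = [math.exp(scores[s][i] - m) for s in "LRM"]
        total = sum(weights)  # theta_s = weight_s / total, so they sum to 1
        z.append(sum(w * src for w, src in zip(weights, sources)) / total)
    return z


def random_params(D, seed=0):
    """Hypothetical random initialization, for illustration only."""
    rng = random.Random(seed)

    def mat():
        return [[rng.uniform(-0.5, 0.5) for _ in range(D)] for _ in range(D)]

    def vec():
        return [rng.uniform(-0.5, 0.5) for _ in range(D)]

    return {"W_L": mat(), "W_R": mat(), "b_W": vec(),
            "G_L": {s: mat() for s in "LRM"},
            "G_R": {s: mat() for s in "LRM"},
            "b_G": {s: vec() for s in "LRM"}}


params = random_params(D=4)
z_prefix = [0.1, -0.3, 0.7, 0.0]   # toy z_k^(1), e.g. for "current"
z_suffix = [0.4, 0.2, -0.5, 0.9]   # toy z_{k+1}^(1), e.g. for "account"
print(gated_block(z_prefix, z_suffix, params))  # toy z_k^(2), "current account"
```

Because the gates form a convex combination in every dimension, each output coordinate lies between the smallest and largest of the corresponding prefix, suffix and ẑ coordinates, so a node can copy its prefix, copy its suffix, or rely on the new interaction term.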

Our Model

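• Connection to Semi-CRFs
  ◦ For a segment s_j = 〈h_j, d_j, y_j〉,
    • feature: f(s_j, x) = z_{h_j}^(d_j), the grConv node for the segment of length d_j starting at word h_j;
    • tag score: F(s_j, x) = v_{y_j}^(d_j)ᵀ f(s_j, x) + b_{y_j}, where the weight vector v_{y_j}^(d_j) also depends on the segment length d_j.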
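Once the pyramid is built, scoring a segment only requires looking up its node and applying the length-specific weights. A minimal sketch, assuming 0-based positions, plain lists as vectors, and hypothetical toy values for the pyramid, weights and biases:

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def segment_score(layers, V, b, h, d, y):
    """F(s_j, x) for s_j = <h, d, y>, using f(s_j, x) = z_h^(d).

    layers[d - 1][h]: z_h^(d), node for the length-d segment starting at word h
    V[d - 1][y]:      v_y^(d), weight vector specific to length d and tag y
    b[y]:             bias b_y, shared across segment lengths
    """
    return dot(V[d - 1][y], layers[d - 1][h]) + b[y]


def all_segment_scores(layers, V, b):
    """Score every (h, d, y) covered by the pyramid."""
    return {(h, d, y): segment_score(layers, V, b, h, d, y)
            for d in range(1, len(layers) + 1)
            for h in range(len(layers[d - 1]))
            for y in range(len(b))}


# Hypothetical toy values: T = 3 words, lengths up to 2, 2-dim features, 2 tags
layers = [[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],   # z_h^(1), h = 0, 1, 2
          [[0.5, 0.5], [1.0, -1.0]]]              # z_h^(2), h = 0, 1
V = [[[1.0, 0.0], [0.0, 1.0]],                    # v_y^(1), y = 0, 1
     [[2.0, 0.0], [0.0, 2.0]]]                    # v_y^(2), y = 0, 1
b = [0.1, -0.1]
scores = all_segment_scores(layers, V, b)
print(len(scores), scores[(1, 2, 0)])
```

Indexing v by the length d lets the same tag score a short and a long segment differently, while b_y is shared across lengths, matching the tag-score formula above. The resulting table can then serve as F(s_j, x) in Eq. (3).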

Experiments

• Settings
  ◦ Datasets
    • CONLL-2000 dataset (text chunking) and CONLL-2003 dataset (named entity recognition, NER)
  ◦ Compared models
    • Neural models: Senna (Collobert et al., 2011) and BI-LSTM-CRF (Huang et al., 2015)
    • Non-neural models: JESS-CM (Suzuki and Isozaki, 2008), etc.
  ◦ Hyperparameters

Hyperparameter | CONLL 2000 | CONLL 2003
Segment length | 15         | 10
Dropout        | 0.3        | 0.3
Learning rate  | 0.3        | 0.3
Epochs         | 15         | 20
Minibatches    | 10         | 10
Window width   | 2          | 2

Experiments

• Comparison with the state of the art (F1 scores; values in parentheses are obtained with additional external features).

Models                                    | CONLL 2000 | CONLL 2003
Ours
  grSemi-CRF (Random embeddings)          | 93.92      | 84.66
  grSemi-CRF (Senna embeddings)           | 95.01      | 89.44 (90.87)
Neural Models
  Senna (Random embeddings)               | 90.33      | 81.47
  Senna (Senna embeddings)                | 94.32      | 88.67 (89.59)
  BI-LSTM-CRF (Random embeddings)         | 94.13      | 84.26
  BI-LSTM-CRF (Senna embeddings)          | 94.46      | 88.83 (90.10)
Non-Neural Models
  JESS-CM (Suzuki and Isozaki, 2008), 15M | 94.67      | 89.36
  JESS-CM (Suzuki and Isozaki, 2008), 1B  | 95.15      | 89.92
  Ratinov and Roth (2009)                 | –          | 90.57
  Lin and Wu (2009)                       | –          | 90.90
  Passos et al. (2014)                    | –          | 90.90

Experiments

• Impact of external information: embeddings (Emb), Brown clusters (Brown) and gazetteers (Gaz)

Input features    | CONLL 2000 | CONLL 2003
None              | 93.92      | 84.66
Brown(NYT)        | 94.18      | 86.57
Brown(RCV1)       | 94.05      | 88.22
Emb               | 94.73      | 88.12
Gaz               | –          | 87.94
Emb + Brown(NYT)  | 95.01      | 88.86
Emb + Brown(RCV1) | 94.87      | 89.44
Emb + Gaz         | –          | 89.88
Brown(NYT) + Gaz  | –          | 88.69
Brown(RCV1) + Gaz | –          | 89.82
All(NYT)          | –          | 90.00
All(RCV1)         | –          | 90.87

Experiments

• Visualization of segment-level features learnt on the CONLL 2003 dataset: nearest neighbours of four queried segments.

Queried segments
Filippo Inzaghi     | AC Milan       | Central African Republic      | Asian Cup
Nearest neighbour results
Pierluigi Casiraghi | FC Hansa       | From Central African Republic | Scottish Cup
Fabrizio Ravanelli  | SC Freiburg    | Southeast Asian Nations       | European Cup
Bogdan Stelea       | FC Cologne     | In Central African Republic   | African Cup
Francesco Totti     | Aston Villa    | The Central African Republic  | World Cup
Predrag Mijatovic   | Red Cross      | South African Breweries       | UEFA Cup
Fausto Pizzi        | Yasuto Honda   | Of Southeast Asian Nations    | Europoean Cup
Pierre Laigle       | NAC Breda      | New South Wales               | Asian Games
Pavel Nedved        | La Plagne      | Central African Republic .    | Europa Cup
Anghel Iordanescu   | Sporting Gijon | Papua New Guinea              | National League
Zeljko Petrovic     | NEC Nijmegen   | Central Africa                | F.A. Cup

Conclusions and Future Work

• We proposed grSemi-CRF, a neural network-based model for segment-level sequence tagging that
  ◦ models segments explicitly, like Semi-CRFs,
  ◦ extracts segment-level features automatically,
  ◦ and performs competitively with CRF-based models, outperforming the neural CRF baselines (Senna, BI-LSTM-CRF) when Senna embeddings are used.

• Future Work
  ◦ Exploring better ways to utilize unlabelled data, e.g., learning segment-level embeddings in an unsupervised way
  ◦ Extending the model to other sequence tagging tasks

Questions and Answers

• Motivations
  ◦ Segment-level sequence tagging
  ◦ CRFs and Semi-CRFs
• Our Model
  ◦ Feature extractor
  ◦ Connection to Semi-CRFs
• Experiments
  ◦ Comparison with the state of the art
  ◦ Impact of external information
  ◦ Visualization of learnt features

• Conclusion and Future Work
