INSTITUTE OF COMPUTING TECHNOLOGY Bagging-based System Combination for Domain Adaptation Linfeng Song , Haitao Mi, Yajuan Lü and Qun Liu Institute of Computing Technology Chinese Academy of Sciences

Bagging-based System Combination for Domain Adaptation

Linfeng Song, Haitao Mi, Yajuan Lü and Qun Liu

Institute of Computing Technology

Chinese Academy of Sciences

An Example

Initial MT system

Development set (A: 90%, B: 10%)

Tuned MT system that fits domain A

The translation styles of A and B are quite different

Test set (A: 10%, B: 90%)

The translation style fits A, but we mainly want to translate B

Traditional Methods

Monolingual data with domain annotation -> Domain recognizer

Bilingual training data -> Domain recognizer -> training data : domain A, training data : domain B

training data : domain A -> MT system domain A

training data : domain B -> MT system domain B

Test set -> Domain recognizer -> Test set domain A, Test set domain B

Test set domain A -> MT system domain A -> The translation result domain A

Test set domain B -> MT system domain B -> The translation result domain B

The translation result domain A, The translation result domain B -> The translation result
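The routing step of this traditional pipeline can be sketched in Python. This is only an illustration under loud assumptions: `recognize_domain` is a hypothetical keyword rule standing in for a trained domain recognizer, and the two "MT systems" are placeholders that tag their input instead of translating it.

```python
from typing import Callable, Dict, List


def recognize_domain(sentence: str) -> str:
    """Hypothetical stand-in for a trained domain recognizer (assumption):
    sentences mentioning "patent" go to domain A, everything else to B."""
    return "A" if "patent" in sentence.lower() else "B"


def mt_system_a(sentence: str) -> str:
    """Placeholder for the MT system trained on domain A data."""
    return f"[A-system] {sentence}"


def mt_system_b(sentence: str) -> str:
    """Placeholder for the MT system trained on domain B data."""
    return f"[B-system] {sentence}"


SYSTEMS: Dict[str, Callable[[str], str]] = {"A": mt_system_a, "B": mt_system_b}


def translate_test_set(test_set: List[str]) -> List[str]:
    """Send each test sentence to the MT system of its recognized domain."""
    return [SYSTEMS[recognize_domain(s)](s) for s in test_set]


results = translate_test_set(["This patent claims a device", "Hello there"])
for line in results:
    print(line)
```

In this sketch a sentence that the recognizer labels wrongly is sent to the wrong system, so the quality of the final result depends directly on the recognizer.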

The merits

Simple and effective

Fits human intuition

The drawbacks

Classification Error (CE), especially for unsupervised methods

Supervised methods can keep CE low, but their need for annotated data limits their use

Our motivation

Step away from performing adaptation directly

Statistical methods (such as Bagging) can help.

Preliminary

The general framework of Bagging

Training set D -> Training set D1, Training set D2, Training set D3, ……

Training set D1 -> C1, Training set D2 -> C2, Training set D3 -> C3, ……

Test sample -> C1, C2, C3, ……

Result of C1, Result of C2, Result of C3, …… -> Voting result
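The general framework can be sketched as follows. The learner here is a toy threshold classifier chosen only to keep the example short (an assumption, not something the slides specify), and the data set `D` is made up; what matters is the resample, train, vote structure.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, str]            # (feature value, label); toy format (assumption)
Classifier = Callable[[float], str]


def bootstrap(data: Sequence[Sample], rng: random.Random) -> List[Sample]:
    """Draw len(data) samples from data with replacement."""
    return [rng.choice(data) for _ in data]


def train_threshold_classifier(data: Sequence[Sample]) -> Classifier:
    """Toy learner (assumption): threshold halfway between the class means.
    It assumes "pos" samples have larger feature values than "neg" samples."""
    pos = [x for x, y in data if y == "pos"]
    neg = [x for x, y in data if y == "neg"]
    if not pos or not neg:
        # Degenerate resample containing one class only: always predict it.
        only = "pos" if pos else "neg"
        return lambda x: only
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: "pos" if x > threshold else "neg"


def build_ensemble(data: Sequence[Sample], n: int, seed: int = 0) -> List[Classifier]:
    """Train one classifier C_i on each bootstrapped training set D_i."""
    rng = random.Random(seed)
    return [train_threshold_classifier(bootstrap(data, rng)) for _ in range(n)]


def bagging_predict(classifiers: Sequence[Classifier], x: float) -> str:
    """Majority vote over the results of the individual classifiers."""
    votes = Counter(c(x) for c in classifiers)
    return votes.most_common(1)[0][0]


D = [(0.1, "neg"), (0.2, "neg"), (0.3, "neg"),
     (0.7, "pos"), (0.8, "pos"), (0.9, "pos")]
ensemble = build_ensemble(D, n=11)    # odd n avoids ties between two labels
print(bagging_predict(ensemble, 0.95))
print(bagging_predict(ensemble, 0.05))
```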

Our method

Training

Suppose there is a development set. For simplicity, it has only 5 sentences: 3 belong to A and 2 belong to B.

A,A,A,B,B

We bootstrap N new development sets. For each set, a subsystem is tuned:

A,B,B,B,B -> MT system-1

A,A,B,B,B -> MT system-2

A,A,B,B,B -> MT system-3

A,A,A,B,B -> MT system-4

A,A,A,A,B -> MT system-5

……
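A minimal sketch of this training step, assuming each development sentence is represented only by its domain label. `tune_subsystem` is a hypothetical placeholder for a real weight-tuning run; it merely records the domain mix a subsystem would be tuned on.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence


def bootstrap_dev_sets(dev_set: Sequence[str], n_sets: int, seed: int = 0) -> List[List[str]]:
    """Resample the development set with replacement, n_sets times.
    Every new set has the same size as the original one."""
    rng = random.Random(seed)
    return [[rng.choice(dev_set) for _ in dev_set] for _ in range(n_sets)]


def tune_subsystem(dev_set: Sequence[str]) -> Dict[str, int]:
    """Hypothetical placeholder for tuning one subsystem (assumption).
    It returns the domain counts of the set instead of feature weights."""
    return dict(Counter(dev_set))


dev = ["A", "A", "A", "B", "B"]       # 3 sentences from domain A, 2 from domain B
new_sets = bootstrap_dev_sets(dev, n_sets=5)
subsystems = [tune_subsystem(s) for s in new_sets]

for i, (s, mix) in enumerate(zip(new_sets, subsystems), start=1):
    print(f"MT system-{i}: tuned on {','.join(sorted(s))} (domain mix {mix})")
```

Because sampling is done with replacement, the A:B ratio varies from set to set, so the tuned subsystems differ in how strongly they lean toward each domain.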

Decoding

For simplicity, suppose only 2 subsystems have been tuned:

Subsystem-1 W: <-0.8, 0.2>

Subsystem-2 W: <-0.6, 0.4>

Now a sentence “A B” needs a translation.

After translation, each subsystem generates its N-best candidates:

a b; <0.2, 0.2>    a c; <0.2, 0.3>

a b; <0.2, 0.2>    a b; <0.1, 0.3>    a d; <0.3, 0.4>

Fuse these N-best lists and eliminate duplicates. Candidates are identical only if their target strings and feature values are entirely equal:

a b; <0.2, 0.2>
a b; <0.1, 0.3>
a c; <0.2, 0.3>
a d; <0.3, 0.4>

Calculate the voting score:

$\mathrm{final\_score}(c) = \sum_{t=1}^{S} \mathrm{feat}(c) \cdot W_t$

S represents the number of subsystems.

a b; <0.2, 0.2>; -0.16
a b; <0.1, 0.3>; +0.04
a c; <0.2, 0.3>; -0.10
a d; <0.3, 0.4>; -0.18

The one with the highest score wins.

Since the subsystems are different copies of the same model and share the same training data, calibration is unnecessary.
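The fusion and voting steps can be checked with a short Python sketch. It reads the voting score as the sum, over all subsystems, of the dot product between a candidate's feature vector and that subsystem's weight vector, which reproduces the scores shown on the slides. The two weight vectors and the candidates are taken from the example; the function and variable names are illustrative.

```python
from typing import List, Sequence, Tuple

Candidate = Tuple[str, Tuple[float, ...]]   # (target string, feature values)


def fuse(nbest_lists: Sequence[Sequence[Candidate]]) -> List[Candidate]:
    """Merge N-best lists, dropping candidates whose target string and
    feature values are both identical to one already kept."""
    seen = set()
    fused: List[Candidate] = []
    for nbest in nbest_lists:
        for cand in nbest:
            if cand not in seen:
                seen.add(cand)
                fused.append(cand)
    return fused


def final_score(cand: Candidate, weights: Sequence[Sequence[float]]) -> float:
    """Sum over subsystems t of the dot product feat(c) . W_t."""
    _, feats = cand
    return sum(sum(w * f for w, f in zip(w_t, feats)) for w_t in weights)


weights = [(-0.8, 0.2), (-0.6, 0.4)]        # the two tuned weight vectors
nbest_1 = [("a b", (0.2, 0.2)), ("a c", (0.2, 0.3))]
nbest_2 = [("a b", (0.2, 0.2)), ("a b", (0.1, 0.3)), ("a d", (0.3, 0.4))]

fused = fuse([nbest_1, nbest_2])
scores = {cand: final_score(cand, weights) for cand in fused}
winner = max(fused, key=lambda c: scores[c])

for cand in fused:
    print(cand, round(scores[cand], 2))
print("winner:", winner)
```

For the first candidate, for instance, the score is (-0.8 × 0.2 + 0.2 × 0.2) + (-0.6 × 0.2 + 0.4 × 0.2) = -0.12 + (-0.04) = -0.16. The duplicate check compares floating-point feature values exactly, which works for this toy data; real feature values might need a tolerance.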

Experiments

Basic Setups

Data: NTCIR9 Chinese-English patent corpus
  1k sentence pairs as development set
  Another 1k pairs as test set
  The remainder is used for training

System: hierarchical phrase-based model

Alignment: GIZA++ grow-diag-final

Effectiveness: Show and Prove

Tune 30 subsystems using Bagging

Tune 30 subsystems with random initial weights

For both settings, evaluate the fusion results of the first N (N = 5, 10, 15, 20, 30) subsystems and compare

Results: 1-best

Number of subsystems    1      5      10     15     20     30
bagging                 31.08  31.51  31.64  31.73  31.80  31.90
random                  31.08  31.11  31.13  31.17  31.23  31.20

With 30 subsystems, bagging is +0.82 over the single system (31.08) and +0.70 over random initial weights (31.20).

Results: Oracle

Number of subsystems    1      5      10     15     20     30
bagging                 36.74  40.35  42.27  42.52  42.74  42.96
random                  36.74  38.35  38.67  38.82  39.04  39.25

With 30 subsystems, bagging is +6.22 over the single system (36.74) and +3.71 over random initial weights (39.25).
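The gains annotated on the 1-best and oracle plots can be re-derived from the plotted values. The sketch below assumes the first series in each plot is bagging and the second is random initial weights, as the legend order suggests; the dictionary layout and the function name are illustrative.

```python
from typing import Dict, List, Tuple

# Scores read off the plots, indexed by number of subsystems.
n_subsystems = [1, 5, 10, 15, 20, 30]
one_best: Dict[str, List[float]] = {
    "bagging": [31.08, 31.51, 31.64, 31.73, 31.80, 31.90],
    "random":  [31.08, 31.11, 31.13, 31.17, 31.23, 31.20],
}
oracle: Dict[str, List[float]] = {
    "bagging": [36.74, 40.35, 42.27, 42.52, 42.74, 42.96],
    "random":  [36.74, 38.35, 38.67, 38.82, 39.04, 39.25],
}


def gains(table: Dict[str, List[float]]) -> Tuple[float, float]:
    """Gain of bagging at the largest N over (a) the single system and
    (b) random initial weights at the same N, rounded to two decimals."""
    best = table["bagging"][-1]
    over_single = round(best - table["bagging"][0], 2)
    over_random = round(best - table["random"][-1], 2)
    return over_single, over_random


print("1-best gains:", gains(one_best))
print("oracle gains:", gains(oracle))
```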

Comparison with traditional methods

Evaluate a supervised method
  To tackle data sparsity, it operates only on the development set and test set

Evaluate an unsupervised method
  Similar to Yamada (2007)
  To avoid data sparsity, only the LM is domain-specific

Results: 1-best

baseline        31.08
bagging         31.90
supervised      31.63
unsupervised    31.24

Conclusions

We propose a bagging-based method to address the multi-domain translation problem.

Experiments show that:
  Bagging is effective for the domain adaptation problem
  Our method clearly surpasses the baseline, and is even better than some traditional methods.

Thank you for listening. Any questions?