Mid-term report of the project
DIPF Frankfurt, July 18

Supervisor: Dr. Ivan Habernal
Student: Anil Narassiguin

Identification of Argumentative Texts in User-Generated Content on Educational Controversies

Mid-Term Report of my research project


Here are the slides of the presentation I gave on July 18th, 2014, about the identification of argumentative texts in user-generated content on educational controversies. It covers the main goals of our study, the features used for classification, and the work planned for the rest of the internship.



Persuasion: The act of persuading or seeking to persuade...

Persuade:
1- To prevail on (a person) to do something, as by advising or urging.
2- To induce to believe by appealing to reason or understanding.

17/07/14 | DIPF Frankfurt | UKP Lab - Prof. Dr. Iryna Gurevych | Dr. Ivan Habernal | Anil Narassiguin 2


Persuasive or not?

“Purposely raising your children to be outcasts is child abuse, I don't care what kind of cult you belong to.”

Debate about homeschooling

Persuasive

Persuasive or not?

I have no intention of waiting for scientists to reach agreement on this issue before I decide how I will educate my children. Leave us alone.

Debate about single sex education

Non-persuasive

Persuasive or not?

You are a bad parent if you do not get your child the best education that is feasible for him/her under your circumstances. The notion that your child(ren) should be used to indulge some social agenda is pathetic.

Debate about public and private schools

Not obvious...


Summary

Presentation of the task

Our approach

Problems encountered and future work


PRESENTATION OF THE TASK


Argumentation mining

[Diagram: Argumentation Mining, surrounded by the fields it draws on: NLP, Information Retrieval, Machine Learning, Philosophy, Logic, Psychology]


Our project

Classification of user-generated text documents as persuasive (P1) and non-persuasive (P2).

Supervised learning: three annotators manually classified a corpus of 990 texts as P1 or P2.

Cohen's kappa: 0.50 – 0.60

6 domains (topics): redshirting, prayer in schools, homeschooling, single sex education, mainstreaming, and public private schools.
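
As a reminder of what the agreement figure measures, here is a minimal sketch of Cohen's kappa for two annotators. With three annotators the reported range was presumably obtained pairwise, but the slides do not say so. The function name and the toy labels are invented for illustration and are not project data.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)

# Toy annotations (invented for illustration, not project data).
annotator_1 = ["P1", "P1", "P2", "P2", "P1", "P2", "P1", "P2"]
annotator_2 = ["P1", "P2", "P2", "P2", "P1", "P1", "P1", "P2"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")  # prints "kappa = 0.50"
```

In the toy example the annotators agree on 6 of 8 items (0.75) while chance agreement is 0.5, so kappa is 0.5.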


Our project

Cross Validation (CV)
- Full CV: cross validation performed on all the data
- In-domain CV: cross validation performed on texts of only one domain
- Cross-domain validation: one domain is tested against the 5 other ones
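
The three evaluation settings can be sketched as index splits. This is only an illustration under assumptions: the number of folds (10 here) is not given on the slide, the documents are represented as hypothetical dicts with a `domain` key, and the cross-domain split assumes that the model is trained on the five other domains and tested on the held-out one.

```python
import random

def k_fold_indices(n_items, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross validation."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

def full_cv(docs, k=10):
    """Full CV: folds drawn from all documents, ignoring domains."""
    return k_fold_indices(len(docs), k)

def in_domain_cv(docs, domain, k=10):
    """In-domain CV: folds drawn only from documents of one domain."""
    subset = [i for i, d in enumerate(docs) if d["domain"] == domain]
    for train, test in k_fold_indices(len(subset), k):
        yield [subset[i] for i in train], [subset[i] for i in test]

def cross_domain_split(docs, held_out):
    """Cross-domain (assumed direction): train on the rest, test on held_out."""
    train = [i for i, d in enumerate(docs) if d["domain"] != held_out]
    test = [i for i, d in enumerate(docs) if d["domain"] == held_out]
    return train, test

# Toy corpus: 6 homeschooling posts followed by 4 redshirting posts.
domains = ["homeschooling"] * 6 + ["redshirting"] * 4
docs = [{"domain": d, "text": f"post {i}"} for i, d in enumerate(domains)]
train, test = cross_domain_split(docs, "redshirting")
print(len(train), len(test))  # prints "6 4"
```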


Our data

Domain                    P1    P2   Total
redshirting               38    30      68
prayer in schools         77    66     143
homeschooling             86   138     224
single sex education      26    24      50
mainstreaming             10    19      29
public private schools   287   189     476
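
The class counts above give a useful reference point: the score of a trivial classifier that always predicts the larger class of a domain. The sketch below computes its accuracy and F-measure, assuming the F-measure is macro-averaged over P1 and P2, which the slides do not state.

```python
# Class counts per domain, copied from the table above: (P1, P2).
counts = {
    "redshirting": (38, 30),
    "prayer in schools": (77, 66),
    "homeschooling": (86, 138),
    "single sex education": (26, 24),
    "mainstreaming": (10, 19),
    "public private schools": (287, 189),
}

def majority_baseline(p1, p2):
    """Accuracy and macro-averaged F1 (in %) of always predicting the larger class."""
    total, major = p1 + p2, max(p1, p2)
    accuracy = major / total
    # Majority class: recall is 1 and precision equals the accuracy.
    f1_major = 2 * accuracy / (accuracy + 1)
    # The minority class is never predicted, so its F1 is taken as 0.
    macro_f1 = (f1_major + 0.0) / 2
    return 100 * accuracy, 100 * macro_f1

for domain, (p1, p2) in counts.items():
    acc, f1 = majority_baseline(p1, p2)
    print(f"{domain:24s} accuracy {acc:5.1f}  macro-F1 {f1:5.1f}")
```

For mainstreaming this gives 65.5 / 39.6, the same pair that appears for that domain on the results slides. Under the macro-averaging assumption, this may indicate that the classifier predicts a single class for that domain; the slides themselves do not comment on it.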


OUR APPROACH


Feature Engineering

Baseline features: based on the tokens, we extract the 10,000 most common 1-, 2- and 3-grams.

Part-of-speech (POS) features: total number and ratio of each POS tag defined in DKPro.

Syntactic features
- Statistics (mean, max) about the number of clauses per sentence
- Average and maximal depth of the dependency trees
- Presence of a dependency rule (still to be done)
(Ref: Christian Stab, "Identifying Argumentative Discourse Structures in Persuasive Essays")
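
The baseline n-gram features can be sketched in plain Python as follows. This is not the DKPro pipeline: whitespace tokenization stands in for the real tokenizer, the 10,000 n-grams are assumed to be pooled over all three orders, and binary presence values are an assumption (the slides do not say whether counts or presence are used).

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def build_vocabulary(tokenized_docs, max_n=3, top_k=10_000):
    """Keep the top_k most frequent 1- to max_n-grams over the corpus."""
    counts = Counter()
    for tokens in tokenized_docs:
        for n in range(1, max_n + 1):
            counts.update(ngrams(tokens, n))
    return [gram for gram, _ in counts.most_common(top_k)]

def featurize(tokens, vocabulary, max_n=3):
    """Binary presence vector of the vocabulary n-grams in one document."""
    present = {g for n in range(1, max_n + 1) for g in ngrams(tokens, n)}
    return [1 if gram in present else 0 for gram in vocabulary]

# Toy corpus; whitespace tokenization stands in for the DKPro tokenizer.
docs = [s.lower().split() for s in [
    "Homeschooling is child abuse",
    "Homeschooling is a choice",
]]
vocab = build_vocabulary(docs, top_k=5)
print(vocab)
print(featurize(docs[0], vocab))
```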


Lexical features
- Statistics about sentences and tokens
- Length of a post, number of tokens per sentence, and number of tokens with more than 6 letters (Ref: Kazi Saidul Hasan and Vincent Ng, "Stance Classification of Ideological Debates: Data, Models, Features, and Constraints")
- Ratio of punctuation marks and presence of multiple punctuation
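
A minimal sketch of these lexical statistics for a single post is given below. The feature names are hypothetical, the input is assumed to be already split into sentences and tokens, and "multiple punctuation" is interpreted as a run of two or more marks such as "?!" or "...".

```python
import re

def lexical_features(sentences):
    """Compute simple surface statistics for one post.

    `sentences` is a list of token lists (one list per sentence).
    """
    tokens = [tok for sent in sentences for tok in sent]
    n_tokens = len(tokens)
    # Tokens made up only of punctuation characters.
    punct = [tok for tok in tokens if re.fullmatch(r"[^\w\s]+", tok)]
    return {
        "post_length_tokens": n_tokens,
        "num_sentences": len(sentences),
        "tokens_per_sentence": n_tokens / len(sentences) if sentences else 0.0,
        "long_tokens": sum(1 for tok in tokens if tok.isalpha() and len(tok) > 6),
        "punctuation_ratio": len(punct) / n_tokens if n_tokens else 0.0,
        # Assumed reading of "multiple punctuation": a run of 2+ marks.
        "has_multiple_punctuation": any(len(tok) > 1 for tok in punct),
    }

post = [
    ["Leave", "us", "alone", "!!!"],
    ["Scientists", "disagree", "on", "education", "."],
]
features = lexical_features(post)
print(features)
```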

Sentiment features
- Integration of the (GPL) Stanford Deep Learning for Sentiment Analysis tool into DKPro
- Sentiment analysis tool output: for each sentence, 5 scores are computed. We are creating 20 features from those scores.
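
The slides do not say how the 20 features are derived from the 5 per-sentence scores. One possible construction, shown purely as an illustrative guess, is to aggregate each of the 5 scores over the sentences of a post with 4 statistics (mean, max, min, standard deviation). The class names follow the five sentiment classes of the Stanford tool; the function and feature names are hypothetical.

```python
from statistics import mean, pstdev

CLASSES = ["very_negative", "negative", "neutral", "positive", "very_positive"]

def sentiment_features(sentence_scores):
    """Aggregate per-sentence 5-class scores into 20 document-level features.

    `sentence_scores` is a list with one 5-tuple of scores per sentence.
    The four statistics used here are an illustrative guess, not the
    project's actual feature definition.
    """
    if not sentence_scores:
        raise ValueError("need at least one sentence")
    features = {}
    for idx, name in enumerate(CLASSES):
        column = [scores[idx] for scores in sentence_scores]
        features[f"{name}_mean"] = mean(column)
        features[f"{name}_max"] = max(column)
        features[f"{name}_min"] = min(column)
        features[f"{name}_std"] = pstdev(column)
    return features

# Invented scores for a two-sentence post.
scores = [
    (0.10, 0.50, 0.20, 0.15, 0.05),
    (0.05, 0.15, 0.30, 0.40, 0.10),
]
features = sentiment_features(scores)
print(len(features))  # prints 20
```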


Classifier used

We are currently using a common classifier, the SVM.

At the moment, we are not planning to compare results across several classifiers.

Results

Each cell: Accuracy / F-measure (blue and orange on the original slide).

Full CV

Baseline                       69.0 / 68.9
All the features               67.9 / 67.8
Baseline+Sentiment Analysis    68.0 / 68.0

In Domain CV

Domain                    Baseline       All the features   Baseline+Sentiment Analysis
redshirting               51.5 / 51.4    54.4 / 54.1        54.4 / 54.1
prayer in schools         74.1 / 73.9    73.4 / 73.3        73.4 / 73.3
homeschooling             74.1 / 70.5    75.0 / 71.5        75.0 / 71.5
single sex education      74.1 / 70.5    68.0 / 67.5        68.0 / 67.5
mainstreaming             65.5 / 39.6    65.5 / 39.6        65.5 / 39.6
public private schools    70.0 / 68.5    72.1 / 70.6        72.1 / 70.6

Cross Domain

Values are Accuracy / F-measure.

Domain                    Baseline       All the features   Baseline+Sentiment Analysis
redshirting               54.4 / 54.1    55.9 / 55.5        54.4 / 54.1
prayer in schools         53.9 / 52.6    53.9 / 53.0        73.4 / 73.3
homeschooling             65.6 / 61.0    65.1 / 61.2        74.5 / 70.9
single sex education      64.0 / 62.5    62.0 / 60.1        66.0 / 65.3
mainstreaming             65.5 / 61.8    65.5 / 61.8        65.5 / 39.6
public private schools    66.1 / 65.9    65.3 / 64.9        71.0 / 69.6


PROBLEMS ENCOUNTERED AND FUTURE WORK

Problems encountered

Acquiring knowledge in NLP

Getting hands-on with DKPro

Computation times (might be solved soon...)


Future work

Error Analysis

Feature selection

Hyperparameter optimization for the SVM

Bootstrap the model with new data from debate portals

Questions?