Mid-Term Report of my research project


DESCRIPTION

Here are the slides of the presentation I gave on July 18th, 2014, on the identification of argumentative texts in user-generated content on educational controversies. They cover the main goals of our study, the features used for classification, and the future work planned for the rest of the internship.


Mid-term report of the project
DIPF Frankfurt, July 18

Supervisor: Dr. Ivan Habernal
Student: Anil Narassiguin

Identification of Argumentative Texts in User-Generated Content on Educational Controversies

Persuasion: The act of persuading or seeking to persuade...

Persuade:
1- To prevail on (a person) to do something, as by advising or urging.
2- To induce to believe by appealing to reason or understanding.

17/07/14 | DIPF Frankfurt | UKP Lab - Prof. Dr. Iryna Gurevych | Dr. Ivan Habernal | Anil Narassiguin 2

Persuasive or not?

“Purposely raising your children to be outcasts is child abuse, I don't care what kind of cult you belong to.”

Debate about homeschooling

Persuasive

Persuasive or not?

I have no intention of waiting for scientists to reach agreement on this issue before I decide how I will educate my children. Leave us alone.

Debate about single sex education

Non Persuasive

Persuasive or not?

You are a bad parent if you do not get your child the best education that is feasible for him/her under your circumstances. The notion that your child(ren) should be used to indulge some social agenda is pathetic.

Debate about public and private schools

Not obvious...

Summary

Presentation of the task

Our approach

Problems encountered and future work


PRESENTATION OF THE TASK


Argumentation mining

[Diagram: argumentation mining and its related fields: NLP, information retrieval, machine learning, philosophy, logic, psychology]

Our project

Classification of user-generated text documents as persuasive (P1) and non-persuasive (P2).

Supervised learning: three annotators manually classified a corpus of 990 texts as P1 or P2.

Inter-annotator agreement (Cohen's kappa): 0.50 – 0.60

6 domains (topics): redshirting, prayer in schools, homeschooling, single sex education, mainstreaming, and public and private schools.
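As an illustration of the agreement measure, here is a minimal sketch of Cohen's kappa for one pair of annotators. With three annotators, the reported range is presumably obtained pair by pair, but the slides do not say so. The function name and the toy labels are invented for this example and are not taken from the corpus.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: share of items that received identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's own label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (p_o - p_e) / (1 - p_e)

# Toy labels, invented for illustration only (not from the corpus).
annotator_1 = ["P1", "P1", "P2", "P2", "P1", "P2", "P1", "P2"]
annotator_2 = ["P1", "P2", "P2", "P2", "P1", "P1", "P1", "P2"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

In this toy case the annotators agree on 6 of 8 texts while chance agreement is 0.5, so kappa is 0.50.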


Our project


Cross Validation (CV)
- Full CV: cross validation performed on all the data
- In Domain CV: cross validation performed on texts of only one domain
- Cross Domain Validation: each domain in turn is held out for testing, and the model is trained on the 5 other ones
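The cross-domain setting can be sketched as a leave-one-domain-out split. This is only an illustration under the assumption that each document carries a domain tag; the `domain` and `label` field names and the three toy documents are hypothetical.

```python
from collections import defaultdict

def cross_domain_splits(documents):
    """Yield (held_out_domain, train_docs, test_docs), one split per domain."""
    by_domain = defaultdict(list)
    for doc in documents:
        by_domain[doc["domain"]].append(doc)
    for held_out in sorted(by_domain):
        # Test on the held-out domain, train on every other domain.
        test_docs = by_domain[held_out]
        train_docs = [d for dom, docs in by_domain.items() if dom != held_out for d in docs]
        yield held_out, train_docs, test_docs

# Toy documents with hypothetical fields, for illustration only.
docs = [
    {"domain": "redshirting", "label": "P1"},
    {"domain": "homeschooling", "label": "P2"},
    {"domain": "mainstreaming", "label": "P2"},
]
splits = list(cross_domain_splits(docs))
for held_out, train_docs, test_docs in splits:
    print(held_out, len(train_docs), len(test_docs))
```

With the six domains of the corpus this would produce six splits, one per tested domain.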

Our data

Domain                    P1    P2   Total
redshirting               38    30      68
prayer in schools         77    66     143
homeschooling             86   138     224
single sex education      26    24      50
mainstreaming             10    19      29
public private schools   287   189     476


OUR APPROACH

Feature Engineering

Baseline features: based on the tokens, we extract the 10,000 most common 1-, 2- and 3-grams.
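A minimal sketch of such an n-gram baseline, using only the standard library. The slides do not say whether the features are binary or counts, nor how tokens are produced; binary presence features over pre-tokenized input are assumptions here, and all function names are hypothetical.

```python
from collections import Counter

def extract_ngrams(tokens, max_n=3):
    """Yield every 1- to max_n-gram of a token list as a tuple."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])

def build_vocabulary(tokenized_docs, top_k=10_000, max_n=3):
    """Keep the top_k most frequent n-grams over the whole collection."""
    counts = Counter()
    for tokens in tokenized_docs:
        counts.update(extract_ngrams(tokens, max_n))
    return [ngram for ngram, _ in counts.most_common(top_k)]

def vectorize(tokens, vocabulary, max_n=3):
    """Binary presence vector (an assumption: counts would also be possible)."""
    present = set(extract_ngrams(tokens, max_n))
    return [1 if ngram in present else 0 for ngram in vocabulary]

# Toy input, for illustration only.
toy_docs = [["leave", "us", "alone"], ["leave", "us"]]
vocabulary = build_vocabulary(toy_docs, top_k=3)
vector = vectorize(["us", "alone"], vocabulary)
```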

Part-of-speech (POS) features: total number and ratio of each POS tag defined in DKPro.
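The count and ratio features could be derived from a tagged post roughly as follows. The tag names below are placeholders, not the actual DKPro tagset, and the input is assumed to be an already-tagged sequence.

```python
from collections import Counter

def pos_features(pos_tags, tagset):
    """Count and ratio of each tag in `tagset` for one post."""
    counts = Counter(pos_tags)
    total = len(pos_tags)
    features = {}
    for tag in tagset:
        features[f"count_{tag}"] = counts[tag]
        # Guard against empty posts to avoid a division by zero.
        features[f"ratio_{tag}"] = counts[tag] / total if total else 0.0
    return features

# Placeholder tagset and tags, for illustration only (not DKPro's tagset).
TAGSET = ["NN", "V", "PR", "ADJ"]
tags = ["PR", "V", "NN", "NN"]
feats = pos_features(tags, TAGSET)
```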

Syntactic features
- Statistics (mean, max) about the number of clauses per sentence
- Average and maximal depth of the dependency trees
- Presence of a dependency rule (still needs to be done)
(Ref: "Identifying Argumentative Discourse Structures in Persuasive Essays", Christian Stab)
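For the dependency-depth features, a small sketch assuming each parsed sentence is available as a list of head indices (one per token, with -1 marking the root). This representation and the function names are assumptions made for the example; the parser output used in the project may look different.

```python
def tree_depth(heads):
    """Depth of a dependency tree given head indices (-1 marks the root).

    The root alone has depth 1. Assumes a well-formed tree without cycles.
    """
    def depth(i):
        d = 1
        while heads[i] != -1:
            i = heads[i]
            d += 1
        return d
    return max((depth(i) for i in range(len(heads))), default=0)

def depth_features(sentences_heads):
    """Average and maximal tree depth over the sentences of one post."""
    depths = [tree_depth(heads) for heads in sentences_heads]
    if not depths:
        return {"avg_depth": 0.0, "max_depth": 0}
    return {"avg_depth": sum(depths) / len(depths), "max_depth": max(depths)}

# Two toy parses: token 0 is the root of the first one, token 1 of the second.
post = [[-1, 0, 0], [1, -1, 1, 2]]
features = depth_features(post)
```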


Lexical features
- Statistics about sentences and tokens
- Length of a post, number of tokens per sentence and number of tokens with more than 6 letters (Ref: "Stance Classification of Ideological Debates: Data, Models, Features, and Constraints", Kazi Saidul Hasan and Vincent Ng)
- Ratio of punctuation marks and presence of multiple punctuation
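The lexical features listed above can be sketched as follows. Several details are assumptions: post length is measured in tokens, a run such as "!!" or "..." arrives as a single token, and the feature names are invented for the example.

```python
import re

def lexical_features(sentences):
    """Lexical statistics for one post, given as a list of token lists."""
    tokens = [tok for sentence in sentences for tok in sentence]
    n_tokens = len(tokens)
    n_sentences = len(sentences)
    # A punctuation token consists only of non-word, non-space characters.
    punctuation = [tok for tok in tokens if re.fullmatch(r"[^\w\s]+", tok)]
    return {
        "post_length": n_tokens,
        "tokens_per_sentence": n_tokens / n_sentences if n_sentences else 0.0,
        "long_tokens": sum(1 for tok in tokens if tok.isalpha() and len(tok) > 6),
        "punctuation_ratio": len(punctuation) / n_tokens if n_tokens else 0.0,
        "multiple_punctuation": any(len(tok) > 1 for tok in punctuation),
    }

# Toy post, for illustration only.
post = [["Leave", "us", "alone", "!!"], ["Not", "obvious", "..."]]
features = lexical_features(post)
```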

Sentiment features: integration of the GPL-licensed Stanford Deep Learning for Sentiment Analysis tool in DKPro


Feature Engineering

Sentiment features: the sentiment analysis tool outputs 5 scores for each sentence. We are creating 20 features from those scores.
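The slides do not say how the 5 per-sentence scores become 20 features. One possible reading, shown here purely as an assumption, is 4 statistics per score over the sentences of a post (5 x 4 = 20). The choice of minimum, maximum, mean and standard deviation, as well as the score names, are guesses made for this sketch.

```python
from statistics import mean, pstdev

# Hypothetical names for the 5 scores; the slides only say "5 scores".
SCORE_NAMES = ["very_negative", "negative", "neutral", "positive", "very_positive"]

def sentiment_features(sentence_scores):
    """Aggregate per-sentence scores into 4 statistics per score (assumed design)."""
    features = {}
    for idx, name in enumerate(SCORE_NAMES):
        column = [scores[idx] for scores in sentence_scores]
        if not column:
            column = [0.0]  # empty post: fall back to zeros
        features[f"{name}_min"] = min(column)
        features[f"{name}_max"] = max(column)
        features[f"{name}_mean"] = mean(column)
        features[f"{name}_std"] = pstdev(column)
    return features

# Toy scores for a two-sentence post, invented for illustration.
post_scores = [
    [0.1, 0.2, 0.4, 0.2, 0.1],
    [0.0, 0.1, 0.5, 0.3, 0.1],
]
features = sentiment_features(post_scores)
```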

Classifier used

We are currently using a standard SVM classifier.

At the moment, we do not plan to compare results across several classifiers.


Results

Each cell: Accuracy / F-measure (blue / orange on the original slide)

Full CV

Feature set                   Accuracy / F-measure
Baseline                      69.0 / 68.9
All the features              67.9 / 67.8
Baseline+Sentiment Analysis   68.0 / 68.0

In Domain CV

Domain                   Baseline      All the features   Baseline+Sentiment Analysis
redshirting              51.5 / 51.4   54.4 / 54.1        54.4 / 54.1
prayer in schools        74.1 / 73.9   73.4 / 73.3        73.4 / 73.3
homeschooling            74.1 / 70.5   75.0 / 71.5        75.0 / 71.5
single sex education     74.1 / 70.5   68.0 / 67.5        68.0 / 67.5
mainstreaming            65.5 / 39.6   65.5 / 39.6        65.5 / 39.6
public private schools   70.0 / 68.5   72.1 / 70.6        72.1 / 70.6
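For reference, the two reported measures can be computed as in the sketch below. The slides do not state how the F-measure is averaged over the two classes; macro-averaging is an assumption here. The usage example takes the mainstreaming class counts (10 P1, 19 P2) and a hypothetical classifier that always answers P2.

```python
def accuracy(gold, predicted):
    """Share of texts whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)

def f1_for_class(gold, predicted, label):
    """F1 of one class; defined as 0.0 when the class is never predicted correctly."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == label and p == label for g, p in pairs)
    fp = sum(g != label and p == label for g, p in pairs)
    fn = sum(g == label and p != label for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(gold, predicted, labels=("P1", "P2")):
    """Unweighted mean of the per-class F1 scores (assumed averaging)."""
    return sum(f1_for_class(gold, predicted, lab) for lab in labels) / len(labels)

# Hypothetical classifier that always predicts P2 on the mainstreaming counts.
gold = ["P1"] * 10 + ["P2"] * 19
predicted = ["P2"] * 29
acc = accuracy(gold, predicted)
mf1 = macro_f1(gold, predicted)
print(f"{100 * acc:.1f} / {100 * mf1:.1f}")
```

Under these assumptions the example gives 65.5 / 39.6, which coincides with the mainstreaming cells of the in-domain results. This may indicate a majority-class prediction on that small domain, something the planned error analysis could confirm.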

Results

Cross Domain

Domain                   Baseline      All the features   Baseline+Sentiment Analysis
redshirting              54.4 / 54.1   55.9 / 55.5        54.4 / 54.1
prayer in schools        53.9 / 52.6   53.9 / 53.0        73.4 / 73.3
homeschooling            65.6 / 61.0   65.1 / 61.2        74.5 / 70.9
single sex education     64.0 / 62.5   62.0 / 60.1        66.0 / 65.3
mainstreaming            65.5 / 61.8   65.5 / 61.8        65.5 / 39.6
public private schools   66.1 / 65.9   65.3 / 64.9        71.0 / 69.6


PROBLEMS ENCOUNTERED AND FUTURE WORK


Problems encountered


Knowledge in NLP

Getting hands-on with DKPro

Computation times (might solve it soon...)


Future work

Error Analysis

Feature selection

Hyperparameter optimization for SVM

Bootstrap the model with new data from debate portals


Questions?