VNLP: An Open Source Framework for Vietnamese Natural Language Processing Ngoc Minh Le - ePi Technology Bich Ngoc Do – ePi Technology Vi Duong Nguyen – ePi Technology Thi Dam Nguyen – ePi Technology


VNLP: An Open Source Framework for Vietnamese Natural Language Processing

Ngoc Minh Le - ePi Technology
Bich Ngoc Do - ePi Technology
Vi Duong Nguyen - ePi Technology
Thi Dam Nguyen - ePi Technology


Major tasks in Natural Language Processing

High-level applications:
• Automatic summarization
• Machine translation
• Sentiment analysis
• ...

Fundamental tasks:
• Word segmentation
• Part-of-speech tagging
• ...


Fundamental Tasks


Word segmentation

Part-of-speech tagging

Syntactic parsing

Named-entity recognition (NER)

Coreference resolution


Framework for Vietnamese NLP?


Stanford CoreNLP Framework for English

Framework for Vietnamese Natural Language Processing


JVnTextPro

JVnTextPro: tokenizer, POS tagging

Is this enough? What is the solution?


Word segmentation

vnTokenizer, with an accuracy of 96%-98%.

Several improvements speed up vnTokenizer:
• Reading XML-encoded data via SAX
• Tokenizing a document with an LL parser
• Using an automaton with default transitions
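To make the automaton idea concrete, here is a minimal sketch of dictionary-based segmentation over syllables. It is not vnTokenizer's actual algorithm: the trie, the greedy longest-match strategy, the tiny lexicon, and the reading of "default transition" as a fallback that emits an unknown syllable as a one-syllable word are all assumptions made for illustration.

```python
END = object()  # sentinel key marking "a lexicon word ends here"


def build_trie(lexicon):
    """Build a syllable-level trie (a simple automaton) from a word list."""
    root = {}
    for word in lexicon:
        node = root
        for syllable in word.split():
            node = node.setdefault(syllable, {})
        node[END] = True
    return root


def segment(syllables, trie):
    """Greedy longest-match segmentation; syllables of one word are joined by '_'."""
    words = []
    i = 0
    while i < len(syllables):
        node = trie
        longest = 0
        j = i
        # Follow explicit transitions for as long as they exist.
        while j < len(syllables) and syllables[j] in node:
            node = node[syllables[j]]
            j += 1
            if END in node:
                longest = j - i
        if longest == 0:
            # Assumed "default transition": no lexicon word starts here,
            # so emit the syllable on its own and move on.
            longest = 1
        words.append("_".join(syllables[i:i + longest]))
        i += longest
    return words


# Hypothetical three-entry lexicon, for illustration only.
trie = build_trie(["học sinh", "sinh học", "học"])
print(segment("tôi là học sinh".split(), trie))
```

Greedy matching cannot resolve every ambiguity (an input such as "học sinh học sinh học" has several valid readings), which is one reason real segmenters add a disambiguation step on top of the lexicon lookup.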



Part-of-speech tagging

VnTagger: 95%

JVnTagger: 91.3%

VnQTag: 85.57%
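The slide does not say how these figures were measured. Assuming they are token-level accuracies (the share of tokens whose predicted tag equals the gold tag), the computation can be sketched as follows. The function name and the tag labels are hypothetical.

```python
def tagging_accuracy(gold_tags, predicted_tags):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    if len(gold_tags) != len(predicted_tags):
        raise ValueError("gold and predicted sequences must have the same length")
    if not gold_tags:
        raise ValueError("cannot compute accuracy on an empty sequence")
    correct = sum(g == p for g, p in zip(gold_tags, predicted_tags))
    return correct / len(gold_tags)


# Illustrative tags only; not taken from any of the taggers above.
gold = ["N", "V", "N", "A"]
predicted = ["N", "V", "V", "A"]
print(tagging_accuracy(gold, predicted))
```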


Syntactic parsing

Tree-adjoining grammar, head-driven phrase structure grammar, …: no software deliverable.

MaltParser: open source, language-independent

Acceptable accuracy: 70%


Named-Entity Recognition

Uses a rule-based method. The rule-based NER includes two parts:

• a word-searching component, called a gazetteer in GATE's terminology

• a pattern-matching component, called a transducer

Accuracy: 59%
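The two-part design can be sketched in a few lines of Python. This is an illustration of the gazetteer-plus-transducer idea only, not the Vn-Ner rules or GATE's API: the gazetteer entries, the annotation types, and the single pattern rule (a person title followed by capitalized tokens) are hypothetical.

```python
# Hypothetical gazetteer: token sequences mapped to a lookup type.
GAZETTEER = {
    ("Hà", "Nội"): "Location",
    ("ông",): "PersonTitle",
    ("bà",): "PersonTitle",
}


def gazetteer_lookup(tokens):
    """Word-searching step: return (start, end, type) for longest matches."""
    lookups = []
    max_len = max(len(key) for key in GAZETTEER)
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in GAZETTEER:
                lookups.append((i, i + n, GAZETTEER[key]))
                i += n
                break
        else:
            i += 1  # no entry starts at this token
    return lookups


def transduce(tokens, lookups):
    """Pattern-matching step: turn lookups into (text, entity type) pairs."""
    entities = []
    for start, end, kind in lookups:
        if kind == "Location":
            entities.append((" ".join(tokens[start:end]), "Location"))
        elif kind == "PersonTitle":
            # Assumed rule: title + run of capitalized tokens => Person.
            j = end
            while j < len(tokens) and tokens[j][:1].isupper():
                j += 1
            if j > end:
                entities.append((" ".join(tokens[end:j]), "Person"))
    return entities


tokens = "ông Nguyễn Văn An sống ở Hà Nội".split()
print(transduce(tokens, gazetteer_lookup(tokens)))
```

Keeping the lookup and the pattern rules separate means the word lists can grow without touching the rules, and the rules can be refined without rebuilding the lists.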


Coreference resolution

The heuristic, rule-based approach consists of two components:

• Orthographic matcher (orthomatcher) with 17 rules.

• Co-referencer, which performs pronominal co-referencing and integrates everything into co-reference lists.
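The 17 orthographic rules are not listed on the slide. As a sketch of what one such rule might look like, the code below treats two mentions as the same entity when they are equal ignoring case, or when the shorter one is a trailing part of the longer one (Vietnamese given names come last). The rule, the function names, and the sample mentions are assumptions for illustration.

```python
def same_entity(mention_a, mention_b):
    """Hypothetical orthographic rule: case-insensitive equality or suffix match."""
    a = mention_a.lower().split()
    b = mention_b.lower().split()
    if a == b:
        return True
    shorter, longer = sorted((a, b), key=len)
    return bool(shorter) and longer[-len(shorter):] == shorter


def build_chains(mentions):
    """Group mentions into co-reference lists using the rule above."""
    chains = []
    for mention in mentions:
        for chain in chains:
            if any(same_entity(mention, other) for other in chain):
                chain.append(mention)
                break
        else:
            chains.append([mention])  # start a new co-reference list
    return chains


print(build_chains(["Nguyễn Văn An", "Hà Nội", "An", "nguyễn văn an"]))
```

Pronominal co-reference needs context beyond spelling (for example, the nearest preceding compatible entity), so it is not covered by this sketch.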


Open Source Framework for Vietnamese NLP

[Architecture diagram: VNLP combines Document Reset PR, Sentence splitter, VnTokenizer, VnTagger, MaltParser (syntactic parsing), Vn-Ner (named-entity recognition), and Co-reference.]
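One way to picture how such components fit together is a pipeline in which each stage reads a shared document and adds its own annotations. The sketch below assumes that design; the stage functions are simplified stand-ins with hypothetical names, not the VNLP or GATE implementations.

```python
def reset(doc):
    """Drop annotations from a previous run, keeping only the raw text."""
    return {"text": doc["text"]}


def split_sentences(doc):
    """Naive stand-in for a sentence splitter: split on full stops."""
    doc["sentences"] = [s.strip() for s in doc["text"].split(".") if s.strip()]
    return doc


def tokenize(doc):
    """Naive stand-in for a tokenizer: split each sentence on whitespace."""
    doc["tokens"] = [sentence.split() for sentence in doc["sentences"]]
    return doc


def run_pipeline(text, stages):
    """Pass one document through every stage in order."""
    doc = {"text": text}
    for stage in stages:
        doc = stage(doc)
    return doc


result = run_pipeline(
    "Tôi là sinh viên. Tôi sống ở Hà Nội.",
    [reset, split_sentences, tokenize],
)
print(result["sentences"])
```

Because every stage has the same shape (document in, document out), a tagger, parser, or entity recognizer can be appended to the list without changing the stages before it.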


Application of VNLP


Online Reputation Management - noti5.vn
• An application of sentiment analysis
• All mentions of a brand
• Determines positive and negative opinions

Automatic synthesis and classification of webpages


PART 5 – CONCLUSION AND FUTURE WORK


Thank you for your attention!

Q & A