VNLP: An Open Source Framework for Vietnamese Natural Language Processing
Ngoc Minh Le - ePi Technology
Bich Ngoc Do - ePi Technology
Vi Duong Nguyen - ePi Technology
Thi Dam Nguyen - ePi Technology
Major tasks in Natural Language Processing
High-level applications:
Automatic summarization
Machine translation
Sentiment analysis
...
Fundamental tasks:
Word segmentation
Part-of-speech tagging
...
Fundamental Tasks
Word segmentation
Part-of-speech tagging
Syntactic Parser
Named Entity Recognizer (NER)
Coreference resolution
Framework for Vietnamese NLP?
Stanford CoreNLP Framework for English
Framework for Vietnamese Natural Language Processing
JVnTextPro: tokenizer, POS tagging
Enough? Solution?
Word segmentation
vnTokenizer, with accuracy of 96% to 98%.
Some improvements to speed up vnTokenizer:
Reading XML-encoded data via SAX
Tokenizing a document with an LL parser
Using an automaton with default transitions
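To make the segmentation task concrete, here is a minimal sketch of dictionary-based greedy longest matching over syllables, a common baseline for Vietnamese word segmentation. It is not vnTokenizer's actual algorithm, and the lexicon, `build_trie`, and `segment` are hypothetical names introduced only for illustration.

```python
END = ""  # end-of-word marker; str.split() never yields an empty syllable


def build_trie(lexicon):
    """Build a syllable-level trie from multi-syllable dictionary words."""
    root = {}
    for word in lexicon:
        node = root
        for syllable in word.split():
            node = node.setdefault(syllable, {})
        node[END] = True
    return root


def segment(text, trie):
    """Greedy longest match: at each position take the longest lexicon word."""
    syllables = text.split()
    words, i = [], 0
    while i < len(syllables):
        node, longest = trie, 1  # unknown syllables become one-syllable words
        for j in range(i, len(syllables)):
            node = node.get(syllables[j])
            if node is None:
                break
            if END in node:
                longest = j - i + 1
        words.append("_".join(syllables[i:i + longest]))
        i += longest
    return words


# Hypothetical toy lexicon, for illustration only.
LEXICON = ["học sinh", "sinh học", "học"]
TRIE = build_trie(LEXICON)
print(segment("học sinh học sinh học", TRIE))
```

On this input the greedy strategy yields học_sinh học_sinh học, while the intended reading is học_sinh học sinh_học. Overlap ambiguities of this kind are why a practical tokenizer needs a disambiguation step on top of dictionary lookup.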
Part-of-speech tagging
VnTagger 95%
JVnTagger 91.3%
VnQTag 85.57%
Syntactic parsing
Tree-adjoining grammar, head-driven phrase structure grammar, …
No software deliverable
MaltParser
Open source
Language independent
Acceptable accuracy: 70%
Named-Entity Recognition
Uses a rule-based method. The rule-based NER has two parts:
A word-searching component, called a gazetteer in GATE's terminology
A pattern-matching component, called a transducer
Accuracy: 59%
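The two-part design can be sketched in plain Python as follows. This is not GATE's API and not the rules used in VNLP: the gazetteer entries, the title rule, and all function names are assumptions made for illustration.

```python
# Hypothetical gazetteer entries, for illustration only.
GAZETTEER = {
    "Hà Nội": "LOCATION",
    "Đà Nẵng": "LOCATION",
    "ePi Technology": "ORGANIZATION",
}


def gazetteer_lookup(tokens, gazetteer=GAZETTEER, max_len=3):
    """Word search: return (start, end, label) for the longest list matches."""
    spans, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in gazetteer:
                spans.append((i, i + n, gazetteer[phrase]))
                i += n
                break
        else:
            i += 1  # no entry starts here
    return spans


def title_rule(tokens, titles=("Ông", "Bà")):
    """Pattern match: a title followed by capitalized tokens marks a PERSON."""
    spans, i = [], 0
    while i < len(tokens):
        if tokens[i] in titles:
            j = i + 1
            while j < len(tokens) and tokens[j][:1].isupper():
                j += 1
            if j > i + 1:
                spans.append((i + 1, j, "PERSON"))
            i = j
        else:
            i += 1
    return spans


def recognize(text):
    """Run both components and return (entity text, label) in text order."""
    tokens = text.split()
    spans = sorted(gazetteer_lookup(tokens) + title_rule(tokens))
    return [(" ".join(tokens[s:e]), label) for s, e, label in spans]


print(recognize("Ông Nguyễn Văn An đến Hà Nội"))
```

The sketch ignores punctuation and overlapping matches; a real transducer would also resolve conflicts between the two components.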
Coreference resolution
A heuristic, rule-based approach with two components:
Orthographic matcher (orthomatcher) with 17 rules
Co-referencer, which performs pronominal coreference and integrates everything into coreference lists
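As an illustration of orthographic matching, the sketch below groups mentions into coreference lists using two string rules. The slides do not list the 17 rules, so both rules here are hypothetical stand-ins, as are the function names.

```python
import unicodedata


def _norm(mention):
    """Normalize Unicode and case so equal names compare equal."""
    return unicodedata.normalize("NFC", mention).casefold()


def same_ignoring_case(a, b):
    """Hypothetical rule 1: identical strings up to letter case."""
    return _norm(a) == _norm(b)


def is_token_suffix(a, b):
    """Hypothetical rule 2: the shorter mention is the tail of the longer one."""
    short, long_ = sorted((_norm(a).split(), _norm(b).split()), key=len)
    return len(short) > 0 and long_[-len(short):] == short


RULES = (same_ignoring_case, is_token_suffix)


def build_chains(mentions, rules=RULES):
    """Add each mention to the first chain it matches, else start a new chain."""
    chains = []
    for mention in mentions:
        for chain in chains:
            if any(rule(mention, other) for rule in rules for other in chain):
                chain.append(mention)
                break
        else:
            chains.append([mention])
    return chains


print(build_chains(["Nguyễn Văn An", "Hà Nội", "An", "hà nội"]))
```

The suffix rule reflects the fact that a Vietnamese given name comes last, so a later "An" can refer back to "Nguyễn Văn An". Pronominal coreference is not covered by this sketch.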
Open Source Framework for Vietnamese NLP
VNLP components:
Document Reset PR
Sentence splitter
VnTokenizer
VnTagger
MaltParser (syntactic parsing)
Vn-Ner (named-entity recognition)
Co-reference
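One way to picture how such components fit together is a list of stages applied in order to a shared document. The sketch below is an assumption about the general shape only: the stage order, the dictionary-based document, and the trivial stand-in stages are hypothetical and do not reproduce VNLP's components or GATE's API.

```python
import re


def reset(doc):
    """Clear annotations left by a previous run, keeping only the text."""
    return {"text": doc["text"]}


def split_sentences(doc):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", doc["text"].strip())
    doc["sentences"] = [p for p in parts if p]
    return doc


def tokenize(doc):
    """Placeholder tokenizer: whitespace split, one token list per sentence."""
    doc["tokens"] = [s.split() for s in doc["sentences"]]
    return doc


# Assumed order; tagging, parsing, NER and coreference stages would follow.
PIPELINE = [reset, split_sentences, tokenize]


def run(text, stages=PIPELINE):
    """Pass one document through every stage in order."""
    doc = {"text": text}
    for stage in stages:
        doc = stage(doc)
    return doc


print(run("Tôi ở Hà Nội. Bạn ở đâu?")["sentences"])
```

Keeping each stage a function from document to document means a component can be swapped or added without touching the others.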
Application of VNLP
Online Reputation Management - noti5.vn
• Application of sentiment analysis
• All mentions of a brand
• Determines positive and negative opinions
Automatic synthesis and classification of web pages
PART 5 – CONCLUSION AND FUTURE WORK
Thank you for your attention!
Q & A