26
NLTK: Natural Language Toolkit Overview and Application Jimmy Lai [email protected] Software Engineer @ Oxygen Intelligence 2012/03/21 1

Nltk natural language toolkit overview and application @ PyHug

Embed Size (px)

DESCRIPTION

NLTK is a python toolkit for Natural Language Processing. In this slide, the author provides overview for NLTK and demonstrates an application in Chinese text classification.

Citation preview

Page 1: Nltk  natural language toolkit overview and application @ PyHug

1

NLTK: Natural Language Toolkit Overview and Application

Jimmy [email protected]

Software Engineer @ Oxygen Intelligence2012/03/21

Page 2: Nltk  natural language toolkit overview and application @ PyHug

2

Outline

1. An application based on NLP: 聚寶評2. Introduction to Natural Language Processing3. Brief History of NLTK4. Overview of NLTK5. Application of NLTK: Topic Classification on

PTT

Page 3: Nltk  natural language toolkit overview and application @ PyHug

3

聚寶評 www.ezpao.com

美食搜尋引擎

Page 4: Nltk  natural language toolkit overview and application @ PyHug

4

聚寶評 www.ezpao.com

語意分析搜尋引擎

Page 5: Nltk  natural language toolkit overview and application @ PyHug

5

網友分享菜分析

正評 / 負評分析

評論主題分析

Page 6: Nltk  natural language toolkit overview and application @ PyHug

6

Natural Language Processing (NLP)

• 語音識別( Speech recognition )• 詞性標註( Part-of-speech tagging )• 句法分析( Parsing )• 自然語言生成 (Natural language generation)• 文本分類( Text classification )• 信息抽取( Information extraction )• 機器翻譯( Machine translation )• 文字蘊涵( Textual entailment )

via Wikipedia

Page 7: Nltk  natural language toolkit overview and application @ PyHug

7

NLTK: Natural Language Toolkit

• http://www.nltk.org/• Author: Steven Bird, Edward Loper, Ewan Klein• Originally developed for class student has

background either in computer science or linguistics.

• Currently:– Education: over 100 courses in 23 countries.– Research: over 250 papers cites NLTK.

Page 8: Nltk  natural language toolkit overview and application @ PyHug

8

Outline

1. An application based on NLP: 聚寶評2. Introduction to Natural Language Processing3. Brief History of NLTK4. Overview of NLTK5. Application of NLTK: Topic Classification on

PTT

Page 9: Nltk  natural language toolkit overview and application @ PyHug

9

NLP in NLTK – Annotated Text Corpora

Resources:from nltk.corpus import *

Page 10: Nltk  natural language toolkit overview and application @ PyHug

10

NLP in NLTK – Text Tokenization, Normalization

Text Processing Flow

Resources:from nltk.tokenize import *

Page 11: Nltk  natural language toolkit overview and application @ PyHug

11

NLP in NLTK – Part-of-speech Tagging

Resources:from nltk.tag import *

Page 12: Nltk  natural language toolkit overview and application @ PyHug

12

NLP in NLTK – Text Classification

Resources:from nltk.classify import *

Page 13: Nltk  natural language toolkit overview and application @ PyHug

13

NLP in NLTK – Entity Recognition

Resources:from nltk.chunk import *

Page 14: Nltk  natural language toolkit overview and application @ PyHug

14

NLP in NLTK – Grammar Tree

Resources:from nltk.parse import *

Page 15: Nltk  natural language toolkit overview and application @ PyHug

15

NLP in NLTK – Semantic of Sentence

• Propositional Logic• First-Order Logic• Disclosure Semantics

Resources:from nltk.sem import *

Page 16: Nltk  natural language toolkit overview and application @ PyHug

16

Outline

1. An application based on NLP: 聚寶評2. Introduction to Natural Language Processing3. Brief History of NLTK4. Overview of NLTK5. Application of NLTK: Topic Classification on

PTT

Page 17: Nltk  natural language toolkit overview and application @ PyHug

17

Topic Classification on PTT

• 熱門看板 :– 文章主題明確 : Food( 美食版 ), HatePolitics( 政

黑板 ), Baseball( 棒球版 ), Stock( 股票版 ), Boy-Girl( 男女版 )

– 文章主題廣泛 : Gossiping( 八卦版 )• 目標 : 將八卦版的文章依照主題分類,就

可以只挑選有興趣的主題的文章來閱讀。

Page 18: Nltk  natural language toolkit overview and application @ PyHug

18

System Flow

Page 19: Nltk  natural language toolkit overview and application @ PyHug

19

Tokenization

Page 20: Nltk  natural language toolkit overview and application @ PyHug

20

Text Classification

Page 21: Nltk  natural language toolkit overview and application @ PyHug

21

Result – Boy-Girl

Page 22: Nltk  natural language toolkit overview and application @ PyHug

22

Result – HatePolitics

Page 23: Nltk  natural language toolkit overview and application @ PyHug

23

Result – Food

Page 24: Nltk  natural language toolkit overview and application @ PyHug

24

Result – Stock

Page 25: Nltk  natural language toolkit overview and application @ PyHug

25

Reference

• Steven Bird, Ewan Klein, and Edward Loper, “Natural Language Processing with Python”, 2009.

• Jacob Perkins, “Python Text Processing with NLTK 2.0 Cookbook”, 2010

• Loper, E., & Bird, S. (2002). NLTK: The Natural Language Toolkit. Proceedings of the ACL02 Workshop on Effective tools and methodologies for teaching natural language processing and computational linguistics.

Page 26: Nltk  natural language toolkit overview and application @ PyHug

26

We are hiring

• 核心引擎演算法研發工程師• 系統研發工程師• 網路應用研發工程師• 市場研究及網路服務產品設計經理

• Oxygen-Intelligence Taiwan Limited引京聚點 知識結構搜索股份有限公司

• 公司簡介、職缺簡介: http://goo.gl/amjAJ• 請將履歷寄到 [email protected]