Implications of Web 2.0 on Information Research

Wen-Lian Hsu (許聞廉)
Institute of Information Science, Academia Sinica, Taiwan
hsu@iis.sinica.edu.tw

Outline

- What is Web 2.0?
- Web 2.0 and Research
  - Human-based Computation
  - Folksonomy (Social Tagging)
  - Academic Data Analysis
  - GEO-Info
- Conclusion

What is Web 2.0?

Web 2.0 Conference (October 2004), Tim O'Reilly

- The Web As a Platform
- Harnessing Collective Intelligence
- Data is the Next Intel Inside
- End of the Software Release Cycle
- Lightweight Programming Models
- Software Above the Level of a Single Device
- Rich User Experiences

Key Web 2.0 services/applications

- Blogs
- Wikis
- Tagging and social bookmarking
- Multimedia sharing
- RSS and syndication
- Podcasting
- P2P

Social Bookmarking

Source: http://funp.com/push/

Source: http://www.hemidemi.com/

Source: http://digg.com/

Blog

[Figure: annotated blog page with labeled regions for content, comments, AdSense, and social bookmarks]

Source: http://carol.bluecircus.net/

Skype

Source: S.A Baset, H. Schulzrinne (September 14, 2004). An Analysis of the Skype Peer-to-Peer Internet Telephony Protocol. Technical Report. Columbia University.

Wikipedia

Second Life

Symbiosis is the Key

[Figure: symbiosis between Blog and Social bookmark]

The Web Changes in Several Dimensions

- Dynamics
- Heterogeneity
- Collaboration
- Composition
- Socialization

Current Research Activities

- Information Retrieval on Blogs
  - NTCIR-7 CLIRB (Cross-Lingual Information Retrieval for Blog)
- Question Answering on Blogs
  - TREC 2007 QA Track
- Question Answering on Wikipedia
  - QA@CLEF 2007
  - CLEF 2006 WiQA: given a Wikipedia page, locate information snippets in Wikipedia
- PASCAL Ontology Learning Challenge
  - Ontology construction
  - Ontology extension
  - Ontology population
  - Concept naming
- LinkKDD2006, Textlink2007, MRDM2007

International Competition

- 1st place (of 9) in the NTCIR5 2005 CLQA Chinese Question Answering Contest (44.5%)
- 1st place (of 13) in the WS CityU closed track of the SIGHAN 2006 Word Segmentation Contest (97.2%)
- 2nd place (of 10) in the WS CKIP closed track of the SIGHAN 2006 Word Segmentation Contest (95.7%)
- 2nd place (of 8) in the NER CityU closed track of the SIGHAN 2006 Named Entity Recognition Contest (88%)
- 1st place in the NTCIR6 2006 CLQA Chinese Question Answering Contest (55.3%)
- 1st place in the NTCIR6 2006 CLQA English-Chinese Question Answering Contest (34%)

Factoid Questions

- PERSON: 請問芬蘭第一位女總統為誰? (Who is Finland's first woman president?)
- LOCATION: 請問狂牛症最早起源於何國? (In which country did mad cow disease originate?)
- ORGANIZATION: 請問收購南韓三星汽車的外國廠商為何? (Which foreign corporation bought South Korea's Samsung Motors?)
- TIME
- NUMBER
- ARTIFACT

IASL QA Architecture

[Figure: IASL QA architecture. Stages: Question Processing, Passage Retrieval, Answer Extraction, Answer Ranking. Components: SVM, InfoMap, AutoTag, Mencius, ME, Lucene, Filter. Data: word index, char index, documents, answers.]

Chinese Question Taxonomy for NTCIR CLQA Factoid Question Answering

Knowledge Representation of Chinese Questions

Chinese Question: 2004 年奧運在哪一個城市舉行? (In which city were the Olympics held in 2004?)

[5 Time]:[3 Organization]:[7 Q_Location]:([9 LocationRelatedEvent])

QC by SVM

Two types of features are used for Chinese Question Classification (CQC):

- Syntactic features
  - Bag-of-Words
    - character-based bigram (CB)
    - word-based bigram (WB)
  - Part-of-Speech (POS)
    - AUTOTAG: POS tagger developed by CKIP, Academia Sinica
- Semantic features
  - HowNet Senses
    - HowNet Main Definition (HNMD)
    - HowNet Definition (HND)
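As a rough illustration of the two bag-of-words feature types (CB and WB), the sketch below builds a sparse count vector of the kind that could be handed to an SVM. The function names, the `CB=`/`WB=` prefixes, and the example segmentation are assumptions made for illustration; the slides do not describe the actual feature encoding.

```python
from collections import Counter

def char_bigrams(question):
    """Character-based bigrams (CB): every pair of adjacent characters."""
    chars = [c for c in question if not c.isspace()]
    return ["".join(pair) for pair in zip(chars, chars[1:])]

def word_bigrams(words):
    """Word-based bigrams (WB) over an already segmented question."""
    return [a + "_" + b for a, b in zip(words, words[1:])]

def feature_vector(question, words):
    """Sparse bigram counts, prefixed so CB and WB features never collide."""
    feats = Counter("CB=" + g for g in char_bigrams(question))
    feats.update("WB=" + g for g in word_bigrams(words))
    return feats

question = "奧運在哪一個城市舉行"
words = ["奧運", "在", "哪一個", "城市", "舉行"]  # hypothetical segmentation
vec = feature_vector(question, words)
```

Word bigrams depend on a segmenter, while character bigrams do not, which is one reason a system might keep both.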

Question Classification Accuracy

[Figure: bar chart of Chinese Question Classification (CQC) accuracy]

- Machine Learning Approach (SVM): 73.5%
- Knowledge-based Approach (INFOMAP): 88.0%
- Hybrid Approach (SVM + INFOMAP): 92.0%

Answer Extraction

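[Figure: the Answer Extraction stage, with Mencius and Filter components, applied to the passages below]

Passages:
- 廿一世紀美國總統 (US president of the 21st century)
- 總統父子檔美國第二對 (America's second father-and-son pair of presidents)
- 美國總統性事錄 (a record of US presidents' sexual affairs)
- 翻開美國總統傳訊史 (a look at the history of US presidents being subpoenaed)
- 美國總統匆忙赴晚宴 (the US president hurries to a dinner banquet)
- 陸文斯基瘋狂愛上美國總統 (Lewinsky fell madly in love with the US president)
- 美國總統大選選舉人票分析 (analysis of electoral votes in the US presidential election)
- 前越南總統阮文紹病逝美國 (former Vietnamese president Nguyen Van Thieu dies in the US)
- 美國總統柯林頓表示 (US President Clinton stated)

Extracted candidates: 陸文斯基 (Lewinsky), 阮文紹 (Nguyen Van Thieu), 柯林頓 (Clinton)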

Templates generated by local alignment

Tokens are written as word/POS/NE.

.. 因/Cbb/O 台中縣/Nc/LOC 議長/Na/OCC 顏清標/Nb/PER 涉嫌/VK/O ..
.. 清朝/Nd/O 台灣/Nc/LOC 巡撫/Na/OCC 劉銘傳/Nb/PER 所/D/O ..
Template: LOC OCC PER (contains only NEs)

被/P/O 大陸/Nc/LOC 國家/Na/O 主席/Na/OCC 江澤民/Nb/O 形容為/VG/O ..
../COMMA/O 香港/Nc/LOC 行政/Na/O 長官/Na/OCC 董建華/Nb/PER 近日 ..
.. 俄羅斯/Nc/LOC 男子/Na/O 選手/Na/OCC 史莫契柯夫/Nb/O 在/P/O ..
Template: LOC Na OCC Nb (template contains POS tags)

由/P/O 建業/Nc/O 所長/Na/OCC 張龍憲/Nb/PER 擔任/VG/O
由/P/O 安侯/Nb/O 所長/Na/OCC 魏忠華/Nb/PER 擔任/VG/O
Template: 由 N 所長 PER 擔任 (template contains partial POS tags and words)

在/P/O 卡達首都/Nc/LOC 多哈/D/PER,LOC 舉行/VC/O
於/P/O 國父紀念館/Nc/ORG - 舉行/VC/O
在/P/O 國父紀念館/Nc/ORG 廣場/Nc/O 舉行/VC/O
Template: P Nc - 舉行 (template with gap '-')
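The slides do not say which alignment algorithm or scoring scheme produces these templates. As one plausible reading, the sketch below runs a standard Smith-Waterman local alignment over the NE-tag sequences of two passages; the scores (`match`, `mismatch`, `gap`) and the choice to give matching `O` tags a score of zero are assumptions made for illustration.

```python
def local_align(a, b, match=1, mismatch=-1, gap=-1, neutral="O"):
    """Smith-Waterman local alignment of two tag sequences.

    Returns the best-scoring aligned region as a list of tags, with '-'
    marking a gap. Two matching `neutral` tags score 0, so the region is
    anchored on informative tags (an assumed scoring scheme).
    """
    def score(x, y):
        if x != y:
            return mismatch
        return 0 if x == neutral else match

    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            h[i][j] = max(0,
                          h[i - 1][j - 1] + score(a[i - 1], b[j - 1]),
                          h[i - 1][j] + gap,
                          h[i][j - 1] + gap)
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    template = []
    i, j = best_pos
    while i > 0 and j > 0 and h[i][j] > 0:
        if h[i][j] == h[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            template.append(a[i - 1])
            i, j = i - 1, j - 1
        elif h[i][j] == h[i - 1][j] + gap:
            template.append("-")
            i -= 1
        else:
            template.append("-")
            j -= 1
    return template[::-1]

# NE tags of the first two passages on the slide.
tags_1 = ["O", "LOC", "OCC", "PER", "O"]
tags_2 = ["O", "LOC", "OCC", "PER", "O"]
template = local_align(tags_1, tags_2)
```

Templates that mix NE tags, POS tags, and literal words would need a richer scoring function than this tag-equality test.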

Answer Extraction from Template

Question: 誰是台灣國防部長? (Who is Taiwan's Minister of National Defense?)
Q-Type: PERSON
Q-KEYWORD: 台灣 國防部長

Tagged Passages
前任/A/O 美國/Nc/LOC 國防部長/Na/OCC 溫柏格/Nb/PER 認為/VE/O ,/COMMACATEGORY/O
美國/Nc/LOC 國防部長/Na/OCC 柯恩/Nb/PER 今天/Nd/O 表示/VE/O ,/COMMA/O 華府/Nc/ORG,LOC 當局/Na/O 正/D/O 設法/VF/O 釐清/VC/O 台灣/Nc/LOC
【/PAR/O 路透/Nb/ORG 東京/Nc/LOC 十九日/Nd/TIME 電/VC/ART 】/PAREN/O 台灣/Nc/LOC 國防部長/Na/OCC 唐飛/Nb/PER 昨天/Nd/O

Template matching and Relation building
Template: LOC OCC PER
Relation:
美國, 國防部長, 溫柏格, 柯恩
台灣, 國防部長, 唐飛
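The template matching step can be sketched as a sliding window over the NE tags of each tagged passage. The helper names and the final keyword filter are assumptions; the slides show only the template, the passages, and the resulting relations. The passages are shortened versions of the ones on the slide.

```python
def parse_tagged(passage):
    """Split 'word/POS/NE' tokens into (word, pos, ne) triples."""
    triples = []
    for token in passage.split():
        word, pos, ne = token.rsplit("/", 2)
        triples.append((word, pos, ne))
    return triples

def match_template(triples, template):
    """Return every run of words whose NE tags equal the template."""
    n = len(template)
    hits = []
    for start in range(len(triples) - n + 1):
        window = triples[start:start + n]
        if [ne for _, _, ne in window] == template:
            hits.append(tuple(word for word, _, _ in window))
    return hits

passages = [
    "前任/A/O 美國/Nc/LOC 國防部長/Na/OCC 溫柏格/Nb/PER 認為/VE/O",
    "美國/Nc/LOC 國防部長/Na/OCC 柯恩/Nb/PER 今天/Nd/O 表示/VE/O",
    "台灣/Nc/LOC 國防部長/Na/OCC 唐飛/Nb/PER 昨天/Nd/O",
]
template = ["LOC", "OCC", "PER"]
relations = [hit for p in passages
             for hit in match_template(parse_tagged(p), template)]

# Assumed filter: keep relations that contain every question keyword.
keywords = {"台灣", "國防部長"}
answers = [rel[-1] for rel in relations if keywords <= set(rel)]
```

With the question keywords 台灣 and 國防部長, only the third relation survives, so the PERSON slot of that relation becomes the candidate answer.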

Answer Extraction from Template

Question: 黛安娜王妃的死亡車禍事故發生在哪裡? (Where did Princess Diana's fatal car accident happen?)
Q-TYPE: LOCATION
Q-KEYWORD: 黛安娜 王妃 死亡 車禍 事故 發生

Tagged Passages
.. 則/D/O 把/P/O 英國/Nc/LOC 黛安娜/Nb/PER 王妃/Na/O 的/DE/O 巴黎/Nc/LOC 死亡/VH/O 車禍/Na/O ,/COMMA/O 搬上/VC/O 舞台/Na/O ..
.. 英國/Nc/LOC 王妃/Na/O 黛安娜/Nb/PER 離開/VC/O 人世/Nc/O 四個多月/Nd/TIME ..

Template matching and Relation building
Template:
PER Na DE LOC - Na
LOC Na PER - VC

Relation:
黛安娜/PER, 王妃/Na, 巴黎/LOC, 車禍/Na
英國/LOC, 黛安娜/PER, 王妃/Na, 離開/VC

Answer Ranking

Features are combined as a weighted sum.

Answer Ranking Features
- IR Score
- Answer Frequency (voting)
- * QFocus adjacency: "美國總統 [布希] 表示" (US President [Bush] stated), "前往 [惠氏藥廠] 參觀" (went to visit [Wyeth Pharmaceuticals])
- * Question Term and Answer Term (QAT) Co-occurrence
- * Answer Template
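A weighted-sum ranker over these five features might look like the sketch below. The weights, feature names, and candidate values are all made up for illustration; the slides do not report the actual weights or how each feature is normalized.

```python
# Hypothetical weights; the slides do not give the actual values.
WEIGHTS = {
    "ir_score": 0.3,
    "frequency": 0.3,
    "qfocus_adjacent": 0.2,
    "qat_cooccurrence": 0.1,
    "template_match": 0.1,
}

def rank_answers(candidates, weights=WEIGHTS):
    """Sort (answer, features) pairs by the weighted sum of their features."""
    def total(features):
        return sum(w * features.get(name, 0.0) for name, w in weights.items())
    return sorted(candidates, key=lambda c: total(c[1]), reverse=True)

candidates = [  # made-up feature values in [0, 1]
    ("B", {"ir_score": 0.8, "frequency": 0.5, "qfocus_adjacent": 0.0,
           "qat_cooccurrence": 0.5, "template_match": 0.0}),
    ("A", {"ir_score": 0.9, "frequency": 1.0, "qfocus_adjacent": 1.0,
           "qat_cooccurrence": 0.5, "template_match": 1.0}),
]
ranked = rank_answers(candidates)
```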

Web 2.0 and Research

- Human-based Computation
- Folksonomy (Social Tagging)
- Academic Data Analysis
- GEO-Info

Human-based Computation

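- Social Search: wayfinding tools informed by human judgment
- CAPTCHA: a reversed Turing test (in a Turing test a human questions the system; here the system questions the user)
- Interactive Genetic Algorithm (IGA): a genetic algorithm informed by human judgment, in which a human supplies the result of the fitness function
  - Example: drawing a portrait of a criminal suspect. The system generates candidate portraits with a GA, an eyewitness scores which ones look most like the suspect, and the process repeats until the portrait comes close to the suspect's appearance.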
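The interactive loop can be sketched with a callback standing in for the human judge. Everything here is an assumption for illustration: portraits are reduced to bit strings, `rate` is a hypothetical scoring function, and the selection, crossover, and mutation choices are generic GA defaults rather than anything described in the slides.

```python
import random

def interactive_ga(rate, genome_len=8, pop_size=6, generations=20, seed=0):
    """Evolve bit-string 'portraits'; `rate` plays the role of the human judge."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # The human (here: the rate callback) scores every candidate.
        pop.sort(key=rate, reverse=True)
        parents = pop[:pop_size // 2]        # keep the best-rated half
        children = []
        while len(parents) + len(children) < pop_size:
            mom, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_len)
            child = mom[:cut] + dad[cut:]    # one-point crossover
            child[rng.randrange(genome_len)] ^= 1  # single-bit mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=rate)

# Stand-in for the eyewitness: similarity to a hidden target "face".
target = [1, 0, 1, 1, 0, 0, 1, 0]
def rate(genome):
    return sum(g == t for g, t in zip(genome, target))

best = interactive_ga(rate)
```

In a real IGA the call to `rate` is the expensive step, since it is a person looking at each candidate, so population sizes and generation counts tend to stay small.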

CAPTCHA

Completely Automated Public Turing test to tell Computers and Humans Apart

A CAPTCHA is a type of challenge-response test used in computing to determine whether the user is human. (Source: Wikipedia)

SOURCE: http://recaptcha.net/

CAPTCHA

[Figure: CAPTCHAs embedded in several blogs, turning unrecognized text into recognized text]

The ESP Game

- A two-player game
- The goal is to guess what your partner is typing for each image.
- Once you both type the same word(s), you score points.

Source: http://www.espgame.org/
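The agreement rule of the ESP Game can be sketched as follows. The function name and the assumption that the two players' guesses arrive in alternating order are illustrative; the slides describe only the matching rule itself.

```python
def first_agreement(guesses_a, guesses_b):
    """Return the first word that both players have typed, or None.

    Guesses are consumed in alternating order, as if the two players
    typed in turn (an assumption made to keep the sketch deterministic).
    """
    seen_a, seen_b = set(), set()
    for i in range(max(len(guesses_a), len(guesses_b))):
        for guesses, mine, theirs in ((guesses_a, seen_a, seen_b),
                                      (guesses_b, seen_b, seen_a)):
            if i < len(guesses):
                word = guesses[i].strip().lower()
                if word in theirs:
                    return word          # agreed label for the image
                mine.add(word)
    return None

label = first_agreement(["dog", "puppy", "grass"], ["animal", "dog"])
```

The agreed word is useful as an image label precisely because two players who cannot communicate both chose it.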

The Phetch Game

Play as a describer

The Phetch Game

Play as a seeker

How about a game for describing idioms?

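Idioms to describe:
- 罄竹難書 (misdeeds too numerous to record)
- 如沐春風 (like bathing in a spring breeze)
- 高抬貴手 (please be lenient)
- 不動如山 (immovable as a mountain)

Example descriptions typed by players:
- 罄竹難書: 壞事做太多 (has done too many bad things)
- 虎頭蛇尾: 做事沒有毅力 (lacks the perseverance to finish things)
- ...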

Folksonomy (Social Tagging)

Also known as social tagging, collaborative tagging, social classification, social indexing

Folksonomy is the practice and method of collaboratively creating and managing tags to annotate and categorize content.

Source: Wikipedia

del.icio.us

- Tags: descriptive words applied by users to links. Tags are searchable.
- My Tags: words I've used to describe links in a way that makes sense to me.

Semantic Web

Source: Tim Berners-Lee

Using Folksonomy to Help Semantic Web

Top-down Semantic Annotation
- Approach
  - Define an ontology first.
  - Use the ontology to add semantic markups to web resources.
  - The semantics is provided by the ontology, which is shared among different web agents and applications.
- Problem
  - Negotiation
  - Evolution (hard to maintain)
  - High Barrier (background)

Source: Xian Wu, Lei Zhang, Yong Yu. “Exploring Social Annotations for the Semantic Web”

Using Folksonomy to Help Semantic Web

Bottom-up approach with social tagging
- Advantage
  - No common ontology or dictionary is needed
  - Easy to access
  - Sensitive to information drift
- Disadvantage
  - Ambiguity Problem: for example, “XP” can refer to either “Extreme Programming” or “Windows XP”.
  - Group Synonymy Problem: two seemingly different annotations may bear the same meaning.

Source: Xian Wu, Lei Zhang, Yong Yu. “Exploring Social Annotations for the Semantic Web”

Or Folksonomy is the Solution?

- Ontology is Overrated
- Classification of the web has failed
- Classification itself is filled with bias and error
- Tagging is the solution

Source: http://www.shirky.com/writings/ontology_overrated.html

Academic Data Analysis

- Citation index (papers, journals/conferences, authors)
  - CiteSeer
  - Google Scholar
  - Windows Live Academic Search
  - PubMed
  - arXiv
- e-Lib: Lib 2.0 concepts are being added into applications, so search platforms provide open APIs for collecting more data
- Users participate and interact with data and people
  - Add My Library, Tag. Ex. CiteULike, BibSonomy
  - Add Comments, Rating, Recommendation. Ex. TechLens
  - Domain Focus Groups. Ex. Botanicus

An Example

Let’s use TechLens as an example to imagine what research on IR/NLP can do.

[Figure: Authors, Readers, and Papers]

The Terminology

- Entities: Alfred V Aho
- References: Alfred Aho; AV Aho; Aho, A. V.
- Links:
  - Alfred Aho, John Hopcroft, Jeffrey Ullman
  - AV Aho, BW Kernighan, PJ Weinberger
- Entity Groups: G1 (Programming Languages), G2 (Databases), G3 (Algorithms)

Imagine how we can make use of them

Papers

Authors

Readers

Comments

Rating

Reference Extraction

Entity Resolution

New Research Topics

- Given those changes, a key emerging challenge for data mining is dealing with richly structured data and finding the patterns behind heterogeneous datasets.
- Several lines of research focus on these problems, such as:
  - (Social) Network Analysis
  - Link Mining
  - PASCAL Ontology Learning Challenge
  - ...

Society

- Nodes: individuals (Authors, Readers)
- Links: social relationships (family, work, friendship, membership, etc.)
- Social networks: many individuals with diverse social interactions between them.
- S. Milgram (1967); John Guare, Six Degrees of Separation

Science

source: www.cs.uiuc.edu/~hanj

Communication networks

The Earth is developing an electronic nervous system, a network with diverse nodes and links.

Nodes:
- computers
- routers
- satellites
- artifacts in TechLens: papers, user IPs, comments, responses, ...

Links:
- phone lines
- TV cables
- EM waves
- relations between artifacts

Communication networks: many non-identical components with diverse connections between them.

source: www.cs.uiuc.edu/~hanj

Link-based Object Ranking

- Perhaps the best-known link mining task is link-based object ranking (LBR), which is a primary focus of the link analysis community. The objective of LBR is to exploit the link structure of a graph to order or prioritize the set of objects within the graph.
- Example: PageRank
  - Which paper is most important in this area?
  - Which journal/conference is most important in this area?
  - Which topic is important in this area?
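As a concrete instance of link-based ranking, here is a minimal power-iteration PageRank over a toy citation graph. The graph, the damping factor of 0.85, and the uniform handling of papers that cite nothing are illustrative assumptions, not details taken from the slides.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank. `links` maps each paper to the papers it cites."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for other in nodes:
                    new_rank[other] += damping * rank[node] / n
        rank = new_rank
    return rank

citations = {"A": ["C"], "B": ["C"], "C": ["D"], "D": []}  # toy data
scores = pagerank(citations)
```

In this toy graph D outranks C even though C has more citations, because C passes all of its own rank on to D. That is the sense in which the ranking exploits link structure rather than raw citation counts.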

Link-based Object Classification/ Link-based Classification (LBC)

Predicting the category of an object based on its attributes and its links and attributes of linked objects

Web: Predict the category of a web page, based on words that occur on the page, links between pages, anchor text, html tags, etc.

Citation: Predict the topic of a paper, based on word occurrence, citations, co-citations

Epidemics: Predict disease type based on characteristics of the people; predict a person’s age based on the ages of the people they have been in contact with and the disease type
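A very simple form of link-based classification is a majority vote over the labels of linked objects, optionally combined with a guess made from the object's own attributes. The sketch below is only that baseline; the function name, the one-vote weighting of `own_guess`, and the tie-breaking rule are assumptions.

```python
from collections import Counter

def classify_by_links(node, links, labels, own_guess=None):
    """Predict a label from the labels of linked objects (majority vote).

    `own_guess` stands for a label predicted from the object's own
    attributes (e.g. its words); it gets one vote and wins ties.
    """
    votes = Counter(labels[n] for n in links.get(node, []) if n in labels)
    if own_guess is not None:
        votes[own_guess] += 1
    if not votes:
        return None
    top = max(votes.values())
    winners = [label for label, count in votes.items() if count == top]
    return own_guess if own_guess in winners else sorted(winners)[0]

# Toy citation data: p4 cites three papers with known topics.
links = {"p4": ["p1", "p2", "p3"]}
labels = {"p1": "IR", "p2": "IR", "p3": "DB"}
topic = classify_by_links("p4", links, labels)
```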

Group Detection

- Cluster the nodes in the graph into groups that share common characteristics; that is, predict when a set of entities belongs to the same group by clustering on both object attribute values and link structure.
- Web: identifying communities
- Citation: identifying research communities
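The simplest link-only form of group detection is to take the connected components of the graph. The sketch below does just that over hypothetical co-authorship edges; it ignores attribute values entirely, so it should be read as a baseline rather than as the clustering method the slide has in mind.

```python
def connected_groups(edges):
    """Group nodes that are reachable from one another through links."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    groups, seen = [], set()
    for start in sorted(neighbours):
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:                      # depth-first traversal
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups

# Hypothetical co-authorship links.
edges = [("Aho", "Ullman"), ("Ullman", "Hopcroft"), ("Kernighan", "Weinberger")]
groups = connected_groups(edges)
```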

Entity Resolution

- Predicting when two objects are the same, based on their attributes and their links
- Web: predicting when two sites are mirrors of each other
- Citation: predicting when two citations refer to the same paper
- Epidemics: predicting when two disease strains are the same
- Biology: learning when two names refer to the same protein
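For author references, a crude attribute-only resolver can reduce each reference to a surname plus first initial and compare the results. The normalization rule below is an assumption made for illustration; it ignores links such as shared co-authors, and it would wrongly merge different people who share a surname and initial.

```python
def name_key(reference):
    """Reduce an author reference to (surname, first initial)."""
    text = reference.replace(".", " ").strip()
    if "," in text:                       # "Aho, A V": surname comes first
        surname, given = [part.strip() for part in text.split(",", 1)]
    else:                                 # "Alfred V Aho": surname comes last
        parts = text.split()
        surname, given = parts[-1], " ".join(parts[:-1])
    initial = given[0].upper() if given else ""
    return surname.lower(), initial

def same_entity(ref_a, ref_b):
    """True when two references normalize to the same key."""
    return name_key(ref_a) == name_key(ref_b)

references = ["Alfred V Aho", "Alfred Aho", "AV Aho", "Aho, A. V."]
keys = {name_key(r) for r in references}
```

A link-aware resolver would add evidence such as overlapping co-author lists before merging two references.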

Link Prediction

- Predict whether a link exists between two entities, based on attributes and other observed links
- Web: predicting whether there will be a link between two pages
- Citation: predicting whether a paper will cite another paper, or predicting the venue type of a publication (conference, journal, workshop) based on properties of the paper
- Epidemics: predicting who a patient’s contacts are (epidemiology needs to trace the origin of a disease and the source of infection)
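One common baseline for link prediction scores each unlinked pair by the number of neighbours the two nodes share. The sketch below uses that common-neighbours score on a toy graph; it uses only observed links and no attributes, and the node names are hypothetical.

```python
from itertools import combinations

def predict_links(edges, top_k=3):
    """Rank unlinked pairs by their number of common neighbours."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    scored = []
    for a, b in combinations(sorted(neighbours), 2):
        if b in neighbours[a]:
            continue                      # already linked
        common = len(neighbours[a] & neighbours[b])
        if common:
            scored.append((common, a, b))
    scored.sort(key=lambda item: (-item[0], item[1], item[2]))
    return [(a, b, common) for common, a, b in scored[:top_k]]

edges = [("p1", "p2"), ("p1", "p3"), ("p2", "p4"), ("p3", "p4"), ("p4", "p5")]
predictions = predict_links(edges)
```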

Other Possible Research Directions

- Expert finding, such as suggesting paper reviewers or conference committee members
- Ecological evolution of a research area
  - For example, how one topic is solved by different approaches over a period of time
  - A domain’s topic distribution

GEO-Info (Geographic Information)

- GIS: limited users, limited usage
- Google Earth/Map: open for everyone
  - Google Earth Community
  - Google Earth Blog
  - Ogle Earth ...
- User participation
  - GML
  - Photo sharing
  - User annotation

Some Research Topics

- By now, a lot of information can be combined into Google Earth/Map via KML.
- Because such information can be integrated by geocoding, some models become very interesting, such as:
  - Photo annotation, sharing, and search
  - Live information
  - Planning
  - 3D, flight animation
  - Travel experience, comments
  - Transportation information, survival information
  - Climate change
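To make the KML step concrete, the sketch below builds a minimal KML document with one placemark using Python's standard `xml.etree.ElementTree`. The placemark name, description, and coordinates are made-up illustration values; only the element names and the longitude-before-latitude coordinate order come from the KML format itself.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"

def placemark_kml(name, description, lon, lat):
    """Build a one-placemark KML document and return it as a string."""
    ET.register_namespace("", KML_NS)
    kml = ET.Element("{%s}kml" % KML_NS)
    doc = ET.SubElement(kml, "{%s}Document" % KML_NS)
    mark = ET.SubElement(doc, "{%s}Placemark" % KML_NS)
    ET.SubElement(mark, "{%s}name" % KML_NS).text = name
    ET.SubElement(mark, "{%s}description" % KML_NS).text = description
    point = ET.SubElement(mark, "{%s}Point" % KML_NS)
    # KML orders coordinates as longitude,latitude[,altitude].
    coords = ET.SubElement(point, "{%s}coordinates" % KML_NS)
    coords.text = "%f,%f" % (lon, lat)
    return ET.tostring(kml, encoding="unicode")

# Hypothetical geotagged photo annotation.
xml_text = placemark_kml("Park photo", "tags: park, pond", 121.5, 25.0)
```

A file containing this string can be opened in any KML viewer, which is what lets photos, comments, and other geocoded data from different sources be layered on one map.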

Some Information Bundled with Google Earth/Map (Zhongshan Park, 中山公園)

- Integrated with YouTube (videos and tags)
- Photo sharing (photos and tags)

Some Applications

- Integrate more information on the map
- Personal life information integration
- GeoDDupe: A Novel Interface for Interactive Entity Resolution in Geospatial Data

Photo link with Map

Source: http://www.panoramio.com

Image-based Rendering (IBR)

IBR relies on a set of two-dimensional images of a scene to generate a three-dimensional model and then render some novel views of this scene.

Web 2.0 enables sharing of photographs on a truly massive scale

Microsoft PhotoSynth

From SIFT to PhotoSynth

Conclusion

- Research results can be easily integrated on the Web 2.0 platform
  - Makes restricted-domain research more useful for the public (such as image-based rendering)
  - Software agents
- Benefits human-based computation
- Certain research topics will be easier to tackle, such as personalization in virtual worlds (more data available)
- Data becomes more task oriented (e.g. Wikipedia)
- More versatile data networks are available

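Research assistants wanted (applicants doing alternative military service are welcome)

1. Graduate degree in an information-related field.
2. Able to read research papers in English.
3. Enthusiastic about research in Chinese natural language processing (the Natural Input Method 自然輸入法, question answering systems) or bioinformatics (bioinformatics algorithms, biomedical literature retrieval and analysis).
4. Familiar with at least one of the following programming languages: C/C++/C#/Java, with problem-solving ability.
5. For input-method-related positions, experience with either of the following is a plus: WinCE/Win32 API.
6. Good at communication and teamwork.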

Acknowledgement

I would also like to thank two Ph.D. students of mine who helped organize the slides: 李政緯 and 呂俊宏

Thank You