Interactive Identification of Information Needs and Its Application to Medical Informatics

Rey-Long Liu

劉瑞瓏

Dept. of Information Management

Chung Hua University

中華大學資訊管理學系

Outline

Introduction
    Information Need Identification (INI): What & Why
    Interactive INI
INEED: Incremental Mining for Interactive INI
    The profile miner
    The information need identifier
Empirical evaluation
Application to Medical Informatics
Conclusion

Introduction

Information Need Identification (INI) for
    Information portals
    Online service guidance
    Internet search engines
    People finding

Interactive INI, which needs to consider
    Precision (P)
    Precision Effectiveness (PE)
    Recall (R)
    Recall Effectiveness (RE)

[Figure: a tree-structured hierarchy of categories (nodes labeled C with subscripts); the individual node labels were lost in extraction]

Introduction (Cont.)

Main Challenges
    Each information space has its own content and structure.
    Each information space is intrinsically dynamic.
    Users are often unable (or unwilling) to precisely express their information needs (INs). Their queries are often quite short.
    Users prefer simpler and fewer interactions.

INEED

[Figure: INEED architecture. Components: Information Provider, Information Storage, Interface, and INEED (Profile Miner, IN Identifier, Category Profile). Labeled flows: (0) Content & Taxonomy, (1) Interaction, (2) Request, (3) Information, (4) Information Required.]

The Profile Miner

Incremental profile mining

Given: The document d to be added to category c.
Effect: Updating the profiles of c and related categories.
Procedure:
(1) While c is not the root of the text hierarchy, do
    (1.1) For each distinct word w in d, do
        (1.1.1) If w is not a profile term for c, add <w, $s_{w,c}$> to the profile of c (the strength $s_{w,c}$ is not yet known);
    (1.2) For each pair <w, $s_{w,c}$> in the profile of c, do
        (1.2.1) $s_{w,c} = P(w \mid c) \times \left( B_c / \sum_i P(w \mid c_i) \right)$;
        (1.2.2) For each sibling b of c, update $s_{w,b}$ in the profile of b;
    (1.3) c ← father of c.
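The procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the INEED implementation: the slides do not say how P(w|c) is estimated or exactly what B_c counts, so the sketch assumes P(w|c) is the relative frequency of w among the words added to c (including words of documents added to its descendants), that B_c is the number of categories in c's sibling group including c itself, and that the sum runs over that same group. The class and method names (`ProfileMiner`, `add_document`, `strength`) are hypothetical.

```python
from collections import Counter, defaultdict


class ProfileMiner:
    """Toy incremental profile miner over a category tree (illustrative)."""

    def __init__(self, parent):
        # parent maps each category to its father; the root maps to None.
        self.parent = parent
        self.children = defaultdict(list)
        for cat, par in parent.items():
            if par is not None:
                self.children[par].append(cat)
        self.counts = defaultdict(Counter)   # category -> word -> count
        self.totals = Counter()              # category -> total word count
        self.profile = defaultdict(dict)     # category -> word -> strength

    def p(self, w, c):
        # Assumed estimate of P(w|c): relative frequency of w in c.
        return self.counts[c][w] / self.totals[c] if self.totals[c] else 0.0

    def strength(self, w, c):
        # s_{w,c} = P(w|c) * B_c / sum_i P(w|c_i), over c and its siblings
        # (assumption: B_c is the size of that group).
        group = self.children[self.parent[c]]
        denom = sum(self.p(w, ci) for ci in group)
        return self.p(w, c) * len(group) / denom if denom else 0.0

    def add_document(self, words, c):
        while self.parent[c] is not None:              # step (1)
            self.counts[c].update(words)
            self.totals[c] += len(words)
            for w in set(words):                       # step (1.1)
                self.profile[c].setdefault(w, 0.0)
            # Steps (1.2) and (1.2.2): refresh strengths for c and siblings.
            for b in self.children[self.parent[c]]:
                for w in self.profile[b]:
                    self.profile[b][w] = self.strength(w, b)
            c = self.parent[c]                         # step (1.3)
```

Under these assumptions, a word that occurs in only one category of a sibling group gets a strength equal to the group size, while a word spread evenly over the group gets a strength of 1, which matches the intuition that strength rewards words that discriminate a category from its siblings.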

The Profile Miner (Cont.)

[Figure: Updating the profiles of related categories once a document is added. A new document is added to category f, and the s-values of the profile terms of the related categories are updated.]

The Profile Miner (Cont.)

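An example:

[Figure: an organizational hierarchy in which each unit is shown with its descriptive terms. The parent-child links were lost in extraction; the units are listed in the order they appear.]

Managers: decision making, coordination and integration
Business Division: market planning, product promotion
Administration Division: internal administration, performance management
R&D Division: integration assessment, process definition
Marketing Department: marketing materials, advertising
Customer Department: order management, sales analysis
Quality Assurance Department: quality maintenance, product testing
Manufacturing Department: production, design and manufacturing
Administrative Department: operations management
Information Department: system planning, development and maintenance
Personnel Section: staff recruitment, talent development
Accounting Section: account management, budgeting
Cashier Section: receipts and payments
Computer Integration Section: production information, information utilization
Information Management Section: system administration, office automation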

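The Profile Miner (Cont.)

Example query: "Information related to production management?"

Category profiles (category: profile terms)
    Managers
    Business Division: market, planning, sales
    Marketing Department: marketing, advertising, promotion
    Customer Department: order, management, analysis
    Administration Division: internal affairs, administration, management
    R&D Division: R&D, production, process
    Quality Assurance Department: quality, management, testing
    Information Department: information, system, construction
    Computer Integration Section: production, integration, utilization
    ……

A profile term w of category c should be
    Representative: $P(w \mid c)$ is high
    Discriminative: $P(w \mid c) \times \left( B_c / \sum_i P(w \mid c_i) \right)$ is high

Strength: $s_{w,c} = P(w \mid c) \times \left( B_c / \sum_i P(w \mid c_i) \right)$

Items labeled "context" in the figure
    Construction and maintenance of production management systems
    Production quality maintenance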

The IN Identifier

The IN Identifier (Cont.)

(1) For each category c, HitScore_c ← 0;
(2) For each pair (w, c), where w is a word in the query Q and c is a category,
    (2.1) If $s_{w,c}$ > 1 and Support(w, c) ≥ minSupport,
        (2.1.1) ns ← ($s_{w,c}$ - 1) / (number of siblings of c);
        (2.1.2) HitScore_c ← HitScore_c + ns × TF(w, Q);
(3) S ← the set of all categories;
(4) While the target category has not been identified and interaction is still allowed, do
    (4.1) Let p1 and p2 be two pedigrees (in S) with the highest average HitScore;
    (4.2) Let t1 and t2 be the categories with the highest HitScore in p1 and p2;
    (4.3) Display t1 and t2 (and their basic information) for the user to select;
    (4.4) If either t1 or t2 is exactly the target, return the space under the target;
    (4.5) Else if neither t1 nor t2 is of interest, S ← S - {the categories under t1 and t2};
    (4.6) Else if both t1 and t2 are of interest, g ← ClimbUp(common ancestor of t1 and t2), and return the space under g;
    (4.7) Else
        (4.7.1) Let t be the category that is of interest;
        (4.7.2) If t is a leaf, g ← ClimbUp(father of t), and return the space under g;
        (4.7.3) Else S ← {the categories under t};
(5) Return S;
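Steps (1) and (2) of the procedure, the HitScore computation, can be sketched in Python as below. This is an illustrative sketch only: the slides do not define Support(w, c), so it is taken as a precomputed input, and the comparison with minSupport is read as "at least" because the operator was lost in the extracted text. The function name `hit_scores` and the dictionary layouts are hypothetical. The sketch also assumes that any category whose strength exceeds 1 has at least one sibling, so the division is well defined.

```python
from collections import Counter


def hit_scores(query_words, profile, support, n_siblings, min_support):
    """Score every category against the query (steps (1) and (2)).

    profile[c][w]  -> strength s_{w,c}
    support[c][w]  -> Support(w, c), taken as given (not defined in the slides)
    n_siblings[c]  -> number of siblings of c
    """
    tf = Counter(query_words)                          # TF(w, Q)
    scores = {c: 0.0 for c in profile}                 # step (1)
    for c, terms in profile.items():                   # step (2)
        for w, freq in tf.items():
            s = terms.get(w, 0.0)
            # Step (2.1); ">=" is an assumed reading of the lost operator.
            if s > 1 and support[c].get(w, 0.0) >= min_support:
                ns = (s - 1) / n_siblings[c]           # step (2.1.1)
                scores[c] += ns * freq                 # step (2.1.2)
    return scores
```

Only terms with strength above 1 contribute, so a query word that is no more frequent in a category than in its sibling group on average adds nothing to that category's score.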

The IN Identifier (Cont.)

[Figure: Finding two candidate categories for interaction, shown in five panels (1) to (5), with the pedigrees p1 and p2 and the candidate categories t1 and t2 marked]

The IN Identifier (Cont.)

Function ClimbUp(f), where f is a category to start climbing
(1) If f is the root, return f;
(2) While the target category has not been identified and interaction is still allowed,
    (2.1) f_sibling ← a sibling of f;
    (2.2) f_uncle ← a sibling of the father of f;
    (2.3) Display f_sibling and f_uncle (and their basic information) for the user to select;
    (2.4) If either f_sibling or f_uncle is exactly the target, return the target;
    (2.5) Else if neither f_sibling nor f_uncle is of interest, return f;
    (2.6) Else if both f_sibling and f_uncle are of interest,
        (2.6.1) f ← grandfather of f;
        (2.6.2) If f is the root, return f;
    (2.7) Else if f_sibling is of interest, return father of f;
    (2.8) Else return {f, f_uncle};
(3) Return f;
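The ClimbUp function can be sketched in Python as follows, with the user replaced by a callback. This is a sketch under assumptions, not the original implementation: the slides do not say which sibling or uncle is chosen or what happens when none exists, so the sketch picks the first one available and simply shows fewer candidates when one is missing. The names `climb_up`, `judge`, and `budget` are hypothetical; `judge(c)` is assumed to return "target", "interest", or "none".

```python
def climb_up(f, parent, children, judge, budget):
    """Generalize from category f by climbing the hierarchy.

    parent[c]   -> father of c (None for the root)
    children[c] -> list of the children of c
    judge(c)    -> "target", "interest", or "none" (stands in for the user)
    budget      -> number of interactions still allowed
    Returns a category, or a set of two categories for step (2.8).
    """
    def a_sibling(c):
        # Assumed choice: the first sibling in the child list, if any.
        par = parent.get(c)
        if par is None:
            return None
        others = [x for x in children[par] if x != c]
        return others[0] if others else None

    if parent[f] is None:                              # step (1)
        return f
    while budget > 0:                                  # step (2)
        budget -= 1
        f_sibling = a_sibling(f)                       # step (2.1)
        f_uncle = a_sibling(parent[f])                 # step (2.2)
        shown = [c for c in (f_sibling, f_uncle) if c is not None]
        verdict = {c: judge(c) for c in shown}         # step (2.3)
        for c in shown:                                # step (2.4)
            if verdict[c] == "target":
                return c
        sib_ok = verdict.get(f_sibling) == "interest"
        unc_ok = verdict.get(f_uncle) == "interest"
        if not sib_ok and not unc_ok:                  # step (2.5)
            return f
        if sib_ok and unc_ok:                          # step (2.6)
            f = parent[parent[f]]                      # step (2.6.1)
            if parent[f] is None:                      # step (2.6.2)
                return f
        elif sib_ok:                                   # step (2.7)
            return parent[f]
        else:                                          # step (2.8)
            return {f, f_uncle}
    return f                                           # step (3)
```

Passing the user's judgment in as a callback keeps the climbing logic testable without a user interface; in an interactive system, `judge` would display the category and read the user's selection.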

The IN Identifier (Cont.)

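Generalization by climbing the hierarchy

[Figure: two panels. "Finding two categories for generalization" marks f, f_sibling, and f_uncle in the hierarchy. "Possible results of generalization" marks the outcomes of steps 2.4, 2.5, 2.6, and 2.7 of ClimbUp.]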

Experiment

Experimental Data
    Source: Yahoo! (http://www.yahoo.com)
    Coverage: Computers & Internet, Society and Culture, and Science
    Size: 214 categories; depth: 8
    Training data: 2216 documents
    Test data: 168 queries extracted from another set of site summaries

Experiment (Cont.)

Each system could conduct at most 5 interactions for each query

System: INEED
    Description: As described, with two settings for minSupport: 0.001 and 0.0005.
    Note: INEED-0.001, INEED-0.0005

System: BruteForce
    Description: As in most search engines, the whole information space is considered (no INI is conducted).

System: RandomCN
    Description: The system employs top-down navigation. At each level, two categories are randomly selected for the user to confirm.
    Note: Repeated 10 times

System: IdealCN
    Description: The system employs top-down navigation. At each level, the target is always in the candidates identified by the system.

System: NB
    Description: The output category is determined by the conditional probabilities of the query terms occurring in the categories, with two feature set sizes: 5000 and 8000.
    Note: NB-5000, NB-8000

Experiment (Cont.)

Precision
    BruteForce was poor
    Interaction is good for precision
    INEED improved 14%~20% w.r.t. NB

[Figure: Precision (0 to 1) versus the maximum number of interactions allowed (0 to 5) for INEED-0.001, INEED-0.0005, BruteForce, RandomCN, IdealCN, NB-5000, and NB-8000]

Recall
    INEED was good in both precision and recall
    BruteForce and CN achieved 100% recall
    INEED achieved 100% recall using only 2 interactions

[Figure: Recall (0.92 to 1) versus the maximum number of interactions allowed (0 to 5) for INEED-0.001, INEED-0.0005, BruteForce, RandomCN, IdealCN, NB-5000, and NB-8000]

Experiment (Cont.)

Precision-effectiveness
    BruteForce was excluded
    INEED improved more (19%~32%) w.r.t. NB
    Interactions by INEED were more effective

[Figure: Precision-effectiveness (0 to 0.8) versus the maximum number of interactions allowed (1 to 5) for INEED-0.001, INEED-0.0005, RandomCN, IdealCN, NB-5000, and NB-8000]

Recall-effectiveness
    INEED performed best
    INEED improved 2%~20% w.r.t. NB

[Figure: Recall-effectiveness (0 to 1) versus the maximum number of interactions allowed (1 to 5) for INEED-0.001, INEED-0.0005, RandomCN, IdealCN, NB-5000, and NB-8000]

Experiment (Cont.)

Precision vs. Recall
    BruteForce and CN always achieved 100% recall
    INEED performed best (its curve lay in the upper right corner)

[Figure: Recall (0.92 to 1) versus precision (0 to 1) for INEED-0.001, INEED-0.0005, BruteForce, RandomCN, IdealCN, NB-5000, and NB-8000]

[Figure: bar chart of precision and recall for INEED-0.001, INEED-0.0005, NB-5000, and NB-8000. The values below are paired with systems in the order they appear in the extracted chart.]

    System          Precision   Recall
    INEED-0.001     0.448       0.64
    INEED-0.0005    0.418       0.646
    NB-5000         0.469       0.465
    NB-8000         0.437       0.468

When no interaction is allowed
    INEED improved 38% recall w.r.t. NB

Precision of INEED improved 62% in the first interaction (NB only improved 29%)
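As a rough consistency check on the 38% recall figure, the relative gain can be computed from the bar-chart values. The check assumes that the recall bars read 0.64 (INEED-0.001), 0.646 (INEED-0.0005), 0.465 (NB-5000), and 0.468 (NB-8000); this pairing of values with systems is inferred from the order of the numbers in the chart and is not stated explicitly in the slides.

```latex
\[
\frac{0.640 - 0.465}{0.465} \approx 0.376,
\qquad
\frac{0.646 - 0.468}{0.468} \approx 0.380
\]
```

Both ratios round to about 38%, which agrees with the stated improvement under this reading of the chart.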

Experiment (Cont.)

An example:

Test query: Virtual world featuring 3-D ray-traced graphics. Wander around, meet other netizens, and try to solve some puzzles. Features animation and sound clips,

Correct target identified by INEED: Computers and Internet → Multimedia → Virtual Reality → Exhibits

Erroneous category identified by NB: Computers and Internet → Software → Operating Systems → Windows → Windows 95

Application to Medical Informatics

Medical knowledge management
    People finding
    Knowledge finding

Medical information portal
    Online navigation guidance
    Cost-effective retrieval of information

Application to Medical Informatics (Cont.)

Medical e-community
    Community establishment & retention
    Information recommendation

Medical decision support
    Assimilation of new cases
    Retrieval & analysis of similar cases

Conclusion

Interactive INI as an essential component for the sharing, navigation, and recommendation of medical information and knowledge

INEED as an effective tool for interactive INI
    Exactly identify the information space that may satisfy the user's information needs
    Effectively interact with the user
    Intelligently reduce the user's load in query formation and result cognition

Thanks