Information Retrieval Project

Page 1: Information Retrieval Project

IR Project, Team 9

1

Information Retrieval Project

Team 9, first-year information graduate students:
– 90522035 黃國瑜
– 90522045 何聰鑫
– 90522077 丁智凱

Page 2: Information Retrieval Project

System architecture

– CPU speed: Pentium III, 1 GHz
– RAM: 256 MB
– OS: Windows 2000
– Programming language: PHP
– Database: MySQL

Page 3: Information Retrieval Project

Indexing method (1/3)

Indexing
– Converting all letters to lower case
– Eliminating stopwords, using a hash table of 317 words
– Removing punctuation marks
– Removing words shorter than 3 letters
– Removing <tag> markup
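The preprocessing steps listed above can be sketched as one function. This is a minimal Python sketch, not the team's PHP code: the stopword set is a short hypothetical stand-in for their 317-word list, and tags are stripped first (a choice made here) so that `<` and `>` are not treated as ordinary punctuation and tag names are not indexed as words. A Python `set` is itself a hash table, so each stopword check is an average O(1) lookup.

```python
import re

# Hypothetical stopword list; the team's 317-word list is not in the slides.
STOPWORDS = {"the", "and", "for", "are", "was", "with", "this", "that", "from"}

TAG_RE = re.compile(r"<[^>]*>")    # matches <tag> markup such as <P> or </b>
PUNCT_RE = re.compile(r"[^\w\s]")  # matches any non-word, non-space character

def index_terms(text):
    """Return the index terms of `text` after the preprocessing steps."""
    text = TAG_RE.sub(" ", text)    # remove <tag> markup
    text = text.lower()             # use lower-case letters only
    text = PUNCT_RE.sub(" ", text)  # remove punctuation marks
    terms = []
    for word in text.split():
        if len(word) < 3:           # drop words shorter than 3 letters
            continue
        if word in STOPWORDS:       # drop stopwords via hash-set lookup
            continue
        terms.append(word)
    return terms

print(index_terms("<P>The Indexing of <b>IR</b> documents, for MySQL!</P>"))
# ['indexing', 'documents', 'mysql']
```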

Page 4: Information Retrieval Project

Indexing method (2/3)

Database tables

– IndexMap (Index, TermID, DocID, Line, Pattern)

– DocMap (DocID, FileName, DocTitle)

– TermMap (TermID, Term)
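The three tables can be sketched as a schema plus a term lookup that joins them. This sketch uses Python's built-in `sqlite3` with an in-memory database as a stand-in for MySQL. The column types, the meaning of `Line` (line number where the term occurs) and `Pattern` (occurrences on that line), and all sample rows are assumptions for illustration only. `Index` is quoted because it is an SQL keyword; in MySQL it would need backticks instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the team's MySQL database
conn.executescript("""
CREATE TABLE TermMap (
    TermID INTEGER PRIMARY KEY,
    Term   TEXT UNIQUE NOT NULL
);
CREATE TABLE DocMap (
    DocID    INTEGER PRIMARY KEY,
    FileName TEXT NOT NULL,
    DocTitle TEXT
);
CREATE TABLE IndexMap (
    "Index" INTEGER PRIMARY KEY,              -- quoted: INDEX is a keyword
    TermID  INTEGER REFERENCES TermMap(TermID),
    DocID   INTEGER REFERENCES DocMap(DocID),
    Line    INTEGER,                          -- assumed: line of occurrence
    Pattern INTEGER                           -- assumed: hits on that line
);
""")

# Hypothetical sample rows.
conn.executemany("INSERT INTO TermMap VALUES (?, ?)",
                 [(1, "retrieval"), (2, "index")])
conn.executemany("INSERT INTO DocMap VALUES (?, ?, ?)",
                 [(1, "example1.txt", "Example A"), (2, "example2.txt", "Example B")])
conn.executemany("INSERT INTO IndexMap (TermID, DocID, Line, Pattern) VALUES (?, ?, ?, ?)",
                 [(1, 1, 3, 2), (2, 1, 7, 1), (1, 2, 1, 1)])

def lookup(term):
    """Return (FileName, Line, Pattern) rows for every posting of `term`."""
    return conn.execute("""
        SELECT d.FileName, i.Line, i.Pattern
        FROM IndexMap AS i
        JOIN TermMap AS t ON t.TermID = i.TermID
        JOIN DocMap  AS d ON d.DocID  = i.DocID
        WHERE t.Term = ?
        ORDER BY d.FileName, i.Line""", (term,)).fetchall()

print(lookup("retrieval"))  # [('example1.txt', 3, 2), ('example2.txt', 1, 1)]
```

Storing each term once in `TermMap` keeps the large `IndexMap` table to integer columns, which is the usual motivation for this kind of split.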

Page 5: Information Retrieval Project

Indexing method (3/3)

Indexing speed
– About 130 sec/MB
– Total: 125 sec/MB × 490 MB ≈ 17 hr
– E.g.:
  File name: FB496255
  File size: 997438 bytes
  Total terms: 8523
  Start: 1004540338.9145 sec
  End: 1004540464.1279 sec
  Total: 125.2134180069 sec
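The timing figures above can be checked with a few lines of arithmetic. This is a small sketch; the helper names are hypothetical, and it assumes the start and end values are timestamps in seconds and that 1 MB = 1,000,000 bytes.

```python
def elapsed_seconds(start, end):
    """Indexing time between two timestamps given in seconds."""
    return end - start

def sec_per_mb(seconds, size_bytes, bytes_per_mb=1_000_000):
    """Indexing cost per megabyte for a file of `size_bytes` bytes."""
    return seconds / (size_bytes / bytes_per_mb)

def projected_hours(rate_sec_per_mb, corpus_mb):
    """Total indexing time in hours for a corpus of `corpus_mb` megabytes."""
    return rate_sec_per_mb * corpus_mb / 3600

t = elapsed_seconds(1004540338.9145, 1004540464.1279)
print(f"elapsed: {t:.1f} s")                             # elapsed: 125.2 s
print(f"rate: {sec_per_mb(t, 997438):.1f} s/MB")         # rate: 125.5 s/MB
print(f"projected: {projected_hours(125, 490):.1f} hr")  # projected: 17.0 hr
```

The per-MB rate depends on the megabyte definition: with 1,048,576 bytes per MB (`bytes_per_mb=1_048_576`), the same run works out to roughly 132 sec/MB.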

Page 6: Information Retrieval Project

Query (1/3)

Interface
– Query
– Insert new data
– View existing data
– Help
– Mail

Page 7: Information Retrieval Project

Query (2/3)

Query
– Features: multiple-keyword query, title query
– Speed (depends on network speed and the number of results):
  Match strings: 6448, search time: 2.3293360471725 sec
  Match strings: 239, search time: 0.72075593471527 sec
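The two query features can be sketched over an in-memory inverted index. This is a minimal sketch, not the team's MySQL implementation: it assumes a multiple-keyword query uses AND semantics (every keyword must occur in the document) and that a title query is a case-insensitive substring match. The function names and the sample documents are hypothetical.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to the set of document IDs that contain it.

    `docs` maps doc_id -> list of index terms (already preprocessed).
    """
    index = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            index[term].add(doc_id)
    return index

def keyword_query(index, keywords):
    """Multiple-keyword query, assuming AND semantics (set intersection)."""
    postings = [index.get(k.lower(), set()) for k in keywords]
    if not postings:
        return set()
    return set.intersection(*postings)

def title_query(titles, phrase):
    """Title query: case-insensitive substring match on document titles."""
    phrase = phrase.lower()
    return {doc_id for doc_id, title in titles.items() if phrase in title.lower()}

# Hypothetical sample data.
docs = {1: ["information", "retrieval", "index"],
        2: ["retrieval", "query"],
        3: ["index", "query"]}
titles = {1: "Information Retrieval Basics",
          2: "Query Processing",
          3: "Building an Index"}

idx = build_index(docs)
print(sorted(keyword_query(idx, ["retrieval", "index"])))  # [1]
print(sorted(title_query(titles, "query")))                # [2]
```

Intersecting posting sets keeps the cost proportional to the postings of the queried terms rather than to the whole collection; an OR query would use `set.union` instead.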

Page 8: Information Retrieval Project

Query (3/3)

Output
– Performance: match strings, search time
– Query result: file name, document title, matching lines (5 lines shown), number of pattern matches (highlighted)
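One result entry of this kind can be assembled as follows. This is a hedged sketch: the function name, the `**` highlight marker, and the sample lines are assumptions (a web page would more likely wrap matches in an HTML tag). It keeps the first five matching lines and counts every keyword match in the document.

```python
import re

def format_result(file_name, title, lines, keywords, max_lines=5, mark="**"):
    """Build one query-result entry: file name, title, up to `max_lines`
    matching lines with each match wrapped in `mark`, and the match count."""
    if not keywords:
        return {"file": file_name, "title": title, "lines": [], "matches": 0}
    # One case-insensitive pattern matching any keyword literally.
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    shown, matches = [], 0
    for line in lines:
        hits = len(pattern.findall(line))
        if hits == 0:
            continue
        matches += hits                   # count all matches in the document
        if len(shown) < max_lines:        # but display only the first few lines
            shown.append(pattern.sub(lambda m: f"{mark}{m.group(0)}{mark}", line))
    return {"file": file_name, "title": title, "lines": shown, "matches": matches}

# Hypothetical sample document.
sample = ["Information retrieval systems", "use an inverted index.",
          "Retrieval speed matters.", "Nothing here."]
result = format_result("example.txt", "IR notes", sample, ["retrieval"])
print(result["lines"])    # ['Information **retrieval** systems', '**Retrieval** speed matters.']
print(result["matches"])  # 2
```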