
Medical Information Retrieval (Information Retrieval), 2003. 9. 24., Choi Jin-wook. What is information retrieval? Information retrieval means finding the information you want. Data retrieval vs. information retrieval


  • Medical Information Retrieval (Information Retrieval), 2003. 9. 24.

  • What is information retrieval? Information retrieval: finding the information you want. Data retrieval vs. information retrieval

  • What is IR?

    Information retrieval is the science that deals with knowledge representation and with the storage, organization, and access of information items.

  • Need for Information Retrieval: index, search

  • NLM and Medline: 10 million articles; 3,500 journals since 1966; PubMed, Internet Grateful Med; http://www.nlm.nih.gov

  • IR: Internet; search engine; vocabulary system; information modeling; filtering and classification; natural language processing

  • Internet: TCP/IP network; public (not free, but open to everyone); carrier of electronic mail; convenient for getting free software; terabytes of information; dynamic rerouting

  • Telephone network

  • Another network

  • ARPAnet TCP/IP

    Network in early stage

  • Protocol: rules of behavior

    TCP/IP: 2 widely used network protocols; computer network 100 TCP/IP

  • Internet Address: Network, Host; 32 bit; 8 ~ 24 bit

  • Internet Classes
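The 32-bit address and its network/host split can be sketched in code. The example below is a minimal illustration, assuming the classic classful scheme (class A, B, C with 8, 16, and 24 network bits, chosen by the first octet); the function name `classify_ipv4` and the returned field names are hypothetical, and the class boundaries come from the traditional convention rather than from the slide itself.

```python
def classify_ipv4(address: str) -> dict:
    """Split a dotted-quad IPv4 address into network and host parts (classful)."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError(f"not a dotted-quad IPv4 address: {address!r}")

    # Pack the four octets into one 32-bit integer.
    value = (octets[0] << 24) | (octets[1] << 16) | (octets[2] << 8) | octets[3]

    # Assumption: classic classful ranges, selected by the first octet.
    first = octets[0]
    if first < 128:
        address_class, network_bits = "A", 8
    elif first < 192:
        address_class, network_bits = "B", 16
    elif first < 224:
        address_class, network_bits = "C", 24
    else:
        raise ValueError("class D/E addresses have no network/host split")

    host_bits = 32 - network_bits
    return {
        "class": address_class,
        "network_bits": network_bits,
        "host_bits": host_bits,
        "network": value >> host_bits,            # upper bits
        "host": value & ((1 << host_bits) - 1),   # lower bits
    }


info = classify_ipv4("203.112.118.101")
print(info["class"], info["network_bits"], info["host"])
```

Under these assumptions, an address whose first octet is 203 falls in class C, so its last octet alone identifies the host.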

  • E-mail [email protected]

  • URL

  • Hypertext

  • DNS: the Domain Name System (DNS) maps domain names to IP addresses

    www.kyobobook.co.kr → 203.112.118.101; www.cnn.com → 198.137.240.92

  • Search engine, news group, mailing list

  • Internet Search Engine: www.yahoo.com, www.yahoo.co.kr, www.altavista.com, www.altavista.co.kr, www.excite.com, www.lycos.com, www.lycos.co.kr

    www.dreamwiz.com, www.naver.com

  • Internet Search Engine


  • AND search: Search for Monet AND Renoir; Search for +Monet +Renoir; Search for Monet Renoir with the "All the words" option

  • OR search: Search for UPS U.P.S.; Search for UPS OR U.P.S.; Search for UPS U.P.S with the "Any of the Words" option

    foreign policy vs. "foreign policy"

  • NOT search: Search for bugs life -ants; Search for bugs life NOT ants; Search for bugs life AND NOT ants
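The AND, OR, and NOT searches above correspond to set intersection, union, and difference over the lists of documents containing each word. The sketch below is only an illustration: the `postings` table and its document numbers are made up, and the helper names are hypothetical, not taken from any search engine.

```python
# Hypothetical postings: word -> set of document ids containing it.
postings = {
    "monet": {1, 2},
    "renoir": {1, 3},
    "bugs": {4, 5},
    "life": {4, 5},
    "ants": {4},
}


def search_and(postings, *terms):
    """Documents containing every term (Monet AND Renoir)."""
    sets = [postings.get(term, set()) for term in terms]
    return set.intersection(*sets) if sets else set()


def search_or(postings, *terms):
    """Documents containing at least one term (UPS OR U.P.S.)."""
    return set().union(*(postings.get(term, set()) for term in terms))


def search_not(postings, include, exclude):
    """Documents with all `include` terms and none of the `exclude` terms."""
    return search_and(postings, *include) - search_or(postings, *exclude)


print(search_and(postings, "monet", "renoir"))
print(search_or(postings, "monet", "renoir"))
print(search_not(postings, ["bugs", "life"], ["ants"]))
```

Sets keep the three operators symmetric and short; a production engine would more likely merge sorted posting lists.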

  • Near search: Korea NEAR climate in AltaVista (advanced search) matches two terms within 10 words; Korea NEAR climate in Lycos (advanced search) matches two terms within 25 words
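A NEAR operator can be sketched as a check on word positions. The function below is a minimal illustration, assuming "within N words" means the two positions differ by at most N; the exact counting rule of each engine is not given on the slide, and the name `near` and its `window` parameter are hypothetical.

```python
def near(text: str, term_a: str, term_b: str, window: int = 10) -> bool:
    """True if term_a and term_b occur at most `window` word positions apart."""
    words = text.lower().split()
    positions_a = [i for i, word in enumerate(words) if word == term_a.lower()]
    positions_b = [i for i, word in enumerate(words) if word == term_b.lower()]
    return any(abs(i - j) <= window for i in positions_a for j in positions_b)


close_text = "korea has a temperate climate"
far_text = "korea " + "filler " * 12 + "climate"  # 13 positions apart

print(near(close_text, "Korea", "climate", window=10))
print(near(far_text, "Korea", "climate", window=10))
print(near(far_text, "Korea", "climate", window=25))
```

The same pair of terms can match under the wider 25-word window and fail under the 10-word one, which is the practical difference between the two settings listed above.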

  • USENET

    USENET system; news group; USENET news server in SNU; news server in Melbourne

  • Newsgroup Search Engine

  • Mailing list: automatic mailing programs (LISTSERV, Majordomo); computer & privacy, travel, weather

  • IR Modeling

  • IR steps: text processing; indexing (inverted file, signature file); organization in DB; query processing; evaluation

  • Information-Retrieval Process (diagram labels): Content, Database, Information Need, Query, Result, Indexing, Query Formulation, Retrieval, Evaluation, Refinement

  • Indexing Process (diagram labels): document; accent, spacing; stopword; noun groups; stemming; automatic or manual indexing; structure recognition; full text; index terms
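Several stages of the indexing process (accent and spacing normalization, stopword removal, stemming) can be sketched as a small pipeline. This is an illustration under stated assumptions: the stopword list is a tiny hypothetical sample, `naive_stem` is a crude suffix stripper and not a real stemming algorithm such as Porter's, and the noun-group stage is omitted.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"the", "of", "a", "an", "and", "in", "is", "to"}


def strip_accents(text: str) -> str:
    """Decompose characters and drop the combining accent marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def naive_stem(word: str) -> str:
    """Crude suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(document: str) -> list:
    """Document text -> list of index terms."""
    text = strip_accents(document).lower()        # accents, case
    tokens = re.findall(r"[a-z0-9]+", text)       # spacing, punctuation
    return [naive_stem(t) for t in tokens if t not in STOPWORDS]


print(index_terms("The café is indexing documents"))
```

Skipping all of these stages and keeping every token corresponds to the "full text" path in the diagram.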

  • Classification of IR

    User Task: Retrieval (Ad hoc, Filtering); Browsing
    Classic Models: boolean, vector, probabilistic
    Structured Models: non-overlapping lists, proximal nodes
    Browsing: Flat, Structure Guided, Hypertext
    Set Theoretic: Fuzzy, Extended Boolean
    Algebraic: Generalized Vector, Latent Semantic Indexing, Neural Network
    Probabilistic: Inference Network, Belief Network

  • Boolean Model: a query can be written in disjunctive normal form
    q = ka ∧ (kb ∨ ¬kc)
    qdnf = (1,1,1) ∨ (1,1,0) ∨ (1,0,0)
    (Venn diagram of Ka, Kb, Kc)
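The disjunctive normal form can be derived mechanically by enumerating every 0/1 assignment of (ka, kb, kc) and keeping those that satisfy the query. The sketch below assumes the query reads ka AND (kb OR NOT kc), which is the reading implied by the three listed components (1,1,1), (1,1,0), (1,0,0); the function names are hypothetical.

```python
from itertools import product


def query(ka: int, kb: int, kc: int) -> bool:
    """Assumed query: ka AND (kb OR NOT kc)."""
    return bool(ka and (kb or not kc))


def dnf_components():
    """All (ka, kb, kc) assignments that satisfy the query."""
    return [bits for bits in product((1, 0), repeat=3) if query(*bits)]


def matches(doc_vector) -> bool:
    """A document matches if its binary term vector equals a DNF component."""
    return tuple(doc_vector) in set(dnf_components())


print(dnf_components())
print(matches((1, 0, 0)), matches((1, 0, 1)))
```

In this model a document either matches or it does not; there is no partial match and no ranking.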

  • Vector Model and Weight function
    K = {, 2000cc, , , , kt}
    D1 = {20, 20, 11, 5, , 5}
    D2 = {20, 18, 12, 4, , 5}
    D30 = {0, 20, 12, 3, , 9}
    The term weights are assumed to be mutually independent!
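In the vector model, documents and the query are weight vectors over the same index terms, and documents are ranked by how close they are to the query. The sketch below uses cosine similarity, one common choice that the slide itself does not name; the vectors are hypothetical and are not the slide's D1, D2, D30, whose entries are incomplete.

```python
import math


def cosine(u, v) -> float:
    """Cosine of the angle between two weight vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query_vector, documents):
    """Document names ordered from most to least similar to the query."""
    return sorted(documents,
                  key=lambda name: cosine(query_vector, documents[name]),
                  reverse=True)


# Hypothetical weights over three index terms.
documents = {"D1": [2, 0, 1], "D2": [0, 3, 0], "D3": [1, 1, 1]}
query_vector = [1, 0, 1]
print(rank(query_vector, documents))
```

Treating each term as its own axis is where the independence assumption enters: the dot product has no cross terms between different index terms.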

  • Boolean vs. Vector model

    Terms: Petroleum, Mexico, Oil, Texas, Refinery, Ship

    Boolean: (1, 1, 1, 0, 1, 0)

    Vector: (2.8, 1.6, 3.5, 3, 3.1, 1)

  • Retrieval Issues: indexing (inverted file); ranking (relevance ranking, chronology ranking); display; (aspirin, prevention), (prevention, ), (aspirin, attack, heat)

  • Indexing with Inverted File

    Radiology Results: 27750177; 20022777035; RC102: Brain CT (Pre contrast); Infarction Of Posterior Cerebral Artery Territory; 2002-08-02; BRAIN MRI + MRA
    [Finding] PVWM UBO underlying SVD. Left thalamus occipital lobe patchy high signal intensity FLAIR image. T1WI iso signal intensity portion

    Inverted File (significant words only):

    WORD        DPW
    abort       1,5
    aberrant    3
    brain       7,3
    portion     4,8
    topology    5,8
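An inverted file of this kind can be built by scanning each report, skipping insignificant words, and recording where every remaining word occurs. The sketch below is a minimal illustration: the three short reports are made up (they only borrow vocabulary from the record above), the stopword list is hypothetical, and the stored layout (word to document id to word positions) is an assumption, since the slide's column layout is not recoverable.

```python
from collections import defaultdict

# Hypothetical list of insignificant words to leave out of the index.
STOPWORDS = {"of", "the", "in", "a", "and"}


def build_inverted_file(reports: dict) -> dict:
    """Map each significant word to {document id: [word positions]}."""
    inverted = defaultdict(dict)
    for doc_id, text in reports.items():
        for position, raw in enumerate(text.lower().split()):
            word = raw.strip(".,;:()[]")
            if not word or word in STOPWORDS:
                continue
            inverted[word].setdefault(doc_id, []).append(position)
    return dict(inverted)


# Made-up reports for illustration only.
reports = {
    1: "Brain MRI shows patchy high signal intensity",
    2: "Infarction of the posterior cerebral artery territory",
    3: "High signal intensity portion in the left thalamus",
}

inverted = build_inverted_file(reports)
print(inverted["high"])
print(sorted(inverted["signal"]))
```

Looking up a query word is then a single dictionary access instead of a scan over every report.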

  • Evaluation of IR: Recall, Precision
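Recall and precision can both be computed from the set of retrieved documents and the set of relevant ones: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved. The function name and the document ids below are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents actually retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical result: 4 documents retrieved, 3 relevant in the collection.
precision, recall = precision_recall(retrieved={1, 2, 3, 4}, relevant={2, 4, 5})
print(precision, recall)
```

Retrieving more documents tends to raise recall and lower precision, so the two measures are usually reported together.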

  • Word based indexing
    Context: meaning is affected by the meaning of other words (high, blood, pressure: "low pressure at high altitude increases red blood cells")
    Polysemy: lead vs. lead
    Synonymy: hypertension vs. high blood pressure
    Granularity: antibiotics, penicillin
    Focus of content
    Key word vs. plain word

  • The End