Veljko Milutinović, Laslo Kraus, Jelena Mirković, Nela Tomča, Saša Slijepčević, Suzana Cvetićanin,

Ljiljana Nešić, Mladen Mrkić, Vladan Obradović, Igor Čakulev

Intelligent Internet Search

Department of Computer Engineering
School of Electrical Engineering

University of Belgrade
POB 35-54, 11120 Belgrade

Serbia, Yugoslavia

[email protected]

Problem statement

• Number of Internet presentations and Web servers grows exponentially
• Variety of presentations grows, too

Search and retrieval of documents gets harder

• Existing tools do not give satisfactory results

Existing solutions

• Keyword search and document indexing - e.g. Altavista
  + search is exhaustive
  - too many keywords result in too few documents found, and vice versa
  - it requires a large database of indexed documents

• Following links - e.g. Spiders
  + fast, no indexing and no database
  - it searches only a limited number of documents
  + possibility of changing the input parameters during the search
  - poor evaluation function
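The keyword-search trade-off can be illustrated with a toy inverted index. This is a minimal sketch in modern Java (8+), assuming conjunctive (AND) queries, which is one common semantics and not necessarily that of any real engine; the class name and data are hypothetical. Every extra keyword intersects the result with one more posting set, so adding keywords can only shrink the result.

```java
import java.util.*;

/** Toy keyword index: maps each keyword to the set of documents that contain it. */
class KeywordIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Indexes every word of the text under the given document id. */
    void add(String docId, String text) {
        for (String word : text.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                postings.computeIfAbsent(word, k -> new HashSet<>()).add(docId);
            }
        }
    }

    /** Conjunctive (AND) query: a document matches only if it contains every keyword. */
    Set<String> query(String... keywords) {
        Set<String> result = null;
        for (String k : keywords) {
            Set<String> docs = postings.getOrDefault(k.toLowerCase(), Collections.emptySet());
            if (result == null) {
                result = new HashSet<>(docs);
            } else {
                result.retainAll(docs);   // each extra keyword can only shrink the result
            }
        }
        return result == null ? new HashSet<>() : result;
    }
}
```

With documents "Genetic algorithms for Internet search", "Internet search engines" and "Genetic algorithms", the query "internet" matches two documents, while "internet", "genetic", "engines" matches none.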

Our solution

• Design of intelligent agents for Internet search

• Two basic approaches:
  1. Simulated annealing - inherently serial
  2. Genetic algorithms - inherently parallel

• Character of the search:
  1. Local search - following only the links of the input documents - Best First Search Algorithm
  2. Global search - following the links of the input documents and occasionally mutating them - Genetic Algorithm

• Spider implementation:
  1. Static
  2. Mobile

Our research

• Essence: Creating a set of packages for experimenting in the domain of intelligent Internet search

• All written in Sun Java - JDK 1.1

• Lego approach - stand-alone applications that are easily interfaced with one another

• Code and executable version available at http://galeb.etf.bg.ac.yu/~ebi

• Further research in the mobile domain

Best First Search Algorithm

• Select the initial WWW presentation or a set thereof
• Extract all URLs and fetch the corresponding WWW presentations; they are inserted into the Current Configuration Set (CC Set)
• Measure the fitness value for each document in the CC Set
• Select the best one for the Output Set and add documents linked to it into the CC Set

[Figure: Input Set, CC Set, Output Set]
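The steps above can be sketched in modern Java (8+; the original packages targeted JDK 1.1, which has neither generics nor PriorityQueue). This is a minimal sketch under stated assumptions, not the Agent package's code: an in-memory map from each URL to its outgoing links stands in for fetching pages, the caller supplies the fitness function, and the class and method names are hypothetical. The CC Set is kept as a priority queue so that the fittest document always leaves it first.

```java
import java.util.*;
import java.util.function.ToDoubleFunction;

/** Best-first search over an in-memory link graph (a stand-in for fetching real pages). */
class BestFirstSearch {
    static List<String> search(Map<String, List<String>> links,
                               Collection<String> inputSet,
                               ToDoubleFunction<String> fitness,
                               int outputSize) {
        Set<String> seen = new HashSet<>(inputSet);
        // CC Set: candidates ordered so that the highest fitness comes out first
        PriorityQueue<String> cc = new PriorityQueue<>(
            Comparator.comparingDouble(fitness).reversed());
        // Documents linked from the Input Set form the initial CC Set
        for (String url : inputSet) {
            for (String linked : links.getOrDefault(url, Collections.emptyList())) {
                if (seen.add(linked)) cc.add(linked);
            }
        }
        List<String> output = new ArrayList<>();
        while (!cc.isEmpty() && output.size() < outputSize) {
            String best = cc.poll();              // best document in the CC Set
            output.add(best);                     // move it to the Output Set
            for (String linked : links.getOrDefault(best, Collections.emptyList())) {
                if (seen.add(linked)) cc.add(linked);   // add documents linked to it
            }
        }
        return output;
    }
}
```

Because only links of already selected documents are expanded, the search stays local to the neighbourhood of the input documents, which matches the "local search" label the deck gives this algorithm.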

Basic Genetic Algorithm

1. Initialize the population: randomly pick a set of possible solutions

2. Select individuals for the mating pool: measure the fitness value and pick the best ones

3. Perform crossover: create new individuals using genetic material from parents in the mating pool

4. Perform mutation: randomly create new individuals, completely unrelated to those in the mating pool

5. Insert the offspring into the population

6. Is the stopping criterion satisfied? (the desired number of solutions is found, or the specified search time has elapsed)

No? Go to step 2. Yes? The end!
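The six steps can be traced on a toy problem. This sketch in modern Java assumes bit-string individuals with "number of 1 bits" as the fitness (the classic OneMax exercise, not an Internet-search fitness), one-point crossover, and a generation limit in place of the time limit; all names are hypothetical. Mutation follows the slide's description and injects a brand-new random individual.

```java
import java.util.*;

/** Toy genetic algorithm following the six steps; individuals are bit strings. */
class BasicGA {
    /** Fitness = number of 1 bits (the "OneMax" toy problem). */
    static int fitness(boolean[] x) {
        int ones = 0;
        for (boolean bit : x) if (bit) ones++;
        return ones;
    }

    static boolean[] randomIndividual(int len, Random rnd) {
        boolean[] x = new boolean[len];
        for (int i = 0; i < len; i++) x[i] = rnd.nextBoolean();
        return x;
    }

    /** Runs the GA (popSize >= 2, len >= 1) and returns the best fitness after each generation. */
    static List<Integer> run(int len, int popSize, int maxGenerations, long seed) {
        Random rnd = new Random(seed);
        Comparator<boolean[]> byFitnessDesc =
            Comparator.comparingInt((boolean[] x) -> fitness(x)).reversed();
        // Step 1: initialize the population randomly
        List<boolean[]> population = new ArrayList<>();
        for (int i = 0; i < popSize; i++) population.add(randomIndividual(len, rnd));
        List<Integer> history = new ArrayList<>();
        for (int gen = 0; gen < maxGenerations; gen++) {
            // Step 2: the mating pool is the fitter half of the population
            population.sort(byFitnessDesc);
            List<boolean[]> pool = new ArrayList<>(population.subList(0, Math.max(2, popSize / 2)));
            List<boolean[]> next = new ArrayList<>(pool);
            // Step 3: one-point crossover between two parents from the pool
            for (int i = 0; i < popSize / 2; i++) {
                boolean[] a = pool.get(rnd.nextInt(pool.size()));
                boolean[] b = pool.get(rnd.nextInt(pool.size()));
                int cut = rnd.nextInt(len);
                boolean[] child = new boolean[len];
                for (int j = 0; j < len; j++) child[j] = j < cut ? a[j] : b[j];
                next.add(child);
            }
            // Step 4: mutation as on the slide, a new individual unrelated to the pool
            next.add(randomIndividual(len, rnd));
            // Step 5: insert offspring, keeping the popSize fittest individuals
            next.sort(byFitnessDesc);
            population = new ArrayList<>(next.subList(0, popSize));
            int best = fitness(population.get(0));
            history.add(best);
            // Step 6: stop once the optimum (all ones) is found or generations run out
            if (best == len) break;
        }
        return history;
    }
}
```

Since the mating pool is carried into the next generation, the best fitness in the returned history never decreases.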

Genetic Algorithm applied to Internet Search

• Select the initial WWW presentation or a set thereof
• Extract all URLs and fetch the corresponding WWW presentations; they are inserted into the Current Configuration Set (CC Set)
• Measure the fitness value for each document in the CC Set
• Select the best one for the Output Set and add documents linked to it into the CC Set
• Mutate - e.g. by inserting documents from the database of URLs

[Figure: Input Set, CC Set, Output Set, Database]
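A minimal sketch of this loop in modern Java, assuming the same stand-ins as a plain best-first search (an in-memory link map instead of fetched pages, a caller-supplied fitness function) plus a list of database URLs and a mutation probability; the names and the probability parameter are hypothetical, not the Tropical package's. The slide names no separate crossover step, so the sketch only follows links and mutates.

```java
import java.util.*;
import java.util.function.ToDoubleFunction;

/** Best-first expansion of the CC Set plus occasional mutation from a URL database. */
class GeneticSpider {
    static List<String> search(Map<String, List<String>> links, Collection<String> inputSet,
                               List<String> database, ToDoubleFunction<String> fitness,
                               int outputSize, double mutationRate, Random rnd) {
        Set<String> seen = new HashSet<>(inputSet);
        List<String> cc = new ArrayList<>();                       // Current Configuration Set
        for (String url : inputSet) {
            for (String linked : links.getOrDefault(url, Collections.emptyList())) {
                if (seen.add(linked)) cc.add(linked);
            }
        }
        List<String> output = new ArrayList<>();
        while (!cc.isEmpty() && output.size() < outputSize) {
            // Measure fitness and move the best CC document to the Output Set
            String best = Collections.max(cc, Comparator.comparingDouble(fitness));
            cc.remove(best);
            output.add(best);
            // Add the documents linked to it into the CC Set
            for (String linked : links.getOrDefault(best, Collections.emptyList())) {
                if (seen.add(linked)) cc.add(linked);
            }
            // Mutate: with probability mutationRate, insert an unrelated URL from the database
            if (!database.isEmpty() && rnd.nextDouble() < mutationRate) {
                String m = database.get(rnd.nextInt(database.size()));
                if (seen.add(m)) cc.add(m);
            }
        }
        return output;
    }
}
```

With mutationRate set to 0 the loop degenerates into best-first search; raising it lets documents that are not reachable from the input set enter the CC Set, which is what makes the search global.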

Mutation operator

[Figure: tree of mutation operators, with the labels generational and selective; DB-based and semantic; unsorted, topic sorted and indexed; spatial locality, temporal locality and type locality]

• Generational - generate a new URL

• DB-based - pick an existing URL from a database

• Semantic - use some logical reasoning to direct the search
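The generational and DB-based branches can be contrasted behind a common interface. This is a hypothetical sketch in modern Java: the interface and class names are illustrative, and guessing a host name from the topic is only a stand-in for "generate a new URL" (such URLs may not exist). The topic-sorted variant assumes a database keyed by topic, like the one the Generator package fills from Yahoo categories.

```java
import java.util.*;

/** Hypothetical mutation operator: proposes one new URL for the CC Set. */
interface MutationOperator {
    String mutate(String topic, Random rnd);
}

/** Generational: synthesize a URL (here a guessed host name, purely illustrative). */
class GenerationalMutation implements MutationOperator {
    public String mutate(String topic, Random rnd) {
        return "http://www." + topic.toLowerCase().replaceAll("[^a-z0-9]", "") + ".com/";
    }
}

/** DB-based, unsorted: any URL from a non-empty database, ignoring the topic. */
class UnsortedDbMutation implements MutationOperator {
    private final List<String> urls;
    UnsortedDbMutation(List<String> urls) { this.urls = urls; }
    public String mutate(String topic, Random rnd) {
        return urls.get(rnd.nextInt(urls.size()));
    }
}

/** DB-based, topic sorted: only URLs filed under the search topic; null if there are none. */
class TopicSortedDbMutation implements MutationOperator {
    private final Map<String, List<String>> byTopic;
    TopicSortedDbMutation(Map<String, List<String>> byTopic) { this.byTopic = byTopic; }
    public String mutate(String topic, Random rnd) {
        List<String> urls = byTopic.getOrDefault(topic, Collections.emptyList());
        return urls.isEmpty() ? null : urls.get(rnd.nextInt(urls.size()));
    }
}
```

The topic-sorted database narrows mutation to URLs that are more likely to be relevant, at the cost of maintaining the topic index.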

Package #1 - Spider

• Spider - off-line browser

Author: Saša Slijepčević [email protected]

• Fetches all linked documents up to the specified depth and stores them on the local disk in a structure suitable for off-line browsing
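The depth limit can be sketched as a breadth-first crawl that stops following links at the given depth. This modern-Java sketch assumes an in-memory link map in place of HTTP fetching and a hypothetical URL-to-file mapping for the off-line copy; it is not the Spider package's code.

```java
import java.util.*;

/** Depth-limited breadth-first crawl over an in-memory link map (stand-in for HTTP fetching). */
class OfflineSpider {
    /** Returns every URL reachable within maxDepth links, mapped to the depth where it was first seen. */
    static Map<String, Integer> crawl(Map<String, List<String>> links, String start, int maxDepth) {
        Map<String, Integer> depth = new LinkedHashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        depth.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            String url = queue.poll();
            int d = depth.get(url);
            if (d == maxDepth) continue;        // do not follow links beyond the limit
            for (String next : links.getOrDefault(url, Collections.emptyList())) {
                if (!depth.containsKey(next)) {
                    depth.put(next, d + 1);
                    queue.add(next);
                }
            }
        }
        return depth;
    }

    /** Hypothetical mapping of a URL to a relative file path for off-line browsing. */
    static String localPath(String url) {
        String p = url.replaceFirst("^[a-zA-Z]+://", "");
        return p.endsWith("/") ? p + "index.html" : p;
    }
}
```

Breadth-first order guarantees that each document is recorded at its shortest link distance from the start page, so the depth limit is applied consistently.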

Package #2 - Agent

• Agent - program for the Best First Search Algorithm

Author: Nela Tomča [email protected]

• Starts from the input set of URLs and finds the documents most similar to them by following the links in the input documents

Package #3 - Generator

• Generator - program for generating a database of topic-sorted URLs

Authors: Mladen Mrkić [email protected]; Vladan Obradović [email protected]

• It fills the existing database with URLs obtained from www.yahoo.com as the result of a query submitted by the user under the specified category

[Figure: yahoo, Database]

Package #4 - Pathfinder

• Pathfinder - program for discovering all servers with the same suffix as the one submitted by the user

Author: Igor Čakulev [email protected]

• Example: for galeb.etf.bg.ac.yu it gives orao.etf.bg.ac.yu; zmaj.etf.bg.ac.yu; buef31.etf.bg.ac.yu; kiklop.etf.bg.ac.yu ...
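The matching step of the example can be sketched as "drop the first label of the host name and keep every other host ending in what remains". The slides do not say how Pathfinder obtains candidate host names, so this modern-Java sketch simply takes them as input; the class and method names are hypothetical.

```java
import java.util.*;

/** Sketch of suffix matching; candidate host names are assumed to come from elsewhere. */
class SuffixMatcher {
    /** Domain suffix of a host: everything after the first label. */
    static String suffix(String host) {
        int dot = host.indexOf('.');
        return dot < 0 ? "" : host.substring(dot + 1);
    }

    /** All candidate hosts, other than the input host, that share its suffix. */
    static List<String> sameSuffix(String host, Collection<String> candidates) {
        String s = "." + suffix(host).toLowerCase();
        List<String> result = new ArrayList<>();
        for (String c : candidates) {
            String lc = c.toLowerCase();
            if (lc.endsWith(s) && !lc.equals(host.toLowerCase())) result.add(c);
        }
        return result;
    }
}
```

For galeb.etf.bg.ac.yu the suffix is etf.bg.ac.yu, so orao.etf.bg.ac.yu and zmaj.etf.bg.ac.yu match while a host under another faculty's domain does not. Matching on "." plus the suffix avoids false hits such as a host ending in "xetf.bg.ac.yu".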

Package #5 - Tropical

• Tropical - program for performing genetic algorithm search with database mutation

Author: Jelena Mirković [email protected]

[Figure: Database]

• Repeats the Hong Kong experiment: Chen, H., Chung, Y., Ramsey, M., Yang, C., Ma, P., Yen, J., "Intelligent Spider for Internet Searching", Proceedings of the Thirtieth Annual Hawaii International Conference on System Sciences, Maui, Hawaii, USA, January 1997.
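The slides leave the fitness function unspecified, but the references include a tool for Jaccard score evaluation. A plausible fitness, assumed here rather than taken from the Tropical package, is the Jaccard score between keyword sets, averaged over the documents of the Input Set. The modern-Java sketch below assumes each document has already been reduced to a set of keywords; the original may weight keywords and links differently.

```java
import java.util.*;

/** Jaccard score of two keyword sets: size of intersection divided by size of union. */
class Jaccard {
    static double score(Set<String> a, Set<String> b) {
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) return 1.0;          // convention chosen here for two empty sets
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        return (double) inter.size() / union.size();
    }

    /** Assumed fitness: average Jaccard score of a document against every input document. */
    static double fitness(Set<String> doc, List<Set<String>> inputDocs) {
        if (inputDocs.isEmpty()) return 0.0;
        double sum = 0.0;
        for (Set<String> in : inputDocs) sum += score(doc, in);
        return sum / inputDocs.size();
    }
}
```

For {genetic, internet, search} and {internet, search, agent} the intersection has 2 keywords and the union has 4, giving a score of 0.5.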

Packages in progress - Space

• Space - program for performing genetic algorithm search with database mutation and occasional spatial locality mutation

[Figure: Database]
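The slides do not define spatial locality mutation. One plausible reading, suggested by the Kraus and Milutinović report on spatial and temporal locality in the references and by the Pathfinder package, is that documents on the same server as an already good document are promising mutation candidates. This hypothetical modern-Java sketch assumes exactly that (same host name, port included if present) and draws candidates from a URL database.

```java
import java.util.*;

/** Sketch of a spatial-locality mutation: propose a database URL on the same server as a good document. */
class SpatialLocalityMutation {
    /** Host part of a URL, e.g. "http://galeb.etf.bg.ac.yu/~ebi/" -> "galeb.etf.bg.ac.yu". */
    static String host(String url) {
        String rest = url.replaceFirst("^[a-zA-Z]+://", "");
        int slash = rest.indexOf('/');
        return (slash < 0 ? rest : rest.substring(0, slash)).toLowerCase();
    }

    /** Picks a database URL on the same host as goodUrl that is not yet in the CC Set; null if none. */
    static String mutate(String goodUrl, List<String> database, Set<String> ccSet, Random rnd) {
        String h = host(goodUrl);
        List<String> sameHost = new ArrayList<>();
        for (String u : database) {
            if (host(u).equals(h) && !ccSet.contains(u)) sameHost.add(u);
        }
        return sameHost.isEmpty() ? null : sameHost.get(rnd.nextInt(sameHost.size()));
    }
}
```

Widening the match from the same host to the same domain suffix would turn this into a neighbour-server variant, closer to what Pathfinder discovers.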

Packages in progress - Time

• Time - program for performing genetic algorithm search with database mutation and occasional temporal locality mutation

[Figure: Topic Database, Time Database]

Current System

[Figure: block diagram with CONTROL LOGIC, the packages Agent, Tropical, Space, Time, Generator and Pathfinder, the Input set, Current set and Output set, and the labels D, D1 and Key]

The Vision

[Figure: block diagram with CONTROL LOGIC, the packages Agent, Tropical, Space, Time, Generator and Pathfinder, the Input set, Current set and Output set, and the labels D, D1, Key, JC and SL]

New open problems

• Too many linked documents imply high network traffic
• Disk space consumed increases exponentially with the number of linked documents, while only a small percentage of them is found to be useful
• The program is unable to learn

Future directions

• Implementation in the mobile domain
• Autonomous agents that transport themselves to the host computer and examine documents there, transferring only the best ones to the home computer, so that network traffic and disk usage decrease
• Intelligent agents that remain active in the background, able to learn and adapt to the user's needs

References

• Goldberg, D., Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, Reading, Massachusetts, USA, 1989.

• Milojičić, S., Musliner, D., Shroeder-Preikschat, W., "Agents: Mobility and Communication", Proceedings of the Thirty-First Annual Hawaii International Conference on System Sciences, Maui, Hawaii, USA, January 1998.

• Mueller, J. P., "The Design of Intelligent Agents: A Layered Approach", Springer-Verlag, Germany, 1997.

• Chen, H., Chung, Y., Ramsey, M., Yang, C., Ma, P., Yen, J., "Intelligent Spider for Internet Searching", Proceedings of the Thirtieth Annual Hawaii International Conference on System Sciences, Maui, Hawaii, USA, January 1997.

• Kraus, L., Milutinovic, V., "Technical Report on a New Genetic Algorithm for Internet Search Based on Principles of Spatial and Temporal Locality", Proceedings of the SinfoN '97, Zlatibor, Serbia, Yugoslavia, November 1997.

• Tomca, N., A Flexible Tool for Jaccard Score Evaluation, B.Sc. Thesis, University of Belgrade, Belgrade, Serbia, Yugoslavia, November 1997. Award paper at SinfoN-97, Zlatibor, Serbia, Yugoslavia, October 1997.

• Slijepcevic, S., A Programmable Agent for Internet Retrieval, B.Sc. Thesis, University of Belgrade, Belgrade, Serbia, Yugoslavia, October 1997. Award paper at SinfoN-97, Zlatibor, Serbia, Yugoslavia, October 1997.