Veljko Milutinović, Laslo Kraus, Jelena Mirković, Nela Tomča, Saša Slijepčević, Suzana Cvetićanin,
Ljiljana Nešić, Mladen Mrkić, Vladan Obradović, Igor Čakulev
Intelligent Internet Search
Department of Computer Engineering
School of Electrical Engineering
University of Belgrade
POB 35-54, 11120 Belgrade
Serbia, Yugoslavia
Problem statement
• Number of Internet presentations and Web servers grows exponentially
• Variety of presentations grows, too
Search and retrieval of documents gets harder
• Existing tools do not give satisfactory results
Existing solutions
• Keyword search and document indexing - e.g. Altavista
  + search is exhaustive
  - too many keywords result in too few documents found, and vice versa
  - it requires a large database of indexed documents
• Following links - e.g. Spiders
  + fast, no indexing and no database
  - it searches only a limited number of documents
  + possibility of changing the input parameters during the search
  - poor evaluation function
Our solution
• Design of intelligent agents for Internet search
• Two basic approaches:
  1. Simulated annealing - inherently serial
  2. Genetic algorithms - inherently parallel
• Character of the search:
  1. Local search - following only the links of the input documents - Best First Search Algorithm
  2. Global search - following the links of the input documents and occasionally mutating them - Genetic Algorithm
• Spider implementation:
  1. Static
  2. Mobile
Our research
• Essence: Creating a set of packages for experimenting in the domain of intelligent Internet search
• All written in Sun Java - JDK 1.1
• Lego approach - stand-alone applications that are easily interfaced with one another
• Code and executable version available at http://galeb.etf.bg.ac.yu/~ebi
• Further research in mobile domain
Best First Search Algorithm
• Select the initial WWW presentation or a set thereof (the Input Set)
• Extract all URLs and fetch the corresponding WWW presentations; they are inserted into the Current Configuration (CC) Set
• Measure the fitness value for each document in the CC Set
• Select the best one for the Output Set and add documents linked to it into the CC Set
[Figure: Input Set, CC Set, Output Set]
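The steps of the Best First Search Algorithm can be sketched as follows. This is only an illustration under stated assumptions, not the Agent package itself: the WEB dictionary is a hypothetical in-memory stand-in for fetching real pages, and the fitness value is assumed to be a Jaccard score between a document's keywords and the keywords of the Input Set.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical in-memory "Web": URL -> (keywords of the page, outgoing links).
WEB: Dict[str, Tuple[Set[str], List[str]]] = {
    "a": ({"java", "agent", "search"}, ["b", "c"]),
    "b": ({"java", "agent", "mobile"}, ["d"]),
    "c": ({"cooking", "recipes"}, ["e"]),
    "d": ({"java", "search", "spider"}, []),
    "e": ({"agent", "search", "java"}, []),
}


def jaccard(x: Set[str], y: Set[str]) -> float:
    """Assumed fitness: size of the intersection over size of the union."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def best_first_search(input_set: List[str], wanted: int) -> List[str]:
    # Keywords of the input documents serve as the reference for fitness.
    reference: Set[str] = set().union(*(WEB[u][0] for u in input_set))
    seen = set(input_set)
    cc_set: List[str] = []  # Current Configuration Set

    def add_links(url: str) -> None:
        # "Fetch" every not-yet-seen page linked from url into the CC Set.
        for link in WEB[url][1]:
            if link in WEB and link not in seen:
                seen.add(link)
                cc_set.append(link)

    for url in input_set:
        add_links(url)

    output: List[str] = []
    while cc_set and len(output) < wanted:
        # Measure fitness of every document in the CC Set, take the best one.
        best = max(cc_set, key=lambda u: jaccard(WEB[u][0], reference))
        cc_set.remove(best)
        output.append(best)
        add_links(best)  # documents linked to the winner join the CC Set
    return output
```

With this toy data, best_first_search(["a"], 2) selects "b" and then "d". Page "e" matches the input perfectly but is reachable only through the poorly scoring page "c", which shows how a purely local search can miss good documents.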
Basic Genetic Algorithm
1. Initialize the population: randomly pick a set of possible solutions
2. Select individuals for the mating pool: measure the fitness value and pick the best ones
3. Perform crossover: create new individuals using genetic material from parents in the mating pool
4. Perform mutation: randomly create new individuals, completely unrelated to those in the mating pool
5. Insert offspring into the population
6. Is the stopping criterion satisfied (the desired number of solutions is found or the specified search time has elapsed)? No: go to step 2. Yes: the end.
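The six steps can be illustrated with a minimal sketch on a toy problem. Everything specific here is an assumption made for the example: individuals are bit lists, fitness is the number of ones, crossover is one-point, and all parameter names and default values are hypothetical.

```python
import random
from typing import List

Individual = List[int]


def fitness(ind: Individual) -> int:
    """Toy fitness (an assumption): the number of 1 bits."""
    return sum(ind)


def random_individual(rng: random.Random, length: int) -> Individual:
    return [rng.randint(0, 1) for _ in range(length)]


def crossover(rng: random.Random, p1: Individual, p2: Individual) -> Individual:
    # One-point crossover: head of one parent, tail of the other.
    cut = rng.randrange(1, len(p1))
    return p1[:cut] + p2[cut:]


def genetic_algorithm(length: int = 12, pop_size: int = 20, pool_size: int = 8,
                      mutants: int = 2, max_generations: int = 200,
                      seed: int = 0) -> Individual:
    rng = random.Random(seed)
    # Step 1: initialize the population randomly.
    population = [random_individual(rng, length) for _ in range(pop_size)]
    for _ in range(max_generations):
        # Step 6: stop when a perfect individual exists (or generations run out).
        if fitness(max(population, key=fitness)) == length:
            break
        # Step 2: the mating pool holds the fittest individuals.
        pool = sorted(population, key=fitness, reverse=True)[:pool_size]
        # Step 3: crossover between two distinct parents from the pool.
        offspring = [crossover(rng, *rng.sample(pool, 2))
                     for _ in range(pop_size - pool_size - mutants)]
        # Step 4: mutation creates individuals unrelated to the pool.
        offspring += [random_individual(rng, length) for _ in range(mutants)]
        # Step 5: insert offspring into the population (the pool survives).
        population = pool + offspring
    return max(population, key=fitness)
```

Because the mating pool is carried over into the next generation, the best fitness in the population never decreases from one generation to the next.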
Genetic Algorithm applied to Internet Search
• Select the initial WWW presentation or a set thereof (the Input Set)
• Extract all URLs and fetch the corresponding WWW presentations; they are inserted into the Current Configuration (CC) Set
• Measure the fitness value for each document in the CC Set
• Select the best one for the Output Set and add documents linked to it into the CC Set
• Mutate - e.g. by inserting documents from the database of URLs
[Figure: Input Set, CC Set, Output Set, Database]
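A minimal sketch of the genetic variant, under the same kind of assumptions as any toy model: PAGES is a hypothetical in-memory Web, URL_DATABASE is a hypothetical database of URLs used for mutation, fitness is assumed to be a Jaccard score, and mutation_rate is an illustrative parameter that does not come from the slides.

```python
import random
from typing import Dict, List, Set, Tuple

# Hypothetical pages: URL -> (keywords, outgoing links).
PAGES: Dict[str, Tuple[Set[str], List[str]]] = {
    "start": ({"java", "agent", "search"}, ["p1", "p2"]),
    "p1": ({"java", "agent"}, []),
    "p2": ({"weather"}, []),
    "far1": ({"java", "agent", "search"}, ["far2"]),  # not linked from "start"
    "far2": ({"agent", "search"}, []),
}
URL_DATABASE = ["far1", "far2"]  # hypothetical database used for mutation


def jaccard(x: Set[str], y: Set[str]) -> float:
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def genetic_search(input_set: List[str], wanted: int,
                   mutation_rate: float = 0.5, seed: int = 0) -> List[str]:
    rng = random.Random(seed)
    reference: Set[str] = set().union(*(PAGES[u][0] for u in input_set))
    seen = set(input_set)
    cc_set: List[str] = []

    def insert(url: str) -> None:
        if url in PAGES and url not in seen:
            seen.add(url)
            cc_set.append(url)

    for url in input_set:
        for link in PAGES[url][1]:
            insert(link)

    output: List[str] = []
    while cc_set and len(output) < wanted:
        # Selection: the fittest document moves to the Output Set.
        best = max(cc_set, key=lambda u: jaccard(PAGES[u][0], reference))
        cc_set.remove(best)
        output.append(best)
        for link in PAGES[best][1]:
            insert(link)
        # Mutation: occasionally inject a URL taken from the database.
        if rng.random() < mutation_rate:
            insert(rng.choice(URL_DATABASE))
    return output
```

With mutation_rate set to 0.0 the search degenerates to following links only and never leaves the pages reachable from "start"; with mutation enabled it can reach "far1" and "far2", which no link from the input document leads to.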
Mutation operator
[Figure: classification tree of mutation operators]
• mutation operator
  - generational
  - selective
    - DB-based: unsorted, topic-sorted, indexed
    - semantic: spatial locality, temporal locality, type locality
• Generational - generate a new URL
• DB-based - pick an existing URL from a database
• Semantic - use some logical reasoning to direct the search
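The three kinds of mutation can be contrasted in a short sketch. The function names, the word-list trick for generating URLs, and the reading of spatial locality as "another document on the same server as a good one" are all assumptions made for illustration; the slides do not define the operators at this level of detail.

```python
import random
from typing import List, Optional
from urllib.parse import urlsplit


def generational_mutation(rng: random.Random, words: List[str]) -> str:
    """Generational: build a brand-new URL (here, guessed from a word list)."""
    return "http://www.%s.com/" % rng.choice(words)


def db_mutation(rng: random.Random, url_db: List[str]) -> str:
    """DB-based: pick an existing URL from a database."""
    return rng.choice(url_db)


def spatial_locality_mutation(rng: random.Random, good_url: str,
                              known_urls: List[str]) -> Optional[str]:
    """Semantic (assumed spatial locality): pick another known URL
    that lives on the same host as a document already judged good."""
    host = urlsplit(good_url).netloc
    same_host = [u for u in known_urls
                 if urlsplit(u).netloc == host and u != good_url]
    return rng.choice(same_host) if same_host else None
```

A generational operator needs no stored data but mostly produces useless URLs, whereas the other two trade storage or reasoning effort for better candidates.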
Package #1 - Spider
• Spider - off-line browser
Author: Saša Slijepčević
• Fetches all linked documents up to the specified depth and stores them on the local disk in a structure suitable for off-line browsing
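Fetching "up to the specified depth" amounts to a breadth-first traversal with a depth limit. The sketch below shows only that traversal; LINKS is a hypothetical link graph, and real fetching and storing to disk are left out.

```python
from collections import deque
from typing import Dict, List

# Hypothetical link graph standing in for real pages.
LINKS: Dict[str, List[str]] = {
    "index.html": ["a.html", "b.html"],
    "a.html": ["c.html"],
    "b.html": [],
    "c.html": ["d.html"],
    "d.html": [],
}


def crawl(start: str, max_depth: int) -> Dict[str, int]:
    """Return every document within max_depth links of start, with its depth."""
    depth_of = {start: 0}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if depth_of[url] == max_depth:
            continue  # do not follow links beyond the depth limit
        for link in LINKS.get(url, []):
            if link not in depth_of:  # each document is visited only once
                depth_of[link] = depth_of[url] + 1
                queue.append(link)
    return depth_of
```

For example, crawl("index.html", 1) covers index.html, a.html and b.html, while a depth of 2 also reaches c.html but not d.html.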
Package #2 - Agent
• Agent - program for the Best First Search Algorithm
Author: Nela Tomča
• Starts from the input set of URLs and finds the documents most similar to them by following the links in the input documents
Package #3 - Generator
• Generator - program for generation of a database of topic-sorted URLs
Authors: Mladen Mrkić, Vladan Obradović
• It fills the existing database with URLs obtained from www.yahoo.com as a result of a query submitted by the user, under the specified category
[Figure: yahoo, Database]
Package #4 - Pathfinder
• Pathfinder - program for discovering all servers with the same suffix as the one submitted by the user
Author: Igor Čakulev
• Example: for galeb.etf.bg.ac.yu it gives orao.etf.bg.ac.yu; zmaj.etf.bg.ac.yu; buef31.etf.bg.ac.yu; kiklop.etf.bg.ac.yu ...
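The suffix rule in the example can be sketched as a simple filter. How Pathfinder actually enumerates servers is not stated in the slides; this sketch assumes a list of candidate host names is already available and takes "suffix" to mean everything after the first label of the host name.

```python
from typing import List


def domain_suffix(host: str) -> str:
    """Everything after the first label, e.g. 'etf.bg.ac.yu' (an assumption)."""
    parts = host.split(".", 1)
    return parts[1] if len(parts) == 2 else ""


def same_suffix(host: str, candidates: List[str]) -> List[str]:
    """Keep the candidate hosts that share the suffix of the given host."""
    suffix = domain_suffix(host)
    if not suffix:
        return []
    return [c for c in candidates if c != host and c.endswith("." + suffix)]
```

Given galeb.etf.bg.ac.yu and the candidates orao.etf.bg.ac.yu, zmaj.etf.bg.ac.yu and www.bg.ac.yu, only the first two are kept.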
Package #5 - Tropical
• Tropical - program for performing genetic algorithm search with database mutation
Author: Jelena Mirković
• Repeating the Hong Kong experiment: Chen, H., Chung, Y., Ramsey, M., Yang, C., Ma, P., Yen, J., "Intelligent Spider for Internet Searching", Proceedings of the Thirtieth Annual Hawaii International Conference on System Sciences, Maui, Hawaii, USA, January 1997.
[Figure: Database]
Packages in progress - Space
• Space - program for performing genetic algorithm search with database mutation and occasional spatial locality mutation
[Figure: Database]
Packages in progress - Time
• Time - program for performing genetic algorithm search with database mutation and occasional temporal locality mutation
[Figure: Topic Database, Time Database]
Current System
[Figure: block diagram. Labels: CONTROL LOGIC; Agent, Tropical, Space, Time, Generator, Pathfinder; Input set, Current set, Output set; D, Key, D1]
The Vision
[Figure: block diagram. Labels: CONTROL LOGIC; Agent, Tropical, Space, Time, Generator, Pathfinder; Input set, Current set, Output set; D, Key, D1, JC, SL]
Newly open problems
• Too many linked documents imply high network traffic
• Disk space consumed increases exponentially with the number of linked documents, while only a small percentage of them is found to be useful
• Program is unable to learn
Future directions
• Implementation in mobile domain
• Autonomous agents that transport themselves to the host computer and examine documents there, transferring only the best ones to the home computer; network traffic and disk usage decrease
• Intelligent agents that remain active in the background, able to learn and adapt to the user's needs
References
• Goldberg, D., Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, Reading, Massachusetts, USA, 1989.
• Milojičić, S., Musliner, D., Schroeder-Preikschat, W., "Agents: Mobility and Communication", Proceedings of the Thirty-First Annual Hawaii International Conference on System Sciences, Maui, Hawaii, USA, January 1998.
• Mueller, J. P., The Design of Intelligent Agents: A Layered Approach, Springer-Verlag, Germany, 1997.
• Chen, H., Chung, Y., Ramsey, M., Yang, C., Ma, P., Yen, J., "Intelligent Spider for Internet Searching", Proceedings of the Thirtieth Annual Hawaii International Conference on System Sciences, Maui, Hawaii, USA, January 1997.
• Kraus, L., Milutinovic, V., "Technical Report on a New Genetic Algorithm for Internet Search Based on Principles of Spatial and Temporal Locality", Proceedings of the SinfoN '97, Zlatibor, Serbia, Yugoslavia, November 1997.
• Tomca, N., A Flexible Tool for Jaccard Score Evaluation, B.Sc. Thesis, University of Belgrade, Belgrade, Serbia, Yugoslavia, November 1997. Award paper at SinfoN-97, Zlatibor, Serbia, Yugoslavia, October 1997.
• Slijepcevic, S., A Programmable Agent for Internet Retrieval, B.Sc. Thesis, University of Belgrade, Belgrade, Serbia, Yugoslavia, October 1997. Award paper at SinfoN-97, Zlatibor, Serbia, Yugoslavia, October 1997.