elyssa-peek
1
Some Popular Portals
• Yahoo!: www.yahoo.com
• Portals to the World from the Library of Congress: www.loc.gov/rr/international/portals.html
• AltaVista: www.altavista.com
5
Search Engines?
A search engine is a web site that uses software to browse the Internet.
A search engine will retrieve a listing of World Wide Web sites related to the key words you specify.
6
How Search Engines Work
• Read pages they find on the web (spider)
• Store text in an “index”
• When you search, they look for pages with matching text
• Other factors are involved in “ranking” those pages, such as “link popularity”
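The read, store and match steps above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular engine is built: the page URLs and texts in `PAGES` are made up, and ranking is left out entirely (results are simply sorted alphabetically).

```python
import re
from collections import defaultdict

# Hypothetical pages a spider might have fetched (URL -> page text).
PAGES = {
    "example.org/gettysburg": "The Battle of Gettysburg was fought in 1863",
    "example.org/civil-war": "The Civil War included the Battle of Gettysburg",
    "example.org/welfare": "Welfare to Work programs",
}

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return URLs containing every query word, sorted alphabetically."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)

INDEX = build_index(PAGES)
print(search(INDEX, "battle gettysburg"))
```

The search never touches the pages themselves, only the index, which is why a page that has not yet been read by the spider cannot appear in the results.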
7
Search Engine Indexing
Computer-driven search tool
1. Website owners submit the web address of their homepage for inclusion in the database
2. Robots periodically spider the Web, detect the homepage and proceed to scan every page in the entire website (the first 20-25 words on the homepage appear as the ‘result’ the user sees in the search engine)
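As a rough illustration of how the first 20-25 words of a homepage could become the displayed result, the sketch below pulls the text out of an HTML string with Python's standard `html.parser` module and keeps the leading words. The class name `TextExtractor`, the function `snippet` and the 25-word default are assumptions made for this example; real engines build their result text in more elaborate ways.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect page text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep words only when we are outside script/style blocks.
        if self._skip_depth == 0:
            self.words.extend(data.split())

def snippet(html, max_words=25):
    """Return the first `max_words` words of text found in `html`."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.words[:max_words])

HOMEPAGE = (
    "<html><head><style>p{}</style></head>"
    "<body><h1>Welcome</h1><p>one two three four</p></body></html>"
)
print(snippet(HOMEPAGE, max_words=3))
```

One practical consequence: whatever text comes first on the homepage is what searchers will see, so it is worth making those opening words descriptive.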
8
Search Engines
• Crawler-based Search Engines
    • “Spiders” or “Crawlers” visit websites and some of their pages periodically, and add them to the index
    • They scan links and add them to the index
    • They return the information to the index or catalog
    • Search engine software sifts the index and ranks the results in order of relevance
• Human-based Search Engines
• Mixed
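The way a crawler follows links from page to page can be sketched as a breadth-first walk. To keep the example self-contained, the "Web" here is a made-up dictionary (`WEB`) mapping each page to the links found on it; a real spider would fetch pages over HTTP instead.

```python
from collections import deque

# Hypothetical link graph standing in for the live Web: page -> links on it.
WEB = {
    "home": ["about", "products"],
    "about": ["home"],
    "products": ["widget", "home"],
    "widget": [],
    "orphan": ["home"],  # nothing links here, so the crawler never finds it
}

def crawl(start, web, max_pages=100):
    """Breadth-first crawl from `start`; return pages in the order visited."""
    seen = {start}
    queue = deque([start])
    visited = []
    while queue and len(visited) < max_pages:
        page = queue.popleft()
        visited.append(page)
        for link in web.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited

print(crawl("home", WEB))
```

The `orphan` page is never reached because no crawled page links to it, and `max_pages` shows why a crawler with a limited budget may index only part of a site.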
9
Directories Vs Search Engines
When should you use a directory?
• When you have a broad topic
• When you want experts to recommend sites
• When you want to avoid irrelevant sites
Example topics:
• Disabilities
• Civil War
• Welfare
10
Directories Vs Search Engines
When should you use a search engine?
• When you have a narrow topic
• When you are looking for a specific website
• When you want to search for a file type or language
Examples:
• Americans with Disabilities Act
• Battle of Gettysburg
• Welfare to Work
11
Start Your Search Engines Here
• Google: www.google.com
• AllTheWeb: www.alltheweb.com
• Yahoo: www.yahoo.com
• MSN: http://search.msn.com
Why? See: http://searchenginewatch.com/links/major.html
12
Other Search Engine Types
• News Search Engines
• Multimedia Search Engines
• Metacrawlers
• Kids Search Engines
• Regional Search Engines
• Scientific Search Engines
See: http://searchenginewatch.com/links/
13
Top Search Engines & Directories
• Google
• Yahoo!
• AllTheWeb
• AltaVista
• Open Directory
• MSN Search
• About
• Ask Jeeves
• WiseNut
• HotBot
• LookSmart
• Teoma
• AOL Search
• iLOR
14
• Google is the undisputed leader in search engines, with the largest database and highly relevant results
• Uses an algorithm based on site popularity
• The more inbound links a site receives from other sites that Google considers worthwhile, the higher its page rank in the results
• Keeps advertising to a minimum: no-frills design, a nice clean look and no pop-up ads
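The idea that links from worthwhile sites count for more can be illustrated with a simplified PageRank-style calculation. This is only a sketch of the published idea, not Google's actual ranking system; the link graph `LINKS`, the damping value of 0.85 and the fixed 50 iterations are assumptions chosen for the example.

```python
def page_rank(links, damping=0.85, iterations=50):
    """Return a score per page; `links` maps each page to the pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page shares its score equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its score evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical sites: "a", "b" and "d" all link to "c"; "c" links back to "a".
LINKS = {
    "a": ["c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
scores = page_rank(LINKS)
best = max(scores, key=scores.get)
print(best)
```

Site "c" comes out on top because it has the most inbound links, and "a" outranks "b" and "d" because its single inbound link comes from the highly ranked "c".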
15
AllTheWeb & AltaVista
• AllTheWeb used to be the Norwegian search engine FAST and for a while was one of the Web’s best kept secrets
• AltaVista launched in 1995 and was THE search engine before Google existed
• Recently, Overture acquired FAST and AltaVista
• This year, Yahoo! acquired Overture and Inktomi, making Yahoo! the largest network of major search tools on the Internet
• The futures of AllTheWeb and AltaVista are now unknown, as many results are simply retrieved from Yahoo!
16
Open Directory & Ask Jeeves
• Open Directory Project is the largest human-compiled search directory on the Web
• As each website is considered for inclusion by a human (many don’t make it), quality is assured
• Ask Jeeves uses special natural language technology, so the user can ask a complete question instead of inputting only a few words
• It then searches its own database and supplements this with results from Teoma
• Ask Jeeves is popular with young Web users
17
Understand Limitations of Search Engines
• Search “spiders” or “crawlers” do *not* crawl in real time
• Lag times getting info to the index vary by search engine
• If a website is not submitted to the search engine, or linked from pages it already crawls, it may not be crawled
• Not every page from a website is crawled
• A webmaster can choose to not have a page crawled
• Formats like PDF, Flash, Zip files, executable programs, and others cannot be searched
• The “Invisible Web”
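A webmaster usually keeps pages out of a crawl with a robots.txt file at the root of the site. The sketch below uses Python's standard `urllib.robotparser` module to show how a well-behaved crawler would check such a file; the rules in `ROBOTS_TXT`, the `example.org` URLs and the agent name `ExampleBot` are invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content a webmaster might publish.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/report.html
"""

def make_parser(robots_txt):
    """Build a parser from robots.txt text without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

PARSER = make_parser(ROBOTS_TXT)

def may_crawl(url, agent="ExampleBot"):
    """Return True if the rules allow `agent` to fetch `url`."""
    return PARSER.can_fetch(agent, url)

print(may_crawl("http://example.org/index.html"))
print(may_crawl("http://example.org/private/data.html"))
```

Note that robots.txt is a request, not a lock: it only keeps out crawlers that choose to honour it.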
18
Evaluating Web Sites Continued…
• Can you find this news reported on a legitimate news website?
• Who is the sponsor of the website?
• Are there inconsistencies or inaccuracies in the information?
• If an organization is mentioned by name, does the organization have any related information on this website?
19
Meta Search Engine
• Searches more than one search engine simultaneously (often up to fifteen)
• Each meta search engine normally searches a different combination of search engines
• Simultaneous multiple engine searching saves the user lots of time
• But meta search engines only skim the surface of each engine’s database and sometimes lack depth when searching for results
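The merge step of a meta search engine can be sketched as follows. The three `engine_*` functions are hypothetical stand-ins that return fixed lists, and scoring by summed reciprocal rank is just one plausible way to combine results, not a description of how any named meta search engine works. For simplicity the engines are called one after another here rather than simultaneously.

```python
from collections import defaultdict

# Hypothetical stand-ins for real engines: each returns a ranked list of URLs.
def engine_a(query):
    return ["site1.example", "site2.example", "site3.example"]

def engine_b(query):
    return ["site2.example", "site4.example"]

def engine_c(query):
    return ["site2.example", "site1.example"]

def meta_search(query, engines, per_engine=10):
    """Query every engine, then merge results by summed reciprocal rank."""
    scores = defaultdict(float)
    for engine in engines:
        # Only the top `per_engine` hits of each engine are read.
        for position, url in enumerate(engine(query)[:per_engine], start=1):
            scores[url] += 1.0 / position
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda url: (-scores[url], url))

ENGINES = [engine_a, engine_b, engine_c]
print(meta_search("civil war", ENGINES))
```

The `per_engine` cut-off is where the "skimming" happens: a page ranked below the cut-off in every engine never reaches the merged list, however relevant it may be.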
20
Top Meta Search Engines
• Kartoo
• Turbo10
• Dogpile
• Mamma
• Red Hot Chilli
• Meta Eureka
• Web Taxi
• Vivisimo
• ixquick
• iBoogie
• Metacrawler
• Supercrawler
• Search.com
• Query Server
21
Kartoo & Turbo10
• These search tools cluster sets of results on similar topics and display them on the side frame
• Kartoo is arguably the funkiest search facility on the Web, displaying results as a visual mind map
• There’s a basic and expert version for searching
• Turbo10 is unique because it has a long list of specialist databases on specific subjects
• Searches the Deep Net (others rarely go there)
• Users can also tailor their searching by selecting unusual databases of their own choosing
22
Vivisimo, Dogpile & Mamma
• Vivisimo also uses clustering technology and allows users to choose their own search engines
• Dogpile was one of the earliest meta search engines and remains very popular today
• Its major advantage lies in its search engines: Google, Yahoo!, Ask Jeeves, Teoma, About etc.
• Canadian-based Mamma began in 1996 as a Master’s thesis, arguably the first meta search
• Today it is a well respected search tool and, like Dogpile, searches the Web’s top engines such as Google, Open Directory, Teoma and others
23
Image (+ Meta) Searching
• Although pictures on websites often appear neatly embedded amongst text, each image has its own unique URL, which allows picture searching
• Google’s image search is one of the best on the Web, partly because of the size of its database
• There are also excellent picture meta search engines: iBoogie, Dogpile, ixquick and 1Banana
• picsearch is solely a picture search engine and markets itself as family and user friendly
24
Language Translation
• Google and AltaVista offer language translation
• Google will allow you to translate a foreign language website or page and even allow you to link to the translated page from another website
• AltaVista uses Babel Fish for its translation and you can also translate blocks of text
• Some of the best websites on Bertolt Brecht and his Epic Theatre are actually in German, so this is an example of where translation tools are worthwhile if you don’t read the language
25
Useful Reference Tools
• You can find free dictionaries online, such as Merriam Webster, Oxford, Macquarie, Cambridge and Dictionary.com
• Most dictionaries also have a thesaurus tab
• The meta dictionary OneLook simultaneously searches nearly 1,000 generalist and specialist dictionaries!
• Some of the weirdest words out there are at the Strange and Unusual Dictionaries website
• Or visit RhymeZone’s Rhyming Dictionary & Thesaurus for a bit of fun!
26
More Reference Tools
• If looking for the origin of a phrase or saying, try Brewer’s Dictionary of Phrase and Fable
• There’s also the ClichéSite or the Hutchinson Dictionary of Difficult Words
• Free encyclopaedias include Encyclopedia.com, Columbia, Encarta (partly free), Wikipedia, Hutchinson and the 1911 Encyclopaedia Britannica, considered by many to be the best edition ever!
• The Wayback Machine has been archiving large portions of the Web since 1996, so if a website has suddenly disappeared, search for it here!
27
The Invisible Web
• Web information that does not get indexed by the major search engines
• Hidden mostly in databases, or excluded from crawling by a robots.txt file
• Data created on the fly from the backend (cgi-bin, etc.)
• More than ¾ of the information on the Web is part of the Invisible Web
28
The Invisible Web – 4 Types
• The Opaque Web: search engines choose not to index
• The Private Web: password protected
• The Proprietary Web: registration required (either fee or free)
• The Truly Invisible Web: can’t search certain file formats and databases