WWW servers and search engines
2004, 劉震昌
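Web browser and server
A Web browser is a tool for reading HTML documents.
Client: Web browser. Server: Web server (e.g., one running IIS).
1. The user clicks a link, and the browser sends a request to the server.
2. The server finds the document.
3. The server returns the HTML document.
4. The browser displays it.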
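When the user clicks a link, the browser turns the URL into an HTTP request. As a rough sketch (assuming a plain HTTP/1.1 request; real browsers add many more headers), the hypothetical helper `build_get_request` below shows the text a browser might send, built with Python's standard `urllib.parse`:

```python
from urllib.parse import urlsplit


def build_get_request(url):
    """Build a minimal HTTP/1.1 GET request for `url` (hypothetical helper)."""
    parts = urlsplit(url)
    # The path requested from the server; "/" if the URL has no path.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # HTTP/1.1 requires a Host header, so one server can host many sites.
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {parts.netloc}",
        "Connection: close",
        "",  # an empty line ends the headers
        "",
    ]
    return "\r\n".join(lines)


print(build_get_request("http://www.example.com/index.html"))
```

The server reads the first line to find which document to return; the `Host` header tells a server that hosts several sites which one is meant.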
Where is the web server?
Probing the Internet (cont.)
Tools: tracert, ping
A packet travels from the source through a series of routers to the destination (e.g., www.yahoo.com.tw).
Packet: the unit of data transmission on a network.
Probing the Internet (how do you know you are on the Internet?)
ping www.yahoo.com.tw
Pinging rc.tpe.yahoo.com [202.1.237.23] with 32 bytes of data:
Reply from 202.1.237.23: bytes=32 time=4ms TTL=246
Reply from 202.1.237.23: bytes=32 time=5ms TTL=246
Reply from 202.1.237.23: bytes=32 time=4ms TTL=246
Reply from 202.1.237.23: bytes=32 time=4ms TTL=246
Ping statistics for 202.1.237.23:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 4ms, Maximum = 5ms, Average = 4ms
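The statistics block is just a summary of the reply lines. A minimal sketch in Python, assuming Windows-style reply lines as shown above (the function name `ping_statistics` is hypothetical), might compute it like this:

```python
import re

# Round-trip time in a Windows-style reply line, e.g. "time=4ms".
# Assumption: sub-millisecond replies ("time<1ms") are not handled.
TIME_RE = re.compile(r"time=(\d+)ms")


def ping_statistics(reply_lines, sent):
    """Summarize ping replies: packets received/lost and min/max/average time."""
    times = []
    for line in reply_lines:
        match = TIME_RE.search(line)
        if match:  # lines such as "Request timed out." carry no time
            times.append(int(match.group(1)))
    received = len(times)
    lost = sent - received
    return {
        "sent": sent,
        "received": received,
        "lost": lost,
        "loss_percent": 100 * lost // sent,  # whole percent, rounded down
        "min_ms": min(times) if times else None,
        "max_ms": max(times) if times else None,
        "avg_ms": sum(times) / received if times else None,
    }


replies = [
    "Reply from 202.1.237.23: bytes=32 time=4ms TTL=246",
    "Reply from 202.1.237.23: bytes=32 time=5ms TTL=246",
    "Reply from 202.1.237.23: bytes=32 time=4ms TTL=246",
    "Reply from 202.1.237.23: bytes=32 time=4ms TTL=246",
]
stats = ping_statistics(replies, sent=4)
print(stats)
```

For the four replies above this gives a minimum of 4 ms, a maximum of 5 ms and an average of 4.25 ms; Windows prints the average as a whole number, hence `Average = 4ms`.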
The route from source to destination
tracert www.yahoo.com.tw
Tracing route to rc.tpe.yahoo.com [202.1.237.23]
over a maximum of 30 hops:
  1    <1 ms    <1 ms    <1 ms  gateway.lan20.csie.ncnu.edu.tw [163.22.20.254]
  2    <1 ms    <1 ms    <1 ms  ip253.puli01.ncnu.edu.tw [163.22.1.253]
  3    <1 ms    <1 ms    <1 ms  ip090.puli255-64-203.ncnu.edu.tw [203.64.255.90]
  4     1 ms     1 ms     1 ms  140.128.251.38
  5    17 ms    74 ms     2 ms  tc-tanet-gw01.router.hinet.net [211.22.189.186]
  6     2 ms     1 ms     1 ms  211.22.189.190
  7     1 ms     1 ms     1 ms  tc-c12r2.router.hinet.net [211.22.189.74]
  8     4 ms     4 ms     4 ms  tp-s2-c12r2.router.hinet.net [210.65.200.30]
  9     4 ms     4 ms     4 ms  tp-s2-c6r8.router.hinet.net [211.22.35.181]
 10     9 ms     5 ms     6 ms  211.22.41.89
 11     5 ms     5 ms     5 ms  rc.tpe.yahoo.com [202.1.237.23]
Trace complete.
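tracert finds each router by sending probes with an increasing TTL (Time To Live) value: every router decrements the TTL, and the router at which it reaches zero discards the probe and sends back an ICMP "time exceeded" message, revealing its address; the destination itself answers with an echo reply. Windows tracert sends three probes per TTL, which is why each hop above shows three times. The Python sketch below only simulates this over a hypothetical list of router names; it sends no real packets:

```python
def trace_route(path, max_hops=30):
    """Simulate tracert over `path`, a list of node names ending at the destination.

    Returns (ttl, node, answer) tuples. This is a simulation, not real ICMP.
    """
    hops = []
    for ttl in range(1, max_hops + 1):
        remaining = ttl  # TTL carried by this probe
        for index, node in enumerate(path):
            remaining -= 1  # each hop decrements the TTL by one
            if index == len(path) - 1:
                # The probe reached the destination, which answers directly.
                hops.append((ttl, node, "reply"))
                return hops
            if remaining == 0:
                # TTL expired at this router: it reports "time exceeded".
                hops.append((ttl, node, "time exceeded"))
                break
    return hops  # destination not reached within max_hops


route = ["gateway.lan20.csie.ncnu.edu.tw", "ip253.puli01.ncnu.edu.tw", "rc.tpe.yahoo.com"]
for hop in trace_route(route):
    print(hop)
```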
Lab#5
Try ping and tracert to www.google.com.tw.
Record your results in a text file and email it to me with the subject: Lab5 <your student ID>
How can you host a site (WWW, FTP, …) with a dynamic IP address?
DHCP (Dynamic Host Configuration Protocol)
DHCP explanation
With DHCP, the same host may be assigned a different address each time: IP: 163.22.123.111, later IP: 163.22.123.123, …
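A toy model of why the address changes. This is a simplified sketch, not the real protocol (which exchanges DISCOVER, OFFER, REQUEST and ACK messages and uses lease timers); the class name `LeasePool` and the MAC address are hypothetical. Many real DHCP servers try to give a client its previous address back, but this pool does not, which makes the effect easy to see:

```python
class LeasePool:
    """A toy DHCP-style address pool (hypothetical, greatly simplified)."""

    def __init__(self, addresses):
        self.free = list(addresses)  # addresses not currently leased
        self.leases = {}             # client MAC address -> leased IP

    def request(self, mac):
        # A client that already holds a lease keeps its address.
        if mac in self.leases:
            return self.leases[mac]
        # Assumption: the pool never runs out (list.pop raises IndexError if it does).
        ip = self.free.pop(0)
        self.leases[mac] = ip
        return ip

    def release(self, mac):
        # The address goes to the back of the pool and may be handed to someone else.
        self.free.append(self.leases.pop(mac))


pool = LeasePool(["163.22.123.111", "163.22.123.123"])
print(pool.request("00:11:22:33:44:55"))  # first lease
pool.release("00:11:22:33:44:55")
print(pool.request("00:11:22:33:44:55"))  # next lease: a different address
```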
If we want to communicate with him, what is his IP address or domain name?
1. Run your own DNS (Domain Name System) server.
2. Dynamically register the mapping between your IP address and domain name:
www.no-ip.com
The host with a dynamic IP (e.g., 163.22.123.111) registers the mapping between its IP address and its domain name (e.g., Kamiry.no-ip.com) with the DNS server at www.no-ip.com.
Reference: No-IP user documentation
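The idea of dynamic registration, sketched as an in-memory Python model. Everything here is hypothetical: `DynamicDNS` stands in for a service such as No-IP, and the real No-IP update protocol is not shown. The key point is that the host re-registers only when its address changes, while everyone else just looks up the name:

```python
class DynamicDNS:
    """Hypothetical in-memory stand-in for a dynamic DNS service."""

    def __init__(self):
        self.records = {}  # host name -> current IP address

    def update(self, hostname, ip):
        # Called by the host whenever its (DHCP-assigned) address changes.
        # DNS names are case-insensitive, so store them in lowercase.
        self.records[hostname.lower()] = ip

    def resolve(self, hostname):
        # Anyone can look up the name without knowing the IP in advance.
        return self.records.get(hostname.lower())


def refresh(ddns, hostname, current_ip, last_sent_ip):
    """Send an update only if the address changed; return the IP now registered."""
    if current_ip != last_sent_ip:
        ddns.update(hostname, current_ip)
    return current_ip


ddns = DynamicDNS()
last = refresh(ddns, "Kamiry.no-ip.com", "163.22.123.111", None)
last = refresh(ddns, "Kamiry.no-ip.com", "163.22.123.123", last)  # new DHCP address
print(ddns.resolve("kamiry.no-ip.com"))
```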
Installing IIS (Internet Information Services)
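IIS is included on the Windows CD. See: installation guide, IIS configuration.
Microsoft IIS is so widespread, and has had so many security holes, that you may prefer a non-Microsoft WWW server, e.g., Apache, AnalogX, … (see reference documents).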
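To experiment before installing IIS or Apache, Python's standard-library `http.server` module can serve a folder of HTML files. The sketch below (assumption: Python 3.7 or later; the helper name `serve_and_fetch` is hypothetical) serves one page from a temporary folder on a free local port and fetches it back, which is the same request/response cycle a browser performs:

```python
import tempfile
import threading
import urllib.request
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path


def serve_and_fetch(html):
    """Serve `html` as index.html from a temporary folder, then fetch it back."""
    with tempfile.TemporaryDirectory() as site_dir:
        Path(site_dir, "index.html").write_text(html, encoding="utf-8")
        # Serve files from site_dir; port 0 asks the OS for any free port.
        handler = partial(SimpleHTTPRequestHandler, directory=site_dir)
        server = ThreadingHTTPServer(("127.0.0.1", 0), handler)
        threading.Thread(target=server.serve_forever, daemon=True).start()
        try:
            port = server.server_address[1]
            # An opener with no proxies, so the local request goes straight to our server.
            opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
            with opener.open(f"http://127.0.0.1:{port}/index.html") as response:
                return response.read().decode("utf-8")
        finally:
            server.shutdown()
            server.server_close()


print(serve_and_fetch("<h1>My home page</h1>"))
```

From the command line, running `python -m http.server 8000` inside a folder serves it on port 8000. The Python documentation notes that `http.server` is not recommended for production, so treat it as a learning tool rather than a replacement for a real server.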
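HW#3
- Set up a WWW server on your own computer.
- Email me the server's domain name.
- Put your personal Web page on your own computer.
- The server must be running during the times specified by the TA.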
Searching the Web
Ref: Chapter 13 in “Modern Information Retrieval”
Ricardo Baeza-Yates and Berthier Ribeiro-Neto
Outline
- Measuring the Web
- Methods for searching the Web
  - Search engines
  - Web directories
Searching the Web
- The WWW started in 1989.
- The textual data alone is estimated to be on the order of one terabyte.
- Goal: how can we efficiently manage, retrieve, and filter information from the Web?
Challenges
- Distributed data: data spans many computers interconnected without a predefined topology.
- High percentage of volatile data: 40% of the Web changes every month.
- Large volume.
- Unstructured and redundant data: 30% of Web pages are (near) duplicates.
- Heterogeneous data: different languages.
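One common way to detect (near) duplicates, not described in the slides, is shingling: break each page into overlapping k-word sequences and compare the sets with Jaccard similarity. A minimal sketch in Python, assuming whitespace tokenization and k = 3 (both arbitrary choices here):

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in `text`."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def resemblance(a, b, k=3):
    """Jaccard similarity: size of the shingle intersection over size of the union."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty texts are treated as identical
    return len(sa & sb) / len(sa | sb)


doc1 = "forty percent of the web changes every month"
doc2 = "forty percent of the web changes each month"
print(resemblance(doc1, doc2))
```

Pages whose resemblance exceeds some threshold (the value is a tuning choice) would be treated as near duplicates.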
Measuring the Web
(Diagram: Internet, WWW, Web servers, URLs)
- 1998: about 3 million Web servers.
- Number of Web servers ≈ 1/10 of the number of computers on the Internet.
Measuring the Web (cont.)
- 1998: 5 KB per Web page on average.
- 300M (300 million) Web pages.
- 300M × 5 KB = 1.5 terabytes.
- Growing at a rate of 20M pages per month.
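Worked out with decimal units (an assumption, since the slide does not say: 1 KB = 10^3 bytes, 1 TB = 10^12 bytes), the size estimate checks out, and the same average page size turns the growth rate into a monthly volume:

```latex
\begin{align*}
300\times10^{6}\ \text{pages} \times 5\times10^{3}\ \tfrac{\text{bytes}}{\text{page}}
  &= 1.5\times10^{12}\ \text{bytes} = 1.5\ \text{TB}\\
20\times10^{6}\ \tfrac{\text{pages}}{\text{month}} \times 5\times10^{3}\ \tfrac{\text{bytes}}{\text{page}}
  &= 10^{11}\ \tfrac{\text{bytes}}{\text{month}} = 100\ \tfrac{\text{GB}}{\text{month}}
\end{align*}
```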
Growth of the Web
(Chart: number of Web pages and Web sites, in millions, per year, 1996 to 1998.)
Methods for searching the Web
- Search engines: index Web documents as a full-text database, e.g., AltaVista, Google, …
- Web directories (portal directories): classify selected Web documents by subject, e.g., Yahoo!
Search engines: concept
- Model the Web as a database.
- All queries must be answered without accessing the Web pages themselves.
(Diagram: users send queries to the database.)
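A standard way to answer queries without touching the pages is an inverted index, which maps each term to the pages that contain it. The Python sketch below assumes a tiny hypothetical set of pages and supports only AND queries:

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase alphanumeric words (a simplifying assumption).
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each term to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


def search_and(index, query):
    """Boolean AND query answered from the index alone, without reading any page."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


pages = {  # hypothetical example pages
    "http://a.example/": "Web servers and search engines",
    "http://b.example/": "Search the Web with Google",
    "http://c.example/": "Installing IIS",
}
index = build_index(pages)
print(sorted(search_and(index, "search web")))
```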
Search engines (cont.): AltaVista (www.altavista.com)
- 20 multi-processor machines, each with 130 GB of RAM and over 500 GB of disk space.
- 75% of these resources are used by the query engine.
The top search engines
International:
- Google (www.google.com)
- Yahoo! (www.yahoo.com)
- AltaVista (www.altavista.com)
- Inktomi (www.inktomi.com)
Statistics on search engines: www.searchenginewatch.com, http://imt.net/~notess/search
Taiwan:
- Yahoo!/Kimo (uses Google)
- Openfind (www.openfind.com.tw) (Prof. Sun Wu, National Chung Cheng University)
- Yam (www.yam.com.tw)
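Search engines (cont.): centralized crawler-indexer architecture
(Diagram: the crawler fetches pages from the Web; the indexer builds the index database from them; users send queries through the user interface to the query engine, which answers them from the index database.)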
User interface
- Query interface: keywords, Boolean operators.
- Answer interface: ranks the retrieved pages using
  - statistics about term occurrences within the document,
  - popularity,
  - hyperlink information.
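The "statistics about term occurrences" can be illustrated with TF-IDF, one common weighting scheme (the slides do not say which formula any engine used). In the sketch below, a page scores higher when the query terms occur often in it (tf) and rarely in other pages (idf); the smoothed idf `log(1 + N/df)` is one of several variants, and popularity and hyperlink signals (such as link analysis like PageRank) are not modeled. Page names are hypothetical:

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase alphanumeric words (a simplifying assumption).
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_tfidf(pages, query):
    """Return page keys sorted by TF-IDF score for `query`, best first."""
    n = len(pages)
    counts = {url: Counter(tokenize(text)) for url, text in pages.items()}
    # Document frequency: the number of pages containing each term.
    df = Counter()
    for term_counts in counts.values():
        df.update(term_counts.keys())
    scores = {}
    for url, term_counts in counts.items():
        score = 0.0
        for term in tokenize(query):
            tf = term_counts[term]
            if tf:
                # Smoothed idf: rarer terms get more weight, and it is never zero.
                score += tf * math.log(1 + n / df[term])
        if score > 0:
            scores[url] = score
    # Highest score first; ties broken by key so the order is deterministic.
    return sorted(scores, key=lambda url: (-scores[url], url))


pages = {
    "a": "web web web search",
    "b": "web search engine",
    "c": "cooking recipes",
}
print(rank_tfidf(pages, "web"))
```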
Crawler
- Also called robots, spiders, wanderers, walkers, and knowbots.
- In spite of their names, a crawler runs on a local system and sends requests to remote Web servers.
- Method: start with a set of URLs, and from there extract other URLs.
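The crawling method above is a graph traversal. A minimal breadth-first sketch in Python, assuming a hypothetical `fetch` function (here a dictionary lookup standing in for HTTP requests to remote servers) and using the standard `html.parser` to extract links:

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_urls, fetch, max_pages=100):
    """Breadth-first crawl from `seed_urls`; `fetch(url)` returns HTML or None."""
    queue = deque(seed_urls)
    seen = set(seed_urls)
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # broken link: nothing to parse
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links against the page URL and drop "#fragment" parts.
            absolute, _ = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


site = {  # hypothetical pages standing in for the Web
    "http://a.example/": '<a href="/b.html">B</a> <a href="http://a.example/c.html#top">C</a>',
    "http://a.example/b.html": '<a href="c.html">C again</a> <a href="missing.html">broken</a>',
    "http://a.example/c.html": "<p>no links</p>",
}
print(crawl(["http://a.example/"], site.get))
```

A real crawler would also honor robots.txt (Python's `urllib.robotparser` can read it) and pause between requests to the same server.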
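Crawler (cont.)
- Because pages are visited at different times, the index of a search engine can be thought of as analogous to the stars in the sky: what we see is a collection of snapshots taken at different moments, some of which may no longer exist.
- Invalid (broken) links in search engine indexes range from 2% to 9%.
- The fastest crawlers (as of 1998) can traverse up to 10M Web pages per day, so covering 300M pages takes 300M / 10M = 30 days.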
Web directories
- Classify Web pages by category.
- Directories are hierarchical taxonomies that classify human knowledge.
- Yahoo! has close to 1M pages classified.
- How are pages classified?
  - Pages have to be submitted to the Web directory.
  - Classification is done manually by a few people; automatic classification is not yet mature.
  - Not every page is classified.
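A hierarchical directory is naturally a tree of categories. The Python sketch below is a hypothetical structure (the class `Category`, the example paths and the URLs are illustrative only); pages are filed by hand under a category path, as in manual classification:

```python
class Category:
    """A node in a hierarchical Web directory (hypothetical structure)."""

    def __init__(self, name):
        self.name = name
        self.children = {}  # subcategory name -> Category
        self.pages = []     # URLs classified directly under this category

    def add_page(self, path, url):
        """File `url` under a path such as "Computers/Internet/WWW"."""
        node = self
        for part in path.split("/"):
            if part not in node.children:
                node.children[part] = Category(part)
            node = node.children[part]
        node.pages.append(url)

    def all_pages(self):
        """Pages in this category and all of its subcategories."""
        result = list(self.pages)
        for child in self.children.values():
            result.extend(child.all_pages())
        return result


root = Category("Top")
root.add_page("Computers/Internet/WWW", "http://www.example.org/www-intro")
root.add_page("Computers/Internet", "http://www.example.org/internet")
root.add_page("Arts/Music", "http://music.example.com/")
print(root.children["Computers"].all_pages())
```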
Some Web directories
Web directory    URL                 Web sites (K)   Categories
Yahoo!           www.yahoo.com       750
LookSmart        www.looksmart.com   300             24
Lycos Subjects   a2z.lycos.com       50
eBLAST           www.eblast.com      125
NewHoo           www.newhoo.com      100             23
Magellan         www.mckinley.com    60
Netscape         www.netscape.com
Snap             www.snap.com
Lab on search engines: today, 1:00 to 3:00
Final typing test: 10/20. Students who do not meet the standard will have 10 points deducted from their semester total.