Collecting useful information from the web with open source tools



DESCRIPTION

My lightning talk slides at COSCUP 2011, Taipei


Collecting useful information from the web with open source tools

@sammyfung

Hong Kong

First chairman of the Hong Kong Linux User Group

opensource.hk webmaster

How do programmers solve problems in daily life?

Coding! They write programs!

A lot of popular, large websites in Hong Kong run on II$.

They are very slow when you are using them!

Visiting the same websites manually, again and again, to check for the latest updates.

Would you still be addicted to Plurk/Twitter without automatic alerts for new responses and replies?

What do you need?

Regular Expression

HTML Parser

Web Crawling Framework
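The first two tools can be sketched with nothing but the Python standard library. The example below is only an illustration: the sample page, the row pattern, and the names `matches_by_regex`, `ListItemCollector` and `items_by_parser` are all assumptions made up for this sketch, not material from the talk.

```python
import re
from html.parser import HTMLParser

# Hypothetical page fragment, invented for this illustration.
SAMPLE_HTML = """
<html><body>
<h1>Match schedule</h1>
<ul>
  <li>2011-08-20 19:30 Team A vs Team B</li>
  <li>2011-08-21 21:00 Team C vs Team D</li>
</ul>
</body></html>
"""

# Approach 1: a regular expression that captures date, time and fixture.
ROW_PATTERN = re.compile(r"<li>(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}) (.+?)</li>")


def matches_by_regex(html):
    """Return a list of (date, time, fixture) tuples found in <li> rows."""
    return ROW_PATTERN.findall(html)


# Approach 2: an HTML parser that collects the text of every <li> element.
class ListItemCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_item = False
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.in_item = True
            self.items.append("")

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_item = False

    def handle_data(self, data):
        # Only keep text that appears inside an open <li> element.
        if self.in_item:
            self.items[-1] += data


def items_by_parser(html):
    """Return the stripped text of each <li> element in the page."""
    collector = ListItemCollector()
    collector.feed(html)
    return [item.strip() for item in collector.items]


if __name__ == "__main__":
    print(matches_by_regex(SAMPLE_HTML))
    print(items_by_parser(SAMPLE_HTML))
```

A regular expression is quick to write but breaks as soon as the markup changes shape; a parser tolerates attribute and whitespace changes. A crawling framework adds the fetching, scheduling and item handling around either approach.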

scrapy.org

About Scrapy

Written in Python

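# Selector API as shipped with Scrapy at the time of the talk (2011).
from scrapy.selector import HtmlXPathSelector
# Inside a spider callback; TorrentItem is an Item subclass defined elsewhere.
x = HtmlXPathSelector(response)
torrent = TorrentItem()
torrent['url'] = response.url
# extract() returns a list of the matched text nodes.
torrent['name'] = x.select("//h1/text()").extract()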

<h1>Hello World</h1>
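To see what that XPath query does without installing Scrapy, here is a minimal sketch that uses the limited XPath support in Python's standard `xml.etree.ElementTree`. It is not Scrapy's selector: it only works on well-formed markup, and the sample page, the example URL and the names `extract_h1_text` and `build_item` are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page wrapping the <h1> from the slide.
SAMPLE_PAGE = "<html><body><h1>Hello World</h1></body></html>"


def extract_h1_text(page):
    """Return the text of every <h1> element as a list, like extract()."""
    root = ET.fromstring(page)
    # ".//h1" finds <h1> elements at any depth below the root.
    return [h1.text for h1 in root.findall(".//h1")]


def build_item(url, page):
    """Build a plain dict standing in for the item in the slide's code."""
    return {"url": url, "name": extract_h1_text(page)}


if __name__ == "__main__":
    print(build_item("http://example.com/", SAMPLE_PAGE))
    # prints {'url': 'http://example.com/', 'name': ['Hello World']}
```

The result is a list rather than a single string, which mirrors how `extract()` returns every match of the query.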

All of the above are available as open source software!

Problem #1: a lot of popular websites in Hong Kong run on II$.

Developed a schedule of live football matches on cable TV.

Problem #2: some websites don't provide a data API.

Hong Kong Weather Info

@weatherhk

Alerts for tropical cyclones in the Northwest Pacific Ocean

@tctrack @tropicalhk

Path and forecast of active tropical cyclones
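An alert account of this kind implies a simple loop: scrape the page, compare the result with what has already been seen, and announce only what is new. The sketch below shows just the comparison step. It is an assumption about how such a bot could work, not a description of the accounts above; the function name, the `id` and `text` fields and the sample data are hypothetical.

```python
def find_new_alerts(seen_ids, scraped_alerts):
    """Return alerts whose id has not been seen before, and remember them.

    seen_ids is a set that is updated in place, so calling this again with
    the same scraped data returns nothing new.
    """
    new_alerts = []
    for alert in scraped_alerts:
        if alert["id"] not in seen_ids:
            seen_ids.add(alert["id"])
            new_alerts.append(alert)
    return new_alerts


if __name__ == "__main__":
    seen = set()
    # Hypothetical scrape results from two consecutive polls.
    first_poll = [{"id": "storm-a-1", "text": "Storm A has formed"}]
    second_poll = first_poll + [{"id": "storm-a-2", "text": "Storm A intensifies"}]

    print(find_new_alerts(seen, first_poll))   # only the first alert
    print(find_new_alerts(seen, second_poll))  # only the second alert
```

Keeping the set of seen ids between runs (for example in a file) is what stops the same update from being posted twice.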

So make good use of open source tools to solve the problems in your own daily life.

Thank you!

Solving problems with open source.
