Sunday, August 29, 2010

Mapping the world with DataMapper


Page 1: Mapping the world with DataMapper


Page 2: Mapping the world with DataMapper

I am Ted Han.

Page 3: Mapping the world with DataMapper

Nice to meet you!

Page 4: Mapping the world with DataMapper

If you would like a copy of these slides, they are here:

http://cl.ly/6233b0f56bb686e57b74

(or at http://twitter.com/knowtheory)

Page 5: Mapping the world with DataMapper

Labor Rights

• Eight Hours for Work
• Eight Hours for Rest
• Eight Hours for What We Will!

[Pie chart: Work 8, Rest 8, What We Will 8]

This may not be a pattern that hackers are all that familiar with.

Page 6: Mapping the world with DataMapper

We trade our time and expertise for money for 8+ hours a day at work.

Page 7: Mapping the world with DataMapper

But now the 8 hours of our free time are just as valuable to companies as our work time.

Page 8: Mapping the world with DataMapper

Who collects your data?

Do you know what data they collect?

What do you get in return?

Page 9: Mapping the world with DataMapper

What do you get for your Data?

• Google: Gmail, Search
• Apple: iTunes Genius
• Amazon: Recommendations
• Last.fm: Recommendations & Neighbors
• Facebook: ??? (Your friends’ families’ crazy rants)

Page 10: Mapping the world with DataMapper

Companies benefit from our data and can ask and answer questions about our behavior.


Page 11: Mapping the world with DataMapper

We benefit indirectly, but why can’t we benefit directly as well?

Page 12: Mapping the world with DataMapper

We can, if we know where and how to look.


Page 13: Mapping the world with DataMapper

Ruby can help!


Page 14: Mapping the world with DataMapper

Basic Data Mining

• Data Collection
• Data Querying & Manipulation
• Data Analysis

Page 15: Mapping the world with DataMapper

DataMapper will help with these things!

Page 16: Mapping the world with DataMapper

It would be nice to analyze our search histories, but...

Google doesn’t provide an API.

Page 17: Mapping the world with DataMapper

But we can search our Google Chrome histories!

~/Library/Application Support/Google/Chrome/Default/History

(Make a copy of your History file first and work on the copy; sqlite3 databases are easy to corrupt.)
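
The advice to work on a copy can be followed with nothing but Ruby's standard library. The helper below is a minimal sketch, not something from the talk: the name `copy_history` and the `History.copy` file name are made up for illustration, and the source path is simply whatever you pass in.

```ruby
require "fileutils"
require "tmpdir"

# Copy a browser history file into a scratch directory and return the
# path of the copy. The original file is only read, never written to.
# (copy_history and "History.copy" are hypothetical names.)
def copy_history(source_path, dest_dir = Dir.mktmpdir("chrome-history"))
  dest_path = File.join(dest_dir, "History.copy")
  FileUtils.cp(source_path, dest_path)
  dest_path
end

# Example call, using the macOS path from the slide:
#   copy_history(File.expand_path(
#     "~/Library/Application Support/Google/Chrome/Default/History"))
```

Pointing any later database connection at the returned path keeps the live browser profile out of harm's way.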

Page 18: Mapping the world with DataMapper

Once we have a data source, we need to answer yes to at least one of three questions about the format of our source.

Page 19: Mapping the world with DataMapper

• Does a DataMapper Adapter already exist?
• Can you write an adapter?
• Can you write a scraper to import your data?

Page 20: Mapping the world with DataMapper

Does a DataMapper Adapter already exist?

Yep! Google Chrome’s History is an sqlite3 database!


Page 21: Mapping the world with DataMapper

Urls Table

    CREATE TABLE urls(
      id INTEGER PRIMARY KEY,
      url LONGVARCHAR,
      title LONGVARCHAR,
      visit_count INTEGER DEFAULT 0 NOT NULL,
      typed_count INTEGER DEFAULT 0 NOT NULL,
      last_visit_time INTEGER NOT NULL,
      hidden INTEGER DEFAULT 0 NOT NULL,
      favicon_id INTEGER DEFAULT 0 NOT NULL);

Querying requires us to map data out of our source. To do this we have to tell DataMapper what the source schema is.

Page 22: Mapping the world with DataMapper

Url model (naive)

    class Url
      include DataMapper::Resource

      property :id,              Serial  # Integer, :key => true
      property :url,             String
      property :title,           String
      property :visit_count,     Integer, :default => 0
      property :typed_count,     Integer, :default => 0
      property :last_visit_time, Integer, :required => true
      property :hidden,          Integer, :default => 0
      property :favicon_id,      Integer, :default => 0

      has n, :segments
      has n, :visits, :through => :segments
    end

Page 23: Mapping the world with DataMapper

Url model (naive)

(Same model code as on the previous slide, shown again with a callout.)

Callout: Inline Validations

Page 24: Mapping the world with DataMapper

Urls Table

(Same CREATE TABLE statement as on Page 21, shown again with a callout.)

Callout: Database Constraints

Page 25: Mapping the world with DataMapper

Sanity Check

The schemata match! Now let’s test.

    >> Url.first(:url => "http://rubykaigi.org/")
    => #<Url @id=1294 @url="http://rubykaigi.org/" @title="RubyKaigi 2010, August 27-29" @visit_count=8 ...>
    >> Url.count
    => 47007
    >> Url.count("visit_count.lt" => 1)
    => 20
    >> # wat.

Page 26: Mapping the world with DataMapper

Url model (w/ Sanity)

Let’s add some business rule validations:

    class Url
      include DataMapper::Resource

      property :id,              Serial
      property :url,             String,  :format => :url
      property :title,           String
      property :visit_count,     Integer, :min => 1
      property :typed_count,     Integer, :default => 0
      property :last_visit_time, Integer, :required => true
      property :hidden,          Integer, :default => 0
      property :favicon_id,      Integer, :default => 0

      has n, :segments
      has n, :visits, :through => :segments
    end

Page 27: Mapping the world with DataMapper

Data Manipulation

    require 'dm-types' # provides the URI property type

    class Url
      include DataMapper::Resource

      property :id,              Serial
      property :url,             URI,     :format => :url
      property :title,           String
      property :visit_count,     Integer, :min => 1
      property :typed_count,     Integer, :default => 0
      property :last_visit_time, Integer, :required => true
      property :hidden,          Integer, :default => 0
      property :favicon_id,      Integer, :default => 0

      has n, :segments
      has n, :visits, :through => :segments
    end

Page 28: Mapping the world with DataMapper

Data Manipulation

    >> u = Url.first("url.like" => "%rubykaigi%")
    => #<Url @id=1294 @url=#<Addressable::URI:0x81c7a1b0 URI:http://rubykaigi.com/> @title="RubyKaigi 2010, August 27-29" @last_visit_time=12927095498867853 ...>
    >> u.url
    => #<Addressable::URI:0x81c7a1b0 URI:http://rubykaigi.com/>
    >> u.url.host
    => "rubykaigi.com" # oops, .org is canonical
    >> u.url.host = "rubykaigi.org"; u.url
    => #<Addressable::URI:0x81ccfdf4 URI:http://rubykaigi.org/>
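
The session above works on Addressable::URI objects that the dm-types URI property hands back. For readers without DataMapper installed, the same host rewrite can be tried with Ruby's standard-library `URI` class. This is a swapped-in stand-in, not the class used on the slide, and `with_host` is a hypothetical helper name.

```ruby
require "uri"

# Parse a URL string, replace its host, and return the rewritten URL.
# Uses the standard-library URI class rather than Addressable::URI.
def with_host(url, new_host)
  uri = URI.parse(url)
  uri.host = new_host
  uri.to_s
end

puts with_host("http://rubykaigi.com/", "rubykaigi.org")
```

The point is the same as on the slide: once the column is a URI object instead of a String, parts such as the host can be read and changed without hand-written string surgery.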

Page 29: Mapping the world with DataMapper

Data Manipulation

    >> u = Url.first("url.like" => "%rubykaigi%")
    => #<Url @id=1294 @url=#<Addressable::URI:0x81c7a1b0 URI:http://rubykaigi.com/> @title="RubyKaigi 2010, August 27-29" @last_visit_time=12927095498867853 ...>
    >> u.last_visit_time
    => 12927095498867853 # wtf is this?

Page 30: Mapping the world with DataMapper

Urls Table

(Same CREATE TABLE statement as on Page 21; the column in question is last_visit_time INTEGER NOT NULL.)

Not a lot of clues here... Okay, it’s an integer time, but it’s also freaking huge: 12927095498867853?

Page 31: Mapping the world with DataMapper

chromium/src/base/time.h

    // Time represents an absolute point in time, internally represented as
    // microseconds (s/1,000,000) since a platform-dependent epoch. Each
    // platform's epoch, along with other system-dependent clock interface
    // routines, is defined in time_PLATFORM.cc.

Page 32: Mapping the world with DataMapper

chromium/src/base/time_mac.cc

    // Core Foundation uses a double second count since 2001-01-01 00:00:00 UTC.
    // The UNIX epoch is 1970-01-01 00:00:00 UTC.
    // Windows uses a Gregorian epoch of 1601. We need to match this internally
    // so that our time representations match across all platforms. See bug 14734.
    //   irb(main):010:0> Time.at(0).getutc()
    //   => Thu Jan 01 00:00:00 UTC 1970
    //   irb(main):011:0> Time.at(-11644473600).getutc()
    //   => Mon Jan 01 00:00:00 UTC 1601

Examples already in Ruby? Nice.
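
The two Chromium comments give enough to decode the mystery number: it counts microseconds from 1601-01-01 UTC. The sketch below derives the 11644473600-second offset instead of hard-coding it and applies it to the value seen earlier. The constant and method names are illustrative, not taken from Chromium or DataMapper.

```ruby
# Seconds between the 1601 epoch and the Unix epoch (both UTC).
# Time.utc(1601, 1, 1).to_i is negative, so negate it.
WINDOWS_TO_UNIX_OFFSET = -Time.utc(1601, 1, 1).to_i

# Convert a Chrome timestamp (microseconds since 1601-01-01 UTC) into a
# UTC Time. Integer division drops the sub-second part.
def chrome_to_time(chrome_usec)
  Time.at(chrome_usec / 1_000_000 - WINDOWS_TO_UNIX_OFFSET).utc
end

puts WINDOWS_TO_UNIX_OFFSET
puts chrome_to_time(12927095498867853)
```

The result is 03:51:38 UTC on August 24, 2010, which is 12:51:38 at +0900, a plausible time to have been browsing the RubyKaigi site from Japan.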

Page 33: Mapping the world with DataMapper

Url model v2 (lib types)

    class Url
      include DataMapper::Resource

      property :id,              Serial
      property :url,             URI,     :format => :url
      property :title,           String
      property :visit_count,     Integer, :min => 1
      property :typed_count,     Integer, :default => 0
      property :last_visit_time, ChromeEpochTime, :required => true
      property :hidden,          Integer, :default => 0
      property :favicon_id,      Integer, :default => 0

      has n, :segments
      has n, :visits, :through => :segments
    end

Next: write ChromeEpochTime.

Page 34: Mapping the world with DataMapper

chrome_epoch_time.rb

    module DataMapper
      class Property
        class ChromeEpochTime < Integer
          # Stored value (microseconds since 1601) -> Ruby Time
          def load(value)
            return value unless value.respond_to?(:to_i)
            ::Time.at((value/10**6)-11644473600)
          end

          # Ruby value -> stored value (microseconds since 1601)
          def dump(value)
            case value
            when ::Integer, ::Time then (value.to_i + 11644473600) * 10**6
            when ::DateTime then (value.to_time.to_i + 11644473600) * 10**6
            end
          end
        end # class ChromeEpochTime
      end # class Property
    end # module DataMapper
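
One property of the `load`/`dump` pair above is worth knowing: `load` uses integer division, so the microsecond part is discarded and a loaded value does not dump back to the exact original integer. If exact round trips matter, a variant along the following lines keeps the microseconds. It is a standalone sketch with hypothetical method names, not part of the talk's code and not wired into DataMapper.

```ruby
CHROME_EPOCH_OFFSET = 11_644_473_600 # seconds from 1601-01-01 to 1970-01-01 (UTC)

# Microseconds since 1601 -> Time, keeping the microsecond remainder.
def load_chrome_time(chrome_usec)
  seconds, usec = chrome_usec.divmod(1_000_000)
  Time.at(seconds - CHROME_EPOCH_OFFSET, usec)
end

# Time -> microseconds since 1601, including Time#usec.
def dump_chrome_time(time)
  (time.to_i + CHROME_EPOCH_OFFSET) * 1_000_000 + time.usec
end

raw = 12927095498867853
puts dump_chrome_time(load_chrome_time(raw)) == raw
```

For read-only analysis, as in this talk, the truncating version is perfectly adequate; the difference only matters if records are written back.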

Page 35: Mapping the world with DataMapper

Data Manipulation

    >> u = Url.first("url.like" => "%rubykaigi.com%")
    => #<Url @id=42846 @url=#<Addressable::URI:0x81e232f0 URI:http://rubykaigi.com/> @title="RubyKaigi 2010, August 27-29" @last_visit_time=Tue Aug 24 12:51:38 +0900 2010 ...>
    >> u.last_visit_time
    => Tue Aug 24 12:51:38 +0900 2010

Page 36: Mapping the world with DataMapper

Histograms, yay! (Analysis)

    hour_histogram = Hash.new(0)
    # each, not map: we only want the side effect of counting
    Visit.all.each do |v|
      hour_histogram[v.visit_time.hour] += 1
    end

Page 37: Mapping the world with DataMapper

Over what span of time?

    >> Visit.first.visit_time
    => Fri May 28 17:04:39 +0900 2010
    >> Visit.last.visit_time
    => Thu Aug 26 01:51:32 +0900 2010
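
To put a number on that span, two Time values can simply be subtracted. The snippet below is a small sketch that rebuilds the two timestamps shown above by hand (assuming both are at +0900, as printed in the other sessions); `span_in_days` is a hypothetical helper name.

```ruby
# Subtracting two Time objects yields the difference in seconds (a Float).
def span_in_days(first_time, last_time)
  (last_time - first_time) / 86_400.0
end

first_visit = Time.new(2010, 5, 28, 17, 4, 39, "+09:00")
last_visit  = Time.new(2010, 8, 26, 1, 51, 32, "+09:00")
puts format("%.1f days", span_in_days(first_visit, last_visit))
```

So the history covers roughly 89 days, just under three months of browsing.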

Page 38: Mapping the world with DataMapper

[Chart: Aggregate Browsing by Hour. x-axis: hour of day (Midnight, 3am, 6am, 9am, Noon, 3pm, 6pm, 9pm); y-axis: visits, 0 to 8000]
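
The chart itself was drawn outside Ruby, but a rough text version of an hour-of-day histogram is easy to produce. The sketch below uses plain Time objects as stand-ins for `Visit#visit_time` values; the sample data and both method names are invented for illustration.

```ruby
# Count how many timestamps fall into each hour of the day (0..23).
def hour_histogram(times)
  counts = Hash.new(0)
  times.each { |t| counts[t.hour] += 1 }
  counts
end

# Render one line per hour, scaling the tallest bar to `width` characters.
# Expects a Hash with a default of 0, as built by hour_histogram.
def render_histogram(counts, width = 40)
  peak = counts.values.max || 1
  (0..23).map do |hour|
    bar = "#" * (counts[hour] * width / peak)
    format("%02d:00 %-#{width}s %d", hour, bar, counts[hour])
  end
end

# Hypothetical sample data standing in for real visit times.
sample = [Time.utc(2010, 8, 24, 9), Time.utc(2010, 8, 24, 9, 30),
          Time.utc(2010, 8, 25, 21)]
puts render_histogram(hour_histogram(sample))
```

Iterating over `0..23` rather than over the hash keys keeps empty hours visible, which matters when the interesting feature of the chart is the quiet stretch overnight.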

Page 39: Mapping the world with DataMapper

More Histograms, yay!

    ruby_doc = Url.all("url.like" => "%ruby-doc%")
    hour_histogram = Hash.new(0)
    ruby_doc.visits.each do |v|
      hour_histogram[v.visit_time.hour] += 1
    end

Page 40: Mapping the world with DataMapper

[Chart: Aggregate Browsing for ruby-doc.org by Hour. x-axis: hour of day (Midnight, 3am, 6am, 9am, Noon, 3pm, 6pm, 9pm); y-axis: visits, 0 to 50]

Page 41: Mapping the world with DataMapper

But what happens when we have a data source which isn’t well behaved?

Page 42: Mapping the world with DataMapper

"Does Edge have an anti-PS3 bias?"
http://arstechnica.com/civis/viewtopic.php?f=22&t=62024

Last year a thread on Ars Technica titled "Does Edge have an anti-PS3 bias?" erupted into a flame war between PS3 fans and Xbox 360 fans over whether the PS3 was receiving unfair treatment, particularly when Edge’s review scores were held up against a game’s score on metacritic.com.

Page 43: Mapping the world with DataMapper

Helpfully, the thread title is a testable hypothesis


Page 44: Mapping the world with DataMapper

Are a review outlet’s aggregate game scores (dis)similar to the aggregate Metascore for those same games?

Page 45: Mapping the world with DataMapper

Unfortunately, Metacritic also has no API.

Page 46: Mapping the world with DataMapper

Time for the Poor Man’s API: HTML scraping :(

Page 47: Mapping the world with DataMapper

Save me Nokogiri!


Page 48: Mapping the world with DataMapper

Yeah, that’s not pretty.

    require 'nokogiri'
    require 'open-uri'

    def scores_for(game)
      # Accept either a URL string or an already-parsed Nokogiri document.
      game_page = case
        when (game.is_a? String)
          begin
            Nokogiri::HTML(URI.open(game))
          rescue
            puts "[FAIL] Failed to open #{game}"
            return nil # was `break`, which is not valid outside a loop
          end
        when (game.is_a? Nokogiri::HTML::Document)
          game
        else
          raise StandardError, "you need to provide either a url, or a nokogiri document"
        end

      # The page title carries the game title, platform and year.
      page_title = game_page.css('title').text
      junk, title, platform, year = page_title.match(/^(.+)\s*\((#{PLATFORMS.join("|")}): (\d+)\): Reviews$/).to_a
      title.strip!
      metascore = game_page.css('table#scoretable img').select{ |i| /Metascore:/ =~ i.attributes['alt'] }.first.attributes['alt'].to_s.split.last
      puts "[WIN] #{title} on the #{platform} (#{year}) has a score of #{metascore}"

      #review_count = game_page.to_s.match(/based on <b>(\d+) reviews/).to_a.last
      reviews = game_page.css('div.scoreandreview')

      # Sanity check: reviews found must equal the count the page claims.
      review_count = reviews.size
      checksum = game_page.to_s.match(/based on <b>(\d+) reviews/).to_a.last.to_i
      checksum_message = "Number of Reviews on the page not equal to the claimed number of reviews"
      raise StandardError, checksum_message unless review_count == checksum

      scores = reviews.map do |review|
        score = review.css('div.criticscore').text
        pub = review.css('span.publication').text
        [score,pub]
      end

      return {
        :title => title.strip,
        :metascore => metascore,
        :platform => platform,
        :publish_year => year,
        :reviews => scores
      }
    end
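
The densest line in the scraper is the regular expression that pulls the title, platform and year out of the page's title tag. It can be exercised on its own without Nokogiri or a network connection. In the sketch below the `PLATFORMS` list and the sample title string are assumptions, since the slides never show either; only the pattern itself comes from the scraper.

```ruby
# Hypothetical platform list; the slides never show the PLATFORMS constant.
PLATFORMS = %w[ps3 xbox360 wii].freeze

# Same pattern as in the scraper: "<title> (<platform>: <year>): Reviews"
TITLE_PATTERN = /^(.+)\s*\((#{PLATFORMS.join("|")}): (\d+)\): Reviews$/

# Split a page title into its parts, or return nil when it does not match.
def parse_page_title(page_title)
  match = TITLE_PATTERN.match(page_title)
  return nil unless match

  title, platform, year = match.captures
  { :title => title.strip, :platform => platform, :publish_year => year.to_i }
end

p parse_page_title("Some Game (ps3: 2009): Reviews") # made-up title
p parse_page_title("Not a review page")
```

Returning nil for a non-matching title is a design choice of this sketch; the scraper as written would instead fail with a NoMethodError on `title.strip!`, which is arguably fine for a one-off import script but worth knowing about.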

Page 49: Mapping the world with DataMapper

But it works!

<3 Nokogiri

Page 50: Mapping the world with DataMapper

Models

    class Game
      include DataMapper::Resource

      property :id,           Serial
      property :title,        String, :length => 255
      property :platform,     String
      property :release_date, DateTime
      property :esrb_rating,  String
      property :metascore,    Float
      property :review_count, Integer
      property :created_at,   DateTime
      property :updated_at,   DateTime

      class Review
        include DataMapper::Resource

        property :game_id,             Integer, :key => true
        property :review_publisher_id, Integer, :key => true
        property :score,               Integer

        belongs_to :review_publisher
        belongs_to :game
      end

      class Developer
        include DataMapper::Resource

        property :id,   Serial
        property :name, String, :length => 255

        has n, :games
      end
    end

    class ReviewPublisher
      include DataMapper::Resource

      property :id,   Serial
      property :name, String, :length => 255

      has n, :reviews, :model => "Game::Review"
      has n, :games, :through => :reviews
    end

Page 51: Mapping the world with DataMapper

Student’s T-Test (Analysis!)

    def t_value(prop1, collection1, prop2, collection2)
      c1_std   = collection1.std(prop1)
      c1_avg   = collection1.avg(prop1)
      c1_count = collection1.count
      c2_std   = collection2.std(prop2)
      c2_avg   = collection2.avg(prop2)
      c2_count = collection2.count

      (c1_avg - c2_avg) / Math.sqrt((c1_std**2 / c1_count) + (c2_std**2 / c2_count))
    end
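
The statistic computed by `t_value` is the difference of two means divided by the square root of the summed per-sample variances over their counts, the unequal-variance form often called Welch's t. The sketch below computes the same quantity on plain arrays so it can be run without a database. The helper names are illustrative, and the n - 1 variance denominator is an assumption, since the slide does not show how `std` is computed.

```ruby
def mean(values)
  values.sum.to_f / values.size
end

# Sample variance with the n - 1 denominator (an assumption; the slide's
# `std` aggregate does not say which denominator it uses).
def sample_variance(values)
  m = mean(values)
  values.sum { |v| (v - m)**2 } / (values.size - 1)
end

# t statistic for two samples with unequal variances. Positive when the
# first sample's mean is larger than the second's.
def welch_t(sample1, sample2)
  standard_error = Math.sqrt(sample_variance(sample1) / sample1.size +
                             sample_variance(sample2) / sample2.size)
  (mean(sample1) - mean(sample2)) / standard_error
end

puts welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
```

With the toy data, the means are 3 and 4, each sample variance is 2.5, and the standard error is 1, so the statistic comes out at -1.0. The sign follows the argument order, which is worth keeping in mind when reading positive and negative values in the results that follow.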

Page 52: Mapping the world with DataMapper

PS3 Reviewers vs Metascore

    outlets = ReviewPublisher.all("games.platform" => "ps3")
    t_scores = outlets.map do |outlet|
      t_value(:metascore, outlet.games(:platform => "ps3"),
              :score, outlet.reviews("game.platform" => "ps3"))
    end # .size => 140

    significant = t_scores.select do |t|
      (t > 1.96 or t < -1.96) and not t.infinite?
    end

    low  = significant.select{ |s| s < -1.96 } # .size => 20
    high = significant.select{ |s| s > 1.96 }  # .size => 10
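
The 1.96 cutoff is the two-tailed 5% critical value under a normal approximation, which is reasonable when each outlet has many reviews. The filtering step can be tried on plain numbers, as in the following sketch. The method name and the sample values are made up, and unlike the slide's code it also discards NaN explicitly (NaN can arise from 0/0 when both the mean difference and the variances are zero).

```ruby
CRITICAL_VALUE = 1.96 # two-tailed 5% cutoff under a normal approximation

# Split t statistics into significantly low and significantly high groups,
# ignoring infinite and NaN values.
def split_significant(t_scores, cutoff = CRITICAL_VALUE)
  finite = t_scores.select(&:finite?)
  { :low  => finite.select { |t| t < -cutoff },
    :high => finite.select { |t| t > cutoff } }
end

result = split_significant([-3.0, 0.5, 2.5, Float::INFINITY, Float::NAN])
puts "low: #{result[:low].size}, high: #{result[:high].size}"
```

In the slide's version NaN values drop out implicitly, because every comparison against NaN is false; filtering with `finite?` first just makes that intent visible.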

Page 53: Mapping the world with DataMapper

Xbox360 Reviewers vs Metascore

    outlets = ReviewPublisher.all("games.platform" => "xbox360")
    t_scores = outlets.map do |outlet|
      t_value(:metascore, outlet.games(:platform => "xbox360"),
              :score, outlet.reviews("game.platform" => "xbox360"))
    end # .size => 169

    significant = t_scores.select do |t|
      (t > 1.96 or t < -1.96) and not t.infinite?
    end

    low  = significant.select{ |s| s < -1.96 } # .size => 37
    high = significant.select{ |s| s > 1.96 }  # .size => 29

Page 54: Mapping the world with DataMapper

What about Edge Magazine?

    >> outlet = ReviewPublisher.first("name.like" => "%Edge%")
    => #<ReviewPublisher @id=36 @name="Edge Magazine">
    >> t = t_value(:metascore, outlet.games(:platform => "ps3"), :score, outlet.reviews("game.platform" => "ps3"))
    => 5.10786212293491
    >> t > 1.96
    => true # Edge has a PRO PS3 bias, not Anti!

Page 55: Mapping the world with DataMapper

There are lots of other possibilities!

What would you like to learn?

Page 56: Mapping the world with DataMapper

Learn about DataMapper perhaps?

http://www.datamapper.org

irc://irc.freenode.net#datamapper

Page 57: Mapping the world with DataMapper

Thanks! Thank you!

@knowtheory

[email protected]