The EigenRumor Algorithm for Ranking Blogs

Advisor: Hsin-Hsi Chen
Speaker: Sheng-Chung Yen (嚴聖筌)

Outline

Motivation
Assumed Blog Structure
Classification of Blog Ranking
The EigenRumor Algorithm
    Community model
    Scores
    Algorithm
Mapping to Blog community
Experiments
Related Works
Conclusion
Future Work
References

Motivation

Approaches to page ranking:

PageRank [2]

HITS (Hyperlink-Induced Topic Search) [3]

Issues:

The number of links to a blog entry is generally very small.

An entry needs time to accumulate in-links before it can earn a higher PageRank score.

Assumed Blog Structure

A blog consists of a top page and a set of blog entries. A blog is generally updated and maintained by a single blogger.

There are links from the top page of the blog to each blog entry and each blog entry has a permanent URI.

Blog entries are frequently added and the notification of updates is, as an option, sent to a ping server.

A mechanism to construct a trackback [3] is provided.

Classification of Blog Ranking

Subject of ranking

Space of ranking

Temporal space of ranking

Semantics of ranking

Source of evaluations collected

The EigenRumor Algorithm –Community model (1/2)

[Figure: agents (1 ~ m) on one side and objects (1 ~ n) on the other. Agent i is connected to object j by an information provisioning link or by an information evaluation link with score eij.]

The EigenRumor Algorithm –Community model (2/2)

When agent i provides (posts) object j, a provisioning link is established from i to j.

When agent i evaluates the usefulness of an existing object j with the scoring value eij, an evaluation link is established from i to j.

The provisioning matrix P = [pij] represents all provisioning links in the universe.

The evaluation matrix E = [eij] represents all evaluation links in the universe.
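
The two matrices can be sketched in code as follows. This is a minimal illustration, not the authors' implementation: the function name `build_matrices`, the 0-based indices, and the convention that a provisioning link is stored as 1 are all assumptions made for the example.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def build_matrices(
    m: int,
    n: int,
    provisions: List[Tuple[int, int]],
    evaluations: List[Tuple[int, int, float]],
) -> Tuple[Matrix, Matrix]:
    """Return (P, E) as dense m x n matrices for m agents and n objects."""
    P = [[0.0] * n for _ in range(m)]
    E = [[0.0] * n for _ in range(m)]
    for i, j in provisions:
        # Assumption: a provisioning link from agent i to object j is stored as 1.
        P[i][j] = 1.0
    for i, j, score in evaluations:
        # e_ij is the scoring value agent i gave to object j.
        E[i][j] = score
    return P, E


# Hypothetical community: 2 agents, 3 objects.
P, E = build_matrices(
    m=2,
    n=3,
    provisions=[(0, 0), (0, 1), (1, 2)],      # agent 0 posts objects 0 and 1
    evaluations=[(1, 0, 1.0), (0, 2, 0.5)],   # agent 1 rates object 0, agent 0 rates object 2
)
```

Row i of each matrix collects everything agent i has provided or evaluated, which is the agent-to-object structure the rest of the algorithm operates on.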

The EigenRumor Algorithm –Scores

Authority score (agent property): indicates to what level agent i provided objects in the past that followed the community direction.

Hub score (agent property): indicates to what level agent i submitted comments (evaluations) that followed the community direction on other past objects.

Reputation score (object property): indicates the level of support object j received from the agents.

The EigenRumor Algorithm –Algorithm (1/4)

Assumptions

The objects that are provided by a “good” authority will follow the direction of the community.

The objects that are supported by a “good” hub will follow the direction of the community.

The agents that provide objects that follow the community direction are “good” authorities of the community.

The agents that evaluate objects that follow the community direction are “good” hubs of the community.

The EigenRumor Algorithm –Algorithm (2/4)

Notations

$a$: Authority Vector

$h$: Hub Vector

$r$: Reputation Vector

$P$: Information Provisioning Matrix

$E$: Information Evaluation Matrix

The EigenRumor Algorithm –Algorithm (3/4)

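$$r = P^T a \tag{1}$$

$$r = E^T h \tag{2}$$

$$a = P r \tag{3}$$

$$h = E r \tag{4}$$

$$r = \alpha P^T a + (1-\alpha) E^T h \tag{5}$$

$$r = \alpha P^T P r + (1-\alpha) E^T E r = (\alpha P^T P + (1-\alpha) E^T E)\, r = S r \tag{6}$$

$$\lambda r = S r \tag{7}$$

Where $S = \alpha P^T P + (1-\alpha) E^T E$; $\lambda$ is the largest eigenvalue of matrix $S$.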
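
A small worked instance may make the eigenvector formulation concrete. The community below is hypothetical (two agents, two objects, each agent posting one object, and agent 1 evaluating object 2 with score 1), and the weight $\alpha = 0.5$ is an assumed value chosen only to keep the arithmetic simple; $\alpha$ denotes the weight between the authority and hub terms and $\lambda$ the largest eigenvalue of $S$.

```latex
P = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad
E = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \qquad
\alpha = 0.5

P^T P = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad
E^T E = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}

S = \alpha P^T P + (1-\alpha) E^T E
  = \begin{pmatrix} 0.5 & 0 \\ 0 & 1 \end{pmatrix}

\lambda = 1, \qquad r = \begin{pmatrix} 0 \\ 1 \end{pmatrix}
\quad\text{since}\quad S r = \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \lambda r
```

In this toy case all of the reputation ends up on object 2, the only object that received an evaluation.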

The EigenRumor Algorithm –Algorithm (4/4)

$a^{(0)} = (1, \ldots, 1)^T$

$h^{(0)} = (1, \ldots, 1)^T$

while $r$ changes significantly do

    $r^{(k)} = \alpha P^T a^{(k)} + (1-\alpha) E^T h^{(k)}$

    $r^{(k+1)} = r^{(k)} / \|r^{(k)}\|_2$

    $a^{(k+1)} = P r^{(k+1)}$

    $h^{(k+1)} = E r^{(k+1)}$

end while
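
The iteration on this slide can be sketched in plain Python as below. It is an illustrative sketch rather than the authors' code: the function name `eigenrumor`, the default weight `alpha=0.5`, the tolerance `tol`, the iteration cap `max_iter`, and the use of the largest component-wise change as the test for "changes significantly" are all assumptions.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def mat_vec(M: Matrix, v: Vector) -> Vector:
    """Compute M v for an m x n matrix and a length-n vector."""
    return [sum(x * y for x, y in zip(row, v)) for row in M]


def mat_t_vec(M: Matrix, v: Vector) -> Vector:
    """Compute M^T v for an m x n matrix and a length-m vector."""
    return [sum(M[i][j] * v[i] for i in range(len(M))) for j in range(len(M[0]))]


def eigenrumor(
    P: Matrix, E: Matrix, alpha: float = 0.5, tol: float = 1e-9, max_iter: int = 1000
) -> Tuple[Vector, Vector, Vector]:
    """Return (reputation r, authority a, hub h) after iterating to convergence."""
    m = len(P)
    a = [1.0] * m  # a(0) = (1, ..., 1)^T
    h = [1.0] * m  # h(0) = (1, ..., 1)^T
    r: Vector = []
    r_prev = None
    for _ in range(max_iter):
        pa = mat_t_vec(P, a)
        eh = mat_t_vec(E, h)
        r = [alpha * x + (1 - alpha) * y for x, y in zip(pa, eh)]
        norm = math.sqrt(sum(x * x for x in r))
        if norm == 0.0:
            break  # no links at all, so there is nothing to rank
        r = [x / norm for x in r]  # normalise r by its 2-norm
        a = mat_vec(P, r)
        h = mat_vec(E, r)
        # Assumed stopping rule: largest component-wise change below tol.
        if r_prev is not None and max(abs(x - y) for x, y in zip(r, r_prev)) < tol:
            break
        r_prev = r
    return r, a, h


# Hypothetical community: each of 3 agents posts one object;
# agents 0 and 2 both evaluate object 1, agent 1 evaluates object 2.
P = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
E = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
r, a, h = eigenrumor(P, E)
```

In this example object 1, which is supported by two agents, ends up with the highest reputation, and agent 1, who posted it, with the highest authority score.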

Mapping to Blog community (1/3)

Links from the top page of a blog site to its blog entries => information provisioning links.

Links to blog entries in other blogs => information evaluation links.

(Forward) trackbacks => indicate the interest of the blogger.

(Backward) trackbacks => ignored, since they are often generated by spamming.

Mapping to Blog community (2/3)

The basic algorithm does not normalize the information provisioning matrix P or the information evaluation matrix E.

Problem: a user who creates many blog accounts and interlinks them can inflate the scores.

Mapping to Blog community (3/3)

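Solutions:

Normalization function 1:

$$P' = [p'_{ij}] \ (i = 1..m,\ j = 1..n), \qquad p'_{ij} = \frac{1}{|P_i|} \tag{7}$$

$$E' = [e'_{ij}] \ (i = 1..m,\ j = 1..n), \qquad e'_{ij} = \frac{1}{|E_i|} \tag{8}$$

$|P_i|$ and $|E_i|$ are the total numbers of objects provided and evaluated by agent $i$.

Normalization function 2 (longevity factor):

$$P^{(t)} = [p^{(t)}_{ij}], \qquad p^{(t)}_{ij} = \frac{\delta_p^{\,t - time(p_{ij})}}{\sum_{j=1..n} \delta_p^{\,t - time(p_{ij})}} \tag{9}$$

$$E^{(t)} = [e^{(t)}_{ij}], \qquad e^{(t)}_{ij} = \frac{\delta_e^{\,t - time(e_{ij})}}{\sum_{j=1..n} \delta_e^{\,t - time(e_{ij})}} \tag{10}$$

$t$: current time; $time(x)$: the time when link $x$ was created; $\delta_p$, $\delta_e$ are damping factors with range [0,1].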
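
The two normalization functions could be sketched as follows. This is a hedged illustration only: it assumes that function 1 gives every link of agent i the weight 1 divided by the number of that agent's links, and that function 2 weights each link by a damping factor raised to the link's age and then rescales each agent's row to sum to 1. The names `normalize_rows` and `longevity_rows`, the parameter `delta`, and the use of `None` for "no link" are hypothetical.

```python
from typing import List, Optional

Matrix = List[List[float]]


def normalize_rows(M: Matrix) -> Matrix:
    """Normalization function 1 (assumed form): each link of agent i gets weight 1/|M_i|."""
    out: Matrix = []
    for row in M:
        count = sum(1 for x in row if x != 0)  # number of objects linked by this agent
        out.append([1.0 / count if x != 0 else 0.0 for x in row])
    return out


def longevity_rows(
    link_times: List[List[Optional[float]]], t: float, delta: float
) -> Matrix:
    """Normalization function 2 (assumed form): weight delta**(t - time(link)), row-normalised.

    link_times[i][j] is the creation time of the link from agent i to object j,
    or None when no such link exists.
    """
    out: Matrix = []
    for row in link_times:
        raw = [delta ** (t - ts) if ts is not None else 0.0 for ts in row]
        total = sum(raw)
        out.append([w / total if total else 0.0 for w in raw])
    return out


# Hypothetical data: agent 0 has two links, agent 1 has one.
N = normalize_rows([[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]])

# One agent with an old link (created at time 0) and a fresh one (time 2), seen at t = 2.
W = longevity_rows([[0.0, 2.0, None]], t=2.0, delta=0.5)
```

Under these assumptions an agent's total weight stays at 1 no matter how many links it creates, which limits the gain from opening many interlinked accounts, and with `delta` below 1 older links count for less than recent ones.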

Experiments (1/3)

The database of this system holds 9,280,000 entries from 30,500 blog sites (2004/10/16 ~ 2005/02/03).

Original:

1,520,000 (16.3%) entries have one or more hyperlinks.

116,000 (1.25%) entries link to other blogs.

107,000 (1.15%) entries are referred to by other blogs.

Experiments (2/3)

Applying the EigenRumor algorithm:

36,200 bloggers have at least one blog entry linked from other blogs.

28,300 (9.28%) bloggers have nonzero authority scores => 862,000 (9.28%) entries have nonzero reputation scores.

Experiments (3/3)

Face-to-face user survey (40 guests, Feb. 2005)

Best result   EigenRumor   In-link   TFIDF      Not determined
Queries       18 (45%)     2 (5%)    1 (2.5%)   19 (48%)

Related Works

iRank

Technorati provided a commercial blog search.

EigenRumor algorithm:

Agent-to-object links, instead of page-to-page or agent-to-agent links.

Normalization of links.

Dynamic structure of links.

Conclusion

The important feature of the algorithm is that it widens the coverage of blog entries that are assigned a score, compared with scoring by static link analysis alone.

Future Work

The problem of spamming.

How to choose a better ranking algorithm for a specific keyword?

References

[1] K. Fujimura, T. Inoue, and M. Sugisaki, “The EigenRumor Algorithm for Ranking Blogs,” Nippon Telegraph and Telephone, 10 May 2005.

[2] S. Brin and L. Page, “The Anatomy of a Large-scale Hypertextual Web Search Engine,” in Proceedings of the 7th International World Wide Web Conference, 1998.

[3] Wikipedia, http://en.wikipedia.org/.