Research work 2011

======================================================================

Research Report

Fei Hao

2011‐1‐1

======================================================================

1. Done list

1) Determination of the Parameter in the Linear Threshold Model

Each node u chooses a threshold θ_u uniformly at random from the interval [0,1]. This represents the weighted fraction of u's neighbors that must become active in order for u to become active.

Thus, the thresholds θ_u intuitively represent the different latent tendencies of nodes to adopt the innovation when their neighbors do. The fact that they are randomly selected is intended to model our lack of knowledge of their values.
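The activation rule described above can be sketched in a few lines. This is only a minimal illustration, not the implementation used in these reports: the function name, the dictionary-based graph representation and the `weights` argument (influence weight b_uv of u on v, assumed to sum to at most 1 over the neighbors of each v) are assumptions made for the example.

```python
import random

def simulate_linear_threshold(neighbors, weights, seeds, rng=None):
    """Run one linear threshold cascade and return the set of active nodes.

    neighbors: dict node -> list of neighbor nodes (assumed representation)
    weights:   dict (u, v) -> influence weight of u on v (assumed to sum
               to at most 1 over the neighbors u of each v)
    seeds:     initially active nodes
    """
    rng = rng or random.Random()
    # Each node draws its threshold theta_u at random from [0, 1).
    theta = {u: rng.random() for u in neighbors}
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for v in neighbors:
            if v in active:
                continue
            # Weighted fraction of v's neighbors that are already active.
            pressure = sum(weights.get((u, v), 0.0)
                           for u in neighbors[v] if u in active)
            if pressure >= theta[v]:
                active.add(v)
                changed = True
    return active
```

Averaging `len(simulate_linear_threshold(...))` over many runs gives a simulation-based estimate of the influence spread of a seed set.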

2) Extracted the data set from Zachary's karate club network. It is a test network containing 34 nodes and 78 edges.

Each node represents a club member.

Each edge indicates that the two members frequently take part in activities together.

Task 1: Based on the proposed social potential, I calculated the shortest path and the shortest distance between any two nodes.

[Figure: Distribution of Shortest Distance (x-axis: shortest distance, 1 to 5)]

3) Seminar Preparation:

Title: Community-based Greedy Algorithm for Mining Top-K Influential Nodes in Mobile Social Networks

Basic idea: Divide a network into communities, and then choose communities to find top-k influential nodes within communities.

Application: Apply the influence maximization algorithm to Mobile Social Networks to find the top-k influential users.

• Community Combination

The combination entropy of community Cl to Cm is denoted CoEntropy.

If CoEntropy > θ, then combine Cl into Cm.

[Figure: Community Combination of Cl and Cm]

2. To do list

1) Formulism: Establish the mathematical model for my idea.

2) Implement the degree-based influence maximization algorithm using Zachary's karate club network.

======================================================================

Research Report

Fei Hao

2011‐1‐15

======================================================================

3. Done list

1) Problem definition

Social Potential

Given a social network N=(V,E), where V={v1,v2,…,vn} is the set of nodes and E is the set of edges.

The potential of a node vi is defined as:

$\varphi(v_i) = \frac{1}{n}\sum_{j=1}^{n} m_j \, e^{-(d_{ij}/\sigma)^2}, \quad j \in \{x \mid d_{xi} \le 6\}$

where mj is the mass of vj, describing the activity of the node; σ reflects the influence range; dij is the shortest distance between nodes vj and vi.

Algorithm:

For each node i in SN
{
    Shortestpath[i][] = Dijkstra(i);
    P(i) := 0;
    For each node j in SN
        If shortestpath(i,j) < 6
            P(i) := P(i) + Degree(j)/sum(Degree) * EXP(-(shortestpath(i,j)/sigma)^2)   // Social Potential definition
}
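The algorithm above can be sketched in Python as follows. This is a hedged illustration under stated assumptions: the graph is unweighted, so breadth-first search is swapped in for Dijkstra; the mass of node j is taken to be its share of the total degree; the hop cut-off is exposed as the hypothetical parameter `max_hops`; and a node's own mass (distance 0) is counted. The function names are invented for this example.

```python
import math
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from source in an unweighted graph (BFS instead of Dijkstra)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def social_potential(adj, sigma, max_hops=6):
    """P(i) = sum over j within max_hops of m_j * exp(-(d_ij / sigma)^2)."""
    total_degree = sum(len(nbrs) for nbrs in adj.values())
    # Assumed mass: degree of j divided by the sum of all degrees.
    mass = {v: len(adj[v]) / total_degree for v in adj}
    potential = {}
    for i in adj:
        dist = bfs_distances(adj, i)
        potential[i] = sum(
            mass[j] * math.exp(-((d / sigma) ** 2))
            for j, d in dist.items()
            if d <= max_hops
        )
    return potential
```

On a star graph the hub receives the highest potential, which matches the intuition that well-connected nodes with many close neighbors rank first.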

Mathematical Model:

Input: G(V,E), threshold θ, influence degree I(v) = (# nodes activated by v) / N, target set size K, parameter α ∈ [0,1]

Initialize S_0 = ∅

For i=1 to K do

Choose node u

u = arg max ( α*φ(u) + (1−α)*(|I(S_{i−1} ∪ {u})| − |I(S_{i−1})|) ), α ∈ [0,1]   // Influence Increase

S_i = S_{i−1} ∪ {u}

I(u) = A_u / N   // Influence degree of the user u

The purpose of this algorithm is to find an optimal parameter for maximization of social influence.

Karate Dataset (Zachary's karate club network. It is a test network. It contains 34 nodes, 78 edges)

[Figure: Spread of Influence vs. K (K = 1 to 8); series: degree]

Top 8 key nodes of the karate club network

k   Degree   Social potential   Closeness
1   34       34                 1
2   1        1                  3
3   33       33                 34
4   3        3                  32
5   2        2                  9
6   32       32                 14
7   4        4                  33
8   24       9                  20

Closeness Centrality

The closeness centrality CC(x) of member x tightly depends on the geodesic distance, i.e., the shortest paths from member x to all other people in the social network.

$CC(x) = \frac{m-1}{\sum_{y \ne x,\; y \in M} c(x,y)}$

where c(x,y) is a function describing the distance (shortest distance) between nodes x and y, and m is the number of nodes in a network.
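The closeness ranking can be reproduced with a short sketch. It assumes the reading CC(x) = (m − 1) / Σ c(x, y) of the closeness definition, an unweighted graph (so c(x, y) is the hop distance found by breadth-first search), and, as an extra assumption not stated in the report, a score of 0 when some node is unreachable from x. The function name is hypothetical.

```python
from collections import deque

def closeness_centrality(adj, x):
    """CC(x) = (m - 1) / sum of shortest distances from x to every other node."""
    dist = {x: 0}
    queue = deque([x])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    m = len(adj)
    if len(dist) < m:
        return 0.0  # assumption: unreachable nodes give zero closeness
    total = sum(dist.values())
    return (m - 1) / total if total > 0 else 0.0
```

Sorting the nodes by this value in descending order and keeping the first k gives the closeness-based seed set.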

4. To do list

3) Study the combined model with social potential and other metrics.

4) Study the correlation between the tunable parameter in [0,1] and the spread of influence.

======================================================================

Research Report

Fei Hao

2011‐2‐5

======================================================================

5. Done list

1) Estimation of influence between active node u and inactive neighbor v

Simple way:

$b_{uv} = \frac{1}{d(v)}$, where d(v) is the degree of v, which means that, for an inactive node v, every neighbor has the same influence.

New Estimation:

$b_{uv} = \frac{degree(u)}{\sum_{w \in N(v)} degree(w)}$

2) Betweenness Centrality

Betweenness centrality BC of member x pinpoints to what extent x is between other members.

Members with high BC are very important to the network because other actors can connect with each other only through them. (Bridge role)

$BC(x) = \sum_{i \ne x \ne j;\; i,j \in M} \frac{b_{ij}(x)}{b_{ij}}$

b_ij(x): the number of shortest paths from i to j that pass through x

b_ij: the number of all shortest paths between i and j.

m: the number of nodes in a network.
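The two influence-weight estimates from item 1) can be compared with a small sketch. It assumes the readings b_uv = 1/d(v) for the simple way and b_uv = degree(u) / Σ_{w ∈ N(v)} degree(w) for the new estimation; the function names and the adjacency-dictionary representation are assumptions for illustration.

```python
def simple_weights(adj):
    """b_uv = 1 / d(v): every neighbor u of v has the same influence on v."""
    return {(u, v): 1.0 / len(adj[v]) for v in adj for u in adj[v]}

def degree_weights(adj):
    """b_uv = degree(u) / sum of degree(w) over all neighbors w of v."""
    weights = {}
    for v in adj:
        denom = sum(len(adj[w]) for w in adj[v])
        for u in adj[v]:
            weights[(u, v)] = len(adj[u]) / denom
    return weights
```

In both variants the weights entering a node v sum to 1, which is the normalization the linear threshold model needs; the second variant simply gives high-degree neighbors a larger share.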

3) Random (Naïve method)

This method extracts k nodes randomly from the network as the seed nodes.

It does not take the importance of nodes in the network into account.

4) Experimental Dataset 1 (Small Dataset)

DataSet statistics:

Dataset       Nodes   Edges   Average Degree
Karate Club   34      78      4.6

Degree Distribution

Degree Number of nodes

1 1

2 11

3 6

4 6

5 3

6 2

9 1

10 1

12 1

16 1

17 1

The degree distribution of this dataset roughly follows a power law.

[Figure: Degree Distribution (x-axis: Degree; y-axis: Number of nodes)]

Explanation of my idea:

A naïve approximate solution to influence maximization is to select the “most influential” node at each step.

Social potential reflects the potential ability of a node to influence other nodes.

Hence, I want to study both, combine them, and introduce a tunable parameter to optimize the seed node selection under the linear threshold model.

5) Experimental Dataset 2 (Large Dataset)

DataSet statistics:

Dataset       Nodes   Edges    Average Degree
Email‐Enron   36692   367662   10.02

[Figure: Spread of Influence vs. K (K = 1 to 8); series: degree, social potential, PageRank, betweenness, closeness, random]

[Figure: Degree Distribution (x-axis: Degree; y-axis: Number of Nodes)]

6. To do list

5) Study the combined model with social potential and other metrics in the small and large datasets.

6) Study the correlation between the tunable parameter in [0,1] and the spread of influence.

======================================================================

Research Report

Fei Hao

2011‐2‐19

======================================================================

7. Done list

1) Further explanation of my idea about the Influence Maximization Problem

Traditional way: Under the influence diffusion model, they select the top-k influential nodes which can maximize the influence increase with some measurement, such as degree, closeness, or betweenness.

Weak point: these approaches ignore the nodes which have more “social potential”.

These nodes cannot propagate the information at time t, but I think the most potential nodes selected in phase 1 can accumulate some “influence” for the future. They could trigger more inactive neighbors.

My idea: To obtain the k seed nodes from a social network, we divide the selection step into two phases.

Phase 1: select some nodes (for example: βk) in terms of higher potential for triggering more nodes in the future.

Phase 2: select the remaining nodes (k‐βk) in terms of maximal influence increase.

Where β is a tunable parameter.

I'd like to give an example to explain it.

Suppose a company wants to recruit k employees. An excellent company may select some people who performed quite well before joining the company, and it also considers some people who have higher potential to develop, because it thinks that these people can develop very well in the near future and bring much profit to the company.

I want to investigate the correlation between various values of β and the influence spread.
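The two-phase selection can be sketched as below. This is a simplified illustration under assumptions: phase 2 ranks nodes by a precomputed static score (for example degree) rather than by the simulated marginal influence increase, the number of phase 1 seeds is taken as `round(beta * k)`, and the function and argument names are invented for the example.

```python
def two_phase_seeds(potential, influence, k, beta):
    """Pick round(beta * k) seeds by potential, then fill up to k by influence score."""
    n_potential = round(beta * k)
    # Phase 1: nodes with the highest social potential.
    by_potential = sorted(potential, key=potential.get, reverse=True)
    phase1 = by_potential[:n_potential]
    chosen = set(phase1)
    # Phase 2: remaining seeds by the influence measure, skipping phase 1 picks.
    by_influence = sorted(influence, key=influence.get, reverse=True)
    phase2 = [v for v in by_influence if v not in chosen][: k - n_potential]
    return phase1 + phase2
```

Running the selection for a grid of β values and simulating the spread for each resulting seed set is one way to study the correlation between β and the influence spread.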

2) First, none of these methods can guarantee finding the optimal seeds, and no efficient algorithm is known to do so, because the problem is NP-hard. Precisely because we cannot tell analytically which method is better, we use simulations to compare their effectiveness.

3) Let f(S) denote the influence spread, where S is the initial active set.

The problem is: find a k-node set S that maximizes f(S). f(S) is the objective function.

Properties of f(S):

1) Non-negative

2) Monotone: f(S+v) >= f(S)

It is NP‐hard to determine the optimum for influence maximization for both the independent cascade model and the linear threshold model. [Proved by KKT’2003]

4) Discussion on Influence Range parameter

As can be seen from the formula of social potential, σ is the only parameter to be determined.

Potential Entropy (derived from Shannon information entropy)

$H = -\sum_{i=1}^{n} \frac{\varphi(v_i)}{Z} \log\left(\frac{\varphi(v_i)}{Z}\right)$

where $Z = \sum_{i=1}^{n} \varphi(v_i)$ is a normalization factor.

Property: For any σ ∈ (0, ∞), the potential entropy H satisfies 0 ≤ H ≤ log(n), and H reaches the maximum value log(n) if and only if the social potentials of all nodes are the same.

Problem: finding an optimal parameter σ can be turned into minimizing the potential entropy, which means minimizing the uncertainty.
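The potential entropy can be computed directly from the node potentials. The sketch below assumes the Shannon form H = −Σ (φ(v_i)/Z) log(φ(v_i)/Z) with Z = Σ φ(v_i); the function name and the dictionary input are assumptions for the example.

```python
import math

def potential_entropy(potential):
    """H = -sum_i (phi_i / Z) * log(phi_i / Z), with Z = sum_i phi_i."""
    z = sum(potential.values())
    entropy = 0.0
    for phi in potential.values():
        p = phi / z
        if p > 0:  # the term 0 * log(0) is treated as 0
            entropy -= p * math.log(p)
    return entropy
```

With equal potentials the function returns log(n), the maximum stated in the property; any unequal distribution gives a smaller value.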

8. To do list

7) Formalize the proposed model with better explanations.

8) Find an efficient and fast algorithm to calculate the influence range parameter in our model, especially for a large network.

======================================================================

Research Report

Fei Hao

2011‐3‐5

======================================================================

9. Done list

1) Social Potential Calculation

Input:
Influence factor σ;
Number of users N;

Output:
Social Potential φ(v_i)

For each node i in SN G=(V,E)
Begin
    Shortestpath[i][] = Dijkstra(i)
    φ(v_i) = 0
    For each node j in SN
        If shortestpath(i,j) <= 6
        Begin
            φ(v_i) = φ(v_i) + (1/N) * e^{-(shortestpath(i,j)/σ)^2}
        end
End

2) Formulation of idea

Input: G=(V,E), target set size K, parameter α ∈ [0,1]

Initialize S_0 = ∅   // seed nodes set

For i=1 to K do

Choose node u

u = arg max_x ( α*φ(x) + (1−α)*h(x) ), α ∈ [0,1]   // integrate the two kinds of nodes

S_i = S_{i−1} ∪ {u}

h(x) ∈ {degree-based, closeness-based, …} measure

The purpose of this algorithm is to find an optimal parameter for maximization of social influence.

Such as:

1) We can combine the social potential with degree: social potential + degree

2) Social potential + closeness

3) Social potential + betweenness

4) Social potential + random

3) Influence Factor Discussion

According to the property of the Gaussian function, as we know, for a Gaussian distribution approximately 99.7% of the values fall within a margin of three standard deviations (the “3σ criterion”). That is, the influence scope of a data object is roughly equal to 3σ.

Normal Distribution:

[Figure: normal distribution (Figure from: http://en.wikipedia.org/wiki/Normal_distribution)]

Back to our social potential

$\varphi(v_i) = \frac{1}{n}\sum_{j=1}^{n} m_j \, e^{-(d_{ij}/\sigma)^2}$

So, the influence range of each node is about $3\sigma/\sqrt{2}$.

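$0 < \sigma < \frac{\sqrt{2}}{3}$: there is no interaction between nodes.   // within 1 hop

$\frac{\sqrt{2}}{3} < \sigma < \frac{2\sqrt{2}}{3}$: each node could influence its 1-hop neighbor nodes.   // 1 hop~2 hop

$\frac{2\sqrt{2}}{3} < \sigma < \sqrt{2}$: each node could influence its 2-hop neighbor nodes.   // 2 hop~3 hop

Let the σ corresponding to the minimal potential entropy be about $\frac{\sqrt{2}}{3} l$, $l \in N$; then we can search the optimum σ in the region $\left(\frac{\sqrt{2}}{3}(l-1), \frac{\sqrt{2}}{3} l\right)$ until the specified precision is satisfied.

Details:

If σ1 in $\left(\frac{\sqrt{2}}{3}(l-1), \frac{\sqrt{2}}{3} l\right)$, σ2 in $\left(\frac{\sqrt{2}}{3} l, \frac{\sqrt{2}}{3}(l+1)\right)$, σ3 in $\left(\frac{\sqrt{2}}{3}(l+1), \frac{\sqrt{2}}{3}(l+2)\right)$,

if H(σ1)>H(σ2), H(σ3)>H(σ2), then we take $\left(\frac{\sqrt{2}}{3} l, \frac{\sqrt{2}}{3}(l+1)\right)$ as our final searching range.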
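The interval test described in the details can be sketched as a coarse bracket search. This is an illustration under assumptions: the intervals are read as multiples of √2/3, each interval is sampled once at its midpoint (the report only requires some σ inside each interval), and `entropy_of_sigma` is a hypothetical callback that returns H for a given σ.

```python
import math

def bracket_sigma(entropy_of_sigma, max_l=10):
    """Return the interval (sqrt(2)*l/3, sqrt(2)*(l+1)/3) whose sampled entropy
    is lower than that of both neighboring intervals, or None if there is none."""
    step = math.sqrt(2) / 3
    # H sampled at the midpoint of interval l = 0, 1, ..., max_l.
    samples = [entropy_of_sigma(step * (l + 0.5)) for l in range(max_l + 1)]
    for l in range(1, max_l):
        if samples[l - 1] > samples[l] and samples[l + 1] > samples[l]:
            return (step * l, step * (l + 1))
    return None
```

Inside the returned interval a finer one-dimensional search (for example golden-section search) can then be run until the required precision is reached.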

3) Trust Maximization in Social Networks

Motivation: Finding maximum trust in social networks, which is particularly important when a social network is oriented by certain tasks.

Contribution: Proposed the trust maximization algorithm based on task-oriented social networks.

Main Contents:

Trust Inference Mechanism:

Multiplication

If A trusts B with Tab, and B trusts C with Tbc, then A trusts C with Tab*Tbc

Task‐oriented Social Network

In G=(V,E), V indicates users, E indicates the connection between two users. In addition, A={a1,a2,…,am} is a set of m skills. By denoting aj ∈ A(vi), we claim that the individual i has the skill j. Each individual is associated with a small set of skills, where A(vi) ⊆ A. A task T is defined as a subset of skills required to accomplish the task, T ⊆ A. A certain task can be performed by a group of individuals V' ⊆ V. Hence, this group of individuals forms a task‐oriented social network G' ⊆ G, G' = (V', E').

Maximum trust routes identification

10. To do list

1) Do an experiment using the proposed idea.

2) Study the strength of relationship between users by fuzzy set theory.

======================================================================

Research Report

Fei Hao

2011‐3‐19

======================================================================

11. Done list

1) Existing previous work [Matteo, 2009], [Yagger, 2008] focuses on symmetric relations using a fuzzy adjacency matrix.

For example:

R(x,y)=R(y,x)

I think it has some limitations, such as in a trust network.

In the real world, people usually express different degrees of relationship toward each other.

So, we may extend the fuzzy adjacency matrix to the asymmetric relationship, i.e.,

1     0.9   0.2
0.2   1     0.6
0.7   0.1   1

Basic assumption: If user a trusts b very much, and b trusts a very much, then we consider that a is close to b, i.e., the strength of a and b (called Reciprocal Trust Degree) is strongest.

For example:

Trust(a1,a2)=0.9 and trust(a2,a1)=0.2

So, RTD(a1,a2)=trust(a1,a2)*trust(a2,a1)

$RT(x_i, x_j) = t(x_i, x_j) \cdot t(x_j, x_i) = \begin{cases} 1 & \text{agent } i \text{ and } j \text{ trust each other} \\ \in (0,1) & \text{reciprocal trust with some extent} \\ 0 & \text{no trust} \end{cases}$

RT describes the fuzzy trust relations, RT(x_i, x_j) = RT(x_j, x_i)

RT=

1      0.18   0.14
0.18   1      0.06
0.14   0.06   1

A fuzzy m-ary relation S on a single set X is a fuzzy subset of X^m defined as follows,

$RT(x_{p_1}, \ldots, x_{p_m}) = f\big(RT(x_{p_1}, x_{p_2}), \ldots, RT(x_{p_{m-1}}, x_{p_m})\big), \quad f: [0,1]^k \to [0,1]$

Hence, we can calculate the m-ary reciprocal trust degree; furthermore, we can obtain the users who trust each other most.

For example: RT(x1,x2,x3)=0.5

RT(x1,x2,x4)=0.533

…

RT(x5,x6,x7)=0.2
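The pairwise reciprocal trust degree can be checked numerically. The sketch below assumes RT(x_i, x_j) = t(x_i, x_j) * t(x_j, x_i) and uses the example values Trust(a1,a2)=0.9 and trust(a2,a1)=0.2 from the text; the remaining matrix entries are taken from the example matrix as far as it can be read and should be treated as illustrative. The function name is hypothetical.

```python
def reciprocal_trust(trust):
    """RT[i][j] = trust[i][j] * trust[j][i]; the result is symmetric."""
    n = len(trust)
    return [[trust[i][j] * trust[j][i] for j in range(n)] for i in range(n)]

# Asymmetric trust matrix: T[i][j] is how much user i trusts user j.
T = [
    [1.0, 0.9, 0.2],
    [0.2, 1.0, 0.6],
    [0.7, 0.1, 1.0],
]
RT = reciprocal_trust(T)
```

Although T is asymmetric, RT is symmetric by construction, so it can be used wherever a symmetric fuzzy adjacency matrix is expected.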

2)

[Figure: Dolphin Degree Distribution (x-axis: degree, 1 to 12)]

[Figure: Influence Spread vs. K (K = 1 to 15); series: Social potential+Closeness, Social Potential, Closeness]

Remark: From this picture, the influence spread function f(x) is clearly a monotone increasing function. This fits the basic properties of f(x) proved in [KKT, 2003].

This diagram shows the comparison results when the tunable parameter = 0.3

12. To do list

3) Continue the experiments on other combinations, and make the appropriate analysis.

4) Find more related work about asymmetric fuzzy adjacency matrices, especially for solving social network problems.

======================================================================

Research Report

Fei Hao

2011‐4‐2

======================================================================

13. Done list

1) Yeast Protein Interaction

Nodes   Edges   Average Degree
1486    4406    16.38

Relatively Large Data Set: Yeast (Degree Distribution)

[Figure: Yeast Degree Distribution (x-axis: Degree; y-axis: Number of Nodes)]

Degree‐based Approach

[Figure: Influence Spread vs. K (K = 10, 20, 40, 60, 80); series: Degree]

Closeness‐based Approach

[Figure: Influence Spread vs. K (K = 10, 20, 40, 60, 80); series: Closeness]

Betweenness‐based Approach

[Figure: Influence Spread vs. K (K = 10, 20, 40, 60, 80); series: Betweenness]

Comparison among various independent approaches

[Figure: Influence Spread vs. K (K = 10, 20, 40, 60, 80); series: Degree, Closeness, Betweenness]

The closeness‐based approach clearly performs worst in the influence maximization problem.

2) Combined Model

Case 1: Social Potential + Degree

[Figure: Influence Spread vs. K (K = 10, 20, 40, 60, 80); series: Degree, Social+Degree]

Case 2: Social Potential + Closeness

[Figure: Influence Spread vs. K (K = 10, 20, 40, 60, 80); series: Closeness, Social+Closeness]

14. To do list

5) I will do the experiment with social potential + betweenness.

6) Make some experimental analysis on my results.

======================================================================

Research Report

Fei Hao

2011‐4‐16

======================================================================

15. Done list

1) Yeast Interaction Network

Combined model (Social+betweenness)

I studied the difference between the Betweenness Centrality Approach and the Social Potential+Betweenness approach.

[Figure: Influence Spread vs. K (K = 10, 20, 40, 60, 80); series: Betweenness, Social Potential+Betweenness]

2) Small-size Dataset (Dolphin Social Network)

Degree-based Approach, Social Potential + Degree

Closeness-based Approach, Social Potential + Closeness

Betweeness-based Approach, Social Potential + Betweeness

[Figure: Influence Spread vs. K (K = 1, 2, 4, 6, 8, 10); series: Degree, Social Potential+Degree]

[Figure: Influence Spread vs. K (K = 1, 2, 4, 6, 8, 10); series: Closeness, Social Potential+Closeness]

[Figure: Influence Spread vs. K (K = 1, 2, 4, 6, 8, 10); series: Betweenness, Social Potential+Betweenness]

Influence Spread Comparison Results using various algorithms

16. To do list

7) Consider different masses of nodes when calculating the social potential. For example, we may consider some attributes of each node.

8) Result Analysis and Comments

======================================================================

Research Report

Fei Hao

2011‐5‐7

======================================================================

17. Done list

1) Random Algorithm Comparison Results

I studied the Random Algorithm in the Influence Maximization Problem.

Small-size Dataset (Dolphin Social Network)

[Figure: Influence Spread Comparison Results using various algorithms]

2) Writing a paper “Efficient Top‐K Market Movers Mining for Social Advertising”

3) iFriend paper work:

Improved the formulism of the design of membership functions for linguistic variables.

Gave the generalized formula to determine the membership functions with simple triangular membership functions.

18. To do list

9) Try to finish the writing of the “Influence Maximization Problem” paper.

10) Improve the iFriend paper continuously.

======================================================================

Research Report

Fei Hao

2011‐5‐21

======================================================================

19. Done list

1) Formalized the membership functions of the linguistic terms ‘Special friends’, ‘Good friends’, ‘General friends’, ‘E-buddy’ in a generalized form.

Formalized the membership functions of the linguistic terms ‘Seniors’, ‘Peers’, ‘Juniors’ in a generalized form.

The parameters are now more flexible; they depend on the user’s preference.

2) Finished and corrected the paper “Efficient Top-K Market Movers Mining for Social Advertising”.

3) In the linear threshold model, the threshold is a fixed value; it does not change during the propagation of information. In reality, the threshold can change as time unfolds.

The intuition: if a node cannot be activated by some active neighbors, then, when its next neighbor tries to influence it, its threshold may have changed because of the historical activation attempts of its neighbors. At this point, the node may decrease its threshold, increase its threshold, or keep the same threshold value.

t=0 , threshold θ0

t=1, θ1

t=2, θ2

Dynamical Variable Threshold Model (DVT)

Hence, I will design a Dynamical Variable Threshold Model to simulate the propagation procedure in social networks.

In this model, S denotes the set of neighbor nodes which tried to activate the current node v, but failed.

$\theta_v(S) = \theta_v - k \cdot \frac{|S|}{|V_v|}$

k is a random value from K={‐1,0,1}.

Explanation for k values:

1) If k=1, the threshold begins to decay; the user reduces his/her requirements for accepting a new activation. In this case, the user can be easily influenced at the next timestamp.

2) If k=0, the threshold keeps the same value. It means this user is very unshakeable.

3) If k=-1, the threshold begins to increase; the user raises his/her requirements for accepting a new activation. In this case, the user is a diehard.

Step 1: initially, randomly assign a threshold to each node.

Step 2: change the threshold dynamically after each failed activation.

Step 3: repeat step 2 until no more nodes can be influenced, then stop.

In fact, the Dynamical Variable Threshold Model does not depend on the order of activation.

Suppose two node sequences T1, T2:

$T_1 = \{u_1, u_2, \ldots, u_r\}$, $T_2 = \{u'_1, u'_2, \ldots, u'_r\}$

$\theta_v(S_{T_1}) = \theta_v - k \cdot \frac{|S_{T_1}|}{|V_v|} = \theta_v - k \cdot \frac{r}{|V_v|}$

$\theta_v(S_{T_2}) = \theta_v - k \cdot \frac{|S_{T_2}|}{|V_v|} = \theta_v - k \cdot \frac{r}{|V_v|}$

Obviously, $\theta_v(S_{T_1}) = \theta_v(S_{T_2})$; therefore, the variable threshold model does not depend on the order of activation.
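The threshold update of the Dynamical Variable Threshold Model can be sketched as a single function. Assumptions made for this illustration: the update is read as θ_v(S) = θ_v − k·|S|/|V_v|, |V_v| is taken to be the number of neighbors of v, and the result is clipped to [0, 1] so that it stays a valid threshold (the clipping is not part of the report). The function and argument names are hypothetical.

```python
def updated_threshold(theta_v, failed_attempts, neighborhood_size, k):
    """theta_v(S) = theta_v - k * |S| / |V_v|, clipped to [0, 1].

    k = 1 lowers the threshold, k = 0 keeps it, k = -1 raises it.
    """
    if k not in (-1, 0, 1):
        raise ValueError("k must be -1, 0 or 1")
    theta = theta_v - k * failed_attempts / neighborhood_size
    # Assumption: keep the threshold inside [0, 1].
    return min(1.0, max(0.0, theta))
```

Because the update depends only on the number of failed attempts |S| and not on which neighbors failed or in what order, two sequences of the same length give the same threshold, which is the order-independence argument made above.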

Good points:

Proposed Dynamical Variable Threshold Model could simulate the real‐life behavior of

information diffusion and status activation.

Mathematical Formulation:

Modified Constraint:

$\theta_v(S) = \theta_v - k \cdot \frac{|S|}{|V_v|}$

))()(()()( ''

''

'

' vTvSbb vvTu

vuvSu

vu

Theorem: The influence spread I(·) under the linear threshold model is monotone and submodular.

1) Obviously, the influence spread I(·) is non‐negative.

2) Monotone: I(S + v) ≥ I(S)

3) Submodular: to be proved later.

20. To do list

1) Continue improving the writing of the papers.

2) Prove the three properties for the proposed Dynamical Variable Threshold Model.

3) Plan to submit to ICDM 2011.

======================================================================

Research Report

Fei Hao

2011‐6‐4

======================================================================

21. Done list

1) Study another social network

Here, to evaluate the improved performance for networks of various sizes, I utilized the fuzzy partial ordering model to obtain the rank of performance.

      Degree Improvement   Closeness Improvement   Betweenness Improvement
D1    0.013477             0.121628                0.044287
D2    0                    0.020833                0.087142
D3    0                    0.097009                0

Fuzzy partial ordering relation matrix

R(x_i, x_j) takes one of the values 1, 0.8, 0.5, 0 according to the comparison between x_i and x_j.

So, using the above matrix, we can get the following fuzzy partial ordering relation matrix

0.5   0.8   1
0.8   0.5   0.8
0     0.8   0.5

$R(x_i) = \sum_{j=1}^{n} R(x_i, x_j)$

R(D1)=2.3 > R(D2)=2.1 > R(D3)=1.8

Conclusions: From the 3 experiments, the Random approach, a naïve method, is the worst.

The closeness centrality-based approach is worse than the other algorithms.

In addition, our proposed SPEMA (Social Potential+Degree) outperforms the other algorithms.

Remark: As the size of the network increases, the proposed approaches improve on the naïve method and the other algorithms significantly.

[Figure: Improved Performance by data set (D3, D2, D1)]

2) Improved and completed the WWW rejected paper

Modification points: Formulated the design of the membership function. Also, I gave a high-level description of the problem and the idea.

3) Improving the SIGMOD rejected paper.

22. To do list

4) Continue to check for mistakes and polish the new paper for ICDM 2011.

5) Re-design the user attractor model considering more factors.

6) Study the proposed diffusion model, especially: when does the user’s threshold increase or decrease? (Perhaps: using time series analysis, based on historical time series data)

[Figure: diffusion model illustration; legend: Active node, Inactive node, Newly active node, Fail to activate]

======================================================================

Research Report

Fei Hao

2011‐6‐18

======================================================================

23. Done list

1) I proved that the proposed approach SPEMA satisfies three basic properties.

Let S be the target subset of seed nodes, S1 be the subset of the most influential nodes and S2 be the subset of the most potential nodes. |S1|=k-λk, and |S2|=λk. The influence spread σ(S1) and σ(S2) under the linear threshold model are monotone and submodular.

Lemma 1: Non-negative.

Proof: Since S1 is a subset of S, and S2 is a subset of S, this lemma holds.

Lemma 2: Monotone

σ(S1 + v) ≥ σ(S1), σ(S2 + v) ≥ σ(S2)

Proof: First, let R(u) be the set of nodes influenced by node u, and consider the quantity σ(S+v)−σ(S), by definition.

That is, σ(S1+v)−σ(S1) ≥ 0 and σ(S2+v)−σ(S2) ≥ 0. Therefore, the influence spread σ(S1) and σ(S2) under the linear threshold model are monotone.

Lemma 3: Submodular

Let N be a finite set; a set function σ: 2^N → R is submodular iff

$\sigma(A + v) - \sigma(A) \ge \sigma(B + v) - \sigma(B) \quad \text{for all } A \subseteq B \subseteq N,\; v \in N \setminus B$

Proof: Consider two sets S1 (S2) and S, where S1, S2 are subsets of S, and consider the quantity σ(S1 + v) − σ(S1) (σ(S2 + v) − σ(S2)), by definition. In the same way, we can easily prove that σ(S2+v)−σ(S2) ≥ σ(S + v) − σ(S).

2) Did the experiments on two large datasets using the degree, closeness, and betweenness approaches.

NetHEPT (15233 nodes, 58891 edges), NetPHY (37154 nodes, 231584 edges)

3) Dynamical Variable Threshold Model (DVT): the motivation, the threshold update rule, the explanation of the k values, the three steps, the order-independence argument, and the theorem on the influence spread are the same as in item 3) of the 2011‐5‐21 report.

24. To do list

1) Continue the experiments using SPEMA on two large datasets.

2) Formalize the proposed diffusion models: the Time-dependent comprehensive cascade model (TCC model) and the dynamical variable threshold model (DVT model).

======================================================================

Research Report

Fei Hao

2011‐7‐2

======================================================================

25. Done list

1) I proved the two properties of social potential entropy H(σ).

To prove that H(σ) can reach the maximum value, I solve the extreme value problem of H(σ) using the Lagrange Multiplier Method:

$L(p_1, \ldots, p_n, \ell) = -\sum_{i=1}^{n} p_i \log(p_i) + \ell \left(\sum_{i=1}^{n} p_i - 1\right)$

We take the partial derivatives of the above equation with respect to the variables p1, p2, …, pn and let

$\frac{\partial L}{\partial p_i} = 0, \quad i = 1, \ldots, n$

then, we get the equations as follows,

$-\log(p_i) - 1 + \ell = 0, \quad i = 1, \ldots, n$

Hence, ℓ = 1+log(p1) = 1+log(p2) = … = 1+log(pn), i.e., p1 = p2 = ··· = pn. Since pi = φ(vi)/Z, φ(v1) = φ(v2) = ··· = φ(vn).

Obviously, when φ(v1) = φ(v2) = ··· = φ(vn), H(σ) reaches the maximum value, i.e., H(σ) = log(n).

[Figure: NetHEPT (15,233 nodes, 58,891 edges)]

[Figure: NetPHY (37,154 nodes, 231,584 edges)]

Conclusions: 1) As the size of the network increases, our proposed approach (combining the social potential) shows a clear advantage.

2) The Random approach is always the worst method.

3) I finished the modification of the WWW rejected paper.

4) Gave the formalization of the two modified diffusion models: the TCC model and the DVT model.

26. To do list

3) Improve the presentation of the paper draft for ICDE 2012.

======================================================================

Research Report

Fei Hao

2011‐7‐16

======================================================================

27. Done list

1) Corrected my paper according to the comments of the KAIST lang. center as well as Prof. Chung's comments and suggestions.

2) Formatted the submission paper according to the ICDE guideline and corrected many mistakes in my paper.

3) Actually, I have changed a lot of the contents of the paper rejected by WWW, except the experimental results. Hence, I want to do more experiments to implement my idea.

1) Formulated the problem definition and gave the preliminaries of computing with words.

2) I give the vector-valued friendship representation for the strength of friendship between two users.

3) Friendship can be defined on a single domain or on multiple domains. I give the methods for the two cases to measure the strength of friendship.

4) As for the friendship defined on multiple domains, an aggregated degree of membership is defined.

28. To do list

1) I will continue to check the mistakes of the ICDE paper.

2) I will implement my new idea for the paper rejected by WWW.

======================================================================

Research Report

Fei Hao

2011‐8‐6

======================================================================

29. Done list

1) Here, I'd like to give an example to explain how the proposed aggregated degree of membership μ*(x) is calculated, as follows,

Example:

Suppose there are two users, Alice and Bob, in a social network; they have 5 common social applications and 20 mutual friends. Alice posts 100 positive terms and 8 negative terms on Bob’s wall. Both of them hold a master's degree.

The degrees of membership on the different attributes are as follows,

Suppose the weights given by the user are w_{RS}=0.2, w_{MF}=0.4, w_{SS^{pos}}=0.2, w_{SS^{neg}}=0.1, w_{SD}=0.1.

Hence, the aggregated degrees of membership:

Hence, we know that Bob is a special friend of Alice according to the principle of maximum degree of membership.

2) I designed the initial system of iFriend; it includes the module for friendship defined on a single domain and the module for friendship defined on multiple domains.

A: I completed the implementation of the friendship defined on a single domain.

B: I completed the implementation of the friendship defined on “the number of common applications”, “mutual friends” and “social distance”.

3) For the two proposed models, the TCC model and the DVT model, I give the solution for the prediction of K in the two dynamic information diffusion models.

Trending structure sequence definition:

Step1: Let X={x1,x2,…,xn} be a time series, with trending structure sequence {δ1, δ2, …, δn}.

Step2: Then, we can construct the predictable information system for the two models as follows,

Step3: Use a pattern matching approach to predict K.

Here, we calculate the mathematical expectation of the Hamming distance between Sq and the other sequences.

Step 4: Finally, our proposed models can be formalized as follows,

1) TCC model

The influence probability from an active individual to an inactive neighbor is defined as follows,

2) DVT model

The diffusion threshold will be changed dynamically.

30. To do list

3) Continue the implementation of iFriend, especially the friendship defined on multiple domains.

4) For the two proposed modified dynamical diffusion models, I will give the problem statement of the dynamical influence maximization problem based on the proposed diffusion models.

5) Read the paper: “Sentiment propagation in social networks: a case study in LiveJournal”. Try to figure out the correlation between sentiment and social influence.

======================================================================

Research Report

Fei Hao

2011‐8‐20

======================================================================

31. Done list

1) Sentiment propagation in social networks: a case study in LiveJournal

Motivation:

a) How do individuals influence each other in social networks?

b) Does sentiment propagate, and how does sentiment propagate?

c) What different roles do individuals play in propagation?

Contribution:

1) Formally define and study the propagation of sentiment in social networks

2) Quantify and predict the occurrence of a sentiment propagation

3) Identify features that result in a sentiment propagation

In a word, this paper proved that sentiment can propagate in social networks.

2) Sentiment community detection in social networks

My idea based on work1 and work2:

Since work1 has proved that sentiment can propagate, and work2 gives a formal method to detect the sentiment communities in social networks, I think that some good issues can be investigated based on work1 and work2.

Finding the sentiment leader in social networks.

Motivations: If a company could find the sentiment leader in social networks, then it can control and adjust its marketing strategies. For example, when user A has a negative sentiment on a certain product, the company may not try to convince user A to spread the information, and also will not introduce the new products to user A.

By contrast, if user A has a positive sentiment on a certain product, the company will give some benefits to A and induce A to spread the information to A’s friends. As we know, users influence each other through sentiment propagation. Hence, finding the sentiment leader is becoming important. It is helpful for marketing and information diffusion.

Problem: Finding the sentiment leader in sentiment communities. The assumption is that there exists a sentiment leader who can disseminate his or her sentiment to his or her friends, and these friends will disseminate their sentiment to their own friends, and so on.

Solution Framework:

There are two kinds of sentiment representations.

1) Discrete sentiment value-based

2) Continuous sentiment value-based

Technical route:

1) Discrete sentiment value-based

For example: a) positive and negative sentiment; b) 5-star scale rating system (very bad, bad, neutral, good, and very good)

In this case, a social network can be represented as G(V,E,S), S: each user holds a certain sentiment si towards a particular product or topic.

Formulation for Social Sentiment Network:

G=(V,E,S) where V:{v1,v2,…vn} each node vi represents a user.

E:{eij} eij represents a relationship between two users vi and vj,

S:{s1,s2,…sn}, each user vi holds certain sentiment si towards a

particular product or topic.

In the discrete sentiment value‐based system, si ∈ {positive, negative} or si ∈ {a,b,c,d,e} (5‐star scale rating system).

[Figure: Graph Topology Information, Sentiment Community Detection, Finding Sentiment Leader]

I.e., the problem is converted to finding the leaders in the social sentiment network.

2) Continuous sentiment value-based

For example, sentiment is quantified using a value. This value is not discrete. Hence,

when we detect the sentiment community, we should design a new detection

method to get the communities.

Formulation for Social Sentiment Network:

G=(V,E,S) where V:{v1,v2,…vn} each node vi represents a user.

E:{eij} eij represents a relationship between two users vi and vj,

S:{s1,s2,…sn}, each user vi holds certain sentiment si towards a

particular product or topic.

In the continuous sentiment value‐based system, si ∈ (0,1]

Community detection

We should consider the mechanism of sentiment propagation in social networks.

Propagation intuition: if the overall mood of a user A is closest to the overall mood of the community C, then user A belongs to C.
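The propagation intuition can be sketched as a nearest-mood assignment. This is a minimal illustration under stated assumptions, not the detection method of the report: the overall mood of a community is assumed to be the mean of its members' continuous sentiment values in (0,1], and the names `assign_to_community` and `communities` as well as all numbers are hypothetical.

```python
from statistics import mean

def assign_to_community(user_mood, communities):
    """Return the name of the community whose mean mood is closest to user_mood.

    communities maps a community name to the list of its members' sentiment values.
    """
    return min(communities, key=lambda name: abs(mean(communities[name]) - user_mood))

# Hypothetical sentiment values, chosen only for illustration.
communities = {"C1": [0.8, 0.9, 0.7], "C2": [0.2, 0.3, 0.1]}
label = assign_to_community(0.65, communities)
```

A real detection method would also have to use the graph topology, which this sketch ignores.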

Evaluation:

I will compare sentiment leader finding based on degree, closeness, betweenness, and random selection, as well as social potential.

3) Wrote the section on the prediction of the parameter K in the proposed influence diffusion models (TCC and DVT models).

[Figure: Graph Topology Information, Sentiment Propagation Model, Sentiment Community Detection, Finding Sentiment Leader]

32. To do list

6) Give the specific idea of finding the sentiment leader in social networks. Study the leaders’ identification problem for the discrete sentiment value-based case.

7) Result analysis on iFriend paper.

8) Read paper: “Identifying Opinion Leaders in the Blogosphere” CIKM 2007.

======================================================================

Research Report

Fei Hao

2011‐9‐3

======================================================================

33. Done list

3) Finished the writing of the paper “Influence Strength Aware Diffusion Models for Dynamic Influence Maximization in Social Networks”.

The focus of this paper is to propose two modified diffusion models that mainly study the dynamics of information propagation. The traditional Independent Cascade model is a kind of decay model of information propagation. My models are closer to realistic networks: instead of a pure decay model of information diffusion, I use a Time-dependent Comprehensive Cascade model and a DVT model, in which the diffusion depends on the previous transactions.

Model feature comparisons

Models compared: Independent Cascade (IC) Model, Linear Threshold (LT) Model, TCC Model, DVT Model

Influence Maximization:

IC Model: 1) Each active individual attempts to activate each of its neighbors independently. 2) After the single attempt, the active individual becomes latent.

LT Model: 1) A node has a random threshold. 2) The threshold will not change.

TCC Model: ‐

DVT Model: ‐

Dynamic Influence Maximization:

The active individuals never become latent during the spreading process.

Each active individual is given only one attempt to activate any of its inactive neighbors.

TCC Model: The influence probability might be increased, decreased, or unchanged. It depends on the previous activation trials.

DVT Model: The threshold of each node can be changed. It depends on the previous activation trials.

Highlights:

a) We incorporate the methodology of individual ethology to evaluate the influence probability in the TCC model and the threshold in the DVT model.

b) The influence probability in the traditional IC diffusion model is independent of previous historical interactions (activations). Our TCC model considers that an influence probability could be decreased, increased, or unchanged. Similarly, the traditional LT model assumes that the thresholds of individuals are not changed, but the DVT model considers that the threshold of an individual might be decreased, increased, or unchanged in terms of individuals’ sentiment, attitude and other social behaviors.

c) Relationship between the proposed models and the traditional models

The TCC model is a generalized model of the IC model. The TCC model can be degraded to an attenuation model with dependency of influence diffusion and time feature when K=1.

The DVT model can also be degraded to a kind of LT model when K=0.

2) Finding Sentiment Leaders in Social Networks

The main technical route of SentiRank is

Problem Statement:

Objective: Maximize the sentiment coverage by seed set T

Input: sentiment network G, sentiment community Ci and a number k

Output: generate a seed set T of cardinality k.

Dataset from Epinions.com

Sentiment Representation System:

1) Positive and negative

2) 5-star scale rating (for example, Epinions.com, iPhone 3G White)

Hence, I will discuss the sentiment community detection approaches for the above 2 cases:

1) Detect the positive and negative sentiment communities

This problem is converted to maximizing the agreement on users’ sentiment within clusters.

Basic idea: try to cluster the users who trust each other, while separating the users who distrust each other.

2) Detect the 5-star scale rating sentiment communities

Sentiment Leaders Identification

Given a sentiment community Ci and a number k, the top‐k sentiment leaders identification is represented as follows:

It is to return the top‐k users with the maximum value of a position metric as the sentiment leaders in sentiment community Ci, where the metric can be one of various user position metrics, such as degree, closeness, betweenness, or social potential.

The algorithm of SentiRank is described as follows,

Evaluation Metrics:

1) One-step sentiment coverage

2) All-path sentiment coverage
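The top‐k leader identification with degree as the position metric can be sketched as follows. This is not the SentiRank algorithm itself, whose description did not survive in this report; it only illustrates ranking the members of one community by their degree inside that community. The function name `top_k_by_degree` and the toy edges are hypothetical.

```python
def top_k_by_degree(edges, community, k):
    """Return the k members of `community` with the highest within-community degree.

    edges is a list of undirected (u, v) pairs; ties are broken by user name.
    """
    members = set(community)
    degree = {u: 0 for u in members}
    for u, v in edges:
        # Only edges with both endpoints inside the community are counted.
        if u in members and v in members:
            degree[u] += 1
            degree[v] += 1
    ranked = sorted(members, key=lambda u: (-degree[u], u))
    return ranked[:k]

# Toy community; user "x" lies outside it, so the edge ("e", "x") is ignored.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("d", "e"), ("e", "x")]
community = ["a", "b", "c", "d", "e"]
leaders = top_k_by_degree(edges, community, 2)
```

Replacing the degree count with closeness, betweenness, or social potential changes only the scoring step; the ranking step stays the same.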

34. To do list

1) I am working on the degree-based sentiment leader identification in each community for the two sentiment representation systems, with the evaluation metric of one-step sentiment coverage.

2) Discuss more related works about diffusion models in social networks.

======================================================================

Research Report

Fei Hao

2011‐9‐17

======================================================================

35. Done list

1) I answered the ICDE feedback request with Prof. Chung.

2) Datasets collection

A) For the positive and negative sentiment representation system

I adopt the dataset from SNAP, Stanford Univ. (http://snap.stanford.edu/data/index.html). It is a trust relationship dataset.

Firstly, we detect the sentiment communities according to the optimization approach.

Basic idea: If there exists a trust relationship between two users, then the two users have the same sentiment. Otherwise, they have different sentiments.

B) For the 5-star scale rating sentiment representation system

I collected the product rating dataset from Epinions.com with the query keywords “Apple iPhone 3GS White (16GB) Smartphone”. I obtained 84 reviews by 84 customers.

3) Sentiment Communities Visualization Representation

CASE 1: [Figure: Positive Sentiment Community, Negative Sentiment Community]

CASE 2:
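The basic idea for the positive/negative case (trust means same sentiment, distrust means different sentiment) can be sketched as a two-group labeling of a signed graph. This is a simplified stand-in, assuming edges carry a sign of +1 (trust) or -1 (distrust); the report uses an optimization approach whose details are not given here. The function name `two_sentiment_groups` and the toy data are hypothetical.

```python
from collections import deque

def two_sentiment_groups(users, signed_edges):
    """Label users +1/-1 so that trust edges (sign +1) join equal labels and
    distrust edges (sign -1) join opposite labels, wherever possible.

    Returns the labels and the number of edges that disagree with them.
    """
    adj = {u: [] for u in users}
    for u, v, sign in signed_edges:
        adj[u].append((v, sign))
        adj[v].append((u, sign))
    label = {}
    for start in users:
        if start in label:
            continue
        label[start] = 1
        queue = deque([start])
        while queue:  # breadth-first propagation of labels
            u = queue.popleft()
            for v, sign in adj[u]:
                if v not in label:
                    label[v] = label[u] * sign
                    queue.append(v)
    conflicts = sum(1 for u, v, s in signed_edges if label[u] * label[v] != s)
    return label, conflicts

users = ["a", "b", "c", "d"]
signed_edges = [("a", "b", 1), ("b", "c", -1), ("c", "d", 1), ("a", "d", -1)]
labels, conflicts = two_sentiment_groups(users, signed_edges)
```

When the trust network is not perfectly consistent, `conflicts` counts the disagreeing edges; minimizing that count is the kind of objective an optimization approach would target.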

Sentiment Communities of Dataset II (It has five sentiment communities)

Sentiment Communities Structure Analysis

Actually, each sentiment community is a sub-structure of the social graph. In other words, the basic structure of each community is composed of some fragile sub-networks.

Here, there is an interesting issue and challenge to be solved.

Question: Who is the sentiment leader in each sentiment community? Sentiment leaders can be considered as users who can be the initiators of sentiment. Due to the social interaction (sentiment interaction), the sentiment of some users will be infected by the sentiment leaders.

Hence, finding the sentiment leaders in each sentiment community is becoming an important problem.

For the sentiment community with 5 stars, denoted by C★★★★★, the top-5 sentiment leaders are identified by the proposed algorithm SentiRank.

Result for (C★★★★★, 5): {sexymama442, three_ster, kyreejd, s-o-m-e-g-u-y, bigtruckseries}

36. To do list

1) I will study a master's thesis, “An Information Diffusion Approach for Detecting Emotional Contagion in online Social Networks” (Arizona State University)

A) To learn the diffusion model for sentiment propagation

B) Find out the potential role of the sentiment leaders for sentiment propagation.

======================================================================

Research Report

Fei Hao

2011‐10‐1

======================================================================

37. Done list

1) Seminar preparation.

I incorporate the sentiment factor into social advertising. Hence, an interesting problem is proposed, called “Finding Sentiment Leaders from Social Networks”.

The differences between this problem and influence maximization:

D1) The sentiment leaders finding problem is more complex, because it considers the sentiment factors on top of influence maximization.

D2) In influence maximization, the default sentiment is positive, without any other special sentiments such as negative or other rating sentiments.

D3) Sentiment analysis is more useful for social advertising.

D4) Influence maximization is a special case of the newly defined problem.

2) Improved the writing of the recent paper.

Contributions:

Considering both the dynamics and the influence strength

A) Time-dependent Comprehensive Cascade Model (TCC)

B) Dynamic Variable Threshold Model (DVT)

Provide a prediction approach regarding when the influence strength should be changed in the two proposed models.

3) Read the paper “Analysis of terrorist social networks with fractal views”. (JIS journal)

User position evaluation with fractal views.

It is a new idea for studying the users' position, and it is beneficial to my current research topic: social marketing and social advertising.

It is a computational method to evaluate the user’s importance in social networks.

38. To do list

1) Begin the experiments of Sentiment Leaders Finding problem.

2) Consider the user position evaluation using fractal views, and give an approach to calculate the importance of nodes in social networks.

======================================================================

Research Report

Fei Hao

2011‐10‐15

======================================================================

39. Done list

1) Influence strength and sentiment aware diffusion models for dynamic influence maximization in social networks

Here, I simply incorporate the positive and negative sentiment (negative opinions) into the diffusion models, i.e., in each model there is a parameter q, called the quality factor of the product; q indicates the probability that a node stays positive after it is activated by a positive in‐neighbor.

Hence, in the TCC model, the influence probability is redefined as follows,

In the DVT model, the threshold is redefined as follows

In the above two models, we focus on the study of the positive influence diffusion in dynamic social networks.

Open issue:

I think there exists a certain relationship between K and q, i.e., if a user decides to improve his/her activation threshold at time step t, his/her sentiment towards the products will be changed. I guess there exists a non‐linear relationship between them.

2) User’s position measurement using fractal views

1: Suppose there is a social network:

2: We convert the social graph into a tree; the resulting trees with focus of A and F are as follows:

A is a focus: Tree (A)

F is a focus: Tree (F)

3: The importance of each node:

Topological view:

Use the number of diffusion paths to measure the importance.

For example: A can propagate the information to 7 nodes: 4 nodes in the first level (i.e., A directly influences 4 nodes) and 3 nodes in the second level.

The importance can be calculated as follows: 4×0.5+3×0.25=2.75

Fractal Views:

Suppose the fractal value of the focused node is 1. The sum of the fractal values of the children nodes equals the fractal value of their parent node. Due to the various weights between each child node and its parent node, the fractal values of the nodes are different.

For tree (A)

B: fractal value = (4+5+2)/(3+4+5+2) × 1/(4−1) = 0.26

C: fractal value = (3+5+2)/(3+4+5+2) × 1/(4−1) = 0.23

E: fractal value = (3+4+2)/(3+4+5+2) × 1/(4−1) = 0.21

F: fractal value = (3+4+5)/(3+4+5+2) × 1/(4−1) = 0.28

B+C+E+F: the summation of fractal values = 1.

The fractal value of D = the fractal value of E, i.e., 0.21.

The fractal value of G: 5/(3+5) × 1/(2−1) × fractal(F) = 0.175

The fractal value of H: fractal(F)‐fractal(G) = 0.105
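The fractal values for tree (A) can be reproduced with a short script. This is a sketch under assumptions read off the fractions above: the edge weights from A are taken to be B=3, C=4, E=5, F=2, the weights from F to be G=3, H=5, and a child receives (sum of its siblings' weights) / (total weight) / (n − 1) of its parent's value. The level factors 0.5 and 0.25 are the ones used in the topological view. The function name `child_fractal_values` is hypothetical, and the weight 1 given to D is a placeholder, since a single child simply inherits its parent's value.

```python
def child_fractal_values(parent_value, weights):
    """Split parent_value among the children listed in `weights`.

    Each child gets (sum of the other children's weights) / (total weight) / (n - 1)
    of the parent's value; a single child inherits the whole value.
    """
    n = len(weights)
    if n == 1:
        return {name: parent_value for name in weights}
    total = sum(weights.values())
    return {name: parent_value * (total - w) / total / (n - 1)
            for name, w in weights.items()}

# Level 1: children of the focus A (fractal value 1).
level1 = child_fractal_values(1.0, {"B": 3, "C": 4, "E": 5, "F": 2})

# Level 2: D is the only child of E; G and H are the children of F.
level2 = {}
level2.update(child_fractal_values(level1["E"], {"D": 1}))
level2.update(child_fractal_values(level1["F"], {"G": 3, "H": 5}))

# Topological view: 4 nodes at level 1 and 3 nodes at level 2.
topological_importance = 4 * 0.5 + 3 * 0.25

# Fractal view: weight the level sums by the same level factors.
fractal_importance = sum(level1.values()) * 0.5 + sum(level2.values()) * 0.25
```

With exact fractions the fractal importance of A is 0.625; working with the two-decimal values listed above gives a slightly smaller figure.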

How to calculate the importance of node A:

Level 1: (fractal(B)+fractal(C)+fractal(E)+fractal(F))×0.5=0.5

Level 2: (fractal(D)+fractal(G)+fractal(H))×0.25=(0.21+0.175+0.105)×0.25=0.1225

Hence, from the fractal view, the importance of node A is 0.5+0.1225=0.6225

Here is a problem: how to combine the two measurements in an effective way?

Topological Part: If there exist many nodes in the same level, an important node can easily influence the children nodes.

Fractal Part: It considers the weights between each node and its parent node. Then, we calculate the fractal value for each node. The fractal value is a refined property describing how much influence should be propagated.

Special case:

For the above two graphs, the fractal values are exactly the same. Therefore, we have to consider both the topological part and the fractal part when we evaluate the importance of a node.

40. To do list

4) Study the performance of obtaining the top-k seed nodes for the paper rejected by ICDE.

======================================================================

Research Report

Fei Hao

2011‐11‐5

======================================================================

41. Done list

1) I’ve finished my paper entitled “Influence Strength and Sentiment Aware Diffusion Models for Dynamic Influence Maximization in Social Networks”. If possible, I want to submit it to the ACM PODS conference.

2) I wrote another new paper entitled “TFRank: An Evaluation of Users Importance with Fractal Views in Social Networks”.

3) I give the performance of obtaining the top-k market movers in terms of running time.

42. To do list

5) Working on my previous work: how to find the sentiment leaders from social networks. I will compare the TFRank algorithm with other existing algorithms.

6) Improve the writing of the paper “TFRank: An Evaluation of Users Importance with Fractal Views in Social Networks”.

======================================================================

Research Report

Fei Hao

2011‐11‐19

======================================================================

43. Done list

1) Read the paper “A Data-based Approach to Social Influence Maximization” (published in VLDB 2011).

Motivations: Viral marketing, social advertising. One of the key problems in this area is the identification of influential users, by targeting whom certain desirable outcomes can be achieved.

Contributions:

1) Developed a new model called credit distribution, built on top of real propagation traces, that allows us to directly predict the influence spread of node sets.

2) Show that influence maximization under credit distribution is NP-hard.

3) Compare the proposed approach with the standard approach with edge probabilities learned from real data, and show that the credit distribution model provides higher accuracy.

Main Contents:

Suppose the seed node set is S={v,z},

T_{S,u} is the fraction of flow reaching u that flows from either v or z.

Total credit given to v for influencing u for action a.

T_{S,u}(a)=1*0.25+0.25+0.5*0.25+1*0.25=0.875

Define the total influence credit for all the actions in A (actions set)

K_{S,u} = 1/|A| ∑_{a∈A} T_{S,u}(a)

The influence spread can be defined using the above total influence credit

σ(S) = ∑_{u∈V} K_{S,u}
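The two aggregation steps of the credit distribution model, averaging the credit over the action set and summing over nodes, can be sketched as below. The value 0.875 is the worked credit from these notes; every other number, the node and action names, and the function names `total_credit` and `influence_spread` are hypothetical and serve only as illustration.

```python
def total_credit(credit_by_action, num_actions):
    """K_{S,u}: the credits T_{S,u}(a) averaged over the whole action set A."""
    return sum(credit_by_action.values()) / num_actions

def influence_spread(credits, num_actions):
    """Influence spread of S: the sum of K_{S,u} over all nodes u."""
    return sum(total_credit(per_action, num_actions) for per_action in credits.values())

# Worked credit for one node and one action, as computed in the notes.
t_su_a = 1 * 0.25 + 0.25 + 0.5 * 0.25 + 1 * 0.25

# Hypothetical credits for two nodes over an action set of size 2.
credits = {
    "u": {"a1": t_su_a, "a2": 0.5},
    "w": {"a1": 0.25},  # no credit recorded for action a2
}
spread = influence_spread(credits, num_actions=2)
```

Because the spread is computed directly from recorded propagation traces, no edge probabilities have to be learned and no Monte Carlo simulation is needed.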

Comments: Most of the literature on influence maximization has focused mainly on the social structure. In this paper, the authors propose a novel data‐based approach that directly predicts the influence spread.

The proposed credit distribution model directly estimates the influence spread by exploiting historical data, thus avoiding the need to learn influence probabilities and, more importantly, avoiding costly Monte Carlo (MC) simulations.

The CD model is closest to the ground truth, and its algorithm is highly scalable. The CD model is not a propagation model, but a prediction model of influence spread according to the credits.

2) I continued to check and improve the presentation of my paper.

I stated the contributions clearly. I made a table of the important variables that appear in the paper.

44. To do list

1) Based on my proposed framework SentiRank, design an efficient sentiment leader mining algorithm.

2) Do the experiments on the sentiment leaders finding problem.

======================================================================

Research Report

Fei Hao

2011‐12‐3

======================================================================

45. Done list

1) Discovering Top-K teams of Experts with/without a leader in social networks (CIKM 2011)

Motivations: Given a social network, find the top-k teams of experts that can effectively collaborate in order to complete a project.

Each team might or might not have a leader.

Project: a set of required skills

Expert: an individual with a specific skill-set

Social network: represents the strength of relationships (the degree of collaboration between any two experts)

Problem 1: without a leader

Given a project P and a graph G representing the social network of a set of experts C, the problem of team formation without a leader is to find a team of experts T for P from G so that the communication cost of T, defined as the sum of distances of T, is minimized.

The Communication Cost is calculated as follows:

Problem 2: with a leader

Given a project P and a graph G representing the social network of a set of experts C, the problem of team formation with a leader is to find a team of experts T and an expert L from C as the leader of the team so that the communication cost, defined as the leader distance, is minimized.

The Communication Cost is calculated as follows:

Conclusions:

Two problems are defined:

1) Finding top‐k teams of experts with a leader.

2) Finding top‐k teams of experts without a leader.

Approximation algorithms for finding a team of experts without/with a leader have been proposed.
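The two communication costs can be sketched with shortest-path distances. The formulas of the paper are not reproduced in these notes, so this sketch assumes that the sum of distances adds the shortest-path distance over every pair of team members, and that the leader distance adds the shortest-path distance from the leader to every team member. The graph, its edge weights, and all function names are hypothetical.

```python
import heapq
from itertools import combinations

def shortest_distances(graph, source):
    """Dijkstra over an undirected weighted graph given as {node: {neighbor: weight}}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def sum_of_distances(graph, team):
    """Communication cost without a leader (assumed: all pairs of team members)."""
    return sum(shortest_distances(graph, a)[b] for a, b in combinations(team, 2))

def leader_distance(graph, team, leader):
    """Communication cost with a leader (assumed: leader to each team member)."""
    dist = shortest_distances(graph, leader)
    return sum(dist[member] for member in team)

# Toy expert network; a smaller weight means easier communication.
graph = {
    "c1": {"c2": 1.0, "c3": 4.0},
    "c2": {"c1": 1.0, "c3": 2.0},
    "c3": {"c1": 4.0, "c2": 2.0},
}
team = ["c1", "c2", "c3"]
cost_without_leader = sum_of_distances(graph, team)
cost_with_leader = leader_distance(graph, team, "c2")
```

In this toy network the path c1, c2, c3 is shorter than the direct edge between c1 and c3, which is why shortest-path distances are used rather than raw edge weights.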

Extensions:

I think the above problem ignores the skill proficiency of experts. That is to say, if we assign experts to a project without considering their skill proficiency, users may not be satisfied with the assignment. However, if we consider the skill proficiency and the communication cost together in the team formation problem, then it matches real-world practice.

Problem Statement:

Let C = {c1, c2, …, cn} denote a set of n experts, and S = {s1, s2, …, sm} denote a set of m skills. Each expert ci has a set of skills, denoted as Q(ci), with Q(ci) ⊆ S. If sj ∈ Q(ci), expert ci has skill sj. Each expert ci has a different proficiency in each of his/her skills sj, denoted as R(sj), sj ∈ Q(ci). A project P ⊆ S is defined as a set of skills required to complete the project.

Input: a social graph G, and a project P.

We have two objectives to optimize

1) Minimize the communication cost between them (find the experts who can easily collaborate or communicate)

2) Maximize the proficiency among them. (person’s proficiency of a certain skill)

Output: return the set of experts with the minimum communication cost (sum of distances in the case without a leader, and leader distance in the case with a leader) and the maximum proficiency R.

There is an intuitive design for our two objectives:

Method 1: we can give an overall optimization variable (such as O)

O = Proficiency / CommunicationCost    (1)

Our aim is to maximize the overall optimization variable O. Here, we call O the utility.

Method 2: It is difficult to satisfy the above two objectives together. Therefore, I think there exists a parameter λ to balance these two performances.

Max λ * CommunicationCost + (1 − λ) * Proficiency

Obviously, if we want to maximize the above function, we may increase the value of the parameter λ as much as possible. However, this formula is a little abstract to understand.

Here is a question: how do we determine the proficiency of a team?

Simply, we can sum each user’s proficiency.

Definition (Sum of proficiency). Given a team T of experts for a project: {<s1, c_{S1}>, <s2, c_{S2}>, …, <sp, c_{Sp}>}, the sum of proficiency of T is defined as

sumProficiency = ∑_{i=1}^{p} R(c_{Si})

Topic title: Utility maximization aware team formation in task‐oriented social network.

Continue the example:

[Figure: example expert network; proficiency values shown in the boxes: 0.88; 0.75, 0.87, 0.43; 0.85; 0.95; 0.65, 0.74]

For a project P={AI,DB,DM,IR}, the numbers on the edges represent how easily two experts can communicate; a smaller number represents better communication. At the same time, the numbers in the boxes denote the degree of proficiency. My motivation is to try to maximize the degree of proficiency of the team and minimize the communication cost between them, because the degree of proficiency is more associated with the user’s attitude and subjective activity.

Here, we may give an Sdistance: a new distance for measuring the experts and further forming a team of experts. This new distance SDIS can be devised according to the description of Eq. (1).
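The utility of Method 1 can be sketched as the team's sum of proficiency divided by its communication cost. This is a minimal illustration, assuming the communication cost has already been computed and is positive; the expert names, proficiency values, and function names below are hypothetical and are not the values from the figure.

```python
def sum_proficiency(assignment, proficiency):
    """Sum of proficiency of a team.

    assignment maps each required skill to the expert chosen for it;
    proficiency[(expert, skill)] is that expert's proficiency in the skill.
    """
    return sum(proficiency[(expert, skill)] for skill, expert in assignment.items())

def utility(assignment, proficiency, communication_cost):
    """Utility in the sense of Eq. (1): proficiency divided by communication cost."""
    if communication_cost <= 0:
        raise ValueError("communication cost must be positive")
    return sum_proficiency(assignment, proficiency) / communication_cost

# Hypothetical two-skill project and team.
assignment = {"AI": "c1", "DB": "c2"}
proficiency = {("c1", "AI"): 0.5, ("c2", "DB"): 0.25}
team_utility = utility(assignment, proficiency, communication_cost=3.0)
```

A single ratio avoids choosing a balance parameter, at the price of treating a doubling of proficiency and a halving of cost as equivalent.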

2) I formally gave the proof of NP‐hardness for dynamic influence maximization under the proposed models, by considering an instance of an existing NP problem.

46. To do list

1) I will formulate the problem statement mathematically, and give an initial solution idea

for that.

======================================================================

Research Report

Fei Hao

2011‐12‐17

======================================================================

47. Done list

Basic Problem Descriptions:

Let },....,{ 21 ncccC denote a set of n experts, and },...,{ 21 nsssS denote a set of m

skills. Each expert ic has a set of skills, denoted as )( icQ , and ScQ i )( . If )( ij cQs ,

expert ic has skill js . Each expert ic have various proficiency on his/her each skill js ,

denoted as )( jsR , )( ij cQs . For a project SP is defined as a set of skills required to

complete the project.

Input : a social graph G, and a project P.

We have two objectives to optimize:

1) Minimize the communication cost among the experts (find experts who can easily collaborate or communicate).

2) Maximize their proficiency (each person’s proficiency in a certain skill).

Output: return the set of experts with the minimum communication cost (the sum of distances in the case without a leader, and the leader distance in the case with a leader) and the maximum proficiency R.
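The two communication cost measures named in the output description can be sketched as follows. This is only an illustration under assumptions that the report does not state: the social graph is an undirected weighted adjacency dictionary, the "sum of distance" is taken as the sum of shortest-path distances over all pairs of team members, and the "leader distance" as the sum of shortest-path distances from the leader to every other member. The names `shortest_distances`, `sum_distance`, `leader_distance` and the toy graph are hypothetical.

```python
import heapq
from itertools import combinations


def shortest_distances(graph, source):
    """Dijkstra: shortest-path distance from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def sum_distance(graph, team):
    """Case without a leader (assumed): sum of distances over all member pairs."""
    total = 0.0
    for a, b in combinations(sorted(team), 2):
        total += shortest_distances(graph, a).get(b, float("inf"))
    return total


def leader_distance(graph, team, leader):
    """Case with a leader (assumed): sum of distances from the leader to each member."""
    dist = shortest_distances(graph, leader)
    return sum(dist.get(m, float("inf")) for m in team if m != leader)


# Hypothetical toy graph: a smaller edge weight means easier communication.
toy_graph = {
    "a": {"b": 1.0, "c": 4.0},
    "b": {"a": 1.0, "c": 2.0},
    "c": {"a": 4.0, "b": 2.0},
}

print(sum_distance(toy_graph, {"a", "b", "c"}))          # 6.0
print(leader_distance(toy_graph, {"a", "b", "c"}, "b"))  # 3.0
```

In this toy graph the direct edge between "a" and "c" costs 4.0, but the path through "b" costs 3.0, so the shortest-path distance is used for that pair.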

There is an intuitive design that combines our two objectives:

Method 1: we can define an overall optimization variable, “Utility”:

$U = \dfrac{\text{Proficiency}}{\text{communication cost}}$    (1)

Our aim is to maximize the utility.
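A small sketch of Eq. (1), assuming it is read as the ratio of team proficiency to communication cost. The function name `utility`, the rule for a zero cost, and the numbers are hypothetical and only for illustration.

```python
def utility(team_proficiency, communication_cost):
    """Utility read as a ratio: team proficiency divided by communication cost."""
    if communication_cost == 0:
        # Assumption: a team with no communication cost (e.g. a single expert)
        # is treated as unboundedly good as long as it has some proficiency.
        return float("inf") if team_proficiency > 0 else 0.0
    return team_proficiency / communication_cost


# Two hypothetical candidate teams for the same project.
team_a = utility(1.5, 3.0)  # 0.5
team_b = utility(1.6, 2.0)  # 0.8
print(team_a, team_b)
```

The second team has the higher utility because it is both slightly more proficient and cheaper to coordinate. The zero-cost case needs an explicit rule, since a team of one expert has no pairwise distance at all.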

This raises a question: how do we determine the proficiency of a team?

Simply, we can sum up each user’s proficiency.

Definition (Sum of proficiency) Given a team $T$ of experts for a project: $\{\langle s_1, c_{s_1}\rangle, \langle s_2, c_{s_2}\rangle, \ldots, \langle s_p, c_{s_p}\rangle\}$, the sum of proficiency of $T$ is defined as

$\text{Sum of Proficiency} = \sum_{i=1}^{p} R(c_{s_i})$

Topic title: Utility maximization aware team formation in task‐oriented social network.

Continue the example:

For a project P = {AI, DB, DM, IR}, the numbers on the edges represent how easily two experts can communicate; a smaller number represents better communication. At the same time, the numbers in the boxes denote the degree of proficiency. My motivation is to maximize the degree of proficiency of the team and minimize the communication cost between them, because the degree of proficiency is more associated with the user’s attitude and subjective activity.

[Figure: example task‐oriented social network. Values recovered from the figure: 0.88; 0.75, 0.87, 0.43; 0.85; 0.95; 0.65, 0.74.]

Problem Definitions:

We will give two formal definitions for two cases: 1) team formation without a leader; 2) team formation with a leader.

PROBLEM 1 (UM‐TF)

Given a task‐oriented social network $G' = (V', E')$ and a task, the utility maximization aware team formation problem without a leader (UM‐TF) is to find a team of experts $T$ for the task from $G'$ so that the utility of $T$ is maximized.

PROBLEM 2 (UM‐TF‐L)

Given a task‐oriented social network $G' = (V', E')$ and a task, the utility maximization aware team formation problem with a leader (UM‐TF‐L) is to find a team of experts $T$ for the task and an expert $L$ from $V'$ as the leader of the team so that the utility of $T$ is maximized.

The framework of algorithm:

1) We divide the experts into groups so that each group corresponds to a cluster with a certain skill.

2) Select an expert with the biggest proficiency.

3) Select the 2nd expert who can maximize the utility if we add him or her into the results.

4) Repeat step 3 until all of the skill groups have been found.

[Figure: Step 1, Step 2, Step 3]

This is a greedy algorithm.

5) I collected the datasets for experiments.

Paper collaboration network

a) Consider the frequency of user’s contributions as the proficiency for a certain skill.

b) Consider the frequency of each pair as the communication cost.

48. To do list

2) Do the experiments for the case without a leader. I will compare my proposed approach with the existing approach in terms of proficiency and cost, respectively.
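The greedy framework in steps 1) to 4) of the done list can be sketched roughly as below. It is not the implementation used for the experiments, only an illustration under several assumptions: proficiency is stored per (expert, skill) pair, the communication cost of a team is the sum of precomputed pairwise distances, utility is the ratio of summed proficiency to that cost, and a team with zero cost gets infinite utility. The function names, expert names and numbers are all hypothetical.

```python
from itertools import combinations


def team_utility(team, proficiency, distance):
    """team maps skill -> expert; utility = summed proficiency / summed pairwise distance."""
    prof = sum(proficiency[(expert, skill)] for skill, expert in team.items())
    members = sorted(set(team.values()))
    cost = sum(distance[frozenset(pair)] for pair in combinations(members, 2))
    if cost == 0:
        # Assumption: a team with no pairwise distance has unbounded utility.
        return float("inf") if prof > 0 else 0.0
    return prof / cost


def greedy_team(project, proficiency, distance):
    # Step 1: one group per required skill, holding the experts who have it.
    groups = {s: [e for (e, sk) in proficiency if sk == s] for s in project}
    # Step 2: start from the single most proficient (expert, skill) pair.
    seed_expert, seed_skill = max(
        ((e, s) for s in project for e in groups[s]),
        key=lambda pair: proficiency[pair],
    )
    team = {seed_skill: seed_expert}
    # Steps 3 and 4: keep adding the (skill, expert) pair that maximizes utility.
    while len(team) < len(project):
        candidates = [(s, e) for s in project if s not in team for e in groups[s]]
        if not candidates:
            raise ValueError("some required skill has no expert")
        best_skill, best_expert = max(
            candidates,
            key=lambda se: team_utility({**team, se[0]: se[1]}, proficiency, distance),
        )
        team[best_skill] = best_expert
    return team


# Hypothetical toy instance with two of the skills from the example project.
proficiency = {
    ("ann", "AI"): 0.9, ("bob", "AI"): 0.7,
    ("bob", "DB"): 0.6, ("cat", "DB"): 0.8,
}
distance = {
    frozenset({"ann", "bob"}): 1.0,
    frozenset({"ann", "cat"}): 4.0,
    frozenset({"bob", "cat"}): 2.0,
}
team = greedy_team(["AI", "DB"], proficiency, distance)
print(team)  # {'AI': 'ann', 'DB': 'bob'}
```

In this toy instance "cat" is the most proficient DB expert, but "bob" is selected because he is much closer to "ann", which gives the higher utility. Like any greedy selection, the result depends on the first expert picked and is not guaranteed to be the optimal team.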
