Research work 2011
======================================================================
Research Report
Fei Hao
2011‐1‐1
======================================================================
1. Done list
1) Determination of the Parameter in the Linear Threshold Model
Each node u chooses a threshold θ_u at random from the interval [0,1]. This represents the
weighted fraction of u’s neighbors that must become active in order for u to become active.
Thus, the thresholds θ_u intuitively represent the different latent tendencies of nodes to adopt
the innovation when their neighbors do. The fact that they are randomly selected is intended
to model our lack of knowledge of their values.
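The activation rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in these reports: the adjacency-dict representation, the function names, and the uniform edge weights 1/deg(v) are assumptions made for the example.

```python
import random


def uniform_weights(adj):
    """Assumed weights: b[(u, v)] = 1 / deg(v), so weights into v sum to 1."""
    return {(u, v): 1.0 / len(adj[v]) for v in adj for u in adj[v]}


def linear_threshold_spread(adj, weights, seeds, rng):
    """Run one linear-threshold diffusion and return the final active set."""
    # Each node draws its threshold uniformly at random from [0, 1).
    thresholds = {v: rng.random() for v in adj}
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for v in adj:
            if v in active:
                continue
            # Weighted fraction of v's neighbors that are already active.
            pressure = sum(weights[(u, v)] for u in adj[v] if u in active)
            if pressure >= thresholds[v]:
                active.add(v)
                changed = True
    return active


# Hypothetical toy graph: a star with center "a".
star = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
spread = linear_threshold_spread(star, uniform_weights(star), {"a"}, random.Random(0))
print(sorted(spread))
```

In the toy star graph each leaf has a single neighbor with weight 1, so seeding the center activates every node regardless of the random thresholds.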
2) Extracted the data set from Zachary’s karate club network, a standard test network. It contains 34
nodes and 78 edges.
Each node represents a club member.
Each edge indicates that the two members frequently take part in activities together.
Task 1: Based on the proposed social potential, I calculated the shortest path and shortest
distance between every pair of nodes.
[Figure: Distribution of Shortest Distance. x-axis ticks 1 to 5; y-axis ticks 0 to 600.]
3) Seminar Preparation:
Title: Community-based Greedy Algorithm for Mining Top-K Influential Nodes in Mobile
Social Networks
Basic idea: Divide a network into communities, and then choose communities to find top-k
influential nodes within communities.
Application: Apply the influence maximization algorithm to mobile social networks to find
top-k influential users.
• Community Combination
The combination entropy of community Cl to Cm is defined as [equation not preserved].
If CoEntropy > θ, then combine the communities.
[Figure: Community Combination, with communities Cl and Cm.]
2. To do list
1) Formulism: Establish the mathematical model for my idea.
2) Implement the degree-based influence maximization algorithm using the Zachary’s karate
club network.
======================================================================
Research Report
Fei Hao
2011‐1‐15
======================================================================
3. Done list
1) Problem definition
Social Potential
Given a social network N=(V,E), where V={v1,v2,…,vn} is the set of nodes and E is the set of edges,
the potential of a node vi is defined as:
    \varphi(v_i) = \sum_{j=1}^{n} \varphi_j(i) = \sum_{j=1}^{n} m_j \, e^{-(d_{ij}/\sigma)^2}
where the sum is restricted to nearby nodes, \{x_j \mid d(x_j, x_i) \le 6\}.
mj is the mass of vj, describing the activity of the node; σ reflects the influence range;
dij is the shortest distance between nodes vj and vi.
Algorithm:
For each node i in SN
{
    Shortestpath[i][] = Dijkstra(i);
    P(i) = 0;
    For each node j in SN
        If Shortestpath[i][j] <= 6
        {
            P(i) := P(i) + Degree(j)/sum(Degree) * EXP(-(Shortestpath[i][j]/sigma)^2)   // Social Potential definition
        }
}
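The social potential computation can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the graph is unweighted (so breadth-first search replaces Dijkstra), the mass of a node is its degree divided by the total degree, and the hop cutoff `max_hops=6` is a parameter; all function and variable names are hypothetical.

```python
import math
from collections import deque


def bfs_distances(adj, source):
    """Shortest hop distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def social_potential(adj, sigma, max_hops=6):
    """phi(v_i) = sum_j m_j * exp(-(d_ij / sigma)^2) over nodes within max_hops."""
    total_degree = sum(len(nbrs) for nbrs in adj.values())
    # Assumed mass: normalized degree, as in the pseudocode above.
    mass = {v: len(adj[v]) / total_degree for v in adj}
    potential = {}
    for i in adj:
        dist = bfs_distances(adj, i)
        potential[i] = sum(
            mass[j] * math.exp(-((d / sigma) ** 2))
            for j, d in dist.items()
            if d <= max_hops
        )
    return potential


# Hypothetical toy graph: the path a - b - c.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
phi = social_potential(path, sigma=1.0)
print(max(phi, key=phi.get))
```

On the toy path the middle node has the highest potential, since it has the largest mass and is closest to both other nodes.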
Mathematical Model:
Input: G(V,E), threshold θ, influence degree I_d(v) = (# nodes activated by v) / N, target set size K,
parameter β ∈ [0,1]
Initialize S_0 = ∅
For i=1 to K do
    Choose node u
    u = arg max_x ( β*φ(x) + (1−β)*(|I(S_{i−1} ∪ {x})| − |I(S_{i−1})|) ),  β ∈ [0,1]   // Influence Increase
    S_i = S_{i−1} ∪ {u}
I(u) = |A_u| / N   // Influence degree of the user u.
The purpose of this algorithm is to find an optimal parameter for maximization of
social influence.
Karate Dataset (Zachary’s karate club network. It is a test network. It contains 34 nodes, 78 edges)
[Figure: Spread of Influence vs. k (1 to 8) for the degree-based approach; y-axis ticks 0 to 25.]
Top 8 key nodes of the karate club network
k   Degree   Social potential
1   34       34
2   1        1
3   33       33
4   3        3
5   2        2
6   32       32
7   4        4
8   24       9
Closeness Centrality
The closeness centrality CC(x) of member x tightly depends on the geodesic distance, i.e.,
the shortest paths from member x to all other people in the social network.
    CC(x) = \frac{m - 1}{\sum_{y \ne x,\ y \in M} c(x, y)}
where c(x,y) is a function describing the distance (shortest distance) between nodes x and y, and
m is the number of nodes in the network.
k   Closeness
1   1
2   3
3   34
4   32
5   9
6   14
7   33
8   20
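The closeness ranking above can be illustrated with a short Python sketch. It assumes an unweighted, connected graph stored as an adjacency dict; the helper names are hypothetical and the toy graph is not the karate club data.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest hop distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def closeness_centrality(adj, x):
    """CC(x) = (m - 1) / sum of shortest distances from x to every other node."""
    dist = bfs_distances(adj, x)
    m = len(adj)
    total = sum(d for y, d in dist.items() if y != x)
    return (m - 1) / total


def top_k_by_closeness(adj, k):
    """Return the k nodes with the highest closeness centrality."""
    return sorted(adj, key=lambda x: closeness_centrality(adj, x), reverse=True)[:k]


# Hypothetical toy graph: the path a - b - c.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(top_k_by_closeness(path, 1))
```

The sketch assumes every node is reachable from every other; on a disconnected graph the sum would silently skip unreachable nodes.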
4. To do list
3) Study the combined model with social potential and other metrics.
4) Study the correlation between the parameter β ∈ [0,1] and the spread of influence.
======================================================================
Research Report
Fei Hao
2011‐2‐5
======================================================================
5. Done list
1) Estimation of influence between active node u and inactive neighbor v
Simple way:
    b_{uv} = \frac{1}{d(v)}
where d(v) is the degree of v, which means that all neighbors contribute equally for inactive node v.
New Estimation:
    b_{uv} = \frac{\mathrm{degree}(u)}{\sum_{w \in N(v)} \mathrm{degree}(w)}
2) Betweenness Centrality
Betweenness centrality BC of member x pinpoints to what extent x is between other
members.
Members with high BC are very important to the network because other actors can
connect with each other only through them. (Bridge role)
    BC(x) = \sum_{i \ne x \ne j;\ i, j \in M} \frac{b_{ij}(x)}{b_{ij}}
b_ij(x): the number of shortest paths from i to j that pass through x
b_ij: the number of all shortest paths between i and j.
m: the number of nodes in a network.
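The two influence-weight estimations from item 1) can be compared with a small Python sketch. The adjacency-dict representation, the function names, and the toy graph are assumptions for illustration only.

```python
def simple_weight(adj, u, v):
    """Simple way: b_uv = 1 / d(v), the same for every neighbor u of v."""
    return 1.0 / len(adj[v])


def degree_weight(adj, u, v):
    """New estimation: b_uv = degree(u) / sum of degree(w) over neighbors w of v."""
    return len(adj[u]) / sum(len(adj[w]) for w in adj[v])


# Hypothetical toy graph: a triangle a-b-c plus a pendant node d attached to a.
graph = {
    "a": ["b", "c", "d"],
    "b": ["a", "c"],
    "c": ["a", "b"],
    "d": ["a"],
}

# Influence on b from its neighbors a (degree 3) and c (degree 2).
print(simple_weight(graph, "a", "b"), degree_weight(graph, "a", "b"))
print(simple_weight(graph, "c", "b"), degree_weight(graph, "c", "b"))
```

Under both estimations the weights into a node sum to 1, which keeps them compatible with thresholds drawn from [0,1]; the new estimation simply shifts weight toward high-degree neighbors.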
3) Random (Naïve method)
This method extracts k nodes randomly from the network as the seed nodes.
It does not take the importance of nodes in the network into account.
4) Experimental Dataset 1 (Small Dataset)
DataSet statistics:
Dataset       Node   Edge   Average Degree
Karate Club   34     78     4.6
Degree Distribution
Degree   Number of nodes
1        1
2        11
3        6
4        6
5        3
6        2
9        1
10       1
12       1
16       1
17       1
This dataset follows a power‐law degree distribution.
[Figure: Degree Distribution of the karate club network. x-axis: Degree; y-axis: Number of nodes.]
Explanation of my idea:
A naïve approximate solution to influence maximization is to select the “most influential”
node at each step.
Social potential reflects a node’s potential ability to influence other nodes.
Hence, I want to study both, combine them, and introduce a tunable parameter to
optimize the seed node selection under the linear threshold model.
5) Experimental Dataset 2 (Large Dataset)
DataSet statistics:
Dataset       Node    Edge     Average Degree
Email‐Enron   36692   367662   10.02
[Figure: Spread of Influence vs. K (1 to 8) for degree, social potential, PageRank, betweenness, closeness, and random; y-axis ticks 0 to 25.]
[Figure: Degree Distribution. x-axis: Degree (ticks 1 to 552); y-axis: Number of Nodes (ticks 0 to 12000).]
6. To do list
5) Study the combined model with social potential and other metrics on the small and large
datasets.
6) Study the correlation between the parameter β ∈ [0,1] and the spread of influence.
======================================================================
Research Report
Fei Hao
2011‐2‐19
======================================================================
7. Done list
1) Further explanation for my idea about the Influence Maximization Problem
Traditional way: Under the influence diffusion model, they select the top-k influential nodes
which can maximize the influence increase with some measurement, such as degree,
closeness, or betweenness.
Weak Point: these approaches ignore the nodes which have more “social potential”.
These nodes cannot propagate the information at time t, but I think the most
potential nodes selected in phase 1 can accumulate some “influence” for the future. They
could trigger more inactive neighbors.
My idea: To obtain the k seed nodes from a social network, we divide the selection step
into two phases.
Phase 1: select some nodes (for example: βk) in terms of higher potential for triggering
more nodes in the future.
Phase 2: select the remaining nodes (k‐βk) in terms of maximal influence increase.
Where β is a tunable parameter.
I’d like to give an example to explain it.
Suppose a company wants to recruit k employees. An excellent company may select
some people who performed quite well before joining the company, and also consider
some people who have higher potential to develop, because it thinks that these people
can develop very well in the near future and bring much profit to the company.
I want to investigate the correlation between the various β and the influence spread.
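The two-phase selection can be sketched as below. This is a simplified sketch, not the method evaluated in these reports: phase 2 ranks nodes by a fixed influence score (for example degree) instead of recomputing the marginal influence increase, and the rounding of βk, the function name, and the scores are assumptions.

```python
def two_phase_seeds(potential, influence, k, beta):
    """Pick round(beta*k) seeds by social potential, the rest by influence score."""
    n_potential = int(round(beta * k))
    # Phase 1: nodes with the highest social potential.
    by_potential = sorted(potential, key=potential.get, reverse=True)
    phase1 = by_potential[:n_potential]
    chosen = set(phase1)
    # Phase 2: remaining seeds by influence score, skipping nodes already chosen.
    by_influence = sorted(influence, key=influence.get, reverse=True)
    phase2 = [v for v in by_influence if v not in chosen][: k - n_potential]
    return phase1 + phase2


# Hypothetical scores for four nodes.
potential = {"a": 0.9, "b": 0.8, "c": 0.1, "d": 0.2}
influence = {"a": 5, "b": 1, "c": 4, "d": 3}
print(two_phase_seeds(potential, influence, k=2, beta=0.5))
```

With β = 0 the selection reduces to the traditional top-k by the influence measure, and with β = 1 it selects purely by social potential.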
2) None of these methods can guarantee finding the optimal seeds, and no efficient algorithm is
known to, because the problem is NP-hard. Precisely because we cannot determine analytically
which method is better, we use simulations to compare their effectiveness.
3) Denote the influence spread by f(S), where S is the initial active set.
The problem is: find a k-node set S that maximizes f(S); f(S) is the objective function.
Properties of f(S):
1) Non-negative
2) Monotone: f(S+v) >= f(S)
It is NP‐hard to determine the optimum for influence maximization for both the independent
cascade model and the linear threshold model [KKT, 2003].
3) Discussion on Influence Range parameter
As can be seen from the formula of social potential, σ is the only parameter to be
determined.
Potential Entropy (derived from Shannon information entropy)
    H = -\sum_{i=1}^{n} \frac{\varphi(v_i)}{Z} \log\left( \frac{\varphi(v_i)}{Z} \right)
where Z = \sum_{i=1}^{n} \varphi(v_i) is a normalization factor.
Property: For any σ ∈ (0, ∞), the potential entropy H satisfies 0 ≤ H ≤ log(n),
and H reaches the maximum value log(n) if and only if the social potentials of all nodes
are the same.
Problem: finding an optimal parameter σ can be turned into minimizing the potential
entropy, i.e. minimizing the uncertainty.
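The potential entropy and the search for σ can be sketched in Python. This is only a sketch: `potentials_for` is a hypothetical callable that returns the list of social potentials for a given σ, and a plain search over a list of candidate values stands in for whatever search procedure is actually used.

```python
import math


def potential_entropy(potentials):
    """H = -sum_i (phi_i / Z) * log(phi_i / Z), with Z = sum_i phi_i."""
    z = sum(potentials)
    h = 0.0
    for phi in potentials:
        p = phi / z
        if p > 0:
            h -= p * math.log(p)
    return h


def best_sigma(candidates, potentials_for):
    """Return the candidate sigma whose potentials give the smallest entropy."""
    return min(candidates, key=lambda s: potential_entropy(potentials_for(s)))


# Equal potentials give the maximum entropy log(n).
print(potential_entropy([1.0, 1.0, 1.0, 1.0]), math.log(4))

# Hypothetical stand-in for a real potential computation.
toy = lambda s: [1.0, s]
print(best_sigma([0.25, 1.0, 2.0], toy))
```

The first printed pair illustrates the stated property that H reaches log(n) when all potentials are equal.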
8. To do list
7) Formulize the proposed model with better explanations.
8) Find an efficient and fast algorithm to calculate the influence range parameter in our model,
especially for a large network.
======================================================================
Research Report
Fei Hao
2011‐3‐5
======================================================================
9. Done list
1) Social Potential Calculation
Input:
Influence factor σ;
Number of users N;
Output:
Social Potential φ(v_i)
For each node i in SN G=(V,E)
Begin
    Shortestpath[i][] = Dijkstra(i)
    φ(v_i) = 0
    For each node j in SN
        If Shortestpath[i][j] <= 6
        Begin
            φ(v_i) = φ(v_i) + (1/N) * e^(-(Shortestpath[i][j]/σ)^2)
        end
End
2) Formulation of idea
Input: G=(V,E), target set size K, parameter β ∈ [0,1]
Initialize S_0 = ∅   // seed nodes set
For i=1 to K do
    Choose node u
    u = arg max_x ( β*φ(x) + (1−β)*h(x) ),  β ∈ [0,1]   // integrate the two kinds of nodes
    S_i = S_{i−1} ∪ {u}
h(x) ∈ {degree-based, closeness-based, betweenness-based, |I_d|}
The purpose of this algorithm is to find an optimal parameter for maximization of social
influence.
Such as:
1) We can combine the social potential with degree: social potential + degree
2) Social potential + closeness
3) Social potential + betweenness
4) Social potential + random
Influence Factor Discussion
According to the property of the Gaussian function, for a Gaussian distribution
approximately 99.7% of the values fall within three standard deviations of the mean (the “3σ criterion”).
That is, the influence scope of a data object is roughly equal to 3σ.
Normal Distribution:
(Figure from: http://en.wikipedia.org/wiki/Normal_distribution)
Back to our social potential
    \varphi(v_i) = \sum_{j=1}^{n} \varphi_j(i) = \sum_{j=1}^{n} m_j \, e^{-(d_{ij}/\sigma)^2}
The kernel e^{-(d/σ)^2} corresponds to a Gaussian with standard deviation σ/√2.
So, the influence range of each node is about 3σ/√2.
0 < σ < √2/3: there is no interaction between nodes.   // within 1 hop
√2/3 ≤ σ < 2√2/3: each node could influence its 1-hop neighbor nodes.   // 1 hop~2 hop
2√2/3 ≤ σ < √2: each node could influence its 2-hop neighbor nodes.   // 2 hop~3 hop
Let the σ correspond to the minimal potential entropy, i.e. σ = √2·l/3, l ∈ N; then we can
search the optimum σ in the region (√2(l−1)/3, √2·l/3) up to a specified
precision.
Details:
If σ1 in (√2(l−1)/3, √2·l/3),
σ2 in (√2·l/3, √2(l+1)/3),
σ3 in (√2(l+1)/3, √2(l+2)/3),
and H(σ1)>H(σ2), H(σ3)>H(σ2), then we take (√2·l/3, √2(l+1)/3) as our final
searching range.
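The relation between σ and the hop range can be checked numerically with a short Python sketch. The function names are hypothetical; the only formula used is the influence range 3σ/√2 stated above.

```python
import math


def influence_range(sigma):
    """Approximate influence range of a node: 3 * sigma / sqrt(2)."""
    return 3.0 * sigma / math.sqrt(2.0)


def reachable_hops(sigma):
    """Number of whole hops covered by the influence range."""
    return math.floor(influence_range(sigma))


def sigma_band(hops):
    """Interval of sigma for which the influence range covers exactly `hops` hops."""
    return (math.sqrt(2.0) * hops / 3.0, math.sqrt(2.0) * (hops + 1) / 3.0)


for sigma in (0.4, 0.5, 1.0):
    print(sigma, reachable_hops(sigma))
print(sigma_band(1))
```

For example, σ = 0.4 lies below √2/3 and covers no hop, while σ = 0.5 and σ = 1.0 fall into the 1-hop and 2-hop bands.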
3) Trust Maximization in Social Networks
Motivation: Finding maximum trust in social networks, which is particularly important
when a social network is oriented by certain tasks.
Contribution: Proposed the trust maximization algorithm based on task-oriented social
networks.
Main Contents:
Trust Inference Mechanism:
Multiplication
If A trusts B with Tab, and B trusts C with Tbc, then A trusts C with Tab*Tbc.
Task‐oriented Social Network
In G=(V,E), V indicates users and E indicates the connection between two users. In
addition, A={a1,a2,…,am} is a set of m skills. By denoting aj ∈ A(vi), we claim that
the individual i has the skill j. Each individual is associated with a small set of skills,
where A(vi) ⊆ A. A task T is defined as a subset of skills required to accomplish the task,
T ⊆ A. A certain task can be performed by a group of individuals,
V' ⊆ V. Hence, this group of individuals forms a task‐oriented social network,
G' ⊆ G, G' = (V', E').
Maximum trust routes identification
10. To do list
1) Do an experiment using the proposed idea.
2) Study the strength of relationship between users by fuzzy set theory.
======================================================================
Research Report
Fei Hao
2011‐3‐19
======================================================================
11. Done list
1) Existing previous work [Matteo, 2009], [Yagger, 2008] focuses on symmetric relations using a
fuzzy adjacency matrix.
For example:
R(x,y)=R(y,x)
I think this has some limitations, for example in a trust network.
In the real world, people usually express different degrees of relationship toward each other.
So, we may extend the fuzzy adjacency matrix to the asymmetric relationship, i.e.,
    T = \begin{pmatrix} 1 & 0.9 & 0.2 \\ 0.2 & 1 & 0.6 \\ 0.7 & 0.1 & 1 \end{pmatrix}
Basic assumption: If user a trusts b very much, and b trusts a very much, then we consider a to be
close to b, i.e. the strength of the relation between a and b (called Reciprocal Trust Degree) is strongest.
For example:
Trust(a1,a2)=0.9 and trust(a2,a1)=0.2
So, RTD(a1,a2)=trust(a1,a2)*trust(a2,a1)
    RT(x_i, x_j) = t(x_i, x_j) \cdot t(x_j, x_i) =
    \begin{cases}
    1 & \text{agents } i \text{ and } j \text{ trust each other} \\
    a & \text{reciprocal trust to some extent} \\
    0 & \text{no trust}
    \end{cases}
RT describes the fuzzy trust relations, RT(x_i, x_j) = RT(x_j, x_i).
    RT = \begin{pmatrix} 1 & 0.18 & 0.14 \\ 0.18 & 1 & 0.06 \\ 0.14 & 0.06 & 1 \end{pmatrix}
A fuzzy m-ary relation S on a single set X is a fuzzy subset of X^m defined as follows, with f: [0,1]^k → [0,1]:
    RT(x_{p_1}, \ldots, x_{p_m}) =
    \begin{cases}
    1 & \text{agents } i \text{ and } j \text{ trust each other} \\
    a & \text{reciprocal trust to some extent} \\
    0 & \text{no trust}
    \end{cases}
    RT(x_{p_1}, \ldots, x_{p_m}) = f\left( RT(x_{p_1}, x_{p_2}), \ldots, RT(x_{p_{m-1}}, x_{p_m}) \right)
Hence, we can calculate the m-ary reciprocal trust degree; furthermore, we can identify the
users who trust each other most.
For example: RT(x1,x2,x3)=0.5
RT(x1,x2,x4)=0.533
…………..
…………………………
RT(x5,x6,x7)=0.2
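The pairwise reciprocal trust matrix can be reproduced from the asymmetric trust matrix with a few lines of Python. The sketch below uses the trust values from the example above; the function names are hypothetical, and only the binary (pairwise) case is covered.

```python
def reciprocal_trust(trust):
    """RT[i][j] = t(i, j) * t(j, i); the result is symmetric by construction."""
    n = len(trust)
    return [[trust[i][j] * trust[j][i] for j in range(n)] for i in range(n)]


def strongest_pair(rt):
    """Return the pair (i, j), i < j, with the largest reciprocal trust degree."""
    n = len(rt)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return max(pairs, key=lambda p: rt[p[0]][p[1]])


# Asymmetric trust matrix from the example (row i trusts column j).
T = [
    [1.0, 0.9, 0.2],
    [0.2, 1.0, 0.6],
    [0.7, 0.1, 1.0],
]
RT = reciprocal_trust(T)
for row in RT:
    print([round(value, 2) for value in row])
print(strongest_pair(RT))
```

The off-diagonal entries come out as 0.18, 0.14, and 0.06, matching the RT matrix in the text.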
2)
[Figure: Dolphin Degree Distribution. x-axis ticks 1 to 12; y-axis ticks 0 to 10.]
Remark: From this picture, obviously, the influence spread function is a monotone
increasing function. This fits the basic properties of the influence spread function proved in [KKT, 2003].
This diagram shows the comparison results when the tunable parameter = 0.3.
12. To do list
3) Continue the experiment on other combination, and also make the appropriate analysis.
4) Find out more related work about asymmetric fuzzy adjacency matrix, especially solving the
social network problem.
[Figure: Influence Spread vs. K (1 to 15) for Social potential+Closeness, Social Potential, and Closeness; y-axis ticks 0 to 30.]
======================================================================
Research Report
Fei Hao
2011‐4‐2
======================================================================
13. Done list
1)
Yeast Protein Interaction
Node   Edge   Average Degree
1486   4406   16.38
Relatively Large Data Set: Yeast (Degree Distribution)
[Figure: Yeast Degree Distribution. x-axis: Degree (ticks 1 to 38); y-axis: Number of Nodes (ticks 0 to 1200).]
Degree‐based Approach
[Figure: Influence Spread vs. K (10, 20, 40, 60, 80) for Degree; y-axis ticks 0 to 1000.]
Closeness‐based Approach
[Figure: Influence Spread vs. K (10, 20, 40, 60, 80) for Closeness; y-axis ticks 0 to 90.]
Betweenness‐based Approach
[Figure: Influence Spread vs. K (10, 20, 40, 60, 80) for Betweenness; y-axis ticks 0 to 1000.]
Comparison among various independent approaches
Clearly, the closeness‐based approach performs worst on the influence maximization problem.
[Figure: Influence Spread vs. K (10, 20, 40, 60, 80) for Degree, Closeness, and Betweenness; y-axis ticks 0 to 1000.]
2) Combined Model
Case 1: Social Potential + Degree
[Figure: Influence Spread vs. K (10, 20, 40, 60, 80) for Degree and Social+Degree; y-axis ticks 0 to 1000.]
Case 2: Social Potential + Closeness
[Figure: Influence Spread vs. K (10, 20, 40, 60, 80) for Closeness and Social+Closeness; y-axis ticks 0 to 1000.]
14. To do list
5) I will do the experiment with social potential + betweenness.
6) Make some experimental analysis on my results.
======================================================================
Research Report
Fei Hao
2011‐4‐16
======================================================================
15. Done list
1) Yeast Interaction Network
Combined model (Social+betweenness)
I studied the difference between the Betweenness Centrality approach and the Social Potential+Betweenness
approach.
[Figure: Influence Spread vs. K (10, 20, 40, 60, 80) for Betweenness and Social Potential+Betweenness; y-axis ticks 0 to 1000.]
2) Small-size Dataset (Dolphin Social Network)
Influence Spread Comparison Results using various algorithms
Degree-based Approach, Social Potential + Degree
[Figure: Influence Spread vs. K (1, 2, 4, 6, 8, 10) for Degree and Social Potential+Degree; y-axis ticks 0 to 60.]
Closeness-based Approach, Social Potential + Closeness
[Figure: Influence Spread vs. K (1, 2, 4, 6, 8, 10) for Closeness and Social Potential+Closeness; y-axis ticks 0 to 60.]
Betweenness-based Approach, Social Potential + Betweenness
[Figure: Influence Spread vs. K (1, 2, 4, 6, 8, 10) for Betweenness and Social Potential+Betweenness; y-axis ticks 0 to 60.]
16. To do list
7) Consider the different mass of nodes when calculating the social potential.
For example, we may consider some attributes of each node.
8) Result Analysis and Comments
======================================================================
Research Report
Fei Hao
2011‐5‐7
======================================================================
17. Done list
1) Random Algorithm Comparison Results
I studied the Random Algorithm in the Influence Maximization Problem.
2) Small-size Dataset (Dolphin Social Network)
Influence Spread Comparison Results using various algorithms
I studied the Random Algorithm in the Influence Maximization Problem.
2) Writing a paper “Efficient Top‐K Market Movers Mining for Social Advertising”
3) iFriend paper work:
Improved the formulism of the design of membership functions for linguistic variables.
Gave the generalized formula to determine the membership functions with simple triangular
membership functions.
18. To do list
9) Try to finish the writing of the “Influence Maximization Problem” paper.
10) Improve the iFriend paper continuously.
======================================================================
Research Report
Fei Hao
2011‐5‐21
======================================================================
19. Done list
1) Formalized the membership functions of the linguistic terms ‘Special friends’, ‘Good friends’,
‘General friends’, ‘E-buddy’ in a generalized form.
Formalized the membership functions of the linguistic terms ‘Seniors’, ‘Peers’, ‘Juniors’ in a
generalized form.
The parameters are now more flexible; they depend on the user’s preference.
2) Finished and corrected the paper “Efficient Top-K Market Movers Mining for Social
Advertising”.
3) In the linear threshold model, the threshold is a fixed value; it does not change during
the propagation of information. In reality, the threshold can change as time
unfolds.
The intuition: if a node cannot be influenced by some active neighbors, then
when its next neighbor tries to influence it, its threshold has changed due to
the earlier activation attempts of its neighbors. At this point, the node may decrease its
threshold, increase its threshold, or keep the same
threshold value.
t=0 , threshold θ0
t=1, θ1
t=2, θ2
Dynamical Variable Threshold Model (DVT)
Hence, I will design a Dynamical Variable Threshold Model to simulate the propagation
procedure in the social networks.
In this model, S denotes the set of neighbor nodes which tried to activate the current node v, but failed.
    \theta_v(S) = \theta_v - k \cdot \frac{|S|}{|V_v|}
k is a random value from K={‐1,0,1}.
Explanation for k values:
1) If k=1, the threshold decays: the user lowers his/her requirements for accepting
a new activation. In this case, the user can be influenced more easily at the next timestamp.
2) If k=0, the threshold keeps the same value: the user is unshakeable.
3) If k=-1, the threshold increases: the user raises his/her requirements for
accepting a new activation. In this case, the user is a diehard.
Step1: initially, assign a random threshold to each node.
Step2: change the threshold dynamically after each failed activation.
Step3: repeat step 2 until no more nodes can be influenced, then stop.
In fact, the Dynamical Variable Threshold Model is independent of the order of activation.
Suppose two node sequences T1 and T2:
    T_1 = \{u_1, u_2, \ldots, u_r\}, \quad T_2 = \{u'_1, u'_2, \ldots, u'_r\}
    \theta_v(S_{T_1}) = \theta_v - k \cdot \frac{|S_{T_1}|}{|V_v|} = \theta_v - k \cdot \frac{r}{|V_v|}
    \theta_v(S_{T_2}) = \theta_v - k \cdot \frac{|S_{T_2}|}{|V_v|} = \theta_v - k \cdot \frac{r}{|V_v|}
Obviously, θ_v(S_{T1}) = θ_v(S_{T2}); therefore, the variable threshold model is independent of
the order of activation.
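The threshold update of the DVT model can be sketched as a small Python function. This is a sketch under assumptions: the update is taken as θ_v − k·|S|/|V_v| with the count of failed attempts `failed` and the size `neighborhood` of V_v passed in directly, and the clamping of the result to [0,1] is an added safeguard that the reports do not state.

```python
def updated_threshold(theta, k, failed, neighborhood):
    """theta_v(S) = theta_v - k * |S| / |V_v|, clamped to [0, 1] (clamping is an assumption)."""
    if k not in (-1, 0, 1):
        raise ValueError("k must be -1, 0, or 1")
    value = theta - k * failed / neighborhood
    return min(1.0, max(0.0, value))


# One failed attempt out of four neighbors, for each value of k.
print(updated_threshold(0.5, 1, 1, 4))   # threshold decays
print(updated_threshold(0.5, 0, 1, 4))   # threshold unchanged
print(updated_threshold(0.5, -1, 1, 4))  # threshold increases
```

Because the update depends only on the number of failed attempts and not on which neighbors made them, two sequences of the same length give the same threshold, which is the order-independence argument above.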
Good points:
Proposed Dynamical Variable Threshold Model could simulate the real‐life behavior of
information diffusion and status activation.
Mathematical Formulation:
Modified Constraint:
    \theta_v(S) = \theta_v - k \cdot \frac{|S|}{|V_v|}
))()(()()( ''
''
'
' vTvSbb vvTu
vuvSu
vu
Theorem: The influence spread I(·) under the linear threshold model is monotone and
submodular.
1) Obviously, the influence spread I(·) is non‐negative.
2) Monotone: I(S + v) ≥ I(S)
3) Submodular: to prove later.
20. To do list
1) Continue improving the writing of the papers.
2) Prove the three properties for the proposed Dynamical Variable Threshold Model.
3) Plan to submit to ICDM 2011.
======================================================================
Research Report
Fei Hao
2011‐6‐4
======================================================================
21. Done list
1) Study another social network
Here, to evaluate the improved performance for various sizes of network, I utilized the
fuzzy partial ordering model to obtain the rank of performance.
      Degree        Closeness     Betweenness
      Improvement   Improvement   Improvement
D1    0.01347       0.121628      0.044287
D2    0             0.020833      0.087142
D3    0             0.097009      0
Fuzzy partial ordering relation matrix
R(x_i, x_j) takes one of the values 1, 0.8, 0.5, or 0, depending on how x_i compares with x_j [case conditions not preserved].
So, using the above matrix, we can get the following fuzzy partial ordering relation matrix
    R = \begin{pmatrix} 0.5 & 0.8 & 1 \\ 0.8 & 0.5 & 0.8 \\ 0 & 0.8 & 0.5 \end{pmatrix}
    R(x_i) = \sum_{j=1}^{n} R(x_i, x_j)
R(D1)=2.3 > R(D2)=2.1 > R(D3)=1.8
Conclusions: Across the 3 experiments, the Random approach, a naïve method, is the worst.
The closeness centrality-based approach is also worse than the other algorithms.
In addition, our proposed SPEMA (Social Potential+Degree) outperforms the other
algorithms.
Remark: As the size of the network increases, the proposed approaches improve on
the naïve method and the other algorithms significantly.
[Figure: Improved Performance by data set (D3, D2, D1); y-axis ticks 0 to 2.5.]
2) Improved and completed the WWW rejected paper
Modification points: Formulated the design of the membership function. Also, I give a
high-level description of the problem and the idea.
3) Improving the SIGMOD rejected paper.
22. To do list
4) Continue to check the mistakes and polish the new paper for ICDM 2011.
5) Re-design the user attractor model considering more factors.
6) Study the proposed diffusion model, especially, when does the user’s threshold increase/
decrease? (Perhaps: Using the Time series Analysis, based on historical time series data)
[Figure: diffusion at time t, with labels S_t, u_t, v_t. Legend: Active node, Inactive node, Newly active node, Fail to activate.]
======================================================================
Research Report
Fei Hao
2011‐6‐18
======================================================================
23. Done list
1) I proved that the proposed approach SPEMA satisfies three basic properties.
Let S be the target subset of seed nodes, S1 be the subset of the most influential nodes,
and S2 be the subset of the most potential nodes, with |S1|=k-λk and |S2|=λk. The
influence spreads σ(S1) and σ(S2) under the linear threshold model are monotone and
submodular.
Lemma1: Non-negative.
Proof: Since S1 is a subset of S, and S2 is a subset of S, this lemma holds.
Lemma2: Monotone
    σ(S1+v) ≥ σ(S1), σ(S2+v) ≥ σ(S2)
Proof: First, let R(u) be the set of nodes influenced by node u, and consider the quantity
σ(S+v)−σ(S). By definition it counts the additional nodes influenced once v is added, so it
cannot be negative.
That is, σ(S1+v)−σ(S1) ≥ 0 and σ(S2+v)−σ(S2) ≥ 0. Therefore, the influence spreads σ(S1)
and σ(S2) under the linear threshold model are monotone.
Lemma3: Submodular
Let N be a finite set. A set function σ : 2^N → R is submodular iff
    σ(A ∪ {v}) − σ(A) ≥ σ(B ∪ {v}) − σ(B) for all A ⊆ B ⊆ N and v ∈ N \ B.
Proof: Consider two sets S1 (S2) and S, where S1 and S2 are subsets of S, and consider the
quantity σ(S1 + v) − σ(S1) (σ(S2 + v) − σ(S2)). By definition, σ(S1 + v) − σ(S1) ≥ σ(S + v) − σ(S).
In the same way, we can easily prove that σ(S2+v)−σ(S2) ≥ σ(S + v) − σ(S).
2) Ran the experiments on two large datasets using the degree, closeness, and betweenness
approaches.
NetHEPT (15233 nodes, 58891 edges), NetPHY (37154 nodes, 231584 edges)
3) In the linear threshold model, the threshold is a fixed value; it does not change during
the propagation of information. In reality, the threshold can change as time
unfolds.
The intuition: if a node cannot be influenced by some active neighbors, then
when its next neighbor tries to influence it, its threshold has changed due to
the earlier activation attempts of its neighbors. At this point, the node may decrease its
threshold, increase its threshold, or keep the same
threshold value.
t=0, threshold θ0
t=1, threshold θ1
t=2, threshold θ2
Dynamical Variable Threshold Model (DVT)
Hence, I will design a Dynamical Variable Threshold Model to simulate the propagation
procedure in social networks.
In this model, S denotes the set of neighbor nodes which tried to activate the current node v, but failed.
    \theta_v(S) = \theta_v - k \cdot \frac{|S|}{|V_v|}
k is a random value from K={‐1,0,1}.
Explanation for k values:
1) If k=1, the threshold decays: the user lowers his/her requirements for accepting
a new activation. In this case, the user can be influenced more easily at the next timestamp.
2) If k=0, the threshold keeps the same value: the user is unshakeable.
3) If k=-1, the threshold increases: the user raises his/her requirements for
accepting a new activation. In this case, the user is a diehard.
Step1: initially, assign a random threshold to each node.
Step2: change the threshold dynamically after each failed activation.
Step3: repeat step 2 until no more nodes can be influenced, then stop.
In fact, the Dynamical Variable Threshold Model is independent of the order of activation.
Suppose two node sequences T1 and T2:
    T_1 = \{u_1, u_2, \ldots, u_r\}, \quad T_2 = \{u'_1, u'_2, \ldots, u'_r\}
    \theta_v(S_{T_1}) = \theta_v - k \cdot \frac{|S_{T_1}|}{|V_v|} = \theta_v - k \cdot \frac{r}{|V_v|}
    \theta_v(S_{T_2}) = \theta_v - k \cdot \frac{|S_{T_2}|}{|V_v|} = \theta_v - k \cdot \frac{r}{|V_v|}
Obviously, θ_v(S_{T1}) = θ_v(S_{T2}); therefore, the variable threshold model is independent of
the order of activation.
Good points:
The proposed Dynamical Variable Threshold Model could simulate the real‐life behavior of
information diffusion and status activation.
Mathematical Formulation:
Modified Constraint:
    \theta_v(S) = \theta_v - k \cdot \frac{|S|}{|V_v|}
))()(()()( ''
''
'
' vTvSbb vvTu
vuvSu
vu
Theorem: The influence spread I(·) under the linear threshold model is monotone and
submodular.
1) Obviously, the influence spread I(·) is non‐negative.
2) Monotone: I(S + v) ≥ I(S)
3) Submodular: to prove later.
24. To do list
1) Continue the experiments using SPEMA on two large datasets.
2) Formulize the proposed diffusion models Time-dependent comprehensive cascade model (TCC
model) and dynamical variable threshold model(DVT model).
======================================================================
Research Report
Fei Hao
2011‐7‐2
======================================================================
25. Done list
1) I proved the two properties of the social potential entropy H(σ).
To prove that H(σ) can reach the maximum value, I solve the extremum problem of H(σ)
using the Lagrange multiplier method:
    L(p_1, \ldots, p_n, \ell) = -\sum_{i=1}^{n} p_i \log(p_i) + \ell \left( \sum_{i=1}^{n} p_i - 1 \right)
We take the partial derivatives of the above equation with respect to the variables p1, p2, …, pn and let
    \frac{\partial L}{\partial p_i} = -\log(p_i) - 1 + \ell = 0, \quad i = 1, \ldots, n
then we get the equations as follows:
    \ell = 1 + \log(p_i), \quad i = 1, \ldots, n
Hence, ℓ = 1+log(p1) = 1+log(p2) = … = 1+log(pn), i.e., p1 = p2 = · · · = pn. Since
pi = φ(vi)/Z, it follows that φ(v1) = φ(v2) = · · · = φ(vn).
Obviously, when φ(v1) = φ(v2) = · · · = φ(vn), H(σ) reaches the maximum value, i.e.,
H(σ) = log(n).
[Figures: results on NetHEPT (15,233 nodes, 58,891 edges) and NetPHY (37,154 nodes, 231,584 edges).]
Conclusions: 1) As the size of the network increases, our proposed approach (combining the social potential)
shows a clear advantage.
2) The random approach is always the worst method.
3) I finished the modification of the WWW rejected paper.
4) Gave the formalization of the two modified diffusion models: the TCC model and the DVT model.
26. To do list
3) Improve the presentation of the paper draft for ICDE 2012.
======================================================================
Research Report
Fei Hao
2011‐7‐16
======================================================================
27. Done list
1) Corrected my paper according to the comments of the KAIST lang. center as well as prof.
Chung’s comments and suggestions.
2) Formatted the submission paper according to the ICDE guideline and corrected a lot of mistakes
in my paper.
3) I have changed a lot of the contents of the WWW rejected paper, except the
experimental results. Hence, I want to do more experiments to implement my idea.
1) Formulated the problem definition and gave the preliminaries of computing with words.
2) I give the vector-valued friendship representation for the strength of friendship between
two users.
3) Friendship can be defined on a single domain or on multiple domains. I give the methods
for the two cases to measure the strength of friendship.
4) As for the friendship defined on multiple domains, an aggregated degree of
membership is defined.
28. To do list
1) I will continue to check the mistakes of the ICDE paper.
2) I will implement my new idea for the WWW rejected paper.
29.
=======
=========
Done list
1) Here, I ‘
members
Example:
Suppose t
application
Bob’s wall.
The degre
Suppose the
w_{SS^{neg
Hence, the a
===========
===========
d like to give
ship )(* x
two users A
ns, 20 mutua
Both of them
ees of membe
e weights are
g}}=0.1, W_
aggregated de
===========
===========
e an example
is calculated
lice and Bob
al friends. Ali
m are with ma
ership on diff
given by user
_{SD}=0.1.
egrees of mem
===========
===========
to explain th
d as follows,
b in a socia
ice posts 100
aster degree
ferent attribu
r w_{RS}=0
mbership:
===========
===========
e proposed ag
al network,
0 positive te
.
utes are as fo
.2, W_{MF}=
===========
Rese
===========
ggregated deg
they have 5
rms and 8 n
llows,
=0.4, w_{SS^
===========
earch Report
Fei Hao
2011‐8‐6
==========
grees of
5 common s
negative term
^{pos}}=0.2,
==
t
social
ms on
,
Hence, we know that Bob is a special friend of Alice according to the principle of
maximum degree of membership, i.e.
2) I designed the initial system of iFriend. It includes the module for friendship defined on
a single domain and the module for friendship defined on multiple domains.
A: I completed the implementation of the friendship defined on a single domain.
B: I completed the implementation of the friendship defined on “the number of common
applications”, “mutual friends” and “social distance”.
3) For the two proposed models, the TCC model and the DVT model, I give the solution for predicting
K in the two dynamic information diffusion models.
Trending structure sequence definition:
Step 1: Let X={x1,x2,…,xn} be a time series, with trending structure sequence {δ1, δ2, …, δn}.
Step 2: Then, we can construct the predictable information system for the two models as
follows,
Step 3: Use a pattern matching approach to predict K.
Here, we calculate the mathematical expectation of the Hamming distance between Sq
and the other sequences.
Step 4: Finally, our proposed models can be formalized as follows,
1) TCC model
The influence probability from an active individual to an inactive neighbor is defined as
follows,
2) DVT model
The diffusion threshold will be changed dynamically.
30. To do list
3) Continue the implementation of iFriend, especially the friendship defined on multiple
domains.
4) For the two proposed modified dynamic diffusion models, I will give the problem statement of
the dynamic influence maximization problem based on the proposed diffusion models.
5) Read the paper “Sentiment propagation in social networks: a case study in LiveJournal”. Try
to figure out the correlation between sentiment and social influence.
======================================================================
Research Report
Fei Hao
2011‐8‐20
======================================================================
31. Done list
1) Sentiment propagation in social networks: a case study in LiveJournal
Motivation:
a) How do individuals influence each other in social networks?
b) Does sentiment propagate, and how does it propagate?
c) What different roles do individuals play in propagation?
Contribution:
1) Formally define and study the propagation of sentiment in social networks
2) Quantify and predict the occurrence of a sentiment propagation
3) Identify features that result in a sentiment propagation
In a word, this paper proved that sentiment can propagate in social networks.
2) Sentiment community detection in social networks
My idea based on work1 and work2:
Work1 has proved that sentiment can propagate, and work2 gives a formal method to
detect the sentiment communities in social networks.
I think that some good issues can be investigated based on work1 and work2.
Finding sentiment leaders in social networks.
Motivations: If a company could find the sentiment leaders in social networks, then it
can control and adjust its marketing strategies. For example, when user A
has a negative sentiment on a certain product, the company may not
try to convince user A to spread information, and will not introduce new products to user A.
By contrast, if user A has a positive sentiment on a certain product, the
company will give some benefits to A and induce A to spread the information
to A’s friends. As we know, users influence each other through sentiment
propagation. Hence, finding the sentiment leader is becoming important. It is
helpful for marketing and information diffusion.
Problem: Finding the sentiment leader in sentiment communities. There is an
assumption that there exists a sentiment leader who can disseminate his
or her sentiment to their friends, and their friends will disseminate their
sentiment to their friends’ friends and so on.
Solution Framework:
There are two kinds of sentiment representations.
1) Discrete sentiment value-based
2) Continuous sentiment value-based
Technical route:
1) Discrete sentiment value-based
For example: a) positive and negative sentiment b) 5-star scale rating system (very
bad, bad, neutral, good, and very good)
In this case, a social network can be represented as G(V,E,S), where S means that each user holds
a certain sentiment si towards a particular product or topic.
Formulation for Social Sentiment Network:
G=(V,E,S) where V:{v1,v2,…vn} each node vi represents a user.
E:{eij} eij represents a relationship between two users vi and vj,
S:{s1,s2,…sn}, each user vi holds certain sentiment si towards a
particular product or topic.
In the discrete sentiment value‐based system, si ∈ {positive, negative} or si ∈ {a,b,c,d,e} (5‐star
scale rating system).
[Figure: solution framework with the blocks Sentiment Community Detection, Finding Sentiment Leader, and Graph Topology Information]
That is, the problem is converted to finding the leaders in the social sentiment network.
2) Continuous sentiment value-based
For example, sentiment is quantified using a value. This value is not discrete. Hence,
when we detect the sentiment community, we should design a new detection
method to get the communities.
Formulation for Social Sentiment Network:
G=(V,E,S) where V:{v1,v2,…vn} each node vi represents a user.
E:{eij} eij represents a relationship between two users vi and vj,
S:{s1,s2,…sn}, each user vi holds certain sentiment si towards a
particular product or topic.
In the continuous sentiment value‐based system, si ∈ (0,1].
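The formulation G=(V,E,S) used in both cases can be sketched as a small data structure. This is only a minimal illustration: the class name `SentimentNetwork` and its methods are hypothetical, relationships are assumed to be undirected, and the sentiment value is stored as given (a label in the discrete case, a number in (0,1] in the continuous case).

```python
from dataclasses import dataclass, field


@dataclass
class SentimentNetwork:
    """G = (V, E, S): users, relationships, and one sentiment value per user."""
    users: set = field(default_factory=set)          # V
    edges: set = field(default_factory=set)          # E, each edge is frozenset({vi, vj})
    sentiment: dict = field(default_factory=dict)    # S, maps vi -> si

    def add_user(self, user, sentiment):
        # si may be a label ("positive") or a number in (0, 1]
        self.users.add(user)
        self.sentiment[user] = sentiment

    def add_relationship(self, u, v):
        if u not in self.users or v not in self.users:
            raise KeyError("both endpoints must be added as users first")
        self.edges.add(frozenset((u, v)))  # undirected edge (an assumption)

    def neighbors(self, user):
        return {w for e in self.edges if user in e for w in e if w != user}


g = SentimentNetwork()
g.add_user("v1", "positive")
g.add_user("v2", "negative")
g.add_user("v3", "positive")
g.add_relationship("v1", "v2")
g.add_relationship("v1", "v3")
```

Keeping S as a plain mapping lets the same structure serve the discrete and the continuous representation.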
Community detection
We should consider the mechanism of sentiment propagation in social networks.
Propagation intuition: if the overall mood of a user A is close to the overall mood of
the community C, then user A belongs to C.
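The propagation intuition can be illustrated with a small sketch. It assumes, purely for illustration, that the overall mood of a community is the mean of its members' continuous sentiment values and that "close" means smallest absolute difference; the report does not fix either choice, and the function name is hypothetical.

```python
def assign_to_community(user_mood, communities):
    """Return the name of the community whose mean mood is closest to user_mood.

    communities maps a community name to a non-empty list of member moods in (0, 1].
    Mean mood and absolute difference are assumptions, not the report's definition.
    """
    def mean(values):
        return sum(values) / len(values)

    return min(communities, key=lambda name: abs(mean(communities[name]) - user_mood))


communities = {"C1": [0.9, 0.8], "C2": [0.2, 0.3]}
```

For example, `assign_to_community(0.75, communities)` picks `"C1"`, because the mean mood of C1 is much nearer to 0.75 than that of C2.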
Evaluation:
I will compare sentiment leader finding algorithms based on degree, closeness, betweenness,
and random selection, as well as social potential.
3) Wrote the section on predicting the parameter K in the proposed influence diffusion models
(the TCC and DVT models).
[Figure: framework with the blocks Sentiment Community Detection, Finding Sentiment Leader, Graph Topology Information, and Sentiment Propagation Model]
32. To do list
6) Give the specific idea of finding sentiment leaders in social networks. Study the leader
identification problem for the discrete sentiment value-based case.
7) Result analysis on iFriend paper.
8) Read paper: “Identifying Opinion Leaders in the Blogosphere” CIKM 2007.
======================================================================
Research Report
Fei Hao
2011‐9‐3
======================================================================
33. Done list
3) Finished writing the paper “Influence Strength Aware Diffusion Models for Dynamic
Influence Maximization in Social Networks”.
The focus of this paper is proposing two modified diffusion models. They mainly study
the dynamics of information propagation. The traditional Independent Cascade
model is a kind of decay model of information propagation. My models are closer
to realistic networks: the diffusion should not be modeled as a decaying process,
but by a Time-dependent Comprehensive Cascade (TCC) model and a DVT model, in which
the influence depends on the previous transactions.
Model feature comparisons
Influence Maximization
  Independent Cascade (IC) Model:
    1) Each active individual attempts to activate each of its neighbors independently
    2) After the single attempt, the active individual becomes latent
  Linear Threshold (LT) Model:
    1) A node has a random threshold
    2) The threshold will not change
  TCC Model: ‐
  DVT Model: ‐
Dynamic Influence Maximization
  The active individuals never become latent during the spreading process
  Each active individual is given only one attempt to activate any of its inactive neighbors
  TCC Model: The influence probability might be increased, decreased, or unchanged. It
    depends on the previous activation trials
  DVT Model: The threshold of each node can be changed. It depends on the previous
    activation trials
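For reference, the traditional IC behavior listed in the comparison (one independent activation attempt per active individual and neighbor) can be sketched as a single simulation run. This is the standard IC process, not the TCC or DVT model, whose formulas are not reproduced in this report; the function name and the toy graph are illustrative assumptions.

```python
import random


def simulate_ic(graph, prob, seeds, rng=None):
    """One run of the standard Independent Cascade process.

    graph: dict node -> list of out-neighbors
    prob:  dict (u, v) -> activation probability of edge u -> v
    seeds: initially active nodes
    """
    rng = rng or random.Random()
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, ()):
                # u gets exactly one attempt on each inactive neighbor v
                if v not in active and rng.random() < prob[(u, v)]:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier  # nodes that already tried never try again
    return active


graph = {"a": ["b", "d"], "b": ["c"]}
prob = {("a", "b"): 1.0, ("a", "d"): 0.0, ("b", "c"): 1.0}
reached = simulate_ic(graph, prob, ["a"], random.Random(0))
```

A history-dependent variant in the spirit of TCC would replace the fixed `prob[(u, v)]` lookup with a value updated from previous activation trials.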
Highlights:
a) We incorporate the methodology of individual ethology to evaluate the influence
probability in the TCC model and the threshold in the DVT model.
b) The influence probability in the traditional IC diffusion model is independent of previous
historical interactions (activations). Our TCC model considers that an influence
probability could be decreased, increased or unchanged.
Similarly, the traditional LT model assumes that the thresholds of individuals are not
changed. But the DVT model considers that the threshold of an individual might be decreased,
increased, or unchanged in terms of individuals’ sentiment, attitude and other social
behaviors.
c) Relationship between the proposed models and the traditional models
The TCC model is a generalization of the IC model. The TCC model reduces
to an attenuation model with dependency on influence diffusion and the time feature when
K=1.
The DVT model also reduces to a kind of LT model when k=0.
2) Finding Sentiment Leaders in Social Networks
The main technical route of SentiRank is
Problem Statement:
Objective: Maximize the sentiment coverage by seed set T
Input: sentiment network G, sentiment community $C_i$ and a number K
Output: generate a seed set T of cardinality k.
Dataset from Epinions.com
Sentiment Representation System:
1) Positive and negative
2) 5-star scale rating (for example, Epinions.com, iPhone 3G White)
Hence, I will discuss the sentiment community detection approaches for the above 2 cases:
1) Detect the positive and negative sentiment communities
This problem is converted to maximizing the agreement on users’ sentiment within clusters.
Basic idea: try to cluster the users who trust each other, while separating the users who
distrust each other.
2) Detect the 5-star scale rating sentiment communities
Sentiment Leaders Identification
Given a sentiment community $C_i$ and a number k, the top‐k sentiment leaders identification is
represented as follows:
It is to return the top‐k users with the maximum value of a position metric as the sentiment leaders in sentiment
community $C_i$, where the metric can be any of various user position metrics, such as degree, closeness,
betweenness, and social potential.
The algorithm of SentiRank is described as follows,
Evaluation Metrics:
1) One-step sentiment coverage
2) All-path sentiment coverage
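As a rough sketch of the degree-based variant, the following ranks community members by their degree inside the community and then scores the chosen leaders. It is not the SentiRank algorithm itself, which is not reproduced in this report. The coverage function is an assumed reading of "one-step sentiment coverage" (the fraction of community members that are leaders or direct neighbors of a leader), and both function names are hypothetical.

```python
def top_k_by_degree(adjacency, community, k):
    """Return the k community members with the most neighbors inside the community."""
    members = set(community)
    degree = {v: len(adjacency.get(v, set()) & members) for v in members}
    # ties are broken by name so the result is deterministic
    return sorted(members, key=lambda v: (-degree[v], str(v)))[:k]


def one_step_coverage(adjacency, community, leaders):
    """Fraction of community members that are leaders or adjacent to a leader."""
    members = set(community)
    covered = set(leaders) & members
    for leader in leaders:
        covered |= adjacency.get(leader, set()) & members
    return len(covered) / len(members)


adjacency = {
    "a": {"b", "c", "d"},
    "b": {"a"},
    "c": {"a"},
    "d": {"a", "e"},
    "e": {"d"},
}
community = ["a", "b", "c", "d", "e"]
leaders = top_k_by_degree(adjacency, community, 2)
```

Swapping the degree dictionary for closeness, betweenness, or social potential scores would give the other position metrics mentioned above.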
34. To do list
1) I am working on degree-based sentiment leader identification in each community
for the two sentiment representation systems, with one-step sentiment coverage as the
evaluation metric.
2) Discuss more related works about diffusion models in social networks.
======================================================================
Research Report
Fei Hao
2011‐9‐17
======================================================================
35. Done list
1) I answered the ICDE feedback request with Prof. Chung.
2) Datasets collection
A) For the positive and negative sentiment representation system
I adopt the dataset from SNAP, Stanford Univ.
(http://snap.stanford.edu/data/index.html). It is a trust relationship dataset.
Firstly, we detect the sentiment communities according to the optimization approach.
Basic idea: If there exists a trust relationship between two users, then the two users
have the same sentiment. Otherwise, they have different sentiments.
B) For the 5-star scale rating sentiment representation system
I collect the product rating dataset from Epinions.com with the query keywords “Apple
iPhone 3GS White (16GB) Smartphone”. I obtain 84 reviews by 84 customers.
3) Sentiment Communities Visualization Representation
CASE 1:
[Figure: Positive Sentiment Community and Negative Sentiment Community]
CASE 2:
[Figure: Sentiment Communities of Dataset II (it has five sentiment communities)]
Sentiment Communities Structure Analysis
Actually, each sentiment community is a sub-structure of the social graph. In other words,
the basic structure of each community is composed of some fragile sub-networks.
Here, there is an interesting issue and challenge to be solved.
Question: Who is the sentiment leader in each sentiment community? Sentiment
leaders can be considered as users who initiate sentiment. Due to social
interaction (sentiment interaction), the sentiment of some users will be infected by the
sentiment leaders.
Hence, finding the sentiment leaders in each sentiment community is becoming an
important problem.
For the sentiment community with 5 stars, denoted $C_{★★★★★}$, the top-5 sentiment
leaders are identified by the proposed algorithm SentiRank.
$U(C_{★★★★★}, 5)$ = {sexymama442, three_ster, kyreejd, s-o-m-e-g-u-y,
bigtruckseries}
36. To do list
1) I will study a master’s thesis, “An Information Diffusion Approach for Detecting Emotional
Contagion in online Social Networks” (Arizona State University)
A) To learn the diffusion model for sentiment propagation
B) To find out the potential role of the sentiment leaders in sentiment propagation.
======================================================================
Research Report
Fei Hao
2011‐10‐1
======================================================================
37. Done list
1) Seminar preparation.
I incorporate the sentiment factor into social advertising. Hence, an interesting problem
is proposed, called “Finding Sentiment Leaders from Social Networks”.
The differences between this problem and influence maximization:
D1) The sentiment leaders finding problem is more complex, because it considers
sentiment factors on top of influence maximization.
D2) In influence maximization, the default sentiment is positive, without any other
sentiments such as negative or rating-based sentiment.
D3) Sentiment analysis is more useful for social advertising.
D4) Influence maximization is a special case of the newly defined problem.
2) Improve the writing of recent paper.
Contributions:
Considering both dynamics and the influence strength
A) Time-dependent Comprehensive Cascade Model (TCC)
B) Dynamic Variable Threshold Model (DVT)
Provide a prediction approach regarding when the influence strength should be changed
in the two proposed models.
3) Read the paper “Analysis of terrorist social networks with fractal views”. (JIS journal)
User position evaluation with fractal views.
It is a new idea for studying users’ positions, and it benefits my current research
topics: social marketing and social advertising.
It is a computational method to evaluate a user’s importance in social networks.
38. To do list
1) Begin the experiments of Sentiment Leaders Finding problem.
2) Consider the user position evaluation using fractal views, give an approach to calculate
the importance of nodes in social networks.
======================================================================
Research Report
Fei Hao
2011‐10‐15
======================================================================
39. Done list
1) Influence strength and sentiment aware diffusion models for dynamic influence
maximization in social networks
Here, I simply incorporate positive and negative sentiment (negative opinions) into
the diffusion models,
i.e., in each model, there is a parameter called the quality factor of the product, which
indicates the probability that a node stays positive after it is activated by a positive
in‐neighbor.
Hence, in the TCC model, the influence probability is redefined as follows,
In the DVT model, the threshold is redefined as follows,
In the above two models, we focus on the study of positive influence diffusion in dynamic
social networks.
Open issue:
I think there exists a certain relationship between K and q, i.e., if a user decides to
improve his/her activation threshold at time step t, his/her sentiment towards the
products will be changed. I guess there exists a non‐linear relationship between them.
2) User’s position measurement using fractal views
1: Suppose there is a social network:
2: We convert the social graph into a tree. The resulting trees with focus A and focus F are as follows:
A is a focus: Tree (A)
F is a focus: Tree (F)
3: The importance of each node:
Topological view:
Use the number of diffusion paths to measure the importance.
For example: A can propagate the information to 7 nodes: 4 nodes in the first level, i.e., it
directly influences 4 nodes, and 3 nodes in the second level.
The importance can be calculated as follows: 4×0.5+3×0.25=2.75
Fractal views:
Suppose the fractal value of the focused node is 1. The sum of the fractal values of the child nodes
equals the fractal value of their parent node. Because the weights between each child
node and its parent node differ, the fractal values of the nodes are different.
For tree (A)
B: fractal value = $\frac{4+5+2}{3+4+5+2} \times \frac{1}{4-1} \approx 0.26$
C: fractal value = $\frac{3+5+2}{3+4+5+2} \times \frac{1}{4-1} \approx 0.23$
E: fractal value = $\frac{3+4+2}{3+4+5+2} \times \frac{1}{4-1} \approx 0.21$
F: fractal value = $\frac{3+4+5}{3+4+5+2} \times \frac{1}{4-1} \approx 0.28$
B+C+E+F: the summation of fractal values = 1.
The fractal value of D = the fractal value of E, which is 0.21.
The fractal value of G: $\frac{5}{3+5} \times \frac{1}{2-1} \times \mathrm{fractal}(F) = 0.175$
The fractal value of H: fractal(F)‐fractal(G)=0.105
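The fractal values above can be reproduced with a short sketch. The edge weights used here (A–B 3, A–C 4, A–E 5, A–F 2, F–G 3, F–H 5) are inferred from the numerators and denominators of the formulas, not read from the original figure, and the weight of the single edge E–D is an arbitrary placeholder since an only child inherits its parent's value. The function names are hypothetical.

```python
from fractions import Fraction


def child_shares(weights):
    """Share of the parent's fractal value for each child.

    A child with weight w among n siblings of total weight W gets
    (W - w) / (W * (n - 1)); an only child gets the whole value.
    """
    total = sum(weights.values())
    n = len(weights)
    if n == 1:
        return {child: Fraction(1) for child in weights}
    return {child: Fraction(total - w, total * (n - 1)) for child, w in weights.items()}


def fractal_values(tree, root):
    """Propagate fractal values from the focus node (value 1) down the tree."""
    values = {root: Fraction(1)}
    stack = [root]
    while stack:
        node = stack.pop()
        children = tree.get(node, {})
        if not children:
            continue
        for child, share in child_shares(children).items():
            values[child] = values[node] * share
            stack.append(child)
    return values


# Tree (A); weights inferred from the formulas above (an assumption)
tree_a = {
    "A": {"B": 3, "C": 4, "E": 5, "F": 2},
    "E": {"D": 1},           # placeholder weight for the only child
    "F": {"G": 3, "H": 5},
}
values = fractal_values(tree_a, "A")
```

With exact fractions, B is 11/42 and F is 12/42, so G is 5/28 ≈ 0.179 and H is 3/28 ≈ 0.107. The report's 0.175 and 0.105 come from using the rounded value 0.28 for F.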
How to calculate the importance of node A:
Level 1: (fractal(B)+fractal(C)+fractal(E)+fractal(F))×0.5=0.5
Level 2: (fractal(D)+fractal(G)+fractal(H))×0.25=(0.21+0.175+0.105)×0.25=0.1225
Hence, from the fractal point of view, the importance of node A is 0.5+0.1225=0.6225
Here is a problem: how to combine the two measurements in an effective way?
Topological part: If there are many nodes in the same level, an important node can easily
influence the child nodes.
Fractal part: It considers the weights between each node and its parent node. Then, we
calculate the fractal value for each node. The fractal value is a refined property describing how much
influence should be propagated.
Special case:
For the above two graphs, the fractal values are exactly the same. Therefore, we have to consider both
the topological part and the fractal part when we evaluate the importance of a node.
40. To do list
4) Study the performance of obtaining the top-k seed nodes for the ICDE rejected paper.
======================================================================
Research Report
Fei Hao
2011‐11‐5
======================================================================
41. Done list
1) I’ve finished my paper entitled “Influence Strength and Sentiment Aware Diffusion
Models for Dynamic Influence Maximization in Social Networks”.
If possible, I want to submit it to the ACM PODS conference.
2) I wrote another new paper entitled “TFRank: An Evaluation of Users Importance with
Fractal Views in Social Networks”.
3) I give the performance of obtaining the top-k market movers in terms of running time.
42. To do list
5) Working on my previous work: how to find the sentiment leaders from social networks.
I will compare the TFRank algorithm with other existing algorithms.
6) Improve the writing of the paper “TFRank: An Evaluation of Users Importance with
Fractal Views in Social Networks”
======================================================================
Research Report
Fei Hao
2011‐11‐19
======================================================================
43. Done list
1) Read the paper “A Data-based Approach to Social Influence Maximization” (published in
VLDB 2011).
Motivations: Viral marketing, social advertising. One of the key problems in this area is
the identification of influential users, by targeting whom certain desirable outcomes can
be achieved.
Contributions:
1) Developed a new model called credit distribution, built on top of real propagation
traces, that allows us to directly predict the influence spread of node sets.
2) Show that influence maximization under credit distribution is NP-hard.
3) Compare the proposed approach with the standard approach with edge probabilities
learned from real data, and show that the credit distribution model provides higher
accuracy.
Main Contents:
Suppose the seed node set is S={v,z}.
T_{S,u} is the fraction of flow reaching u that flows from either v or z.
Total credit given to v for influencing u for action a.
T_{S,u}(a)=1*0.25+0.25+0.5*0.25+1*0.25=0.875
Define the total influence credit for all the actions in A (the action set):
$K_{S,u} = \frac{1}{|A|} \sum_{a \in A} T_{S,u}(a)$
The influence spread can be defined using the above total influence credit:
$\sigma(S) = \sum_{u \in V} K_{S,u}$
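The two aggregation steps can be checked numerically with a small sketch. Only the value 0.875 comes from the worked example above; the other credits, the user and action names, and the function names are made up for illustration. The spread is written as a plain sum over users, following the reconstructed formula.

```python
def total_credit(credits_per_action):
    """K_{S,u}: average of T_{S,u}(a) over the action set A.

    credits_per_action maps every action a in A to T_{S,u}(a);
    actions in which S gets no credit must be listed with 0.0.
    """
    return sum(credits_per_action.values()) / len(credits_per_action)


def influence_spread(credit_table):
    """Sum of K_{S,u} over all users u."""
    return sum(total_credit(per_action) for per_action in credit_table.values())


# T_{S,u}(a) from the worked example in the report
example_credit = 1 * 0.25 + 0.25 + 0.5 * 0.25 + 1 * 0.25

# hypothetical credits for two users and two actions
credit_table = {
    "u": {"a1": example_credit, "a2": 0.125},
    "w": {"a1": 0.5, "a2": 0.0},
}
spread = influence_spread(credit_table)
```

No Monte Carlo simulation is involved here: the spread follows directly from the credits recorded in the propagation traces.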
Comments: Most of the literature on influence maximization has focused mainly on
the social structure. In this paper, the authors propose a novel data‐based approach that
directly predicts the influence spread.
The proposed credit distribution model directly estimates influence spread by
exploiting historical data, thus avoiding the need for learning influence probabilities and, more
importantly, avoiding costly MC simulations.
The CD model is closest to the ground truth, and it is a highly scalable algorithm. The CD model is not a
propagation model, but a prediction model of influence spread according to the credits.
2) I continued to check and improve the presentation of my paper.
I stated the contributions clearly. I made a table of the important variables that appear in the
paper.
44. To do list
1) Based on my proposed framework SentiRank, design an efficient sentiment leader
mining algorithm.
2) Do the experiments on the sentiment leaders finding problem.
======================================================================
Research Report
Fei Hao
2011‐12‐3
======================================================================
45. Done list
1) Discovering Top-K teams of Experts with/without a leader in social networks (CIKM
2011)
Motivations: Given a social network, find the top-k teams of experts that can effectively
collaborate in order to complete a project.
Each team might or might not have a leader.
Project: a set of required skills
Expert: an individual with a specific skill-set
Social network: represents the strength of relationships (the degree of collaboration between
any two experts)
Problem 1: without a leader
Given a project P and a graph G representing the social network of a set of experts C, the
problem of team formation without a leader is to find a team of experts T for P from G so that
the communication cost of T, defined as the sum of distances of T, is minimized.
The communication cost is calculated as follows:
Problem 2: with a leader
Given a project P and a graph G representing the social network of a set of experts C, the
problem of team formation with a leader is to find a team of experts T and an expert L from C as
the leader of the team so that the communication cost, defined as the leader distance, is
minimized.
The communication cost is calculated as follows:
Conclusions:
Two problems are defined:
1) Finding top‐k teams of experts with a leader.
2) Finding top‐k teams of experts without a leader.
Approximation algorithms for finding a team of experts without/with a leader have
been proposed.
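The two communication costs can be sketched as follows. Since the formulas did not survive in this report, the sketch assumes the usual readings: the sum of distances is the sum of shortest-path distances over all pairs of team members, and the leader distance is the sum of shortest-path distances from the leader to each member. The toy graph and function names are illustrative.

```python
import heapq
from itertools import combinations


def shortest_distances(graph, source):
    """Dijkstra over a weighted graph given as dict node -> {neighbor: weight}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def sum_of_distances(graph, team):
    """Cost without a leader: shortest-path distance summed over all member pairs."""
    return sum(shortest_distances(graph, u)[v] for u, v in combinations(team, 2))


def leader_distance(graph, leader, team):
    """Cost with a leader: shortest-path distance from the leader to every member."""
    dist = shortest_distances(graph, leader)
    return sum(dist[m] for m in team if m != leader)


graph = {
    "a": {"b": 1.0, "c": 4.0},
    "b": {"a": 1.0, "c": 2.0},
    "c": {"a": 4.0, "b": 2.0},
}
team = ["a", "b", "c"]
```

In this toy graph the shortest path from a to c goes through b, so the pairwise cost is 1 + 3 + 2 = 6, while choosing b as leader gives a leader distance of 1 + 2 = 3.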
Extensions:
I think the above problem ignores the skill proficiency of experts. That is to say, if we assign
experts to complete a project without considering skill proficiency, users may be unsatisfied with
this assignment. However, if we consider the skill proficiency and communication cost
together in the team formation problem, then it coincides with real-world practice.
Problem Statement:
Let $C=\{c_1, c_2, \ldots, c_n\}$ denote a set of n experts, and $S=\{s_1, s_2, \ldots, s_m\}$ denote a set of m
skills. Each expert $c_i$ has a set of skills, denoted as $Q(c_i)$, with $Q(c_i) \subseteq S$. If $s_j \in Q(c_i)$,
expert $c_i$ has skill $s_j$. Each expert $c_i$ has a different proficiency in each of his/her skills $s_j$,
denoted as $R(s_j)$, $s_j \in Q(c_i)$. A project $P \subseteq S$ is defined as a set of skills required to
complete the project.
Input: a social graph G, and a project P.
We have two objectives to optimize:
1) Minimize the communication cost between them (find the experts who can easily
collaborate or communicate)
2) Maximize the proficiency among them (a person’s proficiency in a certain skill)
Output: return the set of experts with the minimum communication cost (sum of distances in the
case without a leader, and leader distance in the case with a leader) and the maximum
proficiency R.
There is an intuitive design for our two objectives:
Method 1: we can give an overall optimization variable (such as O)
$O = \frac{\mathrm{Proficiency}}{\mathrm{CommunicationCost}}$ (1)
Our aim is to maximize the overall optimization variable O. Here, we call O the utility.
Method 2: It is difficult to satisfy the above two objectives together. Therefore, I think there exists a
parameter $\lambda$ to balance these two performance measures.
Max $\lambda \cdot \mathrm{CommunicationCost} + (1-\lambda) \cdot \mathrm{Proficiency}$
Obviously, if we want to maximize the above function, we may increase the value of the parameter as much
as possible. But this formula is a little abstract to understand.
Here is a question: how do we determine the proficiency of a team?
Simply, we can sum each user’s proficiency.
Definition (Sum of proficiency): Given a team T of experts for a project, $\{\langle s_1, c_{S_1}\rangle, \langle s_2, c_{S_2}\rangle, \ldots, \langle s_p, c_{S_p}\rangle\}$,
the sum of proficiency of T is defined as
$\mathrm{Sum\ of\ Proficiency} = \sum_{i=1}^{p} R(c_{S_i})$
Topic title: Utility maximization aware team formation in task‐oriented social networks.
Continue the example:
[Figure: example expert network; box values 0.88; 0.75, 0.87, 0.43; 0.85; 0.95; 0.65, 0.74]
For a project P={AI,DB,DM,IR}, the numbers on the edges represent how easily two experts can
communicate; a smaller number represents better communication. At the same time,
the numbers in the boxes denote the degree of proficiency. My motivation is to
maximize the degree of proficiency of the team and minimize the communication cost between
them, because the degree of proficiency is more associated with a user’s attitude and subjective
activity.
Here, we may give an Sdistance: a new distance for measuring the experts and further forming a
team of experts. This new distance SDIS can be devised according to the description of Eq. (1).
2) I formally give the proof of NP‐hardness for dynamic influence maximization under the
proposed models.
The proof considers an instance of an existing NP problem.
46. To do list
1) I will formulate the problem statement mathematically, and give an initial solution idea
for that.
======================================================================
Research Report
Fei Hao
2011‐12‐17
======================================================================
47. Done list
Basic Problem Descriptions:
Let $C=\{c_1, c_2, \ldots, c_n\}$ denote a set of n experts, and $S=\{s_1, s_2, \ldots, s_m\}$ denote a set of m
skills. Each expert $c_i$ has a set of skills, denoted as $Q(c_i)$, with $Q(c_i) \subseteq S$. If $s_j \in Q(c_i)$,
expert $c_i$ has skill $s_j$. Each expert $c_i$ has a different proficiency in each of his/her skills $s_j$,
denoted as $R(s_j)$, $s_j \in Q(c_i)$. A project $P \subseteq S$ is defined as a set of skills required to
complete the project.
Input: a social graph G, and a project P.
We have two objectives to optimize:
1) Minimize the communication cost between them (find the experts who can easily
collaborate or communicate)
2) Maximize the proficiency among them (a person’s proficiency in a certain skill)
Output: return the set of experts with the minimum communication cost (sum of distances in the
case without a leader, and leader distance in the case with a leader) and the maximum
proficiency R.
There is an intuitive design for our two objectives:
Method 1: we can give an overall optimization variable “Utility”
$U = \frac{\mathrm{Proficiency}}{\mathrm{communication\ cost}}$ (1)
Our aim is to maximize the utility.
Here is a question: how do we determine the proficiency of a team?
Simply, we can sum each user’s proficiency.
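A minimal sketch of the utility computation follows, assuming that a team is given as a mapping from each required skill to the expert who covers it, that proficiency is stored per (expert, skill) pair, and that the communication cost has already been computed separately. All names and numbers are hypothetical.

```python
def team_proficiency(assignment, proficiency):
    """Sum of proficiency: add the proficiency of each expert on the skill assigned to them."""
    return sum(proficiency[(expert, skill)] for skill, expert in assignment.items())


def utility(assignment, proficiency, communication_cost):
    """U = Proficiency / communication cost; the cost must be positive."""
    if communication_cost <= 0:
        raise ValueError("communication cost must be positive")
    return team_proficiency(assignment, proficiency) / communication_cost


# hypothetical two-skill project
assignment = {"AI": "c1", "DB": "c2"}
proficiency = {("c1", "AI"): 0.5, ("c2", "DB"): 0.25}
u = utility(assignment, proficiency, 3.0)
```

Because the cost sits in the denominator, halving the communication cost doubles the utility, which is why the two objectives collapse into a single quantity to maximize.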
Definition (Sum of proficiency): Given a team T of experts for a project, $\{\langle s_1, c_{S_1}\rangle, \langle s_2, c_{S_2}\rangle, \ldots, \langle s_p, c_{S_p}\rangle\}$,
the sum of proficiency of T is defined as
$\mathrm{Sum\ of\ Proficiency} = \sum_{i=1}^{p} R(c_{S_i})$
Topic title: Utility maximization aware team formation in task‐oriented social networks.
Continue the example:
[Figure: example expert network; box values 0.88; 0.75, 0.87, 0.43; 0.85; 0.95; 0.65, 0.74]
For a project P={AI,DB,DM,IR}, the numbers on the edges represent how easily two experts can
communicate; a smaller number represents better communication. At the same time,
the numbers in the boxes denote the degree of proficiency. My motivation is to
maximize the degree of proficiency of the team and minimize the communication cost between
them, because the degree of proficiency is more associated with a user’s attitude and subjective
activity.
Problem Definitions:
We will give two formal definitions for two cases: 1) team formation without a leader, and 2)
team formation with a leader.
PROBLEM 1 (UM‐TF)
Given a task‐oriented social network $G'=(V',E')$ and a task, the utility maximization
aware team formation problem without a leader (UM‐TF) is to find a team of experts T for the task
from $G'$ so that the utility of T is maximized.
PROBLEM 2 (UM‐TF‐L)
Given a task‐oriented social network $G'=(V',E')$ and a task, the utility maximization
aware team formation problem with a leader (UM‐TF‐L) is to find a team of experts T for the task and
an expert L from $V'$ as the leader of the team so that the utility of T is maximized.
The framework of the algorithm:
1) We divide the experts into groups such that each group corresponds to a cluster with a certain skill.
2) Select the expert with the biggest proficiency.
3) Select the 2nd expert, who maximizes the utility if he or she is added to the results.
4) Repeat step 3 until all of the skill groups are covered.
[Figure: Step 1, Step 2, Step 3]
This is a greedy algorithm.
5) I collected the datasets for experiments.
Paper collaboration network
a) Consider the frequency of a user’s contributions as the proficiency for a certain skill.
b) Consider the frequency of each pair as the communication cost.
48. To do list
2) Do the experiments for the case without a leader. I will compare my proposed approach
with the existing approach in terms of proficiency and cost, respectively.
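The greedy framework for the case without a leader (steps 1 to 4 above) can be sketched as follows. This is only one possible reading: it assumes one expert per required skill, a precomputed pairwise distance table, utility equal to the sum of proficiency divided by the sum of pairwise distances, and infinite utility while the team has a single member. The function name and the toy data are hypothetical.

```python
from itertools import combinations


def greedy_team(project, skills_of, proficiency, distance):
    """Greedy utility-driven team formation, one expert per required skill.

    project:     list of required skills
    skills_of:   dict expert -> set of skills
    proficiency: dict (expert, skill) -> proficiency value
    distance:    dict frozenset({e1, e2}) -> communication cost between two experts
    """
    # Step 1: one group of candidate experts per required skill
    groups = {s: [c for c in skills_of if s in skills_of[c]] for s in project}

    def utility(assignment):
        members = sorted(set(assignment.values()))
        cost = sum(distance[frozenset(pair)] for pair in combinations(members, 2))
        prof = sum(proficiency[(c, s)] for s, c in assignment.items())
        return prof / cost if cost > 0 else float("inf")

    # Step 2: start with the most proficient (skill, expert) pair
    skill, expert = max(
        ((s, c) for s in project for c in groups[s]),
        key=lambda sc: proficiency[(sc[1], sc[0])],
    )
    team = {skill: expert}

    # Steps 3-4: repeatedly add the pair that maximizes the utility of the team
    while len(team) < len(project):
        skill, expert = max(
            ((s, c) for s in project if s not in team for c in groups[s]),
            key=lambda sc: utility({**team, sc[0]: sc[1]}),
        )
        team[skill] = expert
    return team


skills_of = {"c1": {"AI"}, "c2": {"DB"}, "c3": {"DB"}}
proficiency = {("c1", "AI"): 0.9, ("c2", "DB"): 0.8, ("c3", "DB"): 0.6}
distance = {
    frozenset(("c1", "c2")): 4.0,
    frozenset(("c1", "c3")): 1.0,
    frozenset(("c2", "c3")): 2.0,
}
team = greedy_team(["AI", "DB"], skills_of, proficiency, distance)
```

In this toy instance c3 is chosen for DB even though c2 is more proficient, because c3 is much closer to c1: the utility is 1.5/1 for c3 against 1.7/4 for c2.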