Modeling Relationship Strength in Online Social Networks
Rongjing Xiang1, Jennifer Neville1, Monica Rogati2
1Purdue University, 2LinkedIn
WWW 2010
2010. 08. 13.
Summarized and Presented by Sang-il Song, IDS Lab., Seoul National University
Copyright 2010 by CEBT
Introduction: Social Network
Homophily
the tendency of individuals to associate and bond with similar others
"Birds of a feather flock together"
Found in many real-world and online social networks
Research Area
Network Structure Analysis
Link prediction: "Who will be my friend?"
Community Detection
Item Recommendation
Introduction
Past work has focused on social networks with binary ties.
e.g., friends or not
Binary indicators provide only a coarse indication of the relationship.
Pairs of individuals with strong ties (e.g., close friends) are likely to exhibit greater similarity than those with weak ties (e.g., acquaintances)
Treating all relationships as equal increases noise and degrades the performance of the models
Pruning away spurious relationships and highlighting stronger relationships has improved the accuracy of the models.
Related Works
I. Kahanda and J. Neville. Using transactional information to predict link strength in online social networks. ICWSM 09
E. Gilbert and K. Karahalios. Predicting tie strength with social media. CHI 09
Binary prediction task
- Strong ties or weak ties
Supervised learning
- Requires human annotation effort
Friendship Rating
Top friend nomination
Goal
A model to infer relationship strength
Based on profile similarity and interaction activity
Automatically distinguishing strong relationships from weak ones
- Unsupervised
Relationship strength is represented as a continuous value
- Full spectrum of relationship strength, from weak to strong
Scalable approach
- Suitable for online applications
Assumptions of the Model
The higher the similarity, the stronger the tie
A friend and I have many features in common, so we have a strong relationship.
Relationship strength directly impacts the nature and frequency of online interactions between a pair of users
A friend is close to me if he chats with me often in a messenger.
Interactions are independent of one another given the relationship strength
Variables of the Model
Profile: the data of a specific user
$x^{(i)}$: profile vector of individual $i$
e.g., school, company, region, industry, job of the user
Interaction: the activity between two users
$y_t^{(ij)}$: occurrence of interaction $t$ between $i$ and $j$, for $t = 1, \dots, m$
e.g., reply, retweet (in Twitter)
e.g., tagging the person in a picture, posting on one's wall (in Facebook)
Relationship Strength
$z^{(ij)}$: latent relationship strength between $i$ and $j$
Graphical model representation
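[Figure: graphical model. The profile vectors $x^{(i)}$ and $x^{(j)}$ point to the latent relationship strength $z^{(ij)}$, which points to the interactions $y_1^{(ij)}, y_2^{(ij)}, \dots, y_m^{(ij)}$.]
$x^{(i)}$, $x^{(j)}$: profile vectors; $y_t^{(ij)}$: occurrences of the interactions; $z^{(ij)}$: latent relationship strength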
Model Specification
Inferring relationship strength from user profile
Using similarity vector $s(x^{(i)}, x^{(j)})$
e.g., a component that is 1 if $i$ and $j$ are in the same company, 0 otherwise
e.g., the logarithm of the normalized count of common groups that $i$ and $j$ have joined
Adopting the Gaussian distribution
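$P(z^{(ij)} \mid x^{(i)}, x^{(j)}) = N(w^{\top} s(x^{(i)}, x^{(j)}), v)$
The mean $w^{\top} s(x^{(i)}, x^{(j)})$ is a weighted sum of similarity measures; the weight vector $w$ is to be estimated
[Figure: density $p$ plotted against $z$. The blue curve represents two similar users; the red curve represents two dissimilar users.]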
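As a minimal sketch of the Gaussian assumption on this slide, the snippet below evaluates the density of N(w^T s, v) at a given z. The weights, the similarity vectors, and the variance value are made-up illustration values, and the function name `strength_prior_density` is hypothetical rather than taken from the paper.

```python
import math

def strength_prior_density(z, s, w, v):
    """Density of N(w^T s, v) evaluated at z, where v is the variance."""
    mean = sum(wk * sk for wk, sk in zip(w, s))
    return math.exp(-((z - mean) ** 2) / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)

# Hypothetical values chosen only for illustration.
w = [0.8, 0.5]                 # one weight per similarity measure
similar_pair = [1.0, 1.0]      # e.g., same company and same school
dissimilar_pair = [0.0, 0.0]   # nothing in common
v = 0.5                        # assumed variance

# A high strength value is more plausible for the similar pair.
print(strength_prior_density(1.3, similar_pair, w, v))
print(strength_prior_density(1.3, dissimilar_pair, w, v))
```

With positive weights, a larger similarity vector shifts the mean of the distribution toward higher strength values, which matches the first modeling assumption (the higher the similarity, the stronger the tie).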
Model Specification
Inferring relationship strength from interactions
Modeling all interactions as binary variables
Introducing auxiliary variables $a_t^{(ij)}$
- Capturing auxiliary causes of the interactions which are independent of the relationship strength
- e.g., the total number of pictures that a user has tagged represents their intrinsic tendency to tag pictures
Using sigmoid function
$\sigma(x) = \frac{1}{1 + e^{-x}}$, where $x$ is a weighted sum of the auxiliary variables and $z^{(ij)}$
The weights $\theta_t$ are to be estimated
[Figure: graphical model fragment. The latent relationship strength $z^{(ij)}$ points to the interactions $y_1^{(ij)}, y_2^{(ij)}, \dots, y_m^{(ij)}$.]
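The interaction component described on this slide can be sketched as follows: the probability that an interaction occurs is the sigmoid of a weighted sum of the auxiliary variables and z. The parameter names (`theta_aux`, `theta_z`) and the absence of a bias term are assumptions made for this illustration; the exact parameterization in the paper may differ.

```python
import math

def sigmoid(x):
    """Logistic function 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + math.exp(-x))

def interaction_probability(z, aux, theta_aux, theta_z):
    """P(y_t = 1) as the sigmoid of a weighted sum of auxiliary variables and z.

    z         : latent relationship strength of the pair
    aux       : auxiliary variables for this interaction type
    theta_aux : weights on the auxiliary variables (hypothetical name)
    theta_z   : weight on z (hypothetical name)
    """
    score = sum(t * a for t, a in zip(theta_aux, aux)) + theta_z * z
    return sigmoid(score)

# Hypothetical example: one auxiliary variable, e.g. a scaled count of
# pictures the user has tagged overall.
print(interaction_probability(z=2.0, aux=[1.0], theta_aux=[0.3], theta_z=1.5))
print(interaction_probability(z=0.5, aux=[1.0], theta_aux=[0.3], theta_z=1.5))
```

When the weight on z is positive, a stronger relationship makes the interaction more likely, while the auxiliary term accounts for users who interact a lot regardless of tie strength.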
Model Specification
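[Figure: two graphical models. Left: the profile vectors $x^{(i)}$ and $x^{(j)}$ point to $z^{(ij)}$, which points to the interactions $y_1^{(ij)}, y_2^{(ij)}, \dots, y_m^{(ij)}$. Right: the similarity vector $s^{(ij)}$ points to $z^{(ij)}$; $z^{(ij)}$ and the auxiliary variables $a_1^{(ij)}, a_2^{(ij)}, \dots, a_m^{(ij)}$ point to the interactions $y_1^{(ij)}, y_2^{(ij)}, \dots, y_m^{(ij)}$.]
$P(D, w, \theta) = P(D \mid w, \theta) \, P(w) \, P(\theta)$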
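Under the independence assumption stated earlier, the data term could plausibly factor over user pairs and over interactions as sketched below. This expansion is an assumption pieced together from the two model components on the preceding slides, not a formula recovered from this slide; $a_t^{(ij)}$ denotes the auxiliary variables of interaction $t$ and $\theta_t$ its weights.

```latex
P(D \mid w, \theta) = \prod_{(i,j)} \left[ P\left(z^{(ij)} \mid x^{(i)}, x^{(j)}, w\right) \prod_{t=1}^{m} P\left(y_t^{(ij)} \mid z^{(ij)}, a_t^{(ij)}, \theta_t\right) \right]
```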
Inference
Find the point estimates of $w$, $\theta$, and the latent strengths $z$ that maximize the joint likelihood
Using a gradient method
Using Newton-Raphson updates for the weights
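To make the Newton-Raphson step concrete, here is a minimal sketch that applies it to the latent strength z of a single pair while all parameters are held fixed. The slide mentions Newton-Raphson for the weight updates; applying the same kind of update to z is an assumption of this example, and every name and number below is hypothetical. The objective is the log of a Gaussian term N(z; mu, v) plus the log-likelihood of binary interactions under a sigmoid.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def newton_update_z(mu, v, y, offsets, theta_z, z0=0.0, iters=20, tol=1e-10):
    """Maximize log N(z; mu, v) + sum_t log P(y_t | z) over z by Newton-Raphson.

    mu      : Gaussian mean for this pair (e.g., a weighted similarity sum)
    v       : Gaussian variance
    y       : list of 0/1 interaction indicators
    offsets : per-interaction contribution of the auxiliary variables
    theta_z : per-interaction coefficient on z
    """
    z = z0
    for _ in range(iters):
        # Gradient and second derivative of the Gaussian log-density.
        grad = -(z - mu) / v
        hess = -1.0 / v
        # Add the contribution of each logistic interaction term.
        for y_t, c_t, k_t in zip(y, offsets, theta_z):
            p = sigmoid(c_t + k_t * z)
            grad += k_t * (y_t - p)
            hess -= k_t * k_t * p * (1.0 - p)
        step = grad / hess
        z -= step  # Newton-Raphson update
        if abs(step) < tol:
            break
    return z

# Two observed interactions pull the estimate above the similarity-based mean.
print(newton_update_z(mu=0.7, v=0.5, y=[1, 1], offsets=[0.0, 0.0], theta_z=[1.0, 1.0]))
```

Because the second derivative is always negative here, the objective is concave in z and the iteration settles on its unique maximum.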
Experiment
Two datasets were prepared for the experiments
LinkedIn Data
- Business-oriented social network
- Members can search member profiles and job postings
Facebook Data
LinkedIn Dataset
100 seed users and their two-hop neighborhood (100,000 pairs)
Overall similarity
- 1 if $i$ and $j$ went to the same school, 0 otherwise
- 1 if $i$ and $j$ work in the same company, 0 otherwise
- 1 if $i$ and $j$ are in the same geographical region, 0 otherwise
- 1 if $i$ and $j$ are in the same industry, 0 otherwise
- 1 if $i$ and $j$ have the same job title, 0 otherwise
- 1 if $i$ and $j$ are in the same functional area, 0 otherwise
- Logarithm of the normalized count of common groups that $i$ and $j$ have joined
- Logarithm of the normalized count of common connections that $i$ and $j$ share
Interaction features
- 1 if $i$ and $j$ have established a connection, 0 otherwise
- 1 if $i$ has written a recommendation for $j$, 0 otherwise
- 1 if $i$ has viewed $j$'s profile, 0 otherwise
- 1 if $i$ has included $j$ in his or her online LinkedIn address book, 0 otherwise
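The similarity features listed above could be assembled into a vector roughly as follows. This is only a sketch: the profile field names are hypothetical, and the slide does not say how the counts are normalized, so the add-one smoothed overlap ratio used here is an assumption chosen to keep the logarithm defined.

```python
import math

# Hypothetical field names for the six binary profile features.
BINARY_FIELDS = ["school", "company", "region", "industry", "job_title", "functional_area"]

def similarity_vector(profile_i, profile_j):
    """Build a similarity vector from two profile dicts (illustrative only)."""
    s = []
    for field in BINARY_FIELDS:
        value_i = profile_i.get(field)
        # 1 if both users share a known value for this field, 0 otherwise.
        s.append(1.0 if value_i is not None and value_i == profile_j.get(field) else 0.0)
    for key in ("groups", "connections"):
        a = set(profile_i.get(key, ()))
        b = set(profile_j.get(key, ()))
        # ASSUMED normalization: smoothed ratio of common items to all items.
        s.append(math.log((len(a & b) + 1.0) / (len(a | b) + 1.0)))
    return s

alice = {"school": "Purdue", "company": "Acme",
         "groups": ["ml", "db"], "connections": ["carol"]}
bob = {"school": "Purdue", "company": "Initech",
       "groups": ["ml", "ir"], "connections": ["carol", "dave"]}
print(similarity_vector(alice, bob))
```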
Evaluation (in LinkedIn Dataset)
Estimating relationship strength with
Job
Functional area
Geographical region
Measuring how well the estimated relationship strengths identify the other feature values (same school, same company, same industry)
Measuring the area under the ROC curve (AUC)
Comparing relationship strength to
Recommendation links
Profile view links
Address book links
Connection links
Interaction count
Profile similarity
Receiver Operating Characteristic (ROC)
TPR (sensitivity)
equivalent to hit rate, recall
TPR = TP / P = TP / (TP + FN)
FPR
equivalent to fall-out
FPR = FP / N = FP / (FP + TN)
AUC (Area Under ROC Curve)
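The definitions above can be turned into a small computation. The sketch below sweeps a threshold over the scores, records the (FPR, TPR) points, and integrates them with the trapezoidal rule; the function names and the toy scores are illustrative, and the code assumes at least one positive and one negative label.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points obtained by sweeping a threshold over the scores."""
    pos = sum(labels)
    neg = len(labels) - pos
    # Sort by descending score; each distinct score acts as one threshold.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume all items tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))  # FPR = FP / N, TPR = TP / P
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Toy example: estimated strengths and whether each pair shares a feature value.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
print(auc(roc_points(scores, labels)))
```

In this toy example 8 of the 9 positive-negative pairs are ranked correctly, so the area is 8/9.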
Results on the LinkedIn dataset
Facebook dataset
5 public Purdue Facebook users and their three-hop neighborhood
4,500 nodes and 144,712 pairs
Overall similarity (not using personal profile data)
- Logarithm of the normalized count of common networks of which $i$ and $j$ are both members
- Logarithm of the normalized count of common groups that $i$ and $j$ have joined
- Logarithm of the normalized count of common friends that $i$ and $j$ share
Interactions
- 1 if $i$ has posted on $j$'s wall, 0 otherwise
- 1 if $i$ has tagged $j$ in a picture, 0 otherwise
Evaluation (in Facebook Dataset)
Comparing the relationship strength of the model to other weighted graphs
Friendship graph: strong/weak relationships
Top-Friend graph: strong relationships
Wall graph: interactions
Picture graph: interactions
Evaluating
Autocorrelation improvement
Classification improvement
Evaluation (in Facebook Dataset) Autocorrelation
Statistical dependency of the same attribute on related instances
- K is the number of possible categorical values of the attribute
- The observed occurrence of value pairs on related instances is compared with the expected occurrence
- As the observed occurrence increases, the autocorrelation also increases
- e.g., the geographical region attribute has higher autocorrelation than the favorite baseball team attribute in a friendship network
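The slide's formula did not survive extraction. One statistic that is built from the number of categories and from observed versus expected counts is Pearson's chi-square over the attribute values of linked pairs; whether the slide used exactly this statistic is an assumption, and the function name below is hypothetical.

```python
from collections import Counter

def chi_square_autocorrelation(pairs):
    """Pearson chi-square over attribute values of related pairs.

    pairs: list of (value_of_i, value_of_j), one entry per related pair.
    Larger values mean the two ends of a link depend on each other more.
    """
    n = len(pairs)
    observed = Counter(pairs)              # observed count per value pair
    left = Counter(a for a, _ in pairs)    # marginal counts, first endpoint
    right = Counter(b for _, b in pairs)   # marginal counts, second endpoint
    chi2 = 0.0
    for a in left:
        for b in right:
            # Expected count if the two endpoints were independent.
            expected = left[a] * right[b] / n
            chi2 += (observed[(a, b)] - expected) ** 2 / expected
    return chi2

# Linked users always share the value: strong dependency.
print(chi_square_autocorrelation([("A", "A")] * 5 + [("B", "B")] * 5))
# Every combination equally common: no dependency.
print(chi_square_autocorrelation([("A", "A"), ("A", "B"), ("B", "A"), ("B", "B")]))
```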
Classification performance
The Gaussian Random Field (GRF) model is used for classification
Autocorrelation improvement
Classification improvement
Conclusions
A latent variable model for the task of relationship strength estimation
The latent variable model captures the causality of the underlying social process
Hybrid approach of generative model and discriminative model
- Not suffering from sparsity of interaction
- The latent variable is inferred using only the upper level of the model
- Predicting future interactions is also possible
Predicting new connections
Experiments show that the estimated relationship strength gives higher autocorrelation and better classification performance
Discussions
General model to estimate relationship strength
Easy to apply specific domain knowledge
- Just define the similarity of two users and the interaction distributions
However, the experiments are somewhat questionable
No comparison to other state-of-the-art techniques
- There is only a comparison to raw data
The similarity function is too simple
- More recent techniques should be considered
Thank you