Modeling Relationship Strength in Online Social Networks. Rongjing Xiang (1), Jennifer Neville (1), Monica Rogati (2). (1) Purdue University, (2) LinkedIn. WWW 2010

2010. 08. 13.

Summarized and Presented by Sang-il Song, IDS Lab., Seoul National University

Copyright 2010 by CEBT

Introduction – Social Network

Homophily

the tendency of individuals to associate and bond with similar others

“Birds of a feather flock together”

Found in many real-world and online social networks

Research Area

Network Structure Analysis

Link prediction – “Who will be my friend?”

Community Detection

Item Recommendation


Introduction

Past work has focused on social networks with binary ties.

e.g., friends or not

Binary indicators provide only a coarse indication of the relationship.

Pairs of individuals with strong ties (e.g., close friends) are likely to exhibit greater similarity than those with weak ties (e.g., acquaintances)

Treating all relationships as equal increases the noise and degrades performance

Pruning away spurious relationships and highlighting stronger relationships has improved the accuracy of the models.


Related Works

I. Kahanda and J. Neville. Using transactional information to predict link strength in online social networks. ICWSM 2009

E. Gilbert and K. Karahalios. Predicting tie strength with social media. CHI 2009

Binary prediction task

– Strong ties or Weak ties

Supervised learning

– Requires human annotation effort

Friendship Rating

Top friend nomination


Goal

A model to infer relationship strength

Based on profile similarity and interaction activity

Automatically distinguishing strong relationships from weak ones

– Unsupervised

Relationship strength is represented as continuous value

– Full spectrum of relation strength, from weak to strong

Scalable approach

– Suitable for online application


Assumptions of the Model

The higher the similarity, the stronger the tie

e.g., 'Yongjin' and I share many common features, so we have a strong relationship.

Relationship strength directly impacts the nature and frequency of online interactions between a pair of users

e.g., 'Cheongrim' is close to me if he chats with me often on a messenger.

Interactions are assumed to be independent of one another given the relationship strength


Variables of the Model

Profile: the data of specific user

$\mathbf{x}^{(i)}$: profile vector of individual $i$

e.g., school, company, region, industry, job of the user

Interaction: the activity between two users

$y_t^{(ij)}$: occurrences of the $t$-th interaction between $i$ and $j$, for $t = 1, \dots, m$

e.g., reply, retweet (on Twitter)

e.g., tagging the person in a picture, posting on one's wall (on Facebook)

Relationship Strength

$z^{(ij)}$: latent relationship strength


Graphical model representation

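$$P(z^{(ij)}, \mathbf{y}^{(ij)} \mid \mathbf{x}^{(i)}, \mathbf{x}^{(j)}) = P(z^{(ij)} \mid \mathbf{x}^{(i)}, \mathbf{x}^{(j)}) \prod_{t=1}^{m} P(y_t^{(ij)} \mid z^{(ij)})$$

[Figure: graphical model in which the profile vectors $\mathbf{x}^{(i)}$ and $\mathbf{x}^{(j)}$ point to the latent variable $z^{(ij)}$, which in turn points to the interactions $y_1^{(ij)}, y_2^{(ij)}, \dots, y_m^{(ij)}$]

$\mathbf{x}^{(i)}$, $\mathbf{x}^{(j)}$: profile vectors

$y_t^{(ij)}$: occurrences of the interaction

$z^{(ij)}$: latent relationship strength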


Model Specification

Inferring relationship strength from user profile

Using similarity vector

e.g., $s_k(\mathbf{x}^{(i)}, \mathbf{x}^{(j)})$: 1 if $i$ and $j$ work in the same company, 0 otherwise

e.g., logarithm of the normalized count of common groups that $i$ and $j$ have joined

Adopting the Gaussian distribution

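$$P(z^{(ij)} \mid \mathbf{x}^{(i)}, \mathbf{x}^{(j)}) = \mathcal{N}\left(\mathbf{w}^{T}\mathbf{s}(\mathbf{x}^{(i)}, \mathbf{x}^{(j)}),\ v\right)$$

$\mathbf{w}^{T}\mathbf{s}(\mathbf{x}^{(i)}, \mathbf{x}^{(j)})$: weighted sum of similarity measures

$\mathbf{w}$: to be estimated

[Figure: density $p$ over $z$ for two pairs of users, next to the graph fragment in which $\mathbf{x}^{(i)}$ and $\mathbf{x}^{(j)}$ point to $z^{(ij)}$. Blue represents two similar users; red represents two dissimilar users]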
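The Gaussian step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the profile fields, the weights `w` and the variance `v` are hypothetical values, and only binary "same value" similarity measures are used.

```python
import math

def similarity_vector(profile_i, profile_j, fields):
    """Binary similarity: 1.0 where both profiles hold the same non-missing value."""
    return [
        1.0 if profile_i.get(f) is not None and profile_i.get(f) == profile_j.get(f) else 0.0
        for f in fields
    ]

def prior_mean(w, s):
    """Weighted sum of similarity measures, w^T s."""
    return sum(wk * sk for wk, sk in zip(w, s))

def gaussian_log_density(z, mean, variance):
    """log N(z | mean, variance)."""
    return -0.5 * math.log(2.0 * math.pi * variance) - (z - mean) ** 2 / (2.0 * variance)

# Hypothetical profiles and parameters, for illustration only.
fields = ["school", "company", "region"]
user_i = {"school": "Purdue", "company": "Acme", "region": "Indiana"}
user_j = {"school": "Purdue", "company": "Initech", "region": "Indiana"}
w = [0.5, 1.0, 0.25]  # assumed weights; in the model they are estimated
v = 0.5               # assumed variance

s = similarity_vector(user_i, user_j, fields)
mu = prior_mean(w, s)
print(s, mu, gaussian_log_density(mu, mu, v))
```

Under this sketch, a pair that shares more fields gets a larger mean $\mathbf{w}^{T}\mathbf{s}$, so strong relationship values become more probable for similar users.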


Model Specification

Inferring relationship strength from interactions

Modeling all interactions as binary variables

Introducing auxiliary variables $\mathbf{a}_t^{(ij)}$

– Capturing auxiliary causes of the interactions which are independent of the relationship strength

– e.g., the total number of pictures that a user has tagged represents their intrinsic tendency to tag pictures

Using the sigmoid function

$\theta$ is to be estimated

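$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

where $x$ is a weighted sum of the auxiliary variables and $z^{(ij)}$

[Figure: graph fragment in which $z^{(ij)}$ points to the interactions $y_1^{(ij)}, y_2^{(ij)}, \dots$]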
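A minimal sketch of the interaction model described on this slide, assuming the probability of a binary interaction is the sigmoid of a weighted sum of the auxiliary variables and $z$. The function name `interaction_probability`, the optional `bias` term, and all numeric values are assumptions made for illustration.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function 1 / (1 + e^{-x})."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def interaction_probability(theta, aux, z, bias=0.0):
    """P(y_t = 1 | a_t, z): sigmoid of the weighted sum of auxiliary variables and z.

    theta holds one weight per auxiliary variable followed by the weight on z.
    """
    u = list(aux) + [z]
    if len(theta) != len(u):
        raise ValueError("theta needs one weight per auxiliary variable plus one for z")
    return sigmoid(sum(t * x for t, x in zip(theta, u)) + bias)

# Hypothetical values: one auxiliary variable (e.g. a user's overall tagging activity).
theta = [0.3, 2.0]
aux = [1.5]
p_weak = interaction_probability(theta, aux, z=-1.0)
p_strong = interaction_probability(theta, aux, z=1.0)
print(p_weak, p_strong)
```

With a positive weight on $z$, a stronger relationship raises the probability of the interaction, while the auxiliary term absorbs activity that has nothing to do with the pair.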


Model Specification

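[Figure: two versions of the graphical model. In the first, $\mathbf{x}^{(i)}$ and $\mathbf{x}^{(j)}$ point to $z^{(ij)}$, which points to the interactions $y_1^{(ij)}, y_2^{(ij)}, \dots$. In the second, the profiles are summarized by the similarity vector $\mathbf{s}^{(ij)}$, and the interactions also depend on the auxiliary variables $\mathbf{a}_1^{(ij)}, \mathbf{a}_2^{(ij)}, \dots, \mathbf{a}_m^{(ij)}$]

$$P(D, \mathbf{w}, \theta) = P(D \mid \mathbf{w}, \theta)\, P(\mathbf{w})\, P(\theta)$$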
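For reference, the joint probability can be expanded in log form. This is a sketch under assumptions: the joint is taken to factorize into the likelihood and independent priors on $\mathbf{w}$ and $\theta$, the pairs $(i, j)$ are treated as independent, the current values of $z^{(ij)}$ are treated as part of $D$, and the form of the priors is left unspecified.

```latex
\begin{align*}
\log P(D, \mathbf{w}, \theta)
  &= \log P(D \mid \mathbf{w}, \theta) + \log P(\mathbf{w}) + \log P(\theta) \\
\log P(D \mid \mathbf{w}, \theta)
  &= \sum_{(i,j)} \Big[ \log \mathcal{N}\big(z^{(ij)} \mid \mathbf{w}^{T}\mathbf{s}^{(ij)},\ v\big)
   + \sum_{t=1}^{m} \log P\big(y_t^{(ij)} \mid z^{(ij)}, \mathbf{a}_t^{(ij)}, \theta\big) \Big]
\end{align*}
```

The first term inside the sum is the Gaussian profile-similarity part and the second is the sigmoid interaction part, so each pair contributes one "upper level" and one "lower level" term.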


Inference

Find the point estimates $\hat{\mathbf{w}}$, $\hat{\theta}$ that maximize $P(D, \mathbf{w}, \theta)$

Using a gradient-based method

Using Newton-Raphson steps for the weight updates
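To make the Newton-Raphson idea concrete, here is a hedged sketch for one piece of the problem only: updating the latent strength $z$ of a single pair while all weights are held fixed. This is an illustrative assumption, not the authors' full procedure; the tuple layout `(y, offset, theta_z)` and all numbers are hypothetical, and `offset` stands for the part of the sigmoid argument that does not depend on $z$.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def grad_hess(z, mu, v, interactions):
    """First and second derivative in z of
    log N(z | mu, v) + sum_t log P(y_t | z) for binary y_t."""
    g = -(z - mu) / v
    h = -1.0 / v
    for y, offset, theta_z in interactions:
        p = sigmoid(offset + theta_z * z)
        g += (y - p) * theta_z
        h -= p * (1.0 - p) * theta_z ** 2
    return g, h

def estimate_strength(mu, v, interactions, iters=50, tol=1e-10):
    """Newton-Raphson ascent on z, starting from the Gaussian mean."""
    z = mu
    for _ in range(iters):
        g, h = grad_hess(z, mu, v, interactions)
        step = g / h  # h is always negative, so the objective is concave in z
        z -= step
        if abs(step) < tol:
            break
    return z

# Hypothetical pair: similarity suggests mu = 0.75; two interaction types.
mu, v = 0.75, 0.5
z_active = estimate_strength(mu, v, [(1, -1.0, 2.0), (1, -0.5, 1.5)])
z_silent = estimate_strength(mu, v, [(0, -1.0, 2.0), (0, -0.5, 1.5)])
print(z_silent, mu, z_active)
```

In this sketch, observed interactions pull the estimate above the similarity-based mean and absent interactions pull it below, which is the behavior the model is designed to capture.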


Experiment

Two datasets are prepared for the experiments

LinkedIn

– Business-Oriented Social Network

– Members can search member profiles and job postings

Facebook Data


LinkedIn Dataset

100 seed users and their two-hop neighborhoods (100,000 pairs)

Overall similarity

– 1 if $i$ and $j$ went to the same school, 0 otherwise

– 1 if $i$ and $j$ work in the same company, 0 otherwise

– 1 if $i$ and $j$ are in the same geographical region, 0 otherwise

– 1 if $i$ and $j$ are in the same industry, 0 otherwise

– 1 if $i$ and $j$ have the same job title, 0 otherwise

– 1 if $i$ and $j$ are in the same functional area, 0 otherwise

– Logarithm of the normalized count of common groups that $i$ and $j$ have joined

– Logarithm of the normalized count of common connections that $i$ and $j$ share

Interaction features

– 1 if $i$ and $j$ have established a connection, 0 otherwise

– 1 if $i$ has written a recommendation for $j$, 0 otherwise

– 1 if $i$ has viewed $j$'s profile, 0 otherwise

– 1 if $i$ has included $j$ in his or her online LinkedIn address book, 0 otherwise


Evaluation (in LinkedIn Dataset)

Estimating relationship strength with

Job

Functional area

Geographical region

Measuring how well the estimated relationship strengths identify feature values not used for estimation (same school, same company, same industry)

Measuring the area under the ROC curve (AUC)

Comparing relationship strength to

Recommendation links

Profile view links

Address book links

Connection links

Interaction count

Profile similarity


Receiver Operating Characteristic (ROC)

TPR (sensitivity)

equivalent to hit rate, recall

TPR = TP / P = TP / (TP + FN)

FPR

equivalent to fall-out

FPR = FP / N = FP / (FP + TN)

AUC (Area Under ROC Curve)
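The definitions above can be checked with a small pure-Python sketch. The scores and labels are made-up illustration data, and the AUC is computed through its rank interpretation (the probability that a random positive scores higher than a random negative, with ties counted as one half), which equals the area under the ROC curve.

```python
def rates_at_threshold(scores, labels, threshold):
    """TPR = TP / (TP + FN) and FPR = FP / (FP + TN) when predicting
    positive for score >= threshold. Assumes both classes are present."""
    tp = sum(1 for s, l in zip(scores, labels) if l == 1 and s >= threshold)
    fn = sum(1 for s, l in zip(scores, labels) if l == 1 and s < threshold)
    fp = sum(1 for s, l in zip(scores, labels) if l == 0 and s >= threshold)
    tn = sum(1 for s, l in zip(scores, labels) if l == 0 and s < threshold)
    return tp / (tp + fn), fp / (fp + tn)

def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical relationship strengths and whether each pair shares a held-out feature.
scores = [0.9, 0.4, 0.35, 0.6, 0.2]
labels = [1, 1, 0, 0, 0]
tpr, fpr = rates_at_threshold(scores, labels, 0.5)
print(tpr, fpr, auc(scores, labels))
```

Sweeping the threshold over all score values traces out the ROC curve; an AUC of 0.5 corresponds to random ordering and 1.0 to a perfect ranking.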


Results on the LinkedIn dataset


Facebook dataset

5 public Purdue Facebook users and their three-hop neighborhoods

4,500 nodes and 144,712 pairs

Overall similarity

Not using personal profile data

– Logarithm of the normalized count of common networks of which $i$ and $j$ are both members

– Logarithm of the normalized count of common groups that $i$ and $j$ have joined

– Logarithm of the normalized count of common friends that $i$ and $j$ share

Interactions

– 1 if $i$ has posted on $j$'s wall, 0 otherwise

– 1 if $i$ has tagged $j$ in a picture, 0 otherwise


Evaluation (in Facebook Dataset)

Comparing the graph weighted by the model's relationship strengths to other graphs

Friendship graph: strong/weak relationships

Top-Friend graph: strong relationships

Wall graph: interactions

Picture graph: interactions

Evaluating

Autocorrelation improvement

Classification improvement


Evaluation (in Facebook Dataset)

Autocorrelation

Statistical dependency between the values of the same attribute on related instances

– $K$ is the number of possible categorical values of the attribute

– The measure compares the observed occurrence of each pair of attribute values on linked instances with the expected occurrence

– The further the observed occurrences rise above the expected ones, the higher the autocorrelation

– e.g., the geographical region attribute has higher autocorrelation than the favorite baseball team attribute in a friendship network
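The exact autocorrelation formula did not survive on this slide. As a hedged illustration of an observed-versus-expected measure over $K$ categories, the sketch below assumes a chi-square based contingency coefficient with the usual $\sqrt{K/(K-1)}$ correction; the function name, the optional edge weights, and the toy graph are all assumptions.

```python
import math
from collections import Counter

def contingency_autocorrelation(edges, attr, weights=None):
    """Corrected contingency coefficient of an attribute across linked nodes.

    edges: list of (i, j) pairs; attr: dict node -> category;
    weights: optional per-edge weights (e.g. estimated relationship strengths).
    """
    cats = sorted(set(attr.values()))
    k = len(cats)
    if k < 2:
        return 0.0
    observed = Counter()
    total = 0.0
    for idx, (i, j) in enumerate(edges):
        w = 1.0 if weights is None else weights[idx]
        # Count both directions so the table of observed pairs is symmetric.
        observed[(attr[i], attr[j])] += w
        observed[(attr[j], attr[i])] += w
        total += 2.0 * w
    marginal = Counter()
    for (a, _b), c in observed.items():
        marginal[a] += c
    chi2 = 0.0
    for a in cats:
        for b in cats:
            expected = marginal[a] * marginal[b] / total
            if expected > 0:
                chi2 += (observed[(a, b)] - expected) ** 2 / expected
    c = math.sqrt(chi2 / (total + chi2))
    return c * math.sqrt(k / (k - 1))

# Toy graph: region label per node (hypothetical data).
region = {1: "A", 2: "A", 3: "B", 4: "B"}
homophilous = contingency_autocorrelation([(1, 2), (3, 4)], region)
mixed = contingency_autocorrelation([(1, 2), (3, 4), (1, 3), (2, 4)], region)
print(homophilous, mixed)
```

Passing larger weights for edges that connect same-valued nodes raises the score, which is the sense in which a good relationship strength estimate can improve autocorrelation over the unweighted friendship graph.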

Classification performance

The Gaussian random field (GRF) model is used for classification
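As a rough sketch of how a GRF classifier can use edge weights, the code below computes the harmonic solution by repeated weighted averaging: labeled nodes stay clamped and every unlabeled node takes the weighted mean of its neighbors. The graph, the weights, and the function name `grf_harmonic` are hypothetical, and iterative averaging is just one simple way to reach the harmonic solution.

```python
def grf_harmonic(weights, labels, iters=200):
    """Harmonic-function label propagation on a weighted undirected graph.

    weights: dict {(u, v): w} with non-negative weights;
    labels: dict node -> 0 or 1 for the labeled nodes.
    Returns a dict node -> score in [0, 1].
    """
    nbrs = {}
    for (u, v), w in weights.items():
        nbrs.setdefault(u, {})[v] = w
        nbrs.setdefault(v, {})[u] = w
    # Unlabeled nodes start at 0.5; labeled nodes are clamped to their label.
    f = {n: float(labels.get(n, 0.5)) for n in nbrs}
    for _ in range(iters):
        for n in nbrs:
            if n in labels:
                continue
            total = sum(nbrs[n].values())
            if total > 0:
                f[n] = sum(w * f[m] for m, w in nbrs[n].items()) / total
    return f

# Hypothetical chain a - u - b, where u is tied more strongly to a than to b.
strengths = {("a", "u"): 3.0, ("u", "b"): 1.0}
known = {"a": 1, "b": 0}
scores = grf_harmonic(strengths, known)
print(scores["u"])
```

With binary friendship edges both neighbors would count equally and `u` would sit at 0.5; weighting by relationship strength moves it toward the label of its stronger tie.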


Autocorrelation improvement


Classification improvement


Conclusions

A latent variable model for the task of relationship strength estimation

The latent variable model captures the causality of the underlying social process

Hybrid approach of generative model and discriminative model

– Not suffering from sparsity of interaction

– The latent variable can be inferred using only the upper level of the model

– Predicting future interactions is also possible

Predicting new connections

Experiments show that the estimated relationship strength gives higher autocorrelation and better classification performance


Discussions

General model to estimate relationship strength

Easy to apply specific domain knowledge

– Just define similarity of two users and interaction distributions

However, the experiments are somewhat unconvincing

No comparison to other state-of-the-art techniques

– There is only a comparison to the raw data

The similarity function is too simple

– More recent similarity techniques could be considered

Thank you
