
Ma YX, Xu JY, Peng DC et al. A visual analysis approach for community detection of multi-context mobile social networks.

JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 28(5): 797–809 Sept. 2013. DOI 10.1007/s11390-013-1378-5

A Visual Analysis Approach for Community Detection of Multi-Context Mobile Social Networks

Yu-Xin Ma1 (马昱欣), Jia-Yi Xu2 (徐佳逸), Di-Chao Peng1 (彭帝超), Ting Zhang2 (张婷), Cheng-Zhe Jin3 (金呈哲), Hua-Min Qu4 (屈华民), Member, IEEE, Wei Chen1,∗ (陈为), Member, CCF, and Qun-Sheng Peng1 (彭群生)

1State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058, China

2College of Computer Science and Technology, Zhejiang University, Hangzhou 310058, China

3Department of Mathematics, Zhejiang University, Hangzhou 310058, China

4Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Kowloon, Hong Kong, China

E-mail: {chenwei, peng}@cad.zju.edu.cn

Received May 5, 2013; revised August 2, 2013.

Abstract The problem of detecting community structures of a social network has been extensively studied over recent years, but most existing methods solely rely on the network structure and neglect the context information of the social relations. The main reason is that a context-rich network offers too much flexibility and complexity for automatic or manual modulation of the multifaceted context in the analysis process. We address the challenging problem of incorporating context information into the community analysis with a novel visual analysis mechanism. Our approach consists of two stages: interactive discovery of salient context, and iterative context-guided community detection. Central to the analysis process is a context relevance model (CRM) that visually characterizes the influence of a given set of contexts on the variation of the detected communities, and discloses the community structure in specific context configurations. The extracted relevance is used to drive an iterative visual reasoning process, in which the community structures are progressively discovered. We introduce a suite of visual representations to encode the community structures, the context as well as the CRM. In particular, we propose an enhanced parallel coordinates representation to depict the context and community structures, which allows for interactive data exploration and community investigation. Case studies on several datasets demonstrate the efficiency and accuracy of our approach.

Keywords visual analysis, community detection, multi-context

1 Introduction

The rapid growth of smart phones and mobile social networks[1-2] has made it challenging to understand and analyze such complex networks. To study the structure of such networks, community detection is one of the most effective mechanisms, in which the network is partitioned into strongly intra-connected clusters[3].

Regular Paper

This work is supported by the National Natural Science Foundation of China under Grant Nos. 61232012, 61202279, the National High Technology Research and Development 863 Program of China under Grant No. 2012AA12090, the Natural Science Foundation of Zhejiang Province of China under Grant No. LR13F020001, and the Doctoral Fund of Ministry of Education of China under Grant No. 20120101110134.

∗Corresponding Author

©2013 Springer Science + Business Media, LLC & Science Press, China

Community detection methods can be applied to various types of complex networks, including social networks established by human interactions. Existing solutions commonly leverage the information of the network structure to fulfill the task[3-6]. This works well for many applications, but may lead to unsatisfactory results when the underlying network is influenced by complicated context factors (Fig.1). Unfortunately, existing community detection methods do not perform very well on these datasets. The main reason is that the influences from the relevant context are missing in the analysis process. Note that the context itself is an important part of the data. It is reasonable to regard it as a component of the entire analysis process. Additionally, the links (edges) between two nodes of the network are formed with distinctive context configurations. To distinguish individual connections between two nodes, the corresponding context information has to be employed. Therefore, incorporating the context information into the community analysis is a promising way to improve the accuracy of community detection. A preliminary effort that uses time[7] as contextual information supports this observation.

Fig.1. Community detection results of the MIT Reality Mining dataset[8] using an existing method[9]. The yellow background indicates the actual community boundary (ground truth): the left community is formed by participants from the Computer Science Department, while the right one is composed of students of the Business school. The colors of nodes represent community detection results. Note that nodes 29, 30, 39, 43, 57, 59 and 86 are wrongly grouped into the Business school community in this result.

However, taking advantage of multifaceted context information in community detection is quite challenging for two reasons. First, the context items are heterogeneous and complicated, such as date, time, location, call/meet duration, SMS topic, and signal strength. The data type can be categorical, ordinal, discrete, or continuous. Tackling them using a unified data model is hardly tractable. Yet, it is desirable and feasible to uncover the significantly relevant context and associated configurations by studying the relationship between the contextual information and the community structures. As a result, the relevance between the context and community rather than the context itself can be modeled and employed in the analysis process.

On the other hand, multifaceted context informa-tion and its relevance to the community structures aretoo flexible and complex for automatic analysis. Simul-taneously presenting and modulating different types ofcontextual information is very confusing. As a result,comprehending the semantic meanings of relevant con-text requires the participation of human users, who candetermine which context configuration is appropriate

and how the context parameters should be modulated.This actually elicits a visual analysis mechanism, whosemain components are a suite of visual representationsencoding the context, community structures and theirrelevance, plus an integrated visual interface that allowsfor intuitive exploration and iterative analysis.

We argue that integrating contextual information into the community analysis process can be fulfilled with a two-stage pipeline: prior to context-guided community detection, the salient context and context configurations are discovered. The kernel of this pipeline is the representation, modeling, visualization and exploration of the context relevance to the community structures. It not only determines the discovery of important contextual information in the first stage, but also guides the clustering process in the second stage. Additionally, an iterative visual reasoning scheme associated with an intuitive visual interface is needed for users to view and analyze contextual information in the entire analysis process.

In this paper, we describe a novel visual analysis approach, which visually embodies rich context information into the community detection process and leads to a new interactive community detection tool with the following contributions:

• a context relevance model (CRM) that answers questions such as which context items significantly affect the community structures, and how a given context item should be configured to yield an optimal outcome;

• a suite of visual representations and an integrated visual interface including an advanced parallel coordinates representation that allows for interactive data exploration and community quality examination;

• an iterative visual reasoning algorithm that converts and incorporates the user’s findings under specific context settings into a priori knowledge in the reasoning process.

The remainder of this paper is organized as follows. Section 2 summarizes previous work. We describe the key idea in Section 3. Section 4 and Section 5 explain the approach details. Experimental results are given in Section 6, followed by the conclusion in Section 7.

2 Related Work

2.1 Community Detection

The problem of partitioning a network graph into multiple parts has been extensively studied and widely used in parallel computing, circuit partitioning and layout. Representative approaches include graph cut[3], hierarchical clustering[10] and spectral clustering[11].


Most recent efforts are devoted to discovering community structures by considering the characteristics of the network. Existing solutions can be classified into four categories: divisive, optimization-based, spectral, and dynamic mechanisms. For a more complete survey, please refer to [12].

A straightforward way to cluster a graph is to detect and remove the edges that connect nodes of different communities. The well-known divisive algorithm[4] measures the likelihood of an edge bridging two communities with the betweenness, which denotes the number of shortest paths between all node pairs that run along the edge. The edges with high betweenness are sequentially removed until the graph becomes disconnected. This method was later modified[13-14] to improve the community quality and efficiency. This detect-and-remove scheme requires the computation of pairwise shortest paths, and can hardly scale up to large networks.

Another category defines quality metrics for a community and computes an optimal classification that maximizes the quality metrics of the generated structures. The most popular one is the modularity[5], which measures the density of links inside communities compared with links between communities. The recently developed modularity algorithm[9] initializes each node as a community, and recursively merges neighboring communities for a maximum gain of modularity. The simplicity of the approach yields high efficiency and scalability.

The spectral approaches leverage the spectral properties of the matrix representation to fulfill the partition[15]. A common way is spectral graph clustering that employs the eigenvectors of Laplacian matrices[16]. The Newman-Girvan modularity can also be optimized by using the eigenvectors of the modularity matrix[17].

Alternatively, one can employ an evolving mechanism to steer the dynamic discovery of the community structures. For instance, the random walker algorithm is adopted to detect densely connected nodes[18-19]. Likewise, the Spin[20] and Synchronization[21] models are capable of iteratively forming a stable community structure. There are also models that allow communities to overlap with each other to fit the requirements of specific applications[22]. A multiobjective immune approach is presented in [23] to solve the community detection problem in dynamic networks.

Although not explicitly claimed, there is some preliminary work that incorporates certain context into the community analysis process and achieves higher analysis quality[6]. For instance, time is given special care in a time-varying social network[7]. A later study[24] introduces the aging factor to characterize the evolution of social network relations.

There are some other issues and applications related to community detection, like community detection with overlapped communities[25], discovery of communities of specific type[26], and summarization of large-scale database schema[27].

2.2 Network Visualization

It is a great challenge to understand a dynamically evolving and context-rich community using an automatic analysis approach, as witnessed by the extensive community analysis literature. Visualization empowers users with the capability of understanding complex data through an interactive visual interface. In particular, visualizing a network graph is a commonly used technique to facilitate gaining insight into the network structure.

The node-link representation of a network intuitively depicts relations among nodes. It can be found in almost all graph visualization toolkits like Pajek[28], Vizster[29], NodeXL[30] and Gephi[31]. One challenge towards effective exploration of large network representations is the removal of visual clutter. Solutions for that include force-directed graph layout[32], edge bundling[33], and interactive graph exploration[34].

In summary, many methods have been studied in the field of community detection and visualization of social network data. There is also initial effort to take semantic context into account for community detection[6,24] and for social network analysis[35]. Yet, it remains an open problem to incorporate heterogeneous context information into the community analysis through a comprehensive visual interface.

3 Key Idea

Let us denote a network graph as $G = \{V, E, CV\}$. Each edge $e_i \in E$ is associated with a context vector $cv_i \in CV$, which records a set of context items. There can be multiple edges for each node pair $v_i$ and $v_j$ (e.g., $e_1$ and $e_2$ between $v_1$ and $v_2$ in Fig.2), each of which relates to a distinctive context item.

3.1 Context Relevance Model

The problem of multi-context community detection can be defined as finding a non-overlapping partition of $G$ (illustrated with the sets in yellow in Fig.2), which clusters the graph into meaningful sub-communities with respect to $CV$. Similar to the adjacency matrix representation $A$ of graph $G$, we can reformulate the partition result with a community matrix $C$, whose cell $c(i, j) \in \{0, 1\}$ indicates whether the $i$-th and $j$-th nodes belong to the same community. The community matrix representation allows for efficient modeling of the influence of the context on the variation of community structures.
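As a small worked illustration of the community matrix (the four-node partition is hypothetical, and setting the diagonal to 1, i.e., treating every node as sharing a community with itself, is an assumption made here), a partition into $\{v_1, v_2\}$ and $\{v_3, v_4\}$ would give:

```latex
C =
\begin{pmatrix}
1 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 \\
0 & 0 & 1 & 1
\end{pmatrix},
\qquad
c(1,2) = 1, \quad c(1,3) = 0 .
```

The matrix is symmetric and block-structured whenever nodes of the same community are numbered consecutively.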


Fig.2. Illustration of a context-rich network. The community matrix $C$ encodes the community detection result with respect to the context, which contains the date, time, location and meeting duration of the Bluetooth sighting records.

The context $CV$ is the spanned space of a sequence of context items: $CV = CT^1 \times CT^2 \times \cdots \times CT^{M-1} \times CT^M$, where each context item $CT^j$ relates to a specific type of context shared by all $cv_i$ entries, such as time and location. The context configuration of a context item is the allowed values or range of the item. For example, the context configuration of the time-of-day item is the interval from “00:00” to “23:59”. The context relevance (saliency) denotes 1) the saliency of each context item, and 2) the salient configuration on a given context item with respect to the underlying network. We define the context relevance model as a tuple $CRM = \langle G, L, P, F, T \rangle$, where

• $G = \{V, E, CV\}$, $V$ is the set of nodes, $E$ is the set of edges, and $CV$ is composed of multiple context items;

• $L$ is a set of labels, and $P$ is a conventional labeling function from $V \cup E$ into $L$;

• $F$ is a context filtering function that maps $CV$ to its subset;

• $T$ is a context relevance function that associates each item of $CV$ with a saliency and a salient configuration.

The definition of $CV$ varies with different applications. In general, a context item $CT^j$ can take a categorical, numerical or ordinal data type. As a result, one of its instances, $ct^j_i$, has a specific definition domain, e.g., a date item ranges from Jan. 1 to Dec. 31. The labeling function $P$ and the label set $L$ denote the partition function on the vertices and edges $V \cup E$ and the graph partitioning result, respectively. $F$ generates a subset of $CV$ by imposing constraints on selected context items. $T$ assigns each context item an importance tag and suggests a configuration for each item. It implicitly encodes the context relevance with respect to the network $G$.

With a given $G = \{V, E, CV\}$, context-guided community detection seeks to compute an optimal $P$ with the constraints from salient context. This raises two challenging problems for constructing an appropriate CRM, that is,

• How to determine the salient context items and their configurations by optimizing the relevance function $T$?

• How to incorporate the salient context into the visual reasoning process of community detection $P$?

3.2 Pipeline

Our approach consists of two stages that address the above problems sequentially (see Fig.3). The challenge for the first problem lies in that a multifaceted context causes mixed influences on the community and its structure, and even human users do not have sufficient knowledge about their inter-connections. Accordingly, we take a trial-and-error process to generate a list of filtered context items $\{cv_i\}$ by examining different $F$. Each filtered context yields a subset of $G$: $G^* = (V, E^*, CV^*)$, where $E^*$ is determined by $CV^*$. The similarities among different $G^*$ imply the proximities of various context configurations. Then an optimal $T$ can be found by visually exploring the characteristics of the collection of different $G^*$. Subsection 4.1 and Subsection 4.2 describe the computation of the proximities and the determination of salient context based on the variation of the proximities, respectively.

Fig.3. Conceptual overview of our approach.

On the other hand, the conventional community detection function $P$ does not use the context information. In the context relevance model, the found context relevance function $T$ is used as the initial condition of the community detection. Subsequently, the knowledge about the context is progressively discovered by further modulating $F$ as well as through the user’s reasoning on the community structures. Both of them are incorporated into the community detection function $P$ within an iterative visual reasoning process. The visual reasoning scheme and the user interaction are presented in Section 5.

3.3 Visual Interface

We design a visual interface to accommodate the visualization of the underlying network, visual analysis of salient context, as well as the visual reasoning of the community structures. As seen in Fig.4, three main panels are coordinated with the brushing and linking operations. In particular, the middle panel depicts the nodes, edges, and explored community structures with the Bubble Sets technique[36]. The user can freely select, highlight and move a node, or specify constraints for a given set of nodes. The right panel visualizes the computed community proximities with a matrix view and a curve view. The bottom panel shows an enhanced parallel coordinates view that links the network with individual context items. The detailed visual design for each feature is presented in the following sections.

4 Discovering Salient Context

In our approach, the relevance of a context is defined as the combination of two parts: the saliency of each context item and the significant configurations for each context item. Discovering salient context amounts to finding the most salient context items and their most significant configurations.

4.1 Context Proximity

Initially, the significance of each context item is set as identical, and its context configuration is the full set of all instances. Directly comparing context items whose semantics are distinctive is intractable. Instead, we compute the proximities among their influences on the community structures.

For each context item, a sequence of subsets can be automatically obtained by sampling the range of its context configuration if the configuration is numerical, or selecting one of its categories if it is categorical. The number of sampled subsets $K$ is an adjustable parameter that controls the sampling granularity.

By choosing a subset of the context configuration of a context item, and keeping other context items unchanged, the input context is modified. Thus, we can get $K$ modified contexts $\{CV_k\}$ with respect to each context item. Each context $CV_k$ yields a new graph $G_k$ by filtering the edges $E$ of the input graph $G$ with $CV_k$.
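The sampling and filtering step can be sketched as follows. This is a minimal illustration rather than the authors’ implementation: the `Edge` record, the `hour` item name, the equal-width intervals, and the helper names `sample_intervals` and `filter_edges` are all assumptions made for the example.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    """One contact record: an edge plus its context vector (hypothetical layout)."""
    u: int
    v: int
    context: dict  # e.g. {"hour": 13.5, "duration": 120}


def sample_intervals(lo, hi, k):
    """Split the numerical configuration [lo, hi) into k equal-width intervals."""
    step = (hi - lo) / k
    return [(lo + i * step, lo + (i + 1) * step) for i in range(k)]


def filter_edges(edges, item, interval):
    """Keep edges whose value for `item` lies in the half-open interval.

    Only the chosen context item is constrained; all others are left unchanged.
    """
    a, b = interval
    return [e for e in edges if a <= e.context[item] < b]


edges = [
    Edge(0, 1, {"hour": 1.0}),
    Edge(0, 1, {"hour": 7.5}),
    Edge(1, 2, {"hour": 13.0}),
    Edge(2, 3, {"hour": 23.0}),
]
# K = 4 sampled configurations of the hour-of-day item, one edge list per G_k.
graphs = [filter_edges(edges, "hour", iv) for iv in sample_intervals(0, 24, 4)]
```

A categorical item would be handled analogously by keeping the edges whose value equals the selected category.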

Further, a graph partition can be obtained by applying the conventional community detection algorithm[9] $P$ to each graph $G_k$. The result, $L_k$, can be represented with a community matrix $C_k$ as defined in Subsection 3.1.

We define the difference matrix $\mathrm{Diff}(C_i, C_j)$ between $C_i$ and $C_j$ as the element-wise exclusive disjunction (XOR) of the two matrices:

$$\mathrm{Diff}(C_i, C_j) = C_{ij} \doteq C_i \oplus C_j, \tag{1}$$

where a cell $c(a, b) \in C_i \oplus C_j$ is 0 if the two results agree on whether nodes $a$ and $b$ belong to the same community, and is 1 otherwise. The more 0’s in $C_{ij}$, the higher the similarity between $C_i$ and $C_j$.

Fig.4. Visual interface consisting of three main panels. The Cambridge Bluetooth sighting dataset[37] is demonstrated.

The density of $C_{ij}$, $\rho(C_{ij})$, is regarded as an indication of the proximity between two community detection results $L_i$ and $L_j$:

$$\rho(C_{ij}) = \frac{\mathrm{CountNonZero}(C_{ij})}{N^2}, \tag{2}$$

where $N$ is the number of nodes of $G$, and $\rho(C_{ij}) \in [0, 1]$ is the count of nonzero elements divided by the total number of elements of $C_{ij}$. If $\rho(C_{ij})$ is 0, $C_i$ and $C_j$ perfectly match, i.e., the two corresponding context configurations are semantically equal. On the contrary, a relatively large value of $\rho(C_{ij})$ suggests that the two configurations are distinctive in terms of their associated community structures.

Finally, we assemble $\{\rho(C_{ij})\}$ into a $K \times K$ matrix $R$, called the context proximity matrix. Note that a context proximity matrix can be generated for each context item. The set of context proximity matrices actually characterizes the influences of a set of context items on the variation of the detected communities, and can be used to analyze the context relevance. Fig.5 illustrates the process of creating $R$.
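The computation of (1), (2) and the matrix $R$ can be sketched with NumPy as below. The function names are hypothetical, the label vectors stand in for the outputs of the detection algorithm, and the diagonal of each community matrix is assumed to be 1; this is an illustrative sketch, not the paper’s code.

```python
import numpy as np


def community_matrix(labels):
    """C[a, b] = 1 if nodes a and b share a label, else 0 (diagonal is 1)."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(np.uint8)


def density(c_i, c_j):
    """rho(C_ij): fraction of cells where two community matrices differ."""
    diff = np.logical_xor(c_i, c_j)            # Eq. (1): element-wise XOR
    return np.count_nonzero(diff) / diff.size  # Eq. (2): nonzeros / N^2


def proximity_matrix(label_sets):
    """Assemble the K x K context proximity matrix R from K labelings."""
    mats = [community_matrix(labels) for labels in label_sets]
    k = len(mats)
    r = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            r[i, j] = r[j, i] = density(mats[i], mats[j])
    return r


# Three hypothetical detection results over the same four nodes.
R = proximity_matrix([[0, 0, 1, 1], [0, 0, 0, 1], [0, 0, 1, 1]])
```

Because the comparison is made on co-membership rather than on raw label values, two results that use different label numbers for the same partition still yield a density of 0.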

Fig.5. Computing the context proximity. Different context configurations resulting from configuration sampling create a set of graphs $G_1, G_2, \ldots, G_k$. $C_i$ denotes the community matrix associated with $G_i$. The difference between $C_i$ and $C_j$ is encoded with $\rho(C_{ij})$. The collection of $\{\rho(C_{ij})\}$ constructs a context proximity matrix $R$.

4.2 Visual Analysis of Context Relevance

The context proximity matrices facilitate the study of the following questions: 1) which context items are the most relevant ones? 2) how can the context configurations be set to improve the community analysis? The discovery process should be steered by the users because the semantics of a context may be very fuzzy and complicated.

4.2.1 Visual Design

Three visual designs are employed to illustrate thecontext proximity.

A context proximity matrix is visualized as a context proximity map, of which each element is encoded with a patch in blue. The luminance of each patch is transformed from the proximity: Luminance = $(1 - \rho(C_{ij})) \times 255$. In this way, large values of $\rho(C_{ij})$ (high influences) are mapped to dark colors.
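The luminance mapping can be expressed in a few lines. The sketch below assumes 8-bit luminance values and rounding to the nearest integer, neither of which is specified in the text; `luminance_map` is a hypothetical helper name.

```python
import numpy as np


def luminance_map(r):
    """Map proximities in [0, 1] to luminance: (1 - rho) * 255, rounded to 8 bits."""
    r = np.asarray(r, dtype=float)
    return np.rint((1.0 - r) * 255).astype(np.uint8)


# rho = 0 gives the brightest patch (255); rho = 1 gives the darkest (0).
patches = luminance_map([[0.0, 1.0], [0.2, 0.0]])
```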

Fig.6 visualizes the context proximity matrices of four context items of the Cambridge Bluetooth sighting dataset[37]. The visualizations are generated based on 1) meeting-duration, 2) hour-of-day, 3) time-line and 4) day-of-week context items, respectively. The visualization of the hour-of-day item reveals that the community structure alters dramatically with the change of hour-of-day. This provides a visual cue that the meeting duration is not a significant context in this dataset.

Fig.6. Visualizations of context proximity matrices of the Cambridge dataset. (a) Meeting-duration. (b) Hour-of-day. (c) Time-line. (d) Day-of-week.

When all the context items are ordinal or numerical, we can sequentially plot the context proximities $\rho(C_{i,i+1})$ of two adjacent community matrices $C_i$ and $C_{i+1}$, yielding a context variation curve (see Fig.7(a)). If some point $i$ on the context variation curve takes a relatively high value, it indicates that the community structures change significantly when the corresponding context configuration changes from $i$ to $i + 1$. This is a visual hint for the modulation of the context filtering function $F$. The view of the context variation curve can be coordinated with other views for synchronized context adjustment.

Fig.7. (a) Context variation curve and (b) its circular formulation, generated based on the hour-of-day context item from the Cambridge dataset. In this example, the community structures changed abruptly at 1 pm.

In some cases, an ordinal or numerical context item forms a periodical loop. For example, the first hour is adjacent to the last hour in the hour-of-day context item. Accordingly, we plot the context variation curve by using a circular coordinate, and complete the loop by linking $\rho(C_{k-1,k})$ and $\rho(C_{1,2})$. Fig.7(b) shows an example.
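Reading the context variation curve off a proximity matrix can be sketched as follows. The helper names are hypothetical, indices are 0-based, and the optional wrap-around value between the last and the first configuration is an assumption added for periodic items; the text itself only describes joining the two ends of the plotted curve.

```python
import numpy as np


def variation_curve(r, circular=False):
    """Return rho(C_{i,i+1}) for adjacent configurations of one context item.

    With circular=True, the proximity between the last and the first
    configuration is appended (an assumed way of closing the loop).
    """
    r = np.asarray(r, dtype=float)
    k = r.shape[0]
    curve = [r[i, i + 1] for i in range(k - 1)]
    if circular:
        curve.append(r[k - 1, 0])
    return np.array(curve)


def sharpest_change(curve):
    """Index i of the largest change, i.e. between configurations i and i + 1."""
    return int(np.argmax(curve))


# Hypothetical 4 x 4 context proximity matrix for one context item.
R = [
    [0.0, 0.1, 0.5, 0.5],
    [0.1, 0.0, 0.6, 0.6],
    [0.5, 0.6, 0.0, 0.2],
    [0.5, 0.6, 0.2, 0.0],
]
curve = variation_curve(R)
loop = variation_curve(R, circular=True)
```

Here the highest point lies between the second and third configurations, which is where a user would consider placing a boundary for the filtering function.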

In summary, the context proximity map is suitable for all data types of context items. When the context items are numerical or ordinal, the context variation curve can be additionally employed.

4.2.2 Visual Analysis of Salient Context

The visual designs of the context proximity map and the context variation curve allow for interactive investigation of the context relevance in the following aspects:

Significance of Context Items. The global darkness of the context proximity matrix visually encodes the significance of the corresponding context item.

Salient Pattern. Besides the significance of the corresponding context item, the context proximity matrix may also contain some interesting patterns that reveal how the community structures change with the variation of a given context. For instance, the matrix in Fig.6(b) (the context item of hour-of-day) depicts a light blue block on the main diagonal, which corresponds to the time interval (6 am∼12 pm). This visualization suggests that the calculated community structures are very similar within this time interval.

Significant Context Configuration. For an ordinal or numerical context item, detecting salient context configurations amounts to finding an interval where the community structures change dramatically. For example, for a “family-and-friends” mobile social network captured by the Bluetooth sightings, we may reason that the sightings recorded during the night better match the true friend communities. However, we do not know the exact time point that marks the start of “night” in the dataset. We rely on the context variation curve to locate the important point or interval. This can be fulfilled by finding relatively high points or intervals in the curve.

5 Multi-Context Community Detection

In addition to the discovered context relevance, the knowledge and experience of the users are important sources for understanding the community structures. As such, multi-context community detection is an iterative process that progressively discloses and integrates the a priori knowledge[38] towards the refinement of the community structures. Meanwhile, the iterative reasoning process should be accomplished within a visual interface.

Often, the understanding of a specific context from the users is single-faceted and fragmented. Effective mining of a priori knowledge can be regarded as the reconstruction of the community structures from the understandings of their basic primitives. Thus, we define three types of a priori knowledge in the context of community structures.

• Must-link specifies the constraint that a set of nodes must have the same label. It is a transitive constraint.

• Cannot-link specifies the constraint that a set of nodes must have different labels. It is a non-transitive constraint.

• Closure-link specifies the transitive closure constraint based on the must-link and the cannot-link. If $v_i$ and $v_j$ have a must-link constraint, and $v_j$ cannot link to $v_k$, then $v_i$ cannot link to $v_k$. This constraint is used to determine the relationship of nodes without explicit knowledge.
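One possible way to derive the closure from must-link and cannot-link pairs is a union-find structure, sketched below. The paper does not state how the closure is computed, so the data structure and the names `close_constraints` and `cannot_link` are assumptions made for illustration.

```python
def find(parent, x):
    """Return the representative of x, compressing the path along the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def close_constraints(n, must, cannot):
    """Group nodes by transitive must-links and lift cannot-links to groups."""
    parent = list(range(n))
    for a, b in must:
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[rb] = ra
    group = [find(parent, v) for v in range(n)]
    blocked = set()
    for a, b in cannot:
        if group[a] == group[b]:
            raise ValueError(f"conflict: {a} and {b} are must-linked")
        blocked.add(frozenset((group[a], group[b])))
    return group, blocked


def cannot_link(group, blocked, a, b):
    """True if a and b are separated explicitly or through the closure."""
    return frozenset((group[a], group[b])) in blocked


# v0 must link to v1, and v1 cannot link to v2, so v0 cannot link to v2.
group, blocked = close_constraints(4, must=[(0, 1)], cannot=[(1, 2)])
```

Nodes that appear in no constraint, such as node 3 in this example, remain unconstrained and are left to the detection algorithm.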

5.1 Iterative Visual Reasoning with A Priori Knowledge

The visual reasoning for multi-context community detection takes the underlying graph and the salient context detected in the previous stage as the input. Its key task is the progressive mining of a priori knowledge and its use in the discovery of community structures. In particular, two sorts of context discovery methods are employed: the detected salient context with respect to the context relevance model, and the context configurations interactively specified by the users in the iterative process. They need to be transformed into a priori knowledge within the visual reasoning process.


We design a recursive loop (see Fig.8) that empowers the detection of community structures with a priori knowledge mined from the context. The loop consists of three steps.

Interactive Context Filtering. At the beginning of the loop (Fig.8(a)), the users interactively perform the context filtering function $F$ guided by the context relevance model computed in the previous stage. On the one hand, the significance of each context item can be used to determine the most relevant context items. On the other hand, the salient patterns and significant context configurations can assist the users in choosing the most salient parts of individual context items. After filtering enough context information that represents meaningful semantics, the user can stop the filtering process and move to the next step. By leveraging the multi-context community detection described below, the context relevance is converted to a priori knowledge.

Multi-Context Community Detection. Thereafter, an improved multi-context community detection is performed (Fig.8(b)), yielding a graph partition that reflects the user’s reasoning on the context. The algorithm is modified from the algorithm proposed in [9]. We choose it because it achieves a good balance between computational complexity and accuracy. The algorithm initializes each node as an individual community, and recursively merges them from bottom to top. The standard merging criterion is a metric called modularity[14]. It is a scalar value between −1 and 1 that measures the density of intra-community links as opposed to inter-community connections:

$$Q = \frac{1}{2|E|} \sum_{i,j} \left( a_{ij} - \frac{k_i k_j}{2|E|} \right) c_{ij}, \tag{3}$$

where $a_{ij}$ is the $(i, j)$ cell of matrix $A$ and $|E|$ is the total number of edges in the input graph. $k_i = \sum_j a_{ij}$ is the summation of the weights of all edges connected to node $v_i$. $c_{ij}$ denotes the $(i, j)$ cell of the community matrix, as defined in Subsection 3.1. The metric “modularity” is widely accepted as a standard of community quality and serves as an optimization objective function[12].
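Equation (3) can be evaluated directly from an adjacency matrix and a label vector. The sketch below is illustrative only: it assumes a symmetric adjacency matrix (so that the sum of all entries equals $2|E|$) and a community matrix whose diagonal is 1, and the test graph of two triangles joined by one edge is made up for the example.

```python
import numpy as np


def modularity(adjacency, labels):
    """Evaluate Q of Eq. (3) for a symmetric adjacency matrix and node labels."""
    a = np.asarray(adjacency, dtype=float)
    labels = np.asarray(labels)
    k = a.sum(axis=1)   # k_i: total weight of edges incident to node i
    two_m = a.sum()     # equals 2|E| for a symmetric 0/1 matrix
    c = (labels[:, None] == labels[None, :]).astype(float)  # community matrix
    return float(((a - np.outer(k, k) / two_m) * c).sum() / two_m)


# Two triangles (0-1-2 and 3-4-5) connected by the single edge 2-3.
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0

q_split = modularity(A, [0, 0, 0, 1, 1, 1])   # one community per triangle
q_single = modularity(A, [0, 0, 0, 0, 0, 0])  # everything in one community
```

For this graph the two-triangle partition gives $Q = 5/14$, while placing all nodes in one community gives $Q = 0$, which matches the intuition that the split is the better partition.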

We modify the algorithm by imposing a priori knowledge in two aspects. First, the nodes covered by a must-link constraint are forced to be in the same community. Second, a cannot-link constraint is imposed to modify (3): if there is any cannot-link relation between a pair of nodes belonging to communities $i$ and $j$ respectively, the value of the modularity gain is overruled to −1, which prevents them from being merged into the same community.
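The cannot-link override can be sketched as a thin wrapper around whatever modularity gain the merging step computes. The function name, the set-of-`frozenset` representation of cannot-links, and the example values are assumptions for illustration; the gain computation itself is not shown.

```python
def constrained_gain(raw_gain, members_i, members_j, cannot_links):
    """Return the gain of merging two communities, overruled to -1.0
    when any cannot-link pair spans them."""
    for a in members_i:
        for b in members_j:
            if frozenset((a, b)) in cannot_links:
                return -1.0
    return raw_gain


# Hypothetical user constraint: nodes 2 and 5 must not share a community.
cannot = {frozenset((2, 5))}
blocked = constrained_gain(0.12, {1, 2}, {5, 6}, cannot)
allowed = constrained_gain(0.12, {1, 2}, {6, 7}, cannot)
```

Since a merge is only taken when it offers the best gain, a value of −1 means such a merge is never preferred over leaving the two communities apart.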

Interactive User Adjustment. After the community detection, the users can directly specify must-link, cannot-link or closure-link constraints in the detected community structures (Fig.8(c)). The modifications to the graph are kept and used as the input at the beginning of the next iteration (Fig.8(c)). The loop naturally eliminates possible conflicts of setting context configurations, and recursively achieves the optimal context configurations with the mining of a priori knowledge.

5.2 Visual Design

A suite of visualizations is designed to enable the understanding of the complicated community structures as well as the saliency of the context.

We use the node-link representation to visualize the underlying graph. Each node is colored with respect to its label. The a priori knowledge is visualized on the graph. The Bubble Sets technique[36] is adapted to visually encode the must-link and cannot-link constraints, as shown in Fig.9.

We propose an enhanced parallel coordinates plots (PCPs) representation to visually encode the context, the network, and their relevance. The multifaceted context essentially forms a multi-dimensional space, where each context item denotes a dimension of the parallel coordinates. The context vector $cv_i$ associated with each edge $e_i$ is a point in the multidimensional space, or in other words, a plotted polyline in the PCPs. Taking the MIT Reality Mining dataset[8] as an example, each polyline encodes a Bluetooth sighting record, which spans the axes of the source node, the target node, date, time, day of week and the recorded sighting length.

Fig.8. Pipeline of iterative visual reasoning for community detection.

Fig.9. Visualizing the must-link constraint (in green) and the cannot-link constraint (in red) with the Bubble Sets technique.

To encode the relevance between the context and the network, a node-link graph representation is embedded in the PCPs view (Fig.10). It replaces the two axes corresponding to the source and target nodes, and shows the entire network, including the nodes, the edges, and the node labels encoded in color. The advantage of introducing the graph view as an additional coordinate is twofold. On the one hand, the users do not need to mentally link a node from the axes of the context items to the nodes or edges, which reduces the memory workload. On the other hand, the efficiency of visual pattern discovery is greatly enhanced by a suite of interactions such as selection, brushing, and linking.

Fig.10. Visualizing the relevance of the context and the network with an enhanced parallel coordinates representation.

5.3 User Interactions

The visual interface (see Fig.4) incorporates a suite of visualizations for the context, the network, and the context relevance, and provides a comprehensive environment for the visual reasoning process. The interface also supports a set of interactions.

The node-link view is located in the main panel of the interface. In addition to standard graph exploration operations like selection, labeling, dragging, and zooming, the users can re-layout the graph with respect to community structures, or re-color the partitions.

The enhanced PCPs view allows for the following interactions:

• Filtering. Each context item can be individually selected or filtered by specifying the interval on the corresponding axis.

• Selection. Interesting nodes, edges, or context vectors can be brushed and highlighted. In this way, the quality of community structures can be studied by validating a detected community and highlighting the intra-community connections on the PCPs view. The highlighted curves may show visible distribution patterns on some axes, which visually indicate the community quality.

• Switching Axes. The order of axes can be manually changed by dragging and placing the axes. Significant context items can be emphasized by reordering the axes of the context items.

• Coordinated View. At runtime, all operations on the enhanced PCPs view, the node-link view, and the context relevance view are kept consistent; namely, an operation on one view is immediately applied in the other related views.

A particular interaction mode is designed to find the multilateral nodes. A multilateral node has a number of connections to nodes in more than two communities. Determining the labels of such boundary nodes by employing certain metrics may lead to inaccurate results. Moreover, the outcome is sensitive to small changes in the community structures.

User interaction supports two tasks: finding the multilateral nodes and manually specifying their labels according to the semantics of the related context. First, the locations of the multilateral nodes are disclosed in the node-link view because these nodes typically lie on the boundary between communities. In any case, the users can always highlight the links of a node (see Fig.11) to study whether it connects multiple communities. Thereafter, the user can select a multilateral node on the enhanced PCPs view to highlight all linked edges, and easily discover the distribution of the associated context vectors to reason about its real label.
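As a rough automatic pre-filter for the first task, candidate multilateral nodes can be listed by counting the distinct communities among each node's neighbors. The sketch below is only an illustration under our own assumptions: the function name `multilateral_candidates` is hypothetical, "more than two communities" is read as at least three distinct neighbor communities (the default of `min_communities`), and the final label is still left to the user's reasoning on the PCPs view.

```python
from collections import defaultdict


def multilateral_candidates(edges, labels, min_communities=3):
    """Return nodes whose neighbors span at least `min_communities` communities.

    edges  -- iterable of undirected (u, v) pairs
    labels -- dict mapping each node to its current community label
    """
    neighbor_comms = defaultdict(set)
    for u, v in edges:
        neighbor_comms[u].add(labels[v])
        neighbor_comms[v].add(labels[u])
    return sorted(
        node for node, comms in neighbor_comms.items()
        if len(comms) >= min_communities
    )


# Hypothetical toy graph: node "a" touches communities 0, 1 and 2.
toy_edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
toy_labels = {"a": 0, "b": 0, "c": 1, "d": 2}
strict = multilateral_candidates(toy_edges, toy_labels)
loose = multilateral_candidates(toy_edges, toy_labels, min_communities=2)
```

Lowering `min_communities` widens the candidate set, which may be useful when only two communities exist, as in the case study that follows.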

6 Case Study

We implement our system based on Gephi[31], an open source graph visualization framework. Throughout the paper, we have been using the Cambridge dataset[37] as an illustrative example (Figs. 4, 6 and 7). It contains 10 thousand Bluetooth sightings between 36 human participants, collected within a period of 12 days. However, this context-rich dataset does not come with any ground truth. On the other hand, in the community detection research field, there are a few computer-generated graphs with purposely embedded community structures as ground truth, which are widely used as benchmarks for testing. Unfortunately, these graphs do not have any context information. In this section, we use the MIT Reality Mining dataset[8], which has both rich context and ground truth, for testing and validation.

Fig.11. Illustration of the identification of multilateral nodes.

In the MIT Reality Mining study[8], 97 Bluetooth-enabled smart phones were deployed to students and staff members at MIT over a period of 9 months. These phones logged over 100 thousand contacts with each other, each of which contains two time stamps marking the start and end time of the sighting. We extract 4 context items from the raw data: time-line, day-of-week, hour-of-day and sighting-duration. They are of ordinal types. The ground truth of the experiment is that 27 participants identified themselves as students from the Business School, and 60 are identified as students or staff from the Media Lab of the Computer Science Department. The other 10 participants did not identify themselves. We discard the data related to these 10 participants.
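The extraction of the four context items from the two time stamps of a sighting can be sketched as below. The field names, the unit choices (day index for time-line, minutes for sighting-duration), and the use of the start time stamp for day-of-week and hour-of-day are our assumptions for illustration; the record shown is hypothetical and not taken from the dataset.

```python
from datetime import datetime


def context_vector(start, end, origin):
    """Map one sighting (start, end) to four ordinal context items."""
    return {
        # Day index since the (assumed) first day of the study.
        "time_line": (start.date() - origin.date()).days,
        # 0 = Monday ... 6 = Sunday, taken from the start time stamp.
        "day_of_week": start.weekday(),
        # Hour of the start time stamp, 0..23.
        "hour_of_day": start.hour,
        # Length of the sighting in minutes.
        "sighting_duration": (end - start).total_seconds() / 60.0,
    }


# Hypothetical record: a 12-minute sighting on a Friday afternoon.
origin = datetime(2004, 9, 1)
cv = context_vector(
    datetime(2004, 9, 3, 14, 30),
    datetime(2004, 9, 3, 14, 42),
    origin,
)
```

Each resulting dictionary corresponds to one context vector, that is, one polyline in the enhanced PCPs view.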

Fig.12(a) shows the result of a traditional community detection algorithm[9] (other methods such as [4] yield similar results). Of the two detected communities, nodes 29, 30, 39, 43, 57, 59 and 86 are wrongly classified, resulting in a correct rate of 92.0% (80 of the 87 labeled nodes).

First of all, we create the context relevance model and generate the context proximity map (see Figs. 12(e)∼12(h)). The following patterns are revealed:

• Three context items have equal significance.

• The community structures corresponding to two intervals (2 am∼7 am and 10 am∼5 pm) of the hour-of-day context item are similar.

• The day-of-week context item contains two different community structures with respect to weekdays and weekends.

• For the meeting-duration context item, the community structure corresponding to short meeting durations is noticeably different from the ones of longer durations.

Fig.12. Results for the MIT Reality Mining dataset. (a) Input graph and two communities detected by a popular approach[9]. The red dashed line indicates the real community structures (2 communities) according to the ground truth. (b) Filtering the context of the input graph with the filtering function derived from (e)∼(h) yields a graph with fewer edges. (c) Applying the multi-context community detection to the filtered graph (b) yields 4 communities. (d) Final result after an iterative visual reasoning process. (e)∼(h) Context relevance model that is used to uncover significant context. (i) A multilateral node’s relations with different communities are highlighted on the enhanced parallel coordinates plots view.

This analysis facilitates the following reasoning about the individual context configurations. Concerning the hour-of-day context item, the time interval of 10 am∼5 pm is the period when the participants interact with their colleagues and friends. This is not true for the interval of 2 am∼7 am. For the day-of-week context item, it is reasonable to assume that the students and staff meet with their communities more frequently on weekdays than on weekends. Finally, the observation on the meeting-duration context item indicates that we can remove records with very short meeting durations because they normally indicate random encounters on campus.

The reasoning based on the context proximity map results in three context filtering functions: hour-of-day is set to 10 am∼5 pm; day-of-week is set to Mon∼Fri; meeting-duration is set to longer than 5 minutes. By simultaneously applying these context filters, a graph with fewer edges is formed (see the edges in Fig.12(b)). By further applying the improved community detection algorithm guided by the filtered context, the graph is partitioned into four communities, as colored in Fig.12(c). Note that the red node 67 is disconnected from other nodes, indicating that this participant was not seen by other persons during the normal school hours for more than 5 minutes in the entire period.
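The three filtering functions can be combined into a single predicate over context vectors, as in the following minimal sketch. The dictionary keys, the function names, and the boundary handling (start hour in 10..16, Monday encoded as 0, strictly more than 5 minutes) are assumptions made for this illustration; the records are hypothetical.

```python
def passes_context_filter(cv, hours=(10, 17), weekdays=range(0, 5),
                          min_minutes=5.0):
    """True if a context vector satisfies all three filters at once."""
    return (
        hours[0] <= cv["hour_of_day"] < hours[1]      # 10 am to 5 pm
        and cv["day_of_week"] in weekdays             # Mon to Fri (0..4)
        and cv["sighting_duration"] > min_minutes     # longer than 5 minutes
    )


def filter_edges(records):
    """Keep only the sighting records whose context passes the filter."""
    return [r for r in records if passes_context_filter(r["context"])]


# Hypothetical sighting records (source, target, context vector).
records = [
    {"source": 1, "target": 2,
     "context": {"hour_of_day": 14, "day_of_week": 4, "sighting_duration": 12.0}},
    {"source": 1, "target": 3,
     "context": {"hour_of_day": 3, "day_of_week": 2, "sighting_duration": 30.0}},
    {"source": 2, "target": 3,
     "context": {"hour_of_day": 11, "day_of_week": 6, "sighting_duration": 20.0}},
    {"source": 3, "target": 4,
     "context": {"hour_of_day": 11, "day_of_week": 1, "sighting_duration": 5.0}},
]
kept = filter_edges(records)
```

A node whose every record fails the predicate loses all of its edges, which is how an isolated node such as node 67 can arise in the filtered graph.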

If we reformulate the four detected communities as four must-link constraints and pass this knowledge on to the next round, an iterative process finally classifies the graph into two communities, as illustrated in Fig.12(d). In the result, only node 67 is wrongly classified, yielding a correct rate of 98.9%.
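The two reported correct rates follow directly from the ground truth of 27 + 60 = 87 labeled participants; the short check below reproduces them (the function name `correct_rate` is ours).

```python
def correct_rate(n_nodes, n_wrong):
    """Percentage of nodes assigned to their ground-truth community."""
    return 100.0 * (n_nodes - n_wrong) / n_nodes


labeled = 27 + 60  # Business School + Media Lab participants

baseline = round(correct_rate(labeled, 7), 1)  # 7 nodes wrong in Fig.12(a)
final = round(correct_rate(labeled, 1), 1)     # only node 67 wrong in Fig.12(d)
print(baseline)  # 92.0
print(final)     # 98.9
```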

We further investigate the outlier node 67. It was not seen by others during the school hours for more than 5 minutes. Fig.13 highlights its relations with other nodes with colored lines. The lines are displayed in the same color as the other nodes that saw or were seen by node 67. From the enhanced parallel coordinates visualization, we can see that node 67 was seen on a few close-by dates. We may also find that this node met with some computer science participants on Monday and Tuesday nights, and some Business School students on a Friday afternoon.

Fig.13. Highlighting the connections of node 67 in the enhanced PCPs view.

7 Conclusions

In this paper, we have presented a visual analysis based framework for community detection with respect to context information. We formulated the problem of context-aware community detection and proposed several visual analysis mechanisms to help users compare and understand different aspects of heterogeneous context information. We performed a few case studies on real-world mobile social networks and showed that our approach can take advantage of meaningful context information to achieve better discoveries of the community structures.

One limitation of our method is that it cannot perform combined relevant context discovery for multiple context parameters at the same time. For example, it is currently not possible to automatically discover that the Bluetooth sightings recorded on Sundays (time) at a local church (location) are of greater importance than the others. Another future research direction could be exploring a parallel multi-context community detection pipeline. We plan to extend our work in the directions mentioned above.

References

[1] Cho E, Myers S A, Leskovec J. Friendship and mobility: User movement in location-based social networks. In Proc. the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2011, pp.1082-1090.

[2] Scott J, Gass R, Crowcroft J et al. CRAWDAD metadata: Cambridge/haggle/imote/infocom2006 (v. 2009-05-29). http://tinyurl.com/6ugh4nz, August 2013.

[3] Kernighan B W, Lin S. An efficient heuristic procedure for partitioning graphs. Bell System Technical Journal, 1970, 49(2): 291-307.

[4] Girvan M, Newman M E J. Community structure in social and biological networks. Proc. the National Academy of Sciences of the United States of America, 2002, 99(12): 7821-7826.

[5] Newman M E J. Fast algorithm for detecting community structure in networks. Physical Review E, 2004, 69(6): 066133.

[6] Traag V A, Bruggeman J. Community detection in networks with positive and negative links. Physical Review E, 2009, 80(3): 036115.

[7] Tantipathananandh C, Berger-Wolf T, Kempe D. A framework for community identification in dynamic social networks. In Proc. the 13th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, August 2007, pp.717-726.

[8] Eagle N, Pentland A S, Lazer D. Inferring friendship network structure by using mobile phone data. Proc. the National Academy of Sciences of the United States of America, 2009, 106(36): 15274-15278.

[9] Blondel V D, Guillaume J L, Lambiotte R et al. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008, 2008(10): P10008.

[10] Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd edition). Springer, 2009.

[11] Donetti L, Munoz M A. Detecting network communities: A new systematic and efficient algorithm. Journal of Statistical Mechanics: Theory and Experiment, 2004, 2004: P10012.

[12] Fortunato S. Community detection in graphs. Physics Reports, 2010, 486(3/5): 75-174.

[13] Chen J C, Yuan B. Detecting functional modules in the yeast protein–protein interaction network. Bioinformatics, 2006, 22(18): 2283-2290.

[14] Newman M E J, Girvan M. Finding and evaluating community structure in networks. Physical Review E, 2004, 69(2): 026113.

[15] Mitrović M, Tadić B. Spectral and dynamical properties in classes of sparse networks with mesoscopic inhomogeneities. Physical Review E, 2009, 80(2): 026123.

[16] Garrido P L, Marro J, Munoz M A (eds.). Modeling Cooperative Behavior in the Social Sciences: Eighth Granada Lectures on Modeling Cooperative Behavior in the Social. American Institute of Physics, 2005.

[17] Simonsen I. Diffusion and networks: A powerful combination! Physica A: Statistical Mechanics and Its Applications, 2005, 357(2): 317-330.

[18] Backstrom L, Leskovec J. Supervised random walks: Predicting and recommending links in social networks. In Proc. the 4th ACM International Conference on Web Search and Data Mining, February 2011, pp.635-644.

[19] Zhou H. Distance, dissimilarity index, and network community structure. Physical Review E, 2003, 67(6): 061901.

[20] Ronhovde P, Nussinov Z. Local resolution-limit-free Potts model for community detection. Physical Review E, 2010, 81(4): 046114.

[21] Lambiotte R, Delvenne J C, Barahona M. Laplacian dynamics and multiscale modular structure in networks. ArXiv e-prints arXiv:0812.1770, http://arxiv.org/abs/0812.1770, May 2013.

[22] Palla G, Derenyi I, Farkas I, Vicsek T. Uncovering the overlapping community structure of complex networks in nature and society. Nature, 2005, 435: 814-818.

[23] Gong M G, Zhang L J, Ma J J, Jiao L C. Community detection in dynamic social networks based on multiobjective immune algorithm. Journal of Computer Science and Technology, 2012, 27(3): 455-467.

[24] Chan S Y, Hui P, Xu K. Community detection of time-varying mobile social networks. In Proc. the 1st Int. Conf. Complex Science, Feb. 2009, pp.1154-1159.

[25] Wu Z H, Lin Y F, Gregory S, Wan H Y, Tian S F. Balanced multi-label propagation for overlapping community detection in social networks. Journal of Computer Science and Technology, 2012, 27(3): 468-479.

[26] Wan H Y, Lin Y F, Wu Z H, Huang H K. Discovering typed communities in mobile social networks. Journal of Computer Science and Technology, 2012, 27(3): 480-491.

[27] Wang X, Zhou X, Wang S. Summarizing large-scale database schema using community detection. Journal of Computer Science and Technology, 2012, 27(3): 515-526.

[28] Batagelj V, Mrvar A. Pajek-program for large network analysis. Connections, 1998, 21(2): 47-57.

[29] Heer J, Boyd D. Vizster: Visualizing online social networks. In Proc. IEEE Symposium on Information Visualization, October 2005, pp.32-39.

[30] Smith M A, Shneiderman B, Milic-Frayling N et al. Analyzing (social media) networks with NodeXL. In Proc. the 4th International Conference on Communities and Technologies, June 2009, pp.255-264.

[31] Bastian M, Heymann S, Jacomy M. Gephi: An open source software for exploring and manipulating networks. In Proc. International AAAI Conference on Weblogs and Social Media, May 2009, http://www.aaai.org/ocs/index.php/ICWSM/09/paper/view/154/1009, Aug. 2013.

[32] Hu Y. Efficient, high-quality force-directed graph drawing. Mathematica Journal, 2005, 10(1): 37-71.

[33] Holten D. Hierarchical edge bundles: Visualization of adjacency relations in hierarchical data. IEEE Transactions on Visualization and Computer Graphics, 2006, 12(5): 741-748.

[34] Wong N, Carpendale S, Greenberg S. Edgelens: An interactive method for managing edge congestion in graphs. In Proc. the 9th IEEE Symp. Inform. Visualization, October 2003, pp.51-58.

[35] Martins R M, Andery G F, Heberle H et al. Multidimensional projections for visual analysis of social networks. Journal of Computer Science and Technology, 2012, 27(4): 791-810.

[36] Collins C, Penn G, Carpendale S. Bubble sets: Revealing set relations with isocontours over existing visualizations. IEEE Transactions on Visualization and Computer Graphics, 2009, 15(6): 1009-1016.

[37] Leguay J, Lindgren A, Scott J, Friedman T, Crowcroft J. Opportunistic content distribution in an urban setting. In Proc. the 2006 SIGCOMM Workshop on Challenged Networks, September 2006, pp.205-212.

[38] Wagstaff K, Cardie C. Clustering with instance-level constraints. In Proc. the 17th Int. Conf. Machine Learning, June 29-July 2, 2000, pp.1103-1110.

Yu-Xin Ma is a Ph.D. candidate in the State Key Lab of CAD&CG at Zhejiang University, Hangzhou. He earned the B.S. degree in software engineering from Zhejiang University in 2012. His research interests are visual analytics and information visualization.

Jia-Yi Xu is an undergraduate student in the College of Computer Science and Technology, Zhejiang University. His research interest is information visualization.

Di-Chao Peng is a post-doc researcher in the State Key Lab of CAD&CG at Zhejiang University. In 2009, he earned the Ph.D. degree in computer science from the University of North Carolina at Charlotte, USA, where he was a member of the Charlotte Visualization Center. He joined the Visual Analysis Group at the State Key Lab of CAD&CG in 2011. His research interests include security visualization for next generation networks and general visual analysis techniques.


Ting Zhang received the B.S. degree in software engineering from Zhejiang University in 2012. Her research interests include assistive technology, visual analytics and information visualization.

Cheng-Zhe Jin is an undergraduate student of Zhejiang University. His research interests are information visualization and social network analysis.

Hua-Min Qu received the B.S. degree in mathematics from Xi’an Jiaotong University, China, and the M.S. and Ph.D. (2004) degrees in computer science from Stony Brook University, USA. He is an associate professor in the Department of Computer Science and Engineering at the Hong Kong University of Science and Technology. His main research interests are in visualization and computer graphics. He has co-authored more than 70 refereed papers including 20 papers in the IEEE Transactions on Visualization and Computer Graphics (TVCG). He is on the steering committee of the IEEE Pacific Visualization Conferences and is an associate editor of the IEEE Transactions on Visualization and Computer Graphics. He received the Honorable Mention for the Best Paper Award at IEEE Visualization 2009 and is a winner of the 2009 IBM Faculty Award.

Wei Chen is a professor in the State Key Lab of CAD&CG at Zhejiang University. From June 2000 to June 2002, he was a joint Ph.D. student in Fraunhofer Institute for Graphics, Darmstadt, Germany, and received his Ph.D. degree in computer science in July 2002. From July 2006 to Sept. 2008, Dr. Wei Chen was a visiting scholar at Purdue University, working in PURPL. In December 2009, he was promoted to full professor at Zhejiang University. He has performed research in computer graphics and visualization and published more than 60 peer-reviewed journal and conference papers in the last five years. His current research interests include visualization, visual analytics and bio-medical image computing.

Qun-Sheng Peng is a professor in the State Key Lab of CAD&CG, Zhejiang University. He graduated from Beijing Mechanical College in 1970 and received his Ph.D. degree in computer science from School of Computing Studies, University of East Anglia, UK, in 1983. His research interests include realistic image synthesis, virtual reality, bio-molecule graphics and scientific visualization. In these fields, he has authored and coauthored more than two hundred journal and conference papers. He is a member of the editorial board of several international and Chinese journals.
