kernel Spectral Clustering for Community Detection in Complex Networks

From statwiki
Revision as of 07:34, 16 July 2013 by Lxin (talk | contribs)
Jump to navigation Jump to search

Abstract--This paper proposes a kernel spectral clustering approach for community detection in unweighted networks. The authors employ the primal-dual framework and make use of out-of-sample extension. They also propose a method to extract from a network a subgraph representative for the overall community structure. The commonly used modularity statistic is used as a model selection procedure. The effectiveness of the model is demonstrated through synthetic networks and benchmark real network data.

Backgrounds

Network

A network (graph) consists of a set of vertices or nodes and a collection of edges that connect pairs of nodes. A way to represent a network with N nodes is to use a similarity matrix S, which is an [math]\displaystyle{ N\times N }[/math] matrix. This paper only deals with unweighted networks, where the components of S is 1 or 0. The matrix S is called adjacency matrix and [math]\displaystyle{ S_{ij}=1 }[/math] if there is an edge connecting nodes i and j; otherwise, [math]\displaystyle{ S_{ij}=0 }[/math]. The degree matrix D is defined as a diagonal matrix with diagonal entries [math]\displaystyle{ d_i=\sum\limits_{j}S_{ij} }[/math] indicating the degree of node i, i.e. the number of edges connected to node i.

Community Detection

The structure of networks sometimes reveals a high degree of organization. Nodes with similar properties/behaviors are more likely to be linked together and tend to form modules. Discovering such modules is called community detection, which is a hot topic in network data analysis. Moreover, once communities have been detected, the roles of the nodes in each community can be further investigated. The community structure of a graph can be used to give a compact visualization of large networks, i.e. one community can be treated as a big node. This kind of representation is called a supernetwork.

Introduction

Community detection is indeed a clustering problem. Spectral clustering method is a standard technique for clustering, which is based on the eigendecomposition of a Laplacian matrix. A spectral clustering framework as a weighted kernel PCA with primal and dual representations is proposed by <ref>C. Alzate and J. A. K. Suykens, Multiway spectral clustering with out-of-sample extensions through weighted kernel PCA. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 2, pp. 335-347, 2010 </ref>. The main advantage is the extension of the clustering model to out-of-sample nodes. The clustering model can be trained on a small subset of the network and then be applied to the rest of the graph. In other words, it is baded on a predictive model previously learned and can be used for online clustering of huge growing networks. This is new to the field of community detection on networks. Other contributions of the paper includes the following two points. First, a criterion based on Modularity <ref>M. E. J. Newman, Modularity and community structure in networks, Proc. Natl. Acad. Sci. USA, vol. 103, no. 23, pp. 8577-8582, 2006</ref> is used for parameter and model selection. Second, regarding the selection of a small representative subgraph as training set, a method based on Expansion factor <ref>A. S. Maiya and T. Y. Berger-Wolf, Sampling community structure, in Proc. 19th ACM Intl. Conference on the World Wide Web, 2010.</ref> is proposed.

Kernel Spectral Clustering Model

General Picture

The classical spectral clustering involves the study of the eigenspectrum of graph Laplacian matrices, i.e. L=D-S. For kernel spectral clustering model, a graph over the network needs to be build up in order to describe the similarity among the nodes in the kernel-based framework.

Primal-dual formulation

The kernel spectral clustering model is described by a primal-dual formulation. Suppose we have a network with N nodes and an adjacency matrix S. Let [math]\displaystyle{ x_i }[/math] denote the i-th row/column of S.


References

<references/>