kernel Spectral Clustering for Community Detection in Complex Networks
Abstract--This paper proposes a kernel spectral clustering approach for community detection in unweighted networks. The authors employ the primal-dual framework and make use of out-of-sample extension. They also propose a method to extract from a network a subgraph representative for the overall community structure. The commonly used modularity statistic is used as a model selection procedure. The effectiveness of the model is demonstrated through synthetic networks and benchmark real network data.
Backgrounds
Network
A network (graph) consists of a set of vertices or nodes and a collection of edges that connect pairs of nodes. A way to represent a network with N nodes is to use a similarity matrix S, which is an [math]\displaystyle{ N\times N }[/math] matrix. This paper only deals with unweighted networks, where the components of S is 1 or 0. The matrix S is called adjacency matrix and [math]\displaystyle{ S_{ij}=1 }[/math] if there is an edge connecting nodes i and j; otherwise, [math]\displaystyle{ S_{ij}=0 }[/math]. The degree matrix D is defined as a diagonal matrix with diagonal entries [math]\displaystyle{ d_i=\sum\limits_{j}S_{ij} }[/math] indicating the degree of node i, i.e. the number of edges connected to node i.
Community Detection
The structure of networks sometimes reveals a high degree of organization. Nodes with similar properties/behaviors are more likely to be linked together and tend to form modules. Discovering such modules is called community detection, which is a hot topic in network data analysis. Moreover, once communities have been detected, the roles of the nodes in each community can be further investigated. The community structure of a graph can be used to give a compact visualization of large networks, i.e. one community can be treated as a big node. This kind of representation is called a supernetwork.
Introduction
Community detection is indeed a clustering problem. Spectral clustering methods are standard techniques fro clustering, which is based on the eigendecomposition of a Laplacian matrix. A spectral clustering framework as a weighted kernel PCA with primal and dual representations is proposed by <ref>C. Alzate and J. A. K. Suykens, Multiway spectral clustering with out-of-sample extensions through weighted kernel PCA. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 2, pp. 335-347, 2010 </ref>. The main advantage is the extension of the clustering model to out-of-sample nodes. The clustering model can be trained on a small subset of the network and then be applied to the rest of the graph. In other words, it is baded on a predictive model previously learned and can be used for online clustering of huge growing networks. A criterion based on Modularity <ref>M. E. J. Newman, Modularity and community structure in networks, Proc. Natl. Acad. Sci. USA, vol. 103, no. 23, pp. 8577-8582, 2006</ref> is used for parameter and model selection. Expansion factor <ref>A. S. Maiya and T. Y. Berger-Wolf, Sampling community structure, in Proc. 19th ACM Intl. Conference on the World Wide Web, 2010.</ref>
References
<references/>