17.6. Spectral Clustering
Spectral clustering is a graph-based clustering algorithm [84].
We build a similarity graph $\mathcal{G}$.
Each vertex in the graph represents a data point.
Each edge in the graph represents the similarity between two data points.
$V$ denotes the list of vertices. $W$ denotes the adjacency matrix built from the pairwise similarities.
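As a concrete illustration, the graph construction can be sketched as below. The Gaussian similarity kernel and the single bandwidth parameter `sigma` are assumptions of this sketch; the text does not prescribe a particular similarity function.

```python
import numpy as np

def similarity_graph(X, sigma=1.0):
    """Fully connected similarity graph from data points X (n x d).
    Edge weights w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)).
    Returns the weighted adjacency matrix W."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)   # no self-loops
    return W
```

Other common constructions (k-nearest-neighbor or epsilon-neighborhood graphs) differ only in which entries of $W$ are kept nonzero.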
Once the graph has been built, the following steps are performed.
The degree of a vertex $v_i$ is defined as $d_i = \sum_{j=1}^n w_{ij}$. The degree matrix $D$ is defined as the diagonal matrix with the degrees $d_1, \dots, d_n$ on its diagonal. The unnormalized graph Laplacian is defined as $L = D - W$. The normalized graph Laplacian is defined as $L_{rw} = D^{-1} L = I - D^{-1} W$.
The subscript $rw$ stands for random walk. We compute $L_{rw}$ and examine its eigen-structure to estimate the number of clusters $k$ and the label vector. If $k$ is known in advance, usually the first $k$ eigenvectors of $L_{rw}$ corresponding to the smallest eigenvalues are taken, and their row vectors are clustered using the K-means algorithm [71]. Since we don't make any assumption about the number of clusters, we need to estimate it.
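The known-$k$ procedure — form $L_{rw}$, take the eigenvectors of the $k$ smallest eigenvalues, and run K-means on their rows — can be sketched as follows. Working with the symmetric matrix $D^{-1/2} L D^{-1/2}$ (which shares its spectrum with $L_{rw}$) and the deterministic farthest-point initialization for K-means are implementation choices of this sketch, not part of the text.

```python
import numpy as np

def spectral_clustering_known_k(W, k, n_iter=50):
    """Spectral clustering when the number of clusters k is known.
    W is the symmetric weighted adjacency matrix of the similarity graph."""
    d = W.sum(axis=1)                         # vertex degrees
    L = np.diag(d) - W                        # unnormalized Laplacian L = D - W
    # L_rw = D^{-1} L has the same eigenvalues as the symmetric matrix
    # D^{-1/2} L D^{-1/2}, so we solve the numerically safer symmetric problem.
    L_sym = L / np.sqrt(d)[:, None] / np.sqrt(d)[None, :]
    eigvals, eigvecs = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    U = eigvecs[:, :k] / np.sqrt(d)[:, None]  # first k eigenvectors of L_rw
    # K-means on the rows of U; farthest-point initialization keeps the
    # sketch deterministic (an implementation choice, not from the text).
    centers = [U[0]]
    for _ in range(1, k):
        dist = ((U[:, None] - np.array(centers)[None]) ** 2).sum(-1).min(1)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = ((U[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(axis=0)
    return labels
```

For large $n$, a sparse eigensolver computing only the few smallest eigenpairs would replace the dense `eigh` call.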
A simple way is to track the eigen-gap statistic.
After arranging the eigenvalues $\lambda_1 \le \lambda_2 \le \dots \le \lambda_n$ in increasing order, we can choose the number $k$ such that the eigenvalues $\lambda_1, \dots, \lambda_k$ are very small and $\lambda_{k+1}$ is large. This is guided by the theoretical result that if a graph has $k$ connected components, then exactly $k$ eigenvalues of $L_{rw}$ are 0. However, when the data points are not clearly separated and noise is introduced, this approach becomes tricky.
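A minimal sketch of the eigen-gap heuristic, assuming a symmetric adjacency matrix `W` with no isolated vertices; the cap `k_max` on how many eigenvalues are inspected is an assumption of this sketch:

```python
import numpy as np

def estimate_k_eigengap(W, k_max=10):
    """Estimate the number of clusters as the position of the largest gap
    in the sorted eigenvalues of L_rw (assumes no isolated vertices)."""
    d = W.sum(axis=1)
    L_sym = (np.diag(d) - W) / np.sqrt(d)[:, None] / np.sqrt(d)[None, :]
    eigvals = np.linalg.eigvalsh(L_sym)   # same spectrum as L_rw, ascending
    gaps = np.diff(eigvals[:k_max + 1])   # gap_i = lambda_{i+1} - lambda_i
    return int(np.argmax(gaps)) + 1       # k with the largest eigen-gap
```

On a graph with $k$ clean components, the first $k$ eigenvalues are exactly zero and the gap at position $k$ is large, so the heuristic recovers $k$.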
We can go for a more robust approach by analyzing the eigenvectors, as described in [87].
The approach of [87], which uses a slightly different definition of the graph Laplacian [62], has been adapted here to work with the Laplacian $L_{rw}$ as defined above. We estimate the number of clusters from the graph Laplacian.
It can be easily shown that $0$ is an eigenvalue of $L_{rw}$ with eigenvector $\mathbf{1}$, the all-ones vector [84]. Further, the multiplicity of the eigenvalue $0$ equals the number of connected components in $\mathcal{G}$. In fact, the adjacency matrix can be factored as $W = P^T \operatorname{diag}(W_1, \dots, W_k)\, P$, where $W_i$ is the adjacency matrix for the $i$-th connected component of $\mathcal{G}$ corresponding to the $i$-th cluster, and $P$ is an unknown permutation matrix. The graph Laplacian for each $W_i$ has an eigenvalue $0$ with eigenvector $\mathbf{1}$. Thus, if we look at the $k$-dimensional eigenspace of $L_{rw}$ corresponding to the eigenvalue $0$, then there exists a basis $V$ such that each row of $V$ is a unit (coordinate) vector in $\mathbb{R}^k$, and each column of $V$ contains ones exactly at the rows belonging to one cluster and zeros elsewhere. Actual eigenvectors $X$ obtained through any numerical method will be a rotated version of $V$, given by $X = V R$ for an unknown rotation $R$. [87] suggests a cost function over the entries of the rotated eigenvector matrix such that the cost is minimized when its rows are close to coordinate vectors. It then estimates a rotation matrix as a product of Givens rotations which can rotate $X$ to minimize the cost. The parameters of the rotation matrix are the angles of the Givens rotations, which are estimated through a gradient descent process.
Since $k$ is unknown, the algorithm is run over multiple values of $k$, and we choose the value which gives the minimum cost. Note that we reuse the rotated version of $X$ obtained for a particular value of $k$ when we go on to examine $k+1$ eigenvectors. This may appear ad hoc, but it is seen to help the gradient descent algorithm converge faster in the next iteration.
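The cost function of [87] and the rotation search can be illustrated for the special case $k = 2$, where the rotation has a single Givens angle. A grid search over that angle stands in here for the gradient descent described above; this is a simplification for illustration only.

```python
import numpy as np

def alignment_cost(Z):
    """Cost of [87]: J = sum_ij Z_ij^2 / M_i^2, where M_i is the largest
    magnitude in row i of Z.  J is minimal when every row of Z lies along
    a coordinate axis."""
    M = np.abs(Z).max(axis=1, keepdims=True)
    return float((Z ** 2 / M ** 2).sum())

def best_rotation_2d(X, n_angles=360):
    """For k = 2 the rotation is a single Givens angle; a grid search over
    the angle stands in for gradient descent (simplification)."""
    best_cost, best_R = np.inf, np.eye(2)
    for theta in np.linspace(0.0, np.pi / 2, n_angles):
        c, s = np.cos(theta), np.sin(theta)
        R = np.array([[c, -s], [s, c]])   # Givens rotation by theta
        cost = alignment_cost(X @ R)
        if cost < best_cost:
            best_cost, best_R = cost, R
    return best_cost, best_R
```

For general $k$ there are $k(k-1)/2$ Givens angles, which is why [87] resorts to gradient descent rather than an exhaustive search.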
When $n$ is small, we can do a complete SVD of $L_{rw}$ to get the eigenvectors. However, this is time consuming when $n$ is large (say 1000+). An important question is how many eigenvectors we really need to examine.
As $k$ increases, the number of Givens rotation parameters increases as $O(k^2)$ (there are $k(k-1)/2$ rotation angles). Thus, if we examine too many eigenvectors, we will lose out unnecessarily on speed.
We can use the eigen-gap statistic described above to decide how many eigenvectors to examine.
Finally, we assign labels to each data point to identify the cluster it belongs to.
As described above, we maintain the rotated version $V$ of the eigenvectors during the estimation of the rotation matrix. Once we have zeroed in on the right value of $k$, assigning labels becomes straightforward. We simply perform non-maximum suppression on the rows of $V$, i.e., we keep the largest (magnitude) entry in each row of $V$ and assign zero to the rest. The column index of the largest entry in the $i$-th row of $V$ is the label for the $i$-th data point. This completes the clustering process.
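The label-assignment step can be sketched as follows, assuming the rotated eigenvector matrix $V$ is available from the rotation-estimation stage:

```python
import numpy as np

def assign_labels(V):
    """Non-maximum suppression on the rows of the rotated eigenvector
    matrix V: keep only the largest-magnitude entry in each row and zero
    the rest; the column index of that entry is the cluster label."""
    rows = np.arange(len(V))
    labels = np.argmax(np.abs(V), axis=1)
    V_sup = np.zeros_like(V)
    V_sup[rows, labels] = V[rows, labels]
    return labels, V_sup
```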
While eigen-gap statistic based estimation of the number of clusters is quick,
it requires running an additional