Wednesday, July 1, 2009

Data Clustering with Self-Organizing Graph

We have just released VisuMap version 2.7.832. In this release we have added a very interesting clustering service called the self-organizing graph (SOG). SOG is basically an extension to the self-ogranizing map (SOM) that simulates a homogeneous artificial neural network. Unlike SOM, SOG simulates a network of arbitrary structure. The network can be in fact any weighted undirected graph. The network is a kind of parameter for the algorithm: the user defines the network depending on his/her knowledge or interest about the dataset.

The following picture illustrates an application of SOG. On the left side is a map of a dataset that shows roughly two data point clouds. The right side shows the network we used to classify the dataset. During the learning process each node of the network will become associated with some data points. After the learning process has finished each data point will be classified as their corresponding nodes in the SOG network.

We see that the SOG algorithm not only correctly classified the two data point clouds, but also captured those data points located a the peripheries of the clouds (data points shown as blue triangles).

The following is a short video tutorial for the SOG clustering service in VisuMap:

Comparing with traditional clustering algorithms, like k-Mean and hierarchical clustering algorithms, SOG provides more flexible clustering service. With SOG the user can not only specify the number of clusters, but also the size and other geometric or topological relationships between the clusters. In fact, SOG enables the user to specify a model to reflect existing knowledge and interests about the data.

There are many variants of the self-organizing map that use irregular networks. Those algorithms often employ adaptive strategies to generate the irregular networks depending on the data. Those irregular network are results of those algorithms. Contrary to those algorithms SOG uses the network as input for the algorithm, the user can express his/her interest and knowledge about the data through the network.

SOG is our first attempt to provide a new kind of clustering service. While its framework resembles to provide general pattern matching capability. The learning algorithm it employed has only limited capability to capture structural information. In view of this there are still many challenges and potentials for SOG algorithm in the future.