Tuesday, May 7, 2013

Sorting high dimensional data with RPM maps

I have recently published a dataset  that contains the daily prices of the S&P 500 component companies for a period of 5 years. The mountain view of VisuMap provides a pretty good overview of the whole dataset in 3D styles. The following short video clips shows the normalized values (so that all price history start from 1.0) of the dataset:

We notice that the curves are colored using the k-Mean algorithm, so that stocks with similar development will have common colors. However, these curves are not ordered according to their group, but according to the alphabetic order of the stock tick, so that curves of different colors are all mixed with each other.

A simple way to enhance the 3D view is just to re-order the curves, so that curves of the same group will be located together. We can do this easily with the heat-map view of VisuMap. However, this method does not reorder the groups in a meaningful way, so that closely located groups, are not necessarily similar groups.

The RPM map provides a better way to sort high dimensional data according to similarities. To do this with VisuMap, we first create 3D RPM map for the dataset with very large width (e.g. 1200 pixels) but small height and depeth (e.g. 50 pixels), so that RPM map geometrically resembles a ring. Then we sort the  data points according to the x-coordinates in the 3D RPM map ( that determines the data point's position on the ring.) The nature of the RPM algorithm will make sure that closely located data points on the RPM map will be similar to each other.

The following short video shows how to do this with VisuMap. Notice that we have already created the 3D RPM map. To reorder the data points, we opened a table for the XYZ coordinates, then sorted the table on the X-coordinate column. The mountain view has been configured to automatically re-order its content when re-ordering-event occurred.

No comments: