Visualizing High Dimensional Data: Deep Line Chart for Big Data

Line chart is a popular technique to visualize trend among sequential data. Whereas it has been very a useful tool in general, it has a often neglected problem for large data sets: most conventional software library draw curves sequentially in certain order, so that the early drawn curves will be covered by curves drawn afterwards. This means that the appearance of the line chart depends on the drawing order, that is often random in the practices. The following animated picture illustrates a case where yellow curves get covered by blue curves. This drawback would especially become more apparent for large data set that comprises many high density curves.

In this note I'll present a new line chart method that avoids the above mentioned problem. Before we describe the new drawing method, let's recall a simplified version of the conventional way to draw line chart:

Input: A set of line segments each with 2 ending points and a color.
Output: A bitmap P with a color P_xy at each coordinator (x, y).
Initialize P with the background color.
for each line segment L
Let (r, g, b) be the color for L
for each pixel coordinator (x,y) on L
Set P_xy:=(r,g,b)

The above algorithm is a single pass drawing method where each pixel color of a line segment is directly painted on the bitmap. With this method, painted lines on the bitmap may potentially be painted over by successive lines.

The new method is a two-pass algorithm: it first sum up all pixel colors in an intermediate matrix; then convert the matrix into a bitmap. It works roughly as follows:

Input: A set of line segments each with 2 ending points and a color.
Output: A bitmap P with a color P_xy at each coordinator (x, y).
Initialize a matrix M with the same dimension as P, but
each component of M holds an integer vector M_xy=(r,g,b,n).
for each line segment L
Let (r, g, b) be the color for L
for each pixel coordinator (x,y) on L
Set M_xy := M_xy + (r, g, b, 1)
for each coordinator (x,y) in M
Set P_xy:= (r/n, g/n, b/n), where (r,g,b,n) is the vector M_xy

Compared to the previous method, the key change of the new method is the use of an intermediate matrix : the matrix sums up all colors from all lines and counts how many times a pixel has been painted. In the second pass, the matrix is converted to bitmap by taking the "average" color for each pixel. It should be pointed out here that, for an actual implementation, the data type for matrix components should be all 32 bits integers, so that no data overflow occurs when summing up the colors.

The following picture shows a line chart created with the new line chart algorithm for the same data used in previous picture above:

We see in above picture that the group of yellow curves are clearly visible. In general, the new line chart will reveal groups of curve, as long as the group is statistically significant with respect to the complete data set. The conventional line chart, on the other side, only shows curves on the "surface" of the curve bundle.

The new line chart algorithm has another significant advantage over the conventional line chart: since the algorithm does not mandate particular order to update the matrix and bitmaps, it can be efficiently parallelized on modern GPU hardware and achieve very good performance. The implementation included in VisuMap version 5.0.930 is, for a system with a GTX 960 card (that has ca. 2000 GPU cores), more than two orders of magnitude faster than the conventional line chart.

Visualizing High Dimensional Data

Saturday, June 3, 2017

Deep Line Chart for Big Data

No comments:

About Me

Blog Archive

Tweet