Friday, June 4, 2021

High Fidelity t-SNE

 High Fidelity t-SNE with Gradual Diminishing Exaggeration

When we repeat t-SNE on the same dataset with the same parameters, we may get different maps with noticeable random variations.  This kind of randomness limits the accuracy of t-SNE algorithm as a visualization tool.

In this note we describe a simple yet effective method to improve the accuracy of t-SNE: Instead of switching the exaggeration rate at a fixed time during the training, we propose starting the training process with a high exaggeration, say 12.0, then gradually reduce it to 1.0.  We will demonstrate the effectively of this method with example from single cell RNA sequential study.

Post-Training Normalization

Since the cost function optimized by t-SNE is invariant under rigged transformations (i.e. rotations and shifting), t-SNE normally produced maps with random rotations and shift. These kind of randomness can be easily removed by post-training normalization. For this study, we propose the following steps to normalize t-SNE maps as illustrated in the following diagram with 4 sample t-SNE maps:


As shown in above diagram, a t-SNE map is normalized in 3 steps:
  1. Centralizing: centralizing the map so that each of the 2D or 3D coordinate has zero-mean.
  2. PCA based Alignment: Rotating the map, so that the x-axis aligns with the first eigen vector of the map; y-axis with the second eigen vector, and potentially the z-axis with the third eigen vector.
  3. Moment based Flipping: Calculate the 0.5-th moment of the map along the x-axis according the following forms: 

    If the moment is negative, flip the map along the x-axis by multiple all x components with -1.0;
    Do the same with y-axis and potentially z-axis.

After normalization we can then measure the discrepancy between two t-SNE maps by summing up the Euclidean distances between the data points in both maps. For more convenience, we can just count number of the data points whose images points have distance from each other larger than certain values, say 40 pixels.

It should be pointed out that there are cases that the suggested method might fail. For instance, when the t-SNP maps are rotational symmetrical where no clear orientation is present, this method will have difficulty find common orientations. In the practice, we often encounter this case, when the perplexity parameter of t-SNE is too small, so that the resulting t-SNE map resemble a disc or a ball. Fortunately, this kind of issue can be easily fixed by increasing the perplexity.

Finally, it is interesting to notice that there is no need to scale the maps, since t-SNE with fixed parameter settings normally produces maps with roughly the same size.

Training with Gradual Diminishing Exaggeration

In order to improve the training speed and efficiency, t-SNE algorithm initially proposed to start the training with a high exaggeration factor, train the map for certain number of epochs; then switch the exaggeration factor to 1.0 to practically terminate the exaggeration phase. The following diagram shows the cost minimized during a typical t-SNE training process, the diagram has been overplayed with the exaggeration factor.

We see that the objection cost has been reduced rapidly for a short period after the exaggeration phased has ended, but then stayed basically unchanged during the rest of the training process. This behavior suggests that real learning only takes place when exaggeration factor changes. We propose thus the following function to gradually decrease the exaggeration factor:

In above formula,  f(n) is the exaggeration factor at n-th training epoch; C is the initial exaggeration factor; N is the number of total training epochs; L is the length of exaggeration phase which is set normally to 0.9N. 

Example 

As example I picked a dataset from single cell RNA sequence (scRNA seq) study. The dataset contains the expression counts of about 22000 cells with respect to 28000 genes. The input data for the t-SNE is thus a 22000x28000 matrix. The following picture shows a heatmap and a t-SNE map of this dataset:

For the test, t-SNE has been run on the dataset 4 times with and without enabling the new exaggeration method. The following pictures shows the cost reduction during training of the four repeats:



We see that, with the new method enabled, the final t-SNE maps have consistently lower cost. We also notice that, with the new method, the final maps have almost equal costs, whereas with old method, the cost of final maps vary slightly. Those small variations are more visible in the final maps. The following two pictures show the maps created with and without the new method:


With some effort we can discern relative large variation amount the maps created with old exaggeration strategy. We can visualize those variations more clearly with animation by interpolation between those maps as be shown in the following video clip:



Knockout Test with High Fidelity t-SNE

Knockout test has been used in microbiology to probe gene functions during the development of organisms. For doing that, certain genes are disabled or knocked-out; and changes induced in the organism give indications about those knocked-out genes.  In general, knockout test help us to trace the correlation between genotypes and phenotypes.

High fidelity t-SNE enables us to probe the effect of individual features on the output t-SNE maps. If consider the t-SNE maps as phenotypes derived from genotypes (i.e. gene expressions), we can perform kind of "knockout" test with t-SNE:  We create first a t-SNE map with all features; then create a t-SNE  map without certain features. We can then expect that the knocked out features would be responsible for the variation in the t-SNE maps.

The following short video demonstrates how "knockout" test might been done to study correlation between gene- and cell-clusters:


Conclusion

With a small change in its algorithm, t-SNE achieved much high accuracy and opens therefore doors for new applications.  

No comments: