Monday, February 2, 2015

On Loopy GMS (Geometrical Modeling of Sequences)

This note is going to apply the GMS model to loop structured sequences. The adaptation from previous serial structure to loopy structure is straightforward: the only change is, as illustrated in the following diagram, that the two ends of the serial sequence are now connected to each other.

When applying the GMS model, the scanning machine ( or the scanner ) runs over the loopy sequence and produces a series of high dimensional vectors which are argumented with timestamps. Those high dimensional vectors will then be embedded into low dimensional space with the affinity embedding algorithm.

The first question arise in this new scenario is whether the GMS model will produce loop alike geometrical shapes, when the scanner runs a whole loop back to its initial location. The answer to this question is yes, but under two conditions. First, the decay speed parameter λ used to calculate the affinity between vectors must be zero, so that the effect of timestamps will be nullified. Otherwise, if λ is not zero, different vectors will be produced when the scanner comes back to its initial position.

Secondly, the total number of nodes, L, in the loop must be a multiple of the dimension of the output vector (without the timestampe component), i.e. the parameter K denoted in previous note. This requirement is necessary because of the circular shifting of the scanned vectors. If L is not a multiple of K, after a whole loop the scanner will produce the same set of  values, but circularly shifted to different order, so that they will normally be different as vectors in the high dimensional space.

The following pictures shows the resulting maps produced by GMS for a short loopy sequence for different cases discussed above.

 Figure 1: Conditions for loopy output maps from loopy sequence. The input sequence is "CCC TGT GGA GCC GGA GCC ACA AGT", K=6. (A) The decay speed λ=0.1, the resulting map is a broken loop in the 3D space;  (B) The sequence is extended with an extra node 'G', so that K is not a multiple of L. The resulting map is a broken loop.  (C) λ = 0 and L is a multiple of K, the resulting map is a loop in the 3D space.

With a loopy sequence, the scanner can in general run multiple rounds around the loop sequence. In this way, the scanner can produce multiple sets of vectors offset just by different timestamps. These vectors sets form repeating patterns when embedded in to low dimensional space. The following short video shows the resulting maps for above sequence for repeat number n = 1, 2, 5 and 10. The decay speed λ is set to 0.1. We notice that in these maps, the repeated structure gradually become flattened when the repeat number increases to high number. Consequently, the output map changes from simple repeated structure to tube alike shape. We notice that both repeated structures  and tubes are pretty frequent structures in biological systems as phenotypic traits.

We notice that running the scanner multiple loops is effectively the same as scannng multiple copies of a sequence sequentially. Thus, the loopy scanning might be considered as a manifestation (or extension ) of sequential scanning where the sequence is a the concatenation of variable number of copies of a reference sequence.

Another extension of the loopy GMS model is, as depicted in the following diagram, using two scanners to scan the loop in the opposite orientations, then concatenate their outputs and a timestampe to form final high dimensional vectors.

The loopy GMS model with dual scanners normally produces symmetrical shapes. The reason for this is that for each configuration of the scanner (i.e. their specific positions on the loop during the scanning) there is always a "dual" configuration in which the two scanners simply swapped their positions. Because of this, the whole collection of output vectors may be split into two sets which differ from each other just by the different timestamps. The following short video shows various symmetrical shapes produced by loopy GMS with dual scanners for various sequences:


This note has experimentally demonstrated that GMS model with simple extension may produce interesting macroscopical shapes, like repeating pattern, tubes and symmetrical structures. Understanding how those discrete sequences give rise to geometrical pattern might help us to investigate how genetic code determines phenotypical traits biological organism.