Skip to content

2D branching data distribution example, with and without noise

Andrei Zinovyev edited this page Nov 3, 2017 · 13 revisions

2D toy example of a uniformly sampled branching pattern

We will approximate a toy example of a branching data distribution

data = load('test_data\tree23\tree23.data'); 
plot(data(:,1),data(:,2),'ko','MarkerSize',5);

Use construction of a principal tree with default parameters and 30 nodes

[NodePositions,Edges] = computeElasticPrincipalGraph(data,30);

This produces a set of 3 figures

Let us explain how to read them.

MSE and Elastic energy plot

This plot simply shows how Mean Squared Error and the value of the penalty (elastic energy) depends on the number of nodes. Two horizontal lines - black and pink show the reference values of MSE for approximating the data by 1 and 2 principal components. For example, here we see that starting from 5-nodes approximation, the principal tree outperforms the approximation made by a line (PC1).

Complexity vs Accuracy plot

This plot shows how normalized penalty (generalization of non-linearity) depends on the Fraction of Explained Variance (FEV). Local minimas of this plot usually corresponds to important structural changes of the elastic principal graph, and allow choosing the optimal number of nodes for the graph construction. The local minima are labeled by the structural barcodes of the graph, which should be read as follows.

N_k-stars| ... | N_4-stars | N_3-stars || N_nodes

For example, '2|6||15' means a principal tree with 15 nodes, 6 stars of order 3 (3 leaves) and 2 stars of order 4 (4 leaves).

For example, in this example, first several local minima corresponds to 4 scales when the principal graphs catches the main structural features of the branching distribution ('1||6','2||8','1|1||11' and '1|2||16'):

For more information on using complexity/accuracy plots read an article Zinovyev A. and Mirkes E. Data complexity measured by principal graphs. 2013. Computers and Mathematics with Applications 65:1471-1482..

PCA view on the principal tree

Here 2D projection from the multidimensional space onto the two selected principal components of the data distribution together with the principal graph is shown. By default, node numbers are shown and the size of the node is proportional to the number of data points projected into this node. The data points are colored accordingly to the branch/branching point of the graph/tree.

Metro map layout of the principal tree

Trees are planar graphs: therefore, one can always find such a 2D layout of a tree which will not contain edge intersections. However, there is infinitely many such layouts. We suggested to present the structure of elastic principal tree by so called 'Metro map layout'. This layout is constructed in order to represent in the best fashion the harmonical nature of the embedding of a principal tree into multidimensional space. The center of each star of the tree is the mean point of the set of the star's leaves. The distances between tree nodes in 2D represent approximately the edge lengths in the multidimensional space.

For more details on metro map layout, read the article Gorban A, Sumner N, Zinovyev A. Beyond The Concept of Manifolds: Principal Trees, Metro Maps, and Elastic Cubic Complexes. 2008. Lecture Notes in Computational Science and Engeneering 58: 223-240..

Example of creating a metro map layout for a principal tree:

 % compute the number of data points projected into each node
 partition = PartitionData(data,NodePositions,100000,sum(data.^2,2));
 NodeSizes = histc(partition,[min(partition):max(partition)]);
 % draw the metro map layout, visualize the number of data points projected
 drawMetroMap(NodePositions,Edges,'NodeSizes',NodeSizes)

Adding noise to the data, applying robust elastic principal tree

Now let us add some background noise to the pattern to be approximated:

data_noise = load('test_data\tree23\tree23_noise.data'); 
plot(data_noise(:,1),data_noise(:,2),'ko','MarkerSize',5);

Application of the default principal tree algorithm fails: the result is strongly affected by the bakground noise:

[NodePositions,Edges] = computeElasticPrincipalGraph(data_noise,30);

Using the robust version of the algorithm with trimming radius = 0.4 and slightly adjusted stretching elasticity identifies the branching structure almost correctly.

[NodePositions,Edges] = computeRobustElasticPrincipalGraph(data_noise,40,0.4,'Lambda',0.02);

Of course, adding more noise to the branching pattern gradually makes detecting the branching structure more difficult. However, the algorithm is able to locate the main features of the structure even in the presence of strong noise.

 amount_of_noise = 200;
 data_noise = add_uniform_background_noise(data,amount_of_noise);
 [NodePositions,Edges] = computeRobustElasticPrincipalGraph(data_noise,30,0.35,'Lambda',0.02,'Plots',0);
 plot(data_noise(:,1),data_noise(:,2),'k.','MarkerSize',5); hold on; 
 plot(data(:,1),data(:,2),'r.','MarkerSize',5); 
 drawGraph2D(NodePositions,Edges); 
 title(sprintf('Percentage of noise points = %i%%',amount_of_noise));