-
Notifications
You must be signed in to change notification settings - Fork 6
2D branching data distribution example, with and without noise
We will approximate a toy example of a branching data distribution
data = load('test_data\tree23\tree23.data');
plot(data(:,1),data(:,2),'ko','MarkerSize',5);
Use construction of a principal tree with default parameters and 30 nodes
[NodePositions,Edges] = computeElasticPrincipalGraph(data,30);
This produces a set of 3 figures
Let us explain how to read them.
This plot simply shows how Mean Squared Error and the value of the penalty (elastic energy) depends on the number of nodes. Two horizontal lines - black and pink show the reference values of MSE for approximating the data by 1 and 2 principal components. For example, here we see that starting from 5-nodes approximation, the principal tree outperforms the approximation made by a line (PC1).
This plot shows how normalized penalty (generalization of non-linearity) depends on the Fraction of Explained Variance (FEV). Local minimas of this plot usually corresponds to important structural changes of the elastic principal graph, and allow choosing the optimal number of nodes for the graph construction. The local minima are labeled by the structural barcodes of the graph, which should be read as follows.
N_k-stars| ... | N_4-stars | N_3-stars || N_nodes
For example, '2|6||15' means a principal tree with 15 nodes, 6 stars of order 3 (3 leaves) and 2 stars of order 4 (4 leaves).
For example, in this example, first several local minima corresponds to 4 scales when the principal graphs catches the main structural features of the branching distribution ('1||6','2||8','1|1||11' and '1|2||16'):
For more information on using complexity/accuracy plots read an article Zinovyev A. and Mirkes E. Data complexity measured by principal graphs. 2013. Computers and Mathematics with Applications 65:1471-1482..
Here 2D projection from the multidimensional space onto the two selected principal components of the data distribution together with the principal graph is shown. By default, node numbers are shown and the size of the node is proportional to the number of data points projected into this node. The data points are colored accordingly to the branch/branching point of the graph/tree.
Trees are planar graphs: therefore, one can always find such a 2D layout of a tree which will not contain edge intersections. However, there is infinitely many such layouts. We suggested to present the structure of elastic principal tree by so called 'Metro map layout'. This layout is constructed in order to represent in the best fashion the harmonical nature of the embedding of a principal tree into multidimensional space. The center of each star of the tree is the mean point of the set of the star's leaves. The distances between tree nodes in 2D represent approximately the edge lengths in the multidimensional space.
For more details on metro map layout, read the article Gorban A, Sumner N, Zinovyev A. Beyond The Concept of Manifolds: Principal Trees, Metro Maps, and Elastic Cubic Complexes. 2008. Lecture Notes in Computational Science and Engeneering 58: 223-240..
Example of creating a metro map layout for a principal tree:
% compute the number of data points projected into each node
partition = PartitionData(data,NodePositions,100000,sum(data.^2,2));
NodeSizes = histc(partition,[min(partition):max(partition)]);
% draw the metro map layout, visualize the number of data points projected
drawMetroMap(NodePositions,Edges,'NodeSizes',NodeSizes)
Adding noise to the data, applying robust elastic principal tree
Now let us add some background noise to the pattern to be approximated:
data_noise = load('test_data\tree23\tree23_noise.data');
plot(data_noise(:,1),data_noise(:,2),'ko','MarkerSize',5);
Application of the default principal tree algorithm fails: the result is strongly affected by the bakground noise:
[NodePositions,Edges] = computeElasticPrincipalGraph(data_noise,30);
Using the robust version of the algorithm with trimming radius = 0.4 and slightly adjusted stretching elasticity identifies the branching structure almost correctly.
[NodePositions,Edges] = computeRobustElasticPrincipalGraph(data_noise,40,0.4,'Lambda',0.02);
Of course, adding more noise to the branching pattern gradually makes detecting the branching structure more difficult. However, the algorithm is able to locate the main features of the structure even in the presence of strong noise.
amount_of_noise = 200;
data_noise = add_uniform_background_noise(data,amount_of_noise);
[NodePositions,Edges] = computeRobustElasticPrincipalGraph(data_noise,30,0.35,'Lambda',0.02,'Plots',0);
plot(data_noise(:,1),data_noise(:,2),'k.','MarkerSize',5); hold on;
plot(data(:,1),data(:,2),'r.','MarkerSize',5);
drawGraph2D(NodePositions,Edges);
title(sprintf('Percentage of noise points = %i%%',amount_of_noise));