-
Notifications
You must be signed in to change notification settings - Fork 6
Basic use of Elastic Principal Graphs Matlab package
One of the simplest but non-trivial types of the principal graphs are graphs having no loops, called principal trees. All possible trees can be generated from the simplest graph containing one single edge, using "Add a node to a node"+"Bisect an edge" graph grammar. For even more simple grammars constructing principal curves and circles, see the section "Elastic principal graphs" below.
To construct default elastic principal tree for a data matrix, containing n rows (objects) and m columns (variables), run
data = load('test_data/iris/iris.data');
[NodePositions,Edges,ReportTable] = computeElasticPrincipalGraph(data,20);
In this case, the default choice of the graph grammar will be used (the one constructing principal tree).
The required arguments are data matrix and the number of nodes in the tree. The output is
NodePositions - coordinates of the constructed tree nodes in the data space
Edges - two-column array of integers, each line specifies an edge in the graph
ReportTable - table containing detailed report on constructing the graph
computeElasticPrincipalGraph function creates three figures:
"MSE and Elastic energy plot" shows the dynamics of data approximation and elastic energy term during principal tree construction.
"Accuracy/Complexity plot" shows the dependence of the normalized geometrical complexity on the number of nodes
"PCA view on principal tree" shows the projection of a principal graph onto the first two principal components of the data
Example of a simple 2D branching data distribution with description of the output.
Two non-branching types of principal graphs have been implemented in the package together with principal trees. They are principal curves and ''principal circles'' (principal closed curves). All the above mentioned options are applicable for this type of graphs as well.
cc = load('test_data/circle/simple_circle.data');
computeElasticPrincipalCurve(cc,30);
cc = load('test_data/circle/simple_circle.data');
computeElasticPrincipalCircle(cc,30);
'verbose' - binary value
verbose equals zero suppresses printing the report table on the fly.
Changing the elastic coefficients of the graph
'Lambda', double - coefficient of elastic stretching of the graph (default - 0.01) 'Mu', double - penalty coefficient for deviation from harmonicity (default - 0.1)
Example: Setting stretching penalty to 0.1 and deviation from harmonicity penalty to 0.5
computeElasticPrincipalGraph(data,20,'Lambda',0.1,'Mu',0.5);
Using robust local version of the algorithm
'TrimmingRadius', double - local radius used to define close neighbourhood of a graph node. Used to construct robust local principal graphs described here. Note that constructing robust principal trees requires a graph initialization different from the global one. Choosing TrimmingRadius requires certain expertise: however, function ComputeDataCoarseGrainedRadius gives a good initial estimate.
Example: using robust elastic principal graph algorithm instead of the default global one
computeElasticPrincipalGraph(data,40,'TrimmingRadius',0.1);
Reducing dimension of the dataset
'ReduceDimension', integer or array of integers
This option allows reducing projecting the dataset into the subspace spanned by first M principal components, before constructing the principal graph. If ReduceDimension is a single integer K then the dataset is projected into the linear subspace spanned by the first K principal components. If ReduceDimension is an array of two integers [K1,K2] then the dataset is projected into the linear subspace spanned by all principal components between K1 and K2 inclusively. By default, dimension reduction is not applied. If the number of the elements in the array is more than 2 then the specified principal component numbers will be used for dimension reduction.
Example: reduce dimension of the data to 2 first principal components before applying the principal tree algorithm
computeElasticPrincipalGraph(data,30,'ReduceDimension',2);
Example: removing the first principal component from the dataset (can be usefull in certain data analyses for data "normalization", e.g. removing average gene expression signal).
computeElasticPrincipalGraph(data,30,'ReduceDimension',[2 size(data,2)]);
Setting an initial graph configuration
'InitGraph', structure having fields InitNodes and InitEdges
Example: training the principal graph in two epochs: "rigid" and "soft"
[NodePositions,Edges,ReportTable] = computeElasticPrincipalGraph(data,20,'Lambda',0.1,'Mu',1);
close all;
computeElasticPrincipalGraph(data,30,'InitGraph',struct('InitNodes',NodePositions,'InitEdges',Edges),'Lambda',0.001,'Mu',0.01);
Controlling plotting behaviour
'Plots' - integer number
The number represent a sum of codes for several plots (1 - for accuracy/complexity plot, 2 - for pca view plot, 4 - for metro map plot (applicable only to trees), 8 - for MSE plot). Default value is 11 (three generic plots are shown). Zero value will suppress all plots.
Example: showing only PCA plot.
[NodePositions,Edges,ReportTable] = computeElasticPrincipalGraph(data,20,'Plots',2);
Boosting the algorithm performance by local optimization
'LocalSearch' - integer value
The algorithm of growing elastic principal graphs can be accelerated (approximately, by a factor of two for large graphs and datasets) by avoiding global graph optimization after each application of a graph grammar. The LocalSeach value is used to define the node neighbourhood in a graph for local optimization. The larger the LocalSearch value, the more the optimization is close to the global one. Reccomended value is 2.
Example: Comparing two versions of the graph: one constructed using global optimization and one by local optimization.
data = load('test_data/tree23/tree23_inflated.data');
tic; [NodePositions,Edges,ReportTable] = computeElasticPrincipalGraph(data,50,'Plots',2); toc;
figure;
tic; [NodePositions,Edges,ReportTable] = computeElasticPrincipalGraph(data,50,'Plots',2,'LocalSearch',2); toc;
Defining graph grammars
'GrowGrammars' and 'ShrinkGrammars' - arrays of string labels
These options might be used to modify the default applicatoin of the graph grammars. Currently only four graph grammar operations ('bisectedge','addnode2node','shrinkedge','removenode') are implemented, but in the future this set can be more extensive.
Example: Growing principal trees without shrinking step (accelerates significantly the principal tree algorithm but makes it more prone to local minima in terms of the suboptimal graph structures, since any change in the tree structure remains irreversible).
data = load('test_data/tree23/tree23_inflated.data');
[NodePositions,Edges,ReportTable] = computeElasticPrincipalGraph(data,50,'GrowGrammars',[{'bisectedge','addnode2node'}],'ShrinkGrammars',[]);