This is a MATLAB package for scEpath ("single-cell Energy path"). scEpath is a novel computational method for quantitatively measuring the developmental potency and plasticity of single cells and the transition probabilities between cell states, and for inferring lineage relationships and pseudotemporal ordering from single-cell gene expression data. In addition, scEpath performs several downstream analyses, including identification of the most important marker genes or transcription factors for given cell clusters or along pseudotime.
The rationale of scEpath for inferring cellular trajectories is based on Waddington's landscape metaphor for describing cellular dynamics during development. Below is a conceptual illustration from Takahashi et al. (Development, 2015).
See our paper (Jin et al., Bioinformatics, 2018) for detailed methods and applications. Below is an overview of scEpath.
Because scEpath is written in MATLAB, it is independent of the operating system. The basic requirements for running scEpath are MATLAB and the Statistics Toolbox. The pseudotime estimation step additionally requires the R package "princurve" for principal curve analysis, so running that step requires both R and MATLAB.
This package has been tested with MATLAB R2016a, R2016b, and R2017a on Mac OS and 64-bit Windows.
Unzip the package and change the current directory in MATLAB to the folder containing the scripts.
This directory includes the following main scripts:
- scEpath_demo.m -- an example run of scEpath on a specific dataset
- preprocessing.m -- preprocess the input data (if applicable)
- constructingNetwork.m -- construct a gene-gene co-expression network
- estimatingscEnergy.m -- estimate the single cell energy (scEnergy) for each cell
- ECA.m -- principal component analysis of the energy matrix
- clusteringCells.m -- perform unsupervised clustering of single cell data
- addClusterInfo.m -- integrate clustering information
- inferingLineage.m -- infer the cell lineage hierarchy
- FindMDST.m -- find the minimal directed spanning tree in a directed graph
- inferingPseudotime.m -- reconstruct pseudotime
- smootheningExpr.m -- calculate a smoothed version of expression levels along pseudotime
- identify_pseudotime_dependent_genes.m -- identify pseudotime dependent marker genes
- identify_keyTF.m -- identify key transcription factors responsible for cell fate decision
- cluster_visualization.m -- visualize cells on two-dimensional space
- lineage_visualization.m -- display cell lineage hierarchy with transition probability
- scEnergy_comparison_visualization.m -- comparison of scEnergy among different clusters
- landscape_visualization -- display the energy landscape as a 2-D contour plot and a 3-D surface
- plot_genes_in_pseudotime.m -- plot the temporal dynamics of individual genes along pseudotime
- plot_rolling_wave.m -- create a "rolling wave" plot showing the temporal pattern of pseudotime-dependent genes, and display gene clusters with similar patterns
- plot_rolling_wave_TF.m -- create a "rolling wave" plot showing the temporal pattern of key transcription factors
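FindMDST.m solves the minimal directed spanning tree problem: given a directed, weighted graph (for lineage inference, nodes would be cell clusters) and a root, choose one incoming edge per non-root node so that every node is reachable from the root and the total cost is minimal. The Python sketch below illustrates that problem only; it is not a port of FindMDST.m. It uses plain exhaustive search (a hypothetical `min_directed_spanning_tree` helper), which is practical only for a handful of clusters, and the edge costs are an assumption (for example, one could use -log of a transition probability so that likely transitions are cheap).

```python
import math
from itertools import product


def _reaches_root(v, parent, root):
    """Follow parent pointers from v; return False if a cycle is hit before the root."""
    seen = set()
    while v != root:
        if v in seen:
            return False
        seen.add(v)
        v = parent[v]
    return True


def min_directed_spanning_tree(weights, root):
    """Exhaustive search for a minimum-cost spanning arborescence (hypothetical helper).

    weights: dict mapping a directed edge (u, v) to its cost.
    root: node from which every other node must be reachable.
    Returns (parent, cost), where parent[v] is the source of v's incoming tree edge,
    or (None, math.inf) if no spanning arborescence exists.
    """
    nodes = {x for edge in weights for x in edge} | {root}
    others = sorted(v for v in nodes if v != root)
    # Candidate incoming edges (source, cost) for every non-root node; self-loops are ignored.
    incoming = {
        v: [(u, w) for (u, t), w in weights.items() if t == v and u != v]
        for v in others
    }
    if any(not incoming[v] for v in others):
        return None, math.inf  # some node has no incoming edge at all

    best_parent, best_cost = None, math.inf
    # Try every way of picking exactly one incoming edge per non-root node.
    for choice in product(*(incoming[v] for v in others)):
        cost = sum(w for _, w in choice)
        if cost >= best_cost:
            continue  # cannot improve on the best tree found so far
        parent = {v: u for v, (u, _) in zip(others, choice)}
        # A valid tree has no cycles, i.e. every node leads back to the root.
        if all(_reaches_root(v, parent, root) for v in others):
            best_parent, best_cost = parent, cost
    return best_parent, best_cost
```

For example, with root 0 and edges `{(0, 1): 1.0, (0, 2): 5.0, (1, 2): 1.0, (2, 1): 1.0}`, the cheapest tree is 0 -> 1 -> 2 with cost 2.0; the equally cheap choice 1 -> 2, 2 -> 1 is rejected because it forms a cycle that never reaches the root. The number of candidates grows as the product of in-degrees, so for larger graphs the Chu-Liu/Edmonds algorithm is the standard choice.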
For each run, the final results of the analysis are deposited in the "results" directory:
- results/figures, containing PDF figures of the analysis.
- results/PDG_in_each_cluster, containing the identified pseudotime-dependent marker genes in each cluster.
- results/temporalfiles, containing intermediate results from the analysis.
Please refer to scEpath_demo.m for instructions on how to use this code. The input is a gene expression data matrix (rows are genes, columns are cells).
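Many tools outside MATLAB store expression data as cells x genes (for example, AnnData's `X`), which is the transpose of the layout scEpath expects. The Python sketch below shows one way to check and fix the orientation before exporting data; the helper name `to_genes_by_cells` is hypothetical, and how scEpath_demo.m actually loads its input is not assumed here.

```python
import numpy as np


def to_genes_by_cells(expr, gene_names, cell_names):
    """Return expr oriented as genes x cells (hypothetical helper).

    Transposes a cells x genes matrix when its shape indicates that layout.
    If the matrix is square (as many genes as cells), it is assumed to be
    genes x cells already, since the shape alone cannot tell the two apart.
    """
    expr = np.asarray(expr, dtype=float)
    n_genes, n_cells = len(gene_names), len(cell_names)
    if expr.shape == (n_genes, n_cells):
        return expr  # already rows = genes, columns = cells
    if expr.shape == (n_cells, n_genes):
        return expr.T  # cells x genes -> genes x cells
    raise ValueError(
        f"shape {expr.shape} matches neither ({n_genes}, {n_cells}) "
        f"nor ({n_cells}, {n_genes})"
    )
```

Checking against the gene and cell name lists, rather than guessing from which dimension is larger, avoids silently transposing a dataset that happens to have more cells than genes.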
If you have any problems or questions using the package, please contact suoqin.jin@uci.edu.