
An energy landscape-based approach for measuring developmental states and inferring cellular trajectories from single cell RNA-seq data


sqjin/scEpath


scEpath

Package of scEpath (a novel tool for analyzing single cell RNA-seq data)

Overview

This is the MATLAB package for scEpath ("single-cell Energy path"). scEpath is a computational method for quantitatively measuring the developmental potency and plasticity of single cells and the transition probabilities between cell states, and for inferring lineage relationships and pseudotemporal ordering from single-cell gene expression data. In addition, scEpath performs downstream analyses, including identification of the most important marker genes or transcription factors for given cell clusters or over pseudotime.

The rationale of scEpath for inferring cellular trajectories is based on Waddington's landscape metaphor for describing cellular dynamics during development. Below is a conceptual illustration from a paper (Takahashi et al. Development, 2015).

Check out our paper (Jin et al. Bioinformatics, 2018) for the detailed methods and applications. Below is the overview of scEpath.

Overview of scEpath

System Requirements

scEpath is written in MATLAB, so it runs on any operating system that MATLAB supports. Running scEpath requires MATLAB and the Statistics toolbox. The pseudotime estimation step additionally requires the R package "princurve" for principal curve analysis, so both R and MATLAB are needed for that step.

This package has been tested with MATLAB 2016a, 2016b, and 2017a on Mac OS and 64-bit Windows.
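
The requirements above can be checked from the MATLAB prompt before running the package. The snippet below is a minimal sketch, not part of scEpath: it assumes that `Rscript` is on the system path, and the variable names are illustrative. How scEpath itself locates R is not described here, so treat a failed check as a hint rather than a definitive diagnosis.

```matlab
% Sketch only: check the prerequisites listed above.
% Assumption: Rscript is on the system path.

% Look for an installed toolbox whose name contains "Statistics".
v = ver;
hasStats = any(~cellfun(@isempty, strfind({v.Name}, 'Statistics')));

% Check that R can be launched from the command line.
[rStatus, ~] = system('Rscript --version');
hasR = (rStatus == 0);

% Check that the R package "princurve" loads without error.
hasPrincurve = false;
if hasR
    [pStatus, ~] = system('Rscript -e "library(princurve)"');
    hasPrincurve = (pStatus == 0);
end

fprintf('Statistics toolbox: %d, R: %d, princurve: %d\n', ...
    hasStats, hasR, hasPrincurve);
```

If princurve is missing, it can be installed from within R with `install.packages("princurve")`.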

Usage

Unzip the package. Change the current directory in MATLAB to the folder containing the scripts.
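
As a minimal sketch of this step, the commands below switch to the package folder and confirm that the demo script is visible. `pkgDir` is a placeholder variable, not something defined by scEpath; replace its value with the folder where you unzipped the package.

```matlab
% Sketch only: pkgDir is a placeholder for the unzipped scEpath folder.
pkgDir = pwd;   % replace with the folder where you unzipped scEpath

cd(pkgDir);       % make it the current directory
addpath(pkgDir);  % also keep the scripts on the MATLAB path

% exist(...) returns 2 when the named file is found.
if exist('scEpath_demo.m', 'file') == 2
    disp('scEpath scripts found; see scEpath_demo.m for the workflow.');
else
    warning('scEpath_demo.m not found in %s', pkgDir);
end
```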

This directory includes the following main scripts:

  1. scEpath_demo.m -- an example run of scEpath on a specific dataset
  2. preprocessing.m -- preprocess the input data (if applicable)
  3. constructingNetwork.m -- construct a gene-gene co-expression network
  4. estimatingscEnergy.m -- estimate the single cell energy (scEnergy) for each cell
  5. ECA.m -- principal component analysis of energy matrix
  6. clusteringCells.m -- perform unsupervised clustering of single cell data
  7. addClusterInfo.m -- integrate clustering information
  8. inferingLineage.m -- infer the cell lineage hierarchy
  9. FindMDST.m -- find the minimal directed spanning tree in a directed graph
  10. inferingPseudotime.m -- reconstruct pseudotime
  11. smootheningExpr.m -- calculate smoothed expression levels along pseudotime
  12. identify_pseudotime_dependent_genes.m -- identify pseudotime dependent marker genes
  13. identify_keyTF.m -- identify key transcription factors responsible for cell fate decision

It also includes the following visualization scripts:

  1. cluster_visualization.m -- visualize cells on two-dimensional space
  2. lineage_visualization.m -- display cell lineage hierarchy with transition probability
  3. scEnergy_comparison_visualization.m -- comparison of scEnergy among different clusters
  4. landscape_visualization -- display energy landscape in 2-D contour plot and 3-D surface
  5. plot_genes_in_pseudotime.m -- plot the temporal dynamics of individual genes along pseudotime
  6. plot_rolling_wave.m -- create "rolling wave" showing the temporal pattern of pseudotime-dependent genes and display gene clusters showing similar patterns
  7. plot_rolling_wave_TF.m -- create "rolling wave" showing the temporal pattern of key transcription factors

For each run, the final results of the analysis are deposited in the "results" directory:

  1. results/figures, containing PDF figures of the analysis.
  2. results/PDG_in_each_cluster, containing the identified pseudotime-dependent marker genes in each cluster
  3. results/temporalfiles, containing intermediate results from the analysis.
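
After a run, the output folders can be inspected from MATLAB. The following is a small sketch that lists the PDF figures in results/figures, assuming the current directory is the one that contains the "results" folder; the variable names are illustrative.

```matlab
% Sketch only: list the PDF figures written by a run.
% Assumption: the current directory contains the "results" folder.
figDir = fullfile('results', 'figures');
pdfs = dir(fullfile(figDir, '*.pdf'));

fprintf('%d PDF figure(s) in %s\n', numel(pdfs), figDir);
for k = 1:numel(pdfs)
    disp(pdfs(k).name);
end
```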

Please refer to scEpath_demo.m for instructions on how to use this code. The input is a gene expression matrix in which rows are genes and columns are cells.
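
To make the expected orientation concrete, here is a toy sketch of a genes-by-cells matrix with simple sanity checks. The variable names (`data`, `geneNames`, `cellNames`) and the values are made up for illustration and are not the names used by scEpath; scEpath_demo.m shows how the package actually loads its input.

```matlab
% Sketch only: toy data with hypothetical variable names.
% Rows are genes, columns are cells.
data = [5 0 2;
        1 3 0;
        0 0 7;
        2 2 2];                                   % 4 genes x 3 cells
geneNames = {'GeneA'; 'GeneB'; 'GeneC'; 'GeneD'}; % one label per row
cellNames = {'cell1', 'cell2', 'cell3'};          % one label per column

% Confirm that the labels match the matrix orientation.
[nGenes, nCells] = size(data);
assert(nGenes == numel(geneNames), 'Expected one row per gene.');
assert(nCells == numel(cellNames), 'Expected one column per cell.');

% A matrix stored as cells x genes must be transposed first.
cellsByGenes = data';          % 3 cells x 4 genes
genesByCells = cellsByGenes';  % back to 4 genes x 3 cells
```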

If you have any problems or questions about using the package, please contact suoqin.jin@uci.edu
