Running InferCNV
InferCNV can be run via a simple 2-step protocol, or can be run step-by-step with customization for more exploratory purposes.
First, create an InferCNV object from your three required inputs: the read count matrix, the cell type annotations, and the gene ordering file:
# create the infercnv object
infercnv_obj = CreateInfercnvObject(raw_counts_matrix="singleCell.counts.matrix",
                                    annotations_file="cellAnnotations.txt",
                                    delim="\t",
                                    gene_order_file="gene_ordering_file.txt",
                                    ref_group_names=c("normal"))
where the ref_group_names parameter is set to the names of the normal (non-tumor) cell types, as defined in the cellAnnotations.txt file. See File-Definitions for more details.
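As a rough illustration of how the three inputs fit together, the sketch below writes toy versions of them using base R only. The file layouts shown (annotations as cell name and group label, gene ordering as gene, chromosome, start, stop, both tab-delimited without a header) are assumptions to be checked against the File-Definitions page, and every cell name, gene name, and value is made up.

```r
# Toy inputs written to a temporary directory (all names and values are made up).
tmp_dir <- tempfile("infercnv_toy_")
dir.create(tmp_dir)

# Assumed layout: cell name <TAB> group label, no header.
annotations <- data.frame(
  cell  = c("cell_1", "cell_2", "cell_3", "cell_4"),
  group = c("normal", "normal", "tumor_p1", "tumor_p1")
)
annotations_path <- file.path(tmp_dir, "cellAnnotations.txt")
write.table(annotations, annotations_path, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)

# Assumed layout: gene <TAB> chromosome <TAB> start <TAB> stop, no header.
gene_order <- data.frame(
  gene  = c("GENE_A", "GENE_B", "GENE_C"),
  chr   = c("chr1", "chr1", "chr2"),
  start = c(1000L, 5000L, 2000L),
  stop  = c(2000L, 6000L, 3000L)
)
gene_order_path <- file.path(tmp_dir, "gene_ordering_file.txt")
write.table(gene_order, gene_order_path, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)

# Raw counts: genes as rows, cells as columns.
counts <- rbind(
  GENE_A = c(5, 3, 0, 1),
  GENE_B = c(2, 4, 6, 8),
  GENE_C = c(0, 1, 9, 7)
)
colnames(counts) <- annotations$cell
counts_path <- file.path(tmp_dir, "singleCell.counts.matrix")
write.table(counts, counts_path, sep = "\t", quote = FALSE)

# Consistency checks worth running on real inputs as well.
stopifnot(all(colnames(counts) %in% annotations$cell))  # every cell is annotated
stopifnot(all(rownames(counts) %in% gene_order$gene))   # every gene has a position
stopifnot("normal" %in% annotations$group)              # the reference group exists
```

The three resulting paths play the same roles as the raw_counts_matrix, annotations_file, and gene_order_file arguments in the call above.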
Note, if you do not have reference cells, you can set ref_group_names=NULL, in which case the average signal across all cells will be used to define the baseline. This can work well when there are sufficient differences among the cells included (i.e. they do not all show a chromosomal deletion at the same place).
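The caveat about shared deletions can be seen in a small base-R sketch. This is not inferCNV's implementation. It only mimics the idea of using the all-cell average as the baseline by subtracting each gene's mean across cells, and the toy matrix and values are invented for illustration.

```r
# Toy log-scale expression: 4 genes (rows) x 4 cells (columns); values are made up.
expr <- matrix(0, nrow = 4, ncol = 4,
               dimnames = list(paste0("gene_", 1:4), paste0("cell_", 1:4)))

# Case 1: only cells 3 and 4 carry a loss spanning genes 2-3.
partial <- expr
partial[2:3, 3:4] <- -1
partial_centered <- partial - rowMeans(partial)  # subtract each gene's all-cell mean

# Case 2: every cell carries the same loss.
shared <- expr
shared[2:3, ] <- -1
shared_centered <- shared - rowMeans(shared)

print(partial_centered)  # cells 3-4 still differ from cells 1-2 at genes 2-3
print(shared_centered)   # all zeros: the shared loss is absorbed into the baseline
```

In the first case the loss survives as a contrast between cells. In the second it disappears entirely, which is why a baseline built from all cells needs cells that differ from one another.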
Note, inferCNV expects that you've already filtered out low-quality cells. If you need to further impose minimum/maximum read counts per cell, you can pass an additional filter to CreateInfercnvObject, such as: min_max_counts_per_cell=c(1e5,1e6)
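To make the effect of such a filter concrete, here is a minimal base-R sketch of what a per-cell count range amounts to. It is not inferCNV's own code: the toy matrix and cell names are made up, and treating both bounds as inclusive is an assumption.

```r
# Toy counts: 2 genes (rows) x 3 cells (columns); values are made up.
counts <- rbind(
  GENE_A = c(2e4, 1e5, 1.5e6),
  GENE_B = c(3e4, 2e5, 5e5)
)
colnames(counts) <- c("cell_low", "cell_ok", "cell_high")

# Same range as in the note above.
min_max_counts_per_cell <- c(1e5, 1e6)

# Total counts per cell, then keep cells inside the range (inclusive bounds assumed).
total_counts <- colSums(counts)
keep <- total_counts >= min_max_counts_per_cell[1] &
        total_counts <= min_max_counts_per_cell[2]
filtered <- counts[, keep, drop = FALSE]

print(total_counts)
print(colnames(filtered))
```

Here cell_low (50,000 counts) and cell_high (2,000,000 counts) fall outside the range, so only cell_ok is retained.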
After creating the infercnv_obj, you can then run the standard infercnv procedure via the built-in 'infercnv::run()' method like so:
# perform infercnv operations to reveal cnv signal
infercnv_obj = infercnv::run(infercnv_obj,
                             cutoff=1, # use 1 for smart-seq, 0.1 for 10x-genomics
                             out_dir="output_dir", # dir is auto-created for storing outputs
                             cluster_by_groups=TRUE, # cluster cells separately within each annotation group
                             denoise=TRUE
                             )
The cutoff value determines which genes will be used for the infercnv analysis. Genes with a mean number of counts across cells below the cutoff will be excluded. For smart-seq (full-length transcript sequencing, typically using cell plate assays rather than droplets), a value of 1 works well. For 10x (and potentially other 3'-end sequencing and droplet assays, where the count matrix tends to be more sparse), a value of 0.1 generally works well.
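The following base-R sketch shows how a mean-count cutoff selects genes. It is only an illustration: genes_passing_cutoff is a hypothetical helper, not part of inferCNV, the toy counts are made up, the mean is taken over all toy cells, and keeping genes whose mean equals the cutoff is an assumption.

```r
# Toy counts: 4 genes (rows) x 4 cells (columns); values are made up.
counts <- rbind(
  gene_high = c(4, 0, 2, 6),  # mean 3
  gene_mid  = c(1, 0, 0, 1),  # mean 0.5
  gene_low  = c(0, 0, 0, 1),  # mean 0.25
  gene_zero = c(0, 0, 0, 0)   # mean 0
)
colnames(counts) <- paste0("cell_", 1:4)

# Hypothetical helper: names of genes whose mean count across cells reaches the cutoff.
genes_passing_cutoff <- function(counts, cutoff) {
  rownames(counts)[rowMeans(counts) >= cutoff]
}

smartseq_like <- genes_passing_cutoff(counts, cutoff = 1)    # stricter threshold
tenx_like     <- genes_passing_cutoff(counts, cutoff = 0.1)  # looser, for sparse data

print(smartseq_like)
print(tenx_like)
```

With the stricter threshold only gene_high is kept, while the looser threshold also retains gene_mid and gene_low. This is the reason a smaller cutoff suits sparser count matrices.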
The out_dir parameter is the name of the output directory. If the directory doesn't exist, it will be created automatically.
The 'cluster_by_groups' setting tells inferCNV to cluster the tumor cells separately within each group (for example, per patient), as defined in the cell annotations file.
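Clustering within groups can be sketched in base R as below. This is not inferCNV's code: the Euclidean distance and complete linkage used here are simply the defaults of dist() and hclust(), chosen as assumptions for illustration, and the expression values and group labels are randomly generated toy data.

```r
set.seed(1)

# Toy expression: 5 genes (rows) x 8 cells (columns); values are random.
expr <- matrix(rnorm(5 * 8), nrow = 5,
               dimnames = list(paste0("gene_", 1:5), paste0("cell_", 1:8)))

# Made-up group labels, standing in for the annotations file.
groups <- c(rep("tumor_p1", 4), rep("tumor_p2", 4))
names(groups) <- colnames(expr)

# Cluster the cells of each group on their own and record the dendrogram order.
per_group_order <- lapply(split(names(groups), groups), function(cells) {
  tree <- hclust(dist(t(expr[, cells, drop = FALSE])))
  cells[tree$order]
})

print(per_group_order)
```

Each list element holds the cells of one group in the order of that group's own dendrogram, so cells from different groups are never interleaved.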
The general infercnv workflow as performed via the above infercnv::run() method operates as follows:
[Figure: overview of the inferCNV workflow]
To interactively explore the inferCNV heatmap, see the documentation page "Interactively navigating data using the Next Generation Heatmap Viewer".