Deconvolution of brain cell types

ejh243 edited this page Oct 19, 2023 · 3 revisions

Estimate the cellular composition of bulk brain tissue from DNA methylation data

As well as pre-trained models for blood, we have included a number of pre-trained models for adult brain samples. These take advantage of novel reference profiles we have generated for several glial subtypes, and are relevant for quantifying the cellular heterogeneity of the cortex and other regions of the human brain from bulk DNA methylation data. To estimate the cellular composition of your samples you only need a matrix of (normalised) beta values, where rows are DNA methylation sites and columns are samples. The CETYGO package not only calculates the composition of your samples but also a metric, the CETYGO score, which can be used to infer the accuracy of the estimation. This can help to determine whether the reference panel is appropriate for your samples and whether any individual samples have a low signal-to-noise ratio that negatively affects the accuracy of the estimation.
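Before running the deconvolution it can be worth confirming that your input has the expected shape. The following is a minimal sketch in base R; checkBetaMatrix is a hypothetical helper written for this page, not a function provided by CETYGO, and the toy site and sample names are placeholders.

```r
# Hypothetical helper (not part of CETYGO): basic sanity checks on a beta matrix.
checkBetaMatrix <- function(betas) {
  # Expect a numeric matrix with sites as rows and samples as columns.
  stopifnot(is.matrix(betas), is.numeric(betas))
  # Both dimensions should be named (site IDs and sample IDs).
  stopifnot(!is.null(rownames(betas)), !is.null(colnames(betas)))
  # Beta values are proportions, so they must lie between 0 and 1.
  outOfRange <- sum(betas < 0 | betas > 1, na.rm = TRUE)
  if (outOfRange > 0) {
    stop(outOfRange, " values fall outside [0, 1]; are these really beta values?")
  }
  invisible(TRUE)
}

# Toy example: 3 sites (rows) by 2 samples (columns); all values are invented.
toyBetas <- matrix(c(0.10, 0.85, 0.50, 0.12, 0.80, 0.47), nrow = 3,
                   dimnames = list(c("site1", "site2", "site3"),
                                   c("sample1", "sample2")))
checkBetaMatrix(toyBetas)
```

A matrix that fails these checks (for example, M-values rather than beta values) raises an error instead of being passed on silently.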

The pre-trained models are provided in the R object modelBrainCoef. This is a list, where the first level separates the two methods for selecting cell-specific sites, either ANOVA or IDOL. The second level then separates the different reference panels (defined in the table below), which can be indexed by panel number. For example, to get the sites needed to estimate cellular composition for reference panel 1, selected with the IDOL methodology, run modelBrainCoef[["IDOL"]][[1]].
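To illustrate how a two-level list of this kind can be navigated, here is a minimal sketch using a mock object. mockModels and its contents are invented for illustration; the sketch also assumes that each model is a matrix with one row per site and one column per cell type, which you should confirm against the real modelBrainCoef.

```r
# Mock object with the two-level layout described above (method, then panel).
# The real modelBrainCoef ships with CETYGO; every value here is invented.
mockCoef <- matrix(c(0.1, 0.8, 0.5, 0.9, 0.2, 0.4), nrow = 3,
                   dimnames = list(c("site1", "site2", "site3"),
                                   c("cellTypeA", "cellTypeB")))
mockModels <- list(ANOVA = list(mockCoef, mockCoef),
                   IDOL  = list(mockCoef, NULL))

# First level: the site-selection methods.
names(mockModels)

# Second level: which panel slots actually hold a model for each method?
available <- lapply(mockModels, function(panels) {
  !vapply(panels, is.null, logical(1))
})
available

# Sites used by one model (assumes sites are stored as row names).
siteIds <- rownames(mockModels[["IDOL"]][[1]])
siteIds
```

The same calls can be pointed at modelBrainCoef once CETYGO is loaded.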

We have also provided 10 exemplar adult prefrontal cortex profiles generated with the 450K array in the R object pfcdata for the purpose of demonstrating how to calculate the cellular proportions for brain cell types and the CETYGO score for each sample. This can be done as follows for a single pre-trained model.

library(CETYGO)

predProp<-projectCellTypeWithError(pfcdata, modelBrainCoef[["IDOL"]][[1]])

head(predProp)

Substituting modelBrainCoef[["IDOL"]][[1]] with the coefficients of a different pre-trained model will calculate a different set of cell type variables.
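Once you have the estimates, the CETYGO score can be used to flag samples that may be poorly served by the reference panel. The sketch below uses a mock result rather than real output. It assumes that the score is returned in a column named "CETYGO" and that larger scores indicate a poorer fit; the threshold is purely hypothetical and should be chosen for your own data.

```r
# Mock output: cell type proportions plus a score column; all values are invented.
mockPredProp <- cbind(cellTypeA = c(0.60, 0.55, 0.20),
                      cellTypeB = c(0.40, 0.44, 0.35),
                      CETYGO    = c(0.03, 0.04, 0.15))
rownames(mockPredProp) <- c("sample1", "sample2", "sample3")

# Hypothetical cut-off, for illustration only.
scoreThreshold <- 0.1

# Names of samples whose score exceeds the cut-off.
flagged <- rownames(mockPredProp)[mockPredProp[, "CETYGO"] > scoreThreshold]
flagged
```

Flagged samples are candidates for closer inspection rather than automatic exclusion.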

To calculate the cellular composition of your samples using all the pre-trained models we supply you can run:

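# Collect the estimates from every available model, by method and panel
predPropAll<-list()

for(method in names(modelBrainCoef)){
    predPropAll[[method]]<-list()
    for(j in seq_along(modelBrainCoef[[method]])){
        # skip method/panel combinations with no pre-trained model
        if(!is.null(modelBrainCoef[[method]][[j]])){
            predPropAll[[method]][[j]]<-projectCellTypeWithError(pfcdata, modelBrainCoef[[method]][[j]])
        }
    }
}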

You can then extract the best estimator for each cell type as follows.


predPropBest<-cbind(
    predPropAll[["ANOVA"]][[1]][,c("NeuNNeg_SOX10Neg", "NeuNPos")],
    predPropAll[["IDOL"]][[5]][,c("NeuNNeg_Sox10Neg_IRF8Pos","NeuNNeg_Sox10Neg_IRF8Neg")],
    predPropAll[["ANOVA"]][[3]][,c("SATB2Neg", "SATB2Pos")],
    predPropAll[["ANOVA"]][[6]][,c("NeuNNeg", "NeuNPos_SOX6Pos", "NeuNPos_SOX6Neg")],
    predPropAll[["IDOL"]][[4]][,"NeuNNeg_SOX10Pos"])
# the final column comes from a single-column subset, so cbind() leaves it unnamed
colnames(predPropBest)[10]<-"NeuNNeg_SOX10Pos"

Pre-trained models

Presently, there is a range of reference DNA methylation profiles available for brain cell types, some of which target overlapping populations of cells through the use of different FANS gating strategies. Therefore, we have defined multiple combinations of cell types that could serve as a reference panel for the deconvolution of cellular composition from brain DNAm data. In addition, there are multiple methodologies for selecting cell-specific sites, which is effectively the training component of the algorithm. We have included 10 different pre-trained models for you to apply to your brain DNA methylation data. This includes 5 different reference panels, which consist of different combinations of cell types, and up to 2 ways to select the cell-specific sites, either by ANOVA (the classical approach originally implemented in minfi) or IDOL. All of these are stored in the R object modelBrainCoef, which is distributed with the CETYGO package.

| Panel | Included fraction | Corresponding cell type |
|---|---|---|
| 1 | NeuNPos | Neuronal enriched |
| 1 | NeuNNeg/SOX10Pos | Oligodendrocyte enriched |
| 1 | NeuNNeg/SOX10Neg | Other glial enriched |
| 2 | NeuNPos | Neuronal enriched |
| 2 | NeuNNeg/SOX10Pos | Oligodendrocyte enriched |
| 2 | NeuNNeg/SOX10Neg/IRF8Pos | Microglia enriched |
| 2 | NeuNNeg/SOX10Neg/IRF8Neg | Astrocyte enriched |
| 3 | SATB2Pos | Excitatory (glutamatergic) neuronal enriched |
| 3 | SATB2Neg | Inhibitory (GABAergic) neuronal and other glial enriched |
| 4 | NeuNPos | Neuronal enriched |
| 4 | SATB2Pos | Excitatory (glutamatergic) neuronal enriched |
| 4 | NeuNNeg/SOX10Pos | Oligodendrocyte enriched |
| 4 | NeuNNeg/SOX10Neg | Other glial enriched |
| 5 | NeuNPos | Neuronal enriched |
| 5 | SATB2Pos | Excitatory (glutamatergic) neuronal enriched |
| 5 | NeuNNeg/SOX10Pos | Oligodendrocyte enriched |
| 5 | NeuNNeg/SOX10Neg/IRF8Pos | Microglia enriched |
| 5 | NeuNNeg/SOX10Neg/IRF8Neg | Astrocyte enriched |
| 6 | NeuNPos/SOX6Pos | Inhibitory (GABAergic) neuronal enriched |
| 6 | NeuNPos/SOX6Neg | Excitatory (glutamatergic) neuronal enriched |
| 6 | NeuNNeg | Glial enriched |
| 7 | NeuNPos/SOX6Pos | Inhibitory (GABAergic) neuronal enriched |
| 7 | NeuNPos/SOX6Neg | Excitatory (glutamatergic) neuronal enriched |
| 7 | NeuNNeg/SOX10Pos | Oligodendrocyte enriched |
| 7 | NeuNNeg/SOX10Neg | Other glial enriched |
| 8 | NeuNPos/SOX6Pos | Inhibitory (GABAergic) neuronal enriched |
| 8 | NeuNPos/SOX6Neg | Excitatory (glutamatergic) neuronal enriched |
| 8 | NeuNNeg/SOX10Pos | Oligodendrocyte enriched |
| 8 | NeuNNeg/SOX10Neg/IRF8Pos | Microglia enriched |
| 8 | NeuNNeg/SOX10Neg/IRF8Neg | Astrocyte enriched |

Note that panels 4 and 5 contain both SATB2Pos and NeuNPos fractions, both of which target excitatory neuronal nuclei. This means that there may be cells in your bulk tissue that rightly belong to both populations. This direct conflict has a negative effect on the accuracy of the deconvolution of the bulk tissue, and we therefore do not recommend these panels for estimating neuronal composition.
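If you loop over the pre-trained models, one way to follow this recommendation is to leave panels 4 and 5 out of the loop. The sketch below is illustrative only: usablePanels is a hypothetical helper, not a CETYGO function, and mockPanels is an invented stand-in for one method's list of panels (the empty slot at position 7 is made up to show how missing models are handled).

```r
# Panels that combine the overlapping SATB2Pos and NeuNPos fractions.
panelsToSkip <- c(4, 5)

# Hypothetical helper: indices of panels that hold a model and are not skipped.
usablePanels <- function(models, skip) {
  keep <- which(!vapply(models, is.null, logical(1)))
  setdiff(keep, skip)
}

# Mock stand-in for one method's list of eight panels; slot 7 is left empty.
mockPanels <- replicate(8, matrix(0, 1, 1), simplify = FALSE)
mockPanels[7] <- list(NULL)

panelIndex <- usablePanels(mockPanels, panelsToSkip)
panelIndex
```

With the real object, the inner loop could then run over usablePanels(modelBrainCoef[[method]], panelsToSkip) instead of every index.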

Data availability for brain cell type reference profiles

Given the limitations of GitHub for uploading large files, we took the practical decision to limit the data distributed with the CETYGO package to the minimum needed to implement the deconvolutions we present in this manuscript. Specifically, these are the coefficients for the subset of cell-specific sites, which can be found in the modelBrainCoef object.

In addition, the raw and processed data for the samples generated by our group at the University of Exeter can be downloaded from GEO under accession number GSE234520. These were augmented with reference profiles from the EpiGABA project, the data for which are available via the Synapse platform under accession number syn7072866.

Citation

If you use our pre-trained brain models, please cite our preprint as well as the CETYGO package.