Processing TAR-scRNA-seq outputs #8

dkeitley · 2021-06-10T18:16:25Z

Hi Michael,

Sorry for all the questions. This might be a bit naive.

I was just wondering how you went about processing the matrices output by the pipeline to construct a counts matrix? Did you use standard functions to aggregate the matrices across the different directories (e.g. something like read10xCounts from DropletUtils)?

fw262 · 2021-06-10T21:14:14Z

Hi Dan,

More questions are never a problem!

The pipeline generates count matrices in text form (from_fastq branch) and in 10X output form (from_cellranger branch) for each individual sample. The count matrices are not combined across samples. If you want to combine count matrices across samples, you would need to do that in Seurat or Scanpy through different integration techniques such as Harmony, Scanorama, SCTransform, etc. Integration is currently NOT available in the TAR-scRNA-seq workflow.

Hope that helps!
Michael

dkeitley · 2021-06-17T15:39:44Z

Ok makes sense. I thought there might be a standard way to read in lots of .txt.gz files and aggregate them together but maybe just calling fread and cbind will work fine in a loop.

But now I'm thinking about this, I've also realised that the features in the TAR count matrices across my different samples aren't the same.

e.g.

> mat1 <- fread("SIGAA12_S45_L001_TAR_expression_matrix_withDir.txt.gz")
> mat2 <- fread("../SIGAB12_S49_L001/SIGAB12_S49_L001_TAR_expression_matrix_withDir.txt.gz")

> dim(mat1)
[1] 41553 15001

> dim(mat2)
[1] 88374 15001

> mat1[1:5,1:3]
                                       GENE TCCACGTGTTGACGGA CACTGTCCACACCGCA
1: 10_10019099_10066299_-_28895_C7orf31_-_1                0                0
2:            10_10021149_10060449_+_6602_0                0                0
3:            10_10074599_10079749_+_1017_0                0                0
4:             10_10106549_10110499_+_505_0                0                0
5:            10_10117999_10129149_-_1177_0                0                0

> mat2[1:5,1:3]
                                       GENE CACGTGGAGCCGATCC TCCACCATCGACGCGT
1: 10_10019099_10066299_-_28895_C7orf31_-_1                0                0
2:            10_10021149_10060449_+_6602_0                0                0
3:            10_10074599_10079749_+_1017_0                0                0
4:             10_10106399_10110949_-_567_0                0                0
5:            10_10117999_10129149_-_1177_0                0                0

Maybe I've misunderstood or have run the pipeline incorrectly but I was expecting that the features would be consistent across the different samples (ignoring maybe the coverage values in the feature names which differ) so that I could combine the count matrices together and then as you say, integrate to get a multi-sample dataset that includes TAR features.

Am I getting confused? In the chicken dataset for example, is it possible to combine the day 4 and day 7 samples with TAR features?

fw262 · 2021-06-21T13:32:46Z

Hi Dan,

Sorry for getting back to you so late.

Please note that there is in fact a common set of TAR features across all samples, but each sample will not have expression in all of the features. In your example, mat1 and mat2 have many features in common (i.e. rows 1, 2, 3, 5), but mat1 has expression in the feature named "10_10106549_10110499_+_505_0" while mat2 has expression in the feature named "10_10106399_10110949_-_567_0". If you want to combine your samples, you can simply merge your mat1 and mat2 data frames on the GENE column (a full outer join, e.g. merge(mat1, mat2, by = "GENE", all = TRUE), rather than a plain cbind or rbind). Features missing from a sample then come back as NA and can be set to 0.
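
As a rough illustration of merging per-sample matrices by feature name, here is a minimal base R sketch. It is not part of the TAR-scRNA-seq pipeline. The toy counts, the `label_cells` helper and the `SIGAA12`/`SIGAB12` sample labels are assumptions made up for the example; real matrices would be read from the `*_TAR_expression_matrix_withDir.txt.gz` files first.

```r
# Toy stand-ins for two per-sample TAR matrices (hypothetical counts).
# Features are rows, cell barcodes are columns, as in the pipeline output.
mat1 <- data.frame(
  GENE = c("10_10021149_10060449_+_6602_0", "10_10106549_10110499_+_505_0"),
  TCCACGTGTTGACGGA = c(0, 2),
  CACTGTCCACACCGCA = c(1, 0),
  check.names = FALSE, stringsAsFactors = FALSE
)
mat2 <- data.frame(
  GENE = c("10_10021149_10060449_+_6602_0", "10_10106399_10110949_-_567_0"),
  CACGTGGAGCCGATCC = c(3, 0),
  TCCACCATCGACGCGT = c(0, 4),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Hypothetical helper: prefix each barcode with a sample label so that
# identical barcodes from different samples cannot collide after merging.
label_cells <- function(mat, sample) {
  names(mat)[-1] <- paste(sample, names(mat)[-1], sep = "_")
  mat
}
samples <- list(
  label_cells(mat1, "SIGAA12"),
  label_cells(mat2, "SIGAB12")
)

# Full outer join on the feature name: keeps the union of all features.
merged <- Reduce(function(a, b) merge(a, b, by = "GENE", all = TRUE), samples)

# A feature absent from a sample had no counts there, so NA becomes 0.
merged[is.na(merged)] <- 0

# Convert to a features x cells numeric matrix.
counts <- as.matrix(merged[, -1])
rownames(counts) <- merged$GENE
```

The same `Reduce` call extends to more than two samples by adding entries to `samples`. For large matrices a sparse representation would probably be preferable, since the zero-filled dense matrix grows quickly with the number of cells.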

This can occur in scRNA-seq with standard annotations as well where some samples have expression of genes unique to that particular sample.

I hope this clears up the confusion.

Best,
Michael
