How to estimate single-cell mutational burdens #65

xyzheng123 · 2024-09-18T17:55:32Z

Hi,

Thank you for developing this tool. I would like to use it to estimate mutational burdens at both the cell-type and single-cell levels. While there is some discussion regarding mutational burden at the cell-type level, I am wondering if you could explain how single-cell mutational burdens can be estimated using SComatic.

In the original article, you mentioned: "To estimate single-cell mutational burdens, we divided the number of mutations detected in each unique cell by the number of sites with a sequencing depth of at least one read, within the set of callable sites across all cells of the same type."
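The quoted definition is a plain ratio per cell. Below is a minimal Python sketch of it. single_cell_burden is a hypothetical helper, not part of SComatic, and it assumes both counts have already been computed for the cell:

```python
def single_cell_burden(n_mutations: int, n_covered_sites: int) -> float:
    """Mutational burden of one cell, as mutations per covered callable site.

    Hypothetical helper, not part of SComatic.
    n_mutations: mutations detected in this cell.
    n_covered_sites: sites with a depth of at least one read in this cell,
        restricted to the callable sites of the cell's cell type.
    """
    if n_covered_sites <= 0:
        # No covered callable sites: the ratio is undefined for this cell.
        raise ValueError("cell has no covered callable sites")
    return n_mutations / n_covered_sites


# Made-up numbers, for illustration only.
print(single_cell_burden(40, 1_000_000))
```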

I'm not entirely sure which outputs I should use or what steps are required to transform SComatic's outputs in order to obtain "the number of mutations detected in each unique cell" and "the number of sites with a sequencing depth of at least one read, within the set of callable sites across all cells of the same type."

Thanks,
Xiang

ArthurDondi · 2024-09-19T11:48:58Z

Hi,

Have a look at https://github.com/cortes-ciriano-lab/SComatic/blob/main/docs/OtherFunctionalities.md

There, the scripts you need are SingleCellGenotype.py, for the mutations in each cell, and SitesPerCell.py, for the callable sites.

You can find how to run them here: https://github.com/cortes-ciriano-lab/SComatic/blob/main/docs/SComaticExample.md

For SingleCellGenotype.py, you should first filter your BaseCellCalling.step2.tsv file and keep only the mutations with PASS in the FILTER column.
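If you prefer Python to the command line, the PASS filter can be sketched as follows. read_pass_calls is a hypothetical helper, and the file layout is an assumption to check against your own BaseCellCalling.step2.tsv: optional '##' meta lines, a tab-separated header starting with '#CHROM', and the filter status in a column named FILTER.

```python
import csv
from typing import Dict, Iterable, List


def read_pass_calls(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Return the rows of a BaseCellCalling.step2 TSV whose FILTER value is PASS.

    Hypothetical helper. Assumed layout (check your file): optional '##' meta
    lines, then a tab-separated header starting with '#CHROM' that includes 'FILTER'.
    """
    # Drop '##' meta lines; keep the header and the data rows.
    body = [ln for ln in lines if not ln.startswith("##")]
    if not body:
        return []
    body[0] = body[0].lstrip("#")  # '#CHROM' -> 'CHROM'
    reader = csv.DictReader(body, delimiter="\t")
    # Exact match: rows carrying any other filter label are discarded.
    return [row for row in reader if row["FILTER"] == "PASS"]


# Tiny made-up example ("OTHER" stands for any non-PASS label).
demo = [
    "#CHROM\tStart\tEnd\tREF\tALT\tFILTER",
    "chr1\t100\t101\tA\tG\tPASS",
    "chr1\t200\t201\tC\tT\tOTHER",
]
print(len(read_pass_calls(demo)))
```

For a real file, pass an open file handle, e.g. read_pass_calls(open(path)) with path pointing to your BaseCellCalling.step2.tsv; the length of the result is the number of PASS mutations.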

Let me know if it worked for you!
Arthur

xyzheng123 · 2024-10-03T16:47:20Z

Hi @ArthurDondi,

Sorry for the late response, and thank you so much for your help. I was able to follow your steps and estimate the mutational burden at the single-cell level for the dataset I'm working with. I assumed that each unique cell barcode corresponds to one cell and that each row of the SingleCellGenotype.py output that passed the filter represents one mutation. After filtering, I counted the mutations for each cell barcode. When I ranked the cells from the highest to the lowest count, the highest number of mutations was only 6 (out of 445,222 callable sites). I'm not sure whether these numbers are too low. Do you have any insight into the typical number of mutations (and callable sites) per cell?
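The counting and ranking step described above can be sketched in Python. This is a sketch under assumptions: the column names CB, ALT_expected and Base_observed, and the '#CHROM' header, are assumptions about the SingleCellGenotype.py output that should be checked against your file, and mutations_per_cell is a hypothetical helper. Unlike a raw row count, it counts a site for a cell only when the observed base matches the expected alternative allele, and it counts each (cell, site) pair once, in case the file also lists cells whose reads show the reference base or repeats a site for a cell.

```python
import csv
from collections import Counter
from typing import Iterable, Sequence


def mutations_per_cell(lines: Iterable[str],
                       cb_col: str = "CB",
                       alt_col: str = "ALT_expected",
                       base_col: str = "Base_observed",
                       site_cols: Sequence[str] = ("CHROM", "Start")) -> Counter:
    """Count distinct mutated sites per cell barcode.

    Hypothetical helper. Column names are assumptions about the
    SingleCellGenotype.py output; adjust them to the header of your file.
    """
    body = [ln for ln in lines if not ln.startswith("##")]
    if not body:
        return Counter()
    body[0] = body[0].lstrip("#")  # '#CHROM' -> 'CHROM'
    seen = set()
    for row in csv.DictReader(body, delimiter="\t"):
        # Keep only observations that support the expected ALT allele.
        if row[base_col] == row[alt_col]:
            site = tuple(row[c] for c in site_cols)
            seen.add((row[cb_col], site))
    # One count per distinct (cell, site) pair.
    return Counter(cb for cb, _ in seen)
```

Calling mutations_per_cell(fh).most_common() lists the cells from the highest to the lowest count. Cells without any mutated site do not appear in the result.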

Best,
Xiang

ArthurDondi · 2024-10-04T13:57:15Z

You mean that the highest number of mutations in a single cell was 6? Out of how many PASS mutations in total in BaseCellCalling.step2.tsv? You can check this by running the following in the command line, where output.step4.2.tsv is your BaseCellCalling.step2.tsv output:
awk -F'\t' '{if ($6 == "PASS") {print $0}}' output.step4.2.tsv > only_PASS_output.step4.2.tsv
and then counting the lines with wc -l only_PASS_output.step4.2.tsv.

In a recent analysis, I had up to 85 mutations in a cell with 265,011 reads, out of a total of 341 PASS mutations, but the median was around 40 mutations per cell.
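When comparing against a median like this one, keep in mind that if per-cell counts are built only from mutated sites, cells with no mutation are missing from them and the median is inflated. A minimal sketch that treats missing cells as zero; median_mutations is a hypothetical helper, and it assumes you have the full list of cell barcodes for the cell type (for example, from the SitesPerCell.py output):

```python
import statistics
from typing import Iterable, Mapping


def median_mutations(counts: Mapping[str, int], all_cells: Iterable[str]) -> float:
    """Median mutations per cell over all_cells; cells absent from counts count as 0.

    Hypothetical helper. all_cells should list every barcode of the cell type.
    """
    values = [counts.get(cb, 0) for cb in all_cells]
    if not values:
        raise ValueError("no cell barcodes given")
    return statistics.median(values)
```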

If I'm correct, your 445,222 callable sites are for the cell type, not for individual cells. For this cell type, I had 99,771,226 callable sites, which is far more than you, so your 6 mutations in a single cell do not seem too bad. It depends on how many PASS mutations you have to start with.
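Following the definition quoted from the article, each cell's mutation count is divided by the callable sites covered in that same cell rather than by the cell-type total. A minimal sketch, assuming the SitesPerCell.py output is a tab-separated table with one row per cell barcode and columns named CB and SitesPerCell (check these names against your file). burden_per_cell is a hypothetical helper, and mutation_counts is any mapping from barcode to mutation count, such as a collections.Counter:

```python
import csv
from typing import Dict, Iterable, Mapping


def burden_per_cell(mutation_counts: Mapping[str, int],
                    sites_lines: Iterable[str],
                    cb_col: str = "CB",
                    sites_col: str = "SitesPerCell") -> Dict[str, float]:
    """Mutations per covered callable site for each cell barcode.

    Hypothetical helper. sites_lines is a TSV with one row per cell (assumed
    SitesPerCell.py layout). Cells with zero sites are skipped; cells with no
    entry in mutation_counts get a burden of 0.0.
    """
    lines = list(sites_lines)
    if not lines:
        return {}
    lines[0] = lines[0].lstrip("#")  # tolerate a '#'-prefixed header
    burdens: Dict[str, float] = {}
    for row in csv.DictReader(lines, delimiter="\t"):
        n_sites = int(row[sites_col])
        if n_sites > 0:
            burdens[row[cb_col]] = mutation_counts.get(row[cb_col], 0) / n_sites
    return burdens
```

Because every barcode in the sites table gets an entry, cells without mutations are kept with a burden of 0.0 instead of silently dropping out of the distribution.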
