How to estimate single-cell mutational burdens #65

Open
xyzheng123 opened this issue Sep 18, 2024 · 3 comments

@xyzheng123

Hi,

Thank you for developing this tool. I would like to use it to estimate mutational burdens at both the cell-type and single-cell levels. While there is some discussion regarding mutational burden at the cell-type level, I am wondering if you could explain how single-cell mutational burdens can be estimated using SComatic.

In the original article, you mentioned: "To estimate single-cell mutational burdens, we divided the number of mutations detected in each unique cell by the number of sites with a sequencing depth of at least one read, within the set of callable sites across all cells of the same type."
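Written as a formula, using notation introduced here purely for illustration (the symbols are not from the article):

$$
\text{burden}_c = \frac{M_c}{S_c}
$$

Here $M_c$ stands for the number of mutations detected in cell $c$, and $S_c$ for the number of sites covered by at least one read in cell $c$, counted only among the sites that are callable for the cell type that $c$ belongs to.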

I'm not entirely sure which outputs I should use or what steps are required to transform SComatic's outputs in order to obtain "the number of mutations detected in each unique cell" and "the number of sites with a sequencing depth of at least one read, within the set of callable sites across all cells of the same type."

Thanks,
Xiang

@ArthurDondi
Contributor

Hi,

Have a look at https://github.com/cortes-ciriano-lab/SComatic/blob/main/docs/OtherFunctionalities.md

The two scripts you need from there are SingleCellGenotype.py for the mutational burden and SitesPerCell.py for the callable sites.

You can find how to run it here: https://github.com/cortes-ciriano-lab/SComatic/blob/main/docs/SComaticExample.md

For SingleCellGenotype.py, you should first filter and keep only the PASS mutations in the FILTER column of your BaseCellCalling.step2.tsv file.
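The PASS filtering step could be sketched in Python roughly as follows. This is only an illustration, not part of SComatic: the function name `keep_pass_rows` is made up, and it assumes a tab-separated file whose sixth column is FILTER. Whether to keep the `#` header lines in the filtered file is left as an option, since that depends on what the downstream script expects.

```python
def keep_pass_rows(lines, filter_col=5, keep_comments=False):
    """Yield the lines whose FILTER field is exactly "PASS".

    lines         -- iterable of tab-separated text lines (e.g. an open file)
    filter_col    -- 0-based index of the FILTER column (assumed to be the 6th)
    keep_comments -- if True, also pass through lines starting with "#"
    """
    for line in lines:
        if line.startswith("#"):
            if keep_comments:
                yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # Require an exact match so that multi-flag values are excluded.
        if len(fields) > filter_col and fields[filter_col] == "PASS":
            yield line


if __name__ == "__main__":
    # Tiny in-memory example; a real run would iterate over an open file.
    demo = [
        "#CHROM\tStart\tEnd\tREF\tALT\tFILTER\n",
        "chr1\t10\t10\tA\tG\tPASS\n",
        "chr1\t20\t20\tC\tT\tMulti-allelic\n",
    ]
    for kept in keep_pass_rows(demo):
        print(kept, end="")
```

Counting the yielded data lines gives the total number of PASS mutations.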

Let me know if it worked for you!
Arthur

@xyzheng123
Author

Hi @ArthurDondi,

Sorry for the late response, and thank you so much for your help. I was able to follow your steps and estimate the mutational burden at the single-cell level for the dataset I’m working with. I assumed that each unique cell barcode corresponds to a cell, and that each row in the SingleCellGenotype.py output that passed the filter represents a mutation. After filtering, I counted the mutations for each cell barcode. Ranking the cells from highest to lowest count, I found that the highest number of mutations in a cell was only 6 (out of 445,222 callable sites). I’m not sure whether these numbers are too low. Do you have any insights on the typical number of mutations (and callable sites) per cell?
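For reference, the per-barcode counting and the division by callable sites might look like the sketch below. It is not SComatic code. The column names `#CHROM`, `Start`, `ALT_expected` and `CB` are assumptions about the SingleCellGenotype.py output and should be checked against the actual header, and `callable_sites` is assumed to be a barcode-to-count mapping built from the SitesPerCell.py output.

```python
from collections import defaultdict


def mutations_per_cell(genotype_rows, barcode_key="CB",
                       site_keys=("#CHROM", "Start", "ALT_expected")):
    """Count distinct mutated sites per cell barcode.

    genotype_rows -- iterable of dicts, e.g. from csv.DictReader(f, delimiter="\t")
    barcode_key   -- column holding the cell barcode (assumed name)
    site_keys     -- columns that together identify one mutation (assumed names)
    """
    sites = defaultdict(set)
    for row in genotype_rows:
        site = tuple(row[k] for k in site_keys)
        # A set avoids double-counting a mutation listed twice for one cell.
        sites[row[barcode_key]].add(site)
    return {cb: len(s) for cb, s in sites.items()}


def burden_per_cell(mut_counts, callable_sites):
    """Mutations divided by callable sites, for each cell with callable sites > 0.

    Cells without any detected mutation get a burden of 0.0.
    """
    return {
        cb: mut_counts.get(cb, 0) / n_sites
        for cb, n_sites in callable_sites.items()
        if n_sites > 0
    }
```

Using the per-cell site counts as the denominator, rather than a single cell-type total, matches the definition quoted from the article.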

Best,
Xiang

@ArthurDondi
Contributor

Do you mean that the highest number of mutations in any single cell was 6? Out of how many PASS mutations in total in BaseCellCalling.step2.tsv? You can check by running
awk -F'\t' '{if ($6 == "PASS") {print $0}}' output.step4.2.tsv > only_PASS_output.step4.2.tsv
on the command line, and then counting the lines with wc -l only_PASS_output.step4.2.tsv.

In a recent analysis I had up to 85 mutations in a cell with 265011 reads, from a total of 341 PASS mutations, but the median was around 40 mutations per cell.

If I'm correct, your 445,222 callable sites are for the cell type, not for individual cells. For this cell type I had 99,771,226 callable sites, which is far more than you have, so 6 mutations in a single cell does not seem too bad. It depends on how many PASS mutations you have to start with.
