Very low number of matched variants when comparing known SNVs from several studies with the results obtained by SComatic when applied to the same data #45

Open
FranSoriano opened this issue Jan 8, 2024 · 4 comments

Comments

@FranSoriano

Hello everyone,

We are interested in using SComatic to detect single-nucleotide mutations in single-cell data. To test the tool's performance, we ran it on open scRNA-seq data from two studies (PMIDs 35140215 and 32415257) that also provide WES data. When we compared the SNVs reported by those studies with the SComatic results, we found a very low number of matching variants (ranging from 0% to 5%). Why is the percentage so small? Is this a normal/expected rate?
Before this analysis we ran the tool following the tutorial in the repository and everything worked correctly, so a priori we do not believe the result is due to an error on our part in the execution.

Thank you.

@Francesc-Muyas
Collaborator

Dear user,
Thanks for using SComatic. Could you please post the number of mutations detected in a WES sample and in the matched scRNA-seq sample?

Secondly, could you run this command and post the output here?

awk '$1 ~ /^#/ || $6 == "PASS"'  file.step2.tsv | grep -v '^#' | awk -v OFS=">" '{print $4,$5}' | sort | uniq -c
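For readers who prefer Python, the pipeline above can be approximated with the short sketch below. It assumes only what the command itself implies: whitespace-separated fields, with the reference base in column 4, the alternative base in column 5, and the filter status in column 6. The function name and the toy rows are hypothetical and are not taken from SComatic or from this thread.

```python
from collections import Counter


def count_pass_substitutions(lines, ref_col=3, alt_col=4, filter_col=5):
    """Count REF>ALT pairs among rows whose filter field is exactly "PASS".

    Column indices are 0-based here, so 3, 4, 5 correspond to awk's $4, $5, $6.
    """
    counts = Counter()
    for line in lines:
        if line.startswith("#"):       # header/comment lines, like grep -v '^#'
            continue
        fields = line.split()          # whitespace splitting, like awk's default
        if len(fields) <= filter_col:  # skip short or empty lines
            continue
        if fields[filter_col] == "PASS":
            counts[f"{fields[ref_col]}>{fields[alt_col]}"] += 1
    return counts


if __name__ == "__main__":
    # Toy rows for illustration only (not real SComatic output).
    toy = [
        "#CHROM\tStart\tEnd\tREF\tALT\tFILTER\n",
        "chr1\t100\t100\tA\tG\tPASS\n",
        "chr1\t200\t200\tC\tT\tSomeFilter\n",
        "chr2\t300\t300\tA\tG\tPASS\n",
    ]
    for substitution, n in sorted(count_pass_substitutions(toy).items()):
        print(n, substitution)
```

To run it on a real file, pass an open file handle instead of the toy list, for example `count_pass_substitutions(open("file.step2.tsv"))`.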

Thanks,
Fran

@FranSoriano
Author

Dear Fran,

The numbers of mutations (excluding frameshift mutations) reported in the WES samples we examined, and of matched scRNA-seq mutations detected by SComatic, are, respectively: 48/0, 48/1, 48/1, 28/1, 28/0, 28/0, 24/1, 24/0, 24/0, 93/5, 382/0, and 281/0.
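As a quick check on the "0% to 5%" figure, the pairs above can be turned into per-sample recovery fractions. The sketch below assumes each pair reads as (WES mutations)/(WES mutations also called by SComatic); the variable and function names are illustrative only.

```python
# Pairs copied from the comment above, read as WES count / matched count.
wes_counts = [48, 48, 48, 28, 28, 28, 24, 24, 24, 93, 382, 281]
matched_counts = [0, 1, 1, 1, 0, 0, 1, 0, 0, 5, 0, 0]


def recovered_fraction(matched, expected):
    """Fraction of expected (WES) mutations that were also called."""
    return matched / expected if expected else float("nan")


fractions = [
    recovered_fraction(m, e) for m, e in zip(matched_counts, wes_counts)
]

if __name__ == "__main__":
    for e, m, f in zip(wes_counts, matched_counts, fractions):
        print(f"{m}/{e} = {f:.1%}")
```

Under that reading, the per-sample values range from 0.0% to 5.4% (5 of 93), which agrees with the range quoted in the opening comment.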

Here is the output of the command for one of the files:

  6 A>C
 46 A>G
  5 A>T
 11 C>A
  5 C>G
 23 C>T
 28 G>A
  3 G>C
  6 G>T
  2 T>A
 31 T>C
  4 T>G
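One common way to read a table like this is to fold each substitution together with its reverse complement, which gives six strand-agnostic classes. The thread does not say how the maintainer intends to interpret the counts, so the sketch below is only an arithmetic aid; the names are illustrative.

```python
from collections import Counter

# Counts copied from the output above.
observed = {
    "A>C": 6, "A>G": 46, "A>T": 5,
    "C>A": 11, "C>G": 5, "C>T": 23,
    "G>A": 28, "G>C": 3, "G>T": 6,
    "T>A": 2, "T>C": 31, "T>G": 4,
}

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def collapse(substitution):
    """Express a substitution with a pyrimidine (C or T) reference base."""
    ref, alt = substitution.split(">")
    if ref in "CT":
        return substitution
    return f"{COMPLEMENT[ref]}>{COMPLEMENT[alt]}"


collapsed = Counter()
for substitution, n in observed.items():
    collapsed[collapse(substitution)] += n

total = sum(collapsed.values())

if __name__ == "__main__":
    for substitution, n in sorted(collapsed.items()):
        print(f"{substitution}\t{n}\t{n / total:.1%}")
```

For this file the collapsed counts are C>A 17, C>G 8, C>T 51, T>A 7, T>C 77, and T>G 10, for 170 PASS calls in total.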

Hope this sheds some light on the issue.

Thanks.

@Francesc-Muyas
Collaborator

Hi,
Could you please check the coordinates of some of the expected (WES) mutations in the output of Step 4.1? The FILTER column should state the reason why they were (or were not) filtered out. If you do not find these coordinates in the file, it means that there were not enough reads covering those sites, so they were not interrogated.
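The check described above can be scripted roughly as follows. This is a sketch under stated assumptions: the Step 4.1 output is taken to be tab-separated with the chromosome in column 1, the position in column 2, and FILTER in column 6 (the same FILTER position the earlier awk command uses for the step 2 file), and the WES coordinates are assumed to use the same chromosome naming, coordinate system, and genome build. The function name and toy rows are hypothetical.

```python
def classify_expected_sites(expected_sites, calling_lines,
                            chrom_col=0, pos_col=1, filter_col=5):
    """Map each expected (chrom, pos) site to its FILTER value.

    Sites absent from the calling output map to None, which, per the
    comment above, suggests too few reads covered them.
    Column indices are 0-based and are assumptions about the file layout.
    """
    filters = {}
    for line in calling_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= filter_col:
            continue
        filters[(fields[chrom_col], fields[pos_col])] = fields[filter_col]
    return {site: filters.get(site) for site in expected_sites}


if __name__ == "__main__":
    # Toy rows and sites for illustration only (not real data).
    toy_output = [
        "#CHROM\tStart\tEnd\tREF\tALT\tFILTER\n",
        "chr1\t100\t100\tA\tG\tPASS\n",
        "chr1\t200\t200\tC\tT\tSomeFilter\n",
    ]
    wes_sites = [("chr1", "100"), ("chr1", "200"), ("chr3", "999")]
    for site, status in classify_expected_sites(wes_sites, toy_output).items():
        print(site, status if status is not None else "not in file")
```

Tallying the returned values shows how many expected mutations were called, filtered for a stated reason, or never interrogated.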

Thanks,
Fran

@FranSoriano
Author

Hi, Fran,

The low number of reads covering the sites may indeed be the reason why these variants are not called as expected. However, we are using the default threshold, and we suspect that lowering the minimum coverage required to consider a genomic site would make it too permissive. We will keep that in mind when doing our analyses.

Thank you so much for your help.
