Whole Genome normalization question #12
Comments
Hi Rashesh, the normalization is required when the signatures have been estimated from exome data and the mutation counts correspond to whole-genome data, or vice versa. Please refer to the help of whichSignatures, where this is explained in more detail. For the simulated data under the test directory, there is no need to normalize, as the data do not represent actual exome/genome counts but simply a linear combination of the signatures. I am not sure I fully understand your question, though.
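To make "a linear combination of the signatures" concrete, here is a minimal sketch in Python (not the package's own R code). It assumes the "70-30" dataset mentioned in this thread means a 70%/30% mix of two signatures. The toy signatures, the three context labels, and the helper `mix_signatures` are all made up for illustration; real signatures cover 96 contexts.

```python
def mix_signatures(sig_a, sig_b, weight_a=0.7):
    """Return weight_a * sig_a + (1 - weight_a) * sig_b, context by context."""
    if sig_a.keys() != sig_b.keys():
        raise ValueError("signatures must cover the same contexts")
    return {
        ctx: weight_a * sig_a[ctx] + (1.0 - weight_a) * sig_b[ctx]
        for ctx in sig_a
    }


# Hypothetical toy signatures; each one sums to 1 over its contexts.
sig_a = {"A[C>A]A": 0.5, "A[C>G]A": 0.3, "A[C>T]A": 0.2}
sig_b = {"A[C>A]A": 0.1, "A[C>G]A": 0.1, "A[C>T]A": 0.8}

# A simulated "tumor" profile: 70% signature A, 30% signature B.
profile = mix_signatures(sig_a, sig_b, weight_a=0.7)
print(profile)
```

Because such a profile is built directly from the signatures, it is already on the same scale as they are, which is why no trinucleotide normalization is needed for it.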
Thank you Javier for the quick reply. Sorry, closed the issue by mistake. So I have 2 questions:
Sorry if I am being a bit confusing; I am still not adept at simulations. The thing is, I am testing a few signature tools, and SigneR is one of them. They provide a simulated dataset of 21 breast cancer tumors with and without opportunity, but they did not provide a truth set or a script. Since you have been helpful enough to provide details about the simulation, I was wondering if I could generate a simulated dataset with opportunity.
From the whichSignatures() help: "The method of normalization chosen should match how the input signatures were normalized. For exome data, the 'exome2genome' method is appropriate for the signatures included in this package. For whole genome data, use the 'default' method to obtain consistent results."
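As a rough illustration of what these normalization choices do arithmetically, here is a Python sketch. It is my reading of the method names, not the package's R implementation. It assumes that 'default' only converts counts to fractions, that 'genome' divides each count by how often its trinucleotide occurs in the genome, and that 'exome2genome' multiplies each count by the genome/exome ratio for its trinucleotide. The function `normalize_counts` and all numbers are hypothetical.

```python
def trinucleotide(context):
    """Map a context such as 'A[C>A]A' to its reference trinucleotide 'ACA'."""
    return context[0] + context[2] + context[6]


def normalize_counts(counts, method="default", tri_exome=None, tri_genome=None):
    """Rescale raw mutation counts, then convert them to fractions summing to 1."""
    scaled = {}
    for ctx, n in counts.items():
        tri = trinucleotide(ctx)
        if method == "default":
            scaled[ctx] = float(n)  # no opportunity correction
        elif method == "genome":
            scaled[ctx] = n / tri_genome[tri]  # per-occurrence rate in the genome
        elif method == "exome2genome":
            # Rescale exome counts to what a genome-wide context mix would give.
            scaled[ctx] = n * tri_genome[tri] / tri_exome[tri]
        else:
            raise ValueError(f"unsupported method in this sketch: {method}")
    total = sum(scaled.values())
    return {ctx: value / total for ctx, value in scaled.items()}


# Made-up counts and trinucleotide occurrences, for illustration only.
counts = {"A[C>A]A": 10, "T[C>A]T": 10}
tri_exome = {"ACA": 1, "TCT": 2}
tri_genome = {"ACA": 4, "TCT": 4}

as_is = normalize_counts(counts, "default")
rescaled = normalize_counts(counts, "exome2genome", tri_exome, tri_genome)
print(as_is, rescaled)
```

In this toy case the two contexts have equal raw counts, but "TCT" is twice as common in the made-up exome as "ACA", so the 'exome2genome' rescaling shifts weight toward the "ACA" context. The point of the quoted advice is that the sample and the signatures must end up on the same scale.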
If you want to compare WGS and WES data, you can have a look at issue #2, where this was discussed and assessed. Essentially, we looked at the result of using WGS data and compared it to the result of using only the mutations in the exome for the same real samples.
Hi Javier, I was just looking at the COSMIC site (http://cancer.sanger.ac.uk/cosmic/signatures); they mention that "Mutational signatures are displayed and reported based on the observed trinucleotide frequency of the human genome, i.e., representing the relative proportions of mutations generated by each signature based on the actual trinucleotide frequencies of the reference human genome version GRCh37". So shouldn't that mean we should use the 'genome' method with WGS samples? Sorry for the repetitive question, I am just trying to make sure I understand the difference. Thanks.
Hi,
I am a bit confused about when to set tri.counts.method to 'genome'.
For the 70-30 simulation dataset, tri.counts.method should be 'default', right?
So if I want to test how the tool does with genome normalization, which simulated dataset should I use? Or is there a way to make the 70-30 simulation dataset based on tri.counts.genome?