Whole Genome normalization question #12

Rashesh7 · 2017-03-08T17:59:35Z

Hi,

I am a bit confused about when to set tri.counts.method to 'genome'.
For the 70-30 simulation dataset, tri.counts.method should be left at 'default', right?

So if I want to test how the tool does with genome normalization, which simulated dataset should I use? Or is there a way to build the 70-30 simulation dataset based on tri.counts.genome?

jherrero · 2017-03-08T18:31:29Z

Hi Rashesh

Normalization is required when the signatures have been estimated from exome data and the mutation counts correspond to whole-genome data, or vice versa. Please refer to the help page of whichSignatures(), where this is explained in more detail.

For the simulated data under the test directory, there isn't any need to normalize the data as they don't refer to actual exome/genome counts, but simply to a linear combination of the signatures.

I am not sure I fully understand your question, though.

Rashesh7 · 2017-03-08T18:40:03Z

Thank you Javier for the quick reply.

Rashesh7 · 2017-03-08T18:51:29Z

Thank you Javier for the quick reply. Sorry, I closed the issue by mistake.

So I have 2 questions:

1. If I use a VCF file with somatic mutations from a WGS sample, do I need to normalize with tri.counts.method set to 'genome'?
2. As you mentioned, the simulated data is a linear combination of the signatures. Is there a way to generate simulated data that mimics real data (basically taking the trinucleotide counts into account)?

Sorry if I am being a bit confusing; I am still not adept at simulations. The thing is, I am testing a few signature tools, and signeR is one of them. They provide a simulated dataset of 21 breast cancer tumors with and without opportunity, but they did not provide a truth set or a script. Since you have been helpful enough to provide details about your simulation, I was wondering whether I could generate a simulated dataset with opportunity.

jherrero · 2017-03-08T19:03:24Z

From the whichSignatures() help: "The method of normalization chosen should match how the input signatures were normalized. For exome data, the 'exome2genome' method is appropriate for the signatures included in this package. For whole genome data, use the 'default' method to obtain consistent results."
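To make the difference between the two methods named in that help text concrete, here is a minimal Python sketch. It is not the package's implementation: the function `normalize_counts`, the three contexts, and all trinucleotide counts are made up for illustration, and it assumes that 'exome2genome' rescales each context by the genome/exome ratio of trinucleotide occurrences before converting to fractions.

```python
def normalize_counts(counts, method="default", exome=None, genome=None):
    """Return per-context mutation fractions after the chosen normalization.

    counts, exome and genome are dicts keyed by trinucleotide context.
    Hypothetical helper for illustration only.
    """
    if method == "default":
        # No rescaling: proportions are left unchanged.
        weights = {ctx: 1.0 for ctx in counts}
    elif method == "exome2genome":
        # Assumption: scale exome counts by how much more (or less) often
        # each context occurs in the genome than in the exome.
        weights = {ctx: genome[ctx] / exome[ctx] for ctx in counts}
    else:
        raise ValueError(f"unsupported method in this sketch: {method}")

    scaled = {ctx: counts[ctx] * weights[ctx] for ctx in counts}
    total = sum(scaled.values())
    return {ctx: value / total for ctx, value in scaled.items()}


# Toy input: mutation counts per context and made-up context occurrences.
counts = {"ACA": 30, "CCG": 50, "TCT": 20}
exome_tri = {"ACA": 1_000, "CCG": 2_000, "TCT": 1_000}
genome_tri = {"ACA": 80_000, "CCG": 20_000, "TCT": 90_000}

print(normalize_counts(counts))
print(normalize_counts(counts, "exome2genome", exome_tri, genome_tri))
```

With 'default' the fractions are simply the raw proportions, which is why it suits WGS counts compared against signatures that are already expressed on a whole-genome basis.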

On your first question: no, use the "default" method, which leaves the proportions unchanged (you are comparing WGS data to signatures calculated for whole-genome data).
On your second question: it depends on what you mean by that. The simulations in the test directory do create simulated counts of mutations in their context based on the signatures. Each simulated sample has approximately 500 mutations spread across tri-nucleotide contexts according to the probabilities defined by the signatures. You could potentially simulate VCF files by generating mutations across the whole genome based on those signatures, but you would end up with the same result.
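The kind of simulation described here can be sketched roughly as follows. This is a hedged illustration in Python, not the script from the test directory: `simulate_sample`, the two toy signatures, and the four contexts are hypothetical, and the sketch draws exactly 500 mutations rather than approximately 500.

```python
import random
from collections import Counter


def simulate_sample(signatures, weights, n_mutations=500, seed=None):
    """Draw mutation counts per context from a weighted mix of signatures.

    signatures: {name: {context: probability}}, each summing to 1.
    weights:    {name: contribution}, e.g. 0.7 and 0.3 for a 70-30 mix.
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    contexts = list(next(iter(signatures.values())))
    # Probability of each context under the linear combination of signatures.
    mixture = [
        sum(weights[name] * signatures[name][ctx] for name in weights)
        for ctx in contexts
    ]
    draws = rng.choices(contexts, weights=mixture, k=n_mutations)
    tally = Counter(draws)
    return {ctx: tally.get(ctx, 0) for ctx in contexts}


# Two made-up signatures over four contexts (real ones have 96).
signatures = {
    "sig_a": {"A[C>A]A": 0.7, "A[C>G]A": 0.1, "A[C>T]A": 0.1, "A[T>C]A": 0.1},
    "sig_b": {"A[C>A]A": 0.1, "A[C>G]A": 0.1, "A[C>T]A": 0.2, "A[T>C]A": 0.6},
}
weights = {"sig_a": 0.7, "sig_b": 0.3}

sample = simulate_sample(signatures, weights, n_mutations=500, seed=1)
print(sample)
```

Because the counts are sampled directly from the signature probabilities, no trinucleotide normalization enters the picture, which is why the simulated data should be analysed with the default method.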

If you want to compare WGS and WES data, have a look at issue #2, where this was discussed and assessed. Essentially, we took the result of using WGS data and compared it to the result of using only the exome mutations for the same real samples.

Rashesh7 · 2017-06-15T18:47:30Z

Hi Javier,

I was just looking at the COSMIC site (http://cancer.sanger.ac.uk/cosmic/signatures), and they mention that "Mutational signatures are displayed and reported based on the observed trinucleotide frequency of the human genome, i.e., representing the relative proportions of mutations generated by each signature based on the actual trinucleotide frequencies of the reference human genome version GRCh37".

So shouldn't that mean we should use the 'genome' method with WGS samples?

Sorry for the repetitive question, I am just trying to make sure I understand the difference.

Thanks,
Rashesh

Rashesh7 closed this as completed Mar 8, 2017

Rashesh7 reopened this Mar 8, 2017

jherrero closed this as completed Mar 8, 2017

Rashesh7 mentioned this issue Jun 20, 2017

More normalization #18

Open
