You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I ran a preliminary experiment to see if the all contexts methylation models work in an unusual system. Basically, I amplified the whole genome to remove any marks, then treated samples with either water, AluI methyltransferase (5mC), or EcoRI methyltransferase (6mA) and then rapid barcoded and sequenced. Samples were basecalled with supv5.0.0 models.
When I run modkit summary, all of my samples have similar levels of 6mA detected, and slightly different threshold values.
wga-AluI.bam
> sampling 10042 reads from BAM
> calculating threshold at 10% percentile
> calculated thresholds: A: 0.8144531 C: 0.734375
base code pass_count pass_frac all_count all_frac
A - 11871547 0.98895884 12893560 0.9683128
A a 132539 0.011041157 421931 0.03168723
C - 6307226 0.9691437 6870735 0.951314
C h 9119 0.00140119 37663 0.0052147754
C m 191695 0.029455105 313965 0.04347123
wga-Eco.bam
> sampling 10042 reads from BAM
> calculating threshold at 10% percentile
> calculated thresholds: A: 0.8183594 C: 0.734375
base code pass_count pass_frac all_count all_frac
A - 15070288 0.98938465 16412277 0.97023183
A a 161693 0.010615363 503553 0.029768152
C - 8202646 0.99656403 8944574 0.9780853
C h 11379 0.0013824689 47668 0.005212475
C m 16902 0.0020534745 152742 0.01670227
wga-water.bam
> sampling 10042 reads from BAM
> calculating threshold at 10% percentile
> calculated thresholds: A: 0.8261719 C: 0.75390625
base code pass_count pass_frac all_count all_frac
A - 14412553 0.99083996 15703601 0.9718222
A a 133240 0.009160037 455323 0.028177805
C - 7866807 0.9970703 8592602 0.980716
C h 9278 0.0011759305 43996 0.0050214804
C m 13837 0.0017537562 124962 0.014262528
I have two questions: for samples that were sequenced together, is it appropriate to use the same thresholds? Is there a way, similar to modkit validate to use another sample as a ground truth?
I plan to look at position-specific differential methylation next, but want to be sure I understand the rationale behind thresholds. Because there is a known motif for both samples relative to the negative control, would it be best to calculate position-specific thresholds?
The text was updated successfully, but these errors were encountered:
It is often a good idea to use the same thresholds for a larger experiment, especially when the estimated thresholds are so close like this. When applying the thresholds to new samples though it is important to note the fraction of calls filtered. When this value is large it can vastly skew downstream results and is often an indication of a poor quality run (due to contamination, very high mod content or other different run conditions).
Is there a way, similar to modkit validate to use another sample as a ground truth?
I'm not quite sure what you mean here. If you want to use the combination of these samples in order to estimate the thresholds you can do that by merging the bam files (potentially with some balancing). The modkit validate command is generally intended for application to strands where the modified status of each read in the sample will be absolutely known at a particular reference position (primarily from synthetically printed strands). The command will run on any file presented, but the results may not represent valid results.
would it be best to calculate position-specific thresholds?
We would generally advise against position specific thresholds. Users are free to explore this option (probably via the modkit extract output), but position specific thresholds are not implemented in any modkit commands, so application of these would have to be handrolled by the user.
I hope this helps and please let me know if further clarification would help with your analysis.
Hello,
I ran a preliminary experiment to see if the all contexts methylation models work in an unusual system. Basically, I amplified the whole genome to remove any marks, then treated samples with either water, AluI methyltransferase (5mC), or EcoRI methyltransferase (6mA) and then rapid barcoded and sequenced. Samples were basecalled with
supv5.0.0
models.When I run modkit summary, all of my samples have similar levels of 6mA detected, and slightly different threshold values.
I have two questions: for samples that were sequenced together, is it appropriate to use the same thresholds? Is there a way, similar to
modkit validate
to use another sample as a ground truth?I plan to look at position-specific differential methylation next, but want to be sure I understand the rationale behind thresholds. Because there is a known motif for both samples relative to the negative control, would it be best to calculate position-specific thresholds?
The text was updated successfully, but these errors were encountered: