Display RNA/DNA modification probabilities at single read level. #1869
I have also raised a similar feature request in IGV: igvteam/igv#945
I think this work would probably happen in our JBrowse 2 project; JBrowse 1 is sort of in maintenance mode (it's not super obvious from our readme, but that development is at this repo: https://github.com/gmod/jbrowse-components). I actually started trying this out a while back in JBrowse 2, but it got tabled. I made a demo using MM and ML tags on JBrowse 2 for multiple modification types, with transparency indicating probabilities. Most commonly someone would probably just look at one type and select that type, but this was a fun demo to show multiple. I think this is a great idea though... I'd say it will probably be worked on! Things like methplotlib are great references for this already working in the wild.
Sorry for the confusion and for missing the PR in the other repo 😖 Feel free to transfer the issue to the jbrowse-components repo. I am putting together a few test files containing "real" modification data obtained with Guppy and Megalodon.
@cmdcolin is there a branch with prototype work on this?
Yes, see #1865. It still needs a bit of work, but you can refer to the PR for todos.
I generated a BAM file with Megalodon containing reads aligned on human chromosome 20 with around 5X coverage. It contains predicted modification sites for both 5mC and 5hmC and follows the proposed SAM specification. Files (including the reference) can be downloaded from: https://nanoporetech.box.com/s/82pnw3lhusfs93s0vj7azxiz4x7kxabo I am also making other, larger test files with Zymo community in which we spiked 5mC, so it contains a higher proportion of mods, but not only in CG context. Hope this helps.
Awesome, thanks for this... would it be OK if I rehost this in a demo instance on our S3 bucket? I can provide some metadata to credit you as the source :)
Yes, feel free to re-host it there with credit to Oxford Nanopore Technologies.
@a-slide thanks... just a note: I think this BAM may have been generated on hg19 chr20, but the filename in the box link suggests it is chr20 on hg38. Loading on hg38, the BAM has a "C" mismatch where there is a C in the reference, and converting to CRAM using hg38 makes it pretty confused too. Just a note :)
Hmmm, maybe it actually is not hg19... I'll dig into this a little further.
Made a different issue for our incorrect display of SNPs on this dataset. It is indeed hg38; we just have a bug.
OK, I think maybe I found the reason... I think the issue is that the MD tag was not generated correctly for the file. IGV and tview may ignore this and hence display the correct stuff. JBrowse generally trusts the MD tag as written, so it displays SNPs wrong. JBrowse can also generate the MD tag if none is supplied in the file, but in this case, since it is in the file, it does not. I regenerated the MD tag, and that appears to fix it for JBrowse at least... Do you think the upstream tool will be able to fix this? Just wondering whether JBrowse should distrust the MD tag in general.
Hi Colin,
No worry :)
I saw a few when looking at the reads, but they are much rarer, and the model to predict them is not as good as the 5mC one.
@a-slide I was trying to get a useful "Color by methylation" setting with this data, which I figured was a little different from just coloring the modified bases (e.g. since drawing unmodified CpG is also important, in addition to modified CpG). It is a bit inspired by IGV features like http://software.broadinstitute.org/software/igv/interpreting_bisulfite_mode. I made an example screenshot with the demo data you sent; it still needs a bit of work, but I thought it would be of interest...
Hi @cmdcolin, what about using a light-to-dark color scale for individual reads based on the ML tag, like in the initial prototype? Unmodified bases could still be represented, I guess, on the light side of the color scale. It could also be applied to the coverage track using the mean or median modification score for all reads aligned at a given position. We are working on a tool to generate bedMethyl files from a modBAM, which could actually be used directly to show the overall methylation score (one file per modification). Hope it helps.
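The per-position summary suggested above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input is a hypothetical list of `(reference_position, probability)` pairs already extracted from the reads, and `summarize_by_position` and `shade` are made-up helper names, not JBrowse functions. The white-to-red scale is one arbitrary choice of a light-to-dark ramp.

```python
from collections import defaultdict
from statistics import mean, median


def summarize_by_position(calls, how="mean"):
    """Collapse per-read calls into one score per reference position.

    calls: iterable of (reference_position, probability) pairs, with
           probability in [0, 1]. (Hypothetical input shape.)
    how:   "mean" or "median".
    """
    by_pos = defaultdict(list)
    for ref_pos, prob in calls:
        by_pos[ref_pos].append(prob)
    reducer = mean if how == "mean" else median
    return {pos: reducer(probs) for pos, probs in sorted(by_pos.items())}


def shade(prob):
    """Map a probability in [0, 1] to a hex color from white (0) to red (1)."""
    level = round(255 * (1 - prob))
    return f"#ff{level:02x}{level:02x}"


# Three reads cover position 100, one read covers position 105.
scores = summarize_by_position([(100, 0.25), (100, 0.75), (105, 1.0), (100, 0.5)])
colors = {pos: shade(score) for pos, score in scores.items()}
```

The same `shade` function could be applied to individual read calls, so the per-read display and the coverage summary share one color scale.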
Sorry for not explaining :) The original coloring system proposed in the prototype is still there. I just thought it would be good (at least to my mind) to make a specialized methylation view so that you can see the unmodified CpGs (which may not be indicated by the MM tag), so I tried to make this specialized mode scan for CpGs and then color the unmodified ones blue and the modified ones red (just binary). But yes... we do still have the original code that colors all the modifications, as a separate option in the track menu (Color by -> "Modifications" vs. Color by -> "Methylation").
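As a rough illustration of the binary mode described above, the following Python sketch scans a reference sequence for CpG sites and colors each one red or blue. It is not the JBrowse implementation: `color_cpgs` is a hypothetical name, only the forward strand is handled, and the set of modified offsets is assumed to have been derived from the MM tag elsewhere.

```python
def color_cpgs(ref_seq, modified_offsets):
    """Return {offset: color} for the C of every CG dinucleotide in ref_seq.

    ref_seq:          reference sequence under the read (forward strand only).
    modified_offsets: set of 0-based offsets into ref_seq where the read
                      carries a methylation call. (Hypothetical input.)
    """
    colors = {}
    upper = ref_seq.upper()
    start = upper.find("CG")
    while start != -1:
        # Binary coloring: modified CpG is red, unmodified CpG is blue.
        colors[start] = "red" if start in modified_offsets else "blue"
        start = upper.find("CG", start + 1)
    return colors


# CpG sites are at offsets 1 and 5; offset 7 is a C outside CpG context.
result = color_cpgs("ACGTTCGCA", {5, 7})
```

Note that a call at a position that is not a reference CpG (offset 7 here) is simply not drawn in this mode, while a generic "color by modifications" mode would still show it.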
Curious about the bedMethyl stuff too... we could try to think about how best to display those if just basic boxes wouldn't work.
Oh I see. Great idea then. I think most people will actually be more familiar with that type of binary display, even if it doesn't have the probability information. Weirdly, I cannot get the original mod color map display mode in the online interactive session you shared, but I am not very familiar with JBrowse, so I might be doing something wrong.
Actually, I forgot that Megalodon does generate a bedMethyl file per modification. The tool in development I mentioned will be for Guppy direct mod base basecalling. The format is described here: https://www.encodeproject.org/data-standards/wgbs/
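For readers who want to experiment with such files, here is a minimal Python sketch of reading one bedMethyl line. The column layout is an assumption based on my reading of the ENCODE WGBS description (BED9 followed by coverage in column 10 and percent methylated in column 11); check the linked page before relying on it. The record type, the function name, and the example line are all made up for illustration.

```python
from typing import NamedTuple


class BedMethylRecord(NamedTuple):
    chrom: str
    start: int          # 0-based start, as in BED
    end: int
    strand: str
    coverage: int       # assumed column 10: read coverage at the site
    percent_modified: float  # assumed column 11: percent of reads modified


def parse_bedmethyl_line(line):
    """Parse one whitespace-separated bedMethyl line (assumed 11+ columns)."""
    fields = line.split()
    return BedMethylRecord(
        chrom=fields[0],
        start=int(fields[1]),
        end=int(fields[2]),
        strand=fields[5],
        coverage=int(fields[9]),
        percent_modified=float(fields[10]),
    )


# Made-up example line, not taken from any real file.
record = parse_bedmethyl_line("chr20\t1000\t1001\t.\t12\t+\t1000\t1001\t255,0,0\t12\t75")
```

A browser track could then draw one box per record and use `percent_modified` to set the fill intensity.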
Random update @a-slide: I think I may have found a factor in the calmd issue from earlier that could be of interest... It appears that after I ran calmd, only the reads that are marked as "reverse strand" get marked up with tons of mismatches. On the forward-strand reads, the mismatches look OK. This could have implications for whatever tool generated these reads: the reads on the reverse strand may need to have their seq field reverse complemented, or something like this(???). A zoomed-out view shows basically the same pattern, where about half the reads have a very high proportion of mismatches (in the coverage graph, half are grey, indicating a match to the ref, and the other half are misc colors coming from the reverse-strand reads with a high mismatch rate).
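For context on the reverse-complement hypothesis above: in SAM/BAM, a read aligned to the reverse strand (flag bit 0x10 set) stores SEQ as the reverse complement of the sequence as it was read. The Python sketch below shows that convention; whether a missing reverse complement is really what went wrong in this file is only the guess made in the comment above. `stored_seq` is a hypothetical helper name.

```python
# Translation table for complementing unambiguous bases and N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

REVERSE_STRAND_FLAG = 0x10  # SAM flag bit: read mapped to the reverse strand


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def stored_seq(original_seq, flag):
    """Return the sequence as SAM expects it in the SEQ field.

    original_seq: bases in the order the sequencer produced them.
    flag:         SAM FLAG value of the alignment.
    """
    if flag & REVERSE_STRAND_FLAG:
        return reverse_complement(original_seq)
    return original_seq


forward = stored_seq("AACG", 0)    # unchanged for forward-strand reads
reverse = stored_seq("AACG", 16)   # reverse complemented for reverse-strand reads
```

If a tool writes the original orientation for reverse-strand reads, nearly every base would disagree with the reference, which would match the pattern of mismatches described here.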
Here is a share link showing both methylation and modifications-only mode (I created a copy of the track and added separate settings in both). To change the color scheme manually, use the "three dots" on the track label and navigate to Pileup settings -> Color by -> Methylation. There are starting to be a lot of options, so maybe this is due for a redesign hehe, but that works on the link. Note that if you have files of your own on the web (you can't open local files from your computer currently), you can also add your own track in that URL.
Hi @cmdcolin.
Thanks very much for the link to the interactive session. I am not quite sure I understood everything, but my understanding is that the methylation display mode (blue/red) shows all the CpGs even if they are not in the MM tag (meaning fully unmethylated). However, I found quite a few instances where the CpGs are displayed in the modification mode but not in the methylation one. Is it because they don't fall on a reference CpG site?
Yes... those are not on a reference CpG site. If you're interested, I could still draw those even if they don't fall on a reference CpG... or even draw them in a different color... I'm still exploring what is sort of "expected" or the most useful way to depict this data, and also how to display non-CpG methylation, hydroxymethylation and so on (in the Methylation mode with blue and red it only draws the conventional "m" modifications now... but should it perhaps draw other types too?).
Apparently this was recently fixed in the last version of guppy aligner (4.5.4).
I would say it is fine to display everything in the generic modification display mode. I think restricting to a specific reference sequence context should be done by the modification calling tool itself and not by the genome browser. That said, I think the CpG methylation display mode is very useful as it is. I'm not quite sure whether this should be extended to other modifications. 5hmC maybe. For RNA it's going to be much more complicated, as consensus motifs are loosely defined at best. Any thoughts, @marcus1487?
I would say that displaying all contexts would be best, leaving any context constraints to the caller producing the file. Megalodon has a tool to filter the corresponding database by reference context. It might be worth adding a command to Megalodon to perform this filtering on the modbase BAM (I'll log an issue). In terms of what is expected, this is somewhat organism-specific, so I would be opposed to coloring based on sequence context. My opinion would be to color based on modification type. I'm not sure whether consistent colors can be coded for some common mods, with a color wheel used for any mods not known. I'm happy with any default convention here, really.
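The coloring convention floated above (fixed colors for common modifications, a color wheel for anything else) could look something like the following Python sketch. The specific colors in `KNOWN_COLORS` and the function name `build_palette` are arbitrary assumptions for illustration, not a convention used by JBrowse or Megalodon.

```python
import colorsys

# Hypothetical fixed colors for two common modification codes.
KNOWN_COLORS = {
    "m": "#d62728",  # 5mC
    "h": "#9467bd",  # 5hmC
}


def build_palette(mod_codes):
    """Assign a hex color to every modification code.

    Known codes keep their fixed color; unknown codes are spread evenly
    around the hue wheel in the order they first appear.
    """
    unknown = [code for code in dict.fromkeys(mod_codes) if code not in KNOWN_COLORS]
    palette = {code: KNOWN_COLORS[code] for code in mod_codes if code in KNOWN_COLORS}
    for index, code in enumerate(unknown):
        hue = index / len(unknown)
        r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
        palette[code] = "#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)
        )
    return palette


palette = build_palette(["m", "a", "x"])
```

A real implementation would probably also want to keep wheel colors away from the fixed ones and let users override the mapping, as discussed in the next comment.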
@marcus1487 thanks for the input! As far as the sequence context goes, this is a separate mode, and the generic "Color by modifications" mode should be OK. I made it so that we can now show the color mapping that was chosen, and we could probably allow a user-selectable color picker or a pre-configured config file. We merged this to our main branch, pending release; there will probably be more refinements, but I'm excited thus far :)
This will probably get released shortly. I may close the issue for now, but I look forward to seeing more data etc. You can post to new issues or discussion threads (https://github.com/GMOD/jbrowse-components/discussions) to continue the conversation!
@a-slide @cmdcolin I found this issue when trying to figure out how to display a bedMethyl track. It was derived from ONT. The format is described here: https://github.com/nanoporetech/modkit#bedmethyl-column-descriptions and this is a head of the file:
@nextgenusfs good question. The existing work on modifications so far has been displaying the data directly on the reads and not on bedMethyl, but it would be good to make a bedMethyl example as well. It can probably be loaded as a plain BED tabix or bigBed file, but would be expected to display in some special way? We could definitely make a new issue for tracking.
Okay, I'll make a new issue -- in this repo, correct?
Yep, that'd be great :) thanks
Hi @GMOD team,
The fields of epigenetics and now epitranscriptomics have evolved very fast in recent years and there are now a few DNA and RNA modifications which can be detected at single molecule resolution, in particular with Oxford Nanopore sequencing.
To avoid the proliferation of non-standard formats for storing these modifications, there was a long discussion with the samtools team on how to represent all modifications within the SAM/BAM format (samtools/hts-specs#362). This led to a proposal to add optional tags to the specification that store both modification positions (Mm) and probabilities (Ml) at the read level. A single read can also carry multiple modification types. Here is the link to the Pull Request currently being considered: samtools/hts-specs#418. Hopefully the PR will be merged soon and will become part of the official SAM specification.
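To make the tag layout concrete, here is a minimal Python sketch of decoding the two tags for a single read. It follows the layout as I understand it from the SAM tags specification (where the tags are written MM and ML): each MM entry names a canonical base, a strand, and one or more modification codes, followed by counts of how many occurrences of that base to skip before the next modified one, and ML holds one 0 to 255 value per call. The function name is hypothetical, only `+` strand entries are handled, and the midpoint formula for converting ML bytes to probabilities is one common choice, not the only one.

```python
def parse_mm_ml(seq, mm, ml):
    """Return a list of (read_position, mod_code, probability) tuples.

    seq: read sequence in the orientation it was sequenced.
    mm:  MM tag value, e.g. "C+m,1,0;".
    ml:  list of ints in 0..255, one per modification call, in MM order.
    """
    calls = []
    ml_index = 0
    for entry in mm.split(";"):
        if not entry:
            continue
        header, *deltas = entry.split(",")
        base, strand = header[0], header[1]
        codes = header[2:].rstrip("?.")
        if strand != "+":
            raise NotImplementedError("this sketch only handles '+' entries")
        # A numeric (ChEBI) code is one code; letters are one code each.
        code_list = [codes] if codes.isdigit() else list(codes)
        # Read positions of the canonical base ("N" matches any base).
        base_positions = [i for i, b in enumerate(seq) if base == "N" or b == base]
        cursor = 0
        for delta in deltas:
            if not delta:
                continue
            cursor += int(delta)  # skip this many occurrences of the base
            position = base_positions[cursor]
            for code in code_list:
                # Midpoint of the probability bin that the byte represents.
                calls.append((position, code, (ml[ml_index] + 0.5) / 256))
                ml_index += 1
            cursor += 1
    return calls


# The Cs are at read positions 1, 4 and 5; skip one C, then call the next two.
calls = parse_mm_ml("ACGTCCGA", "C+m,1,0;", [255, 128])
```

A per-read display would then map each returned probability to a color or transparency at the corresponding read position.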
Several Oxford Nanopore tools to detect RNA modifications already implement the specification ahead of the official release, including Megalodon (https://github.com/nanoporetech/megalodon) and Guppy basecaller.
An ideal implementation in JBrowse would display one track per modification and per read, showing the probability of a modification's presence as a color scale, or maybe multiple modifications overlaid with different color scales. There is another old issue, GMOD/jbrowse#509, discussing a way to display methylation information, but this is per position and not per read.
Does that sound like it could be easily implemented as a JBrowse plugin?