Downsampling of reads #213

Open
cwuensch opened this issue Feb 7, 2017 · 9 comments

cwuensch commented Feb 7, 2017

We are using Biodalliance genome browser with high-coverage bam-files (up to 10,000 reads per base pair).
By default, the limit of reads to be displayed is set to 100 (and there has to be some limit, because rendering becomes terribly slow otherwise).
The problem is that the genome browser then seems to take just the first 100 reads. In a recent case, not a single read covering the locus in question was displayed, only reads that started right at the current position. In other cases you may see only wildtype reads while the mutated ones get clipped.
Could you implement some form of statistical downsampling? E.g. selecting the reads to be displayed at random, or taking every 100th read, or something like that?

dasmoth (Owner) commented Feb 7, 2017

There isn't a built-in option for this, but if you configure your tracks programmatically, you can do this via a plugin.

Something like:

```javascript
function readDownsampler(featureSets) {
    const reads = featureSets[0];
    const sampledReads = [];
    // Keep every 10th read.
    for (let i = 0; i < reads.length; i += 10) {
        sampledReads.push(reads[i]);
    }
    return sampledReads;
}
```

...then configure your source with...

```javascript
{
    name: 'Downsample test',
    bamURI: '/path/to/data.bam',
    merge: readDownsampler
}
```

A slight downside is that if you return to an almost, but not quite, identical region of the genome, you'll end up seeing a different subset of the reads. If this matters, it might be better to sample based on, e.g., an MD5 hash of the read ID instead.
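The hash-based idea above can be sketched as follows. This is an untested illustration, not part of Biodalliance: it swaps MD5 for a lightweight djb2-style string hash (any stable hash works for this purpose), and it assumes each read feature exposes its read name via an `id` property — inspect the objects your build actually passes to `merge` to confirm.

```javascript
// djb2-style string hash, used here as a cheap stand-in for MD5.
// Deterministic: the same read ID always hashes to the same value.
function hashString(s) {
    let h = 5381;
    for (let i = 0; i < s.length; i++) {
        h = ((h * 33) ^ s.charCodeAt(i)) >>> 0;
    }
    return h;
}

// Keep a read iff its hashed ID falls in one residue class mod 10
// (roughly 1 in 10 reads), so the same reads survive on every view
// of the region, regardless of where the fetch window starts.
// NOTE: `read.id` as the read-name property is an assumption.
function stableReadDownsampler(featureSets) {
    const reads = featureSets[0];
    return reads.filter(function (read) {
        return hashString(String(read.id)) % 10 === 0;
    });
}
```

Because the decision depends only on the read's own ID, scrolling slightly left or right no longer reshuffles which reads are shown.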

Having said all that, I think it's a great idea to have something along these lines built into the core -- so will leave this issue open for now.

cwuensch (Author) commented

Thank you for this great solution!
Unfortunately, something goes wrong here:
When I copy this code exactly as described, the function readDownsampler never gets called (I inserted some debug log output, which never gets printed).
When I write it with parentheses, i.e. merge: readDownsampler(), the function does get called, but featureSets is undefined.
What can I do about this issue?

cwuensch (Author) commented

And another question: is it possible to access the user-defined limit variable from within this function?
With that, the downsampling could adapt to the read limit the user defines in the config dialog.
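If it is acceptable to fix the limit at configuration time rather than read it from the track editor, one plain-JavaScript workaround is a factory that closes over the desired count. The function name `makeReadDownsampler` and the evenly-spaced selection strategy are hypothetical illustrations; only the `merge`-function signature comes from the thread above.

```javascript
// Factory that builds a merge-compatible sampler capped at maxReads.
// The limit is baked in when the track is configured; this does NOT
// read the track editor's "limit" field (no public hook assumed here).
function makeReadDownsampler(maxReads) {
    return function (featureSets) {
        const reads = featureSets[0];
        if (reads.length <= maxReads) {
            return reads;
        }
        // Take evenly spaced reads so the sample spans the whole window.
        const step = reads.length / maxReads;
        const sampled = [];
        for (let i = 0; i < maxReads; i++) {
            sampled.push(reads[Math.floor(i * step)]);
        }
        return sampled;
    };
}

// Usage in a track configuration:
// {
//     name: 'Downsample test',
//     bamURI: '/path/to/data.bam',
//     merge: makeReadDownsampler(500)
// }
```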

dasmoth (Owner) commented Feb 15, 2017

Sorry for the confusion -- the example I sent is something that really ought to work, but currently doesn't because of the way two features (combining multiple data sources in one track, and applying arbitrary filters to data) are coupled together.

The following version is actually tested :-)

```javascript
{
    name: 'Downsample test',
    overlay: [{bamURI: '/path/to/data.bam'}],
    merge: readDownsampler
}
```

(The readDownsampler function itself is fine.) I'm going to tweak things so that the example as I originally wrote it actually works -- but that might not happen right away.

cwuensch (Author) commented

Great! This solution works indeed for filtering the read data.

But... sorry that I have to ask questions again...

We want to use this in combination with
(a) a BAM index file (.bai),
(b) a style sheet configuration that enables "Highlight mismatches and strands" by default, and
(c) a readDownsampler() function that takes the user-configured read limit into account.

So far I have not been able to figure out where to place the style sheet information so that it works correctly with the overlay command. When we put it below the merge command, the checkbox "Highlight mismatches and strands" gets checked, but the corresponding style does not seem to be applied. Even manually unchecking and re-checking the checkbox does not apply the style.
Do you have an idea how to solve this?

Additionally, is there a possibility to read out the user configured read limit in order to use it in the downsampling-function?

dasmoth (Owner) commented Feb 18, 2017

Re: mismatch colouring...

Thanks for spotting this. It sounds like your config is fine, but some logic that's used to determine whether reference sequence data needs to be threaded through to a given track's renderer was failing when your custom filter was applied. This has been fixed in the git-latest version.

Re: user-configurability of the custom filter...

Do you want to be able to configure this at run time (via a custom field in the track editor)? There's currently no way of doing this, but I'd certainly agree it would be nice!

cwuensch (Author) commented

Thanks again!

Re: mismatch colouring...
After building the latest version from git, the mismatch colouring works fine.
BUT, a new issue has appeared with the latest version...
The "cursor" indicating the middle position is not positioned correctly after calling the API function SetLocation().
Furthermore, the cursor "jumps" back and forth when the user opens the configuration dialog. That is kind of... weird.

Re: custom filter
Actually, I do not really need to let the user configure the read downsampling limit in the track editor.
BUT, there already IS a field "limit" in the track editor, pre-configured with 100, which will be non-functional if we cannot read its value from the downsampling function.
Furthermore, the limit of 100 is not very suitable for us. If it is a hard limit that the user cannot change, it would be nice if we could at least pre-configure it with a higher value, like 500.
Is there a style option for changing this value?

dasmoth (Owner) commented Feb 22, 2017

I'm concerned about what you say regarding the "cursor" (do you mean the vertical position indicator in the middle of the browser area?). Could you send a screenshot or two to illustrate this (offline to thomas.a.down@gmail.com is fine if you prefer).

Regarding the "bumping limit", it can be configured as a top-level (not stylesheet) option on a track configuration:

```javascript
{
    name: 'my track',
    bamURI: '...',
    subtierMax: 500
}
```

cwuensch (Author) commented Feb 28, 2017

Thanks for the solution to increase the bumping limit!

Regarding the "cursor":
Right, I am talking about the position indicator in the middle of the browsing area.

1.) When I change the displayed position via the API SetLocation(), the cursor is displayed at a wrong location (and, I think, with a wrong width):
[screenshot: 1 - wrongcursor]

2.) Opening the configuration panel causes the cursor to jump to the left (it seems the panel's width is subtracted from the browser's width, and the cursor is rendered in the middle of the reduced width):
[screenshot: 2 - panelopen]

3.) Closing the config panel finally displays the cursor at the correct position:
[screenshot: 3 - panelclosed]
