Allow the OverlapDetector to be generic over input feature types #34
I worry about the performance of this PR. See #6 for an example of how we may want to test this.
I'll find a way to test that! I don't think there will be any performance issues when using the OverlapDetector, but building one may be a little slower, since each object you add must be checked for a reference name property.
@clintval that's the case I'm worried about. I wonder if we can avoid it somehow.
@clintval I skimmed the PR briefly and wanted to suggest an alternative, to see if you had considered and discarded it. The OverlapDetector is named for the same construct from HTSJDK ... and that version does things slightly differently. Beyond the fact that it can take any implementation of … At its simplest, you can think of it as:
I.e., for every interval you associate the object you'd like returned when a query overlaps that interval. I think it fixes some, but not all, of your issues; primarily, you don't need to keep an external mapping from interval back to your source type.
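A rough Python sketch of that association, in the spirit of the HTSJDK design but not a port of it: `PayloadOverlapDetector`, `Interval` and the `add(payload, interval)` signature are hypothetical, and overlaps are found with a linear scan per reference name rather than an interval tree.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Generic, List, Tuple, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Interval:
    """Hypothetical minimal interval: 0-based, half-open [start, end)."""

    refname: str
    start: int
    end: int


class PayloadOverlapDetector(Generic[T]):
    """Hypothetical sketch: each stored interval carries the object to return on overlap."""

    def __init__(self) -> None:
        # reference name -> list of (interval, payload) pairs
        self._by_ref: Dict[str, List[Tuple[Interval, T]]] = defaultdict(list)

    def add(self, payload: T, interval: Interval) -> None:
        # Store the interval alongside the object the caller wants back.
        self._by_ref[interval.refname].append((interval, payload))

    def get_overlaps(self, query: Interval) -> List[T]:
        # Half-open intervals overlap when each one starts before the other ends.
        return [
            payload
            for interval, payload in self._by_ref.get(query.refname, [])
            if interval.start < query.end and query.start < interval.end
        ]


detector: PayloadOverlapDetector[str] = PayloadOverlapDetector()
detector.add("primer-1", Interval("chr1", 100, 120))
detector.add("primer-2", Interval("chr1", 500, 520))
print(detector.get_overlaps(Interval("chr1", 110, 130)))  # ['primer-1']
```

Each stored entry holds both an `Interval` and the payload, which is the extra per-feature object whose memory cost is weighed in the following comments.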
@tfenne I thought about that too but didn't consider it closely, because for large interval collections it seems it could roughly double your memory footprint by keeping a sibling object for every object you actually want to query. Does that seem right?
I think that's right @clintval: more memory for that solution versus likely more CPU for the indirect function calls.
@clintval another solution is to subclass …
I like it! That is a good way to use the existing classes and functions without creating a dictionary, and it would work well for custom genomic features. It would still require making an interval companion object for every genomic feature and wouldn't work for third-party genomic features (where I cannot define the class hierarchy), but it's a little more elegant than what I've been doing.
Codecov Report: All modified and coverable lines are covered by tests ✅

```diff
@@            Coverage Diff             @@
##             main      #34      +/-   ##
==========================================
+ Coverage   91.84%   95.25%   +3.40%
==========================================
  Files           8        8
  Lines         552      674     +122
  Branches       97      119      +22
==========================================
+ Hits          507      642     +135
+ Misses         28       18      -10
+ Partials       17       14       -3
```
Nice work!
I'm interested in what others think, given that the OverlapDetector isn't an ergonomic utility, especially for custom interval types, which I often implement myself or receive from a third-party package/API.
Goals
Example with Existing API
The following demonstrates using a custom interval type with the OverlapDetector:
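A minimal, self-contained sketch of that pattern. Every name here (`Interval`, `IntervalOnlyDetector`, `CustomPrimer`, its `primer_id` field and `to_interval()` method, `primers_by_id`) is a hypothetical stand-in, not the library's actual API: the detector stores and returns only `Interval` objects, so each primer is converted, its key is stashed in `name`, and hits are mapped back through a dictionary.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Interval:
    """Hypothetical stand-in for the library's interval type (0-based, half-open)."""

    refname: str
    start: int
    end: int
    name: str = ""


class IntervalOnlyDetector:
    """Hypothetical stand-in for a detector that stores and returns only Interval objects."""

    def __init__(self) -> None:
        self._intervals: List[Interval] = []

    def add(self, interval: Interval) -> None:
        self._intervals.append(interval)

    def get_overlaps(self, query: Interval) -> List[Interval]:
        # Linear scan: same reference and half-open overlap.
        return [
            i
            for i in self._intervals
            if i.refname == query.refname and i.start < query.end and query.start < i.end
        ]


@dataclass(frozen=True)
class CustomPrimer:
    """Hypothetical custom feature type."""

    refname: str
    start: int
    end: int
    primer_id: str

    def to_interval(self) -> Interval:
        # Stash the primary key in `name` so the original object can be recovered later.
        return Interval(self.refname, self.start, self.end, name=self.primer_id)


primers = [CustomPrimer("chr1", 100, 120, "p1"), CustomPrimer("chr1", 500, 520, "p2")]

# External mapping from primary key back to the source object.
primers_by_id: Dict[str, CustomPrimer] = {p.primer_id: p for p in primers}

detector = IntervalOnlyDetector()
for primer in primers:
    detector.add(primer.to_interval())

# Query with an Interval, get Intervals back, then map each one back via its name.
hits = [primers_by_id[i.name] for i in detector.get_overlaps(Interval("chr1", 110, 130))]
print(hits)  # [CustomPrimer(refname='chr1', start=100, end=120, primer_id='p1')]
```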
What makes this challenging to use is:
- `OverlapDetector` only supports storing `Interval` objects
- `OverlapDetector` can only be queried with `Interval` objects
- `BedRecord` doesn't work with `OverlapDetector` directly; you need a `to_interval()` converter or the `from_bedrecord()` converter
- To get your own objects back, you must use the `name` field of `Interval` as a key and perform a dictionary lookup (as shown above). If you don't have a primary key, you will need to make one.

Example with this PR
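A minimal sketch of the idea, under assumptions not taken from the PR: a `typing.Protocol` named `Span` describing anything with `refname`, `start` and `end`, and a hypothetical `GenericOverlapDetector[SpanType]` that stores conforming objects, accepts any span-like query, and returns the stored type unchanged. It uses a linear scan rather than an interval tree, and the per-object `isinstance` check in `add()` illustrates the kind of build-time cost raised in the review.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Generic, List, Protocol, TypeVar, runtime_checkable


@runtime_checkable
class Span(Protocol):
    """Hypothetical protocol: a reference name plus 0-based, half-open coordinates."""

    @property
    def refname(self) -> str: ...

    @property
    def start(self) -> int: ...

    @property
    def end(self) -> int: ...


SpanType = TypeVar("SpanType", bound=Span)


class GenericOverlapDetector(Generic[SpanType]):
    """Hypothetical sketch: stores SpanType objects and returns them unchanged on overlap."""

    def __init__(self) -> None:
        self._by_ref: Dict[str, List[SpanType]] = defaultdict(list)

    def add(self, feature: SpanType) -> None:
        # Runtime check on every added object: this is the extra cost paid when building.
        if not isinstance(feature, Span):
            raise TypeError(f"{feature!r} has no refname/start/end")
        self._by_ref[feature.refname].append(feature)

    def get_overlaps(self, query: Span) -> List[SpanType]:
        # Any span-like object can be the query; results keep the stored type.
        return [
            f
            for f in self._by_ref.get(query.refname, [])
            if f.start < query.end and query.start < f.end
        ]


@dataclass(frozen=True)
class Interval:
    refname: str
    start: int
    end: int


@dataclass(frozen=True)
class CustomPrimer:
    refname: str
    start: int
    end: int
    primer_id: str


detector: GenericOverlapDetector[CustomPrimer] = GenericOverlapDetector()
detector.add(CustomPrimer("chr1", 100, 120, "p1"))
detector.add(CustomPrimer("chr1", 500, 520, "p2"))

# Query with an Interval; the hits are CustomPrimer objects, with no dictionary lookup.
hits = detector.get_overlaps(Interval("chr1", 110, 130))
print(hits)  # [CustomPrimer(refname='chr1', start=100, end=120, primer_id='p1')]
```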
NB: although we are using an `Interval` to query the OverlapDetector, the type system is aware that we built the OverlapDetector with `CustomPrimer` types, so all overlap functions will always return `CustomPrimer` objects. A user could alternatively query the overlap detector with any other interval-like feature and still receive the overlapping `CustomPrimer` objects.

NB: the BED spec notation BED3+1 is described here
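BED3+1 means the three required BED columns (chrom, chromStart, chromEnd) plus one extra column. As a rough illustration, the sketch below parses one tab-delimited BED3+1 line into a `CustomPrimer`; the field names and the assumption that the fourth column holds a primer ID are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomPrimer:
    """Hypothetical BED3+1 record: chrom, chromStart, chromEnd, plus one extra column."""

    refname: str
    start: int  # BED chromStart: 0-based, inclusive
    end: int  # BED chromEnd: 0-based, exclusive
    primer_id: str  # the "+1" custom column (assumed to be an ID)


def parse_bed3_plus1(line: str) -> CustomPrimer:
    """Parse one tab-delimited BED3+1 line into a CustomPrimer."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 4:
        raise ValueError(f"expected 4 tab-delimited columns, got {len(fields)}: {line!r}")
    chrom, start, end, primer_id = fields
    return CustomPrimer(chrom, int(start), int(end), primer_id)


print(parse_bed3_plus1("chr1\t100\t120\tp1\n"))
# CustomPrimer(refname='chr1', start=100, end=120, primer_id='p1')
```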