Support for finding regions overlapped by two different IntervalTrees #31
base: master
@@ -437,6 +437,17 @@ def intersection_update(self, other):
            if iv not in other:
                self.remove(iv)

    def overlap_intervals(self,other):
        """
        Returns a new IntervalTree consisting of intervals representing the
        regions overlapped by at least one interval in both of self and other.
        """
        splits = (self | other)
        splits.split_overlaps()
        self_int_other = IntervalTree(filter(lambda r: self.overlaps(r) and other.overlaps(r), splits))
        self_int_other.merge_overlaps()

        return self_int_other

    def symmetric_difference(self, other):
        """
        Return a tree with elements only in self or other but not

Review comment on `self_int_other.merge_overlaps()`:

Is there a reason that the overlaps have to be merged? This causes a new tree to be created a second time, which is O(n log n).

Reply:

Mainly, I was after API compatibility with the R functions that do the same thing (in this case, the IRanges intersect() function; unfortunately, 'intersect' is already used with different semantics in this library). My own use cases for this kind of function always involve merging: I want to do computations on the parts two sets of ranges have in common, and I rarely want to do that computation on the same interval twice, so merging makes sense. I can imagine uses where that's not the case. I port a lot of code between R and Python, so having something that behaves the same way in both was useful to me.
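To see what the merge step changes, here is a library-free sketch of the same split, filter, merge pipeline on plain half-open `(begin, end)` tuples. It does not call intervaltree, it ignores interval data, and the helper names (`split_at_boundaries`, `overlaps_any`, `overlap_pieces`, `merge_touching`) are hypothetical; it only mirrors the steps in the diff.

```python
def split_at_boundaries(intervals, bounds):
    """Cut each half-open (begin, end) interval at every boundary strictly
    inside it, the way split_overlaps() cuts a tree's intervals."""
    pieces = set()
    for begin, end in intervals:
        cuts = [begin] + [p for p in bounds if begin < p < end] + [end]
        pieces.update(zip(cuts, cuts[1:]))
    return pieces


def overlaps_any(intervals, piece):
    """True if some half-open interval shares at least one point with piece."""
    begin, end = piece
    return any(b < end and begin < e for b, e in intervals)


def overlap_pieces(a, b):
    """Split the union at all boundaries, keep pieces overlapped by both inputs."""
    bounds = sorted({p for iv in a + b for p in iv})
    splits = split_at_boundaries(a + b, bounds)
    return sorted(p for p in splits if overlaps_any(a, p) and overlaps_any(b, p))


def merge_touching(pieces):
    """Join pieces that overlap or touch into maximal regions.
    Assumption: touching pieces are joined as well; check how merge_overlaps()
    treats touching intervals in the intervaltree version you use."""
    merged = []
    for begin, end in sorted(pieces):
        if merged and begin <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((begin, end))
    return merged


a = [(0, 10)]
b = [(2, 5), (4, 8)]
print(overlap_pieces(a, b))                  # [(2, 4), (4, 5), (5, 8)]
print(merge_touching(overlap_pieces(a, b)))  # [(2, 8)]
```

Without the merge, the result is fragmented at every boundary contributed by either input; merging collapses the fragments into one region per stretch of common coverage, which is the behaviour the reply describes wanting.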
Review comment:
Splitting the overlaps is currently very inefficient, O(n^2 log n); it should be possible to calculate interval intersections without this step.
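The quadratic term is easy to see with nested intervals: `split_overlaps()` cuts every interval at every boundary that falls inside it, so n nested intervals become n^2 pieces before any filtering happens, and each piece is then inserted into a tree. The sketch below only counts those pieces on plain `(begin, end)` tuples; `split_piece_count` and `nested` are hypothetical helpers, not part of intervaltree, and the counter itself is brute force.

```python
def split_piece_count(intervals):
    """Count the pieces produced by cutting every half-open (begin, end)
    interval at each boundary that falls strictly inside it."""
    bounds = sorted({p for iv in intervals for p in iv})
    total = 0
    for begin, end in intervals:
        inside = sum(1 for p in bounds if begin < p < end)
        total += inside + 1
    return total


def nested(n):
    """n nested intervals: (0, 2n), (1, 2n - 1), ..., (n - 1, n + 1)."""
    return [(i, 2 * n - i) for i in range(n)]


for n in (10, 100, 1000):
    # Prints n and n * n: 10 100, then 100 10000, then 1000 1000000.
    print(n, split_piece_count(nested(n)))
```

Interval i contains 2(n - 1 - i) interior boundaries, so summing 2k + 1 over k = 0..n-1 gives exactly n^2 pieces.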
Reply:
Hmm, I am sure you are right, but implementing it using the internals is beyond my expertise, given the time I have available in the near future. I am happy for you not to merge this change if the performance worries you. I only offered it because I use it very often myself, since I port a lot of code from other libraries that have this function, and in practice it works well enough for me that I felt it might be useful to contribute.
Comment:
@chaimleib Hello, I am really interested in this method.
Do you have any hints for finding interval intersections without splitting the overlaps?
Reply:
Iterating through the boundary tables and testing with `a.overlaps(point) and b.overlaps(point)` should yield worst case O((n+m) log(n+m)).
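The boundary-sweep suggestion can be sketched without touching intervaltree's internals. This version works on plain half-open `(begin, end)` tuples, and the hypothetical `covers()` helper (a binary search over each input's merged intervals) stands in for the tree's `overlaps(point)` query; `merge`, `covers` and `overlap_regions` are illustrative names, not library API.

```python
from bisect import bisect_right


def merge(intervals):
    """Sort and merge overlapping or touching half-open (begin, end) intervals."""
    merged = []
    for begin, end in sorted(intervals):
        if merged and begin <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((begin, end))
    return merged


def covers(merged, point):
    """Hypothetical stand-in for tree.overlaps(point): binary search for the
    last merged interval starting at or before point, then check its end."""
    i = bisect_right(merged, (point, float("inf"))) - 1
    return i >= 0 and merged[i][0] <= point < merged[i][1]


def overlap_regions(a, b):
    """Regions covered by at least one interval of a and one interval of b."""
    ma, mb = merge(a), merge(b)
    bounds = sorted({p for iv in a + b for p in iv})
    regions = []
    # Nothing starts or ends strictly between consecutive boundaries, so the
    # left end of each segment decides whether the whole segment is covered.
    for lo, hi in zip(bounds, bounds[1:]):
        if covers(ma, lo) and covers(mb, lo):
            if regions and regions[-1][1] == lo:
                regions[-1] = (regions[-1][0], hi)  # extend the previous region
            else:
                regions.append((lo, hi))
    return regions


print(overlap_regions([(0, 10)], [(2, 5), (4, 8)]))          # [(2, 8)]
print(overlap_regions([(1, 3), (5, 9)], [(2, 6), (8, 12)]))  # [(2, 3), (5, 6), (8, 9)]
```

Sorting the boundaries dominates and each point test is a binary search, so this stays within the O((n+m) log(n+m)) bound mentioned above, and no intermediate tree of split pieces is built.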