Support for finding regions overlapped by two different IntervalTrees #31

Open
wants to merge 1 commit into
base: master
11 changes: 11 additions & 0 deletions intervaltree/intervaltree.py
100644 → 100755
@@ -437,6 +437,17 @@ def intersection_update(self, other):
            if iv not in other:
                self.remove(iv)

    def overlap_intervals(self,other):
        """
        Returns a new IntervalTree consisting of intervals representing the
        regions overlapped by at least one interval in both of self and other.
        """
        splits = (self | other)
        splits.split_overlaps()
Owner

Splitting the overlaps is currently very inefficient, O(n^2 log n). It should be possible to calculate interval intersections without this step.

Author

Hmm, I am sure you are right ... but it goes beyond my expertise to implement it using the internals, with the time I have available in the near future. I am happy for you not to merge this change if the performance worries you - I only offered it because I am using it very often myself due to porting a lot of code over from other libraries which have this function, and in practice it works for me well enough that I felt it might be useful to contribute.

@chaimleib Hello, I am really interested in this method.
Do you have any hints for finding interval intersections without splitting the overlaps?

Owner

Iterating through the boundary tables and testing with a.overlaps(point) and b.overlaps(point) should yield a worst case of O((n+m) log(n+m)).
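A rough sketch of that boundary-sweep idea, for illustration only. It does not touch the library's internals: intervals are plain half-open `(begin, end)` tuples, and running depth counters stand in for the `a.overlaps(point)` / `b.overlaps(point)` queries. The function name `overlap_regions` is hypothetical.

```python
def overlap_regions(a, b):
    """Return sorted half-open (begin, end) regions covered by both a and b.

    a and b are iterables of (begin, end) tuples. This is an illustrative
    sketch, not the intervaltree API.
    """
    # One event per boundary: +1 where an interval opens, -1 where it closes.
    events = []
    for source, intervals in enumerate((a, b)):
        for begin, end in intervals:
            if begin < end:  # ignore empty or inverted intervals
                events.append((begin, 1, source))
                events.append((end, -1, source))

    # Sort by position; at equal positions process openings before closings
    # so that touching intervals from the same side form one region.
    events.sort(key=lambda e: (e[0], -e[1]))

    depth = [0, 0]   # how many intervals of a / b cover the current point
    regions = []
    start = None
    for point, delta, source in events:
        was_both = depth[0] > 0 and depth[1] > 0
        depth[source] += delta
        is_both = depth[0] > 0 and depth[1] > 0
        if is_both and not was_both:
            start = point
        elif was_both and not is_both and start < point:
            # Zero-length regions (intervals that merely touch) are skipped.
            regions.append((start, point))
    return regions


print(overlap_regions([(0, 10), (20, 30)], [(5, 25)]))  # [(5, 10), (20, 25)]
```

The sort dominates the cost, so the sketch runs in O((n+m) log(n+m)), and the regions come out already merged, with no separate split or merge pass.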

        self_int_other = IntervalTree(filter(lambda r: self.overlaps(r) and other.overlaps(r), splits))
        self_int_other.merge_overlaps()
Owner

Is there a reason that the overlaps have to be merged? Merging builds a new tree a second time, which is O(n log n).

Author

Mainly, I was after API compatibility with the R functions that do the same thing (in this case, the IRanges intersect() function; unfortunately 'intersect' is already used with different semantics in this library). My own use case for this kind of function always involves merging, because I want to do computations on the parts in common between two sets of ranges and I rarely want to do that computation on the same interval twice, so merging makes sense. I can imagine there could be uses where that's not the case. I tend to port a lot of code between R and Python, so having something that behaves the same way in both was useful to me.
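To make the "same interval twice" point concrete, here is a small sketch using plain half-open `(begin, end)` tuples rather than the library's types. The helper names `pairwise_intersections` and `merge_regions` are hypothetical, and this merge also joins regions that merely touch, which is an assumption rather than a statement about `merge_overlaps()`.

```python
def pairwise_intersections(a, b):
    """Intersect every interval in a with every interval in b (unmerged)."""
    pieces = []
    for a_begin, a_end in a:
        for b_begin, b_end in b:
            begin, end = max(a_begin, b_begin), min(a_end, b_end)
            if begin < end:
                pieces.append((begin, end))
    return sorted(pieces)


def merge_regions(regions):
    """Merge overlapping or touching regions into disjoint ones."""
    merged = []
    for begin, end in sorted(regions):
        if merged and begin <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((begin, end))
    return merged


pieces = pairwise_intersections([(0, 10), (5, 15)], [(3, 12)])
print(pieces)                 # [(3, 10), (5, 12)]
print(merge_regions(pieces))  # [(3, 12)]
```

Without the merge, the stretch from 5 to 10 appears in two pieces, so any per-region computation would count it twice.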

        return self_int_other

    def symmetric_difference(self, other):
        """
        Return a tree with elements only in self or other but not