Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Have .add(value) return the index for value #6

Closed
riggs opened this issue Jun 22, 2014 · 3 comments
Closed

Have .add(value) return the index for value #6

riggs opened this issue Jun 22, 2014 · 3 comments

Comments

@riggs
Copy link

riggs commented Jun 22, 2014

It would be very useful for the .add methods on both the SortedList & SortedSet classes to return the new index at which the added item exists in the logical array.

@grantjenks
Copy link
Owner

Can you describe your use case in more detail?

The .add method doesn't compute the new index at which the item was added. Adding this functionality would decrease the performance of the current method.

Returning that index would be pretty similar to:

def add(self, val):
    self._original_add(val)
    return self.index(val)

You might simply monkey-patch the add method yourself to suit your needs. I could also imagine others wanting the added value returned or the list/set itself.

@riggs
Copy link
Author

riggs commented Jun 25, 2014

The use case is timeseries-sorted data. I have multiple data streams, each
containing multiple data points, and I want to combine them simply without
storing the timestamp with each piece of data.

Also, looking at the code for your library, I don't think this would impact
performance. The call to bisect.insort is effectively a call to
bisect.bisect followed by list.insert. All that would be required would be
making these two calls directly rather than via insort and doing a bit of
math to adjust the index for which 'bin' it's in.

@grantjenks
Copy link
Owner

This seems like a pretty ideal case for subclassing and getting the behavior you want:

from sortedcontainers import SortedList as OriginalSortedList

class SortedList(OriginalSortedList):
    def add(self, val):
        super(SortedList, self).add(val)
        return self.index(val)

The insort function is also about 20% faster than bisect followed by insert:

>>> from timeit import timeit
>>> def test(command, setup):
...     times = [timeit(command, setup=setup, number=1) for rpt in xrange(99)]
...     times.sort()
...     avg = sum(times) / len(times)
...     return 'median:', times[49], 'average:', avg
>>> test('insort(l, 42)', 'from bisect import insort; l = list(xrange(100))')
('median:', 9.5367431640625e-07, 'average:', 1.2715657552083333e-06)
>>> test('pos = bisect(l, 42); l.insert(pos, 42)', 'from bisect import bisect; l = list(xrange(100))')
('median:', 1.1920928955078125e-06, 'average:', 1.625581221147017e-06)

Also, in most cases it should be just "a bit of math to adjust the index" if the _index cache is valid. But in case it's not, iterating the lengths of the segments is required. This can be a slow process.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants