Skip to content

PERF: investigate performance impact of iNaTs on tz-aware operations #24603

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
qwhelan opened this issue Jan 3, 2019 · 3 comments
Closed

PERF: investigate performance impact of iNaTs on tz-aware operations #24603

qwhelan opened this issue Jan 3, 2019 · 3 comments
Labels
Datetime Datetime data dtype Performance Memory or execution speed performance Timezones Timezone data dtype

Comments

@qwhelan
Copy link
Contributor

qwhelan commented Jan 3, 2019

Problem description

Placeholder issue for use of np.searchsorted in pd._libs.tslibs.conversion.tz_convert_dst() introduced in #22181 and modified in #24491. The latter indicates that the iNaT optimization comes at high cost for notnull data, but it's unclear at what fraction null should we attempt to optimize by skipping the np.searchsorted() call entirely.

@jbrockmendel
Copy link
Member

@qwhelan thanks for opening this. Do you have any idea how expensive the searchsorted calls are? i.e. are we sure this is worth optimizing?

Some more context: we have a ~7 places in tslibs that do something like this:

    cdef ndarray[int64_t] pos
 
   trans, deltas, typ = get_dst_info(tz)

    for i in range(N):
        val = arr[i]
        if val == iNAT:
            result[i] = iNaT
        else:
            pos = trans.searchsorted(val, side='right') - 1
            result[i] = some_func_of(val + deltas[pos])

The alternative would be to do the trans.searchsorted call outside of the loop. So the tradeoff is more calls vs unnecessary calculations when there are many iNaT values

Some notes based on the last time I looked at this:

  • trans.searchsorted is a python-space call, so has more overhead than a C-space call would
  • cnp.PyArray_SearchSorted's signature does not seem to care about whether it takes a scalar or an array. Presumably there is a lower-level function that takes only the scalar. If we could call that instead, that'd shave some overhead.
  • At one point there was a problem with cython's numpy.pxd file not having the right signature for PyArray_SearchSorted, or NumPy changing the signature, or something. Not sure if this has been resolved. https://stackoverflow.com/questions/28184211/calling-pyarray-searchsorted-from-cython-3-or-4-arguments/28184301
  • If we were to make it one call up-front, there might be some mileage to be gained from the fact that often we are passing arrays known to be monotonic. To actually get that mileage would probably require implementing our own cython searchsorted, which I would not want to do unless profiling shows it would be a huge perf win.
  • If we were to go down that road, caching might also be useful since it is the same handful of trans arrays being searched repeatedly.
  • IIRC searchsorted is the last thing left in tslibs that makes us cimport ndarray. Without that, we might be able to use memoryviews more directly, which might have perf or compat upsides.

Big picture, we need to get a handle on how costly these searchsorted calls are.

@qwhelan
Copy link
Contributor Author

qwhelan commented Jan 3, 2019

@jbrockmendel Thanks for your notes - I agree on most points. A couple points of clarification regarding numpy's implementation:

@mroeschke mroeschke added Datetime Datetime data dtype Performance Memory or execution speed performance Timezones Timezone data dtype labels Jan 13, 2019
@jbrockmendel
Copy link
Member

The relevant code now uses bisect_right_i8, which is optimized about as far as it will go. closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Datetime Datetime data dtype Performance Memory or execution speed performance Timezones Timezone data dtype
Projects
None yet
Development

No branches or pull requests

3 participants