Skip to content

BUG: Avoid undefined behaviour when converting from float to timedelta #28918

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 12, 2019
Merged

BUG: Avoid undefined behaviour when converting from float to timedelta #28918

merged 1 commit into from
Oct 12, 2019

Conversation

siddhesh
Copy link
Contributor

@siddhesh siddhesh commented Oct 11, 2019

Summation of timedelta series with NaTs in them result in undefined
behaviour because the final wrapping step of the summation ends up
converting the NaNs in the sum through a direct cast to int64. This
cast is undefined for NaN and just happens to work on x86_64 because
of the way cvttd2si works. On Aarch64, the corresponding fcvtzs sets
the result to 0 on undefined input.

This fix trivially sets the conversion target to m8 instead of i8 so
that numpy correctly casts from NaN to NaT. Note that the fix in
numpy for the same is pending in PR numpy/numpy#14669 .

There is an existing test (test_sum_nanops_timedelta in
frame/test_analytics.py) that exercises this bug and has been verified
to have been fixed with this and the numpy patch.

  • closes #xxxx
  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@jreback jreback added the Timedelta Timedelta data type label Oct 11, 2019
@jreback
Copy link
Contributor

jreback commented Oct 11, 2019

There is an existing test (test_sum_nanops_timedelta in
frame/test_analytics.py) that exercises this bug and has been verified
to have been fixed with this and the numpy patch.

is there a test that would fail currently on our CI and pass with this patch? or is this just platform specific?

@jreback
Copy link
Contributor

jreback commented Oct 11, 2019

cc @jbrockmendel

@jreback jreback added this to the 1.0 milestone Oct 11, 2019
@siddhesh
Copy link
Contributor Author

There is an existing test (test_sum_nanops_timedelta in
frame/test_analytics.py) that exercises this bug and has been verified
to have been fixed with this and the numpy patch.

is there a test that would fail currently on our CI and pass with this patch? or is this just platform specific?

It looks like all of your CI tests are x86-only, so you won't see this issue. If you run the testsuite on aarch64 though, you'll see test_sum_nanops_timedelta fail without this patch and pass with it.

@@ -360,7 +360,7 @@ def _wrap_results(result, dtype, fill_value=None):

result = tslibs.Timedelta(result, unit="ns")
else:
result = result.astype("i8").view(dtype)
result = result.astype("m8").view(dtype)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would making this m8[ns] break things?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't. I've started a test run on my arm box with it.

@jbrockmendel
Copy link
Member

If you run the testsuite on aarch64 though

Do azure or travis offer such a thing? (Or anyone else?)

@siddhesh
Copy link
Contributor Author

If you run the testsuite on aarch64 though

Do azure or travis offer such a thing? (Or anyone else?)

I believe Travis does now, as of just 4 days ago:

https://blog.travis-ci.com/2019-10-07-multi-cpu-architecture-support

Summation of timedelta series with NaTs in them result in undefined
behaviour because the final wrapping step of the summation ends up
converting the NaNs in the sum through a direct cast to int64.  This
cast is undefined for NaN and just happens to work on x86_64 because
of the way cvttd2si works.  On Aarch64, the corresponding fcvtzs sets
the result to 0 on undefined input.

This fix trivially sets the conversion target to m8 instead of i8 so
that numpy correctly casts from NaN to NaT.  Note that the fix in
numpy for the same is pending in PR #numpy/numpy/14669 .

There is an existing test (test_sum_nanops_timedelta in
frame/test_analytics.py) that exercises this bug and has been verified
to have been fixed with this and the numpy patch.
@jreback jreback merged commit c387a28 into pandas-dev:master Oct 12, 2019
@jreback
Copy link
Contributor

jreback commented Oct 12, 2019

thanks @siddhesh

would be interested in adding a travis build on arm? (can you open a separate issue to track).

proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
pandas-dev#28918)

Summation of timedelta series with NaTs in them result in undefined
behaviour because the final wrapping step of the summation ends up
converting the NaNs in the sum through a direct cast to int64.  This
cast is undefined for NaN and just happens to work on x86_64 because
of the way cvttd2si works.  On Aarch64, the corresponding fcvtzs sets
the result to 0 on undefined input.

This fix trivially sets the conversion target to m8 instead of i8 so
that numpy correctly casts from NaN to NaT.  Note that the fix in
numpy for the same is pending in PR #numpy/numpy/14669 .

There is an existing test (test_sum_nanops_timedelta in
frame/test_analytics.py) that exercises this bug and has been verified
to have been fixed with this and the numpy patch.
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
pandas-dev#28918)

Summation of timedelta series with NaTs in them result in undefined
behaviour because the final wrapping step of the summation ends up
converting the NaNs in the sum through a direct cast to int64.  This
cast is undefined for NaN and just happens to work on x86_64 because
of the way cvttd2si works.  On Aarch64, the corresponding fcvtzs sets
the result to 0 on undefined input.

This fix trivially sets the conversion target to m8 instead of i8 so
that numpy correctly casts from NaN to NaT.  Note that the fix in
numpy for the same is pending in PR #numpy/numpy/14669 .

There is an existing test (test_sum_nanops_timedelta in
frame/test_analytics.py) that exercises this bug and has been verified
to have been fixed with this and the numpy patch.
bongolegend pushed a commit to bongolegend/pandas that referenced this pull request Jan 1, 2020
pandas-dev#28918)

Summation of timedelta series with NaTs in them result in undefined
behaviour because the final wrapping step of the summation ends up
converting the NaNs in the sum through a direct cast to int64.  This
cast is undefined for NaN and just happens to work on x86_64 because
of the way cvttd2si works.  On Aarch64, the corresponding fcvtzs sets
the result to 0 on undefined input.

This fix trivially sets the conversion target to m8 instead of i8 so
that numpy correctly casts from NaN to NaT.  Note that the fix in
numpy for the same is pending in PR #numpy/numpy/14669 .

There is an existing test (test_sum_nanops_timedelta in
frame/test_analytics.py) that exercises this bug and has been verified
to have been fixed with this and the numpy patch.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Timedelta Timedelta data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants