API: Timestamp and Timedelta .value changing in 2.0 #50891

Merged: 24 commits into pandas-dev:main on Feb 9, 2023

Member

@MarcoGorelli MarcoGorelli commented Jan 20, 2023

@MarcoGorelli
Member Author

failed test:

FAILED pandas/tests/io/json/test_pandas.py::TestPandasContainer::test_date_index_and_values[date-True-epoch] - assert '{"1577836800...null,"a":"a"}' == '{"1577836800...null,"a":"a"}'
  - {"1577836800000":1577836800000,"null":null,"a":"a"}
  ?                                 ^^^^
  + {"1577836800000":1577836800000,"-6858695778871":null,"a":"a"}
  ?   

can't get my head round it - cc @jbrockmendel @WillAyd in case you have any ideas, else I'll keep at it

renaming .value affects so much of the codebase..

@jbrockmendel
Member

> can't get my head round it - cc @jbrockmendel @WillAyd in case you have any ideas, else I'll keep at it

best guess is in objToJSON.c there are a couple of get_long_attr(item, "value") calls that might not be caught by grepping for .value

@WillAyd
Member

WillAyd commented Jan 23, 2023

I think the JSON module returns garbage values because of #49756

If you look, there are some branches in the code that do something like:

            if (PyObject_HasAttrString(item, "value")) {
                // see test_date_index_and_values for case with non-nano
                nanosecVal = get_long_attr(item, "value");
            } else {
                if (PyDelta_Check(item)) {
                    nanosecVal = total_seconds(item) *
                                 1000000000LL;  // nanoseconds per second
                } else {
                    // datetime.* objects don't follow above rules
                    nanosecVal = PyDateTimeToEpoch(item, NPY_FR_ns);
                }
            }

For a timedelta, this calls total_seconds, which is implemented as follows:

static npy_float64 total_seconds(PyObject *td) {
    npy_float64 double_val;
    PyObject *value = PyObject_CallMethod(td, "total_seconds", NULL);
    double_val = PyFloat_AS_DOUBLE(value);
    Py_DECREF(value);
    return double_val;
}

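I'm guessing that the object doesn't actually have total_seconds defined, so PyObject_CallMethod returns NULL, PyFloat_AS_DOUBLE (which does no error checking) reads from it anyway, and you end up getting pretty crazy results.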
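
To make the branch order above easier to follow, here is a rough Python model of it. This is only a sketch: `nanosec_val` and `RenamedScalar` are hypothetical names, the `attr` parameter exists purely for illustration, and the real logic lives in C in objToJSON.c. The point is that an object exposing only `_value` misses the first branch and falls through to the fallbacks. Python raises at that point, whereas C code that never checks for NULL can carry on with garbage.

```python
from datetime import datetime, timedelta, timezone

NS_PER_SECOND = 1_000_000_000
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def nanosec_val(item, attr="value"):
    """Hypothetical Python model of the C branch; returns (branch, nanoseconds)."""
    if hasattr(item, attr):
        return "attr", getattr(item, attr)
    if isinstance(item, timedelta):
        return "timedelta", round(item.total_seconds() * NS_PER_SECOND)
    # Fallback for datetime objects; anything else raises TypeError here.
    delta = item - EPOCH
    whole_seconds = delta.days * 86_400 + delta.seconds
    return "datetime", whole_seconds * NS_PER_SECOND + delta.microseconds * 1_000


class RenamedScalar:
    """Hypothetical scalar that only exposes the private `_value`."""

    def __init__(self, value):
        self._value = value


print(nanosec_val(timedelta(seconds=2)))              # timedelta branch
print(nanosec_val(RenamedScalar(5), attr="_value"))   # attribute branch
try:
    nanosec_val(RenamedScalar(5))                     # looks up "value", falls through
except TypeError as exc:
    print("fell through to the datetime fallback:", exc)
```

Looking up `_value` instead of `value` is enough to send such an object back down the first branch, which matches the fix reported later in the thread.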

cc @lithomas1 who may be interested

@WillAyd
Member

WillAyd commented Jan 23, 2023

As the master of code checks, @MarcoGorelli, if you have an idea for setting up CI so that every result = PyObject_... C call is immediately followed by an if (result == NULL) { // handle error }, that could help us clean up our extensions. Otherwise it's probably worth doing a manual review as a precursor to this PR.
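
One possible shape for such a check, as a minimal sketch rather than an existing pandas CI hook: a regex-based script that flags any `name = PyObject_...(` assignment whose next non-blank line is not an `if (name == NULL)` or `if (!name)` test. The function name `find_unchecked_calls` is made up for illustration, and the heuristic will miss checks written in other styles.

```python
import re

# Heuristic: "<name> = PyObject_Something(" anywhere on a line.
ASSIGN_RE = re.compile(r"\b(\w+)\s*=\s*PyObject_\w+\s*\(")


def find_unchecked_calls(source: str) -> list[tuple[int, str]]:
    """Return (line_number, variable) for PyObject_* results with no NULL check."""
    lines = source.splitlines()
    problems = []
    for idx, line in enumerate(lines):
        match = ASSIGN_RE.search(line)
        if match is None:
            continue
        name = re.escape(match.group(1))
        # Walk to the end of the statement, since a call may span several lines.
        end = idx
        while ";" not in lines[end] and end < len(lines) - 1:
            end += 1
        # The next non-blank line has to compare the result with NULL.
        following = next((text for text in lines[end + 1:] if text.strip()), "")
        null_check = re.compile(rf"if\s*\(\s*(?:{name}\s*==\s*NULL|!\s*{name}\b)")
        if not null_check.search(following):
            problems.append((idx + 1, match.group(1)))
    return problems


UNCHECKED = """\
static npy_float64 total_seconds(PyObject *td) {
    npy_float64 double_val;
    PyObject *value = PyObject_CallMethod(td, "total_seconds", NULL);
    double_val = PyFloat_AS_DOUBLE(value);
    Py_DECREF(value);
    return double_val;
}
"""

print(find_unchecked_calls(UNCHECKED))
# → [(3, 'value')]
```

A line-based heuristic like this is cheap to run as a pre-commit hook, but a parser-based tool would be needed to catch checks that are delayed by a few statements.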

@MarcoGorelli MarcoGorelli marked this pull request as ready for review January 24, 2023 12:39
@MarcoGorelli
Member Author

Thanks, get_long_attr(item, "value") was indeed the part I needed to change!

> As the master of code checks, @MarcoGorelli, if you have an idea for setting up CI so that every result = PyObject_... C call is immediately followed by an if (result == NULL) { // handle error }, that could help us clean up our extensions

😄 I'll take a look

Member Author

@MarcoGorelli MarcoGorelli left a comment

gonna block myself here, want to make sure this is really what's desired, as then the public attribute .value would become unavailable for old dates: #49076 (comment)

@MarcoGorelli
Member Author

> gonna block myself here, want to make sure this is really what's desired, as then the public attribute .value would become unavailable for old dates: #49076 (comment)

have amended the error message to suggest using asm8 if nanoseconds aren't what they're after
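
For readers wondering why .value stops working for old dates: a nanosecond count has to fit in a signed 64-bit integer, which covers only a few centuries either side of 1970. The following stdlib-only sketch models that conversion. The names `value_in_nanoseconds` and `epoch_seconds` and the error text are assumptions for illustration, not pandas internals.

```python
from datetime import datetime, timedelta

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1
NS_PER_UNIT = {"s": 10**9, "ms": 10**6, "us": 10**3, "ns": 1}


def value_in_nanoseconds(stored: int, unit: str) -> int:
    """Convert an integer stored in `unit` to nanoseconds, or raise if it overflows int64."""
    nanos = stored * NS_PER_UNIT[unit]  # Python ints never overflow on their own
    if not INT64_MIN <= nanos <= INT64_MAX:
        raise OverflowError(
            f"cannot express {stored} {unit} as int64 nanoseconds; "
            "read the value in its own unit instead"
        )
    return nanos


def epoch_seconds(moment: datetime) -> int:
    """Whole seconds between the Unix epoch and a naive datetime."""
    return (moment - datetime(1970, 1, 1)) // timedelta(seconds=1)


print(value_in_nanoseconds(epoch_seconds(datetime(2020, 1, 1)), "s"))
# → 1577836800000000000

try:
    value_in_nanoseconds(epoch_seconds(datetime(300, 1, 1)), "s")
except OverflowError as exc:
    print("OverflowError:", exc)
```

The year 300 is representable at second resolution, but multiplying by 10**9 pushes it past the int64 lower bound, which is the situation the amended error message addresses.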

@@ -89,6 +89,7 @@ class Timedelta(timedelta):
     max: ClassVar[Timedelta]
     resolution: ClassVar[Timedelta]
     value: int  # np.int64
+    _value: int  # np.int64
Member

to be Technically Correct do we need to make value a property here?
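
As an illustration of what the property form could look like, here is a small runnable sketch. `TimedeltaLike` is a hypothetical stand-in, not the pandas class or its stub file. In a .pyi stub the same idea would be a `@property` declaration with an elided body.

```python
class TimedeltaLike:
    """Hypothetical stand-in with a private integer and a read-only public view."""

    def __init__(self, value: int) -> None:
        self._value = value

    @property
    def value(self) -> int:
        # Read-only: there is no setter, so assignment raises AttributeError.
        return self._value


td = TimedeltaLike(5)
print(td.value)

try:
    td.value = 6
except AttributeError as exc:
    print("AttributeError:", exc)
```

Declaring `value: int` in a stub tells type checkers the attribute is writable, whereas the property form signals that only reads are supported.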

@@ -814,7 +814,7 @@ def infer_dtype_from_scalar(val, pandas_dtype: bool = False) -> tuple[DtypeObj,
             dtype = _dtype_obj
         else:
             dtype = np.dtype("m8[ns]")
-            val = np.timedelta64(val.value, "ns")
+            val = np.timedelta64(val._value, "ns")
Member

i think this one is expecting the nanos value? (and probably needs to be updated to allow non-nano?)

@@ -546,7 +546,7 @@ def _maybe_convert_i8(self, key):
         if lib.is_period(key):
             key_i8 = key.ordinal
         elif isinstance(key_i8, Timestamp):
-            key_i8 = key_i8.value
+            key_i8 = key_i8._value
Member

looks like 1) this scalar path might not be necessary? and 2) there may be baked-in assumptions about everything being nano

Member Author

good call - shall we make logic changes in a separate PR? I've opened #51196 about this nanosecond assumption here

Member

separate PR seems fine

Member Author

sure, thanks - any objections here? shall we move forward before merge conflicts arise?

Member

@mroeschke mroeschke left a comment

Looks fairly good. Could this use a whatsnew note about .value raising an OverflowError for non-nano units?

@MarcoGorelli
Member Author

thanks - @jbrockmendel can confirm but I don't think this (non-nano timestamp) would've been available anyway in 1.5.x

@jbrockmendel
Member

> I don't think this (non-nano timestamp) would've been available anyway in 1.5.x

correct

@mroeschke mroeschke added this to the 2.0 milestone Feb 8, 2023
@mroeschke mroeschke added the Timedelta (Timedelta data type) and Timestamp (pd.Timestamp and associated methods) labels Feb 8, 2023
Member

@jbrockmendel jbrockmendel left a comment

LGTM

@jbrockmendel jbrockmendel merged commit b37321c into pandas-dev:main Feb 9, 2023
@jbrockmendel
Member

thanks @MarcoGorelli

Development

Successfully merging this pull request may close these issues.

API: Timestamp and Timedelta .value changing in 2.0
4 participants