-
-
Notifications
You must be signed in to change notification settings - Fork 18.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API: tighten DTA/TDA _from_sequence signature #37179
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks like a substantial (but good) api change, or is this just on master as _from_sequence is actually strict? (in 1.1.x)
not just on master, this makes _from_sequence substantially stricter than 1.1.x |
Thanks for looking at this! Now, for the actual behavioural change, |
pass | ||
elif scalars.dtype == object: | ||
if isinstance(scalars, ABCMultiIndex): | ||
raise TypeError("Cannot create a DatetimeArray from MultiIndex") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this tested?
# with inferred_type as above? | ||
pass | ||
else: | ||
msg = f"dtype {scalars.dtype} cannot be converted to datetime64[ns]" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you ensure we have a test that hits this
# TODO: should go through from_sequence_of_strings? | ||
pass | ||
else: | ||
raise TypeError(data.dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you ensure we have a test that hits hits
just pushed with a couple new tests that get us to full coverage on the affected methods |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
couple of comments on is this added code tested
@@ -2869,6 +2869,7 @@ def test_from_tzaware_mixed_object_array(self): | |||
] | |||
assert (res.dtypes == expected_dtypes).all() | |||
|
|||
@pytest.mark.xfail(reason="DatetimeArray._from_sequence no longer accepts i8") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need a note for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the question is if we want to disable/deprecate this or find a way to make it work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this something we want to support or disallow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, the question is not so much about _from_sequence
, but rather about our public constructors (Series()
, Index()
, array()
).
Currently, those support converting integers to datetime dtype. For example:
pd.Series([1, 2, 3], dtype="datetime64[ns]")
works fine (and the same for the other two).
So the question is if we want to keep this kind of behaviour. Personally I would say there is not much harm in allowing it (also given that eg numpy and pyarrow have the same behaviour), it's only when there are timezones that there could be a bit more ambiguity potentially (but eg for timedelta that's already not relevant).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tend to agree. Any thoughts on where to catch this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any thoughts on where to catch this case?
Not fully sure what you mean, but (until #33254 is resolved) doing such conversions is the responsibility of _from_sequence
(for pd.array
), so if we want to keep this capability, at the moment it is _from_sequence
that needs to accept ints.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not fully sure what you mean
I mean that we could make this Series construction work without loosening the _from_sequence restrictions by and instead calling [??] in this case.
so if we want to keep this capability, at the moment it is _from_sequence that needs to accept ints.
To be clear, is this what you are suggesting we do? My impression from previous conversations is that you are unambiguously in favor of this tightening.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My earlier suggestion at #37179 (comment) was to use _from_sequence
/_from_sequence_strict
(instead of _from_sequence_non_strict
/_from_sequence
), until we decide on better naming for this. So keeping the non-strict behaviour of _from_sequence
for now.
That already cleans up the code (separating both behaviours more cleanly), without being blocked on API discussions.
My impression from previous conversations is that you are unambiguously in favor of this tightening.
Yes, for sure. But before we can implement that, we still need to come up with an API for the non-strict conversion as well, as that is still something we need to support. That's the whole discussion of #33254
if u can rebase; i think a couple of questions |
cc @WillAyd most recent commit made small edits to json code, can you take a look |
lgtm - I think the explicitness here is nice |
@jorisvandenbossche any objection to this as a temporary workaround? im pretty sure It will allow us to clean up a bunch of code in .isin and .equals xref #37528 |
updated per discussion so that _from_sequence remains non-strict while a new _from_sequence_strict has the hopefully-future behavior of from_sequence. im hoping to use from_sequence strict to simplify+de-duplicate .equals and .isin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update!
General question: in the tslib conversion functions, those are all "generic" in the sense of accepting a mixture of string, datetime, Timestamp, int, etc (like array_to_datetime
seems to do) ? We don't have one that is more strict on the input that could be used for this purpose? (instead of checking up front with infer_dtype)
Good idea, we can use Will update. |
mothballing to clear the queue while i address comments locally |
cc @jorisvandenbossche
need to decide how to handle test_from_2d_ndarray_with_dtype which this currently xfails