Pandera timezone-agnostic datetime type #1352
Comments
Hi @max-raphael this is somewhat of a challenging use case to fulfill with datetimes because if we have a timezone-agnostic datetime, how do we deal with coercion? Imagine we support something like:

class MySchema(DataFrameModel):
    local_datetime: DateTime(has_tz=True)  # just checks that the datetimes have any timezone

    class Config:
        coerce = True

If we do
This is similar to the problem of having a generic
I hear you, that does pose a tricky problem. Thinking about it from my perspective as a user, I think I would prefer to have this as an option but be disallowed from coercing this field (via some Exception) due to the ambiguous nature of the data type rather than not have it accessible to me at all. Perhaps even an Exception is too much. Pandera could still allow users to specify
How would you feel about defaulting to UTC on coercion (if the incoming raw data is not TZ-aware) and raising a warning that the dtypes are coerced to UTC? I generally like to do something rather than nothing on coercion to prevent propagation of surprise (i.e. a non-TZ aware dataframe after validation with
That seems acceptable to me. I think if incoming data is not tz-aware, then that's a reasonable approach so long as Pandera logs the warning and includes it in the documentation!
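For readers following along, the fallback discussed above (localize tz-naive data to UTC and emit a warning, leave tz-aware data untouched) could look roughly like the sketch below. It uses plain pandas only; `coerce_to_tz_aware` is a hypothetical helper name chosen for illustration and is not part of the pandera API, and the eventual implementation may differ.

```python
import warnings

import pandas as pd


def coerce_to_tz_aware(series: pd.Series) -> pd.Series:
    """Hypothetical helper (not pandera API): make a datetime series tz-aware.

    tz-aware input keeps whatever timezone it already has; tz-naive input
    is localized to UTC and a warning is emitted.
    """
    # Parse strings/objects into datetimes first, if needed.
    if not pd.api.types.is_datetime64_any_dtype(series.dtype):
        series = pd.to_datetime(series)

    if series.dt.tz is None:
        warnings.warn(
            "tz-naive datetimes were coerced to UTC",
            UserWarning,
            stacklevel=2,
        )
        return series.dt.tz_localize("UTC")

    # Already tz-aware: any timezone is acceptable, so leave it alone.
    return series


naive = pd.Series(pd.to_datetime(["2024-01-01 12:00", "2024-06-01 12:00"]))
aware = naive.dt.tz_localize("America/New_York")

coerced_naive = coerce_to_tz_aware(naive)  # warns, result is in UTC
coerced_aware = coerce_to_tz_aware(aware)  # no warning, timezone preserved
```

Note that `tz_localize("UTC")` interprets the naive wall-clock values as UTC rather than converting them, which is why a warning seems worthwhile here.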
@cosmicBboy Hi, just following up here. Are we aligned on the feature? If so, what are the next steps? Thanks again for engaging with this, I think it would be helpful to many Pandera users.
Yep! Feel free to make a PR with changes to the Also check out the contributing guide if it's your first time contributing: https://pandera.readthedocs.io/en/stable/CONTRIBUTING.html
…1352) Signed-off-by: Max Raphael <mrap96@gmail.com>
I'm interested in this, too. Adding a comment under the "Union" issue for this reason. #1152 (comment)
Is your feature request related to a problem? Please describe.
When defining a class that inherits from DataFrameModel, I want to define a field whose values are datetimes. Moreover, those values will have timezones. However, I will not be able to specify in the class definition which timezone that will be. In other words, in dataframe A, they may be datetimes with tz="America/New_York". In dataframe B, they may be datetimes with tz="America/Los_Angeles". As far as I can tell, there is no type I can assign that accepts datetimes with timezones without naming a specific timezone in the type hint.
Describe the solution you'd like
I would like there to be a type that I can use to say "this field will be datetimes, but I can't say what the timezone will be."
Describe alternatives you've considered
When setting the type of the field to datetime.datetime, pandera.dtypes.DateTime, etc. I get a pandera SchemaError that the series was expected to have type datetime64[ns], but got datetime64[ns, America/New_York] (for example).
I have also tried with DatetimeTZDtype, but that won't work because I need to specify the timezone I want (which I can't do upfront).
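The mismatch described above can be seen with pandas alone, since each timezone produces its own dtype. The sketch below also shows one way a "has any timezone" test could be written; `has_any_timezone` is a hypothetical helper for illustration, not an existing pandera or pandas function.

```python
import pandas as pd

naive = pd.Series(pd.to_datetime(["2024-01-01 09:00", "2024-06-01 09:00"]))
new_york = naive.dt.tz_localize("America/New_York")
los_angeles = naive.dt.tz_localize("America/Los_Angeles")


def has_any_timezone(series: pd.Series) -> bool:
    """Hypothetical helper: True for a tz-aware datetime dtype, whatever the zone."""
    return isinstance(series.dtype, pd.DatetimeTZDtype)


# Each series reports a different dtype, so a schema pinned to one of them
# rejects the other two.
print(naive.dtype, new_york.dtype, los_angeles.dtype)

# The isinstance test ignores which timezone is attached.
print(has_any_timezone(naive), has_any_timezone(new_york), has_any_timezone(los_angeles))
```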
Additional context
Example Schema:
class MySchema(DataFrameModel):
    local_datetime: <what type do I set here?>