Validating datetime columns regardless of timezone #1543

Closed
robertdj opened this issue Mar 28, 2024 · 9 comments · Fixed by #1589
Labels
enhancement New feature or request

Comments

@robertdj

I am using Pandera with the new Polars plugin, which is really exciting.

I am validating a schema, where one of the columns is a Datetime. I don't care if the Datetime has a timezone or not.
However, Pandera appears to be strict about whether or not there is a timezone. Is it possible to ignore the presence of a timezone?

I suppose this could be handled with a union of types as in this issue: #1152
Although I fear that I would then have to list every allowed time zone?
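
For context, the same distinction exists in the standard library: a datetime is either naive (no tzinfo) or aware (it carries a UTC offset). The sketch below uses only the standard library and does not touch Pandera or Polars; the helper name is_aware is made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def is_aware(ts: datetime) -> bool:
    """Return True if ts carries a UTC offset, i.e. it is timezone-aware."""
    return ts.tzinfo is not None and ts.utcoffset() is not None


# Three illustrative timestamps: one naive, two aware.
naive = datetime(2024, 3, 28, 12, 0)
utc = datetime(2024, 3, 28, 12, 0, tzinfo=timezone.utc)
plus_one = datetime(2024, 3, 28, 13, 0, tzinfo=timezone(timedelta(hours=1)))

print(is_aware(naive), is_aware(utc), is_aware(plus_one))

# Aware datetimes compare by instant, so these two are equal
# even though their offsets differ.
print(utc == plus_one)
```

A validator that is strict about timezones treats the naive and the aware values as different types, which is the behaviour described above.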

robertdj added the enhancement label Mar 28, 2024
@cosmicBboy
Collaborator

Hi @robertdj can you provide a code sample of the code you're working with?

@cosmicBboy
Collaborator

Looking at the polars docs, could you use "*" to match any timezone, including no timezone?

@robertdj
Author

Thanks for your quick answer! I was actually using datetime from the standard library:

from datetime import datetime
import pandera.polars as pa

class MySchema(pa.DataFrameModel):
    timestamp: datetime

But good point with using Polars' datetime. This seems to be equivalent:

import pandera.polars as pa
import polars as pl

class MySchema(pa.DataFrameModel):
    timestamp: pl.Datetime(time_zone=None)

Unfortunately, the docs say that time_zone="*" only matches columns that have a valid time zone. But a union of time_zone=None and time_zone="*" probably does the trick.

@cosmicBboy
Copy link
Collaborator

I think another way to handle this would be to override the check method in pandera.engines.polars_engine.DateTime so that a plain pl.Datetime passes validation for both time-zone-aware and time-zone-naive columns.

Does this make sense, or is the Union solution less ambiguous?

@robertdj
Author

I think your suggestion sounds like a much better default!

@robertdj
Author

robertdj commented Apr 7, 2024

Have you had time to consider this, @cosmicBboy?

@cosmicBboy
Collaborator

One last thought: would it be too cumbersome to import the pandera data type instead?

import pandera.polars as pa
from pandera.engines.polars_engine import DateTime

class MySchema(pa.DataFrameModel):
    timestamp: DateTime(tz_agnostic=True)

My main concern with my prior suggestion is that there would be no way to validate datetime types that don't have timezones:

  1. pl.Datetime: implicitly no timezone
  2. pl.Datetime(time_zone=None): explicitly no timezone
  3. pl.Datetime(time_zone=): some specific timezone
  4. pl.Datetime(time_zone="*"): any timezone

If someone wanted to validate that a column is datetime and has no timezone, pl.Datetime or pl.Datetime(time_zone=None) would no longer provide that guarantee.
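
The trade-off between the four declarations above can be sketched with a toy matcher. This is not Pandera's or Polars' implementation: dtype_matches and ANY_TZ are hypothetical names, a column's time zone is modelled as an optional string, and the tz_agnostic flag simply mirrors the keyword proposed in this comment. The sketch also assumes, as noted earlier in the thread, that "*" requires some time zone to be present.

```python
from typing import Optional

ANY_TZ = "*"  # wildcard meaning "any time zone" (hypothetical constant)


def dtype_matches(
    declared_tz: Optional[str],
    column_tz: Optional[str],
    tz_agnostic: bool = False,
) -> bool:
    """Toy model: does a column's time zone satisfy the declared one?"""
    if tz_agnostic:
        # Explicit opt-in: ignore the time zone entirely.
        return True
    if declared_tz == ANY_TZ:
        # Any zone is accepted, but the column must have one.
        return column_tz is not None
    # Strict match: None only matches naive columns, "UTC" only "UTC".
    return declared_tz == column_tz


# Compare strict, wildcard and agnostic declarations on a naive
# column and on a UTC column.
for column_tz in (None, "UTC"):
    print(
        column_tz,
        dtype_matches(None, column_tz),
        dtype_matches(ANY_TZ, column_tz),
        dtype_matches(None, column_tz, tz_agnostic=True),
    )
```

Keeping the default strict and making agnostic matching an explicit flag means a declaration with no time zone still rejects aware columns, which is the guarantee this comment wants to preserve.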

@robertdj
Author

I think that is a good point. I'm fine with importing DateTime from pandera.

@cosmicBboy
Collaborator

@robertdj see: #1589
