enh: Adds support for Polars Time datatype #2113
Thanks for the contribution @marvinl803 - This is a good start
I would like to aim to support the following in this PR:
- Add a test case in tests/expr_and_series/cast_test.py
- Support more than just polars: as specified in the #1989 comment, pyarrow and duckdb should be possible to support
- Make sure that conversion from and to the backend is supported
Hi @FBruzzesi, I have added the test cases where you suggested and also implemented support for PyArrow and DuckDB. However, I encountered an issue in narwhals/dtypes.py: I wasn't sure how to provide an example for the docs using DuckDB. Additionally, I added Time to all the other supported types in narwhals_to_native_dtype, except for PyArrow. I'm uncertain whether we should return time32 or time64 in that case. Could you provide some guidance on how best to handle this? I appreciate your help!
Thanks for your adjustments - I think we are very close to having this ready. Regarding the narwhals failing tests:
I think we should convert narwhals Time to pyarrow time64 since that's what polars Time mentions:
As a side note, we will need to make PRs in shiny and marimo to fix their failing tests.
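To make the time32 vs time64 question concrete, here is a minimal stdlib-only sketch. It assumes, as I understand the Polars and Arrow docs, that Polars Time stores nanoseconds since midnight and that Arrow's time32 is a 32-bit type limited to second and millisecond units. The helper name time_to_ns is hypothetical and not part of narwhals.

```python
from datetime import time

NS_PER_SECOND = 1_000_000_000
INT32_MAX = 2**31 - 1
INT64_MAX = 2**63 - 1


def time_to_ns(t: time) -> int:
    """Hypothetical helper: nanoseconds elapsed since midnight for a naive time."""
    seconds = t.hour * 3600 + t.minute * 60 + t.second
    # datetime.time only carries microsecond precision, so scale it up to ns.
    return seconds * NS_PER_SECOND + t.microsecond * 1_000


# The latest representable time of day, expressed in nanoseconds.
latest = time_to_ns(time(23, 59, 59, 999_999))

# Nanosecond values overflow a 32-bit integer but fit in a 64-bit one.
overflows_int32 = latest > INT32_MAX
fits_int64 = latest <= INT64_MAX
```

Under those assumptions, a nanosecond-resolution time cannot round-trip through a 32-bit representation, which is consistent with picking time64 for the PyArrow backend.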
if dtype.startswith("time") and dtype.endswith("[pyarrow]"):
    return dtypes.Time()
Pattern is time<32|64>[<unit>][pyarrow]
Example:
from datetime import time
import pandas as pd
import pyarrow as pa
data = {"a": [time(12, 0, 0), time(12, 0, 5)]}
pd.DataFrame(data).convert_dtypes(dtype_backend="pyarrow").astype(pd.ArrowDtype(pa.time64("ns"))).dtypes
a time64[ns][pyarrow]
dtype: object
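For illustration only, the pattern above could also be checked with a regular expression instead of startswith/endswith. This is a sketch, not what the PR does: the helper name is_pyarrow_time is hypothetical, and the unit sets (s/ms for time32, us/ns for time64) are my assumption based on Arrow's time types. One thing it highlights is that a bare startswith("time") check would also accept strings such as timestamp[ns][pyarrow], unless those are already handled earlier in the function (which I assume they are).

```python
import re

# Assumed shape: time32[s|ms][pyarrow] or time64[us|ns][pyarrow].
PYARROW_TIME_PATTERN = re.compile(r"^time(32\[(s|ms)\]|64\[(us|ns)\])\[pyarrow\]$")


def is_pyarrow_time(dtype_str: str) -> bool:
    """Hypothetical helper: True if a pandas dtype string names a pyarrow time type."""
    return PYARROW_TIME_PATTERN.match(dtype_str) is not None


matches_time64 = is_pyarrow_time("time64[ns][pyarrow]")
matches_timestamp = is_pyarrow_time("timestamp[ns][pyarrow]")
```

Here matches_time64 is True while matches_timestamp is False, so the stricter pattern does not depend on the order of the checks.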
if isinstance_or_issubclass(dtype, (dtypes.Struct, dtypes.Array, dtypes.List)):
if isinstance_or_issubclass(
    dtype, (dtypes.Struct, dtypes.Array, dtypes.List, dtypes.Time)
):
    if implementation is Implementation.PANDAS and backend_version >= (2, 2):
Unrelated, but... should we check modin backed by pyarrow?
@@ -142,9 +142,10 @@ def narwhals_to_native_dtype(
dtypes.UInt8,
dtypes.Enum,
dtypes.Categorical,
dtypes.Time,
I couldn't find a datatype in pyspark for this, so it will just end up in the unsupported list
@marvinl803 I added a few commits to reach coverage and add pandas support (if backed by a pyarrow dtype). I think this is ready to merge, but we should have the PRs addressing marimo and shiny ready whenever we press the merge button. I will try to find a bit of time tomorrow (no promises). The Plotly failure is unrelated; they simply reworked some deps files.
def test_cast_time(request: pytest.FixtureRequest, constructor: Constructor) -> None:
    if "pandas" in str(constructor) and PANDAS_VERSION < (2, 2):
        request.applymarker(pytest.mark.xfail)

    if any(backend in str(constructor) for backend in ("dask", "pyspark", "modin")):
        request.applymarker(pytest.mark.xfail)

    data = {"a": [time(12, 0, 0), time(12, 0, 5)]}
    df = nw.from_native(constructor(data))
    result = df.select(nw.col("a").cast(nw.Time()))
    assert result.collect_schema() == {"a": nw.Time()}
Great to see this is working for the Expr.cast case!
Could we also get some tests to see how nw.Time interacts with Expr.dt methods?
I'm thinking we should aim to support anything that doesn't mention date, duration, or time_zone components.
The polars Temporal docs usually specify which subset of TemporalType they work with.
I've only found polars.Expr.dt.to_string mention Time so far.
There's also a huge suite here to draw from for test ideas: https://github.com/pola-rs/polars/blob/52b93ef5909cd0a5790001894aec4b473c361631/py-polars/tests/unit/datatypes/test_temporal.py
That might get more complex than expected: pandas does not expose pyarrow time via .dt:
from datetime import time
import pandas as pd
import pyarrow as pa
data = {"a": [time(12, 0, 0), time(12, 0, 5)]}
(
pd.DataFrame(data)
.convert_dtypes(dtype_backend="pyarrow")
.astype(pd.ArrowDtype(pa.time64("ns")))
["a"].dt
)
AttributeError: Can only use .dt accessor with datetimelike values
@FBruzzesi we don't have to resolve all the issues in this PR - but I think it would still be worth identifying them for follow-ups.
Before reading through (#2113 (comment)) I wasn't aware of how little support there was for Time
Got it! Thanks for your help and the updates, @FBruzzesi. I'm not too familiar with Marimo and Shiny, but happy to assist however I can. Let me know if there's anything I can do!
Hey @marvinl803 I opened one PR in each repo, and forgot to mention it. You should be able to see the links somewhere here (I am on mobile and it's tricky to find them).
No problem, @FBruzzesi! I found them and I'll check them out.
What type of PR is this? (check all applicable)
Related issues
Binary & Time datatype #1989
Checklist
If you have comments or can explain your changes, please do so below
I have added support for the Polars Time datatype. I wanted to implement this first to ensure there are no issues before proceeding with the Binary datatype, as mentioned in the issue.
Please let me know if any changes or fixes are needed. Thank you!