Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enh: Adds support for Polars Time datatype #2113

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

marvinl803
Copy link
Contributor

What type of PR is this? (check all applicable)

  • πŸ’Ύ Refactor
  • ✨ Feature
  • πŸ› Bug Fix
  • πŸ”§ Optimization
  • πŸ“ Documentation
  • βœ… Test
  • 🐳 Other

Related issues

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

I have added support for the Polars Time datatype. I wanted to implement this first to ensure there are no issues before proceeding with the Binary datatype, as mentioned in the issue.

Please let me know if any changes or fixes are needed. Thank you!

@marvinl803 marvinl803 marked this pull request as ready for review February 28, 2025 17:09
Copy link
Member

@FBruzzesi FBruzzesi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution @marvinl803 - This is a good start πŸ™πŸΌ

I would like to aim to support the following in this PR:

@marvinl803
Copy link
Contributor Author

Thanks for the contribution @marvinl803 - This is a good start πŸ™πŸΌ

I would like to aim to support the following in this PR:

Hi @FBruzzesi,

I have added the test cases where you suggested and also implemented support for PyArrow and DuckDB. However, I encountered an issue in narwhals/dtypes.pyβ€”I wasn’t sure how to provide an example for the docs using DuckDB.

Additionally, I added Time to all the other supported types in narwhals_to_native_dtype, except for PyArrow. I’m uncertain whether we should return time32 or time64 in that case. Could you provide some guidance on how best to handle this?

I appreciate your help!

@FBruzzesi
Copy link
Member

I have added the test cases where you suggested and also implemented support for PyArrow and DuckDB. However, I encountered an issue in narwhals/dtypes.pyβ€”I wasn’t sure how to provide an example for the docs using DuckDB.

Thanks for your adjustments - I think we are very close to have this ready. Regarding the narwhals failing tests:

  • One is because of coverage: conversion from narwhals to native has no test yet
  • The other one if because of top level imports - try use constructor and mark unsupported backends as xfail rather than manually instantiating dataframes

Additionally, I added Time to all the other supported types in narwhals_to_native_dtype, except for PyArrow. I’m uncertain whether we should return time32 or time64 in that case. Could you provide some guidance on how best to handle this?

I think we should convert narwhals Time to pyarrow time64 since that's what polars Time mentions:

The underlying representation of this type is a 64-bit signed integer. The integer indicates the number of nanoseconds since midnight.


As a side note, we will need to make PRs in shiny and marimo to fix their failing tests

Comment on lines +415 to +416
if dtype.startswith("time") and dtype.endswith("[pyarrow]"):
return dtypes.Time()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pattern is time<32|64>[<unit>][pyarrow]

Example:

from datetime import time
import pandas as pd
import pyarrow as pa

data = {"a": [time(12, 0, 0), time(12, 0, 5)]}

pd.DataFrame(data).convert_dtypes(dtype_backend="pyarrow").astype(pd.ArrowDtype(pa.time64("ns"))).dtypes
a    time64[ns][pyarrow]
dtype: object

if isinstance_or_issubclass(dtype, (dtypes.Struct, dtypes.Array, dtypes.List)):
if isinstance_or_issubclass(
dtype, (dtypes.Struct, dtypes.Array, dtypes.List, dtypes.Time)
):
if implementation is Implementation.PANDAS and backend_version >= (2, 2):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated, but... should we check modin backed by pyarrow?

@@ -142,9 +142,10 @@ def narwhals_to_native_dtype(
dtypes.UInt8,
dtypes.Enum,
dtypes.Categorical,
dtypes.Time,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't find a datatype in pyspark for this, so it will just end up in the unsupported list

@FBruzzesi
Copy link
Member

FBruzzesi commented Mar 6, 2025

@marvinl803 I added a few commits to reach coverage and add pandas support (if backed by pyarrow dtype).

I think this is ready to merge, but we should have ready the PRs to address marimo and shiny whenever we press the merge button. I will try to find a bit of time tomorrow (no promises πŸ™ˆ)


Plotly failure is unrelated, they simply reworked some deps files

Comment on lines +291 to +301
def test_cast_time(request: pytest.FixtureRequest, constructor: Constructor) -> None:
if "pandas" in str(constructor) and PANDAS_VERSION < (2, 2):
request.applymarker(pytest.mark.xfail)

if any(backend in str(constructor) for backend in ("dask", "pyspark", "modin")):
request.applymarker(pytest.mark.xfail)

data = {"a": [time(12, 0, 0), time(12, 0, 5)]}
df = nw.from_native(constructor(data))
result = df.select(nw.col("a").cast(nw.Time()))
assert result.collect_schema() == {"a": nw.Time()}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to see this is working for the Expr.cast case!

Could we also gets some tests to see how nw.Time interacts with Expr.dt methods?
I'm thinking we should aim to support anything that doesn't mention date, duration, time_zone components

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The polars Temporal docs usually specify which subset of TemporalType they work with

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've only found polars.Expr.dt.to_string mention Time so far

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@FBruzzesi FBruzzesi Mar 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That might get more complex that expected: pandas does not access pyarrow time via .dt:

from datetime import time
import pandas as pd
import pyarrow as pa

data = {"a": [time(12, 0, 0), time(12, 0, 5)]}

(
 pd.DataFrame(data)
 .convert_dtypes(dtype_backend="pyarrow")
 .astype(pd.ArrowDtype(pa.time64("ns")))
 ["a"].dt
)

AttributeError: Can only use .dt accessor with datetimelike values

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@FBruzzesi we don't have to resolve all the issues in this PR - but I think it would still be worth identifying them for follow-ups.

Before reading through (#2113 (comment)) I wasn't aware of how little support there was for Time

@marvinl803
Copy link
Contributor Author

marvinl803 commented Mar 7, 2025

@marvinl803 I added a few commits to reach coverage and add pandas support (if backed by pyarrow dtype).

I think this is ready to merge, but we should have ready the PRs to address marimo and shiny whenever we press the merge button. I will try to find a bit of time tomorrow (no promises πŸ™ˆ)

Plotly failure is unrelated, they simply reworked some deps files

Got it! Thanks for your help and the updates, @FBruzzesi. I'm not too familiar with Marimo and Shiny, but happy to assist however I can. Let me know if there's anything I can do! πŸ‘

@FBruzzesi
Copy link
Member

FBruzzesi commented Mar 7, 2025

Got it! Thanks for your help and the updates, @FBruzzesi. I'm not too familiar with Marimo and Shiny, but happy to assist however I can. Let me know if there's anything I can do! πŸ‘

Hey @marvinl803 I opened one PR in each repo, and forgot to mention it. You should be able to see the link somewhere here (I am from mobile and it's tricky to find the links) 😊

@marvinl803
Copy link
Contributor Author

Hey @marvinl803 I opened one PR in each repo, and forgot to mention it. You should be able to see the link somewhere here (I am from mobile and it's tricky to find the links) 😊

No problem, @FBruzzesi! I found themβ€”I'll check them out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants