In cast(), the argument wrap_numerical works differently on floats and integers #18546

Open
2 tasks done
etiennebacher opened this issue Sep 4, 2024 · 2 comments
Labels
bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), python (Related to Python Polars)

Comments

@etiennebacher

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

df = pl.DataFrame({"float": [100.0, 300], "int": [100, 300]})

df.with_columns(
    wrapped_float=pl.col("float").cast(pl.UInt8, wrap_numerical=True),
    wrapped_int=pl.col("int").cast(pl.UInt8, wrap_numerical=True),
)

Log output

shape: (2, 4)
┌───────┬─────┬───────────────┬─────────────┐
│ float ┆ int ┆ wrapped_float ┆ wrapped_int │
│ ---   ┆ --- ┆ ---           ┆ ---         │
│ f64   ┆ i64 ┆ u8            ┆ u8          │
╞═══════╪═════╪═══════════════╪═════════════╡
│ 100.0 ┆ 100 ┆ 100           ┆ 100         │
│ 300.0 ┆ 300 ┆ 255           ┆ 44          │
└───────┴─────┴───────────────┴─────────────┘

Issue description

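In cast(), the argument wrap_numerical produces a different result for an out-of-range value depending on whether the input column is float or int:

  • for floats, the value saturates at the maximum of the target datatype (300.0 becomes 255)
  • for ints, the value wraps around (300 becomes 44, i.e. 300 mod 256)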
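
To make the two behaviours concrete, here is a minimal pure-Python sketch of saturating versus wrapping for an 8-bit unsigned target. It does not call Polars at all; the helper names `saturate_u8` and `wrap_u8` are hypothetical and only illustrate the arithmetic behind the 255 and 44 in the output above.

```python
U8_MAX = 255  # largest value representable in an unsigned 8-bit integer


def saturate_u8(value: int) -> int:
    """Clamp an out-of-range value to the nearest bound of [0, 255]."""
    return max(0, min(U8_MAX, value))


def wrap_u8(value: int) -> int:
    """Wrap a value modulo 2**8, as an overflowing integer cast would."""
    return value % (U8_MAX + 1)


print(saturate_u8(300))  # 255, what the float column currently yields
print(wrap_u8(300))      # 44, what the int column yields
```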

Expected behavior

I suppose it should wrap the value in both cases.

Installed versions

--------Version info---------
Polars:              1.6.0
Index type:          UInt32
Platform:            Windows-10-10.0.19045-SP0
Python:              3.11.0 (main, Oct 24 2022, 18:26:48) [MSC v.1933 64 bit (AMD64)]

----Optional dependencies----
adbc_driver_manager  <not installed>
altair               <not installed>
cloudpickle          <not installed>
connectorx           <not installed>
deltalake            <not installed>
fastexcel            <not installed>
fsspec               2023.6.0
gevent               <not installed>
great_tables         <not installed>
matplotlib           3.7.1
nest_asyncio         1.5.6
numpy                1.24.3
openpyxl             <not installed>
pandas               2.0.3
pyarrow              12.0.1
pydantic             2.6.4
pyiceberg            <not installed>
sqlalchemy           <not installed>
torch                <not installed>
xlsx2csv             <not installed>
xlsxwriter           <not installed>
etiennebacher added the bug, needs triage, and python labels on Sep 4, 2024
etiennebacher changed the title from "In cast(), the argument wrap_numerical on floats and integers" to "In cast(), the argument wrap_numerical works differently on floats and integers" on Sep 4, 2024
@orlp
Collaborator

orlp commented Sep 4, 2024

This is going to be a bit tricky: it involves f64::trunc and then, I think, manually shifting the mantissa, taking care to handle shifts larger than the width.

EDIT: actually, the simplest implementation, for now at least, will be to just use x.trunc().rem_euclid(uT::MAX + 1) as uT.
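
As a rough illustration of that suggestion (not the actual Polars implementation, which would be in Rust), the same truncate-then-Euclidean-remainder idea can be sketched in Python. The function name `wrap_float_to_unsigned` is hypothetical, and returning None for NaN and infinities is an assumption made here, not something specified in this thread.

```python
import math
from typing import Optional


def wrap_float_to_unsigned(x: float, bits: int = 8) -> Optional[int]:
    """Truncate x toward zero, then wrap it into [0, 2**bits).

    Mirrors x.trunc().rem_euclid(uT::MAX + 1): Python's % with a positive
    modulus always returns a non-negative result, like rem_euclid.
    """
    if math.isnan(x) or math.isinf(x):
        return None  # assumption: non-finite inputs have no wrapped value
    modulus = 1 << bits  # uT::MAX + 1, e.g. 256 for an 8-bit target
    return math.trunc(x) % modulus


print(wrap_float_to_unsigned(300.0))  # 44, matching the integer column
print(wrap_float_to_unsigned(-1.5))   # 255: truncates to -1, then wraps
```

With this approach the float column in the example above would give 44 for 300.0, the same as the int column.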

@etiennebacher
Author

Hi, I don't have a use case relying on this, but I wonder whether this should be tagged with some priority, since it silently returns wrong results.

2 participants