Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

write_database fails for UInts and Time dtypes when ADBC used #17312

Open
2 tasks done
mcrumiller opened this issue Jun 30, 2024 · 7 comments
Open
2 tasks done

write_database fails for UInts and Time dtypes when ADBC used #17312

mcrumiller opened this issue Jun 30, 2024 · 7 comments
Assignees
Labels
A-io-database Area: reading/writing to databases bug Something isn't working python Related to Python Polars

Comments

@mcrumiller
Copy link
Contributor

mcrumiller commented Jun 30, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

Here is a check that goes through all of the basic (non-nested) dtypes:

from datetime import date, time
from datetime import datetime as dt
from datetime import timedelta as td

from adbc_driver_postgresql import dbapi

import polars as pl

uri = "postgresql://mcrumiller@winhost/mydb"
conn = dbapi.connect(uri)


def try_write(data, dtype):
    """Attempt to write single-column df using ADBC."""
    df = pl.DataFrame({"a": pl.Series(data, dtype=dtype)})
    try:
        df.write_database("test_dtype", conn, if_table_exists="replace")
    except Exception:
        print(f"{dtype} FAIL")


# these all fail
try_write([1, 2, 3], dtype=pl.UInt8)  # FAIL!
try_write([1, 2, 3], dtype=pl.UInt16)  # FAIL!
try_write([1, 2, 3], dtype=pl.UInt32)  # FAIL!
try_write([1, 2, 3], dtype=pl.UInt64)  # FAIL!
try_write([time(1), time(2), time(3)], pl.Time)  # FAIL!

# these all pass
try_write([1, 2, 3], dtype=pl.Int8)
try_write([1, 2, 3], dtype=pl.Int16)
try_write([1, 2, 3], dtype=pl.Int32)
try_write([1, 2, 3], dtype=pl.Int64)
try_write([True, True, False], pl.Boolean)
try_write(["a", "b", "c"], pl.String)
try_write(["a", "b", "c"], pl.Categorical)
try_write(["a", "b", "c"], pl.Enum(["a", "b", "c"]))
try_write([date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 3)], pl.Date)
try_write([dt(2024, 1, 1), dt(2024, 1, 2), dt(2024, 1, 3)], pl.Datetime)
try_write([td(1), td(2), td(3)], pl.Duration)

Log output

UInt8 FAIL
UInt16 FAIL
UInt32 FAIL
UInt64 FAIL
Time FAIL

Due to:

adbc_driver_manager.NotSupportedError: NOT_IMPLEMENTED: [libpq] Field #1 ('a') has unsupported type for ingestion uint8

Issue description

df.write_database(...) fails for certain dtypes because adbc does not support those types. We should probably upcast where possible, e.g. u8 -> i16, u16 -> i32, u64 -> i64 if possible, warn if overflow.

Installed versions

main

@mcrumiller mcrumiller added bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars labels Jun 30, 2024
@mcrumiller
Copy link
Contributor Author

@alexander-beedie

@alexander-beedie
Copy link
Collaborator

alexander-beedie commented Jun 30, 2024

Hmm... might want to bring it to the attention of the ADBC folks too, since presumably it'll affect all client connections/data (eg: non-polars callers uploading pyarrow, etc) 🤔

@alexander-beedie alexander-beedie self-assigned this Jun 30, 2024
@alexander-beedie alexander-beedie added A-io-database Area: reading/writing to databases and removed needs triage Awaiting prioritization by a maintainer labels Jun 30, 2024
@mcrumiller
Copy link
Contributor Author

might want to bring it to the attention of the ADBC folks too

I'm writing up an issue for them, but figured we should have a workaround in the meantime.

@mcrumiller
Copy link
Contributor Author

apache/arrow-adbc#1950

@alexander-beedie
Copy link
Collaborator

alexander-beedie commented Jul 2, 2024

I'm writing up an issue for them, but figured we should have a workaround in the meantime.

Looks like they're moving to fix this - better that than applying a workaround on our side. Let's see if it makes their next release 👌

@mcrumiller
Copy link
Contributor Author

@alexander-beedie I think we may want to at least put in a version check along with a warning telling the user to upcast, or do an automatic upcast ourselves based on adbc version. Their Milestone 14 isn't due until mid-Aug. Not that that's super far away, but polars moves quickly :)

@alexander-beedie
Copy link
Collaborator

@alexander-beedie I think we may want to at least put in a version check along with a warning telling the user to upcast, or do an automatic upcast ourselves based on adbc version. Their Milestone 14 isn't due until mid-Aug. Not that that's super far away, but polars moves quickly :)

Yup; I prefer raising an error here - I don't want us to be replicating their logic to fix a temporary issue that will be properly fixed on their side shortly ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-io-database Area: reading/writing to databases bug Something isn't working python Related to Python Polars
Projects
None yet
Development

No branches or pull requests

2 participants