When working with s3fs (For AWS S3), it still raises "Polars found a filename" warning #18040

Open
2 tasks done
MacHu-GWU opened this issue Aug 4, 2024 · 4 comments
Labels
A-io Area: reading and writing data A-io-cloud Area: reading/writing to cloud storage bug Something isn't working P-medium Priority: medium python Related to Python Polars

Comments

MacHu-GWU commented Aug 4, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl
import s3fs

df = pl.DataFrame({
    "foo": ["a", "b", "c", "d", "d"],
    "bar": [1, 2, 3, 4, 5],
})

fs = s3fs.S3FileSystem()
destination = "s3://bucket/my_file.parquet"

# write parquet
with fs.open(destination, mode='wb') as f:
    df.write_parquet(f)

Log output

/Users/sanhehu/Documents/GitHub/polars_aws-project/polars_aws/s3/_write_parquet.py:56: UserWarning: Polars found a filename. Ensure you pass a path to the file instead of a python file object when possible for best performance.
    df.write_parquet(f, **kwargs)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

Issue description

I am following the code in https://docs.pola.rs/user-guide/io/cloud-storage/#writing-to-cloud-storage. However, it still raises the warning "UserWarning: Polars found a filename. Ensure you pass a path to the file instead of a python file object when possible for best performance." I guess this is because the documented example does not match the practice Polars itself recommends.

Expected behavior

No warning should be raised when writing to a file object opened with s3fs, as the documentation example does.

Installed versions

--------Version info---------
Polars:               1.4.0
Index type:           UInt32
Platform:             macOS-14.3-arm64-arm-64bit
Python:               3.10.10 (main, Feb 20 2024, 22:22:03) [Clang 15.0.0 (clang-1500.1.0.2.5)]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          <not installed>
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               2024.6.1
gevent:               <not installed>
great_tables:         <not installed>
hvplot:               <not installed>
matplotlib:           <not installed>
nest_asyncio:         <not installed>
numpy:                <not installed>
openpyxl:             <not installed>
pandas:               <not installed>
pyarrow:              <not installed>
pydantic:             <not installed>
pyiceberg:            <not installed>
sqlalchemy:           <not installed>
torch:                <not installed>
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>
@MacHu-GWU MacHu-GWU added bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars labels Aug 4, 2024
@deanm0000 deanm0000 added P-medium Priority: medium A-io Area: reading and writing data A-io-cloud Area: reading/writing to cloud storage and removed needs triage Awaiting prioritization by a maintainer labels Aug 5, 2024
@github-project-automation github-project-automation bot moved this to Ready in Backlog Aug 5, 2024
@deanm0000
Collaborator

I'm not sure whether file objects opened for GCS and S3 have an fs attribute, but files opened with adlfs do. If the other two do as well, then I think we can just add a check here:

if !py_f.is_exact_instance(&io.getattr("BytesIO").unwrap()) {
    polars_warn!("Polars found a filename. \
    Ensure you pass a path to the file instead of a python file object when possible for best \
    performance.");
}

So, in addition to skipping the warning for BytesIO, we would also skip it when py_f.hasattr("fs") is true.
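To make the proposed condition concrete, here is a rough Python sketch of the decision only. It is not the Polars implementation (the real check lives in the Rust snippet above), and the names should_warn and FakeRemoteFile are made up for illustration. It assumes that filesystem-backed file objects expose an fs attribute, which is confirmed above only for adlfs.

```python
import io


class FakeRemoteFile:
    """Hypothetical stand-in for a file object opened through a
    filesystem library; assumed to expose an `fs` attribute."""

    def __init__(self):
        self.fs = object()


def should_warn(file_obj) -> bool:
    """Return True if the 'Polars found a filename' warning would
    still be emitted under the proposed rule."""
    # Existing behaviour: an exact BytesIO instance never warns
    # (subclasses of BytesIO are not exempt).
    if type(file_obj) is io.BytesIO:
        return False
    # Proposed addition: skip the warning for filesystem-backed files.
    if hasattr(file_obj, "fs"):
        return False
    return True


print(should_warn(io.BytesIO()))      # in-memory buffer
print(should_warn(FakeRemoteFile()))  # object with an `fs` attribute
print(should_warn(object()))          # any other object
```

Under this rule, any other Python file object, such as a local file opened with open(), would still trigger the warning.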

@WillAgeG

So how can this problem be solved?

@breanna-gream

Any update on this issue? I am also seeing this warning message when using gcsfs for Google Cloud Storage.

@nameexhaustion
Collaborator

I would recommend passing the S3 URL directly to write_parquet() instead of an open file object, as support for this was added in the latest release.
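A minimal sketch of what that could look like, assuming a Polars version recent enough to accept cloud URLs in write_parquet() and AWS credentials available from the environment. The helper name write_parquet_direct is hypothetical, and the bucket and key are the placeholders from the original report.

```python
def write_parquet_direct(df, url: str) -> None:
    """Write `df` by handing Polars the destination URL itself.

    `df` is expected to be a polars.DataFrame. Passing the URL string,
    rather than a file object from fs.open(), is what avoids the
    "Polars found a filename" warning.
    """
    if not isinstance(url, str):
        # Guard against accidentally passing an open file object.
        raise TypeError("expected a URL string, not a file object")
    df.write_parquet(url)


# Usage, assuming the DataFrame from the reproducible example:
#
#     import polars as pl
#     df = pl.DataFrame({"foo": ["a", "b"], "bar": [1, 2]})
#     write_parquet_direct(df, "s3://bucket/my_file.parquet")
```

With this approach the s3fs file handle is no longer needed for writing. Whether extra credential or region configuration is required depends on your environment.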

Projects
Status: Ready
Development

No branches or pull requests

5 participants