feat: support io.BytesIO as path for to_parquet (and other methods) #10534

Closed
1 task done
lostmygithubaccount opened this issue Nov 26, 2024 · 3 comments
Labels
feature Features or general enhancements

Comments

@lostmygithubaccount
Member

lostmygithubaccount commented Nov 26, 2024

Is your feature request related to a problem?

I want to start by acknowledging that what I'm doing is convoluted, but in short, I'm saving files as binary blobs in a SQLite database. I want to be able to do this with Parquet files produced by Ibis with table.to_parquet, but I hit a problem: Ibis writes to a file named after the string representation of the buffer object instead of writing to the object itself. Using pyarrow.parquet to write the table directly works (and is my workaround until I stop doing this convoluted thing), but it'd be cool (?) if Ibis directly supported writing to io.BytesIO in its output methods.

What is the motivation behind your request?

as a simple example:

```{python}
import io
import ibis
import pyarrow as pa
import pyarrow.parquet as pq

ibis.options.interactive = True
```

```{python}
t = ibis.examples.penguins.fetch()
t
```

```{python}
b = io.BytesIO()
t.to_parquet(b)
b.getvalue()
# this returns empty bytes; the data is instead written to a file named like <_io.BytesIO object at 0x11eb68a90>
```

```{python}
b = io.BytesIO()
pq.write_table(t.to_pyarrow(), b)
b.getvalue()
# this outputs the correct Parquet bytes
```
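For context, the blob-storage side of this can be sketched with the standard-library sqlite3 module. This is only an illustration of the use case: the `files` table, its columns, and the `store_blob` / `load_blob` helpers are hypothetical names, and the payload is a placeholder standing in for the bytes returned by `b.getvalue()` in the workaround above.

```python
import sqlite3


def store_blob(conn, name, data):
    """Store raw bytes under a name (hypothetical schema, for illustration)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files (name TEXT PRIMARY KEY, data BLOB NOT NULL)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO files (name, data) VALUES (?, ?)",
        (name, data),
    )
    conn.commit()


def load_blob(conn, name):
    """Return the stored bytes, or None if the name is unknown."""
    row = conn.execute("SELECT data FROM files WHERE name = ?", (name,)).fetchone()
    return None if row is None else row[0]


conn = sqlite3.connect(":memory:")

# Placeholder bytes; in the real workflow this would be b.getvalue()
# after pq.write_table(t.to_pyarrow(), b).
payload = b"placeholder bytes"

store_blob(conn, "penguins.parquet", payload)
restored = load_blob(conn, "penguins.parquet")
```

The bytes read back can be wrapped in a fresh io.BytesIO and handed to a Parquet reader that accepts file-like objects.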

Describe the solution you'd like

It seems like at some point the buffer object is being converted to a string and passed down to the writer. I took a cursory look but wasn't sure exactly where.

My ideal solution is that I can pass io.BytesIO objects to Ibis table output methods.
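The odd file name is consistent with the buffer being passed through `str()` somewhere along the way. Where that happens inside Ibis is not established here, so the following is only a standard-library sketch of the effect, not a trace of the actual code path.

```python
import io
import os

b = io.BytesIO()

# str() on a buffer falls back to its repr, which looks like
# "<_io.BytesIO object at 0x...>" and is a legal (if unhelpful) file name.
as_text = str(b)

# A BytesIO is not path-like, so os.fspath() rejects it instead of
# silently producing a string.
try:
    os.fspath(b)
    is_pathlike = True
except TypeError:
    is_pathlike = False

print(as_text)
print(is_pathlike)
```

Coercing with `os.fspath()` rather than `str()` would surface the mistake as a TypeError instead of writing to an unexpected file.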

What version of ibis are you running?

9.5

What backend(s) are you using, if any?

sqlite, duckdb

Code of Conduct

  • I agree to follow this project's Code of Conduct
@lostmygithubaccount
Member Author

I wouldn't mind trying to put together a PR for this myself if someone can confirm this should work the way I have it in the example above.

@gforsyth
Member

It should work as you have it above for the backends that don't have their own Parquet writer, but for DuckDB it will definitely break, and I don't think BigQuery and Snowflake will be very happy about it either.
Basically, for all of the backends where we're generating SQL to handle Parquet writing (vs. going through pyarrow), it won't work (with the possible exception of Polars and DataFusion).
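One way to get the same behavior on every backend from user code is to branch on the target type: file-like objects go through pyarrow, and everything else goes to the backend's own `to_parquet`. This is a minimal sketch, not something Ibis provides; `is_file_like` and `write_parquet` are hypothetical helper names, and the file-like branch assumes pyarrow is installed and that the whole result fits in memory.

```python
import io
import os


def is_file_like(target):
    """True for objects with a write() method that are not paths or strings."""
    return hasattr(target, "write") and not isinstance(
        target, (str, bytes, os.PathLike)
    )


def write_parquet(table, target):
    """Write an Ibis table expression to a path or to a file-like object."""
    if is_file_like(target):
        # Imported lazily so the path branch works without pyarrow.
        import pyarrow.parquet as pq

        # Materializes the full result locally, bypassing any
        # SQL-generated Parquet writer in the backend.
        pq.write_table(table.to_pyarrow(), target)
    else:
        # Paths and strings keep using the backend's native writer.
        table.to_parquet(target)
```

With this in place, `write_parquet(t, io.BytesIO())` would take the pyarrow route, while `write_parquet(t, "penguins.parquet")` would behave like a plain `t.to_parquet(...)` call.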

@lostmygithubaccount
Member Author

That makes a lot of sense, thanks for the explanation. I'll close this out as not planned for now. My workaround is fine and, as I said, I don't think this is something I really need to do in the medium term.

@lostmygithubaccount closed this as not planned on Nov 27, 2024
@github-project-automation bot moved this from backlog to done in Ibis planning and roadmap on Nov 27, 2024