feat: support read_parquet/read_csv for most backends #9448

Open
jcrist opened this issue Jun 26, 2024 · 3 comments
Assignees
Labels
io Issues related to input and/or output

Comments

@jcrist
Member

jcrist commented Jun 26, 2024

Currently we only support read_parquet for backends that have native support (like duckdb). In contrast, we support to_parquet for all backends, falling back to a common pyarrow implementation if a backend doesn't natively support it.

To provide more uniform feature coverage, we could write an equivalent common pyarrow (or other) implementation of our IO input methods (read_parquet/read_csv/...) that backends like postgres could fall back on.
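
A rough sketch of what such a shared fallback could look like, assuming the backend exposes a `create_table(name, obj)` method that accepts an in-memory pyarrow table. The helper name `read_file_fallback` and the generated table-name scheme are hypothetical, not existing API:

```python
from __future__ import annotations

import uuid
from typing import Any, Callable


def _read_parquet_with_pyarrow(path: str) -> Any:
    # Imported lazily so pyarrow is only required when the fallback is used.
    import pyarrow.parquet as pq

    return pq.read_table(path)


def _read_csv_with_pyarrow(path: str) -> Any:
    # Same idea for CSV, using pyarrow's CSV reader.
    import pyarrow.csv as pcsv

    return pcsv.read_csv(path)


def read_file_fallback(
    backend: Any,
    path: str,
    table_name: str | None = None,
    reader: Callable[[str], Any] = _read_parquet_with_pyarrow,
) -> Any:
    """Hypothetical generic fallback for backends without native file readers.

    Loads the file into memory with `reader`, then hands the result to the
    backend. Assumes `backend.create_table(name, obj)` accepts that object.
    """
    # Generate a name when the caller does not provide one.
    name = table_name or f"read_{uuid.uuid4().hex}"
    data = reader(path)
    return backend.create_table(name, data)
```

A backend like postgres could then call `read_file_fallback(self, path, table_name)` from its `read_parquet`, and pass `reader=_read_csv_with_pyarrow` from its `read_csv`. Note that this loads the whole file into memory before inserting, which mirrors the simplicity of the `to_parquet` fallback but may not suit very large files.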

@jcrist jcrist added the io Issues related to input and/or output label Jun 26, 2024
@jcrist jcrist added this to the Q3 2024 milestone Jun 26, 2024
@jcrist
Member Author

jcrist commented Jun 26, 2024

If duckdb is installed, there's also the option of using it for faster parquet/csv table loading. I think that could be done in a second pass (see #8110), but starting with a pyarrow pass that covers all backends makes sense to me as a good first step.
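
One way the "duckdb if installed, otherwise pyarrow" choice could be sketched. The function names here are hypothetical, and this is not necessarily what #8110 proposes; it only assumes `duckdb.read_parquet` and `pyarrow.parquet.read_table` as the two loaders:

```python
import importlib.util


def module_available(name: str) -> bool:
    # True if the module can be imported in this environment.
    return importlib.util.find_spec(name) is not None


def pick_parquet_engine(available=module_available) -> str:
    # Prefer duckdb when present, fall back to pyarrow otherwise.
    if available("duckdb"):
        return "duckdb"
    if available("pyarrow"):
        return "pyarrow"
    raise ImportError("reading parquet requires duckdb or pyarrow")


def load_parquet(path: str, engine=None):
    # Returns a pyarrow Table regardless of which engine did the reading.
    engine = engine or pick_parquet_engine()
    if engine == "duckdb":
        import duckdb

        return duckdb.read_parquet(path).to_arrow_table()
    import pyarrow.parquet as pq

    return pq.read_table(path)
```

Keeping the engine choice separate from the loading step means the pyarrow path can land first, with the duckdb branch added later without changing callers.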

@jcrist jcrist changed the title Support read_parquet/read_csv for most backends feat: support read_parquet/read_csv for most backends Jun 26, 2024
@lostmygithubaccount lostmygithubaccount removed this from the Q3 2024 milestone Jul 17, 2024
@jitingxu1 jitingxu1 self-assigned this Jul 29, 2024
@jitingxu1
Contributor

Hi @jcrist, I could take this one. I'll do it one by one.

I plan to add read_parquet and read_csv. How about read_json and read_delta?

@csubhodeep

Any updates on this one?

Projects
Status: backlog
Development

No branches or pull requests

4 participants