feat(parquet): Enable Parquet WASM loader #2773
Either that or you need to make the requests from inside wasm. See https://kylebarron.dev/parquet-wasm/functions/esm_arrow2.readRowGroupAsync.html As a development note, there are two bindings, arrow-rs and arrow2 (see docs note here). In recent Rust ecosystem developments, arrow2 is "dying" (the main contributor stepped back and another contributor forked it inside of polars), so I'm moving my work to arrow-rs from now on in general. The arrow-rs (i.e. arrow1) bindings in parquet-wasm don't have async read support yet, but it will be added at some point. There are also https://kylebarron.dev/parquet-wasm/functions/esm_arrow2.readParquetStream.html and https://kylebarron.dev/parquet-wasm/functions/esm_arrow1.readParquetStream.html, which were added recently. They aren't documented well, though there's an example here: https://observablehq.com/d/f5723cea6661fb71. The main downside is that it looks like each column chunk is requested independently, and I'm not immediately sure how to get the underlying Rust APIs to batch the requests, e.g. for each chunk.
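To illustrate what batching those per-column-chunk requests could look like on the JavaScript side, here is a minimal sketch that merges nearby byte ranges before fetching. This is not how parquet-wasm works internally; `ByteRange`, `coalesceRanges`, `toRangeHeader`, and the `maxGap` threshold are all hypothetical names introduced for this example.

```typescript
// Hypothetical sketch: coalesce column-chunk byte ranges so that nearby
// chunks can be fetched with a single HTTP range request.
// `end` is exclusive.
interface ByteRange {
  start: number;
  end: number;
}

// Merge ranges whose gap is at most `maxGap` bytes (assumed tuning knob).
function coalesceRanges(ranges: ByteRange[], maxGap: number): ByteRange[] {
  // Sort a copy by start offset so the input is left untouched.
  const sorted = ranges.slice().sort((a, b) => a.start - b.start);
  const merged: ByteRange[] = [];
  for (const r of sorted) {
    const last = merged[merged.length - 1];
    if (last !== undefined && r.start - last.end <= maxGap) {
      // Close enough to the previous range: extend it.
      last.end = Math.max(last.end, r.end);
    } else {
      merged.push({ start: r.start, end: r.end });
    }
  }
  return merged;
}

// HTTP Range headers use an inclusive end offset.
function toRangeHeader(r: ByteRange): string {
  return `bytes=${r.start}-${r.end - 1}`;
}
```

A larger `maxGap` trades some wasted bytes for fewer round trips; the right value would depend on latency and chunk sizes.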
You can do that with |
Nice. I would also need to be able to apply a column filter to the reader so it doesn't read the columns I didn't select.
So the big problem in making this run in the browser seems to be that the generated The annoying util import at line 3 can be handled by
And something similar is needed for the fs import at the end.
You're using the node bundle. You should be using either the esm or the bundler entry point. See https://github.com/kylebarron/parquet-wasm#choice-of-bundles and https://rustwasm.github.io/docs/wasm-pack/commands/build.html#target (what they call web, I call esm). What we did previously in loaders was try to conditionally use the node bundle when it was called from node, and use one of the others on the web. The node bundle is nice in node because it's a synchronous import.
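A rough sketch of that conditional choice, for illustration only. The bundle names `node`, `esm`, and `bundler` come from the comment above; the `parquet-wasm/<target>/<binding>` specifier layout is an assumption and should be checked against the installed package version. `isNodeRuntime`, `selectTarget`, and `entryPoint` are hypothetical helpers.

```typescript
// Bundle names as described in the parquet-wasm README section linked above.
type ParquetWasmTarget = "node" | "esm" | "bundler";
type ParquetWasmBinding = "arrow1" | "arrow2";

// Detect Node.js without depending on Node type definitions.
function isNodeRuntime(): boolean {
  const proc: { versions?: { node?: string } } | undefined =
    (globalThis as any).process;
  return typeof proc?.versions?.node === "string";
}

// Pick the node bundle in Node, otherwise one of the web-friendly bundles.
function selectTarget(isNode: boolean, usesBundler: boolean): ParquetWasmTarget {
  if (isNode) return "node";
  return usesBundler ? "bundler" : "esm";
}

// ASSUMPTION: the import specifier follows `parquet-wasm/<target>/<binding>`.
function entryPoint(target: ParquetWasmTarget, binding: ParquetWasmBinding): string {
  return `parquet-wasm/${target}/${binding}`;
}
```

The resulting string could then be passed to a dynamic `import()`, keeping the node-only `util` and `fs` imports out of the browser build.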
Making an attempt to reintegrate the WASM parquet loader / writer.
Notes: the WASM loader is amazing but still has limitations:
ReadableFile
and does random access reads on that so it does not necessarily load all chunks into memory, but only loads chunks as needed.
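As a sketch of what random-access reading means for Parquet: a file ends with a 4-byte little-endian metadata length followed by the magic bytes `PAR1`, so a reader only needs the last 8 bytes to locate the footer metadata. The `RandomAccessFile` interface below is a hypothetical stand-in, not the actual `ReadableFile` type, and `parseFooterTail` and `readFooterRange` are illustrative names.

```typescript
// Hypothetical minimal random-access file abstraction.
interface RandomAccessFile {
  size: number;
  read(offset: number, length: number): Promise<Uint8Array>;
}

// Parse the final 8 bytes of a Parquet file: uint32 LE length + "PAR1".
function parseFooterTail(tail: Uint8Array): number {
  if (tail.length !== 8) throw new Error("expected exactly 8 bytes");
  const magic = [0x50, 0x41, 0x52, 0x31]; // "PAR1"
  for (let i = 0; i < 4; i++) {
    if (tail[4 + i] !== magic[i]) throw new Error("missing PAR1 magic");
  }
  const view = new DataView(tail.buffer, tail.byteOffset, tail.byteLength);
  return view.getUint32(0, true); // little-endian metadata length
}

// Locate the footer metadata with a single small read at the end of the file.
async function readFooterRange(
  file: RandomAccessFile
): Promise<{ start: number; end: number }> {
  const tail = await file.read(file.size - 8, 8);
  const length = parseFooterTail(tail);
  const end = file.size - 8;
  return { start: end - length, end };
}
```

Only the byte ranges that are actually needed (footer first, then selected column chunks) have to be read, which is why the whole file does not need to be in memory.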