Support the Arrow-based data transport. #509

Closed
whitphx opened this issue Mar 9, 2023 · 3 comments · Fixed by #601

Comments

@whitphx
Owner

whitphx commented Mar 9, 2023

  • st.experimental_data_editor, introduced in Streamlit 1.19.0, relies on PyArrow.
  • The legacy transport may no longer be supported in the future.
@whitphx
Owner Author

whitphx commented Jun 30, 2023

st.dataframe is being improved so much with Arrow-based data communication.

@lukasmasuch
Contributor

lukasmasuch commented Jul 31, 2023

Some ideas from a private discussion on this topic:

I think what would already help is any way to get the pandas DataFrame serialized to the IPC/Feather format (bytes) on the Python side. We don't really need any other functionality from PyArrow. One Pyodide-supported library that might get us one step closer is fastparquet. This would probably allow us to serialize the DataFrame into the Parquet format. And maybe we can use something like parquet-wasm (or arrow-wasm) to convert it from Parquet into the IPC format on the JS side. Once we have the data in IPC format, we can load it in our frontend and everything should work fine.

What we currently do in Streamlit is:

  1. Backend: Convert any data to Pandas DF
  2. Backend: Convert Pandas to Arrow Table
  3. Backend: Serialize Arrow Table to bytes (IPC format)
  4. Frontend: Load bytes via tableFromIPC
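The backend steps 1 to 3 above can be sketched roughly as follows. This is a minimal illustration that assumes pandas and PyArrow are installed; the function name `to_ipc_bytes` is hypothetical, and Streamlit's actual conversion code handles many more input types than this does.

```python
def to_ipc_bytes(data):
    """Convert tabular data to Arrow IPC stream bytes (hypothetical helper)."""
    # Imported lazily so this module still loads where the packages are missing.
    import pandas as pd
    import pyarrow as pa

    # Step 1: convert the input to a pandas DataFrame.
    df = data if isinstance(data, pd.DataFrame) else pd.DataFrame(data)

    # Step 2: convert the DataFrame to an Arrow Table.
    table = pa.Table.from_pandas(df)

    # Step 3: serialize the Table to bytes in the IPC streaming format.
    sink = pa.BufferOutputStream()
    with pa.ipc.new_stream(sink, table.schema) as writer:
        writer.write_table(table)
    return sink.getvalue().to_pybytes()
```

The resulting bytes are what the frontend would pass to `tableFromIPC` in step 4.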

For stlite, we might be able to adapt this process so that:

  1. Backend: Convert any data to Pandas DF
  2. Backend - stlite-specific: Serialize Pandas to parquet file (via fastparquet)
  3. Frontend - stlite-specific: Load parquet file and convert to IPC bytes via parquet-wasm
  4. Frontend: Load bytes via tableFromIPC
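A rough sketch of the stlite-specific backend step 2, assuming pandas and fastparquet are both importable in the runtime. The helper name `dataframe_to_parquet_bytes` is hypothetical, and writing through a temporary file is an assumption made here to avoid relying on fastparquet accepting in-memory buffers. The frontend conversion with parquet-wasm (step 3) happens in JS and is not shown.

```python
import os
import tempfile


def dataframe_to_parquet_bytes(df):
    """Serialize a pandas DataFrame to Parquet bytes via fastparquet (hypothetical helper)."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "frame.parquet")
        # Write the Parquet file using the fastparquet engine instead of PyArrow.
        df.to_parquet(path, engine="fastparquet")
        # Read the file back so the raw bytes can be sent to the frontend.
        with open(path, "rb") as f:
            return f.read()
```

These bytes would then be handed to the frontend, which converts them to IPC bytes before calling `tableFromIPC`.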

@whitphx
Owner Author

whitphx commented Aug 18, 2023

@lukasmasuch Hi, thank you very much for such a detailed, comprehensive migration guide!
These are the changes I made to the Streamlit codebase forked for stlite: whitphx/streamlit#3. Could you please take a look at it and see if it makes sense?
Also, you can see that st.dataframe and st.data_editor are working in the preview envs deployed in this PR: #601
