Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FFI Execution Plans that spawn threads panic #13851

Open
timsaucer opened this issue Dec 19, 2024 · 7 comments · May be fixed by #13937
Open

FFI Execution Plans that spawn threads panic #13851

timsaucer opened this issue Dec 19, 2024 · 7 comments · May be fixed by #13937
Assignees
Labels
bug Something isn't working

Comments

@timsaucer
Copy link
Contributor

Describe the bug

If you have an FFI Execution Plan that spawns it's own tasks, it has no reactor and causes a panic.

To Reproduce

This is most easily demonstrated in datafusion-python. Just create a table provider that spawns a thread. I have examples on my computer but they are non-trivial (and it's probably me working on this issue).

Expected behavior

We should never cause a panic across the ffi boundary.

Additional context

No response

@timsaucer timsaucer added the bug Something isn't working label Dec 19, 2024
@timsaucer timsaucer self-assigned this Dec 19, 2024
@alamb
Copy link
Contributor

alamb commented Dec 19, 2024

Seems like there are two problems:

  1. The panic is going across FFI boundaries
  2. The FFI code doesn't have a tokio rutime setup

Maybe we need to have some way to pass in a handle to the tokio runtime 🤔 Handle across the FFI boundaries

@timsaucer
Copy link
Contributor Author

There is only one point where we have ffi calls that are async, currently. This happens in the record batch stream. So I think what needs to happen is that within the private data for the RBS we also include an Option<Arc<Runtime>> and if it exists, we enter the runtime during the async calls.

We do pass a context. I worry about a possible problem where we could have deadlocks if we have two different runtimes each doing a blocking call. The only place that is immediately jumping out to me as a potential problem spot is RecordBatchReceiverStreamBuilder, but I haven't had time to thoroughly look through this.

I think the immediate next thing to do is to build up a good reproducible test, ideally that I could include as a unit test.

@timsaucer
Copy link
Contributor Author

Initial testing of passing in an Option<Arc<Runtime>> to the table provider and feeding it down through the execution plan and record batch stream was successful.

@timsaucer timsaucer linked a pull request Dec 29, 2024 that will close this issue
@kevinjqliu
Copy link

Hi @timsaucer thanks for creating this issue!
I'm looking to expose table provider in iceberg-rust to python using FFI_TableProvider. I think I ran into the issue you described above.

This is the current WIP PR https://github.com/kevinjqliu/iceberg-rust/pull/2/files
and here's the error log with backtrace after running hatch run dev:test https://gist.github.com/kevinjqliu/0ab4597427d196c906f279242294d051

I've been looking at reasons why I'm see the error message: there is no reactor running, must be called from the context of a Tokio 1.x runtime. And this issue makes the most sense.

Would love to get your thoughts on this and see if its the same issue

@kevinjqliu
Copy link

I tried to incorporate #13937 in kevinjqliu/iceberg-rust#3 and ran into a different error https://gist.github.com/kevinjqliu/cba636cfd4a2a0c8acbba060768e33c9

tests/test_table_provider.py::test_iceberg_table_provider Fatal Python error: Segmentation fault

I'm not sure where the error occurs, but i think integrating the iceberg table provider will be a good test for this issue.

This is my first time working with rust/pyo3/datafusion so there might be something obvious I'm not seeing.

@timsaucer
Copy link
Contributor Author

I can definitely help and I have a PR to fix the spawn issue. I will be back to work on this next week if you can give me a few dats

@kevinjqliu
Copy link

Thank you! Feel free to ping me if you need any help setting up the env for iceberg-rust.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants