PyCall segmentation fault when running a pure Julia task in parallel #33893
I would hope that as long as you only ever use Python from a single thread you should be okay. Can you put together a minimal example that illustrates the problem? (i.e. delete every line in your code until the problem goes away.)
EDIT: It seems that this may only be a partial fix; when I increased the problem size, the problem resurfaced.

In my case I am writing my own (Julia) package that uses a Python package to provide some methods that are difficult to code myself. The trouble was that I needed those methods both inside my package and when using my package, so I wrapped the import in a small module:

```julia
module dummy

using PyCall

const psi4 = PyNULL()

function __init__()
    copy!(psi4, pyimport("psi4"))
end

export psi4

end
```

and then:

```julia
using dummy
psi4. ...
```
I managed to figure out what the issue was. I have a function that reads a CSV file into a dataframe and does some transformations. In the main program I have a loop: if I try to write a list of tables using pure Julia code while preprocessing other files, it crashes pretty reliably. It fails when trying to convert a Julia dataframe to a pandas dataframe. However, if I fetch the results of the Julia tasks before running the Python code, it doesn't crash. The error message I get upon crashing is a bit different this time, but from what I can see it still looks like the Python code and the parallel Julia tasks are trying to access the same memory.
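The fetch-before-Python ordering described above can be sketched in pure Julia. Here `preprocess` and the final summing step are hypothetical stand-ins for the file preprocessing and the pandas conversion, not code from the thread:

```julia
using Base.Threads

# Hypothetical stand-in for the per-file Julia preprocessing
preprocess(x) = x^2

# 1. Spawn the pure-Julia work on worker threads
tasks = [Threads.@spawn preprocess(i) for i in 1:4]

# 2. Fetch every result BEFORE any Python (PyCall) code runs,
#    so no Julia tasks are executing when Python is entered
results = fetch.(tasks)

# 3. Only now do the single-threaded step (a stand-in here for
#    converting to pandas / writing Parquet via PyCall)
total = sum(results)  # 1 + 4 + 9 + 16 = 30
```

The point is purely the ordering: all parallel Julia work completes at step 2, so the Python call at step 3 never overlaps with other running tasks.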
It is also worth noting that the only thing I'm using Python for here is writing files to Parquet. This code is literally trying to write Parquet files using Python while reading and processing files in Julia. If Julia had a functional Parquet writer, none of this mess would be necessary.
Doesn't this involve calling Python code, which is not thread-safe?
Yes, but I'm only ever running the Python code on one thread; I'm running Julia on the other threads. However, I keep getting memory conflicts. I've tried a number of different configurations just to see what happens, e.g. running pure Julia threads while calling Python to convert a Julia dataframe to pandas, and it still crashes. So it looks like we really can't run Julia on other threads while calling Python on the main thread.
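One possible mitigation for this situation (my own sketch, not something confirmed in this thread) is to funnel every Python call through a single dedicated task via a `Channel`, so that Python is only ever entered from one place and calls are serialized:

```julia
# All Python (PyCall) work is submitted as closures to this channel
# and executed sequentially by one worker task.
const py_jobs = Channel{Function}(32)
const py_worker = @async foreach(job -> job(), py_jobs)

# Usage sketch: a pure-Julia stand-in job in place of a real
# PyCall operation (e.g. a fastparquet write)
result = Ref(0)
put!(py_jobs, () -> (result[] = 42))

close(py_jobs)   # no more jobs will be submitted
wait(py_worker)  # worker drains the remaining jobs, then exits
```

This serializes access on the Julia side but does not by itself fix the crash reported here if the Python runtime cannot tolerate other Julia threads executing concurrently.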
Is this still crashing? Please reopen if so.
I'm not sure if this is a package issue or an issue with the new multithreading support in Julia 1.3-rc5. I am trying to use Python's fastparquet to write a dataframe to a Parquet file (as there is no Julia package that lets me do this). While doing this, I am running a task in parallel that processes a Julia DataFrame. However, trying to do so results in a segfault. I know that I probably can't run multiple PyCall calls with multithreading (as Python is not thread-safe), but does this mean I also cannot do something in pure Julia while making a Python call? I'm not sure if this is a bug or a system limitation (which should be documented somewhere, but I can't find it).
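For reference, the interleaving described here boils down to a sketch like the following. This assumes PyCall and Python's fastparquet are installed; `df`, `process`, and `to_pandas` are hypothetical placeholders for the dataframe, the Julia-side processing, and the Julia-to-pandas conversion:

```julia
using PyCall
using Base.Threads

fp = pyimport("fastparquet")

t = Threads.@spawn process(df)           # pure-Julia work on another thread
fp.write("out.parquet", to_pandas(df))   # Python call on the main thread
fetch(t)                                 # overlapping these two segfaults
```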