Race condition in code builders with SCons 3.0.2+ on Windows #24846
Comments
CC @SeleckyErik |
I don't seem to be able to replicate this locally with Python 3.7.0 and SCons 3.0.3 |
In my testing with similar issues, I've found that it's less likely to show up unless you have a reasonably fast system and storage. The issue (at least, the working theory) is that Python releases the GIL around file opens and closes, and because Windows filesystems (seemingly) take longer to complete these than non-Windows ones, other processes open the files and hold them locked before they have actually been closed on the filesystem. As you might guess, nailing this down exactly has been daunting. That said, I believe that in Python 3.5+ file opens are supposed to explicitly set flags so that threads don't share file handles, so that improvement could be resolving this long-standing issue. In general, though, my advice has long been: don't do anything too significant in-process in SCons. You'll hold the GIL regardless and reduce your build's ability to parallelize. |
I wanted to test the solution that you suggested, i.e. running the generator in a separate process, but I have no way of testing whether that fixed it. Wait, are you saying that running it in a separate process will make it easier for SCons to parallelize the build? I thought that this would make the calling thread stall and wait until the spawned process finishes and returns? |
That's what you want, in order to avoid the file access race condition. You want the
file actually closed by that action before another action uses it.
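A minimal sketch of that blocking behaviour, using only the Python standard library (no SCons involved). The generator body and the `generate_in_child` helper are hypothetical names made up for illustration; the point is only that `subprocess.check_call` does not return until the child has exited, and a process that has exited no longer holds its file handles open.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical generator body, handed to a child interpreter via -c for brevity.
# In a real build this would live in its own script file.
CHILD_CODE = (
    "import sys\n"
    "with open(sys.argv[1], 'w') as out:\n"
    "    out.write('// generated\\n')\n"
)


def generate_in_child(target_path):
    """Run the generator in a separate process and wait for it to exit."""
    # check_call blocks until the child exits and raises CalledProcessError
    # on a non-zero exit status, so a failed generator is not silently ignored.
    subprocess.check_call([sys.executable, "-c", CHILD_CODE, target_path])
    return target_path


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    path = generate_in_child(os.path.join(workdir, "generated.h"))
    # The child has exited by now, so the file can be reopened here.
    with open(path) as generated:
        sys.stdout.write(generated.read())
```

The calling thread does wait here, but it is waiting on another process rather than running Python code itself.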
|
Indeed. I totally understand why you'd want that. I was just wondering what you meant by better parallelization opportunity; maybe there's just something I overlooked :) |
Are you talking about this sentence:
Actions in SCons come in several flavors (actions are the things that get run to do your build steps). One of them is Python (generally a class or function; you can see more in the manpage). When you do too much in a Python action, then, like any Python that is running in a thread, it's going to hold the GIL and stop all other Python from running until something releases the GIL. So if you did a bunch of file processing/templating/etc. in a Python action in SCons, it most likely wouldn't drop the GIL until it reached some IO call. Anyway, that's what I'm talking about. You want that stuff to happen when the GIL is not being held. The simplest way to do that is to have the command run in a shell, with a normal action. So you move the logic into a standalone script and pass the arguments to it. While not terribly intuitive, that can be faster if you have a bunch of cores (while also avoiding these pesky race conditions, which seem to show up almost exclusively on Windows). |
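As a rough illustration of "move the logic into a standalone script and pass the arguments to it", here is a sketch of what such a script could look like. The file name `make_header.py`, the `make_header` function and the header format are all hypothetical and not taken from Godot's actual builders.

```python
"""Hypothetical standalone builder script, e.g. saved as make_header.py."""
import os
import sys


def make_header(source_path, target_path):
    """Embed the bytes of source_path in a C header as an unsigned char array."""
    with open(source_path, "rb") as src:
        data = bytearray(src.read())
    # Derive the array name from the source file name (illustrative only).
    name = os.path.splitext(os.path.basename(source_path))[0]
    lines = [
        "static const unsigned char %s[] = {" % name,
        "    " + ", ".join(str(byte) for byte in data),
        "};",
    ]
    with open(target_path, "w") as dst:
        dst.write("\n".join(lines) + "\n")


def main(argv):
    """argv is [SOURCE, TARGET]; returns a process exit status."""
    if len(argv) != 2:
        sys.stderr.write("usage: make_header.py SOURCE TARGET\n")
        return 2
    make_header(argv[0], argv[1])
    return 0


# Only act when arguments were passed, so running the file bare does nothing.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

In an SConstruct, a script like this could then be wired up as a normal command-string action, for example `env.Command("icon.gen.h", "icon.bin", "python make_header.py $SOURCE $TARGET")`, where the target and source names are again assumptions. SCons then spawns the script as a separate process instead of running the generator code in one of its own threads.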
Oh, alright, that makes sense. Thank you for the explanation! |
Also, you were most likely right that the reason why I couldn't replicate this was that my setup is not fast enough. I set up AppVeyor for my fork of Godot and the issue immediately emerged. Thanks for all the help! |
@SeleckyErik - No problemo. We're here to help. Unfortunately, fixing this race condition in SCons is a tricky bit of work, and moving the cause out of SCons generally seems a better solution when it can be done. So, with a reasonable workaround available, it may be a while before we can dedicate the time (likely a week) to nail it down, assuming it turns out that it can be fixed in SCons without requiring changes in Python. |
Seems to work in the current master branch (as per #34717). |
Godot version:
Master (e4b0251)
OS/device including version:
Windows (10 I suppose, only checked on AppVeyor CI)
Python 2.7.15.
SCons 3.0.2 or 3.0.3.
Issue description:
I tried the newly released 3.0.2 (then yanked due to a packaging issue) and 3.0.3 in #24837 on AppVeyor CI, and it fails (so far in reproducible fashion, on 4-5 attempts) with:
AppVeyor build: https://ci.appveyor.com/project/akien-mga/godot/builds/21459795
It still runs fine with SCons 3.0.1. I've reported this to SCons maintainers who mention that such race conditions on Windows + Python are a known issue (as we've experienced too in the past), and that 3.0.2+ involved some performance improvements that could exacerbate this issue in our case (it might not have failed in 3.0.1 because it was slow enough to release the file handle in time).
As mentioned in #17595 (comment) already and confirmed today on IRC, SCons' @bdbaddog advises that we move our code builders to their own standalone Python scripts and execute them directly via a shell (so `python /path/to/script.py`).
Excerpts from IRC discussion:
CC @viktor-ferenczi @hpvb