Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Running builder (content generator) functions in subprocesses on Windows #17595

Merged
merged 1 commit into from
Jul 28, 2018
Merged

Running builder (content generator) functions in subprocesses on Windows #17595

merged 1 commit into from
Jul 28, 2018

Conversation

viktor-ferenczi
Copy link
Contributor

@viktor-ferenczi viktor-ferenczi commented Mar 17, 2018

  • Refactored all builder (make_*) functions into separate Python modules along the build tree
  • Introduced utility function to wrap all invocations into subprocesses on Windows
  • Introduced stub to use the builder modules as a stand alone scripts to execute a selected function

Background

There is a problem with file handles related to writing generated content (*.gen.h and *.gen.cpp)
on Windows, which randomly causes a SHARING VIOLATION error to the compiler resulting in flaky
builds. Running all such content generator in a new subprocess instead of directly inside the build
script works around the issue.

Q&A

  • Root cause?

The root cause of the issue is unknown, but it is specific to how file handles are somehow shared with child processes, even if the files are closed before spawning the child. Attempting to disable file handle sharing by setting the HANDLE_FLAG_INHERIT option to zero on all file handles involved did not work out.

  • Why not using the multiprocessing module?

Yes, I tried the multiprocessing module. It did not work due to conflict with SCons on cPickle.
Suggested workaround did not fully work either.

  • Why run_in_subprocess is not used as a decorator on the builder functions?

One reason is to make it explicit in the Builder definition that the function will run in a subprocess.

Other reason is that the original builder function need to be called in the subprocess, which would need saving that function on the wrapper and special casing it there to call the original function while not running inside the main build script. It would involve adding magic auto detection code, also one more symbol to import in the builder modules. I decided not doing that, despite the decorator syntax would look nicer. I can still implement this per request if you would like it better that way.

  • Why the builder functions are wrapped on non-Windows platforms (osx, x11)?

Using the run_in_subprocess wrapper on osx and x11 platforms as well for consistency, but it
does not actually create subprocesses there. In case of running a cross-compilation on
Windows they would still be used, but likely it will not happen in practice. What counts is
the platform the build itself is running on, not the target platform.

Fixes #5042

@akien-mga
Copy link
Member

CC @garyo for review.

@akien-mga akien-mga added this to the 3.1 milestone Mar 19, 2018
compat.py Outdated
@@ -62,3 +70,60 @@ def escape_string(s):
result += chr(c)
return result


def run_in_subprocess(builder_function):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the compat module was meant only for Python 2/3 compatibility definitions. This should maybe go in methods?

Copy link
Contributor Author

@viktor-ferenczi viktor-ferenczi Mar 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more of platform compatibility. Maybe a new module would work best. Something like plat_compat.py or so. Could you please suggest a module name? Then I can separate it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know, methods.py already has some platform-specific methods. Maybe you could make a new platform_methods.py module and move the relevant methods.py methods there too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, but that would be unrelated to the scope of this PR and the ticket behind.

To make it clear I suggest adding platform_methods.py with the new functions introduced here. Then we can move the rest of functions between the modules in a separate PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine by me :)

@mhilbrunner
Copy link
Member

Can you fix the conflicts?

@viktor-ferenczi
Copy link
Contributor Author

viktor-ferenczi commented Jul 9, 2018

Notes on resolving conflicts with two previously merged PRs:

@kavika13
Copy link

I was directed to try this patch out.
Did a clean build, applied this patch, ran build again. Consistently still get this error:

c:\users\...\development\godot\drivers\gles2\rasterizer_scene_gles2.h(36): fatal error C1083: Cannot open include file: 'shaders/scene.glsl.gen.h': No such file or directory

@akien-mga
Copy link
Member

c:\users\...\development\godot\drivers\gles2\rasterizer_scene_gles2.h(36): fatal error C1083: Cannot open include file: 'shaders/scene.glsl.gen.h': No such file or directory

Actually, this might not invalidate this patch, it's just that it needs to be completed to also handle the generation of the shader headers in drivers/gles2 and drivers/gles3. Could you do that @viktor-ferenczi?

@akien-mga
Copy link
Member

And basically all .gen.* files need to get the same treatment I guess.

@viktor-ferenczi
Copy link
Contributor Author

viktor-ferenczi commented Jul 26, 2018

Rebased branch to latest master.

Enrolled all remaining source generators to the subprocess based build mechanism to prevent random build errors.

Some generated files are written directly in an SConstruct or SCsub file, before the parallel build starts. They don't need to run in a subprocess, apparently, so I left them untouched.

Benchmark
Build time: 143 seconds
Command: scons.py -j16 platform=windows
CPU: Ryzen 1800X @ 3800MHz (8 physical, 16 logical cores)
RAM: 2400MHz DDR4
OS: Windows 10 Pro 64bit
Disk: ramdisk (ImDisk Virtual Disk Driver)
Python 2.7.15 32bit
SCons 3.0.0
MSVC 14.12.25827 (Visual Studio 2017 Professional)

@mhilbrunner
Copy link
Member

I can confirm that this seems to fix the issue for me.

I tested this on two different Windows machines where this fails on master, and it does fine even on completely clean builds using 8 or 16 cores.

@mhilbrunner
Copy link
Member

This has a conflict that needs to be fixed, then I'd be strongly in favor of merging this one.

- Refactored all builder (make_*) functions into separate Python modules along to the build tree
- Introduced utility function to wrap all invocations on Windows, but does not change it elsewhere
- Introduced stub to use the builders module as a stand alone script and invoke a selected function

There is a problem with file handles related to writing generated content (*.gen.h and *.gen.cpp)
on Windows, which randomly causes a SHARING VIOLATION error to the compiler resulting in flaky
builds. Running all such content generators in a new subprocess instead of directly inside the
build script works around the issue.

Yes, I tried the multiprocessing module. It did not work due to conflict with SCons on cPickle.
Suggested workaround did not fully work either.

Using the run_in_subprocess wrapper on osx and x11 platforms as well for consistency. In case of
running a cross-compilation on Windows they would still be used, but likely it will not happen
in practice. What counts is that the build itself is running on which platform, not the target
platform.

Some generated files are written directly in an SConstruct or SCsub file, before the parallel build starts. They don't need to be written in a subprocess, apparently, so I left them as is.
@viktor-ferenczi
Copy link
Contributor Author

viktor-ferenczi commented Jul 27, 2018

This PR is moving most of the generator code from SConstruct / SCsub / methods.py to corresponding *_builders.py modules.

Could somebody else please also verify that no code were missed in the process?

@akien-mga
Copy link
Member

Awesome work, thanks a ton @viktor-ferenczi!

@akien-mga akien-mga merged commit 66429a1 into godotengine:master Jul 28, 2018
@viktor-ferenczi
Copy link
Contributor Author

Thank you. It was a pleasure to help your awesome project.

@moiman100
Copy link
Contributor

moiman100 commented Jul 28, 2018

Trying to build master after this commit causes this https://puu.sh/B4B6M/2243083766.txt
Windows 10 64-bit Python 3.7

Works on Python 2.7

@bdbaddog
Copy link

Rather than moving your python generator functions such that they can be run separately and then running them via a builder which calls subprocess, wouldn't it be simpler to just have your builders call those python scripts via normal builders? Since the issue (if I understand correctly) was that you had blobs of python logic in your builders in the same python process and GIL issues, and you've resolve that by making the generators independent scripts.

So your Action would now be "python <generator_script.py> "? This would run the python scripts in a separate process

@viktor-ferenczi
Copy link
Contributor Author

viktor-ferenczi commented Jul 28, 2018

@moiman100 Type basestring is not available on Python 3.7, so the issue.

I'm going to add a new PR to fix Python 3.5+ compatiblity.

Also, build with Python 3.7 is not tested by CI and I did not do it either (my bad). As a result we did not catch this problem on time.

@akien-mga
Copy link
Member

@viktor-ferenczi @moiman100 It's fixed by #20544.

@viktor-ferenczi
Copy link
Contributor Author

Perfect, thanks for the update and sorry for the mistake.

@marcelofg55
Copy link
Contributor

I'm getting a build error on OS X when building with scons -j4 platform=osx bits=64 debug_symbols=full separate_debug_symbols=yes dev=yes:

[ 99%] progress_finish(["progress_finish"], [])
[100%] make_debug_osx(["bin/godot.osx.tools.64"], ["platform/osx/crash_handler_osx.osx.tools.64.o", "platform/osx/os_osx.osx.tools.64.o", "platform/osx/godot_main_osx.osx.tools.64.o", "platform/osx/sem_osx.osx.tools.64.o", "platform/osx/dir_access_osx.osx.tools.64.o", "platform/osx/joypad_osx.osx.tools.64.o", "platform/osx/power_osx.osx.tools.64.o"])
scons: *** [bin/godot.osx.tools.64] TypeError : 'NoneType' object has no attribute '__getitem__'
Traceback (most recent call last):
  File "/usr/local/Cellar/scons/3.0.0/libexec/scons-local/SCons/Action.py", line 1197, in execute
    result = self.execfunction(target=target, source=rsources, env=env)
  File "/Volumes/Data/git/godot_3.0/platform_methods.py", line 22, in wrapper
    return builder_function(target, source, None)
  File "/Volumes/Data/git/godot_3.0/platform/osx/platform_osx_builders.py", line 11, in make_debug_osx
    if (env["macports_clang"] != 'no'):
TypeError: 'NoneType' object has no attribute '__getitem__'
scons: building terminated because of errors.
ERROR!

@viktor-ferenczi
Copy link
Contributor Author

viktor-ferenczi commented Jul 30, 2018

Added ticket #20613

It is because the env parameter is not passed over to the subprocesses. It is because there were no usages in Windows nor Linux builds. It looks like I missed the reference on Mac.

Possible solutions:

  1. Pass env as a dictionary (converting if needed)
  2. Running the builds inside the process (instead of in subprocess) on non-Windows platforms

I think 2. would be simpler. It needs a simple platform condition to the run_in_subprocess method to return the builder function as is if not on Windows.

@viktor-ferenczi
Copy link
Contributor Author

Fix is in PR #20617

@viktor-ferenczi
Copy link
Contributor Author

viktor-ferenczi commented Jul 31, 2018

PR #20617 passed all CI checks and tested on Mac by @marcelofg55
It is ready for code review.

@viktor-ferenczi
Copy link
Contributor Author

PR #20617 completed, so the Mac build error caused by this PR has been fixed.

@kavika13
Copy link

kavika13 commented Aug 6, 2018

You probably got confirmation already, considering this is merged into master, but I thought I'd follow up.

Before this patch a parallel build on Windows would break fairly quickly, and would fail every time. I have just pulled master and tried a clean (git clean -xdf) then -j6 build. I tried it 3 times, and it ran without error 3 times. 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants