Windows builds built with gcc -O2 works on Win10 but fails on Win11 #20081
Comments
To hopefully shed more light on this, I've made a couple of builds as described above, i.e. with the default -O2 and with -Os, on both Win10 and Win11, so 4 builds in total. Further, to try to provoke the error in more isolation, I grabbed t/test.pl and t/re/reg_mesg.t and placed them together in an empty directory. The reg_mesg.t file resets @INC in a BEGIN block; I commented that out. I then created a simple script:
This was run with the 4 builds; for both builds done with -O2 (on Win10 and Win11 respectively), it fails with this output:
Using the -Os builds, they run to completion as expected:
However, all 4 builds do work on my Win10 machine. So either my Win11 is bonkers in some way, or Win11 in general sets up the process in a way that hits whatever the -O2 optimization does. The behavior from run to run is stable, however; it always fails in the same spot. I have not yet tried to simplify the reg_mesg.t script to a point where it would provoke the error with less code. I'm not sure if it's doable but I'll try for a bit, on this or on any of the other failing test files. I'm not sure if the fact that there are several 're' related tests that error out would be significant...? I've also tried the fix from PR #19912 by @sisyphus by simply forcing the fix even with gcc 8. It appears not to make a difference, unfortunately. Hope all this helps. I'll also keep digging to identify something simpler to provoke it with, but I honestly don't think I have a prayer of actually finding the problem...:-/ I'm available for further tests if anyone has suggestions to try and help isolate it further. E.g. I also do get failures trying to build (at least) v5.36.0, although not quite the same ones. ken1 |
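For anyone wanting to repeat the "same spot every run" check across several builds, here is a minimal sketch in Python of how the runs could be tabulated. It is only an illustration: the helper names and the build paths in the usage comment are hypothetical, not something used in this thread. It runs one test script under each perl binary a few times and records the exit codes, which on Windows are usually easier to read as unsigned hex.

```python
import subprocess


def run_repeatedly(cmd, runs=3, cwd=None):
    """Run cmd several times and return the list of exit codes."""
    codes = []
    for _ in range(runs):
        proc = subprocess.run(
            cmd, cwd=cwd,
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
        )
        codes.append(proc.returncode)
    return codes


def summarize(builds, script, runs=3):
    """Map each build label to (exit_codes, stable) for one test script.

    `builds` maps a label to the path of an interpreter binary;
    `stable` is True when every run returned the same exit code.
    """
    report = {}
    for label, interpreter in builds.items():
        codes = run_repeatedly([interpreter, script], runs=runs)
        report[label] = (codes, len(set(codes)) == 1)
    return report


def as_hex(code):
    """Format an exit code as unsigned 32-bit hex."""
    return f"0x{code & 0xFFFFFFFF:08X}"


# Hypothetical usage (paths are placeholders, adjust to the local builds):
# builds = {"O2": r"C:\perl-O2\bin\perl.exe", "Os": r"C:\perl-Os\bin\perl.exe"}
# for label, (codes, stable) in summarize(builds, "reg_mesg.t").items():
#     print(label, [as_hex(c) for c in codes], "stable" if stable else "varies")
```

A build whose codes vary between runs points more towards timing than towards a deterministic miscompile, which is useful to know before reaching for a debugger.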
From the exit codes, it seems that For the crashing one it would be useful to get a C-level stacktrace if you are able to run it under a debugger. It may then be possible to home in on the issue by showing that just one of the source files mentioned in the stacktrace needs to be compiled with A crash is usually a great source of information. But if we can't track down the underlying cause from that, then it would be useful to get a perl-level stacktrace for one or two examples of the dying test scripts using something like:
|
Thanks, will look into that. Admittedly however, is there a shortish help on how I could interpose a debugger given the tools I have (or what I would need)? My C/C++ days are about 25 years in the mirror, and then it was Windows only with msvc...:-)... But, I'll start digging on the popen front. |
Do you have an antivirus or some kind of anti-cheat software running? They sometimes cause weird problems on Windows. Anyway, to obtain a stacktrace, follow these instructions:
diff --git a/win32/GNUmakefile b/win32/GNUmakefile
index b241991dae..989dd92591 100644
--- a/win32/GNUmakefile
+++ b/win32/GNUmakefile
@@ -609,8 +609,8 @@ OPTIMIZE = -g -O2
LINK_DBG = -g
DEFINES += -DDEBUGGING
else
-OPTIMIZE = -O2
-LINK_DBG = -s
+OPTIMIZE = -ggdb -O2
+LINK_DBG = -ggdb
endif
EXTRACFLAGS =
|
Actually, you don't have to edit the Makefile, just passing |
Thanks @xenu, I'm using the builtin antivirus, i.e. Windows Security; turning off the realtime checks doesn't seem to make a difference. As you said, an absolutely clean clone, and built with the params to gmake (no patch). Jumping directly to "... TEST_FILES=win32\popen.t test" was disappointing...it passed ;-). So running it with gdb should be uninteresting, although it looks a bit weird. Since fork() is involved maybe this looks normal? Running it without the harness is quite unexciting...
However, re\reg_mesg.t still fails (same place as before) and possibly that will give a hint:
Rebuilding with standard settings does cause win32/popen.t to fail, so now I'm trying to establish whether it was a fluke or if the -ggdb stuff actually makes the error go away. ken1 |
BTW, would it help or hinder any to build with CFG=Debug? |
No, it seems that win32\popen.t now fails in another build made with the gdb settings, so...and the gdb run looks the same. IIUC, it's the watchdog that eventually kills the process. Not sure how to best proceed there...:-/ Also, for re\reg_mesg.t I'm unable to get a Carp::confess(), it appears to never reach that. The process dies a quick death I assume... |
That reg_mesg.t stacktrace looks like yet another instance of #17521, which is a recurring issue. But it's weird, I didn't expect it to be triggered by running on a different OS version. I thought it was just about toolchain. |
Yes, this does indeed seem to be the same underlying problem as in #17521, but twists it a little bit. The issue suggests that DEBUGGING 'fixes' the problem so I tried an out of the box build with just turning on that:
Repeating and turning on the gdb flags, shows the same set of errors, and running a gdb on op/gv.t shows a similar stack:
It's obviously (?) a toolchain issue and I'm assuming that Win11 somehow is more stringent/whatever in something and manages to entice the problem out of the woodwork where Win10 does not. But I think it would be extremely important to have at least one independent observation of this on another Win11 just to remove my particular machine as a potential source of problems. Still, having this issue reported much earlier makes this less likely. According to #19912, there is a gcc 12 used. @sisyphus, any chance of getting a copy of that to try? |
I'm sorry that it has taken me a couple of days to notice that request. (There's also a version available there that doesn't have LLVM/Clang/LLD/LLDB if you prefer.) Cheers, |
Thanks for the winlibs link; the short version is that I've tried it as-is...and get other errors than with 8.3.0 from Strawberry. I decided not to pursue this at this point since there are so many variants of problems to keep track of. A guess is that there's a fair chance that the root cause is the same with the newer gcc, it's just moving to another place. When I've come to a stable point with 8.3.0 I will retry with those options. And that is a problem: the results are mostly stable as long as I don't touch settings, but not completely, making it hard to truly understand it. Especially since it appears to stem from a code-generation/optimization issue, which I have no experience in. So while it eventually would be great to actually find and fix the real problem, I've instead tried to find a stable workaround using more of brute force...:-) The hypothesis is that switching -O2 to -Os makes things work. However, I've found that this occasionally still left the win32\popen.t test failing. Running it manually, however, it usually works...:-/. As seen above when compared with the other test failures, there was never any evidence of this dying so it was a bit of an anomaly anyway. I realized that it's setting a watchdog for 20 secs to kill the test if it's not finished. This should leave ample time for the fork and stuff it does. However, I'm wondering if part of the problem becomes the watchdog stuff itself...no actual known reason, but since I commented out the watchdog, it never fails. Hence, I would claim this is a change to be made regardless of the optimization issue. Unless anything else crops up I'll probably make a PR out of it. I'm using this content in the tests I show below. I also use the fix in #20033.
Trying to find the offending optimization
Reading https://gcc.gnu.org/onlinedocs/gcc-8.3.0/gcc/Optimize-Options.html, I read it as stating that -Os enables all the -O2 options except a few and adding one more (this differs slightly between versions). Given that, I've tried to figure out which of the option(s) makes things go awry and have arrived at using the following:
IIUC, the one thing different from -O2 is then the -Os addition of -finline-functions, and a lack of This configuration has been run repeatedly and seems to be stable in that it has so far never given any spurious errors. Logically, it should then be possible to go the other way around, e.g. use -O2 but specifically reverse the differences to above. Unfortunately, this has not met with success so far. I may be missing something though...checking with Any thoughts? |
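As a cross-check on reading the documentation by hand, gcc can report the state of each optimization switch itself via `-Q --help=optimizers`. The following is a minimal sketch in Python, not anything from the Perl build. It assumes `gcc` is on PATH and that the report has the usual two-column "option, state" layout; the function names are made up for illustration.

```python
import subprocess


def parse_optimizers(text):
    """Map each option to its reported state from `gcc -Q --help=optimizers`.

    Assumes option lines look like "  -fsome-flag    [enabled]";
    header lines and options without a state column are skipped.
    """
    flags = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0].startswith("-"):
            flags[parts[0]] = " ".join(parts[1:])
    return flags


def diff_levels(a, b):
    """Return {option: (state_in_a, state_in_b)} where the two states differ."""
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


def query(level, gcc="gcc"):
    """Ask the given gcc which optimizers are on at `level` (e.g. "-O2")."""
    out = subprocess.run(
        [gcc, "-Q", level, "--help=optimizers"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_optimizers(out)


# Usage, with the toolchain's gcc on PATH:
# for name, (o2, os_) in diff_levels(query("-O2"), query("-Os")).items():
#     print(f"{name}: -O2={o2}  -Os={os_}")
```

Comparing the report for a hand-built flag list against the one for plain -Os would show whether the two really are equivalent for this particular gcc, rather than relying on the wording of one documentation version.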
I have a suspicion that "threads" is also playing a part - at least in the weirdness that I'm seeing on Windows 7. The downside to unthreaded perls on Windows is that you lose the fork() function - and that, for example, breaks the cpan utility as its Anyway, @kenneth-olwing, that's just another variant you could consider ... though I'm not sure that's exactly what you were looking for ;-) Just in case it is (or becomes) relevant, be mindful of the fact that the older StrawberryPerl mingw-w64 compilers define _WIN32_WINNT to a different value than the newer winlibs ones. For me, Visual Studio (MSVC142) built perl-5.37.2 on Windows 7 without any problem at all. It might (or might not) be helpful to know whether Visual Studio provides the same trouble-free result on Windows 11. Cheers, |
Right. I'm sure you're right in that it's somehow thread-related (also). However, I need threading so disregarding that is not an option for me :-/ In short, I have a fairly large toolset in Perl at work which needs to work on both Linux & Windows. It's on the order of 20+ years old, and since I was able to lay my hands on the project a few years ago, I've worked hard to improve and modernize it in all the ways I can. A very important step was to get a common Perl version - before me it was constrained to whatever RHEL shipped, and an antique hand-modded (and somewhat broken) ActiveState 5.8.9 package on Windows...completely unworkable. I set up a build system to build Perl from source on Linux, use Strawberry on Windows, and then outfit them with the 'same' set of modules. So, that's my ultimate need, i.e. upgrade to a later Perl on both platforms. While this experience makes me consider building from source on Windows too, my secondary goal is to poke around so Strawberry can move forward - it's what I'd choose for my home projects obviously. Rather than mess around with the zillion optimization flags in GCC, I've so far had zero problems with just plain '-Os' (with the caveat that, as described above, the win32\popen.t watchdog is removed and the Errno fix is applied), so I'm considering placing PRs for them so it's at least stable for everyone (?), and someone with the means can worry about -O2 later. Maybe I'd try to get clarification from the gcc folks on how the optimization flags are actually supposed to be combined. For the simple reason that 8.3.0 is a known quantity in Strawberry, I figure it's reasonable to start with that and move on to 12.1+ later. Also, since Strawberry comes with a large amount of extra libs that I'm assuming are needed/useful for many things, that also helps. Then again, if a coming Strawberry incorporates a later gcc, that would make sense to use then of course.
Again however, getting independent confirmation on having issues on Win11 would be really great...:-/ Thanks, |
There's some comments in t/win32/popen.t that are a little obscure:
#77672 (which I haven't yet located) is referenced in perl5200delta - and the claim seems to be that it fixed the issue you are now seeing. I can't find any written record of having problems with that test file on my Windows 7 machine ... though it (vaguely) rings a bell ... maybe some unusual perl configuration for which I've kept no record. Note that the OPTIMIZE setting is already configurable via the command line (OPTIMIZE=-Os). Having a default value of -Os seems sane to me. Cheers, |
Standard disclaimer by now: It's still possible that all this is a 'me' problem in my context only...
Re: win32\popen.t
The tricky part is of course that the watchdog doesn't 'fix' any problem, just acknowledges and takes us out of a hang. Actually, just a possible hang - I've had a few instances where running the test manually takes an unusually long time, e.g. far longer than 20 secs, but still completed. I know, makes no sense either since it should be very quick. Some weird thread scheduling? Buffer sizes on the pipes for the qx process causing some kind of almost-but-not-quite race condition? Just on Win11? Hmmm...or perhaps better, it could be a configurable thing, e.g. for those of us seeing this particular problem (it is only relevant during a local build test anyway, different from the optimize thing), setting an envvar, e.g. WIN32_POPEN_T_WATCHDOG, to a value which is the secs, where '0' turns it off completely, would be sufficient. Sort of eat the cake and have it.
Re: OPTIMIZE
My chief reason for de facto changing the default to -Os would be that, as it stands, -O2 appears to work fine with a build+test on a Win10 machine, but not on Win11. The worst scenario with requiring a change on the command line or local patching of GNUmakefile is that someone unsuspecting of the problem with -O2 would build with factory settings on Win10, and then distribute to Win11 users that would/could get weird errors in odd situations. Not that I know how big (if any) a performance impact it produces, but rather a slow working build than a fast crashing one...:-). I guess there could be provisions for keeping -O2 when not using threads of course... Thanks for your (and everyone else's) insight and work with Perl, |
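To make the WIN32_POPEN_T_WATCHDOG suggestion concrete, here is a small sketch of the intended semantics, written in Python purely for illustration (the real test is Perl, and nothing like this exists in the tree). The default of 20 seconds is the current watchdog value mentioned earlier in the thread; treating an unset or empty variable as "use the default" and rejecting negative values are assumptions of this sketch, as are the function names.

```python
import os
import threading


def watchdog_timeout(env=os.environ, default=20.0):
    """Return the watchdog delay in seconds, or None when it is switched off.

    Unset or empty -> default; "0" -> disabled; any other number -> seconds.
    """
    raw = env.get("WIN32_POPEN_T_WATCHDOG")
    if raw is None or raw.strip() == "":
        return default
    secs = float(raw)
    if secs < 0:
        raise ValueError("WIN32_POPEN_T_WATCHDOG must be >= 0")
    return None if secs == 0 else secs


def start_watchdog(on_timeout, env=os.environ):
    """Arm a timer that calls on_timeout() after the configured delay.

    Returns the timer so the caller can cancel it, or None when disabled.
    """
    secs = watchdog_timeout(env)
    if secs is None:
        return None
    timer = threading.Timer(secs, on_timeout)
    timer.daemon = True  # do not keep the process alive just for the watchdog
    timer.start()
    return timer
```

The point of the design is that the default behaviour stays exactly as it is today, while anyone hitting the slow-but-not-hung case can lengthen or disable the timeout locally without patching the test.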
Is this fixed by the merge of #20136 ? |
Yes, it is, thank you. |
The below was done with blead@c52f54c8521c6155984dba0a0675d3220170468b and using a standard MSI install of Strawberry Perl 5.32.1.1 as the toolchain.
Description
Building is done out-of-the box with:
The build seems fine. Also, on a Win10 machine the tests pass.
However, on my Win11 box the tests fail with:
Most of the failures are the test process simply dying. I'm at this point assuming that the root cause is the same in all cases. Running 'install' on the build produces an installed Perl as expected. The install retains the problem; see further comments for more details on examining that.
After looking around I found issue #20024. Although the toolchain I have only has gcc 8.3.0, it felt worthwhile to test it out, so retrying this with a completely clean clone but adding
OPTIMIZE=-Os
on the command line, the test suite runs fine. Again, see further comments for more information.
Steps to Reproduce
Hopefully anyone with Win11 can reproduce. I only have one Win11 box and so am unable to verify myself whether this particular machine is somehow culpable or if it's 'any' Win11.
Expected behavior
Working tests.
Perl configuration
More details to follow.
Thank you,
ken1