Windows builds built with gcc -O2 works on Win10 but fails on Win11 #20081

Closed
kenneth-olwing opened this issue Aug 11, 2022 · 18 comments

@kenneth-olwing
Contributor

The below was done with blead@c52f54c8521c6155984dba0a0675d3220170468b and using a standard MSI install of Strawberry Perl 5.32.1.1 as the toolchain.

Description
Building is done out-of-the box with:

cd win32
gmake -f GNUmakefile CCHOME=c:\Strawberry\c test

The build seems fine. Also, on a Win10 machine the tests pass.
However, on my Win11 box the tests fail with:

Test Summary Report
-------------------
re/reg_mesg.t                                                      (Wstat: 65280 (exited 255) Tests: 339 Failed: 0)
  Non-zero exit status: 255
  Parse errors: No plan found in TAP output
re/regexp_unicode_prop.t                                           (Wstat: 65280 (exited 255) Tests: 0 Failed: 0)
  Non-zero exit status: 255
  Parse errors: Bad plan.  You planned 1110 tests but ran 0.
re/regexp_unicode_prop_thr.t                                       (Wstat: 65280 (exited 255) Tests: 4 Failed: 0)
  Non-zero exit status: 255
  Parse errors: Bad plan.  You planned 1110 tests but ran 4.
re/subst.t                                                         (Wstat: 65280 (exited 255) Tests: 236 Failed: 0)
  Non-zero exit status: 255
  Parse errors: Bad plan.  You planned 278 tests but ran 236.
re/subst_wamp.t                                                    (Wstat: 65280 (exited 255) Tests: 236 Failed: 0)
  Non-zero exit status: 255
  Parse errors: Bad plan.  You planned 278 tests but ran 236.
re/substT.t                                                        (Wstat: 65280 (exited 255) Tests: 236 Failed: 0)
  Non-zero exit status: 255
  Parse errors: Bad plan.  You planned 278 tests but ran 236.
win32/popen.t                                                      (Wstat: 2304 (exited 9) Tests: 0 Failed: 0)
  Non-zero exit status: 9
  Parse errors: No plan found in TAP output
../dist/Safe/t/safeops.t                                           (Wstat: 65280 (exited 255) Tests: 71 Failed: 0)
  Non-zero exit status: 255
  Parse errors: Bad plan.  You planned 418 tests but ran 71.
../dist/Storable/t/utf8.t                                          (Wstat: 65280 (exited 255) Tests: 5 Failed: 0)
  Non-zero exit status: 255
  Parse errors: Bad plan.  You planned 6 tests but ran 5.
../ext/IPC-Open3/t/IPC-Open3.t                                     (Wstat: 0 Tests: 45 Failed: 0)
  TODO passed:   25
Files=2719, Tests=1114380, 4838 wallclock secs (72.62 usr + 112.17 sys = 184.80 CPU)
Result: FAIL
gmake: *** [GNUmakefile:1811: test] Error 9

Most of the failures are the test process simply dying. At this point I'm assuming that the root cause is the same in all cases. Running 'install' on the build produces an installed Perl as expected. The install retains the problem; see further comments for more details on examining that.

After looking around I found issue #20024. Although the toolchain I have only has gcc 8.3.0, it felt worthwhile to test it out, so I retried with a completely clean clone, adding OPTIMIZE=-Os on the command line. With that, the test suite runs fine. Again, see further comments for more information.

Steps to Reproduce
Hopefully anyone with Win11 can reproduce. I only have one Win11 box and so am unable to verify myself whether this particular machine is somehow culpable or if it's 'any' Win11.

Expected behavior
Working tests.

Perl configuration

Summary of my perl5 (revision 5 version 37 subversion 3) configuration:
   
  Platform:
    osname=MSWin32
    osvers=10.0.22000.856
    archname=MSWin32-x64-multi-thread
    uname=''
    config_args='undef'
    hint=recommended
    useposix=true
    d_sigaction=undef
    useithreads=define
    usemultiplicity=define
    use64bitint=define
    use64bitall=undef
    uselongdouble=undef
    usemymalloc=n
    default_inc_excludes_dot=define
  Compiler:
    cc='gcc'
    ccflags =' -DWIN32 -DWIN64  -DPERL_TEXTMODE_SCRIPTS -DMULTIPLICITY -DPERL_IMPLICIT_SYS -DUSE_PERLIO -D__USE_MINGW_ANSI_STDIO -fwrapv -fno-strict-aliasing -mms-bitfields'
    optimize='-O2'
    cppflags='-DWIN32'
    ccversion=''
    gccversion='8.3.0'
    gccosandvers=''
    intsize=4
    longsize=4
    ptrsize=8
    doublesize=8
    byteorder=12345678
    doublekind=3
    d_longlong=define
    longlongsize=8
    d_longdbl=define
    longdblsize=16
    longdblkind=3
    ivtype='long long'
    ivsize=8
    nvtype='double'
    nvsize=8
    Off_t='long long'
    lseeksize=8
    alignbytes=8
    prototype=define
  Linker and Libraries:
    ld='g++'
    ldflags ='-s -L"e:\perl-win11-O2\lib\CORE" -L"c:\Strawberry\c\lib" -L"c:\Strawberry\c\x86_64-w64-mingw32\lib" -L"c:\Strawberry\c\lib\gcc\x86_64-w64-mingw32\8.3.0"'
    libpth=c:\Strawberry\c\lib c:\Strawberry\c\x86_64-w64-mingw32\lib c:\Strawberry\c\lib\gcc\x86_64-w64-mingw32\8.3.0
    libs= -lmoldname -lkernel32 -luser32 -lgdi32 -lwinspool -lcomdlg32 -ladvapi32 -lshell32 -lole32 -loleaut32 -lnetapi32 -luuid -lws2_32 -lmpr -lwinmm -lversion -lodbc32 -lodbccp32 -lcomctl32
    perllibs= -lmoldname -lkernel32 -luser32 -lgdi32 -lwinspool -lcomdlg32 -ladvapi32 -lshell32 -lole32 -loleaut32 -lnetapi32 -luuid -lws2_32 -lmpr -lwinmm -lversion -lodbc32 -lodbccp32 -lcomctl32
    libc=
    so=dll
    useshrplib=true
    libperl=libperl537.a
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_win32.xs
    dlext=dll
    d_dlsymun=undef
    ccdlflags=' '
    cccdlflags=' '
    lddlflags='-shared -s -L"e:\perl-win11-O2\lib\CORE" -L"c:\Strawberry\c\lib" -L"c:\Strawberry\c\x86_64-w64-mingw32\lib" -L"c:\Strawberry\c\lib\gcc\x86_64-w64-mingw32\8.3.0"'

Characteristics of this binary (from libperl): 
  Compile-time options:
    HAS_TIMES
    HAVE_INTERP_INTERN
    MULTIPLICITY
    PERLIO_LAYERS
    PERL_COPY_ON_WRITE
    PERL_DONT_CREATE_GVSV
    PERL_HASH_FUNC_SIPHASH13
    PERL_HASH_USE_SBOX32
    PERL_IMPLICIT_SYS
    PERL_MALLOC_WRAP
    PERL_OP_PARENT
    PERL_PRESERVE_IVUV
    PERL_USE_SAFE_PUTENV
    USE_64_BIT_INT
    USE_ITHREADS
    USE_LARGE_FILES
    USE_LOCALE
    USE_LOCALE_COLLATE
    USE_LOCALE_CTYPE
    USE_LOCALE_NUMERIC
    USE_LOCALE_TIME
    USE_PERLIO
    USE_PERL_ATOF
  Built under MSWin32
  Compiled at Aug 10 2022 23:38:33
  @INC:
    E:/perl-win11-O2/site/lib
    E:/perl-win11-O2/lib

More details to follow.

Thank you,

ken1

@kenneth-olwing
Contributor Author

To hopefully shed more light on this I've made a couple of builds as described above, i.e. with default -O2 and with -Os, on both Win10 and Win11, so basically 4 builds.

Further, in order to try to provoke the error in more isolation I grabbed t/test.pl and t/re/reg_mesg.t and placed them together in an empty directory. The reg_mesg.t file resets @INC in a BEGIN block; I commented that out.

I then created a simple script:

use strict;
use warnings;

print "EXECUTING TEST...\n";
print "=================\n";
my $xit = system($^X, 'reg_mesg.t') >> 8;
print "=================\n";
print "EXITCODE: $xit\n";

This was run with the 4 builds. With both of the -O2 builds (built on Win10 and Win11 respectively), it fails with this output:

...<snip>...
ok 335 - ... and gave expected message
ok 336 - ... and no other warnings
ok 337 -  m/(?[ \p{Digit} & (?(?[ \p{Thai} | \p{Lao} ]))])/ died
ok 338 - ... and gave expected message
ok 339 - ... and no other warnings
Can't spawn "E:\perl-win11-O2\bin\perl.exe": No error at x.pl line 6.
=================
EXITCODE: 255

Using the -Os builds, they run to completion as expected:

...<snip>...
ok 3345 - ... and gave expected number (1) of warnings
ok 3346 - ... and gave expected warning
ok 3347 - ... and turning off 'experimental::vlb' warnings suppressed it
ok 3348 - ... and the warning is on by default
1..3348
=================
EXITCODE: 0

However, all 4 builds do work on my Win10 machine. So either my Win11 is bonkers in some way, or Win11 in general sets up the process in a way that hits whatever the -O2 optimization does. The behavior is stable from run to run, however: it always fails in the same spot.

I have not yet tried to simplify the reg_mesg.t script to a point which would provoke the error with less code. I'm not sure if it's doable but I'll try for a bit, on this or on any of the other failing test files. I'm not sure if the fact that there are several 're' related tests that error out would be significant...?

I've also tried the fix from PR #19912 by @sisyphus by simply forcing the fix even with gcc 8. It appears not to make a difference unfortunately.

Hope all this helps. I'll also keep digging to try to identify something simpler to provoke it with, but I honestly don't think I have a prayer of actually finding the problem...:-/

I'm available for further tests if anyone has suggestions to try and help isolate it further. E.g. I also do get failures trying to build (at least) v5.36.0, although not quite the same ones.

ken1

@hvds
Contributor

hvds commented Aug 11, 2022

From the exit codes, it seems that win32/popen.t is crashing while the others are dying.
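
As an aside, the harness's Wstat values can be unpacked to check that split. This is a minimal sketch, assuming the usual POSIX-style layout (exit code in the high byte, signal number in the low seven bits). decode_wstat is a hypothetical helper name, not something from t/test.pl:

```perl
use strict;
use warnings;

# Hypothetical helper: split a POSIX-style wait status ("Wstat") into parts.
sub decode_wstat {
    my ($wstat) = @_;
    return {
        exit   => $wstat >> 8,               # high byte: exit code
        signal => $wstat & 127,              # low 7 bits: terminating signal
        core   => ($wstat & 128) ? 1 : 0,    # bit 7: core dump flag
    };
}

# The two distinct non-zero Wstat values from the summary above.
for my $wstat (65280, 2304) {
    my $d = decode_wstat($wstat);
    printf "Wstat %5d => exit %3d, signal %d\n",
        $wstat, $d->{exit}, $d->{signal};
}
```

65280 decodes to exit code 255, which is the value die() falls back to when neither $! nor $? >> 8 supplies a non-zero code, while 2304 decodes to exit code 9.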

For the crashing one it would be useful to get a C-level stacktrace if you are able to run it under a debugger. It may then be possible to home in on the issue by showing that just one of the source files mentioned in the stacktrace needs to be compiled with -Os to fix the problem.

A crash is usually a great source of information. But if we can't track down the underlying cause from that, then it would be useful to get a perl-level stacktrace for one or two examples of the dying test scripts using something like:

  use Carp (); local $SIG{__DIE__} = sub { Carp::confess(@_) };
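
To illustrate what that one-liner produces, here is a small self-contained sketch. The subs inner and outer are made up for the demonstration and stand in for whatever code path dies inside a test script:

```perl
use strict;
use warnings;
use Carp ();

# Made-up call chain that ends in a plain die.
sub inner { die "boom\n" }
sub outer { inner() }

my $trace = '';
{
    # Turn every die into a full perl-level backtrace.
    local $SIG{__DIE__} = sub { Carp::confess(@_) };

    # eval is used here only so the script can go on and print the trace;
    # in a test script the confess output would simply go to STDERR.
    eval { outer(); 1 } or $trace = $@;
}

print $trace;
```

The printed message lists each frame between the die and the top-level call. Note that the handler only runs for perl-level die calls, so it cannot report anything if the process is terminated at the C level before perl's own die machinery completes.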

@kenneth-olwing
Contributor Author

Thanks, will look into that. Admittedly however, is there a shortish help on how I could interpose a debugger given the tools I have (or what I would need)? My C/C++ days are about 25 years in the mirror, and then it was Windows only with msvc...:-)...

But, I'll start digging on the popen front.

@xenu
Member

xenu commented Aug 11, 2022

Do you have an antivirus or some kind of anti-cheat software running? They sometimes cause weird problems on Windows.

Anyway, to obtain a stacktrace, follow these instructions:

  1. Apply this patch and rebuild Perl. Make sure your checkout is fully clean, git clean -dxf is the usual way to do it.
diff --git a/win32/GNUmakefile b/win32/GNUmakefile
index b241991dae..989dd92591 100644
--- a/win32/GNUmakefile
+++ b/win32/GNUmakefile
@@ -609,8 +609,8 @@ OPTIMIZE	= -g -O2
 LINK_DBG	= -g
 DEFINES		+= -DDEBUGGING
 else
-OPTIMIZE	= -O2
-LINK_DBG	= -s
+OPTIMIZE	= -ggdb -O2
+LINK_DBG	= -ggdb
 endif
 
 EXTRACFLAGS	=
  2. Start the crashing test under gdb:
>gdb --args .\perl.exe .\t\win32\popen.t
  3. Inside gdb, type run and if it crashes, the bt command will give you the backtrace.

@xenu
Member

xenu commented Aug 11, 2022

Actually, you don't have to edit the Makefile, just passing OPTIMIZE="-ggdb -O2" LINK_DBG="-ggdb" to gmake will do.

@kenneth-olwing
Contributor Author

Thanks @xenu,

I'm using the built-in antivirus, i.e. Windows Security; turning off the realtime checks doesn't seem to make a difference.

As you said, an absolutely clean clone, and built with the params to gmake (no patch). Jumping directly to "... TEST_FILES=win32\popen.t test" was disappointing...it passed ;-). So running it with gdb should be uninteresting, although it looks a bit weird. Since fork() is involved maybe this looks normal? Running it without the harness is quite unexciting...

C:\ws\git\perl5-main\t>gdb --args .\perl.exe .\win32\popen.t
GNU gdb (GDB) 8.2.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-w64-mingw32".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from .\perl.exe...done.
(gdb) run
Starting program: C:\ws\git\perl5-main\t\perl.exe .\win32\popen.t
[New Thread 9952.0x2044]
[New Thread 9952.0x8d0]
[New Thread 9952.0x4098]
[New Thread 9952.0x29cc]
warning: onecore\base\appmodel\processcreation\src\packagedcreateprocess.cpp(289)\kernelbase.dll!00007FFF9AB787E6: (caller: 00007FFF9AB798C2) ReturnHr(1) tid(2044) 80070002 The system cannot find the file specified.
warning: onecore\base\appmodel\processcreation\src\packagedcreateprocess.cpp(761)\kernelbase.dll!00007FFF9AB798E5: (caller: 00007FFF9AB799B9) ReturnHr(2) tid(2044) 80070002 The system cannot find the file specified.
warning: onecore\base\appmodel\processcreation\src\packagedcreateprocess.cpp(952)\kernelbase.dll!00007FFF9AB799DF: (caller: 00007FFF9ABA8FD0) LogHr(1) tid(2044) 80070002 The system cannot find the file specified.
# Test process timed out - terminating
[Thread 9952.0x4098 exited with code 9]
[Thread 9952.0x8d0 exited with code 9]
[Inferior 1 (process 9952) exited with code 011]
(gdb) bt
No stack.

However, re\reg_mesg.t still fails (same place as before) and possibly that will give a hint:

...<snip>...
ok 337 -  m/(?[ \p{Digit} & (?(?[ \p{Thai} | \p{Lao} ]))])/ died
ok 338 - ... and gave expected message
ok 339 - ... and no other warnings
gdb: unknown target exception 0xc00000ff at 0x7fff9d18fcad

Thread 1 received signal ?, Unknown signal.
0x00007fff9d18fcad in ntdll!RtlRaiseStatus () from C:\WINDOWS\SYSTEM32\ntdll.dll
(gdb) bt
#0  0x00007fff9d18fcad in ntdll!RtlRaiseStatus () from C:\WINDOWS\SYSTEM32\ntdll.dll
#1  0x00007fff9d1c0543 in ntdll!RtlNotifyFeatureUsage () from C:\WINDOWS\SYSTEM32\ntdll.dll
#2  0x00007fff9d13523d in ntdll!RtlUnwind () from C:\WINDOWS\SYSTEM32\ntdll.dll
#3  0x00007fff9b09686b in msvcrt!_setjmpex () from C:\WINDOWS\System32\msvcrt.dll
#4  0x000000006cf5e3d7 in Perl_die_unwind (my_perl=0x0, my_perl@entry=0x694fe8, msv=msv@entry=0x29b85f0)
    at ..\pp_ctl.c:1857
#5  0x000000006cfff25a in Perl_vcroak (my_perl=0x694fe8, pat=<optimized out>, args=<optimized out>) at ..\util.c:2011
#6  0x000000006cfff902 in Perl_croak (my_perl=0x690000, my_perl@entry=0x694fe8,
    pat=0x7fff9d207532 <ntdll!RtlRegisterSecureMemoryCacheCallback+14450> ",A\017,H\002",
    pat@entry=0x6d02bc7d <bodies_by_type+541> "%d%I64u%4p") at ..\util.c:2062
#7  0x000000006cee2102 in Perl_get_regclass_aux_data (my_perl=my_perl@entry=0x694fe8, prog=<optimized out>,
    node=node@entry=0x29b699c, doinit=doinit@entry=true, listsvp=listsvp@entry=0x0,
    only_utf8_locale_ptr=only_utf8_locale_ptr@entry=0x62f868, output_invlist=output_invlist@entry=0x0)
    at ..\regcomp.c:20958
#8  0x000000006ceea891 in S_reginclass (my_perl=my_perl@entry=0x694fe8, prog=<optimized out>, prog@entry=0x2958ee8,
    n=<optimized out>, n@entry=0x29b699c, p=<optimized out>, p_end=0x2892639 "", utf8_target=utf8_target@entry=false)
    at ..\regexec.c:10817
#9  0x000000006cef97d2 in S_find_byclass (my_perl=my_perl@entry=0x694fe8, prog=prog@entry=0x2958ee8,
    c=c@entry=0x29b699c, s=<optimized out>, s@entry=0x2892638 "x", strend=strend@entry=0x2892639 "",
    reginfo=reginfo@entry=0x62fa80) at ..\regexec.c:2260
#10 0x000000006cf019ca in Perl_regexec_flags (my_perl=0x694fe8, rx=0x2902ca0, stringarg=0x2892638 "x",
    strend=0x2892639 "", strbeg=0x2892638 "x", minend=0, sv=0x2690cf0, data=0x0, flags=97) at ..\regexec.c:4068
#11 0x000000006cf80906 in Perl_pp_match (my_perl=0x694fe8) at ../inline.h:389
#12 0x000000006cff4926 in Perl_runops_standard (my_perl=0x694fe8) at ..\run.c:42
#13 0x000000006cf9b2f0 in S_run_body (oldscope=<optimized out>, my_perl=<optimized out>) at perl.c:2750
#14 perl_run (my_perl=0x6d005cf0 <xs_init(PerlInterpreter*)>, my_perl@entry=0x694fe8) at perl.c:2678
#15 0x000000006d0093b8 in RunPerl (argc=<optimized out>, argv=<optimized out>, env=0x694d70) at perllib.c:201
#16 0x00000000004013c7 in __tmainCRTStartup ()
#17 0x00000000004014fb in mainCRTStartup ()
(gdb)

Rebuilding with standard settings does cause win32/popen.t to fail, so now I'm trying to establish whether it was a fluke or if the -ggdb stuff actually makes the error go away.

ken1

@kenneth-olwing
Contributor Author

BTW, would it help or hinder any to build with CFG=Debug?

@kenneth-olwing
Contributor Author

No, it seems that the win32\popen now fails in another build with the gdb settings made so...and the gdb run looks the same. IIUC, it's the watchdog that eventually kills the process. Not sure how to best proceed there...:-/

Also, for re\reg_mesg.t I'm unable to get a Carp::confess(); it appears to never reach that. The process dies a quick death I assume...

@xenu
Member

xenu commented Aug 11, 2022

That reg_mesg.t stacktrace looks like yet another instance of #17521, which is a recurring issue. But it's weird, I didn't expect it to be triggered by running on a different OS version. I thought it was just about toolchain.

@kenneth-olwing
Contributor Author

Yes, this does indeed seem to be the same underlying problem as in #17521, but twists it a little bit. The issue suggests that DEBUGGING 'fixes' the problem so I tried an out of the box build with just turning on that:
gmake -f GNUmakefile CCHOME=c:\Strawberry\c CFG=Debug test
It 'fixes' all the previous test problems, but unfortunately moves the problems elsewhere...:-)

Test Summary Report
-------------------
op/gv.t                                                            (Wstat: 65280 (exited 255) Tests: 176 Failed: 0)
  Non-zero exit status: 255
  Parse errors: No plan found in TAP output
op/method.t                                                        (Wstat: 65280 (exited 255) Tests: 7 Failed: 0)
  Non-zero exit status: 255
  Parse errors: Bad plan.  You planned 162 tests but ran 7.
op/ref.t                                                           (Wstat: 65280 (exited 255) Tests: 199 Failed: 0)
  Non-zero exit status: 255
  Parse errors: Bad plan.  You planned 254 tests but ran 199.
op/sort.t                                                          (Wstat: 65280 (exited 255) Tests: 154 Failed: 0)
  Non-zero exit status: 255
  Parse errors: Bad plan.  You planned 203 tests but ran 154.
op/sub_lval.t                                                      (Wstat: 65280 (exited 255) Tests: 189 Failed: 0)
  Non-zero exit status: 255
  Parse errors: Bad plan.  You planned 211 tests but ran 189.
uni/gv.t                                                           (Wstat: 65280 (exited 255) Tests: 153 Failed: 0)
  Non-zero exit status: 255
  Parse errors: Bad plan.  You planned 206 tests but ran 153.
../dist/Storable/t/utf8.t                                          (Wstat: 65280 (exited 255) Tests: 5 Failed: 0)
  Non-zero exit status: 255
  Parse errors: Bad plan.  You planned 6 tests but ran 5.
../ext/IPC-Open3/t/IPC-Open3.t                                     (Wstat: 0 Tests: 45 Failed: 0)
  TODO passed:   25
Files=2719, Tests=1119520, 1940 wallclock secs (93.00 usr + 11.06 sys = 104.06 CPU)
Result: FAIL
gmake: *** [GNUmakefile:1811: test] Error 7

Repeating and turning on the gdb flags, shows the same set of errors, and running a gdb on op/gv.t shows a similar stack:

...<snip>...
ok 174
ok 175
ok 176
gdb: unknown target exception 0xc00000ff at 0x7fff9d18fcad

Thread 1 received signal ?, Unknown signal.
0x00007fff9d18fcad in ntdll!RtlRaiseStatus () from C:\WINDOWS\SYSTEM32\ntdll.dll
(gdb) bt
#0  0x00007fff9d18fcad in ntdll!RtlRaiseStatus () from C:\WINDOWS\SYSTEM32\ntdll.dll
#1  0x00007fff9d1c0543 in ntdll!RtlNotifyFeatureUsage () from C:\WINDOWS\SYSTEM32\ntdll.dll
#2  0x00007fff9d13523d in ntdll!RtlUnwind () from C:\WINDOWS\SYSTEM32\ntdll.dll
#3  0x00007fff9b09686b in msvcrt!_setjmpex () from C:\WINDOWS\System32\msvcrt.dll
#4  0x000000006d0114f4 in Perl_die_unwind (my_perl=0x0, my_perl@entry=0x714fe8, msv=msv@entry=0x28be6a8)
    at ..\pp_ctl.c:1857
#5  0x000000006d14c7cb in Perl_vcroak (my_perl=0x714fe8, pat=<optimized out>, args=<optimized out>) at ..\util.c:2011
#6  0x000000006d14e512 in Perl_croak (my_perl=0x0, my_perl@entry=0x714fe8,
    pat=0x8 <error: Cannot access memory at address 0x8>,
    pat@entry=0x6d34e6a0 <bodies_by_type+4032> "Cannot convert a reference to %s to typeglob") at ..\util.c:2062
#7  0x000000006d068b08 in Perl_gv_init_pvn (my_perl=my_perl@entry=0x714fe8, gv=0x2862b70, stash=0x7187f8,
    name=0x28d8720 "oonk", len=4, flags=flags@entry=2) at ..\gv.c:477
#8  0x000000006d06a498 in Perl_gv_fetchpvn_flags (my_perl=my_perl@entry=0x714fe8, nambeg=0x28d8720 "oonk",
    full_len=4, flags=<optimized out>, sv_type=sv_type@entry=SVt_PVCV) at ..\gv.c:2679
#9  0x000000006d07063a in Perl_gv_fetchsv (my_perl=my_perl@entry=0x714fe8, name=name@entry=0x28bdd00,
    flags=flags@entry=2, sv_type=sv_type@entry=SVt_PVCV) at ..\gv.c:1756
#10 0x000000006cf5a997 in Perl_ck_rvconst (my_perl=0x714fe8, o=0x28f7b08) at op.c:11986
#11 0x000000006cf63284 in Perl_newUNOP (my_perl=my_perl@entry=0x714fe8, type=<optimized out>, type@entry=16,
    flags=<optimized out>, first=0x28f7b40) at op.c:5449
#12 0x000000006cf64410 in Perl_newCVREF (my_perl=my_perl@entry=0x714fe8, flags=<optimized out>, o=<optimized out>)
    at op.c:11369
#13 0x000000006d10cc7c in Perl_yyparse (my_perl=my_perl@entry=0x714fe8, gramtype=gramtype@entry=258) at perly.y:1455
#14 0x000000006d00140a in S_doeval_compile (my_perl=my_perl@entry=0x714fe8, gimme=gimme@entry=2 '\002',
    outside=outside@entry=0x718ac8, seq=<optimized out>, hh=hh@entry=0x0) at ..\pp_ctl.c:3694
#15 0x000000006d0189e4 in Perl_pp_entereval (my_perl=0x714fe8) at ..\pp_ctl.c:4677
#16 0x000000006d09dbea in Perl_runops_debug (my_perl=0x714fe8) at ..\dump.c:2676
#17 0x000000006d08e094 in S_run_body (oldscope=0, my_perl=0x401519 <atexit+9>) at perl.c:2755
#18 perl_run (my_perl=0x401519 <atexit+9>, my_perl@entry=0x714fe8) at perl.c:2678
#19 0x000000006d15c758 in RunPerl (argc=<optimized out>, argv=<optimized out>, env=0x714d70) at perllib.c:201
--Type <RET> for more, q to quit, c to continue without paging--
#20 0x00000000004013c7 in __tmainCRTStartup ()
#21 0x00000000004014fb in mainCRTStartup ()
(gdb)

It's obviously (?) a toolchain issue and I'm assuming that Win11 somehow is more stringent/whatever in something and manages to entice the problem out of the woodwork where Win10 does not. But I think it would be extremely important to have at least one independent observation of this on another Win11 just to remove my particular machine as a potential source of problems. Still, having this issue reported much earlier makes this less likely.

According to #19912, there is a gcc 12 used. @sisyphus, any chance of getting a copy of that to try?

@sisyphus
Contributor

@sisyphus, any chance of getting a copy of that to try?

I'm sorry that it has taken me a couple of days to notice that request.
I use the "MSVCRT Runtime" release of gcc-12.1.0 from https://winlibs.com :
https://github.com/brechtsanders/winlibs_mingw/releases/download/12.1.0-14.0.6-10.0.0-msvcrt-r3/winlibs-x86_64-posix-seh-gcc-12.1.0-llvm-14.0.6-mingw-w64msvcrt-10.0.0-r3.7z

(There's also a version available there that doesn't have LLVM/Clang/LLD/LLDB if you prefer.)

Cheers,
Rob

@kenneth-olwing
Contributor Author

Thanks for the winlibs link; the short version is that I've tried as-is...and get other errors than with 8.3.0 from Strawberry. I decided not to pursue this at this point since it becomes so many variants of problems to keep track of. A guess is that there's a fair chance that the root cause is the same with the newer gcc, it's just moving to another place. When I've come to a stable point with 8.3.0 I will retry with those options.

And that is a problem: the results are mostly stable as long as I don't touch settings, but not completely, which makes it hard to truly understand what is going on. Especially since it appears to come from a code-generation/optimization issue, which I have no experience in.

So while it eventually would be great to actually find and fix the real problem, I've instead tried to find a stable workaround using more of brute force...:-)

The hypothesis is that switching -O2 to -Os makes things work.

However, I've found that this occasionally still left the win32\popen.t test failing. Running it manually however, it usually works...:-/. As seen above when compared with the other test failures, there was never any evidence of this dying so it was a bit of an anomaly anyway. I realized that it's setting a watchdog for 20 secs to kill the test if it's not finished. This should leave ample time for the fork and stuff it does.

However, I'm wondering if part of the problem becomes the watchdog stuff itself...no actual known reason, but since I commented out the watchdog, it never fails. Hence, I would claim this is a change to be made regardless of the optimization issue. Unless anything else crops up I'll probably make a PR out of it.

I'm using this content in the tests I show below. I also use the fix in #20033.

Trying to find the offending optimization

Reading https://gcc.gnu.org/onlinedocs/gcc-8.3.0/gcc/Optimize-Options.html, I read it as stating that -Os enables all the -O2 options except a few and adding one more (this differs slightly between versions). Given that, I've tried to figure out which of the option(s) makes things go awry and have arrived at using the following:

OPTIMIZE = -Os -falign-functions -falign-jumps -falign-labels -falign-loops -freorder-blocks -freorder-blocks-algorithm=stc -freorder-blocks-and-partition

IIUC, the one thing different from -O2 is then the -Os addition of -finline-functions, and a lack of -fprefetch-loop-arrays (and this flag is also outright disallowed when using -Os).

This configuration has been run repeatedly and seems to be stable in that it has so far never given any spurious errors.

Logically, it should then be possible to go the other way around, i.e. use -O2 but specifically reverse the differences listed above. Unfortunately, this has not met with success so far.

I may be missing something though...checking with -Q --help=optimizers to perhaps gain an insight yields that the difference between -O2 and -Os is that the latter enables -finline-functions and disables -foptimize-strlen. These are the only differences a diff lists. And, -fprefetch-loop-arrays lists as enabled with -Os despite warning me when I try to manually turn it on with -Os. So I'm confused as the docs seem to be saying different things...:-(. I'm retrying with only those options as the differences now.
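
The comparison described here can be scripted so it is easy to repeat for other flag sets or gcc versions. This is a rough sketch, assuming the two listings were saved to files first (for example by redirecting gcc -Q --help=optimizers -O2 and the -Os equivalent) and that the lines of interest end in [enabled] or [disabled]. The sub names and the file names o2.txt and os.txt are made up for the example:

```perl
use strict;
use warnings;

# Parse "gcc -Q --help=optimizers" text into { flag => state }.
# Only flags reported as [enabled] or [disabled] are picked up;
# valued options such as -freorder-blocks-algorithm= are ignored.
sub parse_optimizers {
    my ($text) = @_;
    my %state;
    for my $line (split /\n/, $text) {
        $state{$1} = $2
            if $line =~ /^\s*(-f\S+)\s+\[(enabled|disabled)\]/;
    }
    return \%state;
}

# Return "flag: left -> right" lines for every flag whose state differs.
sub diff_optimizers {
    my ($left_text, $right_text) = @_;
    my $left  = parse_optimizers($left_text);
    my $right = parse_optimizers($right_text);
    my %seen  = map { $_ => 1 } keys %$left, keys %$right;
    my @diff;
    for my $flag (sort keys %seen) {
        my $l = $left->{$flag}  // 'absent';
        my $r = $right->{$flag} // 'absent';
        push @diff, "$flag: $l -> $r" if $l ne $r;
    }
    return @diff;
}

# Read a whole file into a string.
sub slurp {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/;
    return scalar <$fh>;
}

# Usage: perl optdiff.pl o2.txt os.txt
if (@ARGV == 2) {
    print "$_\n" for diff_optimizers(slurp($ARGV[0]), slurp($ARGV[1]));
}
```

Note that this only shows what gcc reports for each level. As observed above, the report and the documentation do not obviously agree, so an empty or short diff is not proof that two flag sets generate the same code.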

Any thoughts?

@sisyphus
Contributor

Any thoughts?

I have a suspicion that "threads" is also playing a part - at least in the weirdness that I'm seeing on Windows 7.
For example, if I build perl unthreaded (USE_MULTI=undef USE_ITHREADS=undef USE_IMP_SYS=undef) I can specify -O2 optimization && build with gcc-12.1.
The test suite then runs nicely and passes all tests.

The downside to unthreaded perls on Windows is that you lose the fork() function - and that, for example, breaks the cpan utility as its make test phase assumes that fork() is available.
This has been reported at https://rt.cpan.org/Ticket/Display.html?id=143500
As mentioned there, my workaround is to install cpanm (cpan -iT App::cpanminus) and use it to install modules.

Anyway, @kenneth-olwing, that's just another variant you could consider ... though I'm not sure that's exactly what you were looking for ;-)

Just in case it is (or becomes) relevant, be mindful of the fact that the older StrawberryPerl mingw-w64 compilers define _WIN32_WINNT to a different value than the newer winlibs ones.

For me, Visual Studio (MSVC142) built perl-5.37.2 on Windows 7 without any problem at all. It might (or might not) be helpful to know whether Visual Studio provides the same trouble-free result on Windows 11.

Cheers,
Rob

@kenneth-olwing
Contributor Author

Right. I'm sure you're right in that it's somehow thread-related (also). However, I need threading so disregarding that is not an option for me :-/

In short, I have a fairly large toolset in Perl at work which needs to work on both Linux & Windows. It's on the order of 20+ years old, and since I was able to lay my hands on the project a few years ago, I've worked hard to improve and modernize it in all the ways I can. A very important step was to get a common Perl version. Before me it was constrained to whatever RHEL shipped, and an antique hand-modded (and somewhat broken) ActiveState 5.8.9 package on Windows...completely unworkable. I set up a build system to build Perl from source on Linux, use Strawberry on Windows, and then outfit them with the 'same' set of modules.

So, that's my ultimate need, i.e. upgrade to a later Perl on both platforms. While this experience makes me consider building from source on Windows too, my secondary goal is to poke around so Strawberry can move forward - it's what I'd choose for my home projects obviously.

Rather than mess around with the zillion optimization flags in GCC, I've so far had zero problems with just plain '-Os' (with the caveat that, as described above, the win32\popen.t watchdog is removed and the Errno fix is applied), so I'm considering placing PRs for them so it's at least stable for everyone (?), and someone with the means can worry about -O2 later. Maybe I'd try to get clarification from the gcc folks on how the optimization flags are actually supposed to be combined.

For the simple reason that 8.3.0 is a known quantity in Strawberry I figure it's reasonable to start with that and move on to 12.1+ later. Also, since Strawberry comes with a large number of extra libs that I'm assuming are needed/useful for many things, that also helps. Then again, if a coming Strawberry incorporates a later gcc, that would make sense to use then of course.

Again however, getting independent confirmation on having issues on Win11 would be really great...:-/

Thanks,
ken1

@sisyphus
Contributor

There are some comments in t/win32/popen.t that are a little obscure:

# [perl #77672] backticks capture text printed to stdout when working
# with multiple threads on windows
watchdog(20); # before the fix this would often lock up

#77672 (which I haven't yet located) is referenced in perl5200delta - and the claim seems to be that it fixed the issue you are now seeing.
Maybe #77672 now needs some further adjustment.
Maybe the watchdog can simply be removed.

I can't find any written record of having problems with that test file on my Windows 7 machine ... though it (vaguely) rings a bell ... maybe some unusual perl configuration for which I've kept no record.

Note that the OPTIMIZE setting is already configurable via the command line (OPTIMIZE=-Os).
I think it would also be a good idea if the GNUmakefile was amended so that its leading "Build configuration" section provided an OPTIMIZE option that could be hand edited.

Having a default value of -Os seems sane to me.
Of course, that would break a long history of the default being -O2, which might cause some annoyance to someone, somewhere.

Cheers,
Rob

@kenneth-olwing
Contributor Author

Standard disclaimer by now: It's still possible that all this is a 'me' problem in my context only...

Re: win32\popen.t

The tricky part is of course that the watchdog doesn't 'fix' any problem, just acknowledges and takes us out of a hang. Actually, just a possible hang - I've had a few instances where running the test manually takes an unusually long time, i.e. far longer than 20 secs, but still completed. I know, makes no sense either since it should be very quick. Some weird thread scheduling? Buffer sizes on the pipes for the qx process causing some kind of almost-but-not-quite race condition? Just on Win11?
In principle, the watchdog could be set to a very much larger timeout, but there's no evidence that that would make a difference (and really doesn't address whatever the underlying issue really is). Considering that the watchdog itself is a non-trivial piece of code that might interact non-favorably with the test, I personally would - at this point at least - just remove it.

Hmmm...or perhaps better, it could be a configurable thing, e.g. for those of us seeing this particular problem (it is only relevant during a local build test anyway, different from the optimize thing), setting an envvar, e.g. WIN32_POPEN_T_WATCHDOG to a value which is the secs, where '0' turns it off completely would be sufficient. Sort of eat the cake and have it.
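
A minimal sketch of that idea, assuming the variable name WIN32_POPEN_T_WATCHDOG proposed above and the current default of 20 seconds. watchdog_timeout is a hypothetical helper, not an existing function in t/test.pl:

```perl
use strict;
use warnings;

# Hypothetical helper: pick the watchdog timeout (in seconds) from the
# environment, falling back to $default when the variable is unset or
# is not a plain non-negative integer. A value of 0 means "no watchdog".
sub watchdog_timeout {
    my ($default) = @_;
    my $val = $ENV{WIN32_POPEN_T_WATCHDOG};
    return $default unless defined $val && $val =~ /^\d+$/;
    return $val + 0;
}

my $timeout = watchdog_timeout(20);
print $timeout
    ? "watchdog would be armed for $timeout seconds\n"
    : "watchdog disabled\n";

# In t/win32/popen.t the existing call would then become something like:
#   watchdog($timeout) if $timeout;
```

With the variable unset the behaviour stays exactly as it is today, so only people who actually hit the timeout need to know the switch exists.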

Re: OPTIMIZE

My chief reason for de facto changing the default to -Os would be that, as it stands, -O2 appears to work fine with a build+test on a Win10 machine, but not on Win11. The worst scenario with requiring a change on the command line or local patching of GNUmakefile is that someone unsuspecting of the problem with -O2 would build with factory settings on Win10, and then distribute to Win11 users who would/could get weird errors in odd situations. Not that I know how big (if any) a performance impact it produces, but rather a slow working build than a fast crashing one...:-).

I guess there could be provisions for keeping -O2 when not using threads of course...
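
For anyone who receives a prebuilt perl and wants to know whether it matches the combination discussed in this thread, the build settings can be read back from %Config. This is only a rough heuristic sketch: looks_affected is a made-up name, and the condition (MSWin32, gcc, -O2, ithreads) merely summarizes what has been reported here, not a confirmed diagnosis:

```perl
use strict;
use warnings;
use Config;

# Made-up heuristic: true when the build settings match the combination
# reported as problematic in this thread (Windows, gcc, -O2, ithreads).
sub looks_affected {
    my (%c) = @_;
    return (   ($c{osname}      // '') eq 'MSWin32'
            && ($c{cc}          // '') =~ /gcc/
            && ($c{optimize}    // '') =~ /-O2\b/
            && ($c{useithreads} // '') eq 'define' ) ? 1 : 0;
}

# Collect the relevant settings of the perl running this script.
my %build = map { $_ => $Config{$_} }
            qw(osname cc gccversion optimize useithreads);

printf "%-12s %s\n", $_, $build{$_} // '' for sort keys %build;
print looks_affected(%build)
    ? "This build matches the -O2 + threads combination discussed here.\n"
    : "This build does not match the combination discussed here.\n";
```

The same values are also visible in the perl -V output shown at the top of the issue; the script just makes the check repeatable.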

Thanks for your (and everyone else's) insight and work with Perl,
ken1

@khwilliamson
Contributor

Is this fixed by the merge of #20136 ?

@kenneth-olwing
Contributor Author

Yes, it is, thank you.
