-
Notifications
You must be signed in to change notification settings - Fork 560
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
t/op/fork.t blocks in external process test on Win32 #11267
Comments
From @tonycozCreated by @tonycozt/op/fork.t blocks running the test code: system $^X, "-e", "if (\$pid=fork){kill(9, \$pid)} else {sleep 5}"; This leaves a process running that Ctrl-C does not kill and needs to be Simply reproducable with: ..\perl -I..\lib -le "if ($pid=fork){ kill 9, $pid; } else { sleep 5;}" Loading a non-trivial module prevents the lockup: ..\perl -I..\lib -MEncode -le "if ($pid=fork){ kill 9, $pid; } else { sleep 5;}" Modifying win32.c to include Sleep(1) immediately before the call to Perl Info
|
From @bulk88I am using Characteristics of this binary (from libperl): C:\> The /op/fork.t freeze is this being run "perl -e "if ($pid=fork){kill(9,
In vmem.h at line 158. m_hLib is 0x000007ff7fc00000 which is C:\WINDOWS\system32\msvcrt.dll So I got private symbol global variable "LdrpLoaderLock" (google it) |
From @bulk88 |
The RT System itself - Status changed from 'new' to 'open' |
From @bulk88The only thread in the process is PID 45692, which is parent thread So I put a breakpoint on TerminateThread in win32_kill in win32.c and Ldr critical section is owned by the child thread (the PID in the For more on the behavior of the DLL loader lock, read this by MS |
From @bulk88 |
From @bulk88 |
From @bulk88The simplest (it can get much more complicated, I discuss later) fix is if(! child_perl->Isys_intern->psuedo_id){ The problem is, I dont think that the PerlInterpreter * of child interp Reason it is better to check if the child interp was ever scheduled at A more complicated alternative/higher overhead is creating a 1 time use A more complicated solution is to try to obtain the dll loader lock |
From @bulk88A half baked solution is to not have perlhost.h load a 2nd C std Lib |
From @bulk88I am done, I am waiting for comments from the rest of P5P now. |
From @bulk88I've been wondering, has this test always failed? or it only fails on |
From @tonycozOn Fri, Aug 03, 2012 at 12:59:18PM -0700, bulk 88 via RT wrote:
This is my suspicion for the cause.
Because no-one with sufficient Win32 knowledge cares enough about Win32 isn't the only platform where blead fails, and has failed for a Tony |
From @tonycozOn Thu, Aug 02, 2012 at 06:03:33PM -0700, bulk 88 via RT wrote:
The problem is other libraries (eg. any XS module) loads DLLs at It doesn't need to be an early kill to cause the lockup. I can see ways to fix this specific instance of the bug - eg. have Tony |
From @nwc10On Sun, Aug 05, 2012 at 05:09:20PM +1000, Tony Cook wrote:
Because it's not a regression, Ricardo has no way to compel anyone to fix
There actually aren't very many people working on the perl core. Even out into the "long tail" there is very little knowledge of Win32 And as there's no hiring organisation in "charge" of Perl 5, it means that
And whilst we'd love it to be otherwise, it's really not obvious how to There's no hostility to Win32 (or any other platform). It's just that there Nicholas Clark |
From @janduboisOn Sun, 05 Aug 2012, Tony Cook wrote:
I suspect it also doesn't happen when you link Perl against MSVCRT.dll Cheers, |
From @janduboisOn Sun, 05 Aug 2012, Tony Cook wrote:
I wonder if there isn't a simpler solution: Just add #undef malloc after all the Perl headers have been #included, and then later m_pfree = (LPFREE)free; etc and get rid of the whole LoadLibrary/FreeLibrary stuff. That way VMem There are more chances for code cleanup, but someone who observes the
Yes, the whole fork() emulation stuff is rather brittle and can never cease Cheers, |
From @bulk88Another more rare way of deadlocking, child thread was killed while
|
From @bulk88https://rt.cpan.org/Ticket/Display.html?id=66016 one of these is a |
From @bulk88On Mon Aug 06 21:15:09 2012, bulk88 wrote:
|
From @bulk88Wrote up a patch. fork.t/"fast kill 9 after fork" stopped deadlocking |
From @bulk88 |
From @bulk88On Wed Aug 08 17:59:27 2012, bulk88 wrote:
My last post made it to Perl RT but not P5P. This one hopefully will. |
From @tonycozOn Wed, Aug 08, 2012 at 05:59:28PM -0700, bulk 88 via RT wrote:
I tried this and ran t/op/fork.t in a loop. First time it deadlocked after 7 runs, second it deadlocked after 27 I increased delay to 10 ms and it successfully ran the test 500 times. Also, please use git format-patch to produce patches. Tony |
From @bulk88On Fri Aug 10 00:33:07 2012, tonyc wrote:
I wrote a script If it deadlocked for you, I am very interested in that, since my patch
________________________________________________________
|
From @bulk880001-fix-RT-88840-don-t-terminate-a-child-fork-psuedo-pro.patchFrom 2d681a6d132537b5febb79b96d5169b822b510cc Mon Sep 17 00:00:00 2001
From: Daniel Dragan <bulk88@hotmail.com>
Date: Wed, 8 Aug 2012 20:46:31 -0400
Subject: [PATCH] fix RT#88840, don't terminate a child fork psuedo process in
DLL Loader Lock
TerminateThread will terminate a thread but leaks all resources
of that thread, and all locks will never be released, as documented in MSDN.
There is no alternative to locks not being released that I see, but atleast
-e "if ($pid=fork){kill(9,$pid)} else {sleep 5}"
in fork.t won't deadlock with this patch since win32_start_child be reached before
TerminateThread happens. The 5 ms timeout can be increased if problems
arise in the future. The HWND of the child is delivered by win32_start_child
very early, before any perl bytecode is executed, therefore the delay is
keeping in spirit with a kill 9. In any case, if the child thread fails
to schedule, (a DllMain in DLL_THREAD_ATTACH of some DLL in the process
deadlocks or does very long (over 5 ms right now) sync IO), the parent interp
will bail out.
---
win32/win32.c | 88 ++++++++++++++++++++++++++++++++++++++------------------
1 files changed, 60 insertions(+), 28 deletions(-)
diff --git a/win32/win32.c b/win32/win32.c
index 211ca6f..54159ca 100644
--- a/win32/win32.c
+++ b/win32/win32.c
@@ -1271,6 +1271,46 @@ my_kill(int pid, int sig)
return retval;
}
+
+
+#ifdef USE_ITHREADS
+/*
+ will get a child psuedo process hwnd, with retrying and sleeping
+ success is hwnd != INVALID_HANDLE_VALUE, so NULL HWND can be returned
+ ms is milliseconds to sleep/tries, each try is 1 millisec, fatally
+ errors if child psuedo process doesn't schedule and deliver a HWND in the
+ time period specified, 0 milliseconds causes only Sleep(0) to be used
+ with "no" OS delay being given to the calling thread, 0 msis not recommended
+*/
+static HWND
+get_hwnd_delay(pTHX, long child, DWORD ms){
+ HWND hwnd = w32_pseudo_child_message_hwnds[child];
+/* pseudo-process has not yet properly initialized if hwnd isn't set */
+ if (hwnd != INVALID_HANDLE_VALUE) return hwnd;
+/*fast sleep, on some NT Kernels/systems, a Sleep(0) won't deschedule a
+thread 100% of the time since threads are sticked to a CPU for NUMA
+and caching reasons, and the child thread was stickied to a different CPU
+therefore there is no workload on that CPU, and Sleep(0) returns control
+without yielding the time slot
+https://rt.perl.org/rt3/Ticket/Display.html?id=88840
+*/
+ Sleep(0);
+ win32_async_check(aTHX);
+ hwnd = w32_pseudo_child_message_hwnds[child];
+ if (hwnd != INVALID_HANDLE_VALUE) return hwnd;
+ {
+ int count = 0;
+ while (count++ < ms){ /*ms=0 no Sleep(1),just fail by now*/
+ Sleep(1);
+ win32_async_check(aTHX);
+ hwnd = w32_pseudo_child_message_hwnds[child];
+ if (hwnd != INVALID_HANDLE_VALUE) return hwnd;
+ }
+ }
+ Perl_croak(aTHX_ "panic: child psuedo process was never scheduled");
+}
+#endif
+
DllExport int
win32_kill(int pid, int sig)
{
@@ -1281,15 +1321,16 @@ win32_kill(int pid, int sig)
/* it is a pseudo-forked child */
child = find_pseudo_pid(-pid);
if (child >= 0) {
- HWND hwnd = w32_pseudo_child_message_hwnds[child];
HANDLE hProcess = w32_pseudo_child_handles[child];
switch (sig) {
case 0:
/* "Does process exist?" use of kill */
return 0;
- case 9:
+ case 9: {
/* kill -9 style un-graceful exit */
+/*do a wait to make sure child starts and isnt in DLL Loader Lock*/
+ HWND hwnd = get_hwnd_delay(aTHX, child, 5);/*XXX change delay*/
if (TerminateThread(hProcess, sig)) {
/* Allow the scheduler to finish cleaning up the other thread.
* Otherwise, if we ExitProcess() before another context switch
@@ -1301,36 +1342,27 @@ win32_kill(int pid, int sig)
remove_dead_pseudo_process(child);
return 0;
}
+ }
break;
default: {
- int count = 0;
- /* pseudo-process has not yet properly initialized if hwnd isn't set */
- while (hwnd == INVALID_HANDLE_VALUE && count < 5) {
- /* Yield and wait for the other thread to send us its message_hwnd */
- Sleep(0);
- win32_async_check(aTHX);
- hwnd = w32_pseudo_child_message_hwnds[child];
- ++count;
- }
- if (hwnd != INVALID_HANDLE_VALUE) {
- /* We fake signals to pseudo-processes using Win32
- * message queue. In Win9X the pids are negative already. */
- if ((hwnd != NULL && PostMessage(hwnd, WM_USER_KILL, sig, 0)) ||
- PostThreadMessage(-pid, WM_USER_KILL, sig, 0))
- {
- /* Don't wait for child process to terminate after we send a SIGTERM
- * because the child may be blocked in a system call and never receive
- * the signal.
- */
- if (sig == SIGTERM) {
- Sleep(0);
- w32_pseudo_child_sigterm[child] = 1;
- }
- /* It might be us ... */
- PERL_ASYNC_CHECK();
- return 0;
+ HWND hwnd = get_hwnd_delay(aTHX, child, 5);/*XXX change delay*/
+ /* We fake signals to pseudo-processes using Win32
+ * message queue. In Win9X the pids are negative already. */
+ if ((hwnd != NULL && PostMessage(hwnd, WM_USER_KILL, sig, 0)) ||
+ PostThreadMessage(-pid, WM_USER_KILL, sig, 0))
+ {
+ /* Don't wait for child process to terminate after we send a SIGTERM
+ * because the child may be blocked in a system call and never receive
+ * the signal.
+ */
+ if (sig == SIGTERM) {
+ Sleep(0);
+ w32_pseudo_child_sigterm[child] = 1;
}
+ /* It might be us ... */
+ PERL_ASYNC_CHECK();
+ return 0;
}
break;
}
--
1.7.9.msysgit.0
|
From @tonycozOn Fri, Aug 10, 2012 at 09:42:22AM -0700, bulk 88 via RT wrote:
I rebuilt from scratch and ran my test again, and couldn't reproduce Sorry for the noise. So I'm happy with this change. Tony |
From @bulk88Can someone apply the patch to perl git? |
From @steve-m-haybulk 88 via RT wrote on 2012-08-14:
Apologies for the delay. I will test it and expect to apply it in a few hours. Thanks for your contributions! |
From @steve-m-hayOn Tue Aug 14 01:15:45 2012, Steve.Hay@verosoftware.com wrote:
Thanks again, now committed in d903973. |
@steve-m-hay - Status changed from 'open' to 'resolved' |
Migrated from rt.perl.org#88840 (status was 'resolved')
Searchable as RT88840$
The text was updated successfully, but these errors were encountered: