DEADLOCK: fork from multi-threaded parent => locks inconsistent #239

Open
derekbruening opened this issue Nov 27, 2014 · 3 comments

@derekbruening
Contributor

From derek.br...@gmail.com on December 01, 2009 11:49:58

When a multi-threaded process forks, the fork child is alone in its new
process, but other threads could have held locks at the fork point. Those
locks stay held in the child, where no thread is left to release them. We
need a way to iterate over all locks held in release build and free them.
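A minimal sketch of the failure mode, assuming a toy lock built on a C11 `atomic_flag` as a stand-in for DR's own mutex (the names `toy_lock`, `holder`, and `fork_with_held_lock` are hypothetical, not DR code). A second thread takes the lock, the main thread forks, and the child then finds the lock held by a thread that does not exist in its process. The child can only make progress by forcibly resetting the lock, and whatever data the lock guarded may be inconsistent at that point.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for an internal lock: flag set means "held". */
static atomic_flag toy_lock = ATOMIC_FLAG_INIT;
static sem_t lock_taken, may_release;

static void *holder(void *arg) {
    (void)arg;
    while (atomic_flag_test_and_set(&toy_lock)) { /* spin until acquired */ }
    sem_post(&lock_taken);   /* tell the forking thread the lock is held */
    sem_wait(&may_release);  /* keep holding it across the fork */
    atomic_flag_clear(&toy_lock);
    return NULL;
}

/* Returns a bitmask computed in the fork child:
 *   bit 0: the lock was still held right after fork
 *   bit 1: the lock could be acquired after a forced reset
 * Returns -1 on error. */
int fork_with_held_lock(void) {
    pthread_t t;
    int status = 0;
    if (sem_init(&lock_taken, 0, 0) != 0 || sem_init(&may_release, 0, 0) != 0)
        return -1;
    if (pthread_create(&t, NULL, holder, NULL) != 0)
        return -1;
    sem_wait(&lock_taken);

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: only the forking thread exists here; holder() never runs. */
        int result = 0;
        if (atomic_flag_test_and_set(&toy_lock))
            result |= 1;                  /* held by a thread that is gone */
        atomic_flag_clear(&toy_lock);     /* forced reset */
        if (!atomic_flag_test_and_set(&toy_lock))
            result |= 2;                  /* now acquirable */
        _exit(result);
    }

    /* Parent: let the holder finish, then collect the child's result. */
    sem_post(&may_release);
    pthread_join(t, NULL);
    sem_destroy(&lock_taken);
    sem_destroy(&may_release);
    if (pid < 0 || waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling `fork_with_held_lock()` from `main` should yield a value with both bits set. Without the forced reset, a blocking acquire in the child would wait forever.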

I added cleanup of the other threads' data structs as part of issue #237.

Original issue: http://code.google.com/p/dynamorio/issues/detail?id=239

@derekbruening
Contributor Author

From derek.br...@gmail.com on April 06, 2010 10:38:56

Alternatively, we need to do a synchall at pre-fork and leave the other
threads suspended for the syscall. If it fails, we then resume them.
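The idea can be sketched with ordinary pthreads, under the loud assumption that threads park cooperatively at a safe point where they hold no locks. DR itself suspends threads with signals rather than cooperative checks, so this is only an analog, and every name here (`data_lock`, `synch_lock`, `worker`, `fork_with_threads_parked`) is hypothetical. The forking thread waits until all workers are parked, forks while they stay parked, and resumes them afterwards. Because no worker can hold `data_lock` at the fork point, the child finds it free. The child's use of `pthread_mutex_trylock` is fine on common implementations but is not on the POSIX async-signal-safe list.

```c
#include <pthread.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define NUM_WORKERS 3

static pthread_mutex_t data_lock = PTHREAD_MUTEX_INITIALIZER;  /* contended by workers */
static pthread_mutex_t synch_lock = PTHREAD_MUTEX_INITIALIZER; /* guards the fields below */
static pthread_cond_t synch_cond = PTHREAD_COND_INITIALIZER;
static bool suspend_requested, shutting_down;
static int parked;

static void *worker(void *arg) {
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&data_lock);
        /* ... work on shared data ... */
        pthread_mutex_unlock(&data_lock);

        /* Safe point: data_lock is not held here. */
        pthread_mutex_lock(&synch_lock);
        if (suspend_requested) {
            parked++;
            pthread_cond_broadcast(&synch_cond);  /* notify the forking thread */
            while (suspend_requested)
                pthread_cond_wait(&synch_cond, &synch_lock);
            parked--;
        }
        bool done = shutting_down;
        pthread_mutex_unlock(&synch_lock);
        if (done)
            return NULL;
    }
}

/* Returns 1 if the fork child found data_lock free, 0 if not, -1 on error. */
int fork_with_threads_parked(void) {
    pthread_t workers[NUM_WORKERS];
    int n = 0, status = 0;
    shutting_down = false;
    while (n < NUM_WORKERS && pthread_create(&workers[n], NULL, worker, NULL) == 0)
        n++;

    /* "Synchall": wait until every worker is parked at its safe point. */
    pthread_mutex_lock(&synch_lock);
    suspend_requested = true;
    while (parked < n)
        pthread_cond_wait(&synch_cond, &synch_lock);

    /* synch_lock stays held across fork, so no worker can leave its park. */
    pid_t pid = fork();
    if (pid == 0)
        _exit(pthread_mutex_trylock(&data_lock) == 0 ? 1 : 0);

    /* Parent, on success or failure of fork: resume the workers and stop them. */
    suspend_requested = false;
    shutting_down = true;
    pthread_cond_broadcast(&synch_cond);
    pthread_mutex_unlock(&synch_lock);
    for (int i = 0; i < n; i++)
        pthread_join(workers[i], NULL);

    if (pid < 0 || waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The design choice that matters is that threads are only ever stopped at points where they hold no locks. Suspending at arbitrary points would recreate the original problem.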

@derekbruening
Contributor Author

From rnk@google.com on May 24, 2012 09:49:37

This bites us when running chrome's unit_tests binary on Linux:

(gdb) bt
#0 syscall_ready () at /home/rnk/disk/dynamorio/core/x86/x86.asm:1082
#1 0x0000000000040fc0 in ?? ()
#2 0x000000007129cd63 in futex_wait (futex=0x774ff098, mustbe=1) at /home/rnk/disk/dynamorio/core/linux/os.c:2695
#3 0x00000000712ac911 in mutex_wait_contended_lock (lock=0x774ff098) at /home/rnk/disk/dynamorio/core/linux/os.c:8098
#4 0x000000007110c514 in mutex_lock (lock=0x774ff098) at /home/rnk/disk/dynamorio/core/utils.c:856
#5 0x000000007110ce09 in write_lock (rw=0x774ff098) at /home/rnk/disk/dynamorio/core/utils.c:1171
#6 0x00000000712ac7a0 in all_memory_areas_lock () at /home/rnk/disk/dynamorio/core/linux/os.c:8046
#7 0x00000000711b87d0 in dynamo_vm_areas_lock () at /home/rnk/disk/dynamorio/core/vmareas.c:3457
#8 0x0000000071198ccb in heap_free_unit (unit=0x781ad000, dcontext=0x77520400) at /home/rnk/disk/dynamorio/core/heap.c:2705
#9 0x000000007119e040 in common_heap_free (tu=0x77520d28, p_void=0x781ad040, size=262220, which=ACCT_IBLTABLE) at /home/rnk/disk/dynamorio/core/heap.c:3527
#10 0x000000007119f2d2 in nonpersistent_heap_free (dcontext=0x77520400, p=0x781ad040, size=262220, which=ACCT_IBLTABLE) at /home/rnk/disk/dynamorio/core/heap.c:3737
#11 0x00000000710898d7 in hashtable_ibl_free_table (alloc_dc=0x77520400, table_unaligned=0x781ad040, flags=150, capacity=16385) at /home/rnk/disk/dynamorio/core/hashtablex.h:576
#12 0x0000000071089aac in hashtable_ibl_free (dcontext=0x77520400, table=0x77521a28) at /home/rnk/disk/dynamorio/core/hashtablex.h:617
#13 0x00000000710947aa in hashtable_ibl_myfree (dcontext=0x77520400, table=0x77521a28) at /home/rnk/disk/dynamorio/core/fragment.c:815
#14 0x0000000071099217 in fragment_thread_reset_free (dcontext=0x77520400) at /home/rnk/disk/dynamorio/core/fragment.c:2179
#15 0x0000000071099768 in fragment_thread_exit (dcontext=0x77520400) at /home/rnk/disk/dynamorio/core/fragment.c:2265
#16 0x000000007108542f in dynamo_thread_exit_pre_client (dcontext=0x77520400, id=27224) at /home/rnk/disk/dynamorio/core/dynamo.c:2191
#17 0x0000000071085674 in dynamo_thread_exit_common (dcontext=0x77520400, id=27224, other_thread=true) at /home/rnk/disk/dynamorio/core/dynamo.c:2311
#18 0x0000000071085fa9 in dynamo_other_thread_exit (tr=0x7751db38) at /home/rnk/disk/dynamorio/core/dynamo.c:2425
#19 0x000000007108321b in dynamorio_fork_init (dcontext=0x7751df80) at /home/rnk/disk/dynamorio/core/dynamo.c:765
#20 0x00000000712a8264 in post_system_call (dcontext=0x7751df80) at /home/rnk/disk/dynamorio/core/linux/os.c:6123
#21 0x00000000710ffccf in handle_post_system_call (dcontext=0x7751df80) at /home/rnk/disk/dynamorio/core/dispatch.c:1872
#22 0x00000000710f7304 in dispatch_enter_dynamorio (dcontext=0x7751df80) at /home/rnk/disk/dynamorio/core/dispatch.c:737
#23 0x00000000710f3c4b in dispatch (dcontext=0x7751df80) at /home/rnk/disk/dynamorio/core/dispatch.c:142

The process in the backtrace above is the fork child, and it has only one thread, so the mutex will never be released.

In a typical app there are a few approaches to fork and threads:

  1. exec right away before acquiring any locks (no malloc, no real code)
  2. acquire all needed locks before the fork, release them after the fork and continue
  3. reinitialize all locks and accept that the data they guarded will be in an undefined state

Approach 1 isn't an option for DR.

Synchronizing threads in DR is similar to acquiring a really big global lock, which is pretty close to approach 2.

If approach 2 isn't feasible, we could try to pursue approach 3, which is what we try to do now.
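For an ordinary application, approach 2 is usually written with `pthread_atfork`. The following is a minimal sketch under that assumption, not what DR does, and the names `table_lock`, `busy_worker`, and `fork_with_atfork_handlers` are hypothetical. The prepare handler acquires the lock before the fork, so no other thread can hold it at the fork point, and the parent and child handlers each release their own copy.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_once_t handlers_once = PTHREAD_ONCE_INIT;
static atomic_bool stop_worker;

static void before_fork(void) { pthread_mutex_lock(&table_lock); }
static void after_fork(void)  { pthread_mutex_unlock(&table_lock); }

static void install_handlers(void) {
    /* prepare runs before fork; the other two run after it, in parent and child. */
    pthread_atfork(before_fork, after_fork, after_fork);
}

static void *busy_worker(void *arg) {
    (void)arg;
    /* Repeatedly take and drop the lock so a fork is likely to race with it. */
    while (!atomic_load(&stop_worker)) {
        pthread_mutex_lock(&table_lock);
        pthread_mutex_unlock(&table_lock);
        usleep(100);
    }
    return NULL;
}

/* Returns 1 if the fork child could take table_lock, 0 if not, -1 on error. */
int fork_with_atfork_handlers(void) {
    pthread_t t;
    int status = 0;
    /* Register only once: duplicate handlers would lock the mutex twice. */
    pthread_once(&handlers_once, install_handlers);
    atomic_store(&stop_worker, false);
    if (pthread_create(&t, NULL, busy_worker, NULL) != 0)
        return -1;

    pid_t pid = fork();
    if (pid == 0)
        _exit(pthread_mutex_trylock(&table_lock) == 0 ? 1 : 0);

    atomic_store(&stop_worker, true);
    pthread_join(t, NULL);
    if (pid < 0 || waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

This only scales when the set of locks is small and has a fixed acquisition order. With many locks, stopping all other threads at a known point is the closer analog of one big global lock.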

@derekbruening
Contributor Author

From rnk@google.com on May 30, 2012 10:10:28

I implemented the synchall solution in http://codereview.appspot.com/6247048/ and it works most of the time, but it now hits issue #26 (receiving signals during thread init).

If an app is spawning threads and forking at the same time, then it is very likely that we will be unable to synch with the newly spawned child thread. Currently, we call add_thread() in the child thread. Here are the possible windows in which we might try to synch:

  • Between thread entry and add_thread(), the child thread will not be in all_threads, so we won't try to synch with it. It will run freely across the fork, and we'll have potential leaks and deadlocks.
  • Between add_thread() and siginfo initialization, the child thread will receive the signal, but it will only print an error syslog and drop it on the floor.
  • After siginfo initialization, the child will receive the suspend signal and synch successfully.
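The three windows above can be written down as a small state model. This is only an illustration of the reasoning, not DR code: the enums and the functions `synch_outcome_for` and `fork_synch_is_complete` are hypothetical names introduced here.

```c
#include <stdbool.h>
#include <stddef.h>

/* How far a newly spawned thread has progressed through its initialization. */
enum init_stage {
    STAGE_ENTERED,        /* running, but add_thread() not yet called */
    STAGE_ADDED,          /* in all_threads, siginfo not yet initialized */
    STAGE_SIGINFO_READY   /* can receive and handle the suspend signal */
};

/* What happens if the forking thread tries to synch with it at that stage. */
enum synch_outcome {
    SYNCH_NOT_ATTEMPTED,  /* thread is invisible and runs freely across the fork */
    SYNCH_SIGNAL_DROPPED, /* signal arrives but is logged and dropped */
    SYNCH_OK              /* thread suspends as requested */
};

enum synch_outcome synch_outcome_for(enum init_stage stage) {
    switch (stage) {
    case STAGE_ENTERED:       return SYNCH_NOT_ATTEMPTED;
    case STAGE_ADDED:         return SYNCH_SIGNAL_DROPPED;
    case STAGE_SIGINFO_READY: return SYNCH_OK;
    }
    return SYNCH_NOT_ATTEMPTED;
}

/* The pre-fork synch is complete only if every other thread suspends. */
bool fork_synch_is_complete(const enum init_stage *stages, size_t count) {
    for (size_t i = 0; i < count; i++) {
        if (synch_outcome_for(stages[i]) != SYNCH_OK)
            return false;
    }
    return true;
}
```

A single thread in either of the first two stages is enough to make the synch incomplete, which is why concurrent thread creation and forking is the problematic case.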

Despite these issues, the CL as written improves on the current situation for apps that don't spawn threads and fork concurrently. IMO we should commit the current change, but leave this issue open and blocked on issue #26.

Owner: rnk@google.com
