pantsd-runner hang in graph acquisition #5169
I suspect that this is because on the python side,
After implementing #5178, I was still seeing this, but inside of a lock lower in the stack. I bisected with:
...and came to #4931.
Dagnabit... tried removing the comment from the PR, but it didn't work. As mentioned in the commit message, this one isn't fixed yet.
### Problem

#5169 describes a deadlock that occurs when any native-engine lock is acquired by a non-main thread during a fork. Prior to #4931 the `context_lock` was acquired for the entire length of a request, but that change removed the lock acquisition. Kris fixed this as part of #5156, but there is not yet a test confirming that the issue is fixed. (An illustrative sketch of the fork/lock hazard follows below.)

### Solution

Add a test to cover #5169.

### Result

Covers #5169 with an integration test.
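For illustration only, here is a minimal, self-contained sketch of the general hazard described above; it is not the pantsd or native-engine code, and the use of `std::sync::Mutex` plus the `libc` crate's `fork` is an assumption chosen to keep the example small. A lock held by a non-main thread at fork time is copied into the child in its locked state, with no thread left in the child to release it, so the child blocks forever.

```rust
// Illustrative sketch, not the actual engine code. Requires the `libc` crate for fork(2).
use std::sync::{Arc, Mutex};
use std::{thread, time::Duration};

fn main() {
    let lock = Arc::new(Mutex::new(()));

    // A non-main thread acquires the lock and holds it across the fork.
    let held = Arc::clone(&lock);
    thread::spawn(move || {
        let _guard = held.lock().unwrap();
        thread::sleep(Duration::from_secs(60));
    });
    thread::sleep(Duration::from_millis(100)); // crude: give the thread time to lock

    // fork(2) duplicates only the calling (main) thread. The child inherits the
    // mutex in its locked state, but the thread that would unlock it does not
    // exist there, so the lock() below never returns -- the shape of the hang.
    if unsafe { libc::fork() } == 0 {
        let _guard = lock.lock().unwrap(); // child deadlocks here
        println!("never reached");
    }
    // Parent exits without waiting; this is only meant to show the pattern the
    // integration test needs to guard against.
}
```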
### Problem

The scheduler lock existed for a few reasons:

1. To prevent `Nodes` from being invalidated out of the `Graph` while they were being consumed by other `Nodes`. Because `Context` holds an `EntryId` for the running `Node`, the `Node` would fail (...panic, actually) at the next attempt to resolve a dependency.
2. To protect access to the `roots` field of the `Scheduler`, which was mutated in order to statefully build up the set of roots via the `add_root_select` method, and then capture results later.
3. We used to walk the graph from the python side (but no longer), which meant that `Node`s needed their lifecycle to be tightly controlled. The API no longer allows for walking inner `Node`s, so only roots are relevant now.

### Solution

Introduce a new `Failure::Invalidated` type to act as a signal that a `Node`'s `Context`/`EntryId` was invalidated while it was running (due to a file change on disk). The `Scheduler` will retry creating a `Node` a few times to allow execution to recover in the common case of minor file edits (a sketch of this retry pattern follows below).

Move the `roots` field off of `Scheduler` and onto a disposable `ExecutionRequest` struct, and have all relevant native functions take a reference to the `ExecutionRequest`.

### Result

The scheduler lock is removed. This does _not_ fix pantsbuild#5169, but it should clear up the python-side locking situation to assist in debugging it.
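As a rough illustration of the retry behaviour described in the Solution, here is a minimal sketch; the names `execute_with_retry`, `run_node`, and `max_retries` are hypothetical placeholders rather than the engine's actual API, and only the `Invalidated` variant matters for the example. The idea is a bounded retry loop that recreates a root `Node` when its run was invalidated mid-flight.

```rust
// Hypothetical sketch of the retry-on-invalidation idea; not the real engine types.
#[derive(Debug)]
enum Failure {
    Invalidated,
    Throw(String), // other failure kinds elided; shown only to suggest the shape
}

fn execute_with_retry<T>(
    max_retries: usize,
    mut run_node: impl FnMut() -> Result<T, Failure>,
) -> Result<T, Failure> {
    let mut attempts = 0;
    loop {
        match run_node() {
            // A file edit invalidated the running Node: recreate it and retry.
            Err(Failure::Invalidated) if attempts < max_retries => attempts += 1,
            // Success, a real failure, or too many invalidations: return as-is.
            other => return other,
        }
    }
}

fn main() {
    // Hypothetical usage: the node is invalidated twice, then succeeds.
    let mut calls = 0;
    let result = execute_with_retry(4, || {
        calls += 1;
        if calls < 3 { Err(Failure::Invalidated) } else { Ok("product") }
    });
    println!("{:?}", result); // Ok("product")
}
```

Bounding the retries lets execution recover from the common case of a minor edit without spinning indefinitely when files keep changing.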
The current thread is hung trying to acquire the scheduler lock.
I'm going to try pushing the scheduler lock down to the rust side.