
[pantsd] Launch the daemon via the thin client. #4931

Merged: 5 commits into pantsbuild:master from kwlzn/externalize_daemon on Oct 18, 2017

Conversation

@kwlzn (Member) commented Oct 2, 2017

Problem

Currently, the daemon is launched in the middle of a pants run via a Subsystem. Because this happens mid-run and relies on Subsystem and options initialization, the first run is always a "throwaway" that doesn't use the daemon. Furthermore, managing the lifecycle in the middle of the run rather than at the beginning hobbles our ability to cleanly perform lifecycle operations like inline daemon restarts.

Solution

Move all daemon-related options to bootstrap options, so that the configuration previously carried by the daemon-supporting Subsystem instances is available earlier in the run. Move the responsibility for launching the daemon to the thin client runner, so that it can synchronously start the daemon and use it inline.

Result

The thin client can now launch the daemon itself and use it for the first run.
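
To make the Solution above concrete, here is a minimal sketch of a thin-client runner that owns the daemon lifecycle. The class and method names (`RemotePantsRunner`, `maybe_launch`, `execute`) are illustrative assumptions, not the exact Pants API:

```python
class RemotePantsRunner(object):
  """Illustrative thin-client runner that launches and then talks to the daemon."""

  def __init__(self, bootstrap_options, daemon_launcher, pailgun_client):
    self._bootstrap_options = bootstrap_options  # daemon options now live in bootstrap scope
    self._daemon_launcher = daemon_launcher      # knows how to fork/exec pantsd
    self._pailgun_client = pailgun_client        # speaks the pailgun protocol to the daemon

  def run(self, args):
    # Launch (or relaunch) the daemon up front, before any other work, so that
    # even the very first run is serviced by the daemon.
    self._daemon_launcher.maybe_launch()
    # With the daemon known to be up, dispatch the run to it synchronously.
    return self._pailgun_client.execute(args)
```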

@kwlzn force-pushed the kwlzn/externalize_daemon branch 14 times, most recently from 354bc37 to 1daf133, on October 9, 2017
@kwlzn changed the title from "[WIP] [pantsd] Launch the daemon via the thin client." to "[pantsd] Launch the daemon via the thin client." on Oct 9, 2017
@kwlzn (Member, Author) commented Oct 9, 2017

noting that some changes here further perpetuate #4917 by moving options from subsystem scopes to bootstrap options. my intent is to circle back to #4917 ASAP to resolve this for all affected options - but we've declared a "fixit week" at Twitter this week, so I likely won't be able to get started on that until next week.

@stuhood (Member) left a comment

Regarding your most recent comment: is the idea to land this before fixing #4917, or to fix #4917 first?

Overall, this looks good. But I'd like to see a solution for deprecating the options before we land this much option movement.

@@ -59,7 +59,7 @@ def register_goals():
# Pantsd.
kill_pantsd = task(name='kill-pantsd', action=PantsDaemonKill)
kill_pantsd.install()
kill_pantsd.install('clean-all')
kill_pantsd.install('clean-all', first=True)
Member:

Can you comment here as to why this is necessary to run first?

Member Author:

done
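
For reference, a sketch of what the final registration might look like with the requested comment in place; the comment wording is my assumption about the rationale (kill the daemon before `clean-all` removes the state it depends on), not the author's actual text:

```python
# Pantsd.
kill_pantsd = task(name='kill-pantsd', action=PantsDaemonKill)
kill_pantsd.install()
# Kill pantsd first, before clean-all wipes the directories the daemon relies on,
# so the daemon isn't left running against (or re-creating) deleted state.
kill_pantsd.install('clean-all', first=True)
```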

@@ -36,6 +37,8 @@ def register_bootstrap_options(cls, register):
status as "bootstrap options" is only pertinent during option registration.
"""
buildroot = get_buildroot()
default_distdir = os.path.join(buildroot, 'dist')
default_rel_distdir = '/{}/'.format(fast_relpath(default_distdir, get_buildroot()))
Member:

This is computed a few lines above, so using fast_relpath to tear it back apart doesn't make sense.

Member Author:

fixed
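
A sketch of the simplification the reviewer is pointing at (assumed, not the exact committed code): share the directory name instead of rebuilding the relative path with `fast_relpath`.

```python
import os

from pants.base.build_environment import get_buildroot

buildroot = get_buildroot()
default_distdir_name = 'dist'
default_distdir = os.path.join(buildroot, default_distdir_name)
# The relative default can be built from the same name; there is no need to
# split the absolute path back apart.
default_rel_distdir = '/{}/'.format(default_distdir_name)
```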

@@ -164,6 +165,9 @@ def _run_services(self, services):
self.shutdown(service_thread_map)
raise self.StartupFailure('service {} failed to start, shutting down!'.format(service))

# Once all services are strted, write our pid.
Member:

started

Also, does it make sense to wait until everything starts successfully before writing the PID? If you fail to start up, you'd still want to be killable.

Member Author:

today, yes. any service failure at or just after startup will automatically cause the daemon to tear down, by design. once the pid is written, the thin client will immediately try to connect - so it's important to wait until everything has successfully started, otherwise we'd be racing a pailgun run against the teardown.
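
A condensed sketch of the startup ordering being described (simplified from the hunk above; helper names such as `_make_thread` and `write_pid` are assumptions):

```python
def _run_services(self, services):
  """Starts all daemon services, then advertises the daemon by writing its pid."""
  service_thread_map = {service: self._make_thread(service) for service in services}
  for service, thread in service_thread_map.items():
    thread.start()
    if not thread.is_alive():  # illustrative liveness check only
      self.shutdown(service_thread_map)
      raise self.StartupFailure('service {} failed to start, shutting down!'.format(service))

  # Write the pid only after every service is up: the thin client polls for the pid
  # and connects as soon as it appears, so writing it earlier would race an incoming
  # pailgun run against a startup-failure teardown.
  self.write_pid()
```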

self._subproject_roots = bootstrap_options.subproject_roots
self._metadata_base_dir = bootstrap_options.pants_subprocessdir

# TODO(kwlzn): Thread filesystem path ignores here to Watchman's subscription registration.
Member:

Can link to #3479 here.

Member Author:

done

@@ -147,8 +149,12 @@ def watch_project(self, path):
try:
return self.client.query('watch-project', os.path.realpath(path))
finally:
self.client.setTimeout(self._timeout)
self._logger.debug('setting post-startup watchman timeout to %s', self._timeout)
try:
Member:

nested try-catches not terribly clear... maybe break into another method?

Member Author:

done
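
A sketch of the extraction being suggested (the helper name and its log-rather-than-raise behavior are assumptions):

```python
def _attempt_set_timeout(self, timeout):
  """Sets a timeout on the underlying watchman client, logging rather than raising on failure."""
  self._logger.debug('setting post-startup watchman timeout to %s', timeout)
  try:
    self.client.setTimeout(timeout)
  except Exception:
    self._logger.exception('failed to set post-startup watchman timeout')

def watch_project(self, path):
  try:
    return self.client.query('watch-project', os.path.realpath(path))
  finally:
    # The nested try now lives in a named helper, keeping this method flat.
    self._attempt_set_timeout(self._timeout)
```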

@stuhood (Member) commented Oct 9, 2017

(...or perhaps this option movement is less egregious than the breakage on #4917 anyway, and so whether we land this or #4917 first doesn't matter...)

@kwlzn (Member, Author) commented Oct 17, 2017

I'd be ok with landing this as-is and then circling back to fix #4917 for all affected options as a follow-on, since it's already broken in master. looking into #4917 now.

@stuhood (Member) commented Oct 18, 2017

@kwlzn : Fine with me!

@kwlzn (Member, Author) commented Oct 18, 2017

merging this based on a green CI, modulo 1 master breakage and 1 flaky test

@kwlzn merged commit 0d01035 into pantsbuild:master on Oct 18, 2017
stuhood pushed a commit that referenced this pull request Dec 13, 2017
### Problem

#5169 describes a deadlock that occurs when any native-engine lock is acquired by a non-main thread during a fork. Before #4931, the `context_lock` was acquired for the entire length of a request, but that change removed the lock acquisition.

Kris fixed this as part of #5156, but there is not yet a test confirming that the issue is fixed.

### Solution

Add a test to cover #5169.

### Result

Covers #5169 with an integration test.
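
As a generic illustration of the failure mode #5169 describes (plain Python 3, not Pants code): on POSIX, `fork()` copies a lock in whatever state it is in, so a lock held by a non-main thread at fork time remains held forever in the child, and the child's next acquisition blocks.

```python
import os
import threading
import time

lock = threading.Lock()

def hold_lock():
  # A non-main thread acquires the lock and holds it across the fork.
  with lock:
    time.sleep(5)

t = threading.Thread(target=hold_lock)
t.daemon = True
t.start()
time.sleep(0.5)  # give the background thread time to acquire the lock

pid = os.fork()  # POSIX-only
if pid == 0:
  # Child: the lock was copied in its locked state, but the thread that held it
  # does not exist here, so an untimed acquire() would hang forever.
  print('child acquired lock: {}'.format(lock.acquire(timeout=2)))  # prints False
  os._exit(0)
else:
  os.waitpid(pid, 0)
```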
baroquebobcat pushed a commit to twitter/pants that referenced this pull request Dec 28, 2017