
[pantsd] Launch the daemon via the thin client. #4931

Merged: 5 commits into pantsbuild:master from kwlzn/externalize_daemon on Oct 18, 2017

Conversation

@kwlzn (Member) commented Oct 2, 2017

Problem

Currently, the daemon is launched in the middle of a pants run via a Subsystem. Because this happens mid-run and relies on Subsystem and options initialization, the first run is always a "throwaway" that doesn't use the daemon. Furthermore, managing the lifecycle in the middle of the run rather than at the beginning hobbles our ability to cleanly perform lifecycle operations like inline daemon restarts.

Solution

Move all daemon-related options to bootstrap options, so that the configuration previously carried by the daemon-supporting Subsystem instances is available earlier in the run. Move the responsibility for launching the daemon to the thin client runner, so that it can synchronously start the daemon and use it inline.

Result

The thin client can now launch the daemon itself and use it for the first run.
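
To make the Solution above concrete, here is a minimal sketch of a thin-client runner that owns the daemon lifecycle. The class and method names (`RemotePantsRunner`, `maybe_launch`, `execute`) are illustrative assumptions, not the exact Pants API:

```python
class RemotePantsRunner(object):
  """Illustrative thin-client runner that launches and then talks to the daemon."""

  def __init__(self, bootstrap_options, daemon_launcher, pailgun_client):
    self._bootstrap_options = bootstrap_options  # daemon options now live in bootstrap scope
    self._daemon_launcher = daemon_launcher      # knows how to fork/exec pantsd
    self._pailgun_client = pailgun_client        # speaks the pailgun protocol to the daemon

  def run(self, args):
    # Launch (or relaunch) the daemon up front, before any other work, so that
    # even the very first run is serviced by the daemon.
    self._daemon_launcher.maybe_launch()
    # With the daemon known to be up, dispatch the run to it synchronously.
    return self._pailgun_client.execute(args)
```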

@kwlzn force-pushed the kwlzn/externalize_daemon branch 14 times, most recently from 354bc37 to 1daf133, on October 9, 2017
@kwlzn changed the title from "[WIP] [pantsd] Launch the daemon via the thin client." to "[pantsd] Launch the daemon via the thin client." on Oct 9, 2017
@kwlzn (Member, Author) commented Oct 9, 2017

noting that some changes here further perpetuate #4917 by moving options from subsystem scopes to bootstrap options. my intent is to circle back to #4917 ASAP to resolve this for all affected options - but we've declared a "fixit week" at Twitter this week, so I likely won't be able to get started on that until next week.

@stuhood (Member) left a comment

Regarding your most recent comment: is the idea to land this before fixing #4917, or to fix #4917 first?

Overall, this looks good. But I'd like to see a solution for deprecating the options before we land this much option movement.

@@ -59,7 +59,7 @@ def register_goals():
# Pantsd.
kill_pantsd = task(name='kill-pantsd', action=PantsDaemonKill)
kill_pantsd.install()
kill_pantsd.install('clean-all')
kill_pantsd.install('clean-all', first=True)
Member:

Can you comment here as to why this is necessary to run first?

Member Author:

done
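
For reference, a sketch of what the final registration might look like with the requested comment in place; the comment wording is my assumption about the rationale (kill the daemon before `clean-all` removes the state it depends on), not the author's actual text:

```python
# Pantsd.
kill_pantsd = task(name='kill-pantsd', action=PantsDaemonKill)
kill_pantsd.install()
# Kill pantsd first, before clean-all wipes the directories the daemon relies on,
# so the daemon isn't left running against (or re-creating) deleted state.
kill_pantsd.install('clean-all', first=True)
```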

@@ -36,6 +37,8 @@ def register_bootstrap_options(cls, register):
status as "bootstrap options" is only pertinent during option registration.
"""
buildroot = get_buildroot()
default_distdir = os.path.join(buildroot, 'dist')
default_rel_distdir = '/{}/'.format(fast_relpath(default_distdir, get_buildroot()))
Member:

This is computed a few lines above, so using fast_relpath to tear it back apart doesn't make sense.

Member Author:

fixed
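
A sketch of the simplification the reviewer is pointing at (assumed, not the exact committed code): share the directory name instead of rebuilding the relative path with `fast_relpath`.

```python
import os

from pants.base.build_environment import get_buildroot

buildroot = get_buildroot()
default_distdir_name = 'dist'
default_distdir = os.path.join(buildroot, default_distdir_name)
# The relative default can be built from the same name; there is no need to
# split the absolute path back apart.
default_rel_distdir = '/{}/'.format(default_distdir_name)
```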

@@ -164,6 +165,9 @@ def _run_services(self, services):
self.shutdown(service_thread_map)
raise self.StartupFailure('service {} failed to start, shutting down!'.format(service))

# Once all services are strted, write our pid.
Member:

started

Also, does it make sense to wait until everything starts successfully before writing the PID? If you fail to start up, you'd still want to be killable.

Member Author:

today, yes. any service failure at or just after startup will automatically cause the daemon to tear down, by design. once the pid is written, the thin client will immediately try to connect - so it's important to wait until everything has successfully started, otherwise we'd be racing a pailgun run against the teardown.
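
A condensed sketch of the startup ordering being described (simplified from the hunk above; helper names such as `_make_thread` and `write_pid` are assumptions):

```python
def _run_services(self, services):
  """Starts all daemon services, then advertises the daemon by writing its pid."""
  service_thread_map = {service: self._make_thread(service) for service in services}
  for service, thread in service_thread_map.items():
    thread.start()
    if not thread.is_alive():  # illustrative liveness check only
      self.shutdown(service_thread_map)
      raise self.StartupFailure('service {} failed to start, shutting down!'.format(service))

  # Write the pid only after every service is up: the thin client polls for the pid
  # and connects as soon as it appears, so writing it earlier would race an incoming
  # pailgun run against a startup-failure teardown.
  self.write_pid()
```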

self._subproject_roots = bootstrap_options.subproject_roots
self._metadata_base_dir = bootstrap_options.pants_subprocessdir

# TODO(kwlzn): Thread filesystem path ignores here to Watchman's subscription registration.
Member:

Can link to #3479 here.

Member Author:

done

@@ -147,8 +149,12 @@ def watch_project(self, path):
try:
return self.client.query('watch-project', os.path.realpath(path))
finally:
self.client.setTimeout(self._timeout)
self._logger.debug('setting post-startup watchman timeout to %s', self._timeout)
try:
Member:

nested try-catches not terribly clear... maybe break into another method?

Member Author:

done
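
A sketch of the extraction being suggested (the helper name and its log-rather-than-raise behavior are assumptions):

```python
def _attempt_set_timeout(self, timeout):
  """Sets a timeout on the underlying watchman client, logging rather than raising on failure."""
  self._logger.debug('setting post-startup watchman timeout to %s', timeout)
  try:
    self.client.setTimeout(timeout)
  except Exception:
    self._logger.exception('failed to set post-startup watchman timeout')

def watch_project(self, path):
  try:
    return self.client.query('watch-project', os.path.realpath(path))
  finally:
    # The nested try now lives in a named helper, keeping this method flat.
    self._attempt_set_timeout(self._timeout)
```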

@stuhood (Member) commented Oct 9, 2017

(...or perhaps this option movement is less egregious than the breakage on #4917 anyway, and so whether we land this or #4917 first doesn't matter...)

@kwlzn (Member, Author) commented Oct 17, 2017

I'd be ok with landing this as-is and then circling back to fix #4917 for all affected options as a follow-on, since it's already broken in master. looking into #4917 now.

@stuhood (Member) commented Oct 18, 2017

@kwlzn : Fine with me!

@kwlzn (Member, Author) commented Oct 18, 2017

merging this based on a green CI, modulo 1 master breakage and 1 flaky test

@kwlzn merged commit 0d01035 into pantsbuild:master on Oct 18, 2017
stuhood pushed a commit that referenced this pull request Dec 13, 2017
### Problem

#5169 describes a deadlock that occurs when any native-engine lock is acquired by a non-main thread during a fork. Before #4931, the `context_lock` was acquired for the entire length of a request, but that change removed the lock acquisition.

Kris fixed this as part of #5156, but there is not yet a test confirming that the issue is fixed.

### Solution

Add a test to cover #5169.

### Result

Covers #5169 with an integration test.
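
As a generic illustration of the failure mode #5169 describes (plain Python 3, not Pants code): on POSIX, `fork()` copies a lock in whatever state it is in, so a lock held by a non-main thread at fork time remains held forever in the child, and the child's next acquisition blocks.

```python
import os
import threading
import time

lock = threading.Lock()

def hold_lock():
  # A non-main thread acquires the lock and holds it across the fork.
  with lock:
    time.sleep(5)

t = threading.Thread(target=hold_lock)
t.daemon = True
t.start()
time.sleep(0.5)  # give the background thread time to acquire the lock

pid = os.fork()  # POSIX-only
if pid == 0:
  # Child: the lock was copied in its locked state, but the thread that held it
  # does not exist here, so an untimed acquire() would hang forever.
  print('child acquired lock: {}'.format(lock.acquire(timeout=2)))  # prints False
  os._exit(0)
else:
  os.waitpid(pid, 0)
```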
baroquebobcat pushed a commit to twitter/pants that referenced this pull request Dec 28, 2017