Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dynamic plugin cleanups #3679

Merged
merged 24 commits into from
May 5, 2020

Conversation

rustyrussell
Copy link
Contributor

This patch series is about unifying the dynamic and startup plugin paths. In doing so it clears up some latent bugs, and reduces code duplication.

Comment on lines -853 to -855
/* Prevent a race between was_pending and still_pending */
new_reltimer(c->ld->timers, rpc_hook, time_from_msec(1),
call_rpc_command_hook, rpc_hook);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

Copy link
Contributor

@darosior darosior left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice cleanup ! Will test soon

@@ -456,7 +495,12 @@ static struct io_plan *plugin_write_json(struct io_conn *conn,
static void plugin_conn_finish(struct io_conn *conn, struct plugin *plugin)
{
plugin->stdout_conn = NULL;
if (plugin->start_cmd) {
plugin_cmd_succeeded(plugin->start_cmd, plugin);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we succeed in this case ? I would have expected us to return an error if plugin dies before completing the command
EDIT: It's for plugin stop, right ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I wonder what would happen if a plugin started with plugin start exit() after receving init then

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

outdated: modified in a following commit

@@ -190,6 +190,7 @@ void plugin_kill(struct plugin *plugin, char *fmt, ...)
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the dynamic case though, it's 60 seconds to both complete getmanifest and init, which seems fair.

Given that normally-functioning plugins usually return is less than 2sec, I find it huge. But I can live with it (or with ctrl+c) :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I came across this comment in another commit, so I'll precise: this a comment for the commit message of 62f0916

@darosior
Copy link
Contributor

Travis TL;DR :

diff --git a/lightningd/lightningd.c b/lightningd/lightningd.c
index ddfa141d7..2013501be 100644
--- a/lightningd/lightningd.c
+++ b/lightningd/lightningd.c
@@ -123,8 +123,8 @@ static struct lightningd *new_lightningd(const tal_t *ctx)
         * is a nod to keeping it minimal and explicit: we need this code for
         * testing, but its existence means we're not actually testing the
         * same exact code users will be running. */
-       ld->dev_debug_subprocess = NULL;
 #if DEVELOPER
+       ld->dev_debug_subprocess = NULL;
        ld->dev_disconnect_fd = -1;
        ld->dev_subdaemon_fail = false;
        ld->dev_allow_localhost = false;

Copy link
Contributor

@darosior darosior left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests nit and a small addition, other than that lgtm : I did some tests, including against plugins in lightningd/plugins and esplora Bitcoin backend (--disable-plugin bcli).

tests/test_plugin.py Outdated Show resolved Hide resolved
tests/plugins/fail_by_itself.py Outdated Show resolved Hide resolved
@@ -1240,7 +1384,7 @@ void plugins_config(struct plugins *plugins)
{
struct plugin *p;
list_for_each(&plugins->plugins, p, list) {
if (p->plugin_state == UNCONFIGURED)
if (p->plugin_state == NEEDS_INIT)
plugin_config(p);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't add a comment directly in the function, but you don't set the AWAITING_INIT_RESPONSE in plugin_config(), after sending the init request.

Copy link
Contributor

@darosior darosior left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 4fe882f

@cdecker cdecker added this to the v0.9.0 milestone May 2, 2020
@darosior
Copy link
Contributor

darosior commented May 3, 2020

nit : You never set the AWAITING_INIT_RESPONSE state after sending the init request.

@rustyrussell
Copy link
Contributor Author

Yep, fixed...

Copy link
Contributor

@darosior darosior left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 1392bf8

The symptom (under heavy load and valgrind) in test_plugin_command:

	lightningd: common/json_stream.c:237: json_stream_output_: Assertion `!js->reader' failed.

This is because we try to call `getmanifest` again on `pay` which has not yet
responded to init.

The minimal fix for this is to keep proper state, so we can tell the
difference between "not yet called getmanifest" and "not yet finished
init".

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Copy link
Member

@cdecker cdecker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice cleanup, just found some minor nits and questions for my own curiosity 🙂

lightningd/jsonrpc.c Show resolved Hide resolved
Comment on lines -1225 to -1226
/* FIXME(cdecker) This just builds a string for the request because
* the json_stream is tightly bound to the command interface. It
* should probably be generalized and fixed up. */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🥳

@@ -87,7 +87,6 @@ struct plugin {
*/
struct plugins {
struct list_head plugins;
size_t pending_manifests;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much nicer state tracking than just counting missing manifests 👍

def run(self):
time.sleep(1)
print("Exiting!")
os._exit(1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, I usually use sys.exit(1) for these, but the docs explicitly suggest using os._exit(1) after a fork call.


@plugin.init()
def init(options, configuration, plugin):
FailThread()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same could be achieved by the following:

def init(options, configuration, plugin):
    threading.Timer(1, lambda: os._exit(1))

or if the print is actually needed we can do this instead:

def init(options, configuration, plugin):
    def die():
        print("Exiting!")
        os._exit(1)
    threading.Timer(1.0, die)

😉

Comment on lines 74 to 172
tal_del_destructor2(dp->plugin, plugin_dynamic_crash, dp);

list_for_each(&dp->plugin->plugins->plugins, p, list) {
if (p->plugin_state != INIT_COMPLETE)
return;
}

/* No plugin unconfigured left, return the plugin list */
was_pending(plugin_dynamic_list_plugins(dp->cmd, dp->plugin->plugins));
}

/**
* Send the init message to the plugin. We don't care about its response,
* but it's considered the last part of the handshake : once it responds
* it is considered configured.
*/
static void plugin_dynamic_config(struct dynamic_plugin *dp)
{
struct jsonrpc_request *req;

req = jsonrpc_request_start(dp->plugin, "init", dp->plugin->log,
plugin_dynamic_config_callback, dp);
plugin_populate_init_request(dp->plugin, req);
jsonrpc_request_end(req);
dp->plugin->plugin_state = AWAITING_INIT_RESPONSE;
plugin_request_send(dp->plugin, req);
}

static void plugin_dynamic_manifest_callback(const char *buffer,
const jsmntok_t *toks,
const jsmntok_t *idtok,
struct dynamic_plugin *dp)
{
if (!plugin_parse_getmanifest_response(buffer, toks, idtok, dp->plugin))
return was_pending(plugin_dynamic_error(dp, "Gave a bad response to getmanifest"));

if (!dp->plugin->dynamic)
return was_pending(plugin_dynamic_error(dp, "Not a dynamic plugin"));

/* We got the manifest, now send the init message */
dp->plugin->plugin_state = NEEDS_INIT;
plugin_dynamic_config(dp);
}

/**
* This starts a plugin : spawns the process, connect its stdout and stdin,
* then sends it a getmanifest request.
*/
static struct command_result *plugin_start(struct dynamic_plugin *dp)
{
int stdin, stdout;
mode_t prev_mask;
char **p_cmd;
struct jsonrpc_request *req;
struct plugin *p = dp->plugin;

p->dynamic = false;
p_cmd = tal_arrz(NULL, char *, 2);
p_cmd[0] = p->cmd;
/* In case the plugin create files, this is a better default. */
prev_mask = umask(dp->cmd->ld->initial_umask);
p->pid = pipecmdarr(&stdin, &stdout, &pipecmd_preserve, p_cmd);
umask(prev_mask);
if (p->pid == -1)
return plugin_dynamic_error(dp, "Error running command");
else
log_debug(dp->cmd->ld->plugins->log, "started(%u) %s", p->pid, p->cmd);
tal_free(p_cmd);
p->buffer = tal_arr(p, char, 64);
p->stop = false;
/* Give the plugin 20 seconds to respond to `getmanifest`, so we don't hang
* too long on the RPC caller. */
p->timeout_timer = new_reltimer(dp->cmd->ld->timers, dp,
time_from_sec((20)),
plugin_dynamic_timeout, dp);

/* Besides the timeout we could also have the plugin crash before
* completing the handshake. In that case we'll get notified and we
* can clean up the `struct dynamic_plugin` and return an appropriate
* error.
*
* The destructor is deregistered in the following places:
*
* - plugin_dynamic_error in case of a timeout or a crash
* - plugin_dynamic_config_callback if the handshake completes
*/
tal_add_destructor2(p, plugin_dynamic_crash, dp);

/* Create two connections, one read-only on top of the plugin's stdin, and one
* write-only on its stdout. */
io_new_conn(p, stdout, plugin_stdout_conn_init, p);
io_new_conn(p, stdin, plugin_stdin_conn_init, p);
req = jsonrpc_request_start(p, "getmanifest", p->log,
plugin_dynamic_manifest_callback, dp);
jsonrpc_request_end(req);
plugin_request_send(p, req);
p->plugin_state = AWAITING_GETMANIFEST_RESPONSE;
return command_still_pending(dp->cmd);
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice cleanup 👍

/**
* Called when trying to start a plugin through RPC, it starts the plugin and
* will give a result 20 seconds later at the most.
*/
static struct command_result *
plugin_dynamic_start(struct command *cmd, const char *plugin_path)
{
struct dynamic_plugin *dp;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment still references 20 seconds.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't add the review comment on that line since it'd confuse GH to have review comments on unchanged lines.

lightningd/plugin.c Show resolved Hide resolved
@@ -482,7 +483,8 @@ static struct io_plan *plugin_read_json(struct io_conn *conn,

if (err) {
plugin_kill(plugin, "%s", err);
return io_close(plugin->stdout_conn);
/* plugin_kill frees plugin */
return io_close(NULL);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is interesting, what's the effect of io_close(NULL)? Is it just intended as a no-op?

Copy link
Contributor Author

@rustyrussell rustyrussell May 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, just like the one above it, it's a cheats way of getting an io_plan object to return

Comment on lines +1284 to +1283
err = plugin_send_getmanifest(p);
if (!err) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this leak err?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost: it's allocated off of the plugin which we'd be freeing if there is an error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I use this pattern through this code quite a bit.

@rustyrussell rustyrussell force-pushed the guilt/plugin-fixes branch from 1392bf8 to 41dfd66 Compare May 5, 2020 00:24
… not.

This will allow us to simplify the caller's command handling.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now we know whether the command completed or not, we can correctly
call command_still_pending() if it didn't complete.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We have this now.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
… for state.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This will let us unify the startup and runtime-started infrastructure.

Note that there are two kinds of notifications:
1. Starting a single plugin (i.e. `plugin start`)
2. Starting multiple plugins (i.e. `plugin rescan` or `plugin startdir`).

In the latter case, we want the command to complete only once *all*
the plugins are dead/finished.

We also call plugin_kill() in all cases, and correctly return afterwards
(it matters once we use the same paths for dynamic plugins, which don't
cause a fatal error if they don't startup).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This actually passes fine, but it's an interesting case to test.

Fixed-by: Darosior <darosior@protonmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This will allow the dynamic starting code to use them too.

Also lets us move dev_debug_subprocess under #if DEVELOPER.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We modify the slow_init() so it doesn't go too slowly for this test.

This demonstrates a crash, where we currently try to fail a command
multiple times.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This means we now clean up options in startup plugins (that was only
done by dynamic code!), and now they both share the 60 second timeout
instead of 20 seconds for dynamic.

For the dynamic case though, it's 60 seconds to both complete
getmanifest and init, which seems fair.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Instead of calling plugin_kill() and returning, have them
uniformly return an error string or NULL, and have the top
level (plugin_read_json) do the plugin_kill() call.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Once again, this unifies plugin_kill() into the caller.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is what I expected from plugin_kill, and now all the callers do the
equivalent anywat, it's easy.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
They are children of the plugin, so this Just Works.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This way the caller doesn't have to "know" that it should use strerror().

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
That's more convenient for most callers, which don't need a fmt.

Fixed-by: Darosior <darosior@protonmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's not needed now that plugin_kill frees the plugin immediately.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. Make the destructor call check_plugins_resolved(),
   unless it was uninitialized (`opt_disable_plugin`).
2. Remove redundant list_del (destructor already does it).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The previous implementation was a bit lazy: in particular, since we didn't
remember the disabled plugins, we would load them on rescan.

Changelog-Changed: config: the `plugin-disable` option works even if specified before the plugin is found.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
There's no reason to assign the plugin vars inside the callback, so do
that outside, and the tal_steal() is redundant (the plugin is already
the conn parent).

And reduce duplication by making plugin_conn_finish call plugin_kill:
just make sure we don't call plugin_conn_finish again if plugin_kill
is called externally.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We unified this timeout with the 60 second startup timeout.

Reported-by: @cdecker
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
@rustyrussell rustyrussell force-pushed the guilt/plugin-fixes branch from 41dfd66 to b2ec70a Compare May 5, 2020 01:21
@rustyrussell
Copy link
Contributor Author

Self-ack for @cdecker feedback minor changes (and rebase)

Ack b2ec70a

@rustyrussell rustyrussell merged commit 55cd61e into ElementsProject:master May 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants