lightningd question: Can we expect db_write's after a "shutdown" notification? #4785
Comments
Yes, lightningd doesn't wait for an answer here, so at this point you can receive any sequence of calls; it is not deterministic at all. The …
I'm wondering whether we need to admit all types of calls during the "shutdown". I mean, if we have a datastore, in my opinion the plugin needs time to store its data (but not much time: imagine the plugin has a `while true {}`; that is not good at all).
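For context, subscribing to the notification from a plugin looks roughly like this; a minimal pyln-client sketch (the handler body is an illustration, not any real plugin's code):

```python
#!/usr/bin/env python3
from pyln.client import Plugin

plugin = Plugin()


@plugin.subscribe("shutdown")
def on_shutdown(plugin, **kwargs):
    # lightningd does not wait for a reply to this notification, so
    # whatever happens here races against the rest of the shutdown
    # sequence; keep it short (no `while True:` busy loops).
    plugin.log("flushing plugin state before exit")


plugin.run()
```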
TL;DR: I think plugins that subscribe to … OK, I think I get the intention of the "shutdown" notification. But if the … BTW, when DEVELOPER=1, all built-in plugins … (lines 639 to 642 in cdb93ee)
This already gave me one deadlock in …
Another example: using the current backup plugin (which doesn't subscribe to "shutdown" and is thus immediately killed at shutdown) in combination with another plugin that subscribes to "shutdown" and (at shutdown) waits a few seconds and then calls …, because: (lines 2087 to 2091 in cdb93ee) and lightning/lightningd/plugin_hook.c, line 63 in cdb93ee
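A sketch of the second plugin in that scenario (hypothetical: the delay length and the RPC method it calls are made up for illustration):

```python
#!/usr/bin/env python3
import time

from pyln.client import Plugin

plugin = Plugin()


@plugin.subscribe("shutdown")
def on_shutdown(plugin, **kwargs):
    time.sleep(3)  # "waits a few seconds"
    # By now the backup plugin may already be dead and lightningd mid
    # teardown, so this call can hang or fail as described above.
    plugin.rpc.call("getinfo", {})


plugin.run()
```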
So to answer …

I have been running pytests on a test branch, which logs dirty db transactions when they happen right after … (note: needed to modify …) … returned none. So I guess that's good news, and it probably explains why there have not been issues with it so far with nodes running the … (lightning/lightningd/lightningd.c, lines 1137 to 1146 in 7401b26)

Anyway, inside … So I think plugins that registered the db_write hook should be kept alive a little longer, at least until other plugins are killed and maybe until after the last …
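A hypothetical sketch of such a pytest (pyln-testing fixtures; `tests/plugins/dblog.py` is the repo's db_write-logging test plugin, but the log pattern asserted here is an assumption):

```python
# hypothetical pytest sketch using pyln-testing's node_factory fixture
def test_db_write_during_shutdown(node_factory):
    # dblog.py registers the db_write hook and logs every call
    l1 = node_factory.get_node(options={"plugin": "tests/plugins/dblog.py"})
    l1.rpc.stop()
    # if db_write fires after the "shutdown" notification, it would
    # show up in the plugin's log output (pattern is an assumption)
    assert l1.daemon.is_in_log(r"dblog.*db_write")
```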
Could it be a lucky order inside the array of plugins?
Why preserve only a specific hook like the …? IMO, with this new plugin notification …
so the ones that registered the db_write hook are kept alive until after the last db transaction. Issue ElementsProject#4785. TODO: maybe revert PR ElementsProject#3867, i.e. remove the obsolete db wrapper around freeing jsonrpc; the wrapper was added in PR ElementsProject#1755, but since PR ElementsProject#3867 nothing touches the db at this point?
Note that db plugin hooks already get called before init, so calling after shutdown is probably minor. But we should definitely not kill these until last, and ideally put an assert in if we do a db operation after that!
Yes, but 'before init' the plugin is already running. At shutdown it could've been killed, which also un-registers its hooks, even for important-plugins. Thanks for sharing that perspective; considering how plugin processes are started quite early, wouldn't it make sense to kill them late? Edit: lightning/lightningd/lightningd.c, lines 1165 to 1171 in 7401b26

The …
so that plugins that registered the db_write hook are kept alive until after the last db transaction; individual plugins can now be in state SHUTDOWN. Issue ElementsProject#4785
*first* call keeps db_write plugins alive; subscribed plugins have 30s to finish their business. *second* call notifies subscribed db_write plugins a 2nd time and gives them 5s to self-terminate. Issue ElementsProject#4785
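A db_write plugin cooperating with that two-phase scheme might look like this sketch (counting the notifications and exiting on the second one is an assumption about the intended contract):

```python
#!/usr/bin/env python3
import sys

from pyln.client import Plugin

plugin = Plugin()
shutdowns_seen = 0


@plugin.subscribe("shutdown")
def on_shutdown(plugin, **kwargs):
    global shutdowns_seen
    shutdowns_seen += 1
    if shutdowns_seen == 1:
        # first notification: up to 30s to finish our business
        plugin.log("shutdown announced, finishing backup work")
    else:
        # second notification: the db is closed, self-terminate (5s)
        plugin.log("final shutdown, exiting")
        sys.exit(0)


plugin.run()
```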
Ideally, a backup plugin should not terminate on a … Since not responding to … Basically, my understanding is that a backup plugin should ignore …

Correct, the exact reason for … In this sense I do agree that any subscriber to the …
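In that spirit, a backup-style sketch: register `db_write`, keep acknowledging writes, and do cleanup only when lightningd closes our stdin (pyln's `Plugin.run()` returns on EOF). The persistence helper is a stand-in, not real backup-plugin code:

```python
#!/usr/bin/env python3
import json

from pyln.client import Plugin

plugin = Plugin()


def append_to_backup(writes, data_version):
    # stand-in persistence: append the statements to a local log file
    with open("backup.log", "a") as f:
        f.write(json.dumps({"data_version": data_version,
                            "writes": writes}) + "\n")
        f.flush()


@plugin.hook("db_write")
def on_db_write(writes, data_version, plugin, **kwargs):
    # persist durably *before* acknowledging, or lightningd may commit
    # a transaction we never backed up
    append_to_backup(writes, data_version)
    return {"result": "continue"}


# deliberately no "shutdown" subscription: we keep serving db_write
# until lightningd really is done and closes our stdin
plugin.run()  # returns on EOF
# post-EOF cleanup (fsync, upload, ...) would go here
```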
… in shutdown. Here important-plugin implies `important hook`.

Before this commit, when in shutdown:
- existing in-flight hooks were abandoned, cutting the hook chain and never calling hook_final_cb
- hooks were removed when their plugin died, even for an important-plugin, because `shutdown` overrules
- but hook events can be called while waiting for plugins to self-terminate (up to 30s), with subdaemons still alive, and it looks as if no plugin ever registered the hook

After this commit, when in shutdown:
- existing in-flight hook (chains) are honoured and can finalize, same semantics as LD_STATE_RUNNING
- important-plugins are kept alive until after shutdown_subdaemons, so they don't miss hooks
- JSON-RPC commands are functional, but anything related to unimportant plugins cannot be relied on

TODO:
- Run tests -> hangs forever on test_closing, so skip them
- Q. Does this open a can of worms or races when (normal) plugins with hooks die randomly?
  A. Yes, for example htlc_accepted triggers hook invoice_payment, but the plugin (fetchinvoice?) already died

CONCLUSION: If you want to give more control over shutdown, I think there could be a plugin `shutdown_clean.py` with RPC method `shutdown_clean` (see the sketch after this commit message). When called, that plugin starts additional (important) plugin(s) that register relevant hooks and, for example, hold off new htlcs and wait for existing in-flight htlcs to resolve ... and finally call RPC `stop`.

Note: --important-plugin only seems to work at start, not via `plugin start shutdown_clean.py`; maybe we can add that? Or do something with disable?

Some parts of this commit are still good, i.e. hook semantics of important plugins should be consistent until the very last potential hook call.

- What if an important-plugin dies unexpectedly and lightningd_exit() calls io_break(); is that bad?
- What are the benefits? Add an example where on shutdown in-flight htlcs are resolved/cleared and new htlcs blocked, see ElementsProject#4842
- Split commit into hook-related stuff and others, for clarity of reasoning
- Q. How does this relate (hook-wise) to db_write plugins?
  A. Looks like this hook is treated like any other hook: when the plugin dies, the hook is removed, so to be safe backup needs to be `important`. Hook documentation does not mention `important-plugin`, but BACKUP.md does.

TODO: Tested this -> `plugin stop backup.py` -> "plugin-backup.py: Killing plugin: exited during normal operation". In fact, running the current backup.py with current master misses a couple of writes in shutdown (because its hook is removed, see issue ElementsProject#4785).
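A rough sketch of that `shutdown_clean.py` idea (everything here is hypothetical: the method name, the hold-off placeholder, and simply calling `stop` at the end):

```python
#!/usr/bin/env python3
from pyln.client import Plugin

plugin = Plugin()


@plugin.method("shutdown_clean")
def shutdown_clean(plugin):
    """Drain in-flight HTLCs, then ask lightningd to stop."""
    # placeholder: start additional (important) plugins that register
    # the relevant hooks, hold off new HTLCs and wait for in-flight
    # ones to resolve ...
    plugin.rpc.call("stop", {})
    return {"result": "shutting down"}


plugin.run()
```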
Until these issues with shutdown are resolved: ElementsProject/lightning#4785 ElementsProject/lightning#4883
Fixes: ElementsProject#4785 Fixes: ElementsProject#4883 Changelog-Changed: Plugins: `shutdown` notification is now sent when lightningd is almost completely shut down; RPC calls then fail with error code -5.
So a db_write plugin can still self-terminate when it's ready for it. Credit to ZmnSCPxj for the EOF idea, see issue ElementsProject#4785
…er closing db in second shutdown_plugin call, trigger EOF in plugin stdin.

This correctly handles corner cases in shutdown_plugin:
- in a non-dev build, builtin plugins don't subscribe to shutdown, but we still need a (brief) io_loop to notify interested db_write plugins
- only an awaited-for plugin should call io_break
- always send EOF to db_write plugins in the final call (subscribed or not)

This adds two helper functions, plugin_registered_db_write_hook and jsonrpc_command_del, and a new plugin state: SHUTDOWN. Inspired by ElementsProject#4790, and also credit to ZmnSCPxj for mentioning the EOF idea in ElementsProject#4785.

Hopefully this fixes the "Nope" from the previous commit, these:

ERROR tests/test_invoices.py::test_amountless_invoice - ValueError:
ERROR tests/test_misc.py::test_htlc_out_timeout - ValueError:
ERROR tests/test_pay.py::test_sendpay_msatoshi_arg - ValueError:
ERROR tests/test_pay.py::test_pay_peer - ValueError:
ERROR tests/test_pay.py::test_listpays_with_filter_by_status - ValueError:
ERROR tests/test_pay.py::test_channel_receivable - ValueError:

which were still triggering db_writes:

```
lightningd: FATAL SIGNAL 11 (version v22.11.1-149-g93f1458-modded)
0x555e78c30d0f send_backtrace common/daemon.c:33
0x555e78c30db4 crashdump common/daemon.c:46
0x7fd20cc60d5f ??? ./signal/../sysdeps/unix/sysv/linux/x86_64/sigaction.c:0
0x555e78c53665 db_begin_transaction_ db/exec.c:124
0x555e78bc7557 read_json lightningd/jsonrpc.c:1090
0x555e78ce8bc3 next_plan ccan/ccan/io/io.c:59
0x555e78ce976d do_plan ccan/ccan/io/io.c:407
0x555e78ce97ab io_ready ccan/ccan/io/io.c:417
0x555e78ceb9c1 io_loop ccan/ccan/io/poll.c:453
0x555e78bfdcb2 shutdown_plugins lightningd/plugin.c:2179
0x555e78bcae76 main lightningd/lightningd.c:1281
0x7fd20cc4bd09 __libc_start_main ../csu/libc-start.c:308
0x555e78b9d3c9 ??? ???:0
0xffffffffffffffff ??? ???:0
Log dumped in crash.log.20230126160704
```

Not from plugin_response_handle!!! What are they? It seems every parse_request() in read_json() is wrapped in a db transaction:

```
if (!in_transaction) {
    db_begin_transaction(jcon->ld->wallet->db);
    in_transaction = true;
}
parse_request(jcon, jcon->input_toks);
```

But wasn't jsonrpc_stop_listening supposed to block new requests? <== Well, sort of: it only blocks new RPC connections. But (surprise) the builtin plugins reuse their RPC connection and don't close it, so their requests go through!
While working on the backup plugin here to use the "shutdown" notification for cleanup, a question arose: can there be `db_write`'s after the "shutdown" notification? The code below suggests it can:

lightning/lightningd/lightningd.c, lines 1150 to 1159 in cdb93ee

The separate freeing of `ld->jsonrpc` inside a db transaction was added a long time ago in PR #1755, for a particular case of un-reserving the utxos (in destructor `destroy_utxos`) of a `fundchannel` command that didn't return when the remote peer hung. But since v0.9.1 (PR #3867) that destructor (`destroy_utxos`) is not used anymore; AFAIU utxos are now unreserved by blockheight.

Are there other destructors hanging on `ld->jsonrpc` that do db transactions? If not, can PR #1755 be reverted?

Maybe the documentation can be more explicit about what plugins can expect, such as no RPC calls (e.g. `datastore`) after "shutdown".
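As a concrete illustration of what such documentation would imply for plugin authors, a hedged sketch (the -5 error code is taken from the changelog entries quoted above; the datastore key and value are made up):

```python
#!/usr/bin/env python3
from pyln.client import Plugin, RpcError

plugin = Plugin()


@plugin.subscribe("shutdown")
def on_shutdown(plugin, **kwargs):
    try:
        # made-up key/value; the datastore RPC takes a key and a
        # string (or hex) payload
        plugin.rpc.call("datastore",
                        {"key": "myplugin/state", "string": "flushed"})
    except RpcError as e:
        # after "shutdown" RPC calls may fail (error code -5 per the
        # changelog entries above); a plugin must tolerate that
        plugin.log("datastore failed during shutdown: {}".format(e))


plugin.run()
```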
) after "shutdown".The text was updated successfully, but these errors were encountered: