Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PATCH] fixes FATAL SIGNAL 11 on loading gossmap #6005

Merged

Conversation

vincenzopalazzo
Copy link
Contributor

@vincenzopalazzo vincenzopalazzo commented Feb 13, 2023

This will fix a crash that I caused on armv7
and by looking inside the coredump with gdb
(by adding an assert on n that must be
different from null) I get the following stacktrace

(gdb) bt
#0  0x00000000 in ?? ()
#1  0x0043a038 in send_backtrace (why=0xbe9e3600 "FATAL SIGNAL 11") at common/daemon.c:36
#2  0x0043a0ec in crashdump (sig=11) at common/daemon.c:46
#3  <signal handler called>
#4  0x00406d04 in node_announcement (map=0x938ecc, nann_off=495146) at common/gossmap.c:586
#5  0x00406fec in map_catchup (map=0x938ecc, num_rejected=0xbe9e3a40) at common/gossmap.c:643
#6  0x004073a4 in load_gossip_store (map=0x938ecc, num_rejected=0xbe9e3a40) at common/gossmap.c:697
#7  0x00408244 in gossmap_load (ctx=0x0, filename=0x4e16b8 "gossip_store", num_channel_updates_rejected=0xbe9e3a40) at common/gossmap.c:976
#8  0x0041a548 in init (p=0x93831c, buf=0x9399d4 "\n\n{\"jsonrpc\":\"2.0\",\"id\":\"cln:init#25\",\"method\":\"init\",\"params\":{\"options\":{},\"configuration\":{\"lightning-dir\":\"/home/vincent/.lightning/testnet\",\"rpc-file\":\"lightning-rpc\",\"startup\":true,\"network\":\"te"..., config=0x939cdc) at plugins/topology.c:622
#9  0x0041e5d0 in handle_init (cmd=0x938934, buf=0x9399d4 "\n\n{\"jsonrpc\":\"2.0\",\"id\":\"cln:init#25\",\"method\":\"init\",\"params\":{\"options\":{},\"configuration\":{\"lightning-dir\":\"/home/vincent/.lightning/testnet\",\"rpc-file\":\"lightning-rpc\",\"startup\":true,\"network\":\"te"..., params=0x939c8c)
    at plugins/libplugin.c:1208
#10 0x0041fc04 in ld_command_handle (plugin=0x93831c, toks=0x939bec) at plugins/libplugin.c:1572
#11 0x00420050 in ld_read_json_one (plugin=0x93831c) at plugins/libplugin.c:1667
#12 0x004201bc in ld_read_json (conn=0x9391c4, plugin=0x93831c) at plugins/libplugin.c:1687
#13 0x004cb82c in next_plan (conn=0x9391c4, plan=0x9391d8) at ccan/ccan/io/io.c:59
#14 0x004cc67c in do_plan (conn=0x9391c4, plan=0x9391d8, idle_on_epipe=false) at ccan/ccan/io/io.c:407
#15 0x004cc6dc in io_ready (conn=0x9391c4, pollflags=1) at ccan/ccan/io/io.c:417
#16 0x004cf8cc in io_loop (timers=0x9383c4, expired=0xbe9e3ce4) at ccan/ccan/io/poll.c:453
#17 0x00420af4 in plugin_main (argv=0xbe9e3eb4, init=0x41a46c <init>, restartability=PLUGIN_STATIC, init_rpc=true, features=0x0, commands=0x6167e8 <commands>, num_commands=4, notif_subs=0x0, num_notif_subs=0, hook_subs=0x0, num_hook_subs=0, notif_topics=0x0, num_notif_topics=0) at plugins/libplugin.c:1891
#18 0x0041a6f8 in main (argc=1, argv=0xbe9e3eb4) at plugins/topology.c:679

I do not know if this is a solution because I do not know when I can parse a node announcement for a node that is no longer in the gossip map and why this can happen.

So, I hope this is just useful for @rustyrussell and the my gossip store is available here

Changelog-Fixes: fixes FATAL SIGNAL 11 on gossmap node announcement parsing.

Signed-off-by: Vincenzo Palazzo vincenzopalazzodev@gmail.com

@vincenzopalazzo vincenzopalazzo force-pushed the macros/stack_overflow branch 4 times, most recently from c62a3eb to af537e1 Compare February 13, 2023 13:54
@rustyrussell rustyrussell added this to the v23.02 milestone Feb 13, 2023
@rustyrussell
Copy link
Contributor

Can you send me your gossip store? In theory this can happen, but in practice it should never: we don't allow node_announcements unless there's a channel announcement before it, so the gossmap is invalid.

However, there's nothing wrong with this fix, it makes us more robust.

Ack af537e1

@rustyrussell
Copy link
Contributor

OK, channel_announcement is indeed a zombie, but node_announcement isn't. Not sure how this happened...

(You have to fix devtools/dump-gossipstore to see this!)

@vincenzopalazzo
Copy link
Contributor Author

Not sure how this happened...

No idea, I just run it normally and it has run for a while! Could be that happens some partial write of the gossip map? but not sure, I'm planning to drive in the gossip map decoding at the end of the months, for not I had no idea what can be a possible case for this error in my node

This will fix a crash that I caused on armv7
and by looking inside the coredump with gdb
(by adding an assert on n that must be
different from null) I get the following stacktrace

```
(gdb) bt
\#0  0x00000000 in ?? ()
\#1  0x0043a038 in send_backtrace (why=0xbe9e3600 "FATAL SIGNAL 11") at common/daemon.c:36
\#2  0x0043a0ec in crashdump (sig=11) at common/daemon.c:46
\#3  <signal handler called>
\#4  0x00406d04 in node_announcement (map=0x938ecc, nann_off=495146) at common/gossmap.c:586
\#5  0x00406fec in map_catchup (map=0x938ecc, num_rejected=0xbe9e3a40) at common/gossmap.c:643
\#6  0x004073a4 in load_gossip_store (map=0x938ecc, num_rejected=0xbe9e3a40) at common/gossmap.c:697
\ElementsProject#7  0x00408244 in gossmap_load (ctx=0x0, filename=0x4e16b8 "gossip_store", num_channel_updates_rejected=0xbe9e3a40) at common/gossmap.c:976
\ElementsProject#8  0x0041a548 in init (p=0x93831c, buf=0x9399d4 "\n\n{\"jsonrpc\":\"2.0\",\"id\":\"cln:init#25\",\"method\":\"init\",\"params\":{\"options\":{},\"configuration\":{\"lightning-dir\":\"/home/vincent/.lightning/testnet\",\"rpc-file\":\"lightning-rpc\",\"startup\":true,\"network\":\"te"..., config=0x939cdc) at plugins/topology.c:622
\ElementsProject#9  0x0041e5d0 in handle_init (cmd=0x938934, buf=0x9399d4 "\n\n{\"jsonrpc\":\"2.0\",\"id\":\"cln:init#25\",\"method\":\"init\",\"params\":{\"options\":{},\"configuration\":{\"lightning-dir\":\"/home/vincent/.lightning/testnet\",\"rpc-file\":\"lightning-rpc\",\"startup\":true,\"network\":\"te"..., params=0x939c8c)
    at plugins/libplugin.c:1208
\ElementsProject#10 0x0041fc04 in ld_command_handle (plugin=0x93831c, toks=0x939bec) at plugins/libplugin.c:1572
\ElementsProject#11 0x00420050 in ld_read_json_one (plugin=0x93831c) at plugins/libplugin.c:1667
\ElementsProject#12 0x004201bc in ld_read_json (conn=0x9391c4, plugin=0x93831c) at plugins/libplugin.c:1687
\ElementsProject#13 0x004cb82c in next_plan (conn=0x9391c4, plan=0x9391d8) at ccan/ccan/io/io.c:59
\ElementsProject#14 0x004cc67c in do_plan (conn=0x9391c4, plan=0x9391d8, idle_on_epipe=false) at ccan/ccan/io/io.c:407
\ElementsProject#15 0x004cc6dc in io_ready (conn=0x9391c4, pollflags=1) at ccan/ccan/io/io.c:417
\ElementsProject#16 0x004cf8cc in io_loop (timers=0x9383c4, expired=0xbe9e3ce4) at ccan/ccan/io/poll.c:453
\ElementsProject#17 0x00420af4 in plugin_main (argv=0xbe9e3eb4, init=0x41a46c <init>, restartability=PLUGIN_STATIC, init_rpc=true, features=0x0, commands=0x6167e8 <commands>, num_commands=4, notif_subs=0x0, num_notif_subs=0, hook_subs=0x0, num_hook_subs=0, notif_topics=0x0, num_notif_topics=0) at plugins/libplugin.c:1891
\ElementsProject#18 0x0041a6f8 in main (argc=1, argv=0xbe9e3eb4) at plugins/topology.c:679
```

I do not know if this is a solution because I do not know
when I can parse a node announcement for a node that
it is not longer in the gossip map.

So, I hope this is just usefult for @rustyrussell

Changelog-Fixed: fixes `FATAL SIGNAL 11` on gossmap node announcement parsing.

Signed-off-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
@vincenzopalazzo
Copy link
Contributor Author

force push to fix a typo inside the changelog!

@endothermicdev
Copy link
Collaborator

I like the additional check, but can we log this with a BROKEN as well? Or should that be done by topology?

@rustyrussell
Copy link
Contributor

I like the additional check, but can we log this with a BROKEN as well? Or should that be done by topology?

Unfortunately this is common code. We don't have a decent way of getting such messages out, it would take a more serious re-engineering (this wasn't the only place we might want such a logging hook).

@endothermicdev endothermicdev merged commit a104380 into ElementsProject:master Feb 13, 2023
@vincenzopalazzo vincenzopalazzo deleted the macros/stack_overflow branch February 23, 2023 19:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants