cln v24.08 crash #7689

Closed
JssDWt opened this issue Sep 21, 2024 · 5 comments · Fixed by #7729
@JssDWt
Contributor

JssDWt commented Sep 21, 2024

lightning_gossipd: gossip_store: get delete entry offset 5098973/19323010780 (version v24.08-4-gbc9e4f5-modded)
0x556cf571e570 send_backtrace
        common/daemon.c:33
0x556cf5727c3f status_failed
        common/status.c:221
0x556cf5715ca5 gossip_store_get_with_hdr
        gossipd/gossip_store.c:466
0x556cf57161f7 gossip_store_set_timestamp
        gossipd/gossip_store.c:592
0x556cf571783e process_channel_update
        gossipd/gossmap_manage.c:777
0x556cf5718190 gossmap_manage_channel_update
        gossipd/gossmap_manage.c:901
0x556cf5714a5a handle_recv_gossip
        gossipd/gossipd.c:215
0x556cf5714b45 connectd_req
        gossipd/gossipd.c:307
0x556cf571e85b handle_read
        common/daemon_conn.c:35
0x556cf586cd8c next_plan
        ccan/ccan/io/io.c:60
0x556cf586d217 do_plan
        ccan/ccan/io/io.c:422
0x556cf586d2d0 io_ready
        ccan/ccan/io/io.c:439
0x556cf586ebbc io_loop
        ccan/ccan/io/poll.c:455
0x556cf5714e0c main
        gossipd/gossipd.c:672
0x7f719eee1c89 ???
        ???:0
0x7f719eee1d44 ???
        ???:0
0x556cf5711ae0 ???
        ???:0
0xffffffffffffffff ???
        ???:0

The crash was observed on this branch: https://github.com/breez/lightning/tree/cln-v24.08-breez with commit breez@bc9e4f5

The branch contains changes compared to v24.08, namely

#7628
#7611
#7636
But I don't think they were related to the crash.

Notable thing:
The gossip store file was 18GB

@ShahanaFarooqui
Collaborator

Another gossip crash report from v24.08.1: #7685 (comment)

@ShahanaFarooqui
Collaborator

Reported on Telegram by +steepdawn974:

...
2024-10-02T08:58:33.820Z INFO    plugin-bcli: bitcoin-cli initialized and connected to bitcoind.
2024-10-02T08:58:43.407Z **BROKEN** gossipd: gossip_store: checksum verification failed? 32536bf2 should be 67132a62 (offset 3972). Moving to gossip_store.corrupt and truncating
2024-10-02T08:58:43.407Z UNUSUAL 025651f2193a89a44a80d833f0a82da668a3af8438eff2e9633fabb3f6a3748be6-chan#15523: gossipd lost track of announced channel: re-announcing!
2024-10-02T08:58:43.408Z UNUSUAL 02d96eadea3d780104449aca5c93461ce67c1564e2e1d73225fa67dd3b997a6018-chan#15522: gossipd lost track of announced channel: re-announcing!
2024-10-02T08:58:43.408Z UNUSUAL 024a8228d764091fce2ed67e1a7404f83e38ea3c7cb42030a2789e73cf3b341365-chan#15524: gossipd lost track of announced channel: re-announcing!
2024-10-02T08:58:43.464Z INFO    plugin-clnrest: REST server running at https://127.0.0.1:3010
2024-10-02T08:58:43.548Z INFO    lightningd: --------------------------------------------------
2024-10-02T08:58:43.548Z INFO    lightningd: Server started with public key xxxxx, alias xxxxx (color #0362df) and lightningd v24.08
2024-10-02T08:59:37.638Z UNUSUAL lightningd: Bad gossip order: could not find channel 9999999x475x0 for peer's channel update
2024-10-02T09:02:32.335Z **BROKEN** gossipd: Dying channel 863308x1674x0 already deleted?
2024-10-02T09:02:32.335Z **BROKEN** gossipd: gossip_store: bad checksum offset 451:  (version v24.08)
2024-10-02T09:02:32.335Z **BROKEN** gossipd: backtrace: common/daemon.c:38 (send_backtrace) 0x55793fd3051b
2024-10-02T09:02:32.335Z **BROKEN** gossipd: backtrace: common/status.c:221 (status_failed) 0x55793fd39bac
2024-10-02T09:02:32.335Z **BROKEN** gossipd: backtrace: gossipd/gossip_store.c:480 (gossip_store_get_with_hdr) 0x55793fd27d90
2024-10-02T09:02:32.336Z **BROKEN** gossipd: backtrace: gossipd/gossip_store.c:491 (check_msg_type) 0x55793fd27dbe
2024-10-02T09:02:32.336Z **BROKEN** gossipd: backtrace: gossipd/gossip_store.c:509 (gossip_store_set_flag) 0x55793fd27f41
2024-10-02T09:02:32.336Z **BROKEN** gossipd: backtrace: gossipd/gossip_store.c:561 (gossip_store_del) 0x55793fd28187
2024-10-02T09:02:32.336Z **BROKEN** gossipd: backtrace: gossipd/gossmap_manage.c:1216 (gossmap_manage_new_block) 0x55793fd2a82f
2024-10-02T09:02:32.336Z **BROKEN** gossipd: backtrace: gossipd/gossipd.c:477 (new_blockheight) 0x55793fd260ff
2024-10-02T09:02:32.336Z **BROKEN** gossipd: backtrace: gossipd/gossipd.c:588 (recv_req) 0x55793fd26529
2024-10-02T09:02:32.336Z **BROKEN** gossipd: backtrace: common/daemon_conn.c:35 (handle_read) 0x55793fd307c6
2024-10-02T09:02:32.336Z **BROKEN** gossipd: backtrace: ccan/ccan/io/io.c:60 (next_plan) 0x55793fdc0056
2024-10-02T09:02:32.336Z **BROKEN** gossipd: backtrace: ccan/ccan/io/io.c:422 (do_plan) 0x55793fdc04e1
2024-10-02T09:02:32.337Z **BROKEN** gossipd: backtrace: ccan/ccan/io/io.c:439 (io_ready) 0x55793fdc059a
2024-10-02T09:02:32.337Z **BROKEN** gossipd: backtrace: ccan/ccan/io/poll.c:455 (io_loop) 0x55793fdc1ee7
2024-10-02T09:02:32.337Z **BROKEN** gossipd: backtrace: gossipd/gossipd.c:672 (main) 0x55793fd26ead
2024-10-02T09:02:32.337Z **BROKEN** gossipd: backtrace: (null):0 ((null)) 0x7f4cb5aa1d09
2024-10-02T09:02:32.337Z **BROKEN** gossipd: backtrace: (null):0 ((null)) 0x55793fd23d29
2024-10-02T09:02:32.337Z **BROKEN** gossipd: backtrace: (null):0 ((null)) 0xffffffffffffffff
2024-10-02T09:02:32.337Z **BROKEN** gossipd: STATUS_FAIL_INTERNAL_ERROR: gossip_store: bad checksum offset 451:

@steepdawn974

#7689 (comment)

This was on v24.08

endothermicdev added a commit to endothermicdev/lightning that referenced this issue Oct 4, 2024
This addresses a crash where a deleted channel is added to the
txout_failures map, and temporarily ignored, but a reference is still
kept in the gossmap.  When the channel expires from the txout_failures
map, it is once again looked up in the gossmap, leading to a crash while
accessing a deleted entry.

Fixes ElementsProject#7689

Changelog-Fixed: Fixes a crash in gossipd accessing a deleted channel.
@rustyrussell
Contributor

rustyrussell commented Oct 7, 2024

#7689 (comment)

This was on v24.08

Wow, this is completely broken. Is this some weird OS? You seem to be getting bad checksums all the time...

Also, please can you send me gossip_store.corrupt?

@rustyrussell
Contributor

We use 32-bit file offsets, but since we stopped filtering gossip spam, the store can grow much larger. I suspect this is causing all kinds of weirdness.

The workaround is to restart (which compacts the gossip store), but I'll simply switch to 64-bit offsets for the point release.
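
To illustrate why 32-bit offsets stop working once the store grows this large, here is a minimal sketch in Python. It is not CLN code: the helper `as_u32` is a hypothetical name, and I'm assuming the second number in the first crash message (19323010780) is the store length in bytes, which would match the reported 18GB file. It only shows that an offset past 4 GiB cannot survive being kept in 32 bits.

```python
# Illustrative sketch only; not taken from the CLN source.
# as_u32 is a hypothetical helper used to model a 32-bit offset field.

U32_MAX = 0xFFFFFFFF  # largest value a 32-bit unsigned offset can hold


def as_u32(offset: int) -> int:
    """Return the value left after keeping only the low 32 bits of an offset."""
    return offset & U32_MAX


# Second number from the first crash message in this thread
# (assumed here to be the gossip store length in bytes).
store_len = 19323010780

# The store is larger than anything a 32-bit offset can address.
print(store_len > U32_MAX)

# Keeping only 32 bits yields a different, smaller offset, so a lookup
# would land on some unrelated record earlier in the file.
print(as_u32(store_len))

# Offsets below 4 GiB pass through unchanged, which is why a freshly
# compacted (small) store would not show the problem.
print(as_u32(5098973))
```

This would also be consistent with the restart workaround: once the store is compacted to well under 4 GiB, every offset fits in 32 bits again.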
