Release 24.11.2 #8195

Merged

rustyrussell
Contributor

Crash-fix updates for v24.11. The only real difference between this and v25.02.1 is that we don't do the db fixup and rescan, because adding migrations to the db in a branch is dangerous (it complicates future upgrades!)

chanbackup with many peers can do more than 128 concurrent rpc commands.
autoclean is the other plugin which can do many requests at once, so I
expect a similar issue there.

I tested this by rebuilding with `MAX_ACTIVE_SPANS` 1, which autoclean
tests triggered immediately.

The real fix is probably to use a hash table with a large initial size.

```
Mar 24 06:30:45 mlbb2 sh[28000]: chanbackup: common/trace.c:190: trace_span_slot: Assertion `s' failed.
Mar 24 06:30:45 mlbb2 sh[28000]: chanbackup: FATAL SIGNAL 6 (version v25.02)
Mar 24 06:30:45 mlbb2 sh[28000]: 0x5575232bac4f send_backtrace
Mar 24 06:30:45 mlbb2 sh[28000]:         common/daemon.c:33
Mar 24 06:30:45 mlbb2 sh[28000]: 0x5575232baceb crashdump
Mar 24 06:30:45 mlbb2 sh[28000]:         common/daemon.c:78
Mar 24 06:30:45 mlbb2 sh[28000]: 0x7f2958cd851f ???
Mar 24 06:30:45 mlbb2 sh[28000]:         ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
Mar 24 06:30:45 mlbb2 sh[28000]: 0x7f2958d2c9fc __pthread_kill_implementation
Mar 24 06:30:45 mlbb2 sh[28000]:         ./nptl/pthread_kill.c:44
Mar 24 06:30:45 mlbb2 sh[28000]: 0x7f2958d2c9fc __pthread_kill_internal
Mar 24 06:30:45 mlbb2 sh[28000]:         ./nptl/pthread_kill.c:78
Mar 24 06:30:45 mlbb2 sh[28000]: 0x7f2958d2c9fc __GI___pthread_kill
Mar 24 06:30:45 mlbb2 sh[28000]:         ./nptl/pthread_kill.c:89
Mar 24 06:30:45 mlbb2 sh[28000]: 0x7f2958cd8475 __GI_raise
Mar 24 06:30:45 mlbb2 sh[28000]:         ../sysdeps/posix/raise.c:26
Mar 24 06:30:45 mlbb2 sh[28000]: 0x7f2958cbe7f2 __GI_abort
Mar 24 06:30:45 mlbb2 sh[28000]:         ./stdlib/abort.c:79
Mar 24 06:30:45 mlbb2 sh[28000]: 0x7f2958cbe71a __assert_fail_base
Mar 24 06:30:45 mlbb2 sh[28000]:         ./assert/assert.c:94
Mar 24 06:30:45 mlbb2 sh[28000]: 0x7f2958ccfe95 __GI___assert_fail
Mar 24 06:30:45 mlbb2 sh[28000]:         ./assert/assert.c:103
Mar 24 06:30:45 mlbb2 sh[28000]: 0x5575232ab7fa trace_span_slot
Mar 24 06:30:45 mlbb2 sh[28000]:         common/trace.c:190
Mar 24 06:30:45 mlbb2 sh[28000]: 0x5575232abc9f trace_span_start
Mar 24 06:30:45 mlbb2 sh[28000]:         common/trace.c:267
Mar 24 06:30:45 mlbb2 sh[28000]: 0x5575232a7c34 send_outreq
Mar 24 06:30:45 mlbb2 sh[28000]:         plugins/libplugin.c:1112
```
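As a rough illustration of why a fixed slot array trips the assertion in the backtrace above, here is a minimal Python model (not the actual `common/trace.c` code). `FixedSpanTable`, `HashSpanTable` and `start_many` are hypothetical names invented for this sketch; the only detail taken from the commit message is that a small `MAX_ACTIVE_SPANS` is exhausted when a plugin issues many concurrent requests.

```python
class FixedSpanTable:
    """Fixed-capacity slot array: start() fails once every slot is taken."""

    def __init__(self, max_active):
        self.slots = [None] * max_active

    def start(self, key):
        for i, slot in enumerate(self.slots):
            if slot is None:
                self.slots[i] = key
                return i
        # Models the failed assertion: no free slot was found.
        raise AssertionError("no free span slot")

    def end(self, key):
        self.slots[self.slots.index(key)] = None


class HashSpanTable:
    """Hash-table variant: capacity grows with the number of active spans."""

    def __init__(self):
        self.spans = {}

    def start(self, key):
        self.spans[key] = True

    def end(self, key):
        del self.spans[key]


def start_many(table, n):
    """Start n concurrent spans; return how many succeeded."""
    started = 0
    for key in range(1, n + 1):
        try:
            table.start(key)
        except AssertionError:
            break
        started += 1
    return started
```

With 128 slots, `start_many(FixedSpanTable(128), 200)` stops after 128 spans, while the hash-table variant accepts all 200. Rebuilding with a capacity of 1, as in the test described above, makes the second concurrent request fail immediately.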

Changelog-Fixed: autoclean/chanbackup: fixed tracepoint crash on large number of requests.
Fixes: ElementsProject#8177
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's not the *outgoing* HTLC which sets the deadline, it's the incoming.

Reported-by: @whitslack
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: Protocol: Egregious anchor fee paid for unilateral close txs due to HTLC timeouts; it's not as urgent as our code made out!
…struct.

Use the indirect-free trick, otherwise this can happen:

```
2025-03-28T10:46:16.437Z BROKEN lightningd: FATAL SIGNAL 6 (version v25.02)
2025-03-28T10:46:16.437Z BROKEN lightningd: backtrace: common/daemon.c:41 (send_backtrace) 0x6447525af68c
2025-03-28T10:46:16.437Z BROKEN lightningd: backtrace: common/daemon.c:78 (crashdump) 0x6447525af6db
2025-03-28T10:46:16.437Z BROKEN lightningd: backtrace: ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0 ((null)) 0x7783e2c4532f
2025-03-28T10:46:16.437Z BROKEN lightningd: backtrace: ./nptl/pthread_kill.c:44 (__pthread_kill_implementation) 0x7783e2c9eb2c
2025-03-28T10:46:16.437Z BROKEN lightningd: backtrace: ./nptl/pthread_kill.c:78 (__pthread_kill_internal) 0x7783e2c9eb2c
2025-03-28T10:46:16.437Z BROKEN lightningd: backtrace: ./nptl/pthread_kill.c:89 (__GI___pthread_kill) 0x7783e2c9eb2c
2025-03-28T10:46:16.437Z BROKEN lightningd: backtrace: ../sysdeps/posix/raise.c:26 (__GI_raise) 0x7783e2c4527d
2025-03-28T10:46:16.437Z BROKEN lightningd: backtrace: ./stdlib/abort.c:79 (__GI_abort) 0x7783e2c288fe
2025-03-28T10:46:16.437Z BROKEN lightningd: backtrace: ccan/ccan/tal/tal.c:95 (call_error) 0x644752675535
2025-03-28T10:46:16.437Z BROKEN lightningd: backtrace: ccan/ccan/tal/tal.c:169 (check_bounds) 0x6447526755de
2025-03-28T10:46:16.437Z BROKEN lightningd: backtrace: ccan/ccan/tal/tal.c:180 (to_tal_hdr) 0x644752675618
2025-03-28T10:46:16.437Z BROKEN lightningd: backtrace: ccan/ccan/tal/tal.c:525 (tal_free) 0x644752676001
2025-03-28T10:46:16.437Z BROKEN lightningd: backtrace: lightningd/bitcoind.c:509 (getrawblockbyheight_callback) 0x64475252c01b
2025-03-28T10:46:16.437Z BROKEN lightningd: backtrace: lightningd/plugin.c:661 (plugin_response_handle) 0x64475257be0a
```
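The following is a toy Python model of what the indirect-free trick is assumed to mean here: instead of freeing the call struct directly after its callback runs (the callback may already have freed it), the struct is reparented under a temporary context and the context is freed instead. `Node`, `steal`, `free`, `run_direct` and `run_indirect` are hypothetical stand-ins for a tal-style hierarchical allocator, not the real ccan/tal API or the `lightningd/bitcoind.c` code.

```python
class Node:
    """Toy stand-in for a tal allocation: a parent, children, and a freed flag."""

    def __init__(self, parent=None):
        self.parent = None
        self.children = []
        self.freed = False
        if parent is not None:
            steal(parent, self)


def steal(new_parent, node):
    """Reparent node under new_parent."""
    if node.parent is not None:
        node.parent.children.remove(node)
    node.parent = new_parent
    new_parent.children.append(node)


def free(node):
    """Free node and its children; freeing twice is an error."""
    if node.freed:
        raise RuntimeError("double free")
    if node.parent is not None:
        node.parent.children.remove(node)
        node.parent = None
    for child in list(node.children):
        free(child)
    node.freed = True


def run_direct(call, callback):
    """Unsafe: frees call after the callback, even if the callback already did."""
    callback(call)
    free(call)


def run_indirect(call, callback):
    """Indirect free: park call under a temporary parent and free that instead."""
    ctx = Node()
    steal(ctx, call)
    callback(call)
    free(ctx)  # frees call only if it is still a child of ctx
```

If the callback frees `call`, it detaches itself from `ctx`, so freeing `ctx` afterwards touches nothing stale. If the callback leaves `call` alone, freeing `ctx` releases it as a child.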

Changelog-Fixed: lightningd: occasional crash on bitcoind callback.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
…esn't support OPT_SHUTDOWN_ANYSEGWIT.

We select the close key index at opening time, but the non-DF code didn't correctly register the
address as possibly used for P2WPKH for older nodes.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: wallet: we could miss our own returned outputs on mutual closes if peer doesn't support option_shutdown_anysegwit (you will still need to rescan after update, if this happened to you!)
Reported-by: Grubles
We pre-close incoming under some circumstances, so this does happen (it
didn't when this code was written).  Don't walk all the HTLCs complaining
about them in this case, and don't freak out.

Changelog-Fixed: lightningd: incorrect spamming of log and potential crash on testnet case of duplicate HTLCs and slow closing.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Fixes: ElementsProject#8176
These workflows are failing due to `ubuntu-latest` being updated from version `22.04` to `24.04`.

Reference: actions/runner-images#10636

Changelog-None.
Poetry will no longer include the `poetry-plugin-export` plugin by default, which is essential for exporting dependencies. So, we now need to install it explicitly.
Fix for `The Poetry configuration is invalid:  - project must contain ['name'] properties`
@endothermicdev
Collaborator

Cherry-picked the CI fixes from #7975.

This happens with autoclean, which does a datastore request then frees
the parent command without waiting for a response (see clean_finished).

This leaks a trace, and causes a crash if the pointer is later reused.

My solution is to create a trace variant which declares the trace key
to be a tal ptr and then we can clean up in the destructor if this happens.
This fixes the issue for me.
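A minimal Python sketch of the leak and the destructor-based cleanup described above, under stated assumptions: spans are keyed by an object's address (modelled as an integer), and starting a span under a key that is still active is treated as the failure. `Tracer`, `Command` and `start_span_owned` are hypothetical names for illustration, not the real trace API.

```python
class Tracer:
    """Toy span registry keyed by an object's address (an int here)."""

    def __init__(self):
        self.active = {}

    def start(self, key, name):
        # A stale entry under the same key models the pointer-reuse failure.
        if key in self.active:
            raise RuntimeError(f"span key {key:#x} already active ({self.active[key]})")
        self.active[key] = name

    def end(self, key):
        del self.active[key]


class Command:
    """Toy command living at a fixed 'address', with destructors run on free."""

    def __init__(self, addr):
        self.addr = addr
        self.destructors = []

    def free(self):
        for destroy in self.destructors:
            destroy()


def start_span_owned(tracer, cmd, name):
    """Start a span keyed by cmd and end it automatically if cmd is freed first."""
    tracer.start(cmd.addr, name)
    # pop() rather than end(): the span may already have been ended normally.
    cmd.destructors.append(lambda: tracer.active.pop(cmd.addr, None))
```

With a plain `tracer.start(cmd.addr, ...)`, freeing the command leaves the span behind, and a later command allocated at the same address collides with it. With `start_span_owned`, the destructor removes the span when the command is freed, so the address can be reused safely.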

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: autoclean: fixed occasional crash when tracepoints compiled in.
@endothermicdev
Collaborator

Added #8198 and updated the changelog accordingly.

@endothermicdev
Collaborator

test_askrene_fake_channeld is failing consistently under CI here, and I'm not sure what we changed to cause this; it passes fine when I run it a dozen times on my own machine. I also don't see any post-24.11 CI flake fixes related to this. It looks like a timeout at:

                    # Worst-case timeout is 1 second per hop, + 60 seconds if MPP timeout!
                    try:
>                       if l1.rpc.waitsendpay(hash_hex, timeout=TIMEOUT + len(p) + 60, partid=i + 1, groupid=1):
error: {'code': 200, 'message': 'Timed out while waiting'

I think this branch is ready to go otherwise.

endothermicdev and others added 2 commits April 4, 2025 11:36
Note that we didn't backport the db changes, since that would complicate upgrades to v25.02.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
@endothermicdev endothermicdev merged commit f99c3c2 into ElementsProject:release-v24.11.1 Apr 4, 2025
31 of 39 checks passed