Channels fix: Resolve lts_compatibility test failures #3035
Merged
eddyashton commented Sep 30, 2021
lts_compat_fixup@33604 aka 20211001.7 vs main ewma over 20 builds from 33210 to 33585
Looks reasonable to me.
eddyashton changed the title from "[DRAFT] Channels fix: Resolve lts_compatibility test failures" to "Channels fix: Resolve lts_compatibility test failures" on Oct 1, 2021
achamayou approved these changes on Oct 1, 2021
Since #2801 we've had a high rate of failures in the `lts_compatibility` build. After investigating the logs, I believe I've found the root cause. I'm opening this as a PR with a forced repro initially, and I'll then add the resolution.

A healthy join of a new node to an old node looks like this:
The failures we're seeing look like this:
The key diagnostic is the repeated `Initiating node channel` messages from Node 0. It attempts to initiate many times before the target node is ready, and if it makes multiple attempts late enough (when the target is actually listening and processing them), the two ends get out of sync.

Looking at another repro with verbose logs, this is what's happening on the receiving node (trimmed and [DESCRIBED] for readability):
In short, the peer makes multiple initiation attempts with the same key share (i.e., the same context), while the new code resets its context (regenerates its own key share) every time it receives one. The fix is to stick with the old context for local state, potentially accepting a new share from the peer, and to only ingest that share destructively when we come to derive the shared secret.