Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid panics when there are multiple failures on the same connection #1600

Merged
merged 4 commits into from
Jan 18, 2021

Conversation

teor2345
Copy link
Contributor

@teor2345 teor2345 commented Jan 15, 2021

Motivation

zebrad keeps panicking occasionally because it calls fail_with multiple times on the same connection.

We can fix one obvious cause of this bug, but we can't rule out correctly written code also encountering it.

Solution

  • fix the overload handling code, which calls fail_with then continues to process the current request
  • downgrade the panic to a warning
  • improve logging for connection failures and state errors

The code in this pull request has:

  • Existing Documentation Comments
  • Manual testing

Review

@dconnolly is familiar with this bug.

Let's try to get this fix merged by the end of the sprint.

Related Issues

The original bug #1510
A partial initial fix #1531
Closes #1599

Follow Up Work

We should refactor this code so invariant violations like this are caught by the compiler, rather than happening under very specific network conditions - #1610

For bugs that we can't make into compilation errors, we should create better tests for the network code.

zebra-network's Connection expects that `fail_with` is only called once
per connection, but the overload handling code continues to process the
current request after an overload error, potentially leading to further
failures.

Closes ZcashFoundation#1599
We can't rule out the connection state changing between the state checks
and any eventual failures, particularly in the presence of async code.

So we turn this panic into a warning.
@teor2345 teor2345 added C-bug Category: This is a bug A-rust Area: Updates to Rust code C-cleanup Category: This is a cleanup labels Jan 15, 2021
@teor2345 teor2345 added this to the 2021 Sprint 1 milestone Jan 15, 2021
@teor2345 teor2345 requested a review from dconnolly January 15, 2021 06:35
@teor2345 teor2345 self-assigned this Jan 15, 2021
@teor2345 teor2345 changed the title Avoid panics on multiple connection failures Avoid panics when there are multiple failures on the same connection Jan 15, 2021
dconnolly
dconnolly previously approved these changes Jan 18, 2021
Copy link
Contributor

@dconnolly dconnolly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can re-approve if/when you have more changes

@teor2345
Copy link
Contributor Author

I can re-approve if/when you have more changes

Thanks!

I changed the PR to keep the panic. I think this change is good to merge now.

If I end up doing some refactoring, I'll open a new PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-rust Area: Updates to Rust code C-bug Category: This is a bug C-cleanup Category: This is a cleanup
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Panic: called fail_with on already-failed connection state, from msg=GetData
2 participants