Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

privval: duplicate SignerListener: Listening for new connection module=privval during chain upgrades #3828

Closed
tony-iqlusion opened this issue Aug 21, 2024 · 5 comments · Fixed by #3830
Labels
bug Something isn't working needs-triage This issue/PR has not yet been triaged by the team.

Comments

@tony-iqlusion
Copy link

tony-iqlusion commented Aug 21, 2024

Bug Report

Setup

  • CometBFT version: v0.38.9
  • Have you tried the latest version: no
  • ABCI app: gaiad
  • Environment:
    • OS: Debian 12
    • Install tools:
    • Others:

Config

priv_validator_laddr = "X.X.X.X:Y"

What happened?

During a chain upgrade, CometBFT seemingly attempts to accept two privval connections. TMKMS only opens one such connection, which otherwise works/remains stable. But in the logs (see below), the node continues to print the following in a loop, which is confusing to users during a chain upgrade and spammy in logs during a time when people are trying to keep an eye on them to figure out what happened.

What did you expect to happen?

These lines do not appear in the logs, though they seem to be a symptom of a deeper problem.

How to reproduce it

This is the tough part: we only witness this during a chain upgrade. It has impacted several recent chain upgrades however, including today's gaia v19 upgrade as well as recent Neutron upgrades.

It seems to be happening with every recent chain upgrade, however, when TMKMS is being used as a remote signer.

The problem persists after the chain has started, but can be fixed after a chain upgrade has completed by restarting the node.

Logs

2:36PM INF SignerListener: Listening for new connection module=privval
2:36PM ERR SignerListener: Error accepting connection err="accept tcp [::]:<port>: i/o timeout" module=privval
2:36PM INF SignerListener: Listening for new connection module=privval
... [etc] ...

The TMKMS side shows a stable connection without problems:

INFO tmkms::session: [cosmoshub-4@tcp://x.x.x.x:y] connected to validator successfully
WARN tmkms::session: [cosmoshub-4@tcp://x.x.x.x:y]: unverified validator peer ID! (ac9e6866a2b448a4444514b808f564bffe9d2b5d)
INFO tmkms::session: [cosmoshub-4@tcp://x.x.x.x:y]: signed Prevote:<nil> at h/r/s 21835201/0/1 (101 ms)

Anything else we need to know

This doesn't actually break anything, though it is very confusing during chain upgrades, and introduces a red herring when people are otherwise trying to debug TMKMS issues (e.g. people interpreted this error as potentially being a problem with enabling extension vote signing, even though everything was working)

@tony-iqlusion tony-iqlusion added bug Something isn't working needs-triage This issue/PR has not yet been triaged by the team. labels Aug 21, 2024
@tony-iqlusion
Copy link
Author

Several validators experienced this problem during the recent cosmoshub v19/gaiad upgrade.

If you're a validator who experienced this problem, 👍 this message.

We also experienced it on the most recent Neutron upgrade, and I believe the one before that.

If you've experienced it on other chain upgrades, please leave a comment and perhaps we can help narrow down the CometBFT version where this bug was introduced.

@Azeuh
Copy link

Azeuh commented Aug 21, 2024

Can confirm I had the same issue with recent Neutron upgrades

@cason
Copy link
Contributor

cason commented Aug 21, 2024

Are you upgrading from which version to which version?

@tony-iqlusion
Copy link
Author

Here's more logs of the full node startup:

2:16PM INF starting node with ABCI CometBFT in-process module=server
2:16PM INF service start impl=multiAppConn module=proxy msg="Starting multiAppConn service"
2:16PM INF service start connection=query impl=localClient module=abci-client msg="Starting localClient service"
2:16PM INF service start connection=snapshot impl=localClient module=abci-client msg="Starting localClient service"
2:16PM INF service start connection=mempool impl=localClient module=abci-client msg="Starting localClient service"
2:16PM INF service start connection=consensus impl=localClient module=abci-client msg="Starting localClient service"
2:16PM INF service start impl=EventBus module=events msg="Starting EventBus service"
2:16PM INF service start impl=PubSub module=pubsub msg="Starting PubSub service"
2:16PM INF service start impl=SignerListenerEndpoint module=privval msg="Starting SignerListenerEndpoint service"
2:16PM INF SignerListener: Blocking for connection module=privval
2:16PM INF SignerListener: Listening for new connection module=privval
2:16PM INF SignerListener: Connected module=privval
2:16PM INF SignerListener: Listening for new connection module=privval
2:16PM INF ABCI Handshake App Info hash=50C34F0029E8AF19A25FE2F6A17CC8CDDEFDA77576BC270BE5116F2AEE0B726F height=21835199 module=consensus protocol-version=0 software-version=v19.1.0
2:16PM INF ABCI Replay Blocks appHeight=21835199 module=consensus stateHeight=21835199 storeHeight=21835200
2:16PM INF Replay last block using real app module=consensus
2:16PM INF applying upgrade "v19" at height: 21835200 module=x/upgrade
2:16PM INF Starting module migrations... module=server
2:16PM INF adding a new module: 07-tendermint module=server
2:16PM INF migrating module feeibc from version 1 to version 2 module=server
2:16PM INF migrating module ibc from version 4 to version 5 module=server
2:16PM INF successfully migrated client to self-manage params module=x/ibc/client
2:16PM INF successfully migrated connection to self-manage params module=x/ibc/connection
2:16PM INF migrating module ibc from version 5 to version 6 module=server
2:16PM INF successfully migrated ibc channel params module=x/ibc/channel
2:16PM INF migrating module interchainaccounts from version 2 to version 3 module=server
2:16PM INF successfully migrated ica/host submodule to self-manage params module=x/ibc-interchainaccounts
2:16PM INF successfully migrated ica/controller submodule to self-manage params module=x/ibc-interchainaccounts
2:16PM INF migrating module provider from version 6 to version 7 module=server
2:16PM INF starting provider legacy params migration module=server
2:16PM INF successfully migrated legacy provider parameters module=x/ibc-provider
2:16PM INF migrating module slashing from version 3 to version 4 module=server
2:16PM ERR SignerListener: Error accepting connection err="accept tcp [::]:5007: i/o timeout" module=privval
2:16PM INF SignerListener: Listening for new connection module=privval
2:16PM ERR SignerListener: Error accepting connection err="accept tcp [::]:5007: i/o timeout" module=privval
2:16PM INF SignerListener: Listening for new connection module=privval
2:17PM ERR SignerListener: Error accepting connection err="accept tcp [::]:5007: i/o timeout" module=privval
2:17PM INF SignerListener: Listening for new connection module=privval
2:17PM ERR SignerListener: Error accepting connection err="accept tcp [::]:5007: i/o timeout" module=privval

This part in particular seems a bit weird:

2:16PM INF SignerListener: Blocking for connection module=privval
2:16PM INF SignerListener: Listening for new connection module=privval
2:16PM INF SignerListener: Connected module=privval
2:16PM INF SignerListener: Listening for new connection module=privval

Right after it successfully accepts a privval connection, it seems to start listening for a new one.

@tony-iqlusion
Copy link
Author

tony-iqlusion commented Aug 21, 2024

Are you upgrading from which version to which version?

@cason this was upgrading to v0.38.9. I'm not sure what the previous release of gaiad was using. Will take a look.

Edit: I think the previous CometBFT version was v0.37.6

github-merge-queue bot pushed a commit that referenced this issue Aug 29, 2024
Resolve #3828

On cometbft startup, duplicate privval request can enter the channel
between the time listen begins and a send is attempted.

The duplicate listen request will timeout since there is not another
signer connecting, filling logs with errors. A duplicate listen request
is undesired.

---

#### PR checklist

- [x] Tests written/updated
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments

---------

Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Daniel <daniel.cason@informal.systems>
mergify bot pushed a commit that referenced this issue Aug 29, 2024
Resolve #3828

On cometbft startup, duplicate privval request can enter the channel
between the time listen begins and a send is attempted.

The duplicate listen request will timeout since there is not another
signer connecting, filling logs with errors. A duplicate listen request
is undesired.

---

#### PR checklist

- [x] Tests written/updated
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments

---------

Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Daniel <daniel.cason@informal.systems>
(cherry picked from commit 5562f11)
mergify bot pushed a commit that referenced this issue Aug 29, 2024
Resolve #3828

On cometbft startup, duplicate privval request can enter the channel
between the time listen begins and a send is attempted.

The duplicate listen request will timeout since there is not another
signer connecting, filling logs with errors. A duplicate listen request
is undesired.

---

#### PR checklist

- [x] Tests written/updated
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments

---------

Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Daniel <daniel.cason@informal.systems>
(cherry picked from commit 5562f11)

# Conflicts:
#	privval/signer_listener_endpoint.go
mergify bot pushed a commit that referenced this issue Aug 29, 2024
Resolve #3828

On cometbft startup, duplicate privval request can enter the channel
between the time listen begins and a send is attempted.

The duplicate listen request will timeout since there is not another
signer connecting, filling logs with errors. A duplicate listen request
is undesired.

---

#### PR checklist

- [x] Tests written/updated
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments

---------

Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Daniel <daniel.cason@informal.systems>
(cherry picked from commit 5562f11)

# Conflicts:
#	.changelog/v0.38.3/bug-fixes/3828-privval-drop-duplicate-listen.md
mergify bot added a commit that referenced this issue Aug 29, 2024
…port #3830) (#3895)

Resolve #3828

On cometbft startup, duplicate privval request can enter the channel
between the time listen begins and a send is attempted.

The duplicate listen request will timeout since there is not another
signer connecting, filling logs with errors. A duplicate listen request
is undesired.

---

#### PR checklist

- [x] Tests written/updated
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments
<hr>This is an automatic backport of pull request #3830 done by
[Mergify](https://mergify.com).

Co-authored-by: Chill Validation <92176880+chillyvee@users.noreply.github.com>
mergify bot added a commit that referenced this issue Aug 30, 2024
…port #3830) (#3896)

Resolve #3828

On cometbft startup, duplicate privval request can enter the channel
between the time listen begins and a send is attempted.

The duplicate listen request will timeout since there is not another
signer connecting, filling logs with errors. A duplicate listen request
is undesired.

---

#### PR checklist

- [x] Tests written/updated
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments
<hr>This is an automatic backport of pull request #3830 done by
[Mergify](https://mergify.com).

---------

Co-authored-by: Chill Validation <92176880+chillyvee@users.noreply.github.com>
Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
mergify bot added a commit that referenced this issue Aug 30, 2024
…port #3830) (#3896)

Resolve #3828

On cometbft startup, duplicate privval request can enter the channel
between the time listen begins and a send is attempted.

The duplicate listen request will timeout since there is not another
signer connecting, filling logs with errors. A duplicate listen request
is undesired.

---

#### PR checklist

- [x] Tests written/updated
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments
<hr>This is an automatic backport of pull request #3830 done by
[Mergify](https://mergify.com).

---------

Co-authored-by: Chill Validation <92176880+chillyvee@users.noreply.github.com>
Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
(cherry picked from commit 221c744)
mergify bot added a commit that referenced this issue Aug 30, 2024
…port #3830) (backport #3896) (#3900)

Resolve #3828

On cometbft startup, duplicate privval request can enter the channel
between the time listen begins and a send is attempted.

The duplicate listen request will timeout since there is not another
signer connecting, filling logs with errors. A duplicate listen request
is undesired.

---

#### PR checklist

- [x] Tests written/updated
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments
<hr>This is an automatic backport of pull request #3830 done by
[Mergify](https://mergify.com).<hr>This is an automatic backport of pull
request #3896 done by [Mergify](https://mergify.com).

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
mergify bot added a commit that referenced this issue Aug 30, 2024
…port #3830) (#3897)

Resolve #3828

On cometbft startup, duplicate privval request can enter the channel
between the time listen begins and a send is attempted.

The duplicate listen request will timeout since there is not another
signer connecting, filling logs with errors. A duplicate listen request
is undesired.

---

#### PR checklist

- [x] Tests written/updated
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments
<hr>This is an automatic backport of pull request #3830 done by
[Mergify](https://mergify.com).

Co-authored-by: Chill Validation <92176880+chillyvee@users.noreply.github.com>
Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working needs-triage This issue/PR has not yet been triaged by the team.
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

3 participants