unexpected E2E test failure #1480

Closed
fedekunze opened this issue Oct 19, 2018 · 10 comments · Fixed by #1495 or #1505
Labels
bug 🐛 issues related to unhandled errors in the code that need to be fixed

@fedekunze
Contributor

Waiting for first block on node 2
-- App crashed --
E[10-19|12:41:45.443] Error dialing peer                           module=p2p err="duplicate CONN<127.0.0.1:26656>: %!s(<nil>)"

(node:248) PromiseRejectionHandledWarning: Promise rejection was handled asynchronously (rejection id: 2)
Node 2 is running
Too long with no output (exceeded 2m0s)

@NodeGuy
Contributor

NodeGuy commented Oct 25, 2018

This looks like a Tendermint bug:

From: https://9475-99653950-gh.circle-artifacts.com/0/home/circleci/project/testArtifacts/node_home_2/process.log

I[10-20|16:43:03.588] Inbound Peer rejected                        module=p2p err="auth failure: secrect conn failed: EOF" numPeers=1

From: https://9475-99653950-gh.circle-artifacts.com/0/home/circleci/project/testArtifacts/node_home_2/process.log

I[10-20|16:42:04.778] Will dial address                            module=p2p addr=39b7b1e17e9fac30803366da9dc7e54b74cd5a40@127.0.0.1:26656
I[10-20|16:42:04.778] Dialing peer                                 module=p2p address=39b7b1e17e9fac30803366da9dc7e54b74cd5a40@127.0.0.1:26656
I[10-20|16:42:04.779] Starting Peer                                module=p2p peer=0xa5e8e0 impl="Peer{MConn{127.0.0.1:26656} 39b7b1e17e9fac30803366da9dc7e54b74cd5a40 out}"
I[10-20|16:42:04.779] Starting MConnection                         module=p2p peer=0xa5e8e0 impl=MConn{127.0.0.1:26656}
I[10-20|16:42:04.779] Added peer                                   module=p2p peer="Peer{MConn{127.0.0.1:26656} 39b7b1e17e9fac30803366da9dc7e54b74cd5a40 out}"
I[10-20|16:42:05.089] Executed block                               module=state height=1 validTxs=0 invalidTxs=0
I[10-20|16:42:05.090] Committed state                              module=state height=1 txs=0 appHash=47FB9D1E979742CCA1179934162168316C70C91C
I[10-20|16:42:05.090] Recheck txs                                  module=mempool numtxs=0 height=1
I[10-20|16:42:05.091] Indexed block                                module=txindex height=1
I[10-20|16:42:05.531] Dialing peer                                 module=p2p address=39b7b1e17e9fac30803366da9dc7e54b74cd5a40@127.0.0.1:26656
E[10-20|16:42:05.531] Error dialing peer                           module=p2p err="duplicate CONN<127.0.0.1:26656>: %!s(<nil>)"

@jbibla
Collaborator

jbibla commented Oct 25, 2018

I reproduced this as well, but rerunning the tests resulted in a pass.

@ebuchman
Contributor

ebuchman commented Oct 26, 2018

We need to set allow_duplicate_ip = true under [p2p] in the Tendermint config.
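
Below is a minimal sketch of applying that setting from a test harness. It assumes the harness holds each node's config.toml as a string. The helper name enableDuplicateIp is hypothetical, and this is not the change that was actually merged. The code is line-based rather than a full TOML parser. It overwrites an existing allow_duplicate_ip key inside [p2p], inserts the key under the [p2p] header if it is missing, or appends a [p2p] table if there is none.

```typescript
// Hypothetical helper (not from the Voyager codebase): force
// allow_duplicate_ip = true inside the [p2p] table of a Tendermint config.toml.
// Line-based sketch; it assumes table headers sit alone on their own line.
function enableDuplicateIp(configToml: string): string {
  const lines = configToml.split("\n");
  let inP2p = false;
  let p2pHeaderIndex = -1;
  let replaced = false;

  for (let i = 0; i < lines.length; i++) {
    const trimmed = lines[i].trim();
    if (trimmed.startsWith("[")) {
      // A new table header: track whether we are now inside [p2p].
      inP2p = trimmed === "[p2p]";
      if (inP2p) p2pHeaderIndex = i;
      continue;
    }
    // Only rewrite the key when it belongs to the [p2p] table.
    if (inP2p && /^allow_duplicate_ip\s*=/.test(trimmed)) {
      lines[i] = "allow_duplicate_ip = true";
      replaced = true;
    }
  }

  if (!replaced) {
    if (p2pHeaderIndex >= 0) {
      // Key missing: insert it directly under the [p2p] header.
      lines.splice(p2pHeaderIndex + 1, 0, "allow_duplicate_ip = true");
    } else {
      // No [p2p] table at all: append one at the end of the file.
      lines.push("", "[p2p]", "allow_duplicate_ip = true");
    }
  }
  return lines.join("\n");
}
```

Because several local nodes share 127.0.0.1, the setting has to be applied to every node's config before the nodes start.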

@faboweb
Collaborator

faboweb commented Oct 30, 2018

Thank you @ebuchman for fixing this!

@jbibla
Collaborator

jbibla commented Oct 30, 2018

This is still an issue 😭

jbibla reopened this Oct 30, 2018

@ebuchman
Contributor

So there's a known bug in Tendermint that causes us to dial peers we're already connected to and then fail to connect to them (tendermint/tendermint#2716), but that shouldn't cause the chain to halt.

I'm not sure how to interpret the output we're seeing in CI, e.g. from the link David posted with the failure:

Redirecting node 2 output to /home/circleci/project/testArtifacts/node_home_2/process.log
Waiting for first block on node 2
-- App crashed --
E[10-20|16:42:05.531] Error dialing peer                           module=p2p err="duplicate CONN<127.0.0.1:26656>: %!s(<nil>)"

(node:252) PromiseRejectionHandledWarning: Promise rejection was handled asynchronously (rejection id: 2)
Node 2 is running
Too long with no output (exceeded 2m0s)

Failing to dial the peer wouldn't crash the app. What does "App crashed" imply here, and why and how does it output just a single line of Tendermint logs? Is it possible to see the full output (e.g. /home/circleci/project/testArtifacts/node_home_2/process.log in this case) when this happens?

@faboweb
Copy link
Collaborator

faboweb commented Oct 30, 2018

Here you go: https://9475-99653950-gh.circle-artifacts.com/0/home/circleci/project/testArtifacts/node_home_2/process.log

We record all the logs and node folders in the E2E tests.

@ebuchman
Contributor

So I'm not seeing any issues in that log.

Waiting for first block on node 2
Too long with no output (exceeded 2m0s)

It committed 240 blocks over the 2 minutes covered by that log, so I'm not sure what the test is saying actually went wrong.

@faboweb
Collaborator

faboweb commented Oct 30, 2018

I see the problem. We fail the script whenever the output contains an error. There are errors, but they don't break the node. My apologies; I will change this today.
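
One way to make that change is to fail only on error-level lines that are not known to be harmless. The following is a hypothetical sketch, not the fix that landed in #1495 or #1505. The function names and the allow-list are assumptions. It relies only on the log format visible above: each Tendermint line starts with a level letter (E for error, I for info) followed by a bracketed timestamp. The one allow-listed pattern is the "duplicate CONN" dial error, which the thread identifies as non-fatal.

```typescript
// Hypothetical allow-list: error-level messages that are logged but do not
// break the node. The duplicate-dial error comes from tendermint/tendermint#2716.
const NON_FATAL_ERRORS: RegExp[] = [
  /Error dialing peer.*duplicate CONN/,
];

// Returns true if a single Tendermint log line should fail the E2E run.
function shouldFailOnLogLine(line: string): boolean {
  // Only error-level lines ("E[MM-DD|HH:MM:SS.mmm] ...") are candidates.
  if (!/^E\[/.test(line)) return false;
  // Error lines matching a known non-fatal pattern are ignored.
  return !NON_FATAL_ERRORS.some((pattern) => pattern.test(line));
}

// Scans a chunk of node output and returns the first fatal line, if any.
function findFatalLine(output: string): string | undefined {
  return output.split("\n").find(shouldFailOnLogLine);
}
```

An allow-list keeps the check strict by default: any new or unexpected error still fails the run, and only errors that have been investigated and judged harmless are skipped.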

5 participants