
Secret Store Nodes eventually disconnects #10266

Closed
jcortejoso opened this issue Jan 30, 2019 · 10 comments · Fixed by #10303

@jcortejoso

Before filing a new issue, please provide the following information.

  • Parity Ethereum version: v2.3.0-nightly
  • Operating system: Linux
  • Installation: built from source
  • Fully synchronized: yes
  • Network: PoA
  • Restarted: yes


Our setup consists of a PoA network with 3 nodes that have the Secret Store API enabled; these nodes also act as authorities in the PoA. We also have other clients connected to the Ethereum network. The problem we are observing is that everything works as expected at first, but after running for some time (on the order of hours), we see errors in the Secret Store logs and the encryption calls fail.
After restarting the parity process on the Secret Store nodes, the problem disappears for a few hours. I have checked the status of the instances and tried to make the connections manually while the nodes were failing, and everything seemed fine and working.
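
A minimal sketch of the kind of manual connectivity check mentioned above (Rust; the peer addresses are placeholders, and port 12000 is the Secret Store inter-node port from the configuration further down):

use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

fn main() {
    // Placeholder addresses; substitute the real <node_ip> values.
    let peers = ["203.0.113.10:12000", "203.0.113.11:12000"];
    for peer in peers {
        let addr: SocketAddr = match peer.parse() {
            Ok(addr) => addr,
            Err(e) => {
                println!("{}: invalid address: {}", peer, e);
                continue;
            }
        };
        // Simple TCP reachability check against the Secret Store port.
        match TcpStream::connect_timeout(&addr, Duration::from_secs(5)) {
            Ok(_) => println!("{}: TCP connect ok", peer),
            Err(e) => println!("{}: connect failed: {}", peer, e),
        }
    }
}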

These are the logs we see on the Secret Store nodes:

Jan 29 07:21:49 ip-172-31-16-123.ec2.internal docker[20371]: 2019-01-29 07:21:49 UTC 0xfc10…b0ab: timeout when establishing outbound connection with <node_ip>:12000
Jan 29 07:21:59 ip-172-31-16-123.ec2.internal docker[20371]: 2019-01-29 07:21:59 UTC 0xfc10…b0ab: timeout when establishing outbound connection with <node_ip>:12000
Jan 29 07:22:06 ip-172-31-16-123.ec2.internal docker[20371]: 2019-01-29 07:22:06 UTC    7/25 peers      6 MiB chain   87 MiB db  0 bytes queue  235 KiB sync  RPC:  0 conn,    0 req/s,    0 µs
Jan 29 07:22:09 ip-172-31-16-123.ec2.internal docker[20371]: 2019-01-29 07:22:09 UTC 0xfc10…b0ab: timeout when establishing outbound connection with <node_ip>:12000
Jan 29 07:22:19 ip-172-31-16-123.ec2.internal docker[20371]: 2019-01-29 07:22:19 UTC 0xfc10…b0ab: timeout when establishing outbound connection with <node_ip>:12000
Jan 29 07:22:29 ip-172-31-16-123.ec2.internal docker[20371]: 2019-01-29 07:22:29 UTC 0xfc10…b0ab: timeout when establishing outbound connection with <node_ip>:12000
Jan 29 07:22:36 ip-172-31-16-123.ec2.internal docker[20371]: 2019-01-29 07:22:36 UTC    7/25 peers      6 MiB chain   87 MiB db  0 bytes queue  235 KiB sync  RPC:  0 conn,    0 req/s,    0 µs
Jan 29 07:22:39 ip-172-31-16-123.ec2.internal docker[20371]: 2019-01-29 07:22:39 UTC 0xfc10…b0ab: timeout when establishing outbound connection with <node_ip>:12000
...

And here is another node, running with the logger options -l jsonrpc,secretstore_net,privatetx=trace:

Jan 27 21:18:05 ip-172-31-0-123.ec2.internal docker[13073]: 2019-01-27 21:18:05 UTC tokio-runtime-worker-3 TRACE secretstore_net  0xb98e…fec0: executing maintain procedures
Jan 27 21:18:07 ip-172-31-0-123.ec2.internal docker[13073]: 2019-01-27 21:18:07 UTC tokio-runtime-worker-3 TRACE secretstore_net  0xb98e…fec0: received message Cluster.KeepAlive from 0xfc10…b0ab
Jan 27 21:18:10 ip-172-31-0-123.ec2.internal docker[13073]: 2019-01-27 21:18:10 UTC Verifier #3 INFO import  Imported #50952 0x9aec…4c45 (0 txs, 0.00 Mgas, 0 ms, 0.57 KiB)
Jan 27 21:18:11 ip-172-31-0-123.ec2.internal docker[13073]: 2019-01-27 21:18:11 UTC tokio-runtime-worker-3 TRACE secretstore_net  0xb98e…fec0: sent message Generation.InitializeSession to 0x7cb2…23dc
Jan 27 21:18:11 ip-172-31-0-123.ec2.internal docker[13073]: 2019-01-27 21:18:11 UTC tokio-runtime-worker-2 TRACE secretstore_net  0xb98e…fec0: received message Generation.ConfirmInitialization from 0x7cb2…23dc
Jan 27 21:18:11 ip-172-31-0-123.ec2.internal docker[13073]: 2019-01-27 21:18:11 UTC tokio-runtime-worker-2 TRACE secretstore_net  0xb98e…fec0: sent message Generation.InitializeSession to 0xfc10…b0ab
Jan 27 21:18:11 ip-172-31-0-123.ec2.internal docker[13073]: 2019-01-27 21:18:11 UTC tokio-runtime-worker-2 TRACE secretstore_net  0xb98e…fec0: received message Generation.SessionError(node required for this operation is currently disconnected) from 0xfc10…b0ab
Jan 27 21:18:11 ip-172-31-0-123.ec2.internal docker[13073]: 2019-01-27 21:18:11 UTC tokio-runtime-worker-3 WARN secretstore  GenerateServerKey request /shadow/85c28c30810d4d00821f39e55513d935f6273a6fd620479db645a7e05db8afb2/1794541472c108396084922b527b114a563a08e6d46d71c1cbc85595f4b8f2f855bea0b69a936dfe7941657e09aeb632c3f0ba30e1f404143fdb405dca6e36bc01/0 has failed with: node required for this operation is currently disconnected
Jan 27 21:18:11 ip-172-31-0-123.ec2.internal docker[13073]: 2019-01-27 21:18:11 UTC tokio-runtime-worker-2 INFO secretstore_net  0xb98e…fec0: generation session completed
Jan 27 21:18:15 ip-172-31-0-123.ec2.internal docker[13073]: 2019-01-27 21:18:15 UTC tokio-runtime-worker-1 TRACE secretstore_net  0xb98e…fec0: executing maintain procedures
Jan 27 21:18:17 ip-172-31-0-123.ec2.internal docker[13073]: 2019-01-27 21:18:17 UTC tokio-runtime-worker-2 TRACE secretstore_net  0xb98e…fec0: sent message Generation.InitializeSession to 0x7cb2…23dc
Jan 27 21:18:17 ip-172-31-0-123.ec2.internal docker[13073]: 2019-01-27 21:18:17 UTC tokio-runtime-worker-1 TRACE secretstore_net  0xb98e…fec0: received message Generation.ConfirmInitialization from 0x7cb2…23dc
Jan 27 21:18:17 ip-172-31-0-123.ec2.internal docker[13073]: 2019-01-27 21:18:17 UTC tokio-runtime-worker-1 TRACE secretstore_net  0xb98e…fec0: sent message Generation.InitializeSession to 0xfc10…b0ab
Jan 27 21:18:17 ip-172-31-0-123.ec2.internal docker[13073]: 2019-01-27 21:18:17 UTC tokio-runtime-worker-0 TRACE secretstore_net  0xb98e…fec0: received message Generation.SessionError(node required for this operation is currently disconnected) from 0xfc10…b0ab
Jan 27 21:18:17 ip-172-31-0-123.ec2.internal docker[13073]: 2019-01-27 21:18:17 UTC tokio-runtime-worker-2 WARN secretstore  GenerateServerKey request /shadow/3113f6b38f5c414f8e9eb9263b413c410051d34634a04ea28d3aba56b8c2f404/af40958ddbdb71abcb8b4e5a33a2bdd8c494edd81f3ee346dfec38e0d6fe0ecb0b6f829a7409966db7fa9edf236f9fc7e2c7a409d332a99c577ca23fd566aa7800/0 has failed with: node required for this operation is currently disconnected
Jan 27 21:18:17 ip-172-31-0-123.ec2.internal docker[13073]: 2019-01-27 21:18:17 UTC tokio-runtime-worker-0 INFO secretstore_net  0xb98e…fec0: generation session completed

This is the configuration file we are using for the secret store nodes:

[parity]
chain = "/etc/parity/secretstore/chain.json"
base_path = "/parity_data"

[ui]
disable = true

[rpc]
disable = true
interface = "0.0.0.0"
cors = ["all"]
hosts = ["all"]
apis = ["web3", "eth", "net", "parity", "traces", "rpc", "personal", "parity_accounts", "signer", "parity_set"]

[ipc]
disable = true

[dapps]
disable = true

[websockets]
disable = true
port = 8546
interface = "local"
origins = ["none"]
apis = ["all"]
hosts = ["all"]

[mining]
engine_signer = "0xf4a2dbd5d67ef3019a7e4b188b92128c86d76bf9"
reseal_on_txs = "none" # Authorities reseal automatically
usd_per_tx = "0" # Allows for free transactions.

[network]
nat = "extip:<public_ip>"
port = 30303
bootnodes = [
  "enode://e90d77ad3065d0767abe08cdd47746c29fae578e8d1c06aacc30da720dd7fd8c8aa0e1e2ca2bb20c4739edff82e5e7405a382cf17af6a4366e2540f72e46e829@172.31.48.123:30303",
  "enode://204b11794d16f382bab24a7cdde5ec1273b8c6c2b23e64560d7ef1707708158a45300e920607aa413633e45c5eea472c17f474866ce5901515249abfbe9c4e1c@172.31.0.123:30303",
  "enode://6881fbbf8f2400c05c61b696af7bb2be04ab22ce64d5221906b9c1c67f2bf976b71cd1d98be7743dfc139a7b590fe230212195714e71f2204252caf68f6f94c0@172.31.16.123:30303"
]
discovery = true

[ipfs]
enable = false

[snapshots]
disable_periodic = true

[secretstore]
self_secret = "f4a2dbd5d67ef3019a7e4b188b92128c86d76bf9"
disable = false
disable_http = false      # This node will expose a Secret Store HTTP API
http_interface = "all"  # The HTTP API listens on all interfaces
http_port = 12001          # The Secret Store HTTP API is available at this port
acl_contract = "6bd5fdc37b9c87ba73dda230e5dc18e9fda71ff9"
server_set_contract = "none"
nodes = [
  "7cb22f9a17e8880f8009b519b3506fa9d52fe0ff315f68d0dc0be9576f116818be5f3843c3e4e57aecea40bdd14c8539f5714a51d3254c2955d8a9f1ea9723dc@<public_ips>:12000",
  "b98ec1ea6bd4cb64ac2b73fdda87a4276d027c879237490589a475dae71ebfad7598c7355a5afe2f45f9f007b502563769699e722ed3eba0d1478adbf832fec0@<public_ips>:12000",
  "fc10afc336daf751ea236acaf8053cb709d5a55b1379db7aeb2a9fa76604d2d837ccda4d348bbc5a41fc8a270db8c9feae19d4f86757ffc9789da4e455dbb0ab@<public_ips>:12000"
]
interface = "all"
port = 12000               # port used to communicate with other Secret Store nodes
path = "/parity_data/secret_store"

[account]
...

If I can provide any further information please do not hesitate to ask.
Thanks!

@jam10o-new jam10o-new added M6-rpcapi 📣 RPC API. F2-bug 🐞 The client fails to follow expected behavior. labels Jan 30, 2019
@jam10o-new jam10o-new added this to the 2.4 milestone Jan 30, 2019
@svyatonik svyatonik removed the M6-rpcapi 📣 RPC API. label Feb 1, 2019
@svyatonik
Collaborator

Hey!

According to this log line:

Jan 27 21:18:11 ip-172-31-0-123.ec2.internal docker[13073]: 2019-01-27 21:18:11 UTC tokio-runtime-worker-2 TRACE secretstore_net  0xb98e…fec0: received message Generation.SessionError(node required for this operation is currently disconnected) from 0xfc10…b0ab

it is node 0xfc10…b0ab that has lost its connection to 0x7cb2…23dc. So it would be good to see logs from those nodes, if possible (you have attached the log from 0xb98e…fec0, which is well connected to both of them).

Could you also specify which git commit you are using for your tests?

@svyatonik
Collaborator

Also: if you leave the nodes for some time after the disconnect, do they reconnect again? Does (at least) pinging work during the disconnect period?

@jcortejoso
Author

jcortejoso commented Feb 4, 2019

Hi @svyatonik! Thanks for the answer.

  • The commit is 4fec2f2

  • About the connection, I would say internal pings work (as seen in the logs of 0xb98e…fec0). I only had the logger configured for this node, but I have now configured the same log level for the other two.
    After restarting the parity process on the three nodes, everything started working again.

Logs for node 0xb98e…fec0:

Feb 04 08:45:15 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:45:15 UTC tokio-runtime-worker-1 TRACE secretstore_net  0xb98e…fec0: received message Cluster.KeepAliveResponse from 0xfc10…b0ab
Feb 04 08:45:23 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:45:23 UTC tokio-runtime-worker-0 TRACE secretstore_net  0xb98e…fec0: sent message Generation.InitializeSession to 0x7cb2…23dc
Feb 04 08:45:23 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:45:23 UTC tokio-runtime-worker-1 TRACE secretstore_net  0xb98e…fec0: received message Generation.ConfirmInitialization from 0x7cb2…23dc
Feb 04 08:45:23 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:45:23 UTC tokio-runtime-worker-1 TRACE secretstore_net  0xb98e…fec0: sent message Generation.InitializeSession to 0xfc10…b0ab
Feb 04 08:45:23 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:45:23 UTC tokio-runtime-worker-1 TRACE secretstore_net  0xb98e…fec0: received message Generation.SessionError(node required for this operation is currently disconnected) from 0xfc10…b0ab
Feb 04 08:45:23 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:45:23 UTC tokio-runtime-worker-0 WARN secretstore  GenerateServerKey request /shadow/c7d9abcb6c9bca27c91c98dee17ca59de350b891f6a332c4e10a6b2b25d47052/fbe03244a10cf33cb1f8d0a59f0d9002586adc87d94b177d4dd3c7a0ba7d5b471c9a40001782d0ac861c917c94645872a73a780ddda254f0716fbf5d5e496dd001/1 has failed with: node required for this operation is currently disconnected
Feb 04 08:45:23 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:45:23 UTC tokio-runtime-worker-1 INFO secretstore_net  0xb98e…fec0: generation session completed
Feb 04 08:45:25 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:45:25 UTC tokio-runtime-worker-2 TRACE secretstore_net  0xb98e…fec0: executing maintain procedures
Feb 04 08:45:35 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:45:35 UTC tokio-runtime-worker-1 TRACE secretstore_net  0xb98e…fec0: executing maintain procedures
Feb 04 08:45:36 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:45:36 UTC IO Worker #0 INFO import     5/25 peers      2 MiB chain   89 MiB db  0 bytes queue   11 KiB sync  RPC:  0 conn,    0 req/s,    0 µs
Feb 04 08:45:45 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:45:45 UTC Verifier #0 INFO import  Imported #67735 0xccb2…83a2 (0 txs, 0.00 Mgas, 0 ms, 0.57 KiB)
Feb 04 08:45:45 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:45:45 UTC tokio-runtime-worker-2 TRACE secretstore_net  0xb98e…fec0: executing maintain procedures
Feb 04 08:45:55 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:45:55 UTC tokio-runtime-worker-2 TRACE secretstore_net  0xb98e…fec0: executing maintain procedures
Feb 04 08:45:55 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:45:55 UTC tokio-runtime-worker-1 TRACE secretstore_net  0xb98e…fec0: received message Cluster.KeepAliveResponse from 0x7cb2…23dc
Feb 04 08:45:55 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:45:55 UTC tokio-runtime-worker-0 TRACE secretstore_net  0xb98e…fec0: received message Cluster.KeepAliveResponse from 0xfc10…b0ab
Feb 04 08:46:05 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:46:05 UTC tokio-runtime-worker-0 TRACE secretstore_net  0xb98e…fec0: executing maintain procedures
Feb 04 08:46:06 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:46:06 UTC IO Worker #1 INFO import     5/25 peers      2 MiB chain   89 MiB db  0 bytes queue   11 KiB sync  RPC:  0 conn,    0 req/s,    0 µs
Feb 04 08:46:15 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:46:15 UTC tokio-runtime-worker-1 TRACE secretstore_net  0xb98e…fec0: executing maintain procedures
Feb 04 08:46:25 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:46:25 UTC tokio-runtime-worker-0 TRACE secretstore_net  0xb98e…fec0: executing maintain procedures
Feb 04 08:46:25 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:46:25 UTC tokio-runtime-worker-0 TRACE secretstore_net  0xb98e…fec0: received message Cluster.KeepAlive from 0x7cb2…23dc
Feb 04 08:46:26 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:46:26 UTC tokio-runtime-worker-2 TRACE secretstore_net  0xb98e…fec0: received message Cluster.KeepAlive from 0xfc10…b0ab
Feb 04 08:46:35 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:46:35 UTC tokio-runtime-worker-2 TRACE secretstore_net  0xb98e…fec0: executing maintain procedures
Feb 04 08:46:36 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:46:36 UTC IO Worker #2 INFO import     5/25 peers      2 MiB chain   89 MiB db  0 bytes queue   11 KiB sync  RPC:  0 conn,    0 req/s,    0 µs
Feb 04 08:46:40 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:46:40 UTC Verifier #2 INFO import  Imported #67736 0x2d8d…c111 (0 txs, 0.00 Mgas, 0 ms, 0.57 KiB)

Logs for node 0xfc10…b0ab:

Feb 04 08:45:06 ip-172-31-16-123.ec2.internal docker[1110]: 2019-02-04 08:45:06 UTC 0xfc10…b0ab: network error 'early eof' when reading message from node 0x7cb2…23dc
Feb 04 08:45:16 ip-172-31-16-123.ec2.internal docker[1110]: 2019-02-04 08:45:16 UTC 0xfc10…b0ab: network error 'early eof' when reading message from node 0x7cb2…23dc
Feb 04 08:45:18 ip-172-31-16-123.ec2.internal docker[1110]: 2019-02-04 08:45:18 UTC    5/25 peers      3 MiB chain   88 MiB db  0 bytes queue  227 KiB sync  RPC:  0 conn,    0 req/s,    0 µs
Feb 04 08:45:23 ip-172-31-16-123.ec2.internal docker[1110]: 2019-02-04 08:45:23 UTC 0xfc10…b0ab: generation session read error 'node required for this operation is currently disconnected' when requested for session from node 0xb98e…fec0
Feb 04 08:45:26 ip-172-31-16-123.ec2.internal docker[1110]: 2019-02-04 08:45:26 UTC 0xfc10…b0ab: network error 'early eof' when reading message from node 0x7cb2…23dc
Feb 04 08:45:36 ip-172-31-16-123.ec2.internal docker[1110]: 2019-02-04 08:45:36 UTC 0xfc10…b0ab: network error 'early eof' when reading message from node 0x7cb2…23dc
Feb 04 08:45:45 ip-172-31-16-123.ec2.internal docker[1110]: 2019-02-04 08:45:45 UTC Imported #67735 0xccb2…83a2 (0 txs, 0.00 Mgas, 0 ms, 0.57 KiB)
Feb 04 08:45:46 ip-172-31-16-123.ec2.internal docker[1110]: 2019-02-04 08:45:46 UTC 0xfc10…b0ab: network error 'early eof' when reading message from node 0x7cb2…23dc
Feb 04 08:45:48 ip-172-31-16-123.ec2.internal docker[1110]: 2019-02-04 08:45:48 UTC    5/25 peers      3 MiB chain   88 MiB db  0 bytes queue  227 KiB sync  RPC:  0 conn,    0 req/s,    0 µs
Feb 04 08:45:56 ip-172-31-16-123.ec2.internal docker[1110]: 2019-02-04 08:45:56 UTC 0xfc10…b0ab: network error 'early eof' when reading message from node 0x7cb2…23dc
Feb 04 08:46:06 ip-172-31-16-123.ec2.internal docker[1110]: 2019-02-04 08:46:06 UTC 0xfc10…b0ab: network error 'early eof' when reading message from node 0x7cb2…23dc
Feb 04 08:46:16 ip-172-31-16-123.ec2.internal docker[1110]: 2019-02-04 08:46:16 UTC 0xfc10…b0ab: network error 'early eof' when reading message from node 0x7cb2…23dc
Feb 04 08:46:18 ip-172-31-16-123.ec2.internal docker[1110]: 2019-02-04 08:46:18 UTC    5/25 peers      3 MiB chain   88 MiB db  0 bytes queue  227 KiB sync  RPC:  0 conn,    0 req/s,    0 µs
Feb 04 08:46:26 ip-172-31-16-123.ec2.internal docker[1110]: 2019-02-04 08:46:26 UTC 0xfc10…b0ab: network error 'early eof' when reading message from node 0x7cb2…23dc
Feb 04 08:46:36 ip-172-31-16-123.ec2.internal docker[1110]: 2019-02-04 08:46:36 UTC 0xfc10…b0ab: network error 'early eof' when reading message from node 0x7cb2…23dc
Feb 04 08:46:40 ip-172-31-16-123.ec2.internal docker[1110]: 2019-02-04 08:46:40 UTC Imported #67736 0x2d8d…c111 (0 txs, 0.00 Mgas, 0 ms, 0.57 KiB)

Logs for node 0x7cb2…23dc:

Feb 04 08:45:05 ip-172-31-48-123.ec2.internal docker[1786]: 2019-02-04 08:45:05 UTC Imported #67734 0x389a…32cf (0 txs, 0.00 Mgas, 0 ms, 0.57 KiB)
Feb 04 08:45:21 ip-172-31-48-123.ec2.internal docker[1786]: 2019-02-04 08:45:21 UTC    5/25 peers      2 MiB chain   88 MiB db  0 bytes queue  227 KiB sync  RPC:  0 conn,    0 req/s,    0 µs
Feb 04 08:45:45 ip-172-31-48-123.ec2.internal docker[1786]: 2019-02-04 08:45:45 UTC Imported #67735 0xccb2…83a2 (0 txs, 0.00 Mgas, 0 ms, 0.57 KiB)
Feb 04 08:45:51 ip-172-31-48-123.ec2.internal docker[1786]: 2019-02-04 08:45:51 UTC    5/25 peers      2 MiB chain   88 MiB db  0 bytes queue  227 KiB sync  RPC:  0 conn,    0 req/s,    0 µs
Feb 04 08:46:21 ip-172-31-48-123.ec2.internal docker[1786]: 2019-02-04 08:46:21 UTC    5/25 peers      2 MiB chain   88 MiB db  0 bytes queue  227 KiB sync  RPC:  0 conn,    0 req/s,    0 µs
Feb 04 08:46:25 ip-172-31-48-123.ec2.internal docker[1786]: 2019-02-04 08:46:25 UTC 0x7cb2…23dc: generation session failed with timeout

Thanks!

@svyatonik
Collaborator

@jcortejoso Could you please share the whole log (since restart) of node 0xfc10…b0ab (the node that gets a lot of network error 'early eof' when reading message from node 0x7cb2…23dc errors)? I'd like to see what caused this in the first place.

@jcortejoso
Author

Hi @svyatonik. I attach the log files. Now the node that seems to disconnect is 0xb98e…fec0. I think these are the most relevant lines:

Feb 04 11:21:15 ip-172-31-0-123.ec2.internal docker[9403]: 2019-02-04 11:21:15 UTC IO Worker #3 INFO import     5/25 peers    176 KiB chain   89 MiB db  0 bytes queue   11 KiB sync  RPC:  0 conn,    0 req/s,    0 µs
Feb 04 11:21:24 ip-172-31-0-123.ec2.internal docker[9403]: 2019-02-04 11:21:24 UTC tokio-runtime-worker-3 TRACE secretstore_net  0xb98e…fec0: executing maintain procedures
Feb 04 11:21:25 ip-172-31-0-123.ec2.internal docker[9403]: 2019-02-04 11:21:25 UTC Verifier #1 INFO import  Imported #67960 0xcda4…b8bf (0 txs, 0.00 Mgas, 0 ms, 0.57 KiB)
Feb 04 11:21:34 ip-172-31-0-123.ec2.internal docker[9403]: 2019-02-04 11:21:34 UTC tokio-runtime-worker-3 TRACE secretstore_net  0xb98e…fec0: executing maintain procedures
Feb 04 11:21:34 ip-172-31-0-123.ec2.internal docker[9403]: 2019-02-04 11:21:34 UTC tokio-runtime-worker-3 WARN secretstore_net  0xb98e…fec0: keep alive timeout for node 0x7cb2…23dc
Feb 04 11:21:34 ip-172-31-0-123.ec2.internal docker[9403]: 2019-02-04 11:21:34 UTC tokio-runtime-worker-3 TRACE secretstore_net  0xb98e…fec0: removing connection to 0x7cb2…23dc at 18.210.120.241:40230
Feb 04 11:21:34 ip-172-31-0-123.ec2.internal docker[9403]: 2019-02-04 11:21:34 UTC tokio-runtime-worker-3 WARN ethcore_secretstore::key_server_cluster::client_sessions::generation_session  0xb98e…fec0: generation session failed because 0x7cb2…23dc connection has timeouted
Feb 04 11:21:34 ip-172-31-0-123.ec2.internal docker[9403]: 2019-02-04 11:21:34 UTC tokio-runtime-worker-2 WARN secretstore  GenerateServerKey request /shadow/7f00dfe7756149c492bcd603a8d1cc8c25e77f3d5cc14464a106deea545a2653/7022c89dfd2fbe82bdcc148b0e31ca6d51f83493ee546b9efcc6e372a54c3c814384bf7cc12486f212f252f41d13f9b4a55fcc8c0d0436df548fef645ab4b00101/0 has failed with: node required for this operation is currently disconnected
Feb 04 11:21:34 ip-172-31-0-123.ec2.internal docker[9403]: 2019-02-04 11:21:34 UTC tokio-runtime-worker-2 TRACE secretstore_net  0xb98e…fec0: received message Generation.PublicKeyShare from 0x7cb2…23dc
Feb 04 11:21:34 ip-172-31-0-123.ec2.internal docker[9403]: 2019-02-04 11:21:34 UTC tokio-runtime-worker-2 WARN secretstore_net  0xb98e…fec0: generation session read error 'no active session with given id' when requested for session from node 0x7cb2…23dc
Feb 04 11:21:34 ip-172-31-0-123.ec2.internal docker[9403]: 2019-02-04 11:21:34 UTC tokio-runtime-worker-2 TRACE secretstore_net  0xb98e…fec0: received message Cluster.KeepAliveResponse from 0x7cb2…23dc
Feb 04 11:21:34 ip-172-31-0-123.ec2.internal docker[9403]: 2019-02-04 11:21:34 UTC tokio-runtime-worker-2 TRACE secretstore_net  0xb98e…fec0: received message Cluster.KeepAliveResponse from 0x7cb2…23dc
Feb 04 11:21:34 ip-172-31-0-123.ec2.internal docker[9403]: 2019-02-04 11:21:34 UTC tokio-runtime-worker-2 TRACE secretstore_net  0xb98e…fec0: received message Cluster.KeepAliveResponse from 0x7cb2…23dc
Feb 04 11:21:34 ip-172-31-0-123.ec2.internal docker[9403]: 2019-02-04 11:21:34 UTC tokio-runtime-worker-1 TRACE secretstore_net  0xb98e…fec0: inserting connection to 0x7cb2…23dc at 18.210.120.241:12000. Connected to 2 of 2 nodes
Feb 04 11:21:34 ip-172-31-0-123.ec2.internal docker[9403]: 2019-02-04 11:21:34 UTC tokio-runtime-worker-1 WARN secretstore_net  0xb98e…fec0: network error 'early eof' when reading message from node 0x7cb2…23dc
Feb 04 11:21:34 ip-172-31-0-123.ec2.internal docker[9403]: 2019-02-04 11:21:34 UTC tokio-runtime-worker-1 TRACE secretstore_net  0xb98e…fec0: removing connection to 0x7cb2…23dc at 18.210.120.241:12000

Thanks!
log_0x7cb2…23dc.txt
log_0xb98e…fec0.txt
log_0xfc10…b0ab.txt

@svyatonik
Collaborator

Thanks, @jcortejoso - I think that's enough info for now. It definitely seems like the issues are in the SS code: (1) the node gets disconnected because of a timeout (though it seems the KeepAliveResponse messages are queued and processed right after the disconnect), and (2) the node is unable to recover from the disconnect (i.e. to reconnect).
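
To illustrate point (1), here is a minimal sketch (not the actual parity code; the type, the maintain function and the timeout value are made up for illustration) of how a keep-alive-driven maintain procedure can drop a peer that is actually alive when its KeepAliveResponse messages sit unprocessed in the queue:

use std::time::{Duration, Instant};

// Assumed value for illustration only, not the real Secret Store constant.
const KEEP_ALIVE_DISCONNECT_TIMEOUT: Duration = Duration::from_secs(60);

struct Connection {
    node_id: String,
    // Updated whenever a message (e.g. Cluster.KeepAliveResponse) is processed.
    last_message_at: Instant,
}

fn maintain(connections: &mut Vec<Connection>) {
    let now = Instant::now();
    // If the runtime thread was blocked, queued KeepAliveResponse messages were
    // never applied to last_message_at, so a healthy peer can look timed out
    // here and gets dropped ("keep alive timeout for node ..." in the log).
    connections.retain(|c| {
        let alive = now.duration_since(c.last_message_at) < KEEP_ALIVE_DISCONNECT_TIMEOUT;
        if !alive {
            println!("removing connection to {}", c.node_id);
        }
        alive
    });
}

fn main() {
    let mut connections = vec![Connection {
        node_id: "0x7cb2…23dc".to_string(),
        last_message_at: Instant::now() - Duration::from_secs(120),
    }];
    maintain(&mut connections);
}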

@svyatonik svyatonik self-assigned this Feb 5, 2019
@svyatonik
Collaborator

It seems I found the reason: the SS HTTP listener is currently written in a way that does not work well with a runtime shared with other components (i.e. it blocks the entire runtime thread until the request is served). And since the runtime is now (since #9888 - my bad :/) shared between the ethereum sync, the SS HTTP listener and the SS network, some messages required for session completion cannot be processed until the session is completed (you can see it in the logs: right after a session completes with a timeout error, the queued messages are processed).
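
For illustration, a minimal self-contained sketch of that failure mode (this uses the current tokio task API rather than the futures-0.1 runtime parity 2.3 actually used; the task names are only stand-ins, and it assumes a tokio dependency with the "macros", "rt-multi-thread" and "time" features). With a single shared worker thread, a task that blocks instead of yielding keeps the "network" task from processing its queued messages until the blocking call returns:

use std::time::Duration;

#[tokio::main(flavor = "multi_thread", worker_threads = 1)]
async fn main() {
    // Stand-in for the SS HTTP listener: it blocks the worker thread while
    // waiting for a result instead of yielding back to the runtime.
    let listener = tokio::spawn(async {
        std::thread::sleep(Duration::from_secs(5)); // blocks the whole worker
        println!("listener: request served");
    });

    // Stand-in for secretstore_net: it should keep handling KeepAlive and
    // session messages, but it cannot run while the only worker is blocked.
    let network = tokio::spawn(async {
        for i in 0..5 {
            println!("network: processing queued message {}", i);
            tokio::time::sleep(Duration::from_millis(200)).await;
        }
    });

    let _ = tokio::join!(listener, network);
}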

@jcortejoso Could you please check whether everything works with commit 1fdfa1e6c6833ec18e171e67e5925e17a243a21b?

@jcortejoso
Author

jcortejoso commented Feb 5, 2019

Hi @svyatonik. We have had our SS cluster, built from that commit, running for about two hours now. So far so good. I will update if there is any issue, or confirm if everything works as expected.

Thanks!

@jcortejoso
Author

We have been testing our PoA more intensively these days (running from commit 1fdfa1e) and I have the feeling that the Secret Store reconnects correctly but the Ethereum network does not. This is an example of the logs:

Feb 22 00:42:00 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 00:42:00 UTC Imported #87438 0x308c…cc5e (0 txs, 0.00 Mgas, 0 ms, 0.57 KiB)
Feb 22 00:42:29 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 00:42:29 UTC    5/25 peers      5 MiB chain   88 MiB db  0 bytes queue   11 KiB sync  RPC:  0 conn,    0 req/s,    0 µs
Feb 22 00:42:59 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 00:42:59 UTC    5/25 peers      5 MiB chain   88 MiB db  0 bytes queue   11 KiB sync  RPC:  0 conn,    0 req/s,    0 µs
Feb 22 00:43:25 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 00:43:25 UTC Imported #87439 0xcbe5…52ca (0 txs, 0.00 Mgas, 0 ms, 0.57 KiB)
Feb 22 00:43:29 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 00:43:29 UTC    5/25 peers      6 MiB chain   88 MiB db  0 bytes queue   11 KiB sync  RPC:  0 conn,    0 req/s,    0 µs
Feb 22 00:44:04 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 00:44:04 UTC    5/25 peers      6 MiB chain   88 MiB db  0 bytes queue   11 KiB sync  RPC:  0 conn,    0 req/s,    0 µs
Feb 22 00:44:15 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 00:44:15 UTC Imported #87440 0xad07…dbff (0 txs, 0.00 Mgas, 0 ms, 0.57 KiB)
Feb 22 00:44:39 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 00:44:39 UTC    5/25 peers      6 MiB chain   88 MiB db  0 bytes queue   11 KiB sync  RPC:  0 conn,    0 req/s,    0 µs
Feb 22 00:45:14 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 00:45:14 UTC    5/25 peers      6 MiB chain   88 MiB db  0 bytes queue   11 KiB sync  RPC:  0 conn,    0 req/s,    0 µs
Feb 22 00:45:25 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 00:45:25 UTC Imported #87441 0x5886…7813 (0 txs, 0.00 Mgas, 0 ms, 0.57 KiB)
Feb 22 00:45:49 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 00:45:49 UTC    5/25 peers      6 MiB chain   88 MiB db  0 bytes queue   11 KiB sync  RPC:  0 conn,    0 req/s,    0 µs
Feb 22 01:00:06 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 01:00:06 UTC 0xfc10…b0ab: generation session completed
Feb 22 01:00:07 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 01:00:07 UTC 0xfc10…b0ab: encryption session completed
Feb 22 01:00:17 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 01:00:17 UTC 0xfc10…b0ab: generation session completed
Feb 22 01:00:18 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 01:00:18 UTC 0xfc10…b0ab: encryption session completed
Feb 22 01:00:29 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 01:00:29 UTC 0xfc10…b0ab: generation session completed
Feb 22 01:00:30 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 01:00:30 UTC 0xfc10…b0ab: encryption session completed
Feb 22 01:00:44 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 01:00:44 UTC 0xfc10…b0ab: generation session completed
Feb 22 01:00:45 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 01:00:45 UTC 0xfc10…b0ab: encryption session completed
Feb 22 01:01:35 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 01:01:35 UTC 0xfc10…b0ab: generation session completed
Feb 22 01:01:36 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 01:01:36 UTC 0xfc10…b0ab: encryption session completed
...

Maybe, because it is an old commit, there is some already-solved bug that is causing the issue. We will now test from the branch https://github.com/paritytech/parity-ethereum/tree/fix_ss_blocking_wait

@svyatonik
Collaborator

Hey, @jcortejoso! There could well have been fixes for PoA sync since that commit. Could you please share the results of the fix_ss_blocking_wait testing (if they are already available)?
