cannot restart validator: not connecting to peers #26
Comments
This looks like a "swap" on the AMM was being performed. Do we know what was being traded? |
I extracted the cranks from block 55576, which appears to include the "rights were not conserved" rejection. Here's a map of vatIDs on this chain:
and here are the deliveries in block 55576:
(that may be slightly easier to read if you paste it into a file, then pipe it into |
These errors are a red herring. They are completely harmless (except to the people expecting those user-level messages to succeed) and are printed only because we haven't yet explicitly silenced them when running a validator. The output is debug logging that is completely isolated from the correct functioning of a validator. I agree that they are misleading, and we really should silence them. But the more important question is: why didn't the node ever connect to any of its peers? I would expect to see several network failures after a restart, but eventually things should catch up. Patience is important here. |
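One way to answer the "is it connecting to peers yet?" question without reading the debug output is to poll the node's Tendermint RPC. The sketch below is a rough, unofficial helper, not part of any Agoric tooling. It assumes the RPC is listening on the default `localhost:26657` and that the `/net_info` and `/status` endpoints return the usual JSON-RPC shape (`result.n_peers`, `result.sync_info`). The function names `summarize`, `fetch` and `poll_once` are made up for illustration.

```python
import json
from urllib.request import urlopen

# Assumption: default Tendermint RPC address; adjust to match your config.toml.
RPC_URL = "http://localhost:26657"


def summarize(net_info: dict, status: dict) -> dict:
    """Pull the peer count and sync state out of /net_info and /status replies."""
    sync = status["result"]["sync_info"]
    return {
        # Tendermint reports these numbers as JSON strings, so convert them.
        "n_peers": int(net_info["result"]["n_peers"]),
        "height": int(sync["latest_block_height"]),
        "catching_up": bool(sync["catching_up"]),
    }


def fetch(path: str, base: str = RPC_URL) -> dict:
    """GET one RPC endpoint and decode the JSON body."""
    with urlopen(f"{base}/{path}", timeout=10) as resp:
        return json.load(resp)


def poll_once(base: str = RPC_URL) -> dict:
    """Query the node once and return a small summary dict."""
    return summarize(fetch("net_info", base), fetch("status", base))
```

Calling `poll_once()` every minute or so after a restart should show `n_peers` rising above zero and `height` advancing if the node is slowly catching up. If `n_peers` stays at 0 for a long time, that points at peer discovery rather than at the logged contract errors.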
For reference, this particular instance of a "rights not conserved" error is sent from the AMM contract (v18) at crank 1346190, in block 55576. I've found 196 instances so far, starting with crank 1322587 at time epoch+1629762640. |
Comment from Chris at Chainflow says he's still out of commission pending a resolution on this issue. We had one more question from Decentralizer with apparently the same issue. What's the guidance on troubleshooting this? |
It appears I'm not picking up any peers at all. Note that the logs above are from a restart after I cleared my addressbook. If I do a restart w/o clearing the addressbook, the logs start here -
|
More context, it looks like I'm slowly picking up peers. I have three peers now and appear to be syncing. FWIW, the
|
I'm splitting out the rights-conservation error into a separate ticket #29, and I'll copy my preliminary analysis over there. |
@michaelfig @dtribble what do you know about our network topology? What is the peer discovery algorithm? |
On this testnet we ran 4 public persistent peers. They were also RPC nodes, so that was interfering with them being able to process blocks. We also ran 2 public seed nodes (which gossip other peers to all nodes that connect to the seeds). My understanding is that this should be adequate, if the nodes could keep up. The absence of "state sync" (Agoric/agoric-sdk#3769) is what makes starting a new validator from scratch take so long, and the problem with ABCI queries being blocked by computing blocks (tendermint/tendermint#6899) caused the public peers to be slow. There may be a deeper underlying problem, but we won't discover that until the above are fixed and we have more testing. Closing for now. |
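Since a node can only find the network through the seeds and persistent peers it has configured, it may be worth sanity-checking those entries before a restart. The sketch below is an illustrative helper, not an official tool. It assumes the comma-separated `nodeid@host:port` format that Tendermint uses for `seeds` and `persistent_peers` in the `[p2p]` section of config.toml, with a 40-character hex node ID. The name `parse_peers` is hypothetical, and IPv6 hosts are not handled.

```python
import re

# Assumption: each entry looks like <40 hex chars>@<host>:<port> (no IPv6 support).
PEER_RE = re.compile(r"^([0-9a-fA-F]{40})@([^:@\s]+):(\d{1,5})$")


def parse_peers(value: str):
    """Split a seeds/persistent_peers string into valid and malformed entries.

    Returns (good, bad): good is a list of (node_id, host, port) tuples,
    bad is a list of the raw entries that did not match the expected format.
    """
    good, bad = [], []
    for entry in (e.strip() for e in value.split(",")):
        if not entry:
            continue  # tolerate trailing commas and empty values
        m = PEER_RE.match(entry)
        if m and 0 < int(m.group(3)) <= 65535:
            good.append((m.group(1).lower(), m.group(2), int(m.group(3))))
        else:
            bad.append(entry)
    return good, bad
```

An empty `good` list for both settings, combined with a cleared address book, would leave the node with nobody to dial, which would look much like the behavior reported here.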
Describe the bug
After tweaking commit_timeout in config.toml, chainflow is getting errors when trying to restart.
Additional context
logs: https://hackmd.io/@KFEZk8oMTz6vBlwADz0M4A/S1XlYT-ZK
esp this bit about block 55575:
discord chat: https://discord.com/channels/585576150827532298/819073555446759444/879535698984714290
cc @michaelfig @katelynsills @warner