Skip to content
This repository has been archived by the owner on Jun 3, 2020. It is now read-only.

Unexpected log output - appears to double sign #314

Closed
mdyring opened this issue Jul 26, 2019 · 14 comments
Closed

Unexpected log output - appears to double sign #314

mdyring opened this issue Jul 26, 2019 · 14 comments
Labels
security Security-critical issues

Comments

@mdyring
Copy link
Contributor

mdyring commented Jul 26, 2019

We're testing an active/active setup of two validators connecting to a single tmkms v0.6.0-rc1 process on the gaia-13004 testnet.

AFAICS in the source, "signed at height" is only logged after a signature is created:https://github.com/tendermint/kms/blob/e5dadf1a7bef351ea78c082e1aefab3db44aedfc/src/session.rs#L221

So it appears tmkms is double signing for the same chain id?

tmkms[9826]: 12:49:39 [info] [gaia-13004@35.156.81.246:26659] signed "PreVote" at height: 55157 (0 ms)
tmkms[9826]: 12:49:39 [info] [gaia-13004@35.158.126.240:26659] signed "PreVote" at height: 55157 (0 ms)
tmkms[9826]: 12:49:39 [info] [gaia-13004@35.156.81.246:26659] signed "PreCommit" at height: 55157 (0 ms)
tmkms[9826]: 12:49:39 [info] [gaia-13004@35.158.126.240:26659] signed "PreCommit" at height: 55157 (0 ms)
tmkms[9826]: 12:49:45 [info] [gaia-13004@35.156.81.246:26659] signed "PreVote" at height: 55158 (0 ms)
tmkms[9826]: 12:49:45 [info] [gaia-13004@35.158.126.240:26659] signed "PreVote" at height: 55158 (0 ms)
tmkms[9826]: 12:49:45 [info] [gaia-13004@35.156.81.246:26659] signed "PreCommit" at height: 55158 (0 ms)
tmkms[9826]: 12:49:45 [info] [gaia-13004@35.158.126.240:26659] signed "PreCommit" at height: 55158 (0 ms)

Our validator is not getting slashed so far.. Maybe because we have yet to propose a block and have been lucky so far (prevoting/commiting on same data on both vals).

@tarcieri
Copy link
Contributor

I should probably have it print out the entire consensus state here rather than just the height. There's also the round and step to consider, which is my guess of what's happening here.

@tarcieri
Copy link
Contributor

PR to add height/round/step logging here: https://github.com/tendermint/kms/pull/316/files

@tarcieri
Copy link
Contributor

@mdyring that's merged to master now... are you capable of building from that to test this out again?

@mdyring
Copy link
Contributor Author

mdyring commented Jul 26, 2019

Thanks, I've tested with master and it seems something is fishy:

tmkms[26754]: 20:25:02 [info] [gaia-13004@35.156.81.246:26659] signed "PreVote" at height/round/step: 59807/0/6 (0 ms)
tmkms[26754]: 20:25:02 [info] [gaia-13004@35.158.126.240:26659] signed "PreVote" at height/round/step: 59807/0/6 (0 ms)
tmkms[26754]: 20:25:02 [info] [gaia-13004@35.156.81.246:26659] signed "PreCommit" at height/round/step: 59807/0/6 (0 ms)
tmkms[26754]: 20:25:02 [info] [gaia-13004@35.158.126.240:26659] signed "PreCommit" at height/round/step: 59807/0/6 (0 ms)
tmkms[26754]: 20:25:09 [info] [gaia-13004@35.158.126.240:26659] signed "PreVote" at height/round/step: 59808/0/6 (0 ms)
tmkms[26754]: 20:25:09 [info] [gaia-13004@35.156.81.246:26659] signed "PreVote" at height/round/step: 59808/0/6 (0 ms)
tmkms[26754]: 20:25:09 [info] [gaia-13004@35.156.81.246:26659] signed "PreCommit" at height/round/step: 59808/0/6 (0 ms)
tmkms[26754]: 20:25:09 [info] [gaia-13004@35.158.126.240:26659] signed "PreCommit" at height/round/step: 59808/0/6 (0 ms)
tmkms[26754]: 20:25:15 [info] [gaia-13004@35.156.81.246:26659] signed "PreVote" at height/round/step: 59809/0/6 (0 ms)
tmkms[26754]: 20:25:15 [info] [gaia-13004@35.158.126.240:26659] signed "PreVote" at height/round/step: 59809/0/6 (0 ms)
tmkms[26754]: 20:25:15 [info] [gaia-13004@35.156.81.246:26659] signed "PreCommit" at height/round/step: 59809/0/6 (0 ms)
tmkms[26754]: 20:25:15 [info] [gaia-13004@35.158.126.240:26659] signed "PreCommit" at height/round/step: 59809/0/6 (0 ms)
tmkms[26754]: 20:25:20 [info] [gaia-13004@35.156.81.246:26659] signed "PreVote" at height/round/step: 59810/0/6 (0 ms)
tmkms[26754]: 20:25:20 [info] [gaia-13004@35.158.126.240:26659] signed "PreVote" at height/round/step: 59810/0/6 (0 ms)

I am testing this with softsign, but don't believe it makes a difference. Also happy to share config if you'd like.

@mdyring
Copy link
Contributor Author

mdyring commented Jul 26, 2019

Interestingly, on our current production setup, with a single connection from kms -> val, we see two log files for each block - so also not a line for each step as you'd expect:

tmkms[25012]: 20:31:24 [info] [cosmoshub-2@x.x.x.x:26659] signed "PreVote" at height: 1192058 (142 ms)
tmkms[25012]: 20:31:25 [info] [cosmoshub-2@x.x.x.x:26659] signed "PreCommit" at height: 1192058 (142 ms)
tmkms[25012]: 20:31:31 [info] [cosmoshub-2@x.x.x.x:26659] signed "PreVote" at height: 1192059 (142 ms)
tmkms[25012]: 20:31:32 [info] [cosmoshub-2@x.x.x.x:26659] signed "PreCommit" at height: 1192059 (142 ms)
tmkms[25012]: 20:31:38 [info] [cosmoshub-2@x.x.x.x:26659] signed "PreVote" at height: 1192060 (142 ms)
tmkms[25012]: 20:31:38 [info] [cosmoshub-2@x.x.x.x:26659] signed "PreCommit" at height: 1192060 (142 ms)

@tarcieri
Copy link
Contributor

I'm now guessing both validators are submitting an identical block and requesting it be signed, which is allowed under the current logic (and results in an identical signature, as Ed25519 is deterministic):

https://github.com/tendermint/kms/blob/master/src/chain/state.rs#L97

I can add the block ID being signed to the log output to confirm this.

I think this should be allowed, since it tolerates faults where the KMS signs a block for a validator which goes down. If another validator asks to sign the same block as the original, and the signature algorithm is deterministic, it's really just recomputing the original signature.

Interestingly, on our current production setup, with a single connection from kms -> val, we see two log files for each block

Those are the steps of Tendermint consensus: PreVote -> PreCommit -> Commit (where the first two each require a signature)

@tarcieri tarcieri added the security Security-critical issues label Jul 26, 2019
@tarcieri
Copy link
Contributor

tarcieri commented Jul 26, 2019

PR to add block ID logging and a [dup] flag when the KMS resigns the same block here: #317

@tarcieri
Copy link
Contributor

tarcieri commented Jul 26, 2019

@mdyring want to try it out again with the logging changes from #317 and see what the output is? They now look something like this and will add a [dup] flag when the KMS recomputes the same signature as the last block it signed:

22:47:37 [info] [test_chain_id@127.0.0.1:60313] signed PreVote:736F6D65 h/r/s:12345/2/6 (0 ms)

@mdyring
Copy link
Contributor Author

mdyring commented Jul 27, 2019

Happy to report it is as guessed (identical blocks) :-)

Minor detail: it shows [dup] for same PreVote/PreCommit. Since block id is now logged, I think we can safely remove the [dup] logging as it doesn't add that much value.

tmkms[18934]: 08:52:28 [info] [gaia-13004@35.156.81.246:26659] signed PreVote:48620B69 h/r/s:67611/0/6 (0 ms)
tmkms[18934]: 08:52:28 [info] [gaia-13004@35.158.126.240:26659] signed PreVote:48620B69 h/r/s:67611/0/6 (0 ms) [dup]
tmkms[18934]: 08:52:28 [info] [gaia-13004@35.156.81.246:26659] signed PreCommit:48620B69 h/r/s:67611/0/6 (0 ms) [dup]
tmkms[18934]: 08:52:28 [info] [gaia-13004@35.158.126.240:26659] signed PreCommit:48620B69 h/r/s:67611/0/6 (0 ms) [dup]
tmkms[18934]: 08:52:34 [info] [gaia-13004@35.156.81.246:26659] signed PreVote:51A5AB0D h/r/s:67612/0/6 (0 ms)
tmkms[18934]: 08:52:34 [info] [gaia-13004@35.158.126.240:26659] signed PreVote:51A5AB0D h/r/s:67612/0/6 (0 ms) [dup]
tmkms[18934]: 08:52:34 [info] [gaia-13004@35.156.81.246:26659] signed PreCommit:51A5AB0D h/r/s:67612/0/6 (0 ms) [dup]
tmkms[18934]: 08:52:34 [info] [gaia-13004@35.158.126.240:26659] signed PreCommit:51A5AB0D h/r/s:67612/0/6 (0 ms) [dup]

Thanks for the work on this and it appears we can safely do active/active now with multiple vals and a single KMS. I will increase number of vals for further testing, but feel free to close this issue.

@mdyring
Copy link
Contributor Author

mdyring commented Jul 27, 2019

Btw would be great to add block id logging when there is a problem, such as these:

tmkms[19431]: 11:30:11 [warn] [gaia-13004:35.158.126.240:26659] attempt to double sign at height/round/step: 69251/0/3
tmkms[19431]: 11:30:11 [error] [gaia-13004@tcp://18.196.63.108:26659] attempted double sign: step regression: round regression at height:69251 round:0 last step:6 new step:3

@tarcieri
Copy link
Contributor

Yeah, the logging isn't correct, and making it correct would require changes to the state file, so I think I'll follow your recommendation and remove it for now, but add more block ID logging.

@tarcieri
Copy link
Contributor

@mdyring #319 implemented your suggestions if you want to give it one more go before a final KMS release

@mdyring
Copy link
Contributor Author

mdyring commented Jul 28, 2019

Looks great, thanks!

Some minor details for consistency:

  1. added tcp:// prefix in the "attempted double sign" lines with step regression:
  2. warn is emitted for trying to double sign different blocks, error given for step regression
  3. "attempt to double sign" != "attempted double sign" (can make monitoring easier if consistent)
tmkms[3137]: 09:29:25 [info] [gaia-13004@35.156.81.246:26659] signed Proposal:DE23ECA605 at h/r/s 82953/0/3 (0 ms)
tmkms[3137]: 09:29:25 [warn] [gaia-13004:18.196.63.108:26659] attempt to double sign at h/r/s: 82953/0/3 (DE23ECA605 != AEFFF4498D)
tmkms[3137]: 09:29:25 [warn] [gaia-13004:35.158.126.240:26659] attempt to double sign at h/r/s: 82953/0/3 (DE23ECA605 != 1B867E0652)
tmkms[3137]: 09:29:25 [info] [gaia-13004@35.156.81.246:26659] signed PreCommit:(none) at h/r/s 82953/0/6 (0 ms)
tmkms[3137]: 09:29:25 [info] [gaia-13004@18.196.63.108:26659] signed PreCommit:(none) at h/r/s 82953/0/6 (0 ms)
tmkms[3137]: 09:29:25 [info] [gaia-13004@35.158.126.240:26659] signed PreCommit:(none) at h/r/s 82953/0/6 (0 ms)
tmkms[3137]: 09:29:25 [error] [gaia-13004@tcp://54.93.169.244:26659] attempted double sign: step regression: round regression at height:82953 round:0 last step:6 new step:3
tmkms[3137]: 09:29:25 [error] [gaia-13004@tcp://18.196.17.223:26659] attempted double sign: step regression: round regression at height:82953 round:0 last step:6 new step:3

@tarcieri
Copy link
Contributor

@mdyring those should all be taken care of in #322

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
security Security-critical issues
Projects
None yet
Development

No branches or pull requests

2 participants