Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Atomicity bug while adding validators using validator client API #4984

Open
michaelsproul opened this issue Dec 6, 2023 · 4 comments
Open
Labels
bug Something isn't working val-client Relates to the validator client binary

Comments

@michaelsproul
Copy link
Member

Description

Found by @chong-he:

If the certificate for a web3signer validator is invalid while adding a validator using /lighthouse/validators/web3signer then the validator will get added in-memory but will not be activated and will not be persisted in the validator definitions on disk. Attempting to re-add the same key with correct values then fails with a duplicate key error 💀

E.g.

Start the VC:

lighthouse vc --http

Create a file named request.json:

[
    {
        "enable": true,
        "description": "validator_one",
        "graffiti": "Mr F was here",
        "suggested_fee_recipient": "0xa2e334e71511686bcfe38bb3ee1ad8f6babcc03d",
        "voting_public_key": "0xa062f95fee747144d5e511940624bc6546509eeaeae9383257a9c43e7ddc58c17c2bab4ae62053122184c381b90db380",
        "url": "http://path-to-web3signer.com",
        "root_certificate_path": "/path/on/vc/filesystem/to/certificate.pem",
        "request_timeout_ms": 12000
    }
]

Make the request once (ensuring certificate.pem does not exist or is invalid):

curl -X POST --data @request.json -H "Content-Type: application/json" -H "Authorization: Bearer "(cat ~/.lighthouse/mainnet/validators/api-token.txt) "http://localhost:5062/lighthouse/validators/web3signer"
{"code":500,"message":"INTERNAL_SERVER_ERROR: failed to initialize validator: \"Unable to add definition: InvalidWeb3SignerRootCertificate(reqwest::Error { kind: Builder, source: Error { code: -50, message: \\\"One or more parameters passed to a function were not valid.\\\" } })\"","stacktraces":[]}

Make the request again:

curl -X POST --data @request.json -H "Content-Type: application/json" -H "Authorization: Bearer "(cat ~/.lighthouse/mainnet/validators/api-token.txt) "http://localhost:5062/lighthouse/validators/web3signer"
{"code":500,"message":"INTERNAL_SERVER_ERROR: failed to initialize validator: \"Unable to add definition: DuplicatePublicKey\"","stacktraces":[]}

We shouldn't get duplicate validator here, because the validator didn't get added successfully. The VC still logs that no validators are active:

INFO No validators present msg: see lighthouse vm create --help or the HTTP API documentation, service: notifier

However we see that the validator is stored in memory, which is why we get the duplicate error:

curl -s -H "Content-Type: application/json" -H "Authorization: Bearer "(cat ~/.lighthouse/mainnet/validators/api-token.txt) "http://localhost:5062/eth/v1/remotekeys" | jq
{
  "data": [
    {
      "pubkey": "0xa062f95fee747144d5e511940624bc6546509eeaeae9383257a9c43e7ddc58c17c2bab4ae62053122184c381b90db380",
      "url": "http://path-to-web3signer.com",
      "readonly": false
    }
  ]
}

Version

Lighthouse v4.5.0

Steps to resolve

Harden the atomicity of adding validators. We could use copy-on-write followed by compare-and-set (my favourite), or something like adding the validator initially disabled, then switching it to enabled once everything has succeeded. Or we could just apply stricter pre-validation before we go mutating anything (although this could still cause a failure if a disk op fails, etc).

@michaelsproul michaelsproul added bug Something isn't working val-client Relates to the validator client binary labels Dec 6, 2023
@chong-he
Copy link
Member

Upon testing, client_identity_path and client_identity_password are having the same bug, i.e., if the first import via HTTP API fails, a second try will show DuplicatePublicKey.

The HTTP API works well if the first import is successful and it gets added to the validator_definitions.yml file as expected.

@v4lproik
Copy link
Contributor

I would love to take this up if some help is needed :)

@michaelsproul
Copy link
Member Author

@v4lproik This issue is quite open-ended and would require a bit of exploration of the design space to solve fully, I think. My current thinking is that we should adopt some approach like software transaction memory or copy-on-write to make our implementation more obviously correct. This would require overhauling some of the datastructures to make them cheaper to copy (persistent data structures).

Then the transaction process would be something like:

  1. Start an in-memory transaction by copying the initialized validators (through an upgradeable rwlock perhaps)
  2. Make the desired mutations; abort if any fail (all changes reverted)
  3. Write to disk
  4. Commit the in-memory transaction by writing the new version of the data (through the upgraded rwlock).

I'm a big fan of CoW though, so I'm a bit biased. It would also be possible to solve this issue by just handling the specific cases highlighted above better, e.g. by doing all operations that can fail (e.g. reading from disk) prior to updating anything permanently in-memory.

Another issue that's tangentially related (and might make testing these code paths more straight-forward) is #4854. That could be a good one to start with to dip your toes in the VC API, perhaps with an eye to solving this issue too.

@v4lproik
Copy link
Contributor

Sounds good! Thank you very much for the clear explanation! I will get on with #4854 instead!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working val-client Relates to the validator client binary
Projects
None yet
Development

No branches or pull requests

3 participants