Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proper Connection Management for ETH JSON-RPC Client for Startup and Runtime #10498

Merged
merged 16 commits into from
Apr 9, 2022

Conversation

rauljordan
Copy link
Contributor

@rauljordan rauljordan commented Apr 7, 2022

What type of PR is this?

Bug fix

What does this PR do? Why is it needed?

With the merge around the corner, Prysm will need a connection to an execution client, even on startup. Currently, the connection to the execution client is something that happens in the background and is mostly used to retrieve deposit lgos from the Validator Deposit Contract. This needs to come front and center for the merge. However, we cannot afford to make a breaking change to current Prysm users until we ship Prysm v3.

Moreover, we need to make sure that whatever solution we pick, we do not crash nor shutdown Prysm. That is, Prysm should be able to continue even if the RPC connection goes down. If this occurs, we should just error where the calls are made to the execution client, and attempt a reconnection and proceed normally.

This PR is tested on both kiln and mainnet. On mainnet, the behavior remains the same as today. On Kiln, if the RPC connection is not found, the node can continue as normal, but will typically get stuck in a sync loop until the connection is restored. The node can fully sync after the connection is restored.

IMPORTANT CHANGES

  • For this approach to work, Prysm needs to at least define a default value for an execution endpoint. This PR sets it as localhost:8545

@rauljordan rauljordan requested a review from a team as a code owner April 7, 2022 22:47
@@ -0,0 +1,166 @@
package powchain
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file consolidates connection management functions and refactors for nicer separation of concerns

)

func (s *Service) setupExecutionClientConnections(ctx context.Context, currEndpoint network.Endpoint) error {
client, err := newRPCClientWithAuth(currEndpoint)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The initialization of an RPC client does not dial until we make an actual RPC call, so we then set the client as part of the service struct and avoid panics, as it will not be nil

@rauljordan rauljordan self-assigned this Apr 8, 2022
@rauljordan rauljordan added Ready For Review Priority: High High priority item Merge PRs related to the great milestone the merge labels Apr 8, 2022
@rauljordan rauljordan requested a review from nisdas April 8, 2022 02:17
Copy link
Member

@nisdas nisdas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me for the most part, in the future we probably will need to remove support for multiple execution nodes as it becomes tricky to handle.

@@ -15,7 +15,7 @@ var (
HTTPWeb3ProviderFlag = &cli.StringFlag{
Name: "http-web3provider",
Usage: "A mainchain web3 provider string http endpoint. Can contain auth header as well in the format --http-web3provider=\"https://goerli.infura.io/v3/xxxx,Basic xxx\" for project secret (base64 encoded) and --http-web3provider=\"https://goerli.infura.io/v3/xxxx,Bearer xxx\" for jwt use",
Value: "",
Value: "http://localhost:8545",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to change this ? it will break the case when we have a provided genesis state and want to sync the beacon node from scratch without an eth1 endpoint.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved offline: we discussed this will not break the case with the provided genesis state, and confirmed Prysm can still sync fine on mainnet to align with current behavior

@nisdas nisdas merged commit dd5995b into develop Apr 9, 2022
@delete-merged-branch delete-merged-branch bot deleted the robust-connection-management branch April 9, 2022 01:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Merge PRs related to the great milestone the merge Priority: High High priority item
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants