Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add support for external coordinator mode #13

Conversation

bendanzhentan
Copy link
Contributor

@bendanzhentan bendanzhentan commented Jun 28, 2023

The Optimism sequencer is single-entity and the current implementation does not include a fail-over mechanism. Which means that the current Optimism sequencer is a single point of failure. If it goes down, the whole optimistic rollup stops producing blocks.

External Coordinator comes to prevent this from happening, it ensures that only one instance of Optimism sequencer is producing and if it goes down, it will be replaced by another instance.

This PR adds support for optional external coordinator mode, by providing the following flags,

   --sequencer.coordinator.enabled                       Enable the external coordinator mode [$OP_NODE_COORDINATOR_ENABLED]
   --sequencer.coordinator.addr value                    Coordinator listening address [$OP_NODE_COORDINATOR_ADDR]
   --sequencer.coordinator.sequencer-id value            the sequencer id configured in the Coordinator, used by Coordinator to distinguish different sequencers [$OP_NODE_COORDINATOR_SEQUENCER_ID]

e.g.

    --sequencer.coordinator.enabled=true
    --sequencer.coordinator.sequencer-id="seq0"
    --sequencer.coordinator.addr="http://127.0.0.1:9762"

In order to produce new unsafe blocks, the sequencer must request building-block permission from the coordinator first.

op-node/node/node.go Outdated Show resolved Hide resolved
op-node/node/config.go Outdated Show resolved Hide resolved
op-node/rollup/driver/sequencer.go Outdated Show resolved Hide resolved
@bendanzhentan bendanzhentan requested a review from bnoieh June 30, 2023 02:20
@bendanzhentan bendanzhentan force-pushed the opbnb-testnet-op-coordinator branch from 17113eb to 6ed426b Compare June 30, 2023 03:32
@bendanzhentan
Copy link
Contributor Author

I have dropped the previous first one commit 845168fae7527aa3e8b327dbf244d12002d7deb6 "feat: add API admin_sequencerStopped" since the Optimism team has added a similar API admin_sequencerActive yesterday ethereum-optimism/optimism#6190

Comment on lines +252 to +256
if err := d.CanStartBuildingBlock(); err != nil {
d.log.Error("sequencer is not allowed to build block", "err", err)
d.nextAction = d.timeNow().Add(time.Second)
return nil, err
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i have some questions about coordinator logic:

  1. should cancel the whole sequencer action(not just startBuildingBlock step) if cooridinatorClient return error?
  2. need move this part in front of case <-sequencerCh:? in other word, we should not run into case <-sequencerCh: if node is inactive sequencer
  3. if cooridinatorClient return error, what's the proper frequency to check if cooridinatorClient return ok in future?
  4. if the node is not active sequencer, seems the SequencerActive api will return true?
  5. there's a case could result in blockchain fork, if the active sequencer was turned inactive by coordinator in the middle of a block building(startBuildingBlock finished, completeBuildingBlock not)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @bnoieh, thanks for your review.

1/2. I agree with your comments on (1) and (2). If we move the check coordinator_startBuildingBlock to the front of case <- sequencerCh, we can achieve the behavior of "inactive sequencer not entering case <- sequencerCh" and cancel all sequencer actions. This suggestion is great, and I will update it as soon as possible.
3. The change is related to 1/2. I will make some experiments and then reply.
4. admin_sequencerActive should return false for inactive sequencers.
5. It's a big question. In simple terms, the current HA mechanism cannot ensure to prevent fork 100% effectively.

if err != nil {
return fmt.Errorf("failed to call coordinator_requestBuildingBlock: %w", err)
}
if respErr != nil {
Copy link
Collaborator

@owen-reorg owen-reorg Jul 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

error is passed by value in golang. So I guess respErr will always be nil?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about it. Would you be able to confirm?

@bendanzhentan bendanzhentan marked this pull request as draft July 5, 2023 04:22
Returns true if the node is actively sequencing, otherwise false. This is essentially a parallel with the `eth_mining` API from execution clients.
@github-actions
Copy link
Contributor

This PR is stale because it has been open 14 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Jul 21, 2023
@github-actions github-actions bot closed this Jul 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants