
Cross-msgs: Replace implicit execution with multisig protocol #453

Closed
adlrocha opened this issue Mar 6, 2023 · 12 comments

adlrocha commented Mar 6, 2023

Background

In the MVP of IPC over Eudico, the execution of cross-net messages was performed through implicit execution. The reason for requiring implicit execution of cross-net messages is that, in order to authenticate a cross-net message proposed in a block, validators (and full nodes) had to verify that the message was final in the source subnet. Implicit execution was a way to introduce an explicit check for specific messages before they were accepted in a block, conveniently executed through the gateway. This is how we verified a message's validity before it was included in a block and executed.

The role of this step in the execution of messages is for all peers to agree on the finality and correctness of the messages to be executed. By including these checks as part of the consensus, we remove the need for a succinct proof from validators that can be verified on-chain by actors.

Unfortunately, this design is consensus-breaking, so this change would require an upgrade to support cross-net message execution in Filecoin (and other networks). This approach also introduces several issues with the new integration of Lotus with Mir, as even after a cross-net message has been ordered by Mir, it may end up not being valid in the consensus checks performed after ordering.

Initially, we were planning to ship M2 with implicit execution and then implement an alternative protocol for the execution of cross-net messages, but this may no longer be possible.

Proposal

To remove the need for implicit execution, we will rely on a multisig protocol among the validators of the destination subnet to verify the validity of cross-net messages.

Top-down messages

  • The IPC agents of all validators in a subnet will subscribe to a gossipsub topic /ipc/cross-net/<subnet-id>. This can be seen as a new mempool for top-down cross-net messages for each subnet: it is used to propagate information about new unverified top-down cross-net messages in the subnet.

  • For every unverified message seen with the subnet-id as its destination, the IPC agent of a validator will publish a message in the aforementioned gossipsub topic with its signature over the CID of the message. Validators only produce and broadcast this signature if they consider the message valid for execution in the subnet (i.e. it is valid and final in its originating subnet).

  • Validators in a subnet listen to these updates from other validators in the destination subnet and collect their signatures. When a validator has collected the minimum number of signatures required for the execution of the cross-net message (initially 2/3 of the validator set, although we could make it subnet-configurable), it sends a transaction to the apply_message method of the gateway actor including the multisig and the message to be executed (a rough sketch of this vote collection follows this list).

  • The gateway will check the multisig from all validators to authenticate the validity of the message, removing the need for off-chain authentication in the consensus stage through the implicit execution of messages.

  • The operation of apply_message stays the same, i.e. it expects cross-net messages to be executed sequentially by increasing nonce without gaps. The only changes introduced in the actor are this preliminary check and the fact that any validator can post a message for execution.

  • Edit with @aakoshh's useful clarification:

To rephrase this, what we are saying is that the validators in the child subnet C identified by subnet-id observe a top-down message in the parent subnet P by running a full node on P and having their agent subscribe to it. When an agent run by validator V on subnet C thinks a certain message M identified by CID(M) is final on P, it publishes a message (V, CID(M), Sign(V, CID(M))) to the /ipc/cross-msg/<subnet-id> topic.
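
As a rough sketch of the gossiped vote and the 2/3 quorum check described in the list above, in Rust (all type and helper names here are hypothetical, and verify_sig stands in for whatever signature scheme the subnet uses):

use std::collections::HashSet;

// Vote published on /ipc/cross-net/<subnet-id>: (V, CID(M), Sign(V, CID(M))).
struct TopDownVote {
    validator: Vec<u8>, // identity of validator V
    msg_cid: Vec<u8>,   // CID(M) of the cross-net message
    signature: Vec<u8>, // Sign(V, CID(M))
}

// Placeholder for the subnet's actual signature verification.
fn verify_sig(_validator: &[u8], _msg_cid: &[u8], _sig: &[u8]) -> bool {
    unimplemented!("delegate to the subnet's signature scheme")
}

// True once at least 2/3 of the subnet's validators have validly signed CID(M).
fn quorum_reached(validators: &[Vec<u8>], msg_cid: &[u8], votes: &[TopDownVote]) -> bool {
    let mut signers: HashSet<&[u8]> = HashSet::new();
    for v in votes.iter().filter(|v| v.msg_cid == msg_cid) {
        if validators.iter().any(|val| val == &v.validator)
            && verify_sig(&v.validator, msg_cid, &v.signature)
        {
            signers.insert(v.validator.as_slice()); // dedupe repeated votes
        }
    }
    // 2/3 threshold without floating point: signers * 3 >= validators * 2.
    signers.len() * 3 >= validators.len() * 2
}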

Bottom-up messages

  • Bottom-up messages are propagated in checkpoints signed by child subnet validators. The commitment of the checkpoint is enough proof for the parent that the messages are final in the child subnet's state and that all the corresponding side-effects there have been triggered successfully.
  • To execute the messages included in a checkpoint, any validator (and actually any EOA) needs to resolve the batch of messages behind the CID and send a transaction to apply_message with the whole batch of messages from the checkpoint. The gateway will check that these messages actually correspond to the CID propagated in the checkpoint, in which case they can be immediately executed.

Thus, validators see a new checkpoint with cross_msgs: cid(Vec<Msgs>). They resolve the messages through the IPLD resolver and send a transaction to apply_message with Vec<Msgs> as the argument. The gateway computes CID(Vec<Msgs>) and compares it against the one in the checkpoint to see if they are the same. If they are, all messages in the vector are executed accordingly.
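
As a minimal sketch of that gateway-side check (StorableMsg, cid_of and execute below are placeholders for the actual actor types and helpers):

struct StorableMsg; // placeholder for the actual cross-net message type

fn cid_of(_msgs: &[StorableMsg]) -> Vec<u8> {
    unimplemented!("serialize the batch exactly as the checkpoint committed it")
}

fn execute(_msg: StorableMsg) -> Result<(), String> {
    unimplemented!("apply the message through apply_message semantics")
}

fn apply_checkpoint_msgs(checkpoint_cid: &[u8], msgs: Vec<StorableMsg>) -> Result<(), String> {
    // The batch is only valid if it hashes to the CID committed in the checkpoint.
    if cid_of(&msgs) != checkpoint_cid {
        return Err("batch does not match the checkpoint commitment".into());
    }
    // All messages in a matching batch are executed sequentially, in order.
    for msg in msgs {
        execute(msg)?;
    }
    Ok(())
}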

Implementation

The implementation of the protocol involves the following:

  • Implementation of a Gossipsub-based behavior in the agent's libp2p host to handle all the subscription and broadcasting of signed unverified messages. (DRI: @aakoshh?)
  • Cross-net message subsystem in the agent responsible for monitoring new unverified messages directed to the subnets handled by the agent, for collecting the signatures of unverified messages from other validators, and for submitting messages for execution. (DRI: @hmoniz / @cryptoAtwill ?)
  • Modify the gateway actor to accommodate the changes required by the protocol (DRI: @adlrocha)

For the multisig protocol we can initially use a plain list of signatures for simplicity, but Filecoin already has support for BLS signatures, so we can leverage them to submit more succinct proofs where the multisig is an aggregation of the validators' BLS signatures.
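
As a hedged sketch of the aggregated variant using the bls-signatures crate from the Filecoin ecosystem (the exact API may vary between versions, and this is not the final gateway design). Many BLS implementations reject duplicate messages during aggregate verification as a rogue-key countermeasure, so the sketch keeps the signed payloads distinct by prefixing each with the validator's index:

use bls_signatures::{aggregate, verify_messages, PrivateKey};
use rand::thread_rng;

fn main() {
    let msg_cid = b"cid-of-cross-net-message";
    let mut rng = thread_rng();

    // Each validator signs a payload that is distinct per signer (index || CID).
    let keys: Vec<PrivateKey> = (0..4).map(|_| PrivateKey::generate(&mut rng)).collect();
    let payloads: Vec<Vec<u8>> = (0..keys.len() as u8)
        .map(|i| [&[i][..], &msg_cid[..]].concat())
        .collect();
    let sigs: Vec<_> = keys.iter().zip(&payloads).map(|(k, p)| k.sign(p)).collect();

    // Any validator aggregates the collected signatures into one succinct proof
    // and submits it to apply_message together with the cross-net message.
    let agg = aggregate(&sigs).expect("non-empty signature set");

    // The gateway verifies the aggregate against the known validator keys.
    let pks: Vec<_> = keys.iter().map(|k| k.public_key()).collect();
    let payload_refs: Vec<&[u8]> = payloads.iter().map(|p| p.as_slice()).collect();
    assert!(verify_messages(&agg, &payload_refs, &pks));
}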

Related

Related issue in the IPC agent: https://github.com/consensus-shipyard/ipc-agent/issues/39. We can either re-use the same topic to broadcast signed proposals, or use an independent topic.

While exploring this solution I came across ERC-4337. Our problem seems a bit narrower than the one tackled by the ERC, but we end up with a similar solution.

aakoshh commented Mar 10, 2023

For every unverified message seen with the subnet-id as its destination, the IPC agent of a validator will publish a message in the aforementioned gossipsub topic with its signature over the CID of the message. Validators only produce and broadcast this signature if they consider the message valid for execution in the subnet (i.e. it is valid and final in its originating subnet).

To rephrase this, what we are saying is that the validators in the child subnet C identified by subnet-id observe a top-down message in the parent subnet P by running a full node on P and having their agent subscribe to it. When an agent run by validator V on subnet C thinks a certain message M identified by CID(M) is final on P, it publishes a message (V, CID(M), Sign(V, CID(M))) to the /ipc/cross-msg/<subnet-id> topic.

If a validator doesn't recognise CID(M) in the message, they can discard it immediately. It would mean that they have either fallen way behind the others, that they are on a different fork, or that the CID is adversarial. Alternatively, the message could include some location identifier on the source blockchain, to make it easier to find. In any case, it would be up to the other, hopefully in-sync, agents to collect the signatures.

The signatures will have to be periodically re-published until the agent sees that the message has been included in the child subnet. That's because with Gossipsub the agent can never be sure who got their signature: at least in theory it is possible that other agents weren't connected at the time, that a transaction doesn't gather enough signatures at the right time, or that previous signatures are forgotten due to restarts. Also, because of the previous point, an agent might simply not recognise a CID at the time it gets it. A more transparent and persistent solution would be to send the individual signatures supporting a top-down message to the ledger. That way an agent could always see that theirs is missing, and add it.
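
As a sketch of that re-publish loop on the agent side (every helper here is a hypothetical placeholder for the agent's actual logic):

use std::{thread, time::Duration};

fn included_in_child(_msg_cid: &[u8]) -> bool { false /* query the child chain */ }
fn publish_vote(_msg_cid: &[u8]) { /* gossip (V, CID(M), Sign(V, CID(M))) */ }

fn republish_until_included(msg_cid: &[u8], interval: Duration) {
    // Gossipsub gives no delivery guarantee, so keep re-publishing the
    // signature until the message is observed as executed in the child.
    while !included_in_child(msg_cid) {
        publish_vote(msg_cid);
        thread::sleep(interval);
    }
}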

aakoshh commented Mar 13, 2023

I would suggest that the top-down messages are also bundled, like the checkpoints, something like: top_down_msgs: HashMap<SubnetID, Cid<Vec<Msgs>>> in each block. This way subnet validators don't have to vote on each separate message.
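
A sketch of what that per-block bundle could look like as a type (purely illustrative; SubnetID and Cid below are stand-ins for the real types):

use std::collections::HashMap;

type SubnetID = String; // stand-in for the real subnet identifier
type Cid = Vec<u8>;     // resolves to the batched Vec<Msgs> via the IPLD resolver

// One bundle per destination subnet per block, so validators vote on the
// batch CID instead of voting on each individual top-down message.
struct BlockTopDownMsgs {
    top_down_msgs: HashMap<SubnetID, Cid>,
}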

@ranchalp

Regarding bundling messages, and as a solution to the multisig protocol, perhaps the top-down messages that are locally seen as valid by the IPC agent at the parent could be sent directly to a Trantor mempool abstraction at the child that receives requests from Lotus but also from the IPC agent (not sure if such an abstraction already exists in Spacenet). The availability module of Trantor already tries to get a multisig for the block anyway, so it should be easy to implement with the mempool abstraction.

aakoshh commented Mar 13, 2023

@ranchalp that sounds similar to how I want to use Tendermint's voting mechanism to agree on when a top-down message can be included in a block, but better, because you don't have to put it on the critical path. Can you modify the availability voting to only cast a vote when the message is observed as final, not just when it's available?

ranchalp commented Mar 13, 2023

@aakoshh That's what I meant by the mempool abstraction. The mempool should provide those messages only when they are seen as final at the parent (or the IPC agent should not even provide them to the mempool until they are final), and for verification it should be easy to abstract a Trantor event to call the mempool (or IPC directly from the availability module). Happy to work on this, but I would need a bit of onboarding (or just a pointer to some tutorial/PR/relevant code) to understand how top-down messages currently work in the code.

@adlrocha

I would suggest that the top-down messages are also bundled, like the checkpoints, something like: top_down_msgs: HashMap<SubnetID, Cid<Vec<Msgs>>> in each block. This way subnet validators don't have to vote on each separate message.

This would be great; the problem I personally have with this is deciding the logic to bundle the messages together. For bottom-up messages it is clear because the checkpoint does the cutoff, but for top-down messages there is no clear cutoff.

aakoshh commented Mar 13, 2023

@ranchalp it sounds like the mempool in this case doesn't do any gossiping, unlike the Lotus mempool for example, right?

It's okay to provide the transaction to the mempool when it's final, or for the mempool not to recommend it for inclusion in a block until it's final; we just have to make sure that an adversarial validator cannot do anything with it either until it's final. If the availability voting can be made to work that way, great!

@adlrocha I would have thought bundling can happen on a one-bundle-per-block basis. Probably not going to achieve much reduction in the number of messages, but still.

@ranchalp

No, Trantor's current simplemempool does not do any gossiping, which IIUC makes it ideal for this. And yes, the existing availability module with verification based on a local check of finality should suffice.

ranchalp commented Mar 13, 2023

This would be great; the problem I personally have with this is deciding the logic to bundle the messages together. For bottom-up messages it is clear because the checkpoint does the cutoff, but for top-down messages there is no clear cutoff.

This is the beauty of using Trantor's availability module: batching will happen out of the box.

@adlrocha

Update for the execution of top-down messages in M2

This is the outcome after syncing with @matejpavlovic and @hmoniz. This isn't, by any means, the best solution, but it is what we think we can have for M2 considering the tight deadline.

To avoid the need for implicit execution or ad-hoc consensus checks, the idea is for the IPC agents of validators to periodically submit to the child gateway a "top-down checkpoint" (which we can call e.g. a cron checkpoint, to differentiate it from bottom-up checkpoints). These top-down/cron checkpoints are only submitted by a validator in the child subnet when it sees the epoch for the checkpoint as final in the parent, and they include information about the latest membership in that epoch and the list of top-down messages queued for execution since the last cron checkpoint. When the gateway receives 2f+1 of these checkpoints it triggers the execution of the top-down messages. This could also potentially be leveraged by Mir's reconfiguration, removing the need for sequenced membership sets (as we have now).

Implementation

The implementation has two parts:

Actors

  • A new configuration parameter for subnets, cron_period, that determines how often cron checkpoints will be submitted by validators in the child subnet. This configuration is chosen when creating the subnet and propagated to the gateway of the child subnet when it is created in genesis. This deprecates the need for the finality_threshold parameter.
  • We also need a new genesisEpoch set in the gateway in the subnet's genesis to track the epoch when the subnet started to exist and from which we will start submitting new cron checkpoints.
  • A new submit_cron method in the gateway actor that receives a CronCheckpoint with the information mentioned above. A cron checkpoint for a cron_epoch is only accepted once 2f+1 validators in the child have sent identical cron checkpoints for it. Once a cron checkpoint is accepted, all the top-down messages in it are immediately executed.
pub struct CronCheckpoint {
    pub epoch: ChainEpoch,
    pub membership: MembershipSet,
    pub top_down_msgs: Vec<StorableMsg>,
}
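
As a hypothetical sketch of the vote tally inside submit_cron (the real actor state and types will differ):

use std::collections::HashMap;

type ValidatorId = u64; // stand-in for the validator's address

// Hash of a submitted CronCheckpoint -> validators that submitted that exact checkpoint.
struct CronVotes {
    votes: HashMap<Vec<u8>, Vec<ValidatorId>>,
}

// Records a vote and reports whether the 2f+1 quorum for this checkpoint is reached,
// at which point the gateway executes all top-down messages in the checkpoint.
fn submit_cron(state: &mut CronVotes, from: ValidatorId, checkpoint_hash: Vec<u8>, quorum: usize) -> bool {
    let voters = state.votes.entry(checkpoint_hash).or_default();
    if !voters.contains(&from) {
        voters.push(from);
    }
    voters.len() >= quorum
}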

Cross-message orchestrator

The orchestrator is a process running in the IPC agent that tracks the state of the parent and submits cron checkpoints (a sketch of its loop follows the list below).

  • It periodically waits for the next cron epoch to be final in the parent.
  • It picks up the latest membership in that epoch and the new top-down messages queued for execution since the last cron checkpoint submitted.
  • It submits the new cron checkpoint to the child subnet.
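
In sketch form, the loop could look like this (every helper below is a hypothetical placeholder for the agent's actual parent-syncing and submission logic):

fn wait_for_parent_finality(_epoch: u64) { /* poll the parent full node */ }
fn parent_membership_at(_epoch: u64) -> Vec<u64> { vec![] }
fn queued_top_down_msgs_since(_epoch: u64) -> Vec<Vec<u8>> { vec![] }
fn submit_cron_checkpoint(_epoch: u64, _membership: Vec<u64>, _msgs: Vec<Vec<u8>>) {}

fn run_orchestrator(cron_period: u64, genesis_epoch: u64) {
    let mut next_epoch = genesis_epoch + cron_period;
    loop {
        // Wait until the next cron epoch is final in the parent.
        wait_for_parent_finality(next_epoch);
        // Latest membership at that epoch, plus the top-down messages
        // queued since the last cron checkpoint.
        let membership = parent_membership_at(next_epoch);
        let msgs = queued_top_down_msgs_since(next_epoch - cron_period);
        // Submit the cron checkpoint to the child subnet's gateway.
        submit_cron_checkpoint(next_epoch, membership, msgs);
        next_epoch += cron_period;
    }
}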

@matejpavlovic

If I understand it correctly, it's basically on-chain voting by the child validators for the last observed parent state and the top-down transactions to execute. This is exactly what we currently use for reconfiguration. I think it should work; just make sure you catch all the corner cases where the votes arrive at the child in an arbitrary order (especially "older" cron checkpoints after a "newer" one).

Also note that this can probably be implemented in the integration code (i.e. "between" Eudico and Mir, like the reconfiguration), in case that happens to be easier to do. You can look at @dnkolegov's implementation of child reconfiguration if you choose that path.

adlrocha commented Mar 14, 2023

just make sure you catch all the corner cases where the votes arrive at the child in an arbitrary order (especially "older" cron checkpoints after a "newer" one).

This shouldn't be a problem: all votes are cached and are not garbage-collected until they are executed, and top-down messages need to be executed in order, which means that the order of the cron checkpoints is also preserved.

Also note that this can probably be implemented in the integration code (i.e. "between" Eudico and Mir, like the reconfiguration), in case that happens to be easier to do. You can look at @dnkolegov's implementation of child reconfiguration if you choose that path.

Unfortunately, this is not the case, as for this to work we would require the use of implicit execution (back to the original problem).
