Relay monitoring & preventing continued relay errors #142

Open
metachris opened this issue Jun 9, 2022 · 12 comments

@metachris
Collaborator

metachris commented Jun 9, 2022

Once a proposer calls submitBlindedBlock to a relay (with a signed header), it depends on the relay to release the block to be able to propose anything (no fallback to a local block is possible at that point due to possible slashing).

There are several relay error scenarios:

  1. payload withholding (relay doesn't release the payload and the proposer needs to forfeit the slot)
  2. incorrect payload
    a. incorrect value (the final amount paid by the builder to the proposer was different to the amount claimed in the BuilderBid)
    b. invalid block (invalid data / fields)

Question: How can we shield proposers from faulty relays, and how to prevent continuous slots with errors due to faulty relay behaviour?

A possible solution is a monitoring service run by a trusted third-party, which we can call Relay Monitor (RM).

  1. Whenever mev-boost calls submitBlindedBlock to a relay, it also sends a request to the RM, including the SignedBuilderBid, the relay it originated from, and the submitBlindedBlock body.
  2. The RM will also request the payload from the relay
  3. Thus the RM can check
    a. whether the payload is withheld
    b. whether the block matches the bid
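The checks in step 3 could be sketched roughly as follows. This is only an illustration: `Bid`, `Payload`, and `check_relay` are hypothetical, simplified stand-ins (not the actual builder API types), and real block validation would need far more than a hash comparison.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RelayFault(Enum):
    NONE = "none"
    PAYLOAD_WITHHELD = "payload_withheld"
    INCORRECT_VALUE = "incorrect_value"
    INVALID_BLOCK = "invalid_block"


@dataclass(frozen=True)
class Bid:
    # Hypothetical subset of the SignedBuilderBid fields the RM would need.
    block_hash: str
    value_wei: int


@dataclass(frozen=True)
class Payload:
    # Hypothetical subset of the payload released by the relay.
    block_hash: str
    paid_to_proposer_wei: int


def check_relay(bid: Bid, payload: Optional[Payload]) -> RelayFault:
    """Classify a relay's behaviour for one slot from the RM's point of view."""
    if payload is None:
        # The relay did not release the payload at all.
        return RelayFault.PAYLOAD_WITHHELD
    if payload.block_hash != bid.block_hash:
        # Assumption: a hash mismatch stands in for "invalid block" here.
        return RelayFault.INVALID_BLOCK
    if payload.paid_to_proposer_wei != bid.value_wei:
        # The amount paid differs from the amount claimed in the bid.
        return RelayFault.INCORRECT_VALUE
    return RelayFault.NONE
```

How the RM learns the amount actually paid to the proposer is left open here; the sketch simply assumes it is available alongside the payload.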

If there is any problem, the relay's scoring/reputation is updated in the RM and propagated to all connected proposers (by mev-boost polling the relay status endpoint, maybe also push as an option). If any relay behaves incorrectly, all connected proposers can ignore it for some time (reputation mechanism TBD).
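On the mev-boost side, acting on the polled status could look something like the sketch below. The shape of the status data (a map from relay URL to an "ignore until" timestamp) and the function name are assumptions for illustration; the RM's real status format is not defined in this issue.

```python
from typing import Dict, List


def usable_relays(
    configured: List[str],
    ignore_until: Dict[str, float],
    now: float,
) -> List[str]:
    """Return the configured relays that the RM does not currently flag.

    ignore_until maps a relay URL to the Unix time until which proposers
    should ignore it (hypothetical format). Relays the RM has never
    reported on are treated as usable.
    """
    return [relay for relay in configured if ignore_until.get(relay, 0.0) <= now]
```

For example, with `ignore_until = {"https://relay-a.example": 200.0}` and `now=100.0`, only the other configured relays would be queried for bids; once `now` reaches `200.0`, relay A becomes usable again.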

This (centralised) service can be put into production quickly, and can mitigate a range of issues resulting from faulty relays. It should be run by a trusted party, and could be replaced in the mid- to longer term with a more decentralized/trustless solution.

TBD:

  • Reputation mechanics: what exactly happens on a single instance of any of the errors?
  • Who should run a relay monitor, and how many instances are the sweet spot? There's an argument for having a small number, because (a) the more proposers connect to it the more it knows about relay issues, and (b) it has a lot of "power" in that it can blacklist relays.

Tl;dr: A relay monitor could observe any relay problem a validator experiences, and can tell all the other connected validators about problems with a specific relay. Thus, if a relay causes a problem with one validator, all the other connected validators would immediately know, and could avoid that relay for some time (or whatever mechanic).

@metachris
Collaborator Author

metachris commented Jun 10, 2022

Here's a rough diagram outlining the setup (src):

[Image: Untitled-2022-06-10-0925]

@lightclient
Collaborator

Not sure if you saw this doc by Yoav, but I think he has a nice outline for what should be done to keep relays "honest": https://notes.ethereum.org/@yoav/BJeOQ8rI5

A few general thoughts here:

  • I think the RM should really be multiple actors, otherwise we're not all that much better off than with a relay. Just trusting a different person.
  • Reputation mechanics -- I think withholding should pretty much be insta-ban. Not sure about invalid block, but also feels like a pretty bad fault.

@metachris
Collaborator Author

https://notes.ethereum.org/@yoav/BJeOQ8rI5

I think the RM should really be multiple actors, otherwise we're not all that much better off than with a relay. Just trusting a different person.

Good link, it states the problem clearly and hints at a solution based on a committee. It's not yet clear how such a committee would work.

A decentralized setup would definitely be great, and I can see that as a possible next step. It does seem to first require a bunch of work on specification, research and prototyping, to explore the consensus protocol, committee duties and repercussions for malicious behavior.

Reputation mechanics -- I think withholding should pretty much be insta-ban. Not sure about invalid block, but also feels like a pretty bad fault.

Withholding once could actually be a networking issue, so I don't think that should be an instant permaban. Maybe banning for a few hours at first would suffice, with increasing penalties for repeated offenses 🤔
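One way to picture "a few hours at first, then increasing" is a doubling ban with a cap. The concrete numbers below (3 hours base, 30 days cap) and the function name are placeholders I made up, not a proposal for actual values:

```python
from datetime import timedelta


def ban_duration(
    offense_count: int,
    base: timedelta = timedelta(hours=3),  # placeholder value
    cap: timedelta = timedelta(days=30),   # placeholder value
) -> timedelta:
    """Ban length for a relay's n-th offense: doubles each time, up to cap."""
    if offense_count < 1:
        return timedelta(0)
    duration = min(base, cap)
    # Double once per repeated offense, never exceeding the cap.
    for _ in range(offense_count - 1):
        duration = min(duration * 2, cap)
    return duration
```

With these placeholder values a first offense gives 3 hours, a second 6 hours, a third 12 hours, and so on until the cap is reached.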

metachris added the brainstorming label Jun 25, 2022
@terencechain
Collaborator

Could a relay send different responses to the RM than to mev-boost? I don't see why a relay would do this, but just a thought

@metachris
Collaborator Author

metachris commented Jul 1, 2022

I don't think a relay has a reliable way to distinguish mev-boost from the monitor 🤔 The payload is the same, although maybe the request profile over time is different...

@MicahZoltu

I would like to see one minor change, which is that the proposer node can connect to multiple monitors (not just one), and the monitors can connect and talk to each other. While a full gossip network would be ideal, just having a hub-and-spoke (decentralized) system where clients can connect to multiple hubs is probably "good enough" to buy us time until a more complete gossip network can be set up and secured.
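As a rough sketch of the client side of this, a proposer connected to several monitors could combine their reports with a simple threshold. The report format and the `threshold` parameter are assumptions for illustration only; how monitors talk to each other is not covered.

```python
from collections import Counter
from typing import Dict, Set


def flagged_relays(reports: Dict[str, Set[str]], threshold: int) -> Set[str]:
    """Relays flagged as faulty by at least `threshold` monitors.

    reports maps a monitor name to the set of relay URLs it currently
    flags (hypothetical format).
    """
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    counts = Counter(relay for flagged in reports.values() for relay in flagged)
    return {relay for relay, votes in counts.items() if votes >= threshold}
```

A threshold of 1 means any single monitor can get a relay ignored, while a higher threshold limits the "power" of any one monitor at the cost of reacting more slowly.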

kailinr added the relay label Aug 1, 2022
come-maiz added this to the safer-merge milestone Aug 2, 2022
@ralexstokes
Collaborator

I don't think a relay has a reliable way to distinguish mev-boost from the monitor 🤔 The payload is the same, although maybe the request profile over time is different...

esp if monitors are "trusted third parties" then they will be well-known entities with fairly fixed IPs, relays could definitely use this to discriminate responses although I don't see how this could be gamed right now

@metachris
Collaborator Author

Linking the current design doc by @ralexstokes: https://hackmd.io/@ralexstokes/SynPJN_pq

@sambacha

sambacha commented Aug 28, 2022

For the SecureRpc Relay we are operating, here are some metrics/KPIs that are collected and some diagrams (out of date, but should be slightly helpful)

@kailinr
Contributor

kailinr commented Sep 6, 2022

Linking Alex's relay monitor implementation

@MoeMahhouk

MoeMahhouk commented Sep 7, 2023

Please correct me if I'm wrong, but wouldn't the aforementioned issues be tackled by applying a BFT system to the proposer <-> relay interaction?
I.e., since the error scenarios stated here show that a relay can behave in a byzantine manner, maybe a BFT system where a proposer sends the request to (3f+1) relay nodes and retrieves the correct answer based on consensus would avoid the issues stated above.
BFT would ensure strong consistency regarding the latest payloads and correct values, compared to the eventual consistency of a gossip-network consensus model. Furthermore, the trust model of a BFT system is stricter than that of a gossip network.
On the other hand, the communication overhead of a gossip network is lower than that of BFT.

In both cases, the caveat is that it would possibly increase complexity and degrade performance as a tradeoff.
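To make the (3f+1) arithmetic concrete, here is a minimal sketch of the client side only. It assumes every queried relay is expected to return the same payload hash for the request, which the current relay model does not guarantee, and it borrows the usual BFT client rule of accepting a value once f+1 matching replies arrive (so at least one comes from an honest node). Function names are hypothetical.

```python
from collections import Counter
from typing import List, Optional


def max_faulty(n: int) -> int:
    """Largest f such that n >= 3f + 1."""
    return max((n - 1) // 3, 0)


def agreed_response(responses: List[Optional[str]], n: int) -> Optional[str]:
    """Return the payload hash backed by at least f+1 of the n relays, if any.

    responses holds one entry per relay that was queried; None means the
    relay did not answer (e.g. it withheld the payload).
    """
    f = max_faulty(n)
    counts = Counter(r for r in responses if r is not None)
    if not counts:
        return None
    value, votes = counts.most_common(1)[0]
    return value if votes >= f + 1 else None
```

With 4 relays (f = 1), two matching answers are enough to accept a payload hash, while a single answer or a split between different hashes is not.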

PS. please excuse my limited knowledge in BFT/Gossip systems in case what I mentioned above is incorrect.

@ralexstokes
Collaborator

this is an interesting avenue for exploration; however, the prevailing model is that relays are not guaranteed to share any of their bids/data so having a BFT style approach across disparate actors doesn't really make sense...

there has been a thread we have been dancing around on the mev-boost community calls for some time that points towards a different model where independent entities do just run some kind of "relay" node and then builders are expected to publish to all of them, e.g. over some kind of gossip net -- and in this case we could imagine some kind of consensus over the "bid pool" that reduces room for byzantine behavior

that being said, the "live" pathways of the relay are incredibly latency sensitive so unless the consensus process really brought substantial benefits I think it would be hard to get adoption

and I think we'd also need to move towards more of an optimistic regime, see something like v3 here: https://github.com/michaelneuder/optimistic-relay-documentation/blob/main/towards-epbs.md
