Event bus all-the-things #79

Closed

pipermerriam opened this issue Nov 20, 2018 · 7 comments

@pipermerriam
Member

What is wrong?

Our use of the proxy object pattern has had continued fallout and unintended consequences. Specific cases I'm thinking of are:

  • #1299 (Quickly discover sparse peers with particular subprotocol) has been difficult due to issues with pickle-ability.
  • Weird APIs where we are kind-of interacting with a chain object, but not really since it's actually over a proxy object.
  • Weird APIs where we wrap the non-async Chain methods to make them async.
  • Issues with the JSON-RPC server and it needing direct access to the Chain object which produces awkward coupling/dependencies

How can it be fixed?

I think we need to go ahead and get everything over onto an explicit event bus pattern. Please discuss.

@pipermerriam
Member Author

Here's my top-of-the-head plan.

  1. Database process
  • restrict to just the base db
  • server component using get/set/exists/delete events.
  • client abstraction hides the events internally, re-exposing our standard base db API (see the sketch after this list).
  2. Chain process
  • exposes Chain and ChainDB APIs
  • server component using events for the Chain APIs that we use.
  • client component can be high level, exposing these APIs using the same Chain interface but using the event system under the hood.
  3. Networking process
  • exposes PeerPool and Peer APIs over the event bus
  • probably want to do the decoupling of Protocol and Peer as a prerequisite to this.
  • experiment with a high-level API for interacting with peers from a client perspective that wraps the event system.
  4. Ethereum Client Process
  • all of the synchronization code.
  • should largely just be a client to the other processes
  • needs to expose a few server-type APIs, primarily for consumption by the JSON-RPC server.
  5. JSON-RPC server process
  • should largely just be a client to the other processes
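Here's a minimal sketch of what the database piece (step 1) could look like. The event classes, the DBServer / EventBusDB names, and the subscribe / request endpoint methods are illustrative assumptions, not the actual lahja or trinity APIs:

```python
from dataclasses import dataclass


# Hypothetical request/response events for the database process.
@dataclass
class GetRequest:
    key: bytes


@dataclass
class GetResponse:
    value: bytes


@dataclass
class SetRequest:
    key: bytes
    value: bytes


class DBServer:
    """Runs inside the database process and answers db events against the
    one real base db instance."""

    def __init__(self, endpoint, db):
        self.db = db
        # `subscribe` stands in for whatever handler registration the event
        # bus ends up offering; a handler's return value is assumed to be
        # routed back to the requesting endpoint.
        endpoint.subscribe(GetRequest, self.handle_get)
        endpoint.subscribe(SetRequest, self.handle_set)

    async def handle_get(self, event: GetRequest) -> GetResponse:
        return GetResponse(self.db[event.key])

    async def handle_set(self, event: SetRequest) -> None:
        self.db[event.key] = event.value


class EventBusDB:
    """Client-side abstraction that hides the events and re-exposes the
    standard base db API."""

    def __init__(self, endpoint):
        self.endpoint = endpoint

    async def get(self, key: bytes) -> bytes:
        # `request` is assumed to do a round trip and return the response event.
        response = await self.endpoint.request(GetRequest(key))
        return response.value

    async def set(self, key: bytes, value: bytes) -> None:
        await self.endpoint.request(SetRequest(key, value))
```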

This pattern gives us things like:

  • stronger guarantees about not blocking the networking process (and thus not timing out our peers).
  • a sustainable architecture for future plans to run any/all of beacon chain, mainnet, shards, swarm, and pss concurrently.
  • explicitness about what crosses process boundaries.

@cburgdorf
Contributor

Database process
server component using get/set/exists/delete events.

I read that as you want one db process and route every db access through the event bus. If that is what you mean then I'm currently 👎 on that because I believe this will have a negative impact on performance. I believe db access needs to be as fast as it can be and funneling that through the event bus will work against that goal.

I think the other route, using an alternative db (e.g. RocksDB) that allows multiple processes to read simultaneously is more promising.
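For reference, the multi-reader idea boils down to something like the following, assuming the python-rocksdb bindings and their read_only open flag (treat the exact flag, paths, and keys as assumptions to verify against whichever bindings we actually use):

```python
import rocksdb  # assumes the python-rocksdb bindings are installed

# Writer process: the single process that owns write access.
writer_db = rocksdb.DB("chain.db", rocksdb.Options(create_if_missing=True))
writer_db.put(b"canonical_head", b"\x00" * 32)

# Reader process (e.g. the JSON-RPC server): opens the same database
# read-only and can serve queries without crossing the event bus at all.
# Note: a plain read-only instance sees the data as of open time; picking up
# new writes requires a reopen (or RocksDB's secondary-instance mode, which
# these bindings may not expose).
reader_db = rocksdb.DB("chain.db", rocksdb.Options(), read_only=True)
assert reader_db.get(b"canonical_head") is not None
```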

Ethereum Client Process

I see this process mostly as the main process that would host the main EventBus. I believe that most if not all of the remaining logic can be implemented as plugins.

all of the synchronization code.

I think synchronization should become a plugin that runs in an isolated process and exposes several sync-related events on the event bus (e.g. NewChainTipEvent etc.). In fact, there should probably be different plugins for light sync, full sync, etc. I think we should have building blocks in the main code base, but then plugins that run in their own isolated processes can be built out of them. This would also allow better experimentation with different syncing strategies (e.g. if people want to plug in a warp sync, they can) (related: https://github.com/ethereum/py-evm/issues/1020#issuecomment-405853665).
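For illustration, such a sync event and plugin skeleton might look roughly like this; the NewChainTipEvent fields, the FullSyncPlugin class, and the broadcast coroutine are assumptions rather than trinity's or lahja's actual APIs:

```python
class NewChainTipEvent:
    """Broadcast by a sync plugin whenever it imports a new canonical head."""

    def __init__(self, block_hash: bytes, block_number: int) -> None:
        self.block_hash = block_hash
        self.block_number = block_number


class FullSyncPlugin:
    """Illustrative plugin skeleton running in its own isolated process."""

    def __init__(self, endpoint):
        self.endpoint = endpoint

    async def on_new_canonical_head(self, block) -> None:
        # Other processes (JSON-RPC, monitoring, an alternative syncer, ...)
        # can subscribe to this event without knowing anything about the sync
        # implementation that produced it.
        await self.endpoint.broadcast(
            NewChainTipEvent(block.hash, block.number)
        )
```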

JSON-RPC server process

That's already running in its own process as an isolated plugin. Did you just mean to list it as a process?


Also, I want to raise awareness that we do not have performance metrics for the EventBus under high load yet. I believe that the current implementation does not handle high-frequency traffic well enough just yet. And for some things, like routing all db access through it, I'm highly skeptical that it could ever be a suitable option no matter how much we optimize it.
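To make "high load" measurable, a micro-benchmark along these lines gives a baseline for raw cross-process event throughput; the multiprocessing queue is just a stand-in for the event bus transport, not lahja itself:

```python
import multiprocessing
import time


def consumer(queue, total):
    # Drain `total` events and report throughput.
    start = time.perf_counter()
    for _ in range(total):
        queue.get()
    elapsed = time.perf_counter() - start
    print(f"{total} events in {elapsed:.2f}s ({total / elapsed:.0f} events/s)")


if __name__ == "__main__":
    total = 100_000
    queue = multiprocessing.Queue()
    proc = multiprocessing.Process(target=consumer, args=(queue, total))
    proc.start()
    # Roughly db-sized payloads: a 32-byte key and a small value.
    payload = (b"\x00" * 32, b"\x00" * 128)
    for _ in range(total):
        queue.put(payload)
    proc.join()
```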

@pipermerriam
Member Author

Regarding the database issue,

Agreed that we can and should make whatever compromises are necessary to keep performance numbers high, but it's worth pointing out that we already route all database access over a multiprocessing boundary. Whether the existing mechanism and the event bus have measurably different performance footprints is something we would want to check. My understanding is that in the current lahja model everything flows through a central bus, meaning two hops. Maybe we need to establish an abstraction for direct connections, so that database interactions can go over a direct connection with only a single hop.

@cburgdorf
Contributor

Maybe we need to establish an abstraction for direct connections to allow the database interactions to be direct connections with only a single hop.

In fact, having the event bus work without that central coordination layer was on my radar as well:
ethereum/lahja#1

That said, if there are multiple consumers for an event (which the process producing the event, in general, should not even have knowledge about), then from the perspective of the broadcasting process it is more efficient to make one hop to the central coordinator and let the coordinator fan the event out to the various consuming endpoints.

But without getting lost in the specific details, there's a lot we can optimize for efficiency. I just wanted to raise awareness that it may not be that efficient yet, or at least that we don't know.

I honestly think we should iterate on the RocksDB work you've been doing. Did that already make use of the fact that multiple processes could read simultaneously? Or did that just change the db without exploiting that feature yet?

@pipermerriam
Member Author

Just had a thought.

If the interactions between synchronization and peers are changed to be event-bus based, then it gets a lot easier to mock out the peer side of things. This extends to a bunch of other things as well. Maybe the higher-level point here is that the event bus can act as a way to decouple our various sections of functionality from their implementations.

I think this has some positive and negative implications but I wanted to drop this here to make note of it.
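As a concrete example of that upside, a test could stand in for the entire networking process by answering peer events directly on the bus; the event names, the FakePeerPool test double, and the subscribe interface here are hypothetical:

```python
# Hypothetical events the syncer would normally send to the networking process.
class GetConnectedPeersRequest:
    pass


class GetConnectedPeersResponse:
    def __init__(self, peers):
        self.peers = peers


class FakePeerPool:
    """Test double that replaces the networking process on the event bus."""

    def __init__(self, endpoint, peers):
        self.peers = peers
        endpoint.subscribe(GetConnectedPeersRequest, self.handle_request)

    async def handle_request(self, event):
        return GetConnectedPeersResponse(self.peers)


# In a test, the syncer under test talks to the same endpoint and never needs
# to know that responses come from FakePeerPool rather than a real PeerPool.
```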

@cburgdorf
Contributor

@pipermerriam yes, that is absolutely true and one of the main upsides of the event bus pattern in general. You get very loose coupling: events can become your main interface, and it becomes easy to swap out functionality for different implementations.

The downside to this is complexity and "fragility" (not sure if that's the best word to describe it), which can be mitigated with more tests.

I'm a fan of the pattern and have used it in many code bases. Coincidentally, I learned today that Embark (a Truffle competitor by Status) is using the same pattern for the same reasons (flexibility, plugin all the things).

@cburgdorf
Contributor

Just cross-linking that I've started to put a benchmark together, and so far it has revealed interesting things such as dropped events.
