Refactor consensus engine handling as well as clique internals #1899
Conversation
```diff
@@ -59,27 +58,28 @@ class SnapshotManager:
     """

     logger = get_extended_debug_logger('eth.consensus.clique.snapshot_manager.SnapshotManager')
     # TODO: How do we get around this? It is way too slow without cached snapshots.
     _snapshots = lru.LRU(IN_MEMORY_SNAPSHOTS)
```
@carver I've hit another road block here with the stateless ConsensusAPI concept. I'm sorry I didn't notice this earlier. Here it goes:

So, we were able to get rid of the HeaderCache because it could be replaced by going from validate_seal(header) to validate_seal_extension(header, parents). I should also note that other clients don't necessarily have a header cache in their Clique implementation and that this was just me trying to make things work with just validate_seal, which only takes a single header.

But the snapshot cache is something different. Every Clique implementation out there has this (afaik) and I believe it is at the very heart of Clique, so I'm not sure how we would get around it. It basically means: if we don't cache snapshots, then for every single header that we validate, we have to load all previous headers from the database until we hit the most recent checkpoint header (every 30,000 headers for Görli). As you can imagine, this is super, super slow.

As a band-aid fix I moved this to the class level, but that obviously isn't an acceptable solution. I can't help but think we have to move at least parts of this back to the chain level, unless you have other clever ideas that I might not be seeing.
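To make the cost concrete, here is a toy sketch (all names are hypothetical illustrations, not the actual py-evm code): without a cached snapshot, every validated header triggers a walk back through parent headers to the nearest checkpoint, while a cache short-circuits that walk after the first hit.

```python
from functools import lru_cache

EPOCH_LENGTH = 30000  # Görli has a checkpoint header every 30,000 blocks


def headers_to_load_without_cache(block_number):
    # Hypothetical stand-in for walking parent headers back through
    # the database until the most recent checkpoint header is reached.
    checkpoint = (block_number // EPOCH_LENGTH) * EPOCH_LENGTH
    return block_number - checkpoint


@lru_cache(maxsize=128)
def snapshot_at(block_number):
    # With a cache, the expensive walk happens at most once per block
    # number; without it, every validated header repeats the full walk.
    return headers_to_load_without_cache(block_number)
```

In the worst case (a header just below a checkpoint) the uncached walk touches nearly 30,000 headers, which is why every Clique implementation keeps this cache.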
I want to note that if you feel like playing with this (you can run it with ethereum/trinity#1196 as simply as trinity --goerli --disable-tx-pool), then I'm happy if you take this in any direction that gets us closer to a merge. I'm not trying to offload this to you, just saying that if you have ideas, you should not feel obligated to discuss them first if you think you could just take a stab at it and finish it up. I basically just want to see this wrapped up and out of my way 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, okay. Silly question: what if we persisted every snapshot?
An alternative that comes to mind is having the chain pass in a volatile data store. Much like how it passes in the DB, but with the expectation that everything in the volatile store will be dropped on shutdown. (Also, that the VMs need to manage cross-fork incompatibility across the volatile store themselves, like they need to do with the DB already)
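A bare-bones sketch of such a volatile store (VolatileStore is a hypothetical name; nothing here is the actual py-evm API): the chain would pass it to VMs alongside the persistent DB, with no durability guarantees.

```python
class VolatileStore:
    # Hypothetical in-memory key/value store that the chain would pass
    # to VMs alongside the persistent DB; its contents carry across VM
    # runs but are dropped entirely on shutdown.
    def __init__(self):
        self._data = {}

    def __setitem__(self, key, value):
        self._data[key] = value

    def __getitem__(self, key):
        return self._data[key]

    def __contains__(self, key):
        return key in self._data

    def clear(self):
        # e.g. could be called when crossing a fork boundary
        self._data.clear()
```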
> what if we persisted every snapshot?

Multiple problems, I think:
- Lots of overhead in snapshot creation
- Still one db lookup per block to get the snapshot
- Lots of wasted disk space that we need to clean up again

> Much like how it passes in the DB, but with the expectation that everything in the volatile store will be dropped on shutdown.

Yes, dropping the data on shutdown won't be a problem; I think even dropping it between forks wouldn't be a problem at all. It would just be one header after the fork that causes us to suck in 30,000 headers from the db until we are back to normal.

So, yeah, that would work, but I think it cannot be just some predefined storage, because e.g. in this case we want to set up an LRU cache for the snapshots, and in other cases we might need some different kind of storage or eviction policy.
But actually, this brought me to an idea and I took a stab at it and it seems to work. Here's a rough outline, and then I'll continue with some inline comments.
Instead of the ConsensusAPI taking in the db, it will take in a ConsensusContextAPI.
```python
class ConsensusAPI(ABC):

    @abstractmethod
    def __init__(self, context: ConsensusContextAPI) -> None:
        """
        Initialize the consensus api.
        """
        ...
```
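For illustration, here is a minimal sketch of how a concrete engine could keep per-chain state on such a context instead of at class level (ConsensusContextAPI is reduced to a bare db wrapper here, and CliqueLikeConsensus is a made-up name, not the actual Clique implementation):

```python
from abc import ABC, abstractmethod


class ConsensusContextAPI:
    # Hypothetical minimal context: it just wraps a db handle that the
    # chain hands over once at initialization.
    def __init__(self, db):
        self.db = db


class ConsensusAPI(ABC):
    @abstractmethod
    def __init__(self, context: ConsensusContextAPI) -> None:
        ...


class CliqueLikeConsensus(ConsensusAPI):
    # Per-chain state (e.g. a snapshot cache) lives on the instance
    # built from the context, not in a class-level attribute.
    def __init__(self, context: ConsensusContextAPI) -> None:
        self.db = context.db
        self.snapshots = {}  # a real engine would use an LRU here
```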
We set the consensus_context_class on the chain. And just like the ChainDB, the consensus_context is set once when the chain is initialized.
```python
class Chain(BaseChain):
    ...
    chaindb_class: Type[ChainDatabaseAPI] = ChainDB
    consensus_context_class: Type[ConsensusContextAPI] = ConsensusContext

    def __init__(self, base_db: AtomicDatabaseAPI) -> None:
        ...
        self.chaindb = self.get_chaindb_class()(base_db)
        self.consensus_context = self.consensus_context_class(self.chaindb.db)
        self.headerdb = HeaderDB(base_db)
```
Then later, in get_vm(at_header), that context is being used.
```python
def get_vm(self, at_header: BlockHeaderAPI = None) -> VirtualMachineAPI:
    header = self.ensure_header(at_header)
    vm_class = self.get_vm_class_for_block_number(header.block_number)
    chain_context = ChainContext(self.chain_id)
    return vm_class(
        header=header,
        chaindb=self.chaindb,
        chain_context=chain_context,
        consensus_context=self.consensus_context,
    )
```
So, the bottom line is:
- we continue to have VM instances that operate only on the state they get passed in
- we have consensus context state that is per-chain rather than per-VM
- we do not have any state leaking (as in: two different chain instances accidentally sharing consensus state)

There's a lot to clean up here, but you can take a look at the most recent commit, which implements this, and also another one on the Trinity side that takes it for a ride. It works and I think it kind of gives us the best of both worlds.
I have a good feeling with that, hoping you feel the same vibe 😅
If you are 👍 on that direction, then I can clean up the mess and propose a clean PR.
Yeah, this feels like a good direction 👍 🎉
I also like the idea that maybe we can enforce the reset of the context across fork boundaries. For example, we might create an independent consensus_context for each distinct VM, so in get_vm(), instead of just getting self.consensus_context, we get self.get_consensus_context(vm_class), which is generated on demand, but with only one per VM.
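A sketch of what that on-demand variant could look like (hypothetical names and a plain dict standing in for the real context class; the PR as written keeps a single per-chain context instead):

```python
class Chain:
    # Stand-in for a real ConsensusContext class; a plain dict is
    # enough to illustrate the one-context-per-VM-class bookkeeping.
    consensus_context_class = dict

    def __init__(self, base_db):
        self.base_db = base_db
        self._consensus_contexts = {}

    def get_consensus_context(self, vm_class):
        # Lazily create exactly one context per VM class, which resets
        # consensus state across fork boundaries automatically.
        if vm_class not in self._consensus_contexts:
            self._consensus_contexts[vm_class] = self.consensus_context_class()
        return self._consensus_contexts[vm_class]
```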
PoW could use this too! There is an epoch cache that it doesn't want to regenerate every time, and I think it does something hacky to save them (like keep a global cache 🙈 ). Obviously that change doesn't need to go in here, but a tracking issue would be cool.
> PoW could use this too! There is an epoch cache that it doesn't want to regenerate every time, and I think it does something hacky to save them
Ah yes! We can use that to clean this up :)
> I also like the idea that maybe we can enforce the reset of the context across fork boundaries
Yeah, we could do that if we need it, but I think we should have a good reason to. Like, in the Clique case it would just cause us to take more time to validate the header after each fork transition while not providing any actual benefit. But sure, there could be other consensus schemes that would need a different context per fork.
Ok, I'm gonna start getting this into shape now :)
Tracking issue: #1900
@carver This can be given a new review now. For a second I thought we could actually roll back to having
```python
@abstractmethod
def __init__(self, db: AtomicDatabaseAPI) -> None:
    """
    Initialize the context with a database.
```
Couple random thoughts, as I won't be able to pick up the review again until later:
I was assuming we would formalize the idea that this context can't write to the database (maybe this should be a read-only version that's supplied?). Or, if we don't, then maybe it fits as a kind of parallel to ChainDB, like ConsensusDB.
"remains static" in the docs above didn't immediately mean to me what I think we want to say. Something like: this instance stays in memory across VM runs.
> I was assuming we would formalize the idea that this context can't write to the database
But, we need to write to the database. E.g. Clique persists snapshots to the db.
```python
db[key] = encode_snapshot(snapshot)
```
> Or if we don't then maybe it fits as a kind of parallel to ChainDB, like ConsensusDB

I'm not sure how that would work out. In the ChainDB case, we know the specific types that make up a chain and that can be stored and retrieved. But for consensus, it seems pretty much up to the specific consensus algorithm which kinds of things need persistence. Clique uses snapshots, but it's hard to imagine all the different things other consensus schemes would require.
> "remains static" in the docs above didn't immediately mean to me what I think we want to say. Something like: this instance stays in memory across VM runs.
👍 I'll update that.
@carver just a reminder that this still needs to be reviewed.
I did a full passthrough on everything but test_clique_consensus.py, which I kind of skimmed. There are a couple of places where I wish we had a better solution. Like, validate_header's check_seal=True is the only reason we need to make it an instance method. Changing that requires a lot of churn, and isn't a clear enough win for me to recommend right now, and I don't have any suggestions... So whenever you're satisfied, it's time to 🚢 !
```python
    """
    Validate the seal on the given header by checking the proof of work.
    """
    check_pow(
```
(If there isn't one yet) Could you add a to-do issue for moving the pow cache into a consensus context?
Yes, we have that here #1900
```python
        chain_class_without_seal_validation.vm_configuration  # type: ignore
    ),
)
return chain_class.configure(vm_configuration=no_pow_vms)
```
I love how simple/readable this got with the new arch.
What was wrong?
This is another attempt to refactor Clique based on #1875 and what @carver and I discussed offline.
There's also a Trinity PR that can sync Görli with this: ethereum/trinity#1196
How was it fixed?

- validate_seal and validate_header are now instance methods. The only reason they can be classmethods today is that our PoW implementation relies on a globally shared cache, which should be refactored as described in Use ConsensusContextAPI to get rid of global cache currently used in PoW implementation. #1900
- There are two new methods: chain.validate_chain_extension(header, parents) and vm.validate_seal_extension. They perform extension seal checks to support consensus schemes where headers cannot be checked if parents are missing.
- The consensus mechanism is now abstracted via ConsensusAPI and ConsensusContextAPI. VMs instantiate a consensus api based on the set consensus_class and pass it a context which they receive from the chain upon instantiation. The chain instantiates the consensus context api based on the consensus_context_class.
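As a loose illustration of the extension-check idea (a hypothetical, heavily simplified function; the real API lives on the chain and VM classes and works on actual header objects), a header's seal may only be verifiable once its parents are available:

```python
def validate_seal_extension(header, parents):
    # Hypothetical scheme in which a header's seal can only be verified
    # once its parents are known (as with signer rotation in Clique);
    # headers are modeled as plain dicts purely for illustration.
    if not parents:
        raise ValueError("cannot check the seal without parent headers")
    if header["parent_hash"] != parents[-1]["hash"]:
        raise ValueError("header does not extend the given parents")
    return True
```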
To-Do
Cute Animal Picture