
StateDownloader now handles data requests from peers #987

Merged: 5 commits into ethereum:master from the state-downloader-serve-data branch on Jul 10, 2018

Conversation

@gsalgado (Contributor) commented Jul 4, 2018

I noticed a low peer count during a state sync (which makes it quite slow), and it turns out that's because peers attempt to fetch data from us and disconnect when we fail to reply. This change should ensure we keep a high peer count, speeding up the state sync significantly.

Depends on #980
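
For illustration, here is a minimal, self-contained sketch of the idea behind the change: a downloader that routes incoming data requests to a handler instead of silently dropping them. All names below (GetBlockHeaders, PeerRequestHandler, _handle_msg) are illustrative stand-ins, not the actual trinity code in this PR.

import asyncio
from typing import Any


class GetBlockHeaders:
    # Stand-in for the eth protocol command of the same name.
    def __init__(self, query: Any) -> None:
        self.query = query


class PeerRequestHandler:
    # Stand-in for a shared helper that answers peer requests.
    async def handle_get_block_headers(self, peer: str, query: Any) -> None:
        # A real handler would read headers from the chain DB and send them
        # back over the peer's sub-protocol.
        print(f"replying to {peer} with headers for {query}")


class StateDownloader:
    def __init__(self, handler: PeerRequestHandler) -> None:
        self._handler = handler

    async def _handle_msg(self, peer: str, cmd: Any) -> None:
        # Before this change the downloader ignored requests coming from
        # peers, so they eventually disconnected us for being unresponsive.
        if isinstance(cmd, GetBlockHeaders):
            await self._handler.handle_get_block_headers(peer, cmd.query)
        else:
            print(f"ignoring {type(cmd).__name__} from {peer}")


async def main() -> None:
    downloader = StateDownloader(PeerRequestHandler())
    await downloader._handle_msg("peer-1", GetBlockHeaders((4_000_000, 192)))


asyncio.run(main())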

@gsalgado requested a review from pipermerriam on July 4, 2018 at 14:24
@pipermerriam (Member)

The other two PRs this is based on have both gotten a 👍.

I'll review this one once those are merged and this has been rebased (it's hard to parse as is).

@gsalgado force-pushed the state-downloader-serve-data branch from bc445f8 to 1b01e14 on July 5, 2018 at 07:34
@gsalgado (Contributor, Author) commented Jul 5, 2018

Rebased

@gsalgado force-pushed the state-downloader-serve-data branch from 1b01e14 to 6dc934d on July 5, 2018 at 08:56
@pipermerriam (Member) left a comment


I feel reasonably strongly about my suggestion for a PeerRequestHandler API of some sort to reduce the complexity of the various call sites which handle peer requests. Let me know your thoughts.

p2p/chain.py Outdated
type(block_number_or_hash),
)

limit = max(max_headers, eth.MAX_HEADERS_FETCH)
@pipermerriam (Member)

What do you think about splitting this part of the body (the pure part) into a stand-alone utility function for which we can write some simple tests?
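
For illustration only, such a pure helper and its tests might look roughly like this; the name, signature, and the default cap of 192 are assumptions, not the code under review.

from typing import List


def header_request_block_numbers(
        block_number: int, max_headers: int, skip: int, reverse: bool,
        max_fetch: int = 192) -> List[int]:
    # Pure helper: which block numbers does a GetBlockHeaders query cover?
    limit = min(max_headers, max_fetch)
    step = -(skip + 1) if reverse else (skip + 1)
    numbers: List[int] = []
    current = block_number
    for _ in range(limit):
        if current < 0:
            break
        numbers.append(current)
        current += step
    return numbers


# Being pure, it is trivial to unit-test:
assert header_request_block_numbers(10, 3, 0, False) == [10, 11, 12]
assert header_request_block_numbers(10, 3, 1, True) == [10, 8, 6]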

@gsalgado (Contributor, Author)

I'd prefer to save that for another PR, as I'm just shuffling this code around in this one

p2p/chain.py Outdated

headers = [header async for header in self._generate_available_headers(block_numbers)]
headers = await lookup_headers(
self.db, header_request['block_number_or_hash'], header_request['max_headers'],
@pipermerriam (Member)

These call signatures are quite large. What do you think about something like the following?

class PeerRequestHandler:
    def __init__(self, db, logger, cancel_token):
        self.db = db
        ...

    def lookup_headers(self, ...):
        return lookup_headers(db=self.db, ..., logger=self.logger, token=self.cancel_token)

    ...

Then the BaseChainSyncer classes can just store an instance of this class as an attribute, and the call turns into headers = await self.handler.lookup_headers(bn_or_hash, max_headers, skip, reverse), which eliminates the extra three arguments that are always the same.
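
Filling in the ellipses purely for illustration (stubbed db/token and an assumed signature, not the real module-level function), the pattern could look like:

import asyncio
import logging
from typing import Any, List, Union


async def lookup_headers(db: Any, block_number_or_hash: Union[int, bytes],
                         max_headers: int, skip: int, reverse: bool,
                         logger: logging.Logger, token: Any) -> List[Any]:
    # Stub standing in for the module-level function; the real one would read
    # headers from the chain DB, honouring the cancel token.
    logger.debug("looking up %d headers starting at %s",
                 max_headers, block_number_or_hash)
    return []


class PeerRequestHandler:
    def __init__(self, db: Any, logger: logging.Logger, cancel_token: Any) -> None:
        self.db = db
        self.logger = logger
        self.cancel_token = cancel_token

    async def lookup_headers(self, block_number_or_hash: Union[int, bytes],
                             max_headers: int, skip: int, reverse: bool) -> List[Any]:
        # The wrapper supplies the three arguments that never change.
        return await lookup_headers(
            self.db, block_number_or_hash, max_headers, skip, reverse,
            logger=self.logger, token=self.cancel_token)


async def main() -> None:
    handler = PeerRequestHandler(db=None, logger=logging.getLogger("demo"),
                                 cancel_token=None)
    # A syncer that sets self.handler = handler then only needs:
    headers = await handler.lookup_headers(1000, 192, 0, False)
    print(headers)


asyncio.run(main())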

@gsalgado (Contributor, Author)

Yeah, that's a nice improvement

p2p/chain.py Outdated
elif isinstance(cmd, eth.GetReceipts):
await self._handle_get_receipts(peer, cast(List[Hash32], msg))
await handle_get_receipts(
@pipermerriam (Member)

In conjunction with my previous comment, these would now be await self.handler.get_receipts(peer, ...)

p2p/chain.py Outdated
logger: logging.Logger, token: CancelToken) -> None:
nodes = []
# Only serve up to eth.MAX_STATE_FETCH items in every request.
for node_hash in node_hashes[:eth.MAX_STATE_FETCH]:
@pipermerriam (Member)

My gut says that the truncation of which hashes we retrieve should happen at a higher level where this function is actually called, and that this function should blindly return all of the data that was requested.

I think this is a pretty minor thing, but architecturally, I think it is more correct.

  • keeps this function simpler.
  • separates different classes of business logic.
  • allows this function to be used for larger retrieval sizes (if that ever becomes a requirement)

@gsalgado (Contributor, Author)

That makes sense, but this is where we call the sub-proto method that actually sends the data, so we would need to extract this part into a separate unit that just returns the trie nodes. It'd also have to become a generator, otherwise malicious peers could easily DoS us, as we'd always be retrieving all the data from the DB regardless of the limit.

How would you feel about just adding a new limit argument to the handle_*() methods for now, and having them default to the current values we use? That way we don't need to break these into even smaller methods (which isn't necessary yet), but we keep them flexible and easy to refactor should it become necessary.
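
A rough sketch of that limit-argument variant, with assumed names, types, and default value (the real handler would also send the reply over the peer's sub-protocol rather than returning it):

import asyncio
from typing import Dict, Iterable, List

MAX_STATE_FETCH = 384  # placeholder for eth.MAX_STATE_FETCH; value assumed


async def handle_get_node_data(db: Dict[bytes, bytes], node_hashes: Iterable[bytes],
                               limit: int = MAX_STATE_FETCH) -> List[bytes]:
    # Slicing up front means a peer asking for millions of hashes only ever
    # costs us `limit` lookups; the default keeps today's behaviour while
    # leaving the cap adjustable per call site.
    nodes = []
    for node_hash in list(node_hashes)[:limit]:
        try:
            nodes.append(db[node_hash])
        except KeyError:
            continue
    return nodes


db = {b"a": b"node-a", b"b": b"node-b"}
print(asyncio.run(handle_get_node_data(db, [b"a", b"missing", b"b"])))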

@pipermerriam (Member)

My intention was not to truncate the return value, but to truncate node_hashes before it was passed into this function.

# at the call-site
trie_data = self._handler.handle_get_node_data(..., node_hashes=requested_hashes[:MAX_STATE_FETCH])

I think I prefer this over the limit argument, but both would be fine.

@gsalgado (Contributor, Author)

Oh, I see! Went with your proposed solution

gsalgado added 3 commits July 9, 2018 20:46
Those are the methods related to handling GetBlockHeader requests from
peers, which are also needed in StateDownloader, so turned into funcs
Turned the ChainSyncer methods that do that into standalone funcs
so they could be reused in StateDownloader
@gsalgado force-pushed the state-downloader-serve-data branch from 6dc934d to ee315b7 on July 9, 2018 at 20:20
p2p/chain.py Outdated
chaindb = cast('AsyncChainDB', self.db)
bodies = []
# Only serve up to eth.MAX_BODIES_FETCH items in every request.
for block_hash in block_hashes[:eth.MAX_BODIES_FETCH]:
@pipermerriam (Member)

Same thought here about moving the limits up a level to the call site of this function.

bodies = self._handler.handle_get_block_bodies(peer, requested_block_hashes[:MAX_BODIES_FETCH])

self.logger = logger
self.cancel_token = token

async def handle_get_block_bodies(self, peer: ETHPeer, block_hashes: List[Hash32]) -> None:
@pipermerriam (Member)

Since the class has the name Handler, what do you think about dropping the handle_ prefix for these methods?

@gsalgado (Contributor, Author)

Well, if I drop the handle_ prefix, the names will kinda suggest the methods are getters (e.g. get_block_bodies()), so I'd rather keep the prefix.

p2p/chain.py Outdated
# Only serve up to eth.MAX_BODIES_FETCH items in every request.
for block_hash in block_hashes[:eth.MAX_BODIES_FETCH]:
try:
header = await wait_with_token(
@pipermerriam (Member)

Maybe put the self.wait(...) API on this class. Maybe we can introduce that as a mixin class, something like:

class Cancelable:
    cancel_token: CancelToken = None

    def wait(...):
        ...

    def wait_first(...):
        ...

Or just inline a duplication of the wait function from our Service class, with a comment/issue to remove the duplication at some point.

@gsalgado (Contributor, Author)

I like the mixin idea! Just thought I'd add a Mixin suffix to its name to make that clear, though
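
To make the idea concrete, a minimal sketch of what such a mixin might look like; the name and the timeout-only behaviour are assumptions, and the real wait() would also race the awaitable against the cancel token:

import asyncio
from typing import Any, Awaitable, Optional, TypeVar

TReturn = TypeVar("TReturn")


class CancellableMixin:
    # Expected to be set by the class that mixes this in.
    cancel_token: Any = None

    async def wait(self, awaitable: Awaitable[TReturn],
                   timeout: Optional[float] = None) -> TReturn:
        # Simplified: only a timeout is applied here, whereas the real helper
        # would also wait on self.cancel_token and cancel the awaitable.
        return await asyncio.wait_for(awaitable, timeout=timeout)


class Demo(CancellableMixin):
    async def run(self) -> str:
        return await self.wait(asyncio.sleep(0.01, result="done"), timeout=1)


print(asyncio.run(Demo().run()))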


async def _get_block_numbers_for_request(
self, block_number_or_hash: Union[int, bytes], max_headers: int,
skip: int, reverse: bool) -> List[BlockNumber]:
@pipermerriam (Member)

Mostly unrelated: I think go-ethereum recently patched a bug where very large skip values could be used to overflow an integer. While we don't suffer from the same overflow problem, it makes me think that we should enforce a reasonable upper bound on the skip size.
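
One possible guard (purely a sketch, not something this PR implements) would be to validate the skip value before doing any header-number arithmetic; the ceiling here is an arbitrary illustrative choice:

MAX_SKIP = 1000  # illustrative ceiling; the real bound would need choosing


def validate_skip(skip: int) -> int:
    # Reject abusive values before computing the requested block numbers.
    if skip < 0 or skip > MAX_SKIP:
        raise ValueError(f"unreasonable skip value: {skip}")
    return skip


assert validate_skip(10) == 10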

@gsalgado merged commit 4e05590 into ethereum:master on Jul 10, 2018