
Add fetch_candidate_head #265


Merged
merged 37 commits into ethereum:sharding from sharding_vmc_handler_handle_logs on Feb 2, 2018

Conversation

@mhchia mhchia commented Jan 12, 2018

What was wrong?

As #242 described, get_next_log and fetch_candidate_head are needed.

How was it fixed?

get_next_log: web3.eth.filter provides get_new_entries(), which returns the latest "unread" logs from the RPC server, so any logs fetched through it are guaranteed to be new and unseen. Because get_next_log should return only one latest log at a time, the other logs fetched from get_new_entries are saved to a stack (VMC.new_collation_added_logs) and treated as "new" logs waiting to be returned by future get_next_log calls. The stack is maintained as follows: whenever get_next_log is called, first call get_new_entries and push the new logs onto the top of the stack, then return the log on top, which is the latest unseen log.
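A minimal sketch of this scheme (assuming a per-shard web3 filter stored in a hypothetical collation_added_filter attribute; the other names follow the description above):

def get_next_log(self, shard_id):
    # Fetch whatever logs the RPC server has not handed us yet and push them
    # onto the per-shard stack (oldest first, newest on top).
    new_logs = self.collation_added_filter[shard_id].get_new_entries()
    self.new_collation_added_logs[shard_id].extend(new_logs)
    if len(self.new_collation_added_logs[shard_id]) == 0:
        raise NextLogUnavailable("No new logs available")
    # The top of the stack is the latest unseen log.
    return self.new_collation_added_logs[shard_id].pop()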

fetch_candidate_head: implemented the same way as described in doc.md.
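Roughly, the doc.md algorithm looks like the following (a paraphrased sketch using this PR's per-shard variable names, not the exact diff):

def fetch_candidate_head(self, shard_id):
    # First try to return an unchecked log with the score we are currently
    # checking, in order of oldest to most recent.
    for idx in reversed(range(len(self.unchecked_logs[shard_id]))):
        if self.unchecked_logs[shard_id][idx]['score'] == self.current_score[shard_id]:
            return self.unchecked_logs[shard_id].pop(idx)
    # Otherwise advance to the next `is_new_head == True` log, stashing the
    # logs we skip over so they can be checked later.
    while True:
        log_entry = self.get_next_log(shard_id)
        if log_entry['is_new_head']:
            break
        self.unchecked_logs[shard_id].append(log_entry)
    self.current_score[shard_id] = log_entry['score']
    return log_entry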

TODOs

  • Use eth_getLogs over filter.xxx, maintaining new logs ourselves

Cute Animal Picture

put a cute animal picture link inside the parentheses

@mhchia mhchia changed the title Sharding vmc handler handle logs Add get_next_log and fetch_candidate_head Jan 12, 2018

mhchia commented Jan 12, 2018

Note: The reason I currently use raw_log over CollationAdded.log is that bytes is only allowed to be at most 32 bytes in the event declaration in vyper, while the size of collation headers is far larger than that. It should be changed once vyper fixes this issue.

@mhchia mhchia changed the title Add get_next_log and fetch_candidate_head Add fetch_candidate_head Jan 12, 2018

# Update collations_with_score
self.collations_with_score[shard_id][_score][self.num_collations_with_score[shard_id][_score]] = entire_header_hash
self.num_collations_with_score[shard_id][_score] = self.num_collations_with_score[shard_id][_score] + 1
Contributor

Hmm, I believe this PR will be merged sooner than my next PR about VMC, so I think it's good to remove all the collations_with_score and num_collations_with_score code in this PR, since we're using logs now. 🙂

Contributor Author

@mhchia mhchia Jan 13, 2018

No problem. 👍 Thank you

),
)
)
def test_vmc_fetch_candidate_head(vmc,

# For handling logs filtering
# Event:
# CollationAdded(indexed uint256 shard, bytes collationHeader, bool isNewHead, uint256 score)
collation_added_topic = "0x" + keccak(b"CollationAdded(int128,bytes4096,bool,int128)").hex()
Member

I'd case this as COLLATION_ADDED_TOPIC to make it clear that it's a constant value.

Contributor Author

No problem. Will do.

# For handling logs filtering
# Event:
# CollationAdded(indexed uint256 shard, bytes collationHeader, bool isNewHead, uint256 score)
collation_added_topic = "0x" + keccak(b"CollationAdded(int128,bytes4096,bool,int128)").hex()
Member

You can use eth_utils.event_signature_to_log_topic to compute this topic value.
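For example (same event signature as above; encode_hex adds the "0x" prefix):

from eth_utils import (
    encode_hex,
    event_signature_to_log_topic,
)

COLLATION_ADDED_TOPIC = encode_hex(
    event_signature_to_log_topic("CollationAdded(int128,bytes4096,bool,int128)")
)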

Contributor Author

It's cool! will change to it.

self.current_checking_score[shard_id] = None

@to_dict
def parse_collation_added_data(self, data):
Member

This looks like a good stand-alone function that could be split off and tested in isolation.

Contributor Author

Sorry, I don't quite understand. Do you mean it should be moved outside the class and tested in isolation?

def _get_new_logs(self, shard_id):
# use `get_new_entries` over `get_all_entries`
# 1. Prevent from the increasing size of logs
# 2. Leave the efforts maintaining `new_logs` in RPC servers
Member

@pipermerriam pipermerriam Jan 12, 2018

I have a pattern I've been wanting to try out that this might be a good candidate for. It's meant to emulate get_new_entries without any of the complication of installing the filter and tracking the filter_id. It looks roughly like this.

from cytoolz import (
    merge,
    partition_all,
)


def log_filter(web3, filter_params, page_size=10):
    # TODO: handle pending/latest/earliest
    from_block = filter_params['fromBlock']
    to_block = filter_params['toBlock']

    # TODO: handle dynamic block ranges to enforce reasonable request sizes.
    for block_number in range(from_block, to_block + 1):
        params = merge(filter_params, {'fromBlock': block_number, 'toBlock': block_number})
        logs = web3.eth.getLogs(params)

        for page in partition_all(page_size, logs):
            yield page

Issue in web3.py here: ethereum/web3.py#551

I suspect you could implement a specialized version of this that would be more robust than relying on filters on the server, eliminating the need to handle things like re-installing the filter if it gets lost.

Contributor Author

@mhchia mhchia Jan 13, 2018

No problem.
Thank you for letting me know about this. I hadn't considered the situation where filters get lost on the server before.
I will do it. :)

Contributor Author

@mhchia mhchia Jan 14, 2018

Since I just found that getLogs has not been implemented in eth_tester, I copied some code from eth_tester.create_log_filter and eth_tester._process_block_logs into a new function eth_tester.get_logs, and made web3.py call it. It now seems to work fine (still needs testing).

One question: for now I plan to save the block_number every time I fetch new logs with eth_getLogs, to identify the latest block up to which our logs have been synced. Should I handle the situation where forks happen? If we take mainchain forks into consideration, it seems more complicated, and we would need to store more state.
Thank you for your wisdom:)

Contributor Author

@mhchia mhchia Jan 15, 2018

@pipermerriam I'm sorry that after thinking for a while, I still have some questions about the code.

  1. Is it correct that we must specify fromBlock and toBlock to ensure the logs we fetch fall within that block range? If so, outside log_filter there should be something recording the range of blocks that have already been queried, right?
  2. Why not just use
    for block_number in range(from_block, to_block + 1):
        params = merge(filter_params, {'fromBlock': block_number, 'toBlock': block_number})
        logs = web3.eth.getLogs(params)
        for log in logs:
            yield log

instead of

    for block_number in range(from_block, to_block + 1):
        params = merge(filter_params, {'fromBlock': block_number, 'toBlock': block_number})
        logs = web3.eth.getLogs(params)
        for page in sliding_window(page_size, logs):
            yield page

? What is the purpose of using sliding_window?
Since there would be a lot of overlap between successive next() calls on log_filter, why not just use yield log?
e.g.

>>> logs = [1,2,3,4,5]
>>> from cytoolz import sliding_window
>>> z = sliding_window(3, logs)
>>> next(z)
(1, 2, 3)
>>> next(z)
(2, 3, 4)
>>> next(z)
(3, 4, 5)

Thank you so much for the directions!

Member

I have no idea why I put sliding_window in there. I think I got my wires crossed while hammering out that example code. Will try to get you answers to other questions asap but I'm out of time today.

Member

Instead of sliding window, it should be toolz.partition_all

Contributor Author

hmm that makes sense.
Thank you so much.
Please take your time. It is no hurry at all!

Member

@pipermerriam pipermerriam Jan 16, 2018

One question: for now I plan to save the block_number every time I fetch new logs with eth_getLogs, to identify the latest block up to which our logs have been synced. Should I handle the situation where forks happen? If we take mainchain forks into consideration, it seems more complicated, and we would need to store more state.

Here is some untested code that won't actually work but should demonstrate the idea:

from eth_utils import (
    reversed_return,
)

@reversed_return
def get_previous_headers(w3, history_size=256):
    block = w3.eth.getBlock('latest')
    yield block['hash']

    # initialize the list of recent hashes
    for _ in range(history_size - 1):
        block = w3.eth.getBlock(block['parentHash'])
        yield block['hash']

        # break the loop if we hit the genesis block.
        if block['number'] == 0:
            break


def check_chain_head(w3, recent_hashes, history_size=256):
    block = w3.eth.getBlock('latest')

    new_hashes = []

    for _ in range(history_size):
        if block['hash'] in recent_hashes:
            break
        new_hashes.append(block['hash'])
        block = w3.eth.getBlock(block['parentHash'])
    else:
        raise Exception('No common ancestor found for block: {0}'.format(block['hash']))

    first_common_ancestor_idx = recent_hashes.index(block['hash'])

    unchanged_hashes = recent_hashes[:first_common_ancestor_idx]
    revoked_hashes = recent_hashes[first_common_ancestor_idx:]

    recent_hashes = recent_hashes + tuple(new_hashes)
    recent_hashes = recent_hashes[-1 * history_size:]
    return revoked_hashes, new_hashes

The idea is to build a utility that will track the most recent hashes in the head of the chain in such a way that it can provide you with two values.

  • the block hashes that were revoked from the head of the chain due to re-org
  • the new block hashes that we haven't yet seen

You can use the revoked hashes to roll back any logs that you processed which are no longer valid (assuming you need to do that).

You can use the new hashes to define the fromBlock and toBlock for fetching any new logs that you haven't yet seen.

Questions? Does this make sense? The code above could probably more easily be written as a class.
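For illustration, an untested class-based sketch of the same idea (the ChainHeadTracker name and method names are hypothetical, not the code in this PR):

class ChainHeadTracker:
    """Tracks recent block hashes so that re-orgs can be detected."""

    def __init__(self, w3, history_size=256):
        self.w3 = w3
        self.history_size = history_size
        # oldest -> newest
        self.recent_hashes = self._build_recent_hashes()

    def _build_recent_hashes(self):
        block = self.w3.eth.getBlock('latest')
        hashes = [block['hash']]
        for _ in range(self.history_size - 1):
            if block['number'] == 0:
                break
            block = self.w3.eth.getBlock(block['parentHash'])
            hashes.append(block['hash'])
        return tuple(reversed(hashes))

    def poll(self):
        """Return (revoked_hashes, new_hashes) since the last call."""
        block = self.w3.eth.getBlock('latest')
        new_hashes = []
        # walk backwards from the head until we hit a block we already know
        for _ in range(self.history_size):
            if block['hash'] in self.recent_hashes:
                break
            new_hashes.append(block['hash'])
            block = self.w3.eth.getBlock(block['parentHash'])
        else:
            raise Exception(
                'No common ancestor found within the last {0} blocks'.format(self.history_size)
            )
        first_common_ancestor_idx = self.recent_hashes.index(block['hash'])
        revoked_hashes = self.recent_hashes[first_common_ancestor_idx + 1:]
        new_hashes = tuple(reversed(new_hashes))  # oldest -> newest
        self.recent_hashes = (
            self.recent_hashes[:first_common_ancestor_idx + 1] + new_hashes
        )[-self.history_size:]
        return revoked_hashes, new_hashes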

Contributor Author

@mhchia mhchia Jan 16, 2018

It's a beautiful and simple idea! This way we can focus on the revoked_changes and unseen_and_canonical_changes, and not worry about the other re-orgs between them. I think I can implement the utility soon based on your code.
Still, one question: why is history_size set to 256? Why shouldn't it be larger or smaller?
I really appreciate your help and patience!

Member

256 was a totally arbitrary number. Should probably be configurable.

yield pipe(
log['data'],
decode_hex,
self.parse_collation_added_data,
Member

Seems like moving the decode_hex into parse_collation_added_data would make this a lot easier to read.

Contributor Author

yep you're right. I have changed it. Thank you:)

raise FilterNotFound(
"CollationAdded filter haven't been set up in shard {}".format(shard_id)
)
for i in range(len(self.unchecked_collation_added_logs[shard_id]) - 1, -1, -1):
Member

I think this can be simplified with:

for log_entry in reversed(self.unchecked_collation_added_logs[shard_id]):
    if log_entry == ...:
        ...

Contributor Author

Thank you for pointing it out. Have changed it.


mhchia commented Jan 18, 2018

@pipermerriam I implemented your algorithm, and made a PR for eth_getLogs ethereum/eth-tester#50
Could you take a look and see whether anything is wrong when you have spare time?
If there is no issue with that PR, I will make a PR in web3.py to enable eth_getLogs for eth_tester in web3.py.
Thank you!

@pipermerriam
Member

👍 will get your eth-tester pr reviewed and merged as soon as the travis-ci log-jam clears.

)


class LogHandler(object):
Member

you can drop the (object) part here since we're in python3.

# ----------> higher score
self.recent_block_hashes = self.get_recent_block_hashes()

def get_recent_block_hashes(self):
Member

This would make a good stand-alone function. I believe there are other places in the code that do this exact same logic. I think one of @hwwhww 's recent PRs introduced similar code.

Contributor

There's the VM.get_prev_hashes(last_block_hash, db) function for the main chain client to build the list of the previous 256 header hashes:

py-evm/evm/vm/base.py

Lines 418 to 431 in 10d29c4

@classmethod
@to_tuple
def get_prev_hashes(cls, last_block_hash, db):
    if last_block_hash == GENESIS_PARENT_HASH:
        return
    block_header = get_block_header_by_hash(last_block_hash, db)
    for _ in range(MAX_PREV_HEADER_DEPTH):
        yield block_header.hash
        try:
            block_header = get_parent_header(block_header, db)
        except (IndexError, BlockNotFound):
            break

Contributor Author

Thank you:)

reversed_recent_hashes = tuple(reversed(recent_hashes))
return reversed_recent_hashes

def check_chain_head(self):
Member

I think this would also make a good stand-alone function, especially because it would be easy to test in isolation which would be nice.

Using this would then look something like this.

last_seen_recent_block_hashes = get_recent_block_hashes(w3, 256)
# .. then we check for changes
revoked_hashes, new_block_hashes = check_chain_head(w3, recent_block_hashes)
new_recent_block_hashes = tuple(concatv(
    last_seen_recent_block_hashes[:-1 * len(revoked_hashes)],
    new_block_hashes,
))[-1 * 256:]

This would reduce the size of this class significantly and probably make the code that actually uses it a bit easier to parse.

Contributor Author

Done. Thank you for the wise instruction.
May I ask how to determine which functions should stay in the class and which should be moved outside LogHandler? Does get_new_logs stay in the class because it is the only interface used by VMCHandler?
Can check_chain_head, get_recent_block_hashes, and preprocess_block_param be moved outside LogHandler because they only depend on web3 and can be used directly, without the log-handling logic?
Sorry for the newbie question. I just want to know the better criteria for designing the class.

Member

No need to apologize and I'm all for asking questions of any kind.

My general rule of thumb is the following loose collection of axioms and guidelines.

  • classes are typically stateful and statefullness is hard to manage.
  • functions (especially pure ones) are easier to reason about than classes.
  • a class with only one function is just a function disguised as a class. This also regularly applies to classes with only 2-3 functions.
  • a method that doesn't use self should almost always be a function.
  • it is ok to include convenience functions using @staticmethod, but normally should only be done if the convenience is external, meaning that outside actors who use this class almost always need the convenience function. If it's only needed internally, it should just be used as a stand-alone function.

Contributor Author

Thank you so much for these guidelines! They are really helpful.
Just to conclude: is it correct that, in general, we should prefer functions over classes (as pointed out in https://github.com/pipermerriam/ethereum-dev-tactical-manual#the-zen-of-the-python-ethereum-team), and that the right time to choose a class over functions is when we have to maintain state alongside a set of functions (and that functions which don't need to live inside the class should be moved out as stand-alone functions where possible)?

'latest': current_block_number,
'pending': current_block_number + 1,
}
return mapping.get(block_param, block_param)
Member

If I'm reading this correctly, this will return block_param as-is if it isn't one of earliest/latest/pending. I think this would be better written as a 4-branch if/elif/else statement which handles each of earliest/latest/pending and raises an exception in the final else, since that should be an error case.
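A hedged sketch of that structure (the helper name follows the preprocess_block_param mentioned elsewhere in this PR; the real helper also accepts plain integers, as a later snippet shows):

def preprocess_block_param(block_param, current_block_number):
    if block_param == 'earliest':
        return 0
    elif block_param == 'latest':
        return current_block_number
    elif block_param == 'pending':
        return current_block_number + 1
    else:
        # unknown string parameters are an error rather than being passed through
        raise ValueError("Unrecognized block parameter: {0}".format(block_param))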

Contributor Author

Ok no problem:) Will fix it.

if isinstance(block_param, int):
return block_param
elif not isinstance(block_param, str):
raise ValueError("block parameter {0} is in wrong type".format(block_param))
Member

Instead of saying, "wrong type" it would likely make for a more helpful error message to invert this and say "must be str type"

new_collation_added_logs = {}
# shard_id -> list
# newer <---------------> older
unchecked_collation_added_logs = {}
Member

Same here about mutable data structures and setting this in the __init__

self.shards.add(shard_id)
self.unchecked_collation_added_logs[shard_id] = []
self.new_collation_added_logs[shard_id] = []
self.current_checking_score[shard_id] = None
Member

I think that you can remove all of this initialization if you use collections.defaultdict instances for all three of these data structures.

self.unchecked_collation_added_logs = collections.defaultdict(list)
self.new_collation_added_logs = collections.defaultdict(list)
self.current_checking_score = {}

Then you can access both self.unchecked_collation_added_logs[shard_id] and self.new_collation_added_logs[shard_id] as if they have already been initialized thanks to the magic of defaultdicts.

This should allow you to remove all of the initialization and ensure_ initialization code.

Contributor Author

It's a good idea. Thank you for pointing that out. I will change it.


def get_next_log(self, shard_id):
new_logs = self._get_new_logs(shard_id)
self.new_collation_added_logs[shard_id] += new_logs
Member

I think .extend(new_logs) is the more canonical way to do this.

# checking in order of oldest to most recent.
self.ensure_shard_variables_initialied(shard_id)

for i in reversed(range(len(self.unchecked_collation_added_logs[shard_id]))):
Member

I think this is more readable as:

# tuple(..) call is to force the iterator to fully evaluate so that the later `.pop(..)` doesn't cause problems.
unchecked_logs = tuple(enumerate(self.unchecked_collation_added_logs[shard_id]))[::-1]
current_score = self.current_checking_score[shard_id]

for idx, log_entry in unchecked_logs:
    if log_entry['score'] == current_score:
        return self.unchecked_collation_added_logs[shard_id].pop(idx)

It's really just moving things around, but it removes one index-based lookup and stores the current score in a local variable, which makes the inner loop more legible and easier to follow.

Contributor Author

Thank you for the suggestion! It is way more readable. Will change it.

self.default_privkey = default_privkey
self.default_sender_address = default_privkey.public_key.to_canonical_address()
self.config = get_sharding_config()
super().__init__(*args, **kwargs)

def init_shard_variables(self, shard_id):
self.shards.add(shard_id)
self.unchecked_collation_added_logs[shard_id] = []
Member

What do you think about renaming these variables to self.unchecked_logs, self.new_logs, and self.current_score? That should make a lot of the lines in this class way shorter and easier to read.

Contributor Author

Will do. Actually, the reason I used such a long name is that I thought there would be multiple types of logs to handle, so I made the name specific.
But since those aren't handled yet, I think it's a good idea to rename them to shorter names for now. Thank you.

Member

@pipermerriam pipermerriam left a comment

I think there are some bugs in this but I'm not sure. Can you look into some of the comments about whether unchecked_logs is correctly ordered?

# older <---------------> newer
self.new_logs = defaultdict(list)
# shard_id -> list
# newer <---------------> older
Member

I think this isn't accurate.

  • Let's say we get logs 1, 2, 3, 4, 5
  • Log 1 is the oldest
  • Log 5 is the newest
  • They come in as two batches, 1, 2, 3 and then when we query again after some blocks, 4, 5

In this case, I think that unchecked_logs will be 3, 2, 1, 5, 4. Maybe I'm reading the code wrong...?

Contributor Author

I think you're right. It is not accurate. I will remove this comment.

Contributor Author

@mhchia mhchia Jan 22, 2018

The fetch_candidate_head algorithm comes from https://github.com/ethereum/sharding/blob/develop/docs/doc.md#fetch-candidate-heads-in-reverse-sorted-order .
Its idea is:

The idea is that this algorithm is guaranteed to check potential head candidates in highest-to-lowest sorted order of score, with the second priority being oldest to most recent.

I am thinking of one example, which follows what you assumed above.
Assume we have batch1 = [1, 2, 3] and batch2 = [4, 5].
First, batch1 arrives and we have new_logs = [1, 2, 3].
Before batch2 arrives, the outcome of fetch_candidate_head will be 3, 2, 1. As soon as batch2 arrives, the result of fetch_candidate_head would be an element from batch2 instead of one from batch1.
So I think for this case the algorithm is right.

However, if we assume batch1 = [1, 1] and batch2 = [2], and batch2 arrives after batch1 and after the first fetch_candidate_head call, the order of the outcome would be 1, 1, 2, which seems inconsistent with the idea of the algorithm: "The idea is that this algorithm is guaranteed to check potential head candidates in highest-to-lowest sorted order of score, with the second priority being oldest to most recent."
I have example code for this.
I think I should confirm the details of this with @vbuterin.


The intended principle is that you want to get logs in reverse logical order, so latest block to oldest block, and within a block last transaction to first transaction, and within a transaction last to first created.
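A small illustration of that ordering, using standard log-entry fields (an illustrative helper, not code from this PR):

def reverse_logical_order(logs):
    # Newest block first; within a block, the last transaction first; within a
    # transaction, the last-emitted log first.
    return sorted(
        logs,
        key=lambda log: (log['blockNumber'], log['transactionIndex'], log['logIndex']),
        reverse=True,
    )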

# is_new_head = true log
while True:
# TODO: currently just raise when there is no log anymore
self.unchecked_logs[shard_id].append(self.get_next_log(shard_id))
Member

Related to my comment about the logs potentially not being maintained in the right order, I think this is the offending line. It seems like the logs should be pushed onto the front of the list rather than appended to the end. If this is the case, then you might consider using a collections.deque, which has fast inserts and pops at both the beginning and end of the list.
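A tiny illustration of the deque idea (the log entries here are placeholders):

from collections import deque

unchecked_logs = deque()
for log_entry in ({'score': 1}, {'score': 2}, {'score': 3}):
    unchecked_logs.appendleft(log_entry)   # O(1) push onto the front, newest first
oldest = unchecked_logs.pop()              # O(1) pop of the oldest entry from the back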

Contributor Author

Sorry about the late reply. I need to think about this for a while tomorrow.
I will soon give an explanation and a plan for what to revise. Thank you for pointing these out!


def __init__(self, *args, default_privkey, **kwargs):
def __init__(self, *args, log_handler, default_privkey, **kwargs):
self.log_handler = log_handler
Member

I think you're going to need one log handler per shard. Since the log handler is stateful, each time you ask for more logs, it only returns the ones for the given shard_id. If you're tracking multiple shard_ids then I think you're going to miss logs.

Contributor Author

Ahh, I didn't think of this case. Thank you for catching that.
Will fix it.

Contributor Author

@mhchia mhchia Jan 21, 2018

Actually, I was thinking we could maintain one recent_block_hashes shared across all shards, but then we would need to store all logs from all shards. That doesn't seem better than keeping one log_handler per shard.

Member

Yeah, one handler per shard seems cleanest despite the added overhead. We can address the performance issues later if needed.


mhchia commented Jan 25, 2018

@pipermerriam
After discussing with @vbuterin, it seems fetch_candidate_head is fine to work that way, if I understood correctly.
A node calls fetch_candidate_head inside GUESS_HEAD after it finds that it is a collator in some period. At that time, the node can save the current state of new_logs, unchecked_logs, and current_score. If a new block arrives, it just restores them to the saved state, appends the new logs to new_logs, and re-runs GUESS_HEAD.
Is this scheme acceptable to you, and do you have any suggestions?

Thank you!

@mhchia mhchia force-pushed the sharding_vmc_handler_handle_logs branch from 953bf0a to 20989f0 Compare January 25, 2018 15:44
@pipermerriam
Member

I'm good with whatever you guys have settled on regarding the proper functionality.

# shard_id -> list
self.unchecked_logs[shard_id] = []
# shard_id -> score
self.current_score[shard_id] = None
Member

I think that this entire class would be cleaned up by implementing one more data structure to house all of the shard data.

class ShardTracker:
    shard_id = ...
    log_handler = ...
    new_logs = ...
    unchecked_logs = ...
    current_score = ...

I suspect that some, if not most of the business logic for updating the shard state could then be moved into that object at which point this class would just manage all of the individual instances of that class.

Thoughts? I think it would clean this class up in a significant way.

Contributor Author

@mhchia mhchia Jan 26, 2018

I think that is a good idea. I will do it.
Just two questions about it.

  1. Should I put ShardTracker in another file shard_tracker.py?
  2. Since all the logic regarding shard logs is moved to ShardTracker (including get_next_log and fetch_candidate_head), if there is a need for an API in VMCHandler like fetch_candidate_head, may I just add a method of the same name in VMCHandler that simply calls ShardTracker.fetch_candidate_head? (Though I think there might only be new methods like guess_head in VMC to be implemented in the future.)

Much appreciated!

Contributor Author

Moreover, may I request a bumpversion in web3.py, to resolve the error in CI?
Thank you!

Member

  1. It's probably fine to start with it here in this file or to put it in its own module. Up to you. shard_tracker.py seems like a fine choice as well.

  2. If this makes VMCHandler obsolete and you think you're able to do what you need with only the ShardTracker class, then it's probably a good idea to do away with it. Really depends on how these APIs will be used and whether you need an abstraction to track many shards and whether it still makes sense for that to be a class.

Contributor Author

@pipermerriam Thank you for the great suggestions! I have moved that logic to ShardTracker, currently in the same module (vmc_handler) as VMCHandler.
For now I just make get_next_log and fetch_candidate_head directly call the corresponding shard_tracker. I think I can decide what to remove and how to restructure ShardTracker and VMCHandler after GUESS_HEAD and CREATE_COLLATION are implemented, which will be done in the next stage.

@pipermerriam
Member

I believe you can bump the web3 version locally here in your branch to fix CI

Member

@pipermerriam pipermerriam left a comment

The only required changes are removing the use of mutable data structures as class attributes. Everything else is cleanup. This looks very close to mergeable.

@@ -40,6 +40,10 @@ class NextLogUnavailable(Exception):
pass


class ShardNotTracked(Exception):
Member

Alternate name suggestion: UnknownShard

Contributor Author

It's better, Thank you:)

self.log_handlers[shard_id] = log_handler
self.init_log_variables(shard_id)
self.new_logs = []
self.unchecked_logs = []
Member

These should be set in the constructor (and set to None here in the class definition). Mutable data structures as class attributes are bad, as they are shared across all instances of the class.
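A quick demonstration of the pitfall (illustrative class names):

class BadTracker:
    new_logs = []               # class attribute: one list shared by every instance


class GoodTracker:
    new_logs = None             # declared on the class for readability

    def __init__(self):
        self.new_logs = []      # per-instance list created in the constructor


a, b = BadTracker(), BadTracker()
a.new_logs.append('log-1')
assert b.new_logs == ['log-1']  # b sees a's entry -- the shared-state bug

c, d = GoodTracker(), GoodTracker()
c.new_logs.append('log-1')
assert d.new_logs == []         # each instance keeps its own list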

Contributor Author

I think they are already set in the constructor?
I think setting them to None here is a good idea.
Is the purpose of doing this to explicitly show their existence?

Member

Ignore my comment. I didn't see that this was within __init__.py (thought I was looking at the class body).

The reason for setting them to None on the class is entirely convention to improve readability. I find it helpful to be able to look at the class definition and get a full view of all of the various properties the class has. It's harder to gather that information if you have to dig it out of code, even simple code like this in the constructor.

self.unchecked_logs[shard_id].append(self.get_next_log(shard_id))
if self.unchecked_logs[shard_id][-1]['is_new_head'] is True:
self.unchecked_logs.append(self.get_next_log())
if self.unchecked_logs[-1]['is_new_head'] is True:
Member

If i'm reading this logic right.

  • retrieve the new log with get_next_log()
  • append to self.unchecked_logs
  • if newest log is_new_head == True
    • break from loop and pop off newest log.

I think this is simpler written as:

while True:
    log_entry = self.get_new_log()
    if log_entry['is_new_head']:
        break
    else:
        self.unchecked_logs.append(log_entry)

...

It avoids appending and then popping by only appending in the case that the new log isn't the new head.

Contributor Author

you're right. Thank you for pointing it out.
I've modified the code.


# TODO: currently just calls `shard_tracker.get_next_log`
def get_next_log(self, shard_id):
if shard_id not in self.shard_trackers:
Member

Instead of repeating this logic, I would suggest using self.get_shard_tracker(shard_id) here to retrieve the tracker.

Contributor Author

Thank you for pointing that out.
Will change it.


# TODO: currently just calls `shard_tracker.fetch_candidate_head`
def fetch_candidate_head(self, shard_id):
if shard_id not in self.shard_trackers:
Member

Same here about removing this logic in favor of using self.get_shard_tracker(shard_id)

Contributor Author

Will change it. Thank you:)

yield 'topics', topics

def get_new_logs(self, address=None, topics=None):
revoked_hashes, new_block_hashes = check_chain_head(
Member

I may have already asked this question. Does something need to be done with the revoked hashes as well? Should the tracker be doing anything with them to revert its state in the event that a log entry it has already processed is revoked?

Contributor Author

@mhchia mhchia Jan 31, 2018

Actually, I'm not sure about this. Please let me think about it more thoroughly. Sorry about that.
My current opinion: if we just want to mimic the behavior of the version using filter.get_new_entries, I think we don't need to handle the revocation.
Do you have any suggestions? Thank you!

Member

Unfortunately, I'm not familiar enough with the underlying algorithm to be able to provide a suggestion on what the right thing to do here is. I just noticed that the revoked data wasn't used and wanted to be sure that wasn't an oversight. Feel free to mark it with a TODO comment and open an issue to track it, and we can merge with the understanding that we'll need to address it down the road.

Contributor Author

That's fine. Thank you so much for keeping up with this part.
Yeah, you're right, it is worth keeping an eye on. I marked it with a TODO and opened issue #344.

Since most of them are test-related, we leave only `create_vmc_tx` and
`get_vmc_json` in vmc_utils.py, which specify the vmc tx and the
contract path, and move the others to `test_vmc_handler.py`
TODO
        - Still needs to test `fetch_candidate_head`
        - Refactor `fixture.py`
mhchia added 20 commits January 30, 2018 20:05
- Use event_signature_to_log_topic
- Modify parse_collation_added_data
- Change the range in `fetch_candidate_head` to reversed
TODO:
        - Add `filter_log`
And change `preprocess_block_param` to if-else
And fix defaultdict(None) to defaultdict(lambda: None)
new_collation_added_logs -> new_logs
unchecked_collation_added_logs -> unchecked_logs
current_checking_score -> current_score
- check_chain_head
- get_recent_block_hashes
- preprocess_block_param
- Maybe we can use one `log_handler` for all shards
- Re-run `GUESS_HEAD` when every new block arrives, to solve the problem
  of `fetch_candidate_head`
- Modify `fetch_candidate_head`, to avoid unnecessary `push` and `pop`
when the log is new head
- Explicitly declare the logs variables in ShardTracker
- Modify `ShardNotTracked` to `UnknownShard`
@mhchia mhchia force-pushed the sharding_vmc_handler_handle_logs branch from 8683a38 to 360ea86 Compare January 31, 2018 16:24

hwwhww commented Jan 31, 2018

Mind writing a brief document in the PR description?

to_canonical_address,
to_checksum_address,
)

from evm.vm.forks.spurious_dragon.transactions import (
SpuriousDragonTransaction,
Contributor

Just curious, why use SpuriousDragonTransaction instead of ByzantiumTransaction?

Contributor Author

Thank you for pointing that out!
I will change it.

logger = logging.getLogger("evm.chain.sharding.LogHandler")

def __init__(self, w3, history_size=256):
self.history_size = history_size
Contributor

For the LogHandler or higher-level (ShardTracker, VMC) APIs, is LogHandler.history_size a fixed capacity of previous block hashes/headers from the head block, set only at initialization?
If so, we may need a more flexible method that lets us query older block hashes (more than the previous 256 block headers) dynamically during the guess_head algorithm. The reason is that we want to verify as many collations as we can.

@mhchia mhchia merged commit 8171501 into ethereum:sharding Feb 2, 2018
@mhchia mhchia deleted the sharding_vmc_handler_handle_logs branch February 28, 2018 08:42