Add fetch_candidate_head
#265
Conversation
Adds get_next_log and fetch_candidate_head.
# Update collations_with_score
self.collations_with_score[shard_id][_score][self.num_collations_with_score[shard_id][_score]] = entire_header_hash
self.num_collations_with_score[shard_id][_score] = self.num_collations_with_score[shard_id][_score] + 1
Hmm, I believe this PR will be merged sooner than my next PR about VMC, so I think it's good to remove all the collations_with_score and num_collations_with_score code in this PR, since we're using logs now. 🙂
No problem. 👍 Thank you
tests/sharding/test_vmc_handler.py
Outdated
),
)
)
def test_vmc_fetch_candidate_head(vmc,
It's better to be aligned with https://github.com/pipermerriam/ethereum-dev-tactical-manual/blob/master/style-guide.md#function-signatures
evm/vm/forks/sharding/vmc_handler.py
Outdated
# For handling logs filtering
# Event:
# CollationAdded(indexed uint256 shard, bytes collationHeader, bool isNewHead, uint256 score)
collation_added_topic = "0x" + keccak(b"CollationAdded(int128,bytes4096,bool,int128)").hex()
I'd case this as COLLATION_ADDED_TOPIC
to make it clear that it's a constant value.
No problem. Will do.
evm/vm/forks/sharding/vmc_handler.py
Outdated
# For handling logs filtering
# Event:
# CollationAdded(indexed uint256 shard, bytes collationHeader, bool isNewHead, uint256 score)
collation_added_topic = "0x" + keccak(b"CollationAdded(int128,bytes4096,bool,int128)").hex()
You can use eth_utils.event_signature_to_log_topic
to compute this topic value.
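For reference, a rough sketch of how that might look (untested; it assumes the same event signature string as in the diff above):

from eth_utils import (
    encode_hex,
    event_signature_to_log_topic,
)

# keccak of the canonical event signature, hex-encoded with a 0x prefix
COLLATION_ADDED_TOPIC = encode_hex(
    event_signature_to_log_topic("CollationAdded(int128,bytes4096,bool,int128)")
)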
It's cool! will change to it.
evm/vm/forks/sharding/vmc_handler.py
Outdated
self.current_checking_score[shard_id] = None

@to_dict
def parse_collation_added_data(self, data):
This looks like a good stand-alone function that could be split off and tested in isolation.
Sorry, I don't quite understand. Do you mean it should be moved outside the class and tested in isolation?
evm/vm/forks/sharding/vmc_handler.py
Outdated
def _get_new_logs(self, shard_id):
    # use `get_new_entries` over `get_all_entries`
    # 1. Prevent from the increasing size of logs
    # 2. Leave the efforts maintaining `new_logs` in RPC servers
I have a pattern I've been wanting to try out that this might be a good candidate for. It's meant to emulate get_new_entries without any of the complication of installing the filter and tracking the filter_id. It looks roughly like this.
from cytoolz import (
merge,
partition_all,
)
def log_filter(web3, filter_params, page_size=10):
# TODO: handle pending/latest/earliest
from_block = filter_params['fromBlock']
to_block = filter_params['toBlock']
# TODO: handle dynamic block ranges to enforce reasonable request sizes.
for block_number in range(from_block, to_block + 1):
params = merge(filter_params, {'fromBlock': block_number, 'toBlock': block_number})
logs = web3.eth.getLogs(params)
for page in partition_all(page_size, logs):
yield page
Issue in web3.py here: ethereum/web3.py#551
I suspect you could implement a specialized version of this that would be more robust than relying on filters on the server, eliminating the need to handle things like re-installing the filter if it gets lost.
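For illustration, usage of that log_filter sketch might look roughly like this (hypothetical: w3 is a connected Web3 instance, and vmc_address, COLLATION_ADDED_TOPIC, and process_log are placeholders):

filter_params = {
    'fromBlock': 0,
    'toBlock': w3.eth.blockNumber,
    'address': vmc_address,
    'topics': [COLLATION_ADDED_TOPIC],
}
for page in log_filter(w3, filter_params, page_size=10):
    # each `page` is a tuple of up to `page_size` log entries
    for log_entry in page:
        process_log(log_entry)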
No problem.
Thank you for letting me know about this. I hadn't considered the situation where filters get lost on the server before.
I will do it. :)
Since I just found that getLogs has not been implemented in eth_tester, I copied and pasted some code from eth_tester.create_log_filter and eth_tester._process_block_logs into a new function, eth_tester.get_logs, and made web3.py call it. Now it seems to work fine (still needs testing).
One question I have: for now I plan to save the block_number every time I get new logs with eth_getLogs, to identify the latest block our new logs have been synced up to. Should I handle the situation when forks happen? If we take forks in the main chain into consideration, it seems to be more complicated, and we would need to store more things.
Thank you for your wisdom :)
@pipermerriam I'm sorry that after thinking for a while, I still have some questions about the code.
- Is it correct that we must specify the fromBlock and toBlock to ensure the logs we are going to get are within the block range? If so, outside the log_filter there should be something recording the range of blocks which have already been queried, right?
- Why not just use
for block_number in range(from_block, to_block + 1):
params = merge(filter_params, {'fromBlock': block_number, 'toBlock': block_number})
logs = web3.eth.getLogs(params)
for log in logs:
yield log
instead of
for block_number in range(from_block, to_block + 1):
params = merge(filter_params, {'fromBlock': block_number, 'toBlock': block_number})
logs = web3.eth.getLogs(params)
for page in sliding_window(page_size, logs):
yield page
? What is the purpose of using sliding_window?
Since there would be a lot of overlap between each next() of log_filter, why not just use yield log?
e.g.
>>> logs = [1,2,3,4,5]
>>> from cytoolz import sliding_window
>>> z = sliding_window(3, logs)
>>> next(z)
(1, 2, 3)
>>> next(z)
(2, 3, 4)
>>> next(z)
(3, 4, 5)
Thank you so much for the directions!
I have no idea why I put sliding_window
in there. I think I got my wires crossed while hammering out that example code. Will try to get you answers to other questions asap but I'm out of time today.
Instead of sliding window, it should be toolz.partition_all
hmm that makes sense.
Thank you so much.
Please take your time. There is no hurry at all!
One question I have: for now I plan to save the block_number every time I get new logs with eth_getLogs, to identify the latest block our new logs have been synced up to. Should I handle the situation when forks happen? If we take forks in the main chain into consideration, it seems to be more complicated, and we would need to store more things.
Here is some untested code that won't actually work but should demonstrate the idea:
from eth_utils import (
    reversed_return,
)


@reversed_return
def get_previous_headers(w3, history_size=256):
    block = w3.eth.getBlock('latest')
    yield block['hash']
    # initialize the list of recent hashes
    for _ in range(history_size - 1):
        block = w3.eth.getBlock(block['parentHash'])
        yield block['hash']
        # break the loop if we hit the genesis block.
        if block['number'] == 0:
            break
def check_chain_head(w3, recent_hashes, history_size=256):
    block = w3.eth.getBlock('latest')
    new_hashes = []
    for _ in range(history_size):
        if block['hash'] in recent_hashes:
            break
        new_hashes.append(block['hash'])
        block = w3.eth.getBlock(block['parentHash'])
    else:
        raise Exception('No common ancestor found for block: {0}'.format(block['hash']))
    first_common_ancestor_idx = recent_hashes.index(block['hash'])
    # everything up to and including the common ancestor is unchanged
    unchanged_hashes = recent_hashes[:first_common_ancestor_idx + 1]
    revoked_hashes = recent_hashes[first_common_ancestor_idx + 1:]
    # new_hashes were collected newest-first; flip them to oldest-first before appending
    new_block_hashes = tuple(reversed(new_hashes))
    recent_hashes = unchanged_hashes + new_block_hashes
    recent_hashes = recent_hashes[-1 * history_size:]
    return revoked_hashes, new_block_hashes
The idea is to build a utility that will track the most recent hashes in the head of the chain in such a way that it can provide you with two values.
- the block hashes that were revoked from the head of the chain due to re-org
- the new block hashes that we haven't yet seen
You can use the revoked hashes to roll back any logs that you processed which are no longer valid (assuming you need to do that).
You can use the new hashes to define the fromBlock
and toBlock
for fetching any new logs that you haven't yet seen.
Questions? Does this make sense? The code above could probably more easily be written as a class.
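For what it's worth, a rough sketch of the class-shaped version hinted at above (hypothetical names, built on the get_previous_headers / check_chain_head sketches; untested):

class ChainHeadTracker:
    # Hypothetical wrapper that keeps the rolling window of recent block hashes.
    def __init__(self, w3, history_size=256):
        self.w3 = w3
        self.history_size = history_size
        self.recent_hashes = get_previous_headers(w3, history_size)

    def poll(self):
        # Returns (revoked_hashes, new_block_hashes) and updates the tracked window.
        revoked_hashes, new_block_hashes = check_chain_head(
            self.w3,
            self.recent_hashes,
            self.history_size,
        )
        if revoked_hashes:
            unchanged = self.recent_hashes[:-1 * len(revoked_hashes)]
        else:
            unchanged = self.recent_hashes
        self.recent_hashes = (unchanged + new_block_hashes)[-1 * self.history_size:]
        return revoked_hashes, new_block_hashes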
It's a beautiful and simple idea! In this way we can focus on the revoked_changes and unseen_and_canonical_changes, and there's no need to worry about the other re-orgs between them. I think I can soon implement the utility based on your code.
Still one question: why is history_size set to 256? Why shouldn't it be larger or smaller?
I really appreciate your help and patience!
256 was a totally arbitrary number. It should probably be configurable.
evm/vm/forks/sharding/vmc_handler.py
Outdated
yield pipe(
    log['data'],
    decode_hex,
    self.parse_collation_added_data,
Seems like moving the decode_hex
into parse_collation_added_data
would make this a lot easier to read.
yep you're right. I have changed it. Thank you:)
evm/vm/forks/sharding/vmc_handler.py
Outdated
raise FilterNotFound(
    "CollationAdded filter haven't been set up in shard {}".format(shard_id)
)
for i in range(len(self.unchecked_collation_added_logs[shard_id]) - 1, -1, -1):
I think this can be simplified with:
for log_entry in reversed(self.unchecked_collation_added_logs[shard_id]):
    if log_entry == ...:
        ...
Thank you for pointing it out. Have changed it.
@pipermerriam I implemented your algorithm, and made a PR for eth_getLogs: ethereum/eth-tester#50
👍 will get your
evm/vm/forks/sharding/log_handler.py
Outdated
)


class LogHandler(object):
you can drop the (object)
part here since we're in python3.
evm/vm/forks/sharding/log_handler.py
Outdated
# ----------> higher score
self.recent_block_hashes = self.get_recent_block_hashes()

def get_recent_block_hashes(self):
This would make a good stand-alone function. I believe there are other places in the code that do this exact same logic. I think one of @hwwhww 's recent PRs introduced similar code.
There's the VM.get_prev_hashes(last_block_hash, db) function for the main chain client to build the previous-256-header-hashes list:
Lines 418 to 431 in 10d29c4
@classmethod
@to_tuple
def get_prev_hashes(cls, last_block_hash, db):
    if last_block_hash == GENESIS_PARENT_HASH:
        return
    block_header = get_block_header_by_hash(last_block_hash, db)
    for _ in range(MAX_PREV_HEADER_DEPTH):
        yield block_header.hash
        try:
            block_header = get_parent_header(block_header, db)
        except (IndexError, BlockNotFound):
            break
Thank you:)
evm/vm/forks/sharding/log_handler.py
Outdated
reversed_recent_hashes = tuple(reversed(recent_hashes))
return reversed_recent_hashes

def check_chain_head(self):
I think this would also make a good stand-alone function, especially because it would be easy to test in isolation which would be nice.
Using this would then look something like this.
last_seen_recent_block_hashes = get_recent_block_hashes(w3, 256)
# .. then we check for changes
revoked_hashes, new_block_hashes = check_chain_head(w3, last_seen_recent_block_hashes)
# note: guard the slice, since `[:-0]` would drop everything when nothing was revoked
unchanged_hashes = (
    last_seen_recent_block_hashes[:-1 * len(revoked_hashes)]
    if revoked_hashes
    else last_seen_recent_block_hashes
)
new_recent_block_hashes = tuple(concatv(
    unchanged_hashes,
    new_block_hashes,
))[-1 * 256:]
This would reduce the size of this class significantly and would probably make the code that actually uses it a bit easier to parse.
Have done it. Thank you for the wise instruction.
May I ask how to determine which functions should stay in the class and which should be moved outside of LogHandler? Is the reason that get_new_logs stays in the class that it is the only interface used by VMCHandler?
And is the reason that check_chain_head, get_recent_block_hashes, and preprocess_block_param can be moved outside of LogHandler that they only depend on web3 and can be used directly without the log-handling logic?
Sorry for asking this newbie question. I just want to know the better criteria for designing the class.
No need to apologize and I'm all for asking questions of any kind.
My general rule of thumb is the following loose collection of axioms and guidelines.
- classes are typically stateful and statefulness is hard to manage.
- functions (especially pure ones) are easier to reason about than classes.
- a class with only one function is just a function disguised as a class. This also regularly applies to classes with only 2-3 functions.
- a method that doesn't use self should almost always be a function (see the toy sketch below).
- it is ok to include convenience functions using @staticmethod, but normally this should only be done if the convenience is external, meaning that outside actors who use this class almost always need the convenience function. If it's only needed internally, it should just be used as a stand-alone function.
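A toy illustration of the last two points (hypothetical names, not code from this PR):

def compute_score(header):
    # never touches `self`, so it reads better as a stand-alone function
    return header['score']

class Tracker:
    def __init__(self, logs):
        self.logs = logs

    def highest_score(self):
        # uses `self`, so it belongs on the class
        return max(compute_score(log) for log in self.logs)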
Thank you so much for giving me these guidelines! They are really helpful.
Just to conclude: is it correct that, in general, we should prefer functions over classes (as pointed out in https://github.com/pipermerriam/ethereum-dev-tactical-manual#the-zen-of-the-python-ethereum-team), and that the right time to choose a class over functions is when we have to maintain state alongside a set of functions (and we should move functions that don't need to reside inside the class to stand-alone functions where possible)?
evm/vm/forks/sharding/log_handler.py
Outdated
    'latest': current_block_number,
    'pending': current_block_number + 1,
}
return mapping.get(block_param, block_param)
If I'm reading this correctly, this will return block_param as-is if it isn't one of earliest/latest/pending. I think this would be better written as a 4-branch if/elif/else statement which cases each of earliest/latest/pending and raises an exception in the final else, since that should be an error case.
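A rough sketch of the suggested 4-branch version (assuming the same block_param and current_block_number inputs as the original helper):

def preprocess_block_param(block_param, current_block_number):
    if block_param == 'earliest':
        return 0
    elif block_param == 'latest':
        return current_block_number
    elif block_param == 'pending':
        return current_block_number + 1
    else:
        # anything else is an error rather than being passed through silently
        raise ValueError("Unrecognized block parameter: {0}".format(block_param))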
Ok no problem:) Will fix it.
evm/vm/forks/sharding/log_handler.py
Outdated
if isinstance(block_param, int):
    return block_param
elif not isinstance(block_param, str):
    raise ValueError("block parameter {0} is in wrong type".format(block_param))
Instead of saying, "wrong type" it would likely make for a more helpful error message to invert this and say "must be str
type"
evm/vm/forks/sharding/vmc_handler.py
Outdated
new_collation_added_logs = {}
# shard_id -> list
# newer <---------------> older
unchecked_collation_added_logs = {}
Same here about mutable data structures and setting this in the __init__
evm/vm/forks/sharding/vmc_handler.py
Outdated
self.shards.add(shard_id)
self.unchecked_collation_added_logs[shard_id] = []
self.new_collation_added_logs[shard_id] = []
self.current_checking_score[shard_id] = None
I think that you can remove all of this initialization if you use collections.defaultdict instances for these three data structures.
self.unchecked_collation_added_logs = collections.defaultdict(list)
self.new_collation_added_logs = collections.defaultdict(list)
self.current_checking_score = {}
Then you can access both self.unchecked_collation_added_logs[shard_id]
and self.new_collation_added_logs[shard_id]
as if they have already been initialized thanks to the magic of defaultdicts.
This should allow you to remove all of the initialization and ensure_* initialization code.
It's a good idea. Thank you for pointing it out. I will change it.
evm/vm/forks/sharding/vmc_handler.py
Outdated
def get_next_log(self, shard_id):
    new_logs = self._get_new_logs(shard_id)
    self.new_collation_added_logs[shard_id] += new_logs
I think .extend(new_logs)
is the more canonical way to do this.
evm/vm/forks/sharding/vmc_handler.py
Outdated
# checking in order of oldest to most recent.
self.ensure_shard_variables_initialied(shard_id)

for i in reversed(range(len(self.unchecked_collation_added_logs[shard_id]))):
I think this is more readable as:
# tuple(..) call is to force the iterator to fully evaluate so that the later `.pop(..)` doesn't cause problems.
# (enumerate(..) has to be materialized first, since `reversed` needs a sequence.)
unchecked_logs = tuple(reversed(tuple(enumerate(self.unchecked_collation_added_logs[shard_id]))))
current_score = self.current_checking_score[shard_id]
for idx, log_entry in unchecked_logs:
    if log_entry['score'] == current_score:
        return self.unchecked_collation_added_logs[shard_id].pop(idx)
It's really just moving things around, but it removes one index based lookup, as well as storing the current score in a local variable to make the inner loop more legible and easier to follow.
Thank you for the suggestion! It is way more readable. Will change it.
evm/vm/forks/sharding/vmc_handler.py
Outdated
self.default_privkey = default_privkey
self.default_sender_address = default_privkey.public_key.to_canonical_address()
self.config = get_sharding_config()
super().__init__(*args, **kwargs)

def init_shard_variables(self, shard_id):
    self.shards.add(shard_id)
    self.unchecked_collation_added_logs[shard_id] = []
What do you think about renaming these variables to self.unchecked_logs, self.new_logs, and current_score? It should make a lot of the lines in this class way shorter and easier to read.
Will do. Actually, the reason I used such a long name is that I thought there would be multiple types of logs to be handled, so I made the name specific.
But since they are not handled yet, I think it's a good idea to rename them to shorter ones for now. Thank you.
I think there are some bugs in this but I'm not sure. Can you look into some of the comments about whether unchecked_logs
is correctly ordered?
evm/vm/forks/sharding/vmc_handler.py
Outdated
# older <---------------> newer
self.new_logs = defaultdict(list)
# shard_id -> list
# newer <---------------> older
I think this isn't accurate.
- Let's say we get logs 1, 2, 3, 4, 5
- Log 1 is the oldest
- Log 5 is the newest
- They come in as two batches, 1, 2, 3 and then, when we query again after some blocks, 4, 5
In this case, I think that unchecked_logs will be 3, 2, 1, 5, 4. Maybe I'm reading the code wrong...?
I think you're right. It is not accurate. I will remove this comment.
The algorithm fetch_candidate_head is from https://github.com/ethereum/sharding/blob/develop/docs/doc.md#fetch-candidate-heads-in-reverse-sorted-order .
Its idea is:
"The idea is that this algorithm is guaranteed to check potential head candidates in highest-to-lowest sorted order of score, with the second priority being oldest to most recent."
I am thinking of one example, which follows what you assume above.
Assume we have batch1 = [1, 2, 3] and batch2 = [4, 5].
First, batch1 arrives and we have new_logs = [1, 2, 3].
Before batch2 arrives, the outcome of fetch_candidate_head will be 3, 2, 1. As soon as batch2 arrives, the result of fetch_candidate_head would be an element of batch2, instead of one from batch1.
So I think for this case the algorithm is right.
However, if we assume batch1 = [1, 1] and batch2 = [2], then as long as batch2 arrives after batch1 and after the first fetch_candidate_head is called, the order of the outcome would be 1, 1, 2, which seems inconsistent with the idea of the algorithm: "The idea is that this algorithm is guaranteed to check potential head candidates in highest-to-lowest sorted order of score, with the second priority being oldest to most recent."
I have an example code.
About this, I think I should confirm the details with @vbuterin.
The intended principle is that you want to get logs in reverse logical order, so latest block to oldest block, and within a block last transaction to first transaction, and within a transaction last to first created.
evm/vm/forks/sharding/vmc_handler.py
Outdated
# is_new_head = true log
while True:
    # TODO: currently just raise when there is no log anymore
    self.unchecked_logs[shard_id].append(self.get_next_log(shard_id))
Related to my comment about the logs potentially not being maintained in the right order, I think this is the offending line. It seems like the logs should be pushed onto the front of the list rather than appended to the end. If this is the case, then you might consider using a collections.deque, which has fast inserts and pops at both the beginning and end of the list.
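A minimal sketch of the deque idea (illustrative names only, not the PR's actual code):

from collections import deque

unchecked_logs = deque()
for log_entry in ({'score': 1}, {'score': 2}):  # stand-in log entries
    unchecked_logs.appendleft(log_entry)        # push the newest log onto the front in O(1)
newest = unchecked_logs[0]                      # {'score': 2}
oldest = unchecked_logs.pop()                   # pop the oldest from the back in O(1)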
Sorry about the late reply. I think I need to think about it for a while tomorrow.
I will soon give an explanation and what to revise. Thank you for pointing these out!
evm/vm/forks/sharding/vmc_handler.py
Outdated
def __init__(self, *args, default_privkey, **kwargs):
def __init__(self, *args, log_handler, default_privkey, **kwargs):
    self.log_handler = log_handler
I think you're going to need one log handler per shard. Since the log handler is stateful, each time you ask for more logs, it only returns the ones for the given shard_id. If you're tracking multiple shard_ids then I think you're going to miss logs.
Ahh, I didn't think of this case. Thank you for discovering it.
Will fix it.
Actually, I was thinking that we could maintain one recent_block_hashes for all shards, to share it across shards. But that way we would need to store all logs from all shards. It doesn't seem better than keeping one log_handler for each shard.
Yeah, one handler per shard seems cleanest despite the added overhead. We can address the performance issues later if needed.
@pipermerriam Thank you!
I'm good with whatever you guys have settled on regarding the proper functionality.
evm/vm/forks/sharding/vmc_handler.py
Outdated
# shard_id -> list
self.unchecked_logs[shard_id] = []
# shard_id -> score
self.current_score[shard_id] = None
I think that this entire class would be cleaned up by implementing one more data structure to house all of the shard data.
class ShardTracker:
    shard_id = ...
    log_handler = ...
    new_logs = ...
    unchecked_logs = ...
    current_score = ...
I suspect that some, if not most of the business logic for updating the shard state could then be moved into that object at which point this class would just manage all of the individual instances of that class.
Thoughts? I think it would clean this class up in a significant way.
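For illustration, a rough constructor-level sketch of that container (hypothetical; the field names just mirror the suggestion above):

class ShardTracker:
    # one instance per shard, holding everything that is currently keyed by shard_id
    def __init__(self, shard_id, log_handler):
        self.shard_id = shard_id
        self.log_handler = log_handler
        self.new_logs = []
        self.unchecked_logs = []
        self.current_score = None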
I think that is a good idea. I will do it.
Just two questions about it.
- Should I put ShardTracker in another file, shard_tracker.py?
- Since all the logic regarding shard logs is moved to ShardTracker (including get_next_log and fetch_candidate_head), if there is a need for an API in VMCHandler like fetch_candidate_head, may I just add a method with the same name fetch_candidate_head in VMCHandler that simply calls ShardTracker.fetch_candidate_head? (Though I think there might only be new methods like guess_head in VMC, which will be implemented in the future.)
Very much appreciated!
Moreover, may I request a bumpversion in web3.py, to resolve the error in CI?
Thank you!
- It's probably fine to start with it here in this file or to put it in its own module. Up to you. shard_tracker.py seems like a fine choice as well.
- If this makes VMCHandler obsolete and you think you're able to do what you need with only the ShardTracker class, then it's probably a good idea to do away with it. It really depends on how these APIs will be used, whether you need an abstraction to track many shards, and whether it still makes sense for that to be a class.
@pipermerriam Thank you for the great suggestions! I have moved that logic to ShardTracker, currently in the same module vmc_handler as VMCHandler.
For now I just make get_next_log and fetch_candidate_head directly call the corresponding shard_tracker. I think I can determine what to remove and how to restructure between ShardTracker and VMCHandler after GUESS_HEAD and CREATE_COLLATION are implemented, which will be done in the next stage.
I believe you can bump the web3 version locally here in your branch to fix CI
The only required changes are removing the use of mutable data structures as class attributes. Everything else is cleanup. This looks very close to mergeable.
evm/vm/forks/sharding/vmc_handler.py
Outdated
@@ -40,6 +40,10 @@ class NextLogUnavailable(Exception):
    pass


class ShardNotTracked(Exception):
Alternate name suggestion: UnknownShard
It's better, Thank you:)
evm/vm/forks/sharding/vmc_handler.py
Outdated
self.log_handlers[shard_id] = log_handler
self.init_log_variables(shard_id)
self.new_logs = []
self.unchecked_logs = []
These should be set in the constructor (and set to None
here in the class definition). Mutable data structures as class attributes are bad, as they are shared across all instances of the class.
I think they are in the constructor already?
I think setting them to None is a good idea.
Is the purpose of doing this to explicitly show their existence?
Ignore my comment. I didn't see that this was within __init__.py
(thought I was looking at the class body).
The reason for setting them to None
on the class is entirely convention to improve readability. I find it helpful to be able to look at the class definition and get a full view of all of the various properties the class has. It's harder to gather that information if you have to dig it out of code, even simple code like this in the constructor.
evm/vm/forks/sharding/vmc_handler.py
Outdated
self.unchecked_logs[shard_id].append(self.get_next_log(shard_id))
if self.unchecked_logs[shard_id][-1]['is_new_head'] is True:
self.unchecked_logs.append(self.get_next_log())
if self.unchecked_logs[-1]['is_new_head'] is True:
If I'm reading this logic right:
- retrieve the new log with get_next_log()
- append it to self.unchecked_logs
- if the newest log has is_new_head == True
- break from the loop and pop off the newest log.
I think this is simpler written as:
while True:
    log_entry = self.get_next_log()
    if log_entry['is_new_head']:
        break
    else:
        self.unchecked_logs.append(log_entry)
...
It avoids appending and then popping by only appending in the case that the new log isn't the new head.
you're right. Thank you for pointing it out.
I've modified the code.
evm/vm/forks/sharding/vmc_handler.py
Outdated
# TODO: currently just calls `shard_tracker.get_next_log`
def get_next_log(self, shard_id):
    if shard_id not in self.shard_trackers:
Instead of repeating this logic, I would suggest using self.get_shard_tracker(shard_id)
here to retrieve the tracker.
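A rough sketch of what that could look like (hedged; UnknownShard is the exception name suggested earlier in this review, and the helper name follows the suggestion above):

def get_shard_tracker(self, shard_id):
    # single place that knows how to look up (or fail on) a shard
    try:
        return self.shard_trackers[shard_id]
    except KeyError:
        raise UnknownShard("Shard {} is not being tracked".format(shard_id))

def get_next_log(self, shard_id):
    # delegate instead of repeating the membership check
    return self.get_shard_tracker(shard_id).get_next_log()

def fetch_candidate_head(self, shard_id):
    return self.get_shard_tracker(shard_id).fetch_candidate_head()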
Thank you for pointing it out.
Will change it.
evm/vm/forks/sharding/vmc_handler.py
Outdated
# TODO: currently just calls `shard_tracker.fetch_candidate_head`
def fetch_candidate_head(self, shard_id):
    if shard_id not in self.shard_trackers:
Same here about removing this logic in favor of using self.get_shard_tracker(shard_id)
Will change it. Thank you:)
evm/vm/forks/sharding/log_handler.py
Outdated
yield 'topics', topics

def get_new_logs(self, address=None, topics=None):
    revoked_hashes, new_block_hashes = check_chain_head(
I may have already asked this question. Does something need to be done with the revoked hashes as well? Should the tracker be doing anything with these to revert anything in the event that one of the log entries it updated its state for is revoked?
Actually, I'm not sure about this. Please let me think about it more thoroughly. Sorry about this.
For now, IMO, if we just want to mimic the behavior of the version using filter.get_new_entries, I think we don't need to do the revoke.
Do you have any suggestions? Thank you!
Unfortunately, I'm not familiar enough with the underlying algorithm to be able to provide a suggestion on what the right thing to do here is. I just noticed that the revoked data wasn't used and wanted to be sure that wasn't an oversight. Feel free to mark it with a TODO comment and open an issue to track it, and we can merge with the understanding that we'll need to address it down the road.
That's fine. Thank you so much for keeping up to date with this part.
Yeah, you're right, it is worth keeping track of this. I marked it with a TODO comment and opened issue #344.
Since most of them are test-related, we leave only `create_vmc_tx` and `get_vmc_json` in vmc_utils.py, which specify the vmc tx and the contract path, and move the others to `test_vmc_handler.py`
TODO - Still needs to test `fetch_candidate_head` - Refactor `fixture.py`
- Use event_signature_to_log_topic - Modify parse_collation_added_data - Change the range in `fetch_candidate_head` to reversed
TODO: - Add `filter_log`
And change `preprocess_block_param` to if-else
And fix defaultdict(None) to defaultdict(lambda: None)
new_collation_added_logs -> new_logs unchecked_collation_added_logs -> unchecked_logs current_checking_score -> current_score
- check_chain_head - get_recent_block_hashes - preprocess_block_param
- Maybe we can use one `log_handler` for all shards - Re-run `GUESS_HEAD` when every new block arrives, to solve the problem of `fetch_candidate_head`
- Modify `fetch_candidate_head`, to avoid unnecessary `push` and `pop` when the log is new head - Explicitly declare the logs variables in ShardTracker - Modify `ShardNotTracked` to `UnknownShard`
Mind writing a brief document in the PR description?
tests/sharding/fixtures.py
Outdated
    to_canonical_address,
    to_checksum_address,
)

from evm.vm.forks.spurious_dragon.transactions import (
    SpuriousDragonTransaction,
Just curious, why use SpuriousDragonTransaction instead of ByzantiumTransaction?
Thank you for pointing that out!
I will change it.
logger = logging.getLogger("evm.chain.sharding.LogHandler")

def __init__(self, w3, history_size=256):
    self.history_size = history_size
For LogHandler or higher-level (ShardTracker, VMC) APIs, is LogHandler.history_size a fixed capacity of previous block hashes/headers from the head block that is only set at initialization?
If so, we may need a more flexible method that lets us query for older block hashes (more than the previous 256 block headers) dynamically during the guess_head algorithm. The reason is that we want to verify as many collations as we can.
What was wrong?
As #242 described, get_next_log and fetch_candidate_head are needed.
How was it fixed?
- get_next_log: since there is a get_new_entries() in web3.eth.filter, which gives the latest "unread" logs from the RPC server, when we fetch new logs from get_new_entries() we are sure that they are the latest and unseen. Besides, because get_next_log should only return one latest log, we have to save the other logs fetched from get_new_entries to a stack and treat them as "new" logs, waiting to be chosen by get_next_log in the future. The way to maintain the stack (VMC.new_collation_added_logs) is: when get_next_log is called, first call get_new_entries and put the new logs on top of the stack (VMC.new_collation_added_logs), then return the log on the top, ensuring we get the latest and unseen log. A rough sketch of this flow is shown below.
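A minimal, hypothetical sketch of the stack-based flow described above (the names mirror the description, not the exact PR code):

class CollationLogStack:
    def __init__(self, log_filter):
        self.log_filter = log_filter          # a web3 filter exposing get_new_entries()
        self.new_collation_added_logs = []    # stack: oldest at the bottom, newest on top

    def get_next_log(self):
        # push any newly seen logs on top of the stack ...
        self.new_collation_added_logs.extend(self.log_filter.get_new_entries())
        if not self.new_collation_added_logs:
            raise Exception("No new logs available")
        # ... then return (and remove) the newest, unseen log
        return self.new_collation_added_logs.pop()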
- fetch_candidate_head: implemented the same way as it is described in doc.md.
TODOs
- Use eth_getLogs over filter.xxx, maintaining new logs ourselves.
Cute Animal Picture