triedb/pathdb: track flat state changes in pathdb (snapshot integration pt 2) #30643
base: master
This pull request ports some changes from the main state snapshot integration PR, specifically introducing flat state tracking in pathdb. Note that the tracked flat state changes are only held in memory and won't be persisted to disk. Meanwhile, the corresponding state retrieval from the persistent state is not supported yet either. State management on disk is more complicated and will be implemented in a separate pull request.
// Keep track of whether the account has already been marked as destructed.
// This additional marker is useful for undoing the merge operation.
_, exist := s.destructSet[accountHash]
How can it already be marked as destructed? I guess the only case would be:
- tx 1 creates contract at address C via CREATE2
- tx 2 self-destructs C
- tx 3 creates contract at address C via CREATE2
- tx 4 self-destructs C
- ... and so on ...
Is this more or less the scenario that made you do the whole journalling thing? Or are there other corner cases?
These two sets belong to different blocks:
In block A, account x was destructed and then resurrected;
In block B, account x was destructed again.
This is a valid sequence before the Cancun fork (EIP-6780), which restricts SELFDESTRUCT so that it only deletes an account created in the same transaction.
destructs = append(destructs, destruct{
	Hash:  accountHash,
	Exist: exist,
})
I don't quite see why each change needs to be appended here. Don't they either all get reverted or none?
The semantics of merge usually mean "we now mush everything into one", so this need to keep a journal of events confuses me.
In the destruction set merging process, we track details for each operation, specifically noting whether the associated account has already been marked as destructed. This information is stored in the stateSet and is crucial during reverts.
During a revert, we need to undo changes made to the destruction set, and this flag helps distinguish between two cases:
(a) If the account was already marked as destructed before the merge operation (i.e., stateSet.destructSet[addr] exists), the undo operation should not remove this entry from stateSet.destructSet.
(b) If the account was not previously marked as destructed (i.e., stateSet.destructSet[addr] does not exist), the undo operation should remove this entry from stateSet.destructSet.
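A minimal Go sketch of the journal described above, reduced to the destruction set only. Only the destruct{Hash, Exist} shape comes from the quoted snippet; the string stand-in for common.Hash, the trimmed stateSet type, and the helper names mergeDestructs and revertDestructs are hypothetical, chosen to show cases (a) and (b) side by side.

```go
package main

import "fmt"

// destruct is a journal entry recorded for every destruction merged into a
// state set. Exist remembers whether the account was already marked as
// destructed before the merge, which is what the undo step needs.
type destruct struct {
	Hash  string // simplified stand-in for common.Hash
	Exist bool
}

// stateSet is a hypothetical reduced model that only tracks destruction markers.
type stateSet struct {
	destructSet map[string]struct{}
}

// mergeDestructs marks every hash in incoming as destructed and returns the
// journal needed to undo exactly this merge.
func (s *stateSet) mergeDestructs(incoming []string) []destruct {
	var journal []destruct
	for _, h := range incoming {
		_, exist := s.destructSet[h]
		journal = append(journal, destruct{Hash: h, Exist: exist})
		s.destructSet[h] = struct{}{}
	}
	return journal
}

// revertDestructs undoes a merge: markers that already existed (case a) are
// kept, markers introduced by the merge (case b) are removed.
func (s *stateSet) revertDestructs(journal []destruct) {
	for _, d := range journal {
		if !d.Exist {
			delete(s.destructSet, d.Hash)
		}
	}
}

func main() {
	// x was already destructed by an earlier block; y is destructed only by
	// the block being merged.
	s := &stateSet{destructSet: map[string]struct{}{"x": {}}}
	journal := s.mergeDestructs([]string{"x", "y"})
	s.revertDestructs(journal)
	_, hasX := s.destructSet["x"]
	_, hasY := s.destructSet["y"]
	fmt.Println(hasX, hasY) // true false
}
```

Without the Exist flag, the undo step could not tell whether to drop x's marker, and would either wrongly resurrect x's old state or wrongly keep y deleted.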
I still do not "see" it. Can you outline a scenario where this matters? I mean in terms like this:
Block A: contract x is deleted.
Block B: contract x is recreated
Block C: contract x is deleted
Block D: contract x is recreated
Now we merge B into A, giving state set AB. Then we merge C into AB, giving state set ABC. Then revert?
...
Block A: contract x is deleted, and then recreated
Block B: contract x is deleted, and then recreated
We will have set A (destruct[x] = true, account[x] = x_1)
We will have set B (destruct[x] = true, account[x] = x_2)
We merge B into A, then AB is (destruct[x] = true, account[x] = x_2)
If we want to revert B from AB, the reverted set should be (destruct[x] = true, account[x] = x_1).
In another case,
Block A: contract x is updated
Block B: contract x is deleted, and then recreated
We will have set A (destruct[x] = false, account[x] = x_1)
We will have set B (destruct[x] = true, account[x] = x_2)
We merge B into A, then AB is (destruct[x] = true, account[x] = x_2)
If we want to revert B from AB, the reverted set should be (destruct[x] = false, account[x] = x_1).
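Both cases can be replayed with a small model of a single address inside a state set. This is a hypothetical sketch: entry, undo, merge and revert are illustrative names, and for brevity the previous account value is kept in the undo record, whereas the PR's revert receives the original values as arguments.

```go
package main

import "fmt"

// entry is a hypothetical view of what one state set records for address x:
// whether the original account was destructed, and the latest account value
// ("" means no value recorded).
type entry struct {
	destructed bool
	account    string
}

// undo captures what merge overwrote so it can be restored later.
type undo struct {
	exist  bool   // was x already marked as destructed before the merge?
	origin string // account value before the merge
}

// merge applies the changes of b on top of a and returns the undo record.
func merge(a *entry, b entry) undo {
	u := undo{exist: a.destructed, origin: a.account}
	if b.destructed {
		a.destructed = true
	}
	if b.account != "" {
		a.account = b.account
	}
	return u
}

// revert restores a to the state it had before merge: the destruct marker is
// only cleared if the merge introduced it.
func revert(a *entry, u undo) {
	if !u.exist {
		a.destructed = false
	}
	a.account = u.origin
}

func main() {
	// Case 1: A = delete+recreate (x_1), B = delete+recreate (x_2).
	ab := entry{destructed: true, account: "x_1"}
	u := merge(&ab, entry{destructed: true, account: "x_2"})
	fmt.Println(ab) // {true x_2}
	revert(&ab, u)
	fmt.Println(ab) // {true x_1}

	// Case 2: A = update (x_1), B = delete+recreate (x_2).
	ab = entry{account: "x_1"}
	u = merge(&ab, entry{destructed: true, account: "x_2"})
	fmt.Println(ab) // {true x_2}
	revert(&ab, u)
	fmt.Println(ab) // {false x_1}
}
```

The merged set AB is identical in both cases, so the only way revert can produce different results is the recorded exist flag.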
In the first case:
We merge B into A, then AB is (destruct[x] = true, account[x] = x_2)
When merging B into A, so that the "end state" represents all changes in A and B, shouldn't we wind up with (destruct[x] = false, account[x] = x_2)? That is, the account is not destructed; the destruct+recreate simply becomes a modification.
In the second case:
We merge B into A, then AB is (destruct[x] = true, account[x] = x_2)
In this instance, again, update + delete + recreate should IMO become update.
That's how I think about the merge operation. Doing it this way keeps the semantics clear. And the same goes for slots, doesn't it?
If the following happens for slot X: 2->0, then 0->1, it "might as well" have gone 2->1. We don't care about all the intermediate transitions.
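To make the reviewer's "only the final value matters" view concrete, here is a sketch of merging slot writes so that only the latest value per slot survives. It uses hypothetical string keys and int values in place of hashes and RLP blobs, and deliberately ignores the destruct marker, which the reply below argues must be tracked separately.

```go
package main

import "fmt"

// mergeSlots is a hypothetical helper that folds a newer set of slot writes
// on top of an older one. Only the final value per slot survives, so a slot
// that went 2->0 in the older set and 0->1 in the newer one ends up as 1,
// exactly as if it had gone 2->1 directly.
func mergeSlots(older, newer map[string]int) map[string]int {
	out := make(map[string]int, len(older)+len(newer))
	for k, v := range older {
		out[k] = v
	}
	for k, v := range newer {
		out[k] = v // newer writes override older ones
	}
	return out
}

func main() {
	older := map[string]int{"X": 0, "Y": 5}
	newer := map[string]int{"X": 1}
	fmt.Println(mergeSlots(older, newer)) // map[X:1 Y:5]
}
```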
When merging B into A, so that the "end state" represents all changes in A and B, shouldn't we wind up with (destruct[x] = false, account[x] = x_2) -- that is: the account is not destructed. The destruct+recreate simply becomes a modification.
AB should be (destruct[x] = true, account[x] = x_2). destruct[x] = true means the original account x was destructed in set AB, and all state access to the original x should be forbidden. account[x] = x_2 means an account at address x (a different one from the original) was created within set AB.
The destruct flag indicates that the account at this address has been removed: don't look up its state at a deeper level (e.g. disk).
means the original account x was destructed in set AB and all the state access to x should be forbidden
This is what I don't understand. What is the difference between
1. an original account X that has been destructed and recreated, and
2. a modified account X?
For 1, the set should be destruct[x] = true, account[x] = v.
For 2, the set should be destruct[x] = false, account[x] = v.
For case 3, where the account is deleted in the block, it should be destruct[x] = true.
destruct[addr] is only marked if the account is deleted within the block.
account[addr] is only marked if the account is modified or created within the block.
These two flags can be combined.
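The practical difference between cases 1 and 2 shows up when reading storage. Below is a hypothetical diff-layer lookup, assuming reads stop at a destruct marker as the author describes ("don't check the state in the deeper level"); the layer type, its fields and the "addr/slot" disk key format are invented for illustration.

```go
package main

import "fmt"

// layer is a hypothetical in-memory diff layer. destructs marks accounts
// deleted in this block; storages holds slots written in this block, keyed by
// account then slot. parent is the next older layer (nil means disk).
type layer struct {
	destructs map[string]bool
	storages  map[string]map[string]string
	parent    *layer
}

// storage resolves a slot by walking from the newest layer to the oldest.
// A destruct marker ends the walk: anything below it belongs to the deleted
// incarnation of the account, even if the account was recreated in the same
// layer.
func (l *layer) storage(addr, slot string, disk map[string]string) string {
	for cur := l; cur != nil; cur = cur.parent {
		if v, ok := cur.storages[addr][slot]; ok {
			return v
		}
		if cur.destructs[addr] {
			return "" // treated as an empty slot
		}
	}
	return disk[addr+"/"+slot]
}

func main() {
	disk := map[string]string{"x/s1": "old"}

	// Case 2 (modified account): no destruct marker, so the lookup falls
	// through to the value on disk.
	modified := &layer{}
	fmt.Printf("%q\n", modified.storage("x", "s1", disk)) // "old"

	// Case 1 (destructed and recreated): the new account only wrote s2, and
	// s1 must not be served from the old incarnation on disk.
	recreated := &layer{
		destructs: map[string]bool{"x": true},
		storages:  map[string]map[string]string{"x": {"s2": "new"}},
	}
	fmt.Printf("%q %q\n", recreated.storage("x", "s1", disk), recreated.storage("x", "s2", disk)) // "" "new"
}
```

This is why a destruct+recreate cannot simply be folded into a modification: the account value looks the same, but the old storage must stay hidden.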
triedb/pathdb/states.go
slots := make(map[common.Hash][]byte)
for storageHash, data := range storage {
	slots[storageHash] = data
	delta += 2*common.HashLength + len(data)
}
If it weren't for the delta tracking, you could replace the loop above with
slots := maps.Clone(storage)
A lot of the code is like that: mostly keeping track of count. I guess there's not much to do about that... :/
The data here is the RLP encoding of the slot value (with leading zeroes trimmed, and nil for the all-zero value), right?
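For reference, a sketch of the maps.Clone suggestion with the size accounting kept. hash and hashLength are local stand-ins for common.Hash and common.HashLength, the per-entry cost formula is copied from the quoted loop, and maps.Clone requires Go 1.21 or newer.

```go
package main

import (
	"fmt"
	"maps"
)

// hashLength mirrors go-ethereum's common.HashLength (32 bytes).
const hashLength = 32

// hash is a local stand-in for common.Hash.
type hash [hashLength]byte

// copySlots clones storage and computes the size delta with the same formula
// as the quoted loop. maps.Clone removes the manual copy, but the size
// accounting still needs its own pass over the values.
func copySlots(storage map[hash][]byte) (map[hash][]byte, int) {
	slots := maps.Clone(storage) // shallow copy: values share backing arrays, as in the original loop
	delta := 0
	for _, data := range slots {
		delta += 2*hashLength + len(data)
	}
	return slots, delta
}

func main() {
	storage := map[hash][]byte{{1}: {0x2a}, {2}: nil}
	slots, delta := copySlots(storage)
	fmt.Println(len(slots), delta) // 2 129
}
```

Either way the map is traversed once for sizing, which matches the reply that the traversal is unavoidable.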
A lot of the code is like that: mostly keeping track of count. I guess there's not much to do about that... :/
Yeah, we have to traverse the set due to the size tracking, unfortunately.
The data here is the rlp-encoding of the slot value (with zeroes trimmed, and nil for the all-zero-value), right?
Correct
// revert takes the original value of accounts and storages as input and reverts
// the latest state transition applied on the state set.
func (s *stateSet) revert(accountOrigin map[common.Hash][]byte, storageOrigin map[common.Hash]map[common.Hash][]byte) {
I don't think the name matches what I understand when I look at the code. Isn't it more like revertTo, where you give it a set of accounts/slots and say "hey, reset yourself to these values, ignore anything else"?
It doesn't revert the given values; it uses the given values to reset the internal state. (?)
I don't think the name matches what I understand when I look at the code. Isn't it more like revertTo, where you give it a set of accounts/slots, and say "hey reset yourself to these values, ignore anything else" ?
True: the supplied account and storage sets hold the "original" values of the mutated states, and we want to reset those states to their old values.
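A simplified sketch of the "reset to the supplied original values" behaviour, limited to accounts. The stateSet fields, the string keys, and the panic on unknown accounts are assumptions for illustration. An empty origin is stored as-is rather than deleted, on the assumption that a nil value means "did not exist before", which stays correct even though it may leave a redundant entry.

```go
package main

import "fmt"

// stateSet is a hypothetical reduced model: account blobs keyed by address
// hash (a string here) plus a running size estimate.
type stateSet struct {
	accountData map[string][]byte
	size        int
}

// revertTo resets every account listed in accountOrigin to its original
// value and adjusts the size estimate by the difference in blob length.
func (s *stateSet) revertTo(accountOrigin map[string][]byte) {
	for addr, blob := range accountOrigin {
		data, ok := s.accountData[addr]
		if !ok {
			// Every reverted account must have been mutated by this set.
			panic(fmt.Sprintf("reverting unknown account %s", addr))
		}
		s.size += len(blob) - len(data)
		s.accountData[addr] = blob // nil origin: account did not exist before
	}
}

func main() {
	s := &stateSet{
		accountData: map[string][]byte{"a": []byte("v2"), "b": []byte("new")},
		size:        5,
	}
	s.revertTo(map[string][]byte{"a": []byte("v1"), "b": nil})
	fmt.Printf("%s %v %d\n", s.accountData["a"], s.accountData["b"] == nil, s.size) // v1 true 2
}
```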
} | ||
|
||
// encode serializes the content of state set into the provided writer. | ||
func (s *stateSet) encode(w io.Writer) error { |
We typically don't use RLP like this, encoding four times with these manual steps. Why not use something more auto-generated?
Is there some optimization at play here?
It's not about optimization at all (performance is also not important here).
It's more like hand-written encoding rules. StateSet contains several maps, and maps are not supported by RLP encoding.
Therefore, this code tells the RLP encoder how to pack the data.
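One common way to hand-write such rules is to flatten each map into parallel key and value slices before encoding. The sketch below is hypothetical: the names and the key sorting are this example's choices, not necessarily what the PR does. In go-ethereum, the resulting struct could then be passed to rlp.Encode, since the rlp package handles structs of byte-slice lists.

```go
package main

import (
	"fmt"
	"sort"
)

// flatAccounts is a hypothetical encodable form of map[string][]byte. RLP has
// no map type, so keys and values move into two parallel slices.
type flatAccounts struct {
	Keys [][]byte
	Vals [][]byte
}

// flatten converts the map into parallel slices. Keys are sorted so the same
// map always yields the same encoding, because Go map iteration order is
// randomized.
func flatten(m map[string][]byte) flatAccounts {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var f flatAccounts
	for _, k := range keys {
		f.Keys = append(f.Keys, []byte(k))
		f.Vals = append(f.Vals, m[k])
	}
	return f
}

// unflatten rebuilds the map after decoding.
func unflatten(f flatAccounts) map[string][]byte {
	m := make(map[string][]byte, len(f.Keys))
	for i, k := range f.Keys {
		m[string(k)] = f.Vals[i]
	}
	return m
}

func main() {
	m := map[string][]byte{"b": {2}, "a": {1}}
	f := flatten(m)
	fmt.Println(string(f.Keys[0]), string(f.Keys[1])) // a b
	fmt.Println(len(unflatten(f)))                    // 2
}
```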