triedb/pathdb: track flat state changes in pathdb (snapshot integration pt 2) #30643

Open: wants to merge 3 commits into master

rjl493456442
Member

This pull request ports some changes from the main state snapshot integration pull request, specifically introducing flat state tracking in pathdb.

Note that the tracked flat state changes are only held in memory and are not persisted to disk. The corresponding state retrieval from persistent state is also not supported yet. State management on disk is more complicated and will be implemented in a separate pull request.

@holiman holiman changed the title triedb/pathdb: track flat state changes in pathdb triedb/pathdb: track flat state changes in pathdb (snapshot integration pt 2) Oct 23, 2024
Comment on lines +297 to +299
// Keep track of whether the account has already been marked as destructed.
// This additional marker is useful for undoing the merge operation.
_, exist := s.destructSet[accountHash]
Contributor

How can it already be marked as destructed? I guess the only case would be

  1. tx 1 creates contract at address C via CREATE2
  2. tx 2 self-destructs C
  3. tx 3 creates contract at address C via CREATE2
  4. tx 4 self-destructs C
  5. ... and so on ...

Is this more or less the scenario that made you do the whole journalling thingy? Or are there other corner cases?

Member Author

These two sets belong to different blocks.

In block A, the account x was destructed, and then resurrected;
In block B, the account x was destructed again;

It's a valid operation before the Cancun fork (in which the self-destruction is disabled).

Comment on lines +300 to +303
destructs = append(destructs, destruct{
	Hash:  accountHash,
	Exist: exist,
})
Contributor

I don't quite see why each change needs to be appended here. Don't they either all get reverted or none?

The semantics of merge usually mean "we now mush everything into one", so this need to keep a journal of events confuses me.

Member Author

@rjl493456442 rjl493456442 Oct 23, 2024

In the destruction set merging process, we track details for each operation, specifically noting whether the associated account has already been marked as destructed. This information is stored in the stateSet and is crucial during reverts.

During a revert, we need to undo changes made to the destruction set, and this flag helps distinguish between two cases:

(a) If the account was already marked as destructed before the merge operation (i.e., stateSet.destructSet[addr] exists), the undo operation should not remove this entry from stateSet.destructSet.

(b) If the account was not previously marked as destructed (i.e., stateSet.destructSet[addr] does not exist), the undo operation should remove this entry from stateSet.destructSet.
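
The two cases above can be sketched as a small journal that is recorded during the merge and replayed during the undo. This is a minimal sketch, not the code from the pull request: `hash` stands in for `common.Hash`, the `stateSet` here holds only the destruct set, and the method names `mergeDestructs` and `undoDestructs` are hypothetical.

```go
package main

import "fmt"

// hash stands in for common.Hash so the sketch has no external dependencies.
type hash [32]byte

// destruct is a journal entry recorded while merging. Exist reports whether
// the account was already in the destruct set before the merge.
type destruct struct {
	Hash  hash
	Exist bool
}

// stateSet is a simplified, hypothetical set holding only the destruct markers.
type stateSet struct {
	destructSet map[hash]struct{}
}

// mergeDestructs marks every incoming account as destructed and returns the
// journal needed to undo exactly this merge.
func (s *stateSet) mergeDestructs(incoming []hash) []destruct {
	journal := make([]destruct, 0, len(incoming))
	for _, h := range incoming {
		_, exist := s.destructSet[h]
		journal = append(journal, destruct{Hash: h, Exist: exist})
		s.destructSet[h] = struct{}{}
	}
	return journal
}

// undoDestructs removes only the markers that the merge itself introduced.
func (s *stateSet) undoDestructs(journal []destruct) {
	for _, entry := range journal {
		if entry.Exist {
			continue // case (a): the marker predates the merge, keep it
		}
		delete(s.destructSet, entry.Hash) // case (b): the merge added it
	}
}

func main() {
	x, y := hash{1}, hash{2}
	// x is already marked as destructed before the merge, y is not.
	s := &stateSet{destructSet: map[hash]struct{}{x: {}}}

	journal := s.mergeDestructs([]hash{x, y})
	s.undoDestructs(journal)

	_, keptX := s.destructSet[x]
	_, keptY := s.destructSet[y]
	fmt.Println(keptX, keptY)
}
```

Without the `Exist` flag, the undo step could not tell whether a marker for `x` belonged to the earlier set or to the merged one, and would either drop a marker it should keep or keep one it should drop.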

Contributor

@holiman holiman Oct 24, 2024

I still do not "see" it. Can you outline a scenario where this matters? I mean in terms like this:

Block A: contract x is deleted.
Block B: contract x is recreated
Block C: contract x is deleted
Block D: contract x is recreated

Now we merge B into A, stateset AB. And then merge C into AB, stateset ABC. Then revert ?...

Member Author

Block A: contract x is deleted, and then recreated
Block B: contract x is deleted, and then recreated

We will have set A (destruct[x] = true, account[x] = x_1)
We will have set B (destruct[x] = true, account[x] = x_2)

We merge B into A, then AB is (destruct[x] = true, account[x] = x_2)

If we want to revert B from AB, then the reverted set should be (destruct[x] = true, account[x] = x_1)

Member Author

In another case,

Block A: contract x is updated
Block B: contract x is deleted, and then recreated

We will have set A (destruct[x] = false, account[x] = x_1)
We will have set B (destruct[x] = true, account[x] = x_2)

We merge B into A, then AB is (destruct[x] = true, account[x] = x_2)

If we want to revert B from AB, then the reverted set should be (destruct[x] = false, account[x] = x_1)

Contributor

In the first case,

We merge B into A, then AB is (destruct[x] = true, account[x] = x_2)

When merging B into A, so that the "end state" represents all changes in A and B, shouldn't we wind up with (destruct[x] = false, account[x] = x_2) -- that is: the account is not destructed. The destruct+recreate simply becomes a modification.

In the second case.

We merge B into A, then AB is (destruct[x] = true, account[x] = x_2)

In this instance, again, update + delete + recreate should IMO become update.

That's how I think about the merge operation. Doing it this way means that the semantics are clear. And the same goes for slots, doesn't it?

If the following happens for slot X: 2->0, 0->1, it "might as well" have gone 2->1. We don't care about all intermediate transitions.

Member Author

@rjl493456442 rjl493456442 Oct 24, 2024

When merging B into A, so that the "end state" represents all changes in A and B, shouldn't we wind up with (destruct[x] = false, account[x] = x_2) -- that is: the account is not destructed. The destruct+recreate simply becomes a modification.

The AB should be (destruct[x] = true, account[x] = x_2). The destruct[x] = true means the original account x was destructed in set AB and all state access to the original x should be forbidden. account[x] = x_2 means a new account with address x (it's different from the original one) is created within the set AB.

The destruct is actually a flag indicating the account with this address has been removed. Please don't check the state in the deeper level (e.g. disk).

Contributor

means the original account x was destructed in set AB and all the state access to x should be forbidden

This is what I don't understand. What is the difference between

  1. an original account X that has been destructed and recreated, and
  2. a modified account X.

Member Author

@rjl493456442 rjl493456442 Nov 5, 2024

For 1. the set should be destruct[x] = true, account[x] = v
For 2. the set should be destruct[x] = false, account[x] = v
For a third case, where the account is deleted in the block and not recreated, the set should be destruct[x] = true

Member Author

The destruct[addr] is only marked if this account is deleted within the block.
The account[addr] is only marked if this account is modified/created within the block.

These two flags can be combined
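
One way to read the combination of the two flags is as a lookup rule for a single layer. The sketch below is an illustration under assumptions, not the pull request's implementation: `layerSet` and its `account` method are hypothetical names, and `hash` stands in for `common.Hash`. The second return value says whether this layer has a final answer, in which case deeper layers (such as disk) should not be consulted.

```go
package main

import "fmt"

// hash stands in for common.Hash so the sketch has no external dependencies.
type hash [32]byte

// layerSet is a hypothetical per-layer set carrying both markers.
type layerSet struct {
	destructSet map[hash]struct{} // account deleted within this set
	accountData map[hash][]byte   // account modified or created within this set
}

// account returns the account blob and whether this layer resolves the lookup.
// When resolved is false, the caller may continue with a deeper layer.
func (l *layerSet) account(h hash) (data []byte, resolved bool) {
	// Modified, created, or recreated in this set: the local value wins.
	if blob, ok := l.accountData[h]; ok {
		return blob, true
	}
	// Deleted in this set and not recreated: report "absent" and stop here.
	if _, ok := l.destructSet[h]; ok {
		return nil, true
	}
	// Untouched by this set.
	return nil, false
}

func main() {
	recreated, modified, deleted, unknown := hash{1}, hash{2}, hash{3}, hash{4}
	l := &layerSet{
		destructSet: map[hash]struct{}{recreated: {}, deleted: {}},
		accountData: map[hash][]byte{recreated: []byte("v"), modified: []byte("v")},
	}
	for _, h := range []hash{recreated, modified, deleted, unknown} {
		data, resolved := l.account(h)
		fmt.Printf("data=%q resolved=%v\n", data, resolved)
	}
}
```

For an account lookup alone, the recreated and modified cases look the same. The destruct marker additionally records that state belonging to the original account must not be read from deeper layers.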

Comment on lines 331 to 335
slots := make(map[common.Hash][]byte)
for storageHash, data := range storage {
	slots[storageHash] = data
	delta += 2*common.HashLength + len(data)
}
Contributor

If it weren't for the delta-tracking, you could do

Suggested change
- slots := make(map[common.Hash][]byte)
- for storageHash, data := range storage {
- 	slots[storageHash] = data
- 	delta += 2*common.HashLength + len(data)
- }
+ slots := maps.Clone(storage)

A lot of the code is like that: mostly keeping track of count. I guess there's not much to do about that... :/

The data here is the rlp-encoding of the slot value (with zeroes trimmed, and nil for the all-zero-value), right?

Member Author

A lot of the code is like that: mostly keeping track of count. I guess there's not much to do about that... :/

Yeah, we have to traverse the set due to the size tracking, unfortunately.

The data here is the rlp-encoding of the slot value (with zeroes trimmed, and nil for the all-zero-value), right?

Correct

Comment on lines +357 to +359
// revert takes the original value of accounts and storages as input and reverts
// the latest state transition applied on the state set.
func (s *stateSet) revert(accountOrigin map[common.Hash][]byte, storageOrigin map[common.Hash]map[common.Hash][]byte) {
Contributor

I don't think the name matches what I understand when I look at the code. Isn't it more like revertTo, where you give it a set of accounts/slots, and say "hey reset yourself to these values, ignore anything else" ?

It doesn't revert the given values, it uses the given values to reset the internal state. (?)

Member Author

I don't think the name matches what I understand when I look at the code. Isn't it more like revertTo, where you give it a set of accounts/slots, and say "hey reset yourself to these values, ignore anything else" ?

True, the supplied account and storage sets are the "original" values of these mutated states, and we want to reset them to the old values.
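
The "reset to the supplied values" reading can be sketched as follows. This is an illustration only: the simplified `stateSet`, the `revertTo` name suggested in the review, and the treatment of a nil original value as a plain overwrite are all assumptions, and size tracking is left out.

```go
package main

import "fmt"

// hash stands in for common.Hash so the sketch has no external dependencies.
type hash [32]byte

// stateSet is a simplified, hypothetical set of account and storage data.
type stateSet struct {
	accountData map[hash][]byte
	storageData map[hash]map[hash][]byte
}

// revertTo overwrites every supplied account and slot with its original
// value. Entries that are not mentioned in the inputs are left untouched.
func (s *stateSet) revertTo(accountOrigin map[hash][]byte, storageOrigin map[hash]map[hash][]byte) {
	for addrHash, blob := range accountOrigin {
		s.accountData[addrHash] = blob
	}
	for addrHash, slots := range storageOrigin {
		if s.storageData[addrHash] == nil {
			s.storageData[addrHash] = make(map[hash][]byte)
		}
		for slotHash, value := range slots {
			s.storageData[addrHash][slotHash] = value
		}
	}
}

func main() {
	x, y, slot := hash{1}, hash{2}, hash{9}
	s := &stateSet{
		accountData: map[hash][]byte{x: []byte("x_2"), y: []byte("y_1")},
		storageData: map[hash]map[hash][]byte{x: {slot: []byte{0x02}}},
	}
	s.revertTo(
		map[hash][]byte{x: []byte("x_1")},
		map[hash]map[hash][]byte{x: {slot: []byte{0x01}}},
	)
	fmt.Printf("%s %s %x\n", s.accountData[x], s.accountData[y], s.storageData[x][slot])
}
```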

}

// encode serializes the content of state set into the provided writer.
func (s *stateSet) encode(w io.Writer) error {
Contributor

We typically do not use rlp like this, encoding four times with this type of manual steps. How come you don't use something more auto-generated?
Is there some optimization at play here?

Member Author

It's not about optimization at all (performance is not important here either).

It's more like a set of hand-written encoding rules. StateSet has several maps, and maps are not supported by RLP encoding.

Therefore, this code tells the RLP encoder how to pack the data.
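
The packing step described here can be sketched with the standard library alone: each map is flattened into parallel slices, which is a shape a list-oriented encoder such as RLP can handle. The `accountList` type and the helper names are hypothetical, sorting the keys for a deterministic order is an assumption of this sketch, and the actual call into the RLP encoder is omitted because it lives outside the standard library.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// hash stands in for common.Hash so the sketch has no external dependencies.
type hash [32]byte

// accountList is a hypothetical list-shaped mirror of map[hash][]byte, where
// Hashes[i] pairs with Blobs[i].
type accountList struct {
	Hashes []hash
	Blobs  [][]byte
}

// flattenAccounts turns the map into parallel slices, ordered by hash so that
// the same map always produces the same output.
func flattenAccounts(accounts map[hash][]byte) accountList {
	out := accountList{
		Hashes: make([]hash, 0, len(accounts)),
		Blobs:  make([][]byte, 0, len(accounts)),
	}
	for h := range accounts {
		out.Hashes = append(out.Hashes, h)
	}
	sort.Slice(out.Hashes, func(i, j int) bool {
		return bytes.Compare(out.Hashes[i][:], out.Hashes[j][:]) < 0
	})
	for _, h := range out.Hashes {
		out.Blobs = append(out.Blobs, accounts[h])
	}
	return out
}

// restoreAccounts rebuilds the map after decoding the parallel slices.
func restoreAccounts(list accountList) map[hash][]byte {
	accounts := make(map[hash][]byte, len(list.Hashes))
	for i, h := range list.Hashes {
		accounts[h] = list.Blobs[i]
	}
	return accounts
}

func main() {
	accounts := map[hash][]byte{{2}: []byte("b"), {1}: []byte("a")}
	list := flattenAccounts(accounts)
	restored := restoreAccounts(list)
	fmt.Println(len(list.Hashes), len(restored))
}
```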
