Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chain ancestry hash storage #19319

Closed
wants to merge 2 commits into from
Closed

Conversation

holiman
Copy link
Contributor

@holiman holiman commented Mar 23, 2019

This PR implements a hash storage, which is a fixed-size slice (currently set to 288 x 32 bytes) that is used to store ancestor hashes. When a new block is imported, it is added to the storage.

Reorgs are handled internally, by just swapping out the hash at the given number and wiping information about descendants (an O(1) operation)

This makes fetching ancestry hashes from a given reference header an O(1) operation. In the case that all hashes are not availalbe, iteration can be done from the earliest point.

This PR also adds a new mehtod to the ChainContext interface, to expose this method to the EVM.
The lookup is currently only used by the blockhash operation, but can also be used instead of iterating when we receive a GetBlockHeadersMsg from a peer.

The structure is not threadsafe, but all current uses of it happens from within already mutex:ed places. For use in p2p communication, we'd probably need to make it larger. It's at 8k now, but even storing the last million hashes would only take up 32M of memory.

I'll follow up with charts later. This PR is an alternative to #19291

@csplawn918
Copy link

Amazing solution great thinking. Problem solver a very important gift. Thank you for your commit---ment. Your fellow problem solver

@holiman
Copy link
Contributor Author

holiman commented Mar 25, 2019

This PR totally removes BLOCKHASH from the top 25. This PR is run3, and run2 is #19291. and run1 is current master.

run1 total-bars-4
run2 total-bars-4
run3 total-bars-4

As for why SMOD jumps up, I believe it's one of the least common opcodes, and thus the stats about that one has an extreme variance.

run1 total-bars-3
run2 total-bars-3
run3 total-bars-3

run1 total-bars-2
run2 total-bars-2
run3 total-bars-2

@holiman
Copy link
Contributor Author

holiman commented Mar 25, 2019

blockash-run1
blockash-run2
blockash-run3

@holiman holiman changed the title WIP: Chain ancestry hash storage chain ancestry hash storage Mar 25, 2019
@holiman
Copy link
Contributor Author

holiman commented Mar 25, 2019

@zsfelfoldi is this something that the LES client could make use of ?

Copy link
Contributor

@Matthalp-zz Matthalp-zz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a circular buffer to cache block hash lookups makes sense to me (if the great performance results weren't enough!). I've left a few comments that you can feel free to address. There are some other nits, but figured I would start with more high-level questions/comments.

}
if target > number {
// Should never happen
log.Error("Ancestor number must be <= descendant", "ancestor target", target, "descendant", number)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this code base's convention is to use panic for these scenarios. If it were me, I would have introduced a second error return value, but that's not the way this code base handles error conditions like these.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it's a caller error if so. The blockhash opcode has internal guards to prevent letting through a too-recent number, but I don't want to place a panic there -- if it's used from the p2p layer, I don't want a flaw in that code to crash the client.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. It may be worth adding a comment to this method to make it explicit that this method is designed solely for BLOCKHASH so future contributors know the scope of this method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it can be used for other purposes, however, you can't ask "give me the ancestor at 105 for [block 100, hash H] and expect anything other than zeroes....

core/hashstorage.go Outdated Show resolved Hide resolved
core/hashstorage.go Outdated Show resolved Hide resolved
}

// Newest returns the most recent (number, hash) stored
func (hs *HashStorage) Newest() (uint64, common.Hash) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a future use for this method? From what I can tell it's only used in tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only used in tests -- but it's very useful in those, and might be useful in the future

@@ -1736,3 +1743,47 @@ func (bc *BlockChain) SubscribeLogsEvent(ch chan<- []*types.Log) event.Subscript
func (bc *BlockChain) SubscribeBlockProcessingEvent(ch chan<- bool) event.Subscription {
return bc.scope.Track(bc.blockProcFeed.Subscribe(ch))
}

// GetAncestorHash return the hash of ancestor at the given number
func (bc *BlockChain) GetAncestorHash(ref *types.Header, target uint64) common.Hash {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason not to put this logic in HeaderChain?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it should probably wind up there instead. Right now, however, it's only used for EVM execution of BLOCKHASH, and I was more confident in how the blockchain works than how the headerchain works.

If this is becomes used from any other place, like p2p, then it should definitely go into headerchain instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Thanks for clarifying.

core/blockchain.go Outdated Show resolved Hide resolved
number = header.Number.Uint64() - 1
hash = header.ParentHash
}
return hash
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it true that there are cases where hash will not correspond to target?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. If we get an old sidechain. If hashhistory contains e..g blocks [1000 .. 1288], and we import sideblock on 1100 which requests block 900. Then the hash history can be used to jump back to block 1000, but the remaining 100 headers are iterated by this loop

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Because to get here we have already established we are on the canonical chain. Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we've established we are on the canonical chain, doesn't this mean that the for loop can be replaced with rawdb.ReadCanonicalHash (or a wrapper around it)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could, but it creates the invariant that the hash storage is used to track canon blocks. Right now, there's no such invariant within the hash storage itself

I'll think about it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Canonical blocks will be the common case, so its something to consider.

@holiman
Copy link
Contributor Author

holiman commented Mar 27, 2019

Thanks for reviewing!

core/blockchain.go Outdated Show resolved Hide resolved
// fast lookups for EVM execution. This should probably be improved if this
// method becomes more used.
func (hc *HeaderChain) GetAncestorHash(ref *types.Header, target uint64) common.Hash {
number := ref.Number.Uint64() - 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't checked if this method is ever used on HeaderChain (or LightChain, but if it's not then I would consider not including this code. The only place I could see it was maybe used was to adhere to the ChainContext interface, but neither of these chains support transaction processing (which is what the ChainContext says it is for).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, they're not used, but they had to have it to provide ChainContext. I didn't investigate further why, but these are totally unused methods

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Light clients can actually execute transactions on demand (i.e. CALL via RPC). It might be slow since it needs to pull state from remote nodes, but it should nonetheless be able to, so you do need the method.

core/hashstorage.go Outdated Show resolved Hide resolved
}

// Create a new storage with a header in it.
func NewHashStorage(hdr *types.Header) *HashStorage {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems weird to me that to create a data structure, we already need an item to put into it. Why tie the two together? Can't we just have a constructor and a setter? Also, perhaps we could make the number of hashes it can contain a constructor paramter instead of hard coding it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. Also a name nit: header is already pretty short that hdr is on that shortness-readability line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason is that since it's a circular buffer, it simplifies things if it cannot be empty. Then we have to add checks to distinguish between if it's full or empty, basically.

core/hashstorage_test.go Outdated Show resolved Hide resolved
core/hashstorage.go Outdated Show resolved Hide resolved
core/hashstorage.go Outdated Show resolved Hide resolved
@Matthalp-zz
Copy link
Contributor

Moving to [hashBufferElems]common.Hash LGTM

core/hashbuffer.go Outdated Show resolved Hide resolved
core/hashbuffer.go Outdated Show resolved Hide resolved
core/hashbuffer.go Outdated Show resolved Hide resolved
core/hashbuffer.go Outdated Show resolved Hide resolved
@holiman
Copy link
Contributor Author

holiman commented Apr 3, 2019

rebased to fix conflicts.
I'm not quite certain where to add it, if we want to move it into headerchain. The headerchain has WriteHeader, but that one is only called from Blockchain.InsertHeaderChain, which is not called during regular block import

core/hashbuffer.go Outdated Show resolved Hide resolved
core/hashbuffer.go Outdated Show resolved Hide resolved
core/hashbuffer.go Outdated Show resolved Hide resolved
core/hashbuffer.go Outdated Show resolved Hide resolved
core/hashbuffer.go Outdated Show resolved Hide resolved
@fjl fjl added this to the 1.9.7 milestone Oct 2, 2019
@holiman holiman changed the title [WIP] chain ancestry hash storage chain ancestry hash storage Oct 29, 2019
@karalabe karalabe modified the milestones: 1.9.7, 1.9.8 Nov 8, 2019
@holiman
Copy link
Contributor Author

holiman commented Nov 19, 2019

Squashed and rebased on master

@karalabe karalabe modified the milestones: 1.9.8, 1.9.9 Nov 27, 2019
@karalabe karalabe modified the milestones: 1.9.9, 1.9.10 Dec 6, 2019
@karalabe karalabe modified the milestones: 1.9.10, 1.9.11 Jan 21, 2020
@karalabe karalabe modified the milestones: 1.9.11, 1.9.12 Feb 18, 2020
@karalabe karalabe modified the milestones: 1.9.12, 1.9.13 Mar 16, 2020
@karalabe karalabe modified the milestones: 1.9.13, 1.9.14 Apr 20, 2020
@adamschmideg adamschmideg removed this from the 1.9.14 milestone May 12, 2020
@holiman holiman closed this Nov 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants