-
Notifications
You must be signed in to change notification settings - Fork 20.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
eth/protocols/eth: implement eth/69 #29158
base: master
Are you sure you want to change the base?
Conversation
d748b1c
to
cde3bc1
Compare
Please link to the protocol specification in the PR description. |
eth/protocols/eth/handlers.go
Outdated
continue | ||
} | ||
} else { | ||
block := chain.GetBlockByHash(hash) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
chain.GetBlockByHash
GetBlockByHash retrieves a block from the database by hash, caching it if found.
(emphasis mine). Are we sure we want to do that? I mean, maybe, I don't know, but worth considering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, there are basically two paths here:
- The header/block has no txs / recipts
- The header/block has txs / receipts
- (error-path: the header/block has it, but we're missing it)
For the two paths, we can distinguish between them if we first read the header, and check the receipts-root. The current code unconditionally tries to load the receipts, which reads header number and then loads from ancients or regular-database. And then, it always needs to read the header anyway (either explicitly or implicitly when loading the body) to distinguish between 1 or 2.
So if you read the header first, you can make path 1 less loady.
eth/protocols/eth/handlers.go
Outdated
} else { | ||
header := chain.GetHeaderByHash(hash) | ||
if header.ReceiptHash != types.EmptyReceiptsHash { | ||
body := chain.GetBody(hash) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other handles do not add body to cache (only the rlp-body. ). The GetBody uses ReadBodyRLP
under the hood, perhaps you can/should use chain.GetBodyRLP
instead?
It will avoid putting the body object in cache.
You can then either decode it into a body object, or you can choose to iterate the rlp lists: the first one is the transactions. And you can decode one transaction at a time, or make some smart iterator which serves you the data you need.
Serving this query is mighty expensive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetBlodyRLP also caches it...
This query will be less expensive than before either way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rlp caches the rlp, not the objecr
This query will be a lot more expensive than before, as I see it. How do you reckon "less expensive"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before this, we always called ReadReceipts
which does two ReadHeaders
a ReadBody
and a ReadRawReceipts
also it does DeriveFields
on every receipt
@@ -316,8 +316,10 @@ loop: | |||
return fmt.Errorf("wrong head block in status, want: %#x (block %d) have %#x", | |||
want, chain.blocks[chain.Len()-1].NumberU64(), have) | |||
} | |||
if have, want := msg.TD.Cmp(chain.TD()), 0; have != want { | |||
return fmt.Errorf("wrong TD in status: have %v want %v", have, want) | |||
if c.negotiatedProtoVersion < 69 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any particular reason to specialize the check for protocols before 69?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After 69 we will not have the TD anymore. We could just remove the check, but I wanted to preserve it to make sure we don't accidentally break 68 compatibility
// GetRawReceiptsByHash retrieves the receipts for all transactions in a given block | ||
// without deriving the internal fields and the Bloom. | ||
func (bc *BlockChain) GetRawReceiptsByHash(hash common.Hash) rlp.RawValue { | ||
number := rawdb.ReadHeaderNumber(bc.db, hash) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also maintain a RLP cache for the raw receipt data?
} | ||
// Retrieve the requested block's receipts | ||
results := chain.GetRawReceiptsByHash(hash) | ||
if results == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the results
returned by chain.GetRawReceiptsByHash(hash)
is nil, it means the receipts is not existent.
Namely, the results
should never be nil even the block is empty (receipt size is zero).
The eth68 spec is unclear how should we handle the request for non-existent receipts.
Should we include the nil into the response slice (it's chosen by this pull request)?
Should we completely ignore the missing receipt?
Eth68 is defined as: https://github.com/ethereum/devp2p/blob/master/caps/eth.md#getreceipts-0x0f,
probably @fjl can provide more information here.
Besides, serviceGetReceiptsQuery68
has the same issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing we've been thinking about is to completely remove the Bloom field from the receipt and generate it on the fly wnehever/wherever it is needed. Otherwise it's going to get generated in a potentially many cases when we don't even need it. Also trying to be smart and not generate it will blow up if someone later starts accessing it. By not having it available (or only via a method), we can make sure it's used correctly. Of course, if we would end up regenerating it multiple times then that's a problem, so we might need to be careful with cached stuff, but still.
func (bc *BlockChain) GetRawReceiptsByHash(hash common.Hash) rlp.RawValue { | ||
number := rawdb.ReadHeaderNumber(bc.db, hash) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better LRU cache
This can be done e.g. when deriving the root sha, e.g. in a closure. However, it can be done in a follow-up PR, it's not directly related to eth69, it's an internal thing. So let's ignore it here and now. @fjl will discuss the protocol options a bit more in the next p2p meeting. |
// A batch of receipts arrived to one of our previous requests | ||
res := new(ReceiptsPacket) | ||
if err := msg.Decode(res); err != nil { | ||
return fmt.Errorf("%w: message %v: %v", errDecode, msg, err) | ||
} | ||
// calculate the bloom filter before dispatching |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't decode directly into types.Receipts, since it will just try to decode it into a legacy receipt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Decode into network receipts
facc0b2
to
0a62389
Compare
@fjl I added a commit to provide custom encoding/decoding to the receipts for network. I'm not 100% sure about my decoding logic, also the way I do it right now will result in a bunch more allocations. I'm not sure how to fix that without writing much more weird code, this is the cleanest solution I came up with, maybe you have some better idea? |
core/types/receipt_test.go
Outdated
for _, receipt := range receipts { | ||
receipt.Bloom = CreateBloom(Receipts{receipt}) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please do this via some lazy-loading, e.g. using sync.Once. Otherwise this will happen on every test-run which imports core/types, and it will show up on cpu-profiles and mem-profiles
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
if kind != rlp.List { | ||
return errors.New("invalid receipt") | ||
} | ||
if size == 4 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the RLP format, the size is always given in bytes, even for lists. There is no way to know the number of items in a list before decoding the whole list. So this check doesn't work, and it's why I suggested to just encode all receipt types, even legacy ones, in the format [type, statusOrRoot, gasUsed, logs]
.
My suggestion here would be something like:
var (
rtype byte
status []byte
gasUsed uint64
logs []*types.Log
err error
)
if _, err = s.List(); err != nil {
return err
}
{
if rtype, err = s.Uint8(); err != nil {
return err
}
if status, err = s.Bytes(); err != nil {
return err
}
if gasUsed, err = s.Uint64(); err != nil {
return nil
}
if err = s.Decode(&logs); err != ni {
return err
}
}
if err := s.ListEnd(); err != nil {
return err
}
receipt := types.Receipt{Type: rtype, GasUsed: gasUsed, Logs: logs}
if len(status) == 32 {
receipt.Root = status
} else if len(status) == 8 {
receipt.Status = binary.BigEndian.Uint64(status)
}
It's ugly, but hand-writing decoders always is.
This PR implements eth/69 which drops the bloom filter from the receipts messages.
This will reduce the amount of data needed for a sync by ~530GB (2.3B txs * 256 byte) uncompressed.
Compressed this will be reduced to ~100GB