Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

eth/protocols/eth: implement eth/69 #29158

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

MariusVanDerWijden
Copy link
Member

@MariusVanDerWijden MariusVanDerWijden commented Mar 4, 2024

This PR implements eth/69 which drops the bloom filter from the receipts messages.
This will reduce the amount of data needed for a sync by ~530GB (2.3B txs * 256 byte) uncompressed.
Compressed this will be reduced to ~100GB

818590-800

core/blockchain_reader.go Outdated Show resolved Hide resolved
core/types/receipt.go Outdated Show resolved Hide resolved
@holiman
Copy link
Contributor

holiman commented Aug 20, 2024

Please link to the protocol specification in the PR description.

core/types/receipt_test.go Outdated Show resolved Hide resolved
eth/protocols/eth/handlers.go Show resolved Hide resolved
eth/protocols/eth/handlers.go Outdated Show resolved Hide resolved
continue
}
} else {
block := chain.GetBlockByHash(hash)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

chain.GetBlockByHash

GetBlockByHash retrieves a block from the database by hash, caching it if found.

(emphasis mine). Are we sure we want to do that? I mean, maybe, I don't know, but worth considering.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, there are basically two paths here:

  1. The header/block has no txs / recipts
  2. The header/block has txs / receipts
  3. (error-path: the header/block has it, but we're missing it)

For the two paths, we can distinguish between them if we first read the header, and check the receipts-root. The current code unconditionally tries to load the receipts, which reads header number and then loads from ancients or regular-database. And then, it always needs to read the header anyway (either explicitly or implicitly when loading the body) to distinguish between 1 or 2.

So if you read the header first, you can make path 1 less loady.

eth/protocols/eth/handlers.go Outdated Show resolved Hide resolved
eth/protocols/eth/handshake.go Show resolved Hide resolved
} else {
header := chain.GetHeaderByHash(hash)
if header.ReceiptHash != types.EmptyReceiptsHash {
body := chain.GetBody(hash)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other handles do not add body to cache (only the rlp-body. ). The GetBody uses ReadBodyRLP under the hood, perhaps you can/should use chain.GetBodyRLP instead?

It will avoid putting the body object in cache.
You can then either decode it into a body object, or you can choose to iterate the rlp lists: the first one is the transactions. And you can decode one transaction at a time, or make some smart iterator which serves you the data you need.

Serving this query is mighty expensive.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetBlodyRLP also caches it...
This query will be less expensive than before either way

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rlp caches the rlp, not the objecr

This query will be a lot more expensive than before, as I see it. How do you reckon "less expensive"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before this, we always called ReadReceipts which does two ReadHeaders a ReadBody and a ReadRawReceipts also it does DeriveFields on every receipt

@@ -316,8 +316,10 @@ loop:
return fmt.Errorf("wrong head block in status, want: %#x (block %d) have %#x",
want, chain.blocks[chain.Len()-1].NumberU64(), have)
}
if have, want := msg.TD.Cmp(chain.TD()), 0; have != want {
return fmt.Errorf("wrong TD in status: have %v want %v", have, want)
if c.negotiatedProtoVersion < 69 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason to specialize the check for protocols before 69?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After 69 we will not have the TD anymore. We could just remove the check, but I wanted to preserve it to make sure we don't accidentally break 68 compatibility

// GetRawReceiptsByHash retrieves the receipts for all transactions in a given block
// without deriving the internal fields and the Bloom.
func (bc *BlockChain) GetRawReceiptsByHash(hash common.Hash) rlp.RawValue {
number := rawdb.ReadHeaderNumber(bc.db, hash)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also maintain a RLP cache for the raw receipt data?

}
// Retrieve the requested block's receipts
results := chain.GetRawReceiptsByHash(hash)
if results == nil {
Copy link
Member

@rjl493456442 rjl493456442 Sep 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the results returned by chain.GetRawReceiptsByHash(hash) is nil, it means the receipts is not existent.
Namely, the results should never be nil even the block is empty (receipt size is zero).

The eth68 spec is unclear how should we handle the request for non-existent receipts.
Should we include the nil into the response slice (it's chosen by this pull request)?
Should we completely ignore the missing receipt?

Eth68 is defined as: https://github.com/ethereum/devp2p/blob/master/caps/eth.md#getreceipts-0x0f,
probably @fjl can provide more information here.

Besides, serviceGetReceiptsQuery68 has the same issue.

Copy link
Member

@karalabe karalabe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing we've been thinking about is to completely remove the Bloom field from the receipt and generate it on the fly wnehever/wherever it is needed. Otherwise it's going to get generated in a potentially many cases when we don't even need it. Also trying to be smart and not generate it will blow up if someone later starts accessing it. By not having it available (or only via a method), we can make sure it's used correctly. Of course, if we would end up regenerating it multiple times then that's a problem, so we might need to be careful with cached stuff, but still.

Comment on lines +235 to +236
func (bc *BlockChain) GetRawReceiptsByHash(hash common.Hash) rlp.RawValue {
number := rawdb.ReadHeaderNumber(bc.db, hash)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better LRU cache

@holiman
Copy link
Contributor

holiman commented Sep 26, 2024

One thing we've been thinking about is to completely remove the Bloom field from the receipt and generate it on the fly wnehever/wherever it is needed.

This can be done e.g. when deriving the root sha, e.g. in a closure. However, it can be done in a follow-up PR, it's not directly related to eth69, it's an internal thing. So let's ignore it here and now.

@fjl will discuss the protocol options a bit more in the next p2p meeting.

// A batch of receipts arrived to one of our previous requests
res := new(ReceiptsPacket)
if err := msg.Decode(res); err != nil {
return fmt.Errorf("%w: message %v: %v", errDecode, msg, err)
}
// calculate the bloom filter before dispatching
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't decode directly into types.Receipts, since it will just try to decode it into a legacy receipt

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decode into network receipts

@MariusVanDerWijden
Copy link
Member Author

@fjl I added a commit to provide custom encoding/decoding to the receipts for network. I'm not 100% sure about my decoding logic, also the way I do it right now will result in a bunch more allocations. I'm not sure how to fix that without writing much more weird code, this is the cleanest solution I came up with, maybe you have some better idea?

Comment on lines 301 to 303
for _, receipt := range receipts {
receipt.Bloom = CreateBloom(Receipts{receipt})
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do this via some lazy-loading, e.g. using sync.Once. Otherwise this will happen on every test-run which imports core/types, and it will show up on cpu-profiles and mem-profiles

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

if kind != rlp.List {
return errors.New("invalid receipt")
}
if size == 4 {
Copy link
Contributor

@fjl fjl Nov 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the RLP format, the size is always given in bytes, even for lists. There is no way to know the number of items in a list before decoding the whole list. So this check doesn't work, and it's why I suggested to just encode all receipt types, even legacy ones, in the format [type, statusOrRoot, gasUsed, logs].

My suggestion here would be something like:

var (
	rtype   byte
	status  []byte
	gasUsed uint64
	logs    []*types.Log
	err     error
)
if _, err = s.List(); err != nil {
	return err
}
{
	if rtype, err = s.Uint8(); err != nil {
		return err
	}
	if status, err = s.Bytes(); err != nil {
		return err
	}
	if gasUsed, err = s.Uint64(); err != nil {
		return nil
	}
	if err = s.Decode(&logs); err != ni {
		return err
	}
}
if err := s.ListEnd(); err != nil {
	return err
}

receipt := types.Receipt{Type: rtype, GasUsed: gasUsed, Logs: logs}
if len(status) == 32 {
	receipt.Root = status
} else if len(status) == 8 {
	receipt.Status = binary.BigEndian.Uint64(status)
}

It's ugly, but hand-writing decoders always is.

@fjl fjl removed the status:triage label Nov 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants