les: historical data garbage collection #19570

Merged
merged 3 commits into from
Jul 13, 2020

Conversation

Member

@rjl493456442 rjl493456442 commented May 14, 2019

This PR introduces a garbage collection feature for the light client. To disable the GC, pass the --light.nopruning flag.

The light client currently stores several types of data that can be pruned:

  • block headers
  • bloom bits
  • CHT trie
  • Bloom trie

Since the light client can generate the new CHT root and bloom trie root at runtime, all historical chain data is unnecessary for it.

If we need to prove something with a pruned header, the light client will fetch it again using the latest CHT root, which covers all historical headers.

Here are some GC results:

Original client:

+-----------------+------------------------------+------------+
|    DATABASE     |           CATEGORY           |    SIZE    |
+-----------------+------------------------------+------------+
| Key-Value store | Headers                      | 164.75 MiB |
| Key-Value store | Bodies                       | 44.00 B    |
| Key-Value store | Receipts                     | 42.00 B    |
| Key-Value store | Total Difficulty             | 12.45 MiB  |
| Key-Value store | Block <number-hash> pairings | 11.40 MiB  |
| Key-Value store | Block <hash-number> pairings | 10.51 MiB  |
| Key-Value store | Trie nodes                   | 50.88 KiB  |
| Key-Value store | Transaction indexes          | 0.00 B     |
| Key-Value store | Preimages                    | 15.81 KiB  |
| Key-Value store | BloomBits                    | 12.46 MiB  |
| Ancient store   | Headers                      | 6.00 B     |
| Ancient store   | Bodies                       | 6.00 B     |
| Ancient store   | Receipts                     | 6.00 B     |
| Ancient store   | Total Difficulty             | 6.00 B     |
| Ancient store   | Block <number-hash> pairings | 6.00 B     |
| Light client    | CHT trie nodes               | 25.13 MiB  |
| Light client    | Bloom trie nodes             | 25.51 MiB  |
+-----------------+------------------------------+------------+
|                              TOTAL             | 262.46 MIB |
+-----------------+------------------------------+------------+

GCed client:

+-----------------+------------------------------+-----------+
|    DATABASE     |           CATEGORY           |   SIZE    |
+-----------------+------------------------------+-----------+
| Key-Value store | Headers                      | 44.19 MiB |
| Key-Value store | Bodies                       | 44.00 B   |
| Key-Value store | Receipts                     | 42.00 B   |
| Key-Value store | Total Difficulty             | 3.34 MiB  |
| Key-Value store | Block <number-hash> pairings | 3.08 MiB  |
| Key-Value store | Block <hash-number> pairings | 2.82 MiB  |
| Key-Value store | Trie nodes                   | 51.51 KiB |
| Key-Value store | Transaction indexes          | 0.00 B    |
| Key-Value store | Preimages                    | 15.81 KiB |
| Key-Value store | BloomBits                    | 3.04 MiB  |
| Ancient store   | Headers                      | 6.00 B    |
| Ancient store   | Bodies                       | 6.00 B    |
| Ancient store   | Receipts                     | 6.00 B    |
| Ancient store   | Total Difficulty             | 6.00 B    |
| Ancient store   | Block <number-hash> pairings | 6.00 B    |
| Light client    | CHT trie nodes               | 3.59 MiB  |
| Light client    | Bloom trie nodes             | 2.57 MiB  |
+-----------------+------------------------------+-----------+
|                              TOTAL             | 62.88 MIB |
+-----------------+------------------------------+-----------+

With this feature, the storage size of the light client can be kept at a roughly fixed value (in the example above, the total drops from 262.46 MiB to 62.88 MiB).

Contributor

@holiman holiman left a comment

Not sure I know the codebase well enough to review, but this generally looks good to me.

light/postprocess.go Outdated
@rjl493456442 rjl493456442 force-pushed the les-garbage-collection branch from a3b41ae to 7aa7475 Compare May 16, 2019 07:39
@fjl fjl added this to the 1.9.1 milestone Jun 5, 2019
@rjl493456442 rjl493456442 force-pushed the les-garbage-collection branch from 7aa7475 to cd89f63 Compare June 13, 2019 06:58
@rjl493456442 rjl493456442 force-pushed the les-garbage-collection branch from cd89f63 to 4024180 Compare July 1, 2019 07:15
@rjl493456442 rjl493456442 force-pushed the les-garbage-collection branch from 4024180 to d79678c Compare July 17, 2019 09:01
@karalabe karalabe modified the milestones: 1.9.1, 1.9.2 Jul 23, 2019
les/pruner.go Outdated
defer p.wg.Done()

var (
last = p.checkpoint
Contributor

Does this mean that every time the client is started you begin deleting old chain data starting from the checkpoint? This feels a bit wasteful.

Contributor

Just an idea: you could change Prune to use a db iterator to find and delete everything in the db from the beginning to the given section instead of iterating with a for loop on every block number of the specific section. This would have multiple advantages:

  • you do not need to remember what was pruned already
  • it is efficient because chain data db keys are prefixed with block number (this was one of the reasons I did it like this)
  • this way you can also prune cached chain data that was ODRed after pruning that section. For example log searching can download many old receipts and it is nice to cache them for a while but it would also be good to throw them away eventually. If you just clean everything before the new section every time a new section is processed then I think it is good enough.
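
The iterator-based pruning suggested above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `memDB`, `numberKey`, and `pruneBefore` are hypothetical names, and the toy in-memory store only mimics an ordered database whose keys are a one-byte prefix followed by a big-endian block number.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"sort"
)

// memDB is a toy ordered key-value store standing in for a real database
// with iterator support (hypothetical, for illustration only).
type memDB struct {
	data map[string][]byte
}

// numberKey builds prefix + 8-byte big-endian block number + suffix, so that
// keys under one prefix sort by block number.
func numberKey(prefix byte, number uint64, suffix []byte) []byte {
	key := make([]byte, 9, 9+len(suffix))
	key[0] = prefix
	binary.BigEndian.PutUint64(key[1:], number)
	return append(key, suffix...)
}

// sortedKeys returns all keys in ascending byte order, mimicking an iterator.
func (db *memDB) sortedKeys() [][]byte {
	keys := make([][]byte, 0, len(db.data))
	for k := range db.data {
		keys = append(keys, []byte(k))
	}
	sort.Slice(keys, func(i, j int) bool { return bytes.Compare(keys[i], keys[j]) < 0 })
	return keys
}

// pruneBefore deletes every entry under prefix whose block number is below
// limit and returns the number of deleted entries.
func (db *memDB) pruneBefore(prefix byte, limit uint64) int {
	start := []byte{prefix}
	end := numberKey(prefix, limit, nil)
	deleted := 0
	for _, k := range db.sortedKeys() {
		if bytes.Compare(k, start) < 0 {
			continue // belongs to an earlier prefix
		}
		if bytes.Compare(k, end) >= 0 {
			break // reached the first key that must be kept
		}
		delete(db.data, string(k))
		deleted++
	}
	return deleted
}

func main() {
	db := &memDB{data: make(map[string][]byte)}
	for n := uint64(0); n < 10; n++ {
		db.data[string(numberKey('h', n, []byte("hash")))] = []byte("header")
	}
	// An entry cached after an earlier prune (e.g. fetched on demand) is removed too.
	db.data[string(numberKey('h', 2, []byte("other")))] = []byte("header")
	fmt.Println(db.pruneBefore('h', 8), len(db.data))
}
```

Because the scan always starts at the beginning of the prefix, nothing needs to be remembered about earlier prunes, and data re-fetched into an already pruned range is cleaned up on the next pass.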

les/pruner.go Outdated
return
}
// Always keep the latest section data in database.
for i := last + 1; i < min-1; i++ {
Contributor

This logic does not handle indexer rollbacks. While this is not a frequent case we do handle it in the indexers (might happen on testnets/private nets). Rolling back while pruning is a bit tricky though (you would have to restore the headers of the current unfinished section) so I think it is fine to not be able to roll back properly if pruning is enabled. In this case an error message would be nice at least. And/or automatically reverting to the last stable checkpoint and resyncing from there.

Member Author

@rjl493456442 rjl493456442 Jul 29, 2019

I now keep the latest section in the db, which means at least 32768+2048 headers are retained.
Do we really see reorgs that deep, even on the testnets? Maybe it's possible on Ropsten...

We don't currently have a very good way to restore all pruned chain data. Rewinding HEAD to the checkpoint (or genesis) seems feasible. I may try this approach.

@rjl493456442
Member Author

Found a critical issue.
If a light client is pruned, it loses all hash->number mappings of headers.
As a result, the following backend APIs won't work, because they take a block hash as a parameter and we can't retrieve the corresponding headers via ODR:

  • GetReceipts
  • GetLogs

@rjl493456442 rjl493456442 force-pushed the les-garbage-collection branch from d79678c to 13d3395 Compare July 30, 2019 06:45
@rjl493456442 rjl493456442 requested a review from gballet as a code owner July 30, 2019 06:45
@rjl493456442 rjl493456442 force-pushed the les-garbage-collection branch 5 times, most recently from 38055b8 to 0914adb Compare July 31, 2019 02:56
@rjl493456442
Member Author

@zsfelfoldi I changed the code a bit. One important thing is that I now keep all hash->number mappings in the database, since they are necessary for hash-based APIs. In terms of storage, the mappings for one section take about 1.4MB, so I think it's totally acceptable.

Please take another look :).
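
As a rough sanity check of the per-section figure above, the raw payload can be estimated as below. This is a back-of-the-envelope sketch under stated assumptions: a section of 32768 headers (the number quoted earlier in this thread), a key made of a one-byte prefix plus a 32-byte hash, and an 8-byte block number as the value. The constants and `mappingBytes` are illustrative names, not identifiers from the PR.

```go
package main

import "fmt"

const (
	sectionSize = 32768  // assumed headers per section, as quoted earlier in this thread
	keySize     = 1 + 32 // assumed layout: one-byte prefix plus a 32-byte block hash
	valueSize   = 8      // assumed: block number stored as a fixed 8-byte integer
)

// mappingBytes returns the raw payload size of the hash->number mappings
// for the given number of sections, ignoring any database overhead.
func mappingBytes(sections uint64) uint64 {
	return sections * sectionSize * (keySize + valueSize)
}

func main() {
	fmt.Printf("%.2f MiB per section\n", float64(mappingBytes(1))/(1<<20))
}
```

Under these assumptions the raw payload is a little below the quoted 1.4MB; the remainder would plausibly be storage overhead, though that is a guess rather than something measured here.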

les/pruner.go Outdated
Contributor

@zsfelfoldi zsfelfoldi left a comment

See the comments about the iterators (the rest is fine now).

core/rawdb/accessors_chain.go
core/rawdb/accessors_chain.go Outdated
core/rawdb/accessors_indexes.go Outdated
core/rawdb/accessors_indexes.go Outdated
les/pruner.go
@rjl493456442 rjl493456442 force-pushed the les-garbage-collection branch 2 times, most recently from b54a316 to bfeeac0 Compare August 5, 2019 09:01
cmd/utils/flags.go Outdated
core/chain_indexer.go Outdated
core/rawdb/accessors_chain.go Outdated
// ReadAllCanonicalHashes retrieves all canonical number and hash mappings at the
// certain chain range. Note, this method should only used in pruned light client
// otherwise the cost can be very expensive.
func ReadAllCanonicalHashes(db ethdb.Iteratee, from uint64, to uint64) ([]uint64, []common.Hash) {
Member

This iterator implementation seems a bit suboptimal to me. If I call this for mainnet with all blocks present, for from = 8000000 and to = from + 1, you will iterate over 8M entries until you find the starting item. I guess if the use case is light client pruning you can expect all initial keys to be missing, but it still seems suboptimal.

Wouldn't a better solution be to use: start = headerHashKey(from); end = headerHashKey(to) and then simply iterate with NewIteratorWithStart(start) and terminate when bytes.Compare(key, end) >= 0? We could also swap out NewIteratorWithStart to NewIteratorWithRange to make this code even simpler and the iterator even more flexible.
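
The seek-and-terminate pattern described above could look roughly like this. It is only a sketch over a sorted slice of keys: `hashKey` and `numbersInRange` are hypothetical helpers with a simplified key layout, and `sort.Search` stands in for seeking a database iterator to the start key.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"sort"
)

// hashKey is a hypothetical stand-in for the key helper named in the comment:
// a one-byte prefix followed by the big-endian block number.
func hashKey(number uint64) []byte {
	key := make([]byte, 9)
	key[0] = 'h'
	binary.BigEndian.PutUint64(key[1:], number)
	return key
}

// numbersInRange returns the block numbers stored in [from, to). keys must be
// sorted in ascending byte order, as a database iterator would yield them.
func numbersInRange(keys [][]byte, from, to uint64) []uint64 {
	start, end := hashKey(from), hashKey(to)
	// Seek directly to the first key >= start instead of scanning from the beginning.
	i := sort.Search(len(keys), func(i int) bool { return bytes.Compare(keys[i], start) >= 0 })
	var numbers []uint64
	// Terminate as soon as a key reaches the end bound.
	for ; i < len(keys) && bytes.Compare(keys[i], end) < 0; i++ {
		if len(keys[i]) < 9 {
			continue // not a well-formed number key in this toy layout
		}
		numbers = append(numbers, binary.BigEndian.Uint64(keys[i][1:9]))
	}
	return numbers
}

func main() {
	var keys [][]byte
	for n := uint64(0); n < 1000; n++ {
		keys = append(keys, hashKey(n))
	}
	fmt.Println(numbersInRange(keys, 990, 993))
}
```

With this shape, the cost depends on the size of the requested range rather than on how many entries precede it.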

core/rawdb/accessors_indexes.go Outdated
core/rawdb/accessors_indexes.go Outdated
eth/config.go Outdated
@@ -697,7 +697,7 @@ func (db *Database) Cap(limit common.StorageSize) error {
//
// Note, this method is a non-synchronized mutator. It is unsafe to call this
// concurrently with other mutators.
-func (db *Database) Commit(node common.Hash, report bool) error {
+func (db *Database) Commit(node common.Hash, report bool, callback func(common.Hash)) error {
Member

What's the purpose of this callback?

Member Author

Because we want to prune the CHT and Bloom tries. The trie nodes of the latest section are enough for generating the next Merkle root, so I use the callback to collect all trie nodes of the current section and delete all others.
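
The idea of collecting nodes through a commit callback and deleting the rest can be illustrated with a toy model. Everything here is a simplified assumption: `node`, `commit`, and `pruneExcept` are hypothetical, node hashes are plain strings, and the real trie database may report nodes to the callback differently.

```go
package main

import "fmt"

// node is a toy trie node identified by a string "hash" (hypothetical; real
// nodes are keyed by their 32-byte hash).
type node struct {
	hash     string
	children []*node
}

// commit walks the trie from root and writes every reachable node into db,
// invoking callback (if non-nil) with each committed node's hash.
func commit(db map[string]bool, root *node, callback func(hash string)) {
	if root == nil {
		return
	}
	for _, child := range root.children {
		commit(db, child, callback)
	}
	db[root.hash] = true
	if callback != nil {
		callback(root.hash)
	}
}

// pruneExcept deletes every node from db that is not in keep and reports
// how many nodes were removed.
func pruneExcept(db map[string]bool, keep map[string]struct{}) int {
	removed := 0
	for hash := range db {
		if _, ok := keep[hash]; !ok {
			delete(db, hash)
			removed++
		}
	}
	return removed
}

func main() {
	// Nodes left over from the previous section.
	db := map[string]bool{"old-root": true, "old-leaf": true, "shared-leaf": true}

	// The latest section's trie; it reuses one node from the previous section.
	root := &node{hash: "new-root", children: []*node{{hash: "shared-leaf"}, {hash: "new-leaf"}}}

	keep := make(map[string]struct{})
	commit(db, root, func(hash string) { keep[hash] = struct{}{} })

	fmt.Println(pruneExcept(db, keep), len(db))
}
```

In this model a node shared with the previous section survives because the callback reports it as part of the current trie, while nodes only reachable from the old root are removed.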

@rjl493456442 rjl493456442 force-pushed the les-garbage-collection branch from f65e1f7 to 7d8be12 Compare August 6, 2019 01:56
@rjl493456442
Member Author

@karalabe Fixed, ptal

@karalabe karalabe removed this from the 1.9.2 milestone Aug 13, 2019
@rjl493456442 rjl493456442 force-pushed the les-garbage-collection branch from f1cc133 to e5a76a0 Compare March 31, 2020 08:42
@karalabe karalabe modified the milestones: 1.9.13, 1.9.14 Apr 20, 2020
@rjl493456442 rjl493456442 force-pushed the les-garbage-collection branch from e5a76a0 to 9001100 Compare May 12, 2020 09:03
@karalabe karalabe modified the milestones: 1.9.14, 1.9.15 May 13, 2020
@karalabe karalabe modified the milestones: 1.9.15, 1.9.16 Jun 8, 2020
@fjl fjl removed the status:triage label Jul 1, 2020
@fjl
Contributor

fjl commented Jul 1, 2020

Please remove the flag. This feature can just be on by default.

@rjl493456442 rjl493456442 force-pushed the les-garbage-collection branch from 9001100 to 796f8a5 Compare July 6, 2020 06:33
@fjl fjl modified the milestones: 1.9.16, 1.9.17 Jul 10, 2020
@fjl fjl merged commit 6eef141 into ethereum:master Jul 13, 2020
@fjl fjl mentioned this pull request Jul 13, 2020
enriquefynn pushed a commit to enriquefynn/go-ethereum that referenced this pull request Mar 10, 2021
This change introduces garbage collection for the light client. Historical
chain data is deleted periodically. If you want to disable the GC, use
the --light.nopruning flag.