-
Notifications
You must be signed in to change notification settings - Fork 21.6k
core, eth, trie: write nodebuffer asynchronously to disk #28471
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@rjl493456442 Can I invite you to help evaluate this PR? |
92e4c98 to
ad03bc6
Compare
e9d0659 to
288b3c9
Compare
|
I once had a similar idea, but I abandoned it due to certain limitations in LevelDB. Specifically, the following write operations will remain blocked even if we flush the buffer in the background. Nevertheless, it's certainly worth considering this approach within the context of Pebble. As you can see, it involves a non-trivial complexity. Do you have any performance data before we dive into details? |
Yes, writes limited by leveldb will still be blocked. Since asynchronous writing does not block the main process, subsequent block chasing performance is still much smoother due to cache hits, which is still recommended. Intel x86, 16 core, 64G, 3000 IOPS, 128M/S EC2 |
288b3c9 to
37ec06a
Compare
@rjl493456442 How do you think this PR is necessary? |
|
@joeylichang I will try to do some benchmarks first. |
|
Deployed on our benchmark machine, will post the result once it's available. |
cool~, looking forward to the result.
|
@rjl493456442 Can you also show the disk read/write metrics here? Maybe disk bandwidth is the bottleneck. |
|
After running a few more days, this pull request is consistently faster than master. Due to fact that accumulating 256 Megabytes state in memory will take roughly 1.5min in full sync, so the theoretical performance speedup is ~1%(average_buffer_flush_time / 1.5min). Although the speedup is not that obvious from the metrics. |
holiman
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't really say much about the overall idea at this point, don't fully understand it yet. Some things looks a bit flaky to me.
| // layer, the dirty nodes buffered within the disk layer, and the size of cached | ||
| // preimages. | ||
| func (db *Database) Size() (common.StorageSize, common.StorageSize, common.StorageSize) { | ||
| func (db *Database) Size() (common.StorageSize, common.StorageSize, common.StorageSize, common.StorageSize) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update the method docs. With four return values, might be time to split it into a bullet-list, e.g. something like
// Size returns the sizes of:
// - the diff layer nodes above the persistent disk layer,
// - the dirty nodes buffered within the disk layer,
// - the immutable nodes in the disk layer,
// - the cached preimages
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added the docs.
|
|
||
| // size returns the approximate size of cached nodes in the disk layer. | ||
| func (dl *diskLayer) size() common.StorageSize { | ||
| func (dl *diskLayer) size() (common.StorageSize, common.StorageSize) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update method doc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added the docs.
trie/triedb/pathdb/disklayer.go
Outdated
| // empty returns an indicator if trienodebuffer contains any state transition inside. | ||
| empty() bool | ||
|
|
||
| // getSize return the trienodebuffer used size. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It returns two numbers, please document what they mean, respectively
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added the docs.
trie/triedb/pathdb/nodebuffer.go
Outdated
| // asyncnodebuffer implement trienodebuffer interface, and aysnc the nodecache | ||
| // to disk. | ||
| type asyncnodebuffer struct { | ||
| mux sync.RWMutex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We typically name mutexes mu rather than mux. Not saying it's better, but please let's stick to the convention :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also document what the mutex covers. In this case, I assume it is used as a concurrency-guard for operations on both current and background?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, change to mu and, add docs for 'asyncnodebuffer' and 'flush' method
trie/triedb/pathdb/nodebuffer.go
Outdated
| if atomic.LoadUint64(&a.background.immutable) == 1 { | ||
| return nil | ||
| } | ||
|
|
||
| atomic.StoreUint64(&a.current.immutable, 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not look concurrency-safe: first you check, then you store (blindly). What you probably want to use here is CompareAndSwap, and why not make the immutable into an atomic.Bool rather than using atomic.LoadUint64 ?
It would become something like
if !a.background.immutable.CompareAndSwap(false, true) {
return nil
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
atomic.Bool is better. CompareAndSwap is not appropriate, the background switch 'currnet' the immutable should be false.
trie/triedb/pathdb/nodebuffer.go
Outdated
| size uint64 // The size of aggregated writes | ||
| limit uint64 // The maximum memory allowance in bytes | ||
| nodes map[common.Hash]map[string]*trienode.Node // The dirty node set, mapped by owner and path | ||
| immutable uint64 // The flag equal 1, flush nodes to disk background |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use a bool or even better atomic.Bool
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"The flag equal 1, flush nodes to disk background" -- please expand a bit on what the meaning of this field is.
For example (and I might be incorrect since I've still not fully understood this PR)
"If this is set to true, then some other thread is performing a flush in the background, and thus the ..."
or
"If this is set to true, then this nodebuffer is immutable and any write-operations to it will exit with error"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
trie/triedb/pathdb/nodebuffer.go
Outdated
| if atomic.LoadUint64(&b.immutable) == 1 { | ||
| return errWriteImmutable | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this is sufficient. What if someone is performing a commit, so is in the for-loop below, and then some other threads sets the immutable to true? This seems not thread-safe to me, but maybe I don't fully understand what the concurrency looks like.
|
@rjl493456442 Any further results? |
|
I think we can close this, because the nodebuffer is being flushed async now, see 2192020 |
|
Yes, it was implemented. |






Pathdb will write trienodes to the disk in one batch after the
nodebufferis full. The operation of writing to the disk will block the execution of blocks, which will cause performance glitches from the monitoring point of view. This PR proposesasyncnodebufferto formally solve the performance glitch introduced by this.asyncnodebufferis an optimization ofnodebufferfor asynchronous writing to disk. It consists of twonodebuffer. When the current nodebuffer is full, it will become immutable and switch to the background to write to disk. The new frontnodebufferis used for subsequent writing of chasing blocks. Of course, the immutablenodebufferis still readable.