Implement parallel dbuf eviction #16487
base: master
Force-pushed from d3aad0f to 3570e89.
Force-pushed from 3570e89 to 5b070d1.
if (skip == 0)
	skip = 1;
Why check before assigning?
*/
static void
dbuf_evict_one(void)
dbuf_evict_many(uint64_t bytes, unsigned int idx)
I would change the order of the arguments.
uint64_t nchunks = ((bytes - 1) >> DBUF_MIN_EVICT_PERTASK_SHIFT) + 1;
unsigned n = nchunks < num_sublists ? nchunks : num_sublists;
uint64_t fullrows = nchunks / n;
unsigned lastrowcols = nchunks % n;
unsigned k = (lastrowcols ? lastrowcols : n);

uint64_t bytes_pertask_low = fullrows << DBUF_MIN_EVICT_PERTASK_SHIFT;
uint64_t bytes_pertask = bytes_pertask_low + (lastrowcols ?
    (1 << DBUF_MIN_EVICT_PERTASK_SHIFT) : 0);
Same as for the ARC, this seems over-engineered. There is no need for the per-task amounts to be multiples of 1 << DBUF_MIN_EVICT_PERTASK_SHIFT.
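To make the reviewer's point concrete, here is a minimal C sketch of two ways to split an eviction request across tasks. split_chunked() mirrors the arithmetic in the hunk above (as we read it, the first lastrowcols tasks each receive one extra chunk), while split_even() still caps the task count by the minimum per-task size but divides the exact byte count instead of rounding up to whole chunks. Both function names and the value 20 for DBUF_MIN_EVICT_PERTASK_SHIFT are assumptions for illustration, not taken from the patch.

```c
#include <stdint.h>

/* Assumed value for illustration only; the patch's constant may differ. */
#define	DBUF_MIN_EVICT_PERTASK_SHIFT	20	/* 1 MiB chunks */

/*
 * Chunked split, following the diff: round the request up to whole chunks,
 * use at most num_sublists tasks, and give one extra chunk to each of the
 * first (nchunks % n) tasks. Fills per_task[0..n-1] and returns n.
 * Requires bytes > 0, as the original arithmetic does.
 */
static unsigned
split_chunked(uint64_t bytes, unsigned num_sublists, uint64_t *per_task)
{
	uint64_t nchunks = ((bytes - 1) >> DBUF_MIN_EVICT_PERTASK_SHIFT) + 1;
	unsigned n = nchunks < num_sublists ? (unsigned)nchunks : num_sublists;
	uint64_t fullrows = nchunks / n;
	unsigned lastrowcols = (unsigned)(nchunks % n);
	uint64_t low = fullrows << DBUF_MIN_EVICT_PERTASK_SHIFT;

	for (unsigned i = 0; i < n; i++) {
		/* Only the first lastrowcols tasks get the extra chunk. */
		per_task[i] = low + (i < lastrowcols ?
		    (1ULL << DBUF_MIN_EVICT_PERTASK_SHIFT) : 0);
	}
	return (n);
}

/*
 * Simpler split in the spirit of the review: cap the task count the same
 * way, then divide the exact byte count evenly, spreading the remainder
 * one byte at a time over the first tasks. Requires bytes > 0.
 */
static unsigned
split_even(uint64_t bytes, unsigned num_sublists, uint64_t *per_task)
{
	uint64_t maxtasks = ((bytes - 1) >> DBUF_MIN_EVICT_PERTASK_SHIFT) + 1;
	unsigned n = maxtasks < num_sublists ? (unsigned)maxtasks : num_sublists;

	for (unsigned i = 0; i < n; i++)
		per_task[i] = bytes / n + (i < bytes % n ? 1 : 0);
	return (n);
}
```

With 5 MiB to evict and 4 sublists, split_chunked() yields 2, 1, 1, 1 MiB. For a request of 3 MiB plus 1 byte, split_chunked() hands out 4 MiB in total, while split_even() uses the same 4 tasks but their amounts sum to exactly the requested size.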
evarg[i].idx = idx;
evarg[i].bytes = evict;

taskq_dispatch_ent(dbuf_evict_taskq, dbuf_evict_task,
    &evarg[i], 0, &evarg[i].tqe);
There should probably be a taskq_init_ent() call somewhere here.
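The per-task argument structs in the hunk above (evarg[i] with its embedded tqe) outlive a single dispatch, which is why the reviewer asks for taskq_init_ent() before taskq_dispatch_ent(). The taskq API is kernel-side, so this minimal sketch uses POSIX threads as a stand-in to show the same pattern: preallocated per-task arguments, each fully reinitialized before dispatch and kept alive until every task has finished. evict_arg_t, evict_task(), evict_parallel(), and the toy sublist_avail[] array are hypothetical.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical per-task argument, loosely modeled on evarg[] in the diff. */
typedef struct evict_arg {
	unsigned	idx;		/* sublist this task scans */
	uint64_t	bytes;		/* bytes this task is asked to free */
	uint64_t	evicted;	/* bytes actually freed, set by the task */
	pthread_t	thread;		/* stands in for the embedded taskq entry */
	int		started;	/* nonzero if the thread must be joined */
} evict_arg_t;

/* Toy cache model: bytes each sublist could free (num_sublists <= 16). */
static uint64_t sublist_avail[16];

static void *
evict_task(void *arg)
{
	evict_arg_t *ea = arg;
	uint64_t avail = sublist_avail[ea->idx];

	/* Free what was asked for, limited by what the sublist holds. */
	ea->evicted = ea->bytes < avail ? ea->bytes : avail;
	return (NULL);
}

/*
 * Dispatch n tasks over consecutive sublists starting at `start`, then wait
 * for all of them and return the total freed.
 */
static uint64_t
evict_parallel(evict_arg_t *ev, unsigned n, unsigned start,
    unsigned num_sublists, const uint64_t *per_task)
{
	uint64_t total = 0;

	for (unsigned i = 0; i < n; i++) {
		/* Reinitialize every field before each dispatch. */
		ev[i].idx = (start + i) % num_sublists;
		ev[i].bytes = per_task[i];
		ev[i].evicted = 0;
		ev[i].started = (pthread_create(&ev[i].thread, NULL,
		    evict_task, &ev[i]) == 0);
		if (!ev[i].started)
			(void) evict_task(&ev[i]);	/* fall back to inline */
	}
	for (unsigned i = 0; i < n; i++) {
		if (ev[i].started)
			(void) pthread_join(ev[i].thread, NULL);
		total += ev[i].evicted;
	}
	return (total);
}
```

Joining every worker before reading the results plays the role of waiting for the taskq to drain; without that step the caller could reuse ev[] while a task still points into it.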
if (size > dbuf_cache_target_bytes()) {
	if (size > dbuf_cache_hiwater_bytes())
		dbuf_evict_one();
		dbuf_evict();
Let's assume we have 10 user threads calling this. I suppose each of them will try to create its own set of tasks to evict the same full amount of excess dbuf cache, all using the same CPUs. In the best case, it may end up with an empty dbuf cache. I am not sure I greatly like the design of one main eviction thread dispatching to a bunch of other taskqs, but each client thread doing that definitely looks weird. I think if user threads have to do evictions, they should do them directly, just evicting more than one buffer at a time to be more efficient, as you have said.
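The hunk above is the caller-side check. As we understand the surrounding upstream code (not shown in this excerpt), a cache size above the target wakes the background eviction thread, and a size above the high-water mark also makes the calling thread evict. This minimal sketch models that two-level decision and, separately, the reviewer's suggestion that a calling thread should free a bounded batch itself rather than launch a full set of eviction tasks. The enum, both functions, and the batch parameter are hypothetical.

```c
#include <stdint.h>

typedef enum {
	EVICT_NONE,		/* at or below target: nothing to do */
	EVICT_WAKE_THREAD,	/* above target: signal the eviction thread */
	EVICT_WAKE_AND_DIRECT	/* above high water: caller also evicts */
} evict_action_t;

static evict_action_t
evict_notify_action(uint64_t size, uint64_t target, uint64_t hiwater)
{
	if (size <= target)
		return (EVICT_NONE);
	if (size <= hiwater)
		return (EVICT_WAKE_THREAD);
	return (EVICT_WAKE_AND_DIRECT);
}

/*
 * Reviewer's alternative for the direct path, sketched: each caller frees
 * at most `batch` bytes itself (only what is needed to get back under the
 * high-water mark), so N concurrent callers free at most N * batch bytes
 * between them instead of each chasing the whole excess.
 */
static uint64_t
direct_evict_budget(uint64_t size, uint64_t hiwater, uint64_t batch)
{
	uint64_t excess = size > hiwater ? size - hiwater : 0;

	return (excess < batch ? excess : batch);
}
```

The background thread remains responsible for bringing the cache down to the low-water mark; the per-caller budget only bounds how much work the I/O path takes on.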
Force-pushed from 54888d8 to 883cd30.
I have updated the patch with different logic for picking the default maximum number of dbuf eviction threads. The new logic aims to use one-eighth of the available CPUs, with a minimum of 2 and a maximum of 16.
.It Sy dbuf_evict_parallel Ns = Ns Sy 0 Pq uint
When set to 1, ZFS will use up to
.Sy dbuf_evict_threads
threads to evict dbuf data in parallel, improving the responsiveness
of ZFS to memory pressure.
.
.It Sy dbuf_evict_threads Ns = Ns Sy 0 Pq uint
Sets the maximum number of dbuf eviction threads to be used.
When set to 0, ZFS uses one-eighth of the available CPUs,
with a minimum of 2 and a maximum of 16.
Is there a time you would want both dbuf_evict_threads >= 2 and dbuf_evict_parallel=0? Just wondering if you can simplify this to dbuf_evict_threads only, and imply "parallel" if it's set to 2 threads or more.
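The two documented tunables can be compared with the reviewer's single-tunable alternative in a short C sketch. evict_threads_auto() follows the man page text (one-eighth of the CPUs, clamped to 2..16), assuming integer division; treating dbuf_evict_parallel = 0 as a single eviction thread is our assumption about the serial path. All function names are hypothetical.

```c
#include <stdint.h>

/* Automatic default per the man page: ncpus / 8, clamped to [2, 16]. */
static unsigned
evict_threads_auto(unsigned ncpus)
{
	unsigned n = ncpus / 8;

	if (n < 2)
		n = 2;
	if (n > 16)
		n = 16;
	return (n);
}

/*
 * Two-tunable scheme as documented in the patch. Assumption: with
 * dbuf_evict_parallel = 0, one eviction thread is used, as before.
 */
static unsigned
workers_two_tunables(unsigned evict_parallel, unsigned evict_threads,
    unsigned ncpus)
{
	if (evict_parallel == 0)
		return (1);
	return (evict_threads != 0 ? evict_threads : evict_threads_auto(ncpus));
}

/*
 * Reviewer's single-tunable alternative: parallel eviction is implied
 * whenever the resulting worker count is 2 or more.
 */
static unsigned
workers_one_tunable(unsigned evict_threads, unsigned ncpus)
{
	return (evict_threads != 0 ? evict_threads : evict_threads_auto(ncpus));
}
```

One consequence of the single-tunable scheme: if 0 still means "automatic," the default becomes parallel, because the automatic value is never below 2. Keeping serial eviction as the default would mean shipping dbuf_evict_threads = 1 instead.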
Sponsored-by: Expensify, Inc.
Sponsored-by: Klara, Inc.
Co-authored-by: Allan Jude <allan@klarasystems.com>
Co-authored-by: Mateusz Piotrowski <mateusz.piotrowski@klarasystems.com>
Signed-off-by: Alexander Stetsenko <alex.stetsenko@gmail.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Mateusz Piotrowski <mateusz.piotrowski@klarasystems.com>
Force-pushed from 883cd30 to 47c490b.
I've rebased onto the latest master. I'll address the feedback soon.
This controls how quickly the transaction delay approaches infinity.
Larger values cause longer delays for a given amount of dirty data.
.Pp
For the smoothest delay, this value should be about 1 billion divided
Have you added some tabs here?
Sponsored-by: Expensify, Inc.
Sponsored-by: Klara, Inc.
Motivation and Context
Replace dbuf_evict_one() with dbuf_evict_many() to more efficiently evict dbuf objects without looping over the same locked objects over and over.
Description
In the previous code, dbuf_evict_thread() would call dbuf_evict_one() in a loop while dbuf_cache_above_lowater() returned true.
dbuf_evict_one() would select a random sublist from the dbuf cache, then walk it from the tail forward, attempting to acquire the lock on each object until it succeeded, then evict that object and return.
As the name suggests, it would evict only a single object from the cache. However, evicting one object is not likely to bring us below the desired low water mark, so dbuf_evict_one() would be called again, looping over all of the same busy objects again until it found one it could evict.
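The cost described above can be illustrated with a toy model of a single sublist: freeing several buffers one call at a time re-probes every busy buffer at the tail on each call. toy_buf_t and both functions are hypothetical; the real dbuf_evict_one() also picks a random sublist per call and tries to lock real dbufs, which this model replaces with a busy flag (evicted entries are treated as removed from the list).

```c
#include <stddef.h>
#include <stdint.h>

/* Toy sublist entry: index 0 is the head, index n - 1 is the tail. */
typedef struct toy_buf {
	uint64_t	size;
	int		busy;		/* lock attempt would fail */
	int		evicted;	/* treated as removed from the list */
} toy_buf_t;

/*
 * Old pattern: walk from the tail, count one lock attempt per live entry,
 * evict the first one that is not busy, and return its size (0 if none).
 */
static uint64_t
toy_evict_one(toy_buf_t *list, size_t n, uint64_t *tries)
{
	for (size_t i = n; i-- > 0; ) {
		if (list[i].evicted)
			continue;
		(*tries)++;
		if (list[i].busy)
			continue;
		list[i].evicted = 1;
		return (list[i].size);
	}
	return (0);
}

/* Caller loop: keep calling until the target is met or nothing is left. */
static uint64_t
toy_evict_loop_one(toy_buf_t *list, size_t n, uint64_t target,
    uint64_t *tries)
{
	uint64_t freed = 0;

	while (freed < target) {
		uint64_t got = toy_evict_one(list, n, tries);

		if (got == 0)
			break;
		freed += got;
	}
	return (freed);
}
```

With three busy buffers at the tail and three 10-byte evictable ones behind them, freeing 30 bytes this way takes 12 lock attempts: each of the three calls re-probes the same three busy buffers before succeeding.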
This has been replaced with dbuf_evict_many() which takes a specific sublist as a parameter, as well as a desired amount of data to evict. It then walks the sublist from the tail forward, evicting what it can until the number of bytes evicted satisfies the input parameter or the head of the sublist is reached.
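By contrast, a single pass in the style of dbuf_evict_many() keeps walking toward the head after each eviction, so each busy buffer is probed once. This is a standalone sketch on the same hypothetical toy model; the real function takes a byte target and a sublist index (per the signature in the diff), and its exact locking and stopping logic may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy sublist entry: index 0 is the head, index n - 1 is the tail. */
typedef struct toy_buf {
	uint64_t	size;
	int		busy;		/* lock attempt would fail */
	int		evicted;	/* treated as removed from the list */
} toy_buf_t;

/*
 * Single pass from the tail toward the head: skip busy entries, evict
 * everything else until at least `bytes` have been freed or the head is
 * reached. Returns the bytes freed; *tries counts lock attempts.
 */
static uint64_t
toy_evict_many(toy_buf_t *list, size_t n, uint64_t bytes, uint64_t *tries)
{
	uint64_t freed = 0;

	for (size_t i = n; i-- > 0 && freed < bytes; ) {
		if (list[i].evicted)
			continue;
		(*tries)++;
		if (list[i].busy)
			continue;
		list[i].evicted = 1;
		freed += list[i].size;
	}
	return (freed);
}
```

On the same six-entry list (three busy at the tail, three 10-byte buffers behind them), freeing 30 bytes takes 6 lock attempts instead of 12, and the gap grows with the number of busy buffers and the amount to evict.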
The dbuf_evict_thread now runs in parallel as well, allowing it to keep up with demand more easily. For the dbuf cache, if the single thread was not able to keep up, ZFS would shift the work of evicting some items to each incoming I/O thread. While that is still the case, it should happen much less often now that dbuf_evict is more efficient and no longer bottlenecked on a single thread.
How Has This Been Tested?
Performance benchmarks while the dbuf cache is full