ARC shrinking blocks reads/writes
ZFS registers a memory hook, `__arc_shrinker_func`, which is supposed to
allow the ARC to shrink when the kernel experiences memory pressure.
The ARC shrinker changes `arc_c` via a call to
`arc_reduce_target_size()`.  Before commit 3ec34e5, the ARC
shrinker would also evict data from the ARC to bring `arc_size` down to
the new `arc_c`.  However, that commit (seemingly inadvertently) made it
so that the ARC shrinker no longer evicts any data or waits for eviction
to complete.

Repeated calls to the ARC shrinker can reduce `arc_c` drastically, often
all the way to `arc_c_min`.  Since the shrinker doesn't wait for the
actual eviction of data from the ARC, `arc_size` can remain above `arc_c`
for the several seconds or minutes it takes `arc_adjust_zthr` to evict
data from the ARC.  During this time, `arc_get_data_impl()` will block,
so ZFS can't process read/write requests (e.g. from iSCSI, NFS, or
read/write syscalls).

To ensure that `arc_c` doesn't shrink faster than the adjust thread can
keep up, this commit makes the ARC shrinker wait for the eviction to
complete, resulting in similar behavior to what we had before commit
3ec34e5.
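
The effect of waiting can be illustrated with a small single-threaded model. This is only a sketch under stated assumptions: `arc_model_t`, the `model_*` functions, and the fixed per-step eviction rate are hypothetical stand-ins, the overflow test ignores the slack that the real `arc_is_overflowing()` allows, and the wait is modeled as "run eviction steps until the cache is no longer over its target" rather than as a real condition-variable wait.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for ARC state; not the real ZFS structures. */
typedef struct arc_model {
	int64_t c;		/* target size (models arc_c) */
	int64_t c_min;		/* floor for the target (models arc_c_min) */
	int64_t size;		/* bytes currently cached (models arc_size) */
	int64_t evict_per_step;	/* assumed bytes evicted per adjust step */
} arc_model_t;

/* Models arc_reduce_target_size(): lower the target, never below c_min. */
static void
model_reduce_target(arc_model_t *m, int64_t to_free)
{
	assert(to_free >= 0);
	int64_t wanted = m->c - to_free;
	m->c = (wanted > m->c_min) ? wanted : m->c_min;
}

/* Simplified overflow test (assumption: no overflow slack is modeled). */
static bool
model_is_overflowing(const arc_model_t *m)
{
	return (m->size > m->c);
}

/* One step of the adjust thread: evict up to evict_per_step bytes. */
static void
model_evict_step(arc_model_t *m)
{
	int64_t excess = m->size - m->c;
	if (excess <= 0)
		return;
	m->size -= (excess < m->evict_per_step) ? excess : m->evict_per_step;
}

/*
 * Run `calls` shrinker invocations, each asking for `to_free` bytes.
 * If `wait` is true, each invocation lets eviction catch up before
 * returning.  Returns the number of eviction steps still outstanding
 * after the last invocation, a proxy for how long allocations would block.
 */
int64_t
model_run_shrinker(arc_model_t *m, int calls, int64_t to_free, bool wait)
{
	for (int i = 0; i < calls; i++) {
		model_reduce_target(m, to_free);
		while (wait && model_is_overflowing(m))
			model_evict_step(m);
	}

	int64_t backlog = 0;
	while (model_is_overflowing(m)) {
		model_evict_step(m);
		backlog++;
	}
	return (backlog);
}
```

In this model, without the wait the whole eviction backlog is left behind after the shrinker calls return. With the wait, each shrinker call absorbs its own share of the eviction time, so the target never runs far ahead of the actual cache size.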

Note: commit 3ec34e5 is `OpenZFS 9284 - arc_reclaim_thread
has 2 jobs`.  It was integrated in December 2018 and is part of ZoL
0.8.x but not 0.7.x.

Additionally, when the ARC size is reduced drastically, the
`arc_adjust_zthr` can be on-CPU for many seconds without blocking.  Any
threads that are bound to the same CPU that `arc_adjust_zthr` is running
on will not be able to run for a long time.

To ensure that CPU-bound threads can make progress, this commit changes
`arc_evict_state_impl()` to make a voluntary preemption call,
`cond_resched()`.
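
A rough userspace analogue of this pattern is sketched below. It is an illustration only: `evict_all_sublists()` and its parameters are hypothetical, the "eviction" is just byte accounting, and POSIX `sched_yield()` stands in for the kernel's `cond_resched()` (the two are not equivalent, since `cond_resched()` only reschedules when the scheduler has flagged that it is needed).

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical long-running loop: process one sublist's worth of work,
 * then voluntarily give up the CPU so that other runnable threads on
 * this CPU get a chance to make progress before the next sublist.
 */
uint64_t
evict_all_sublists(const uint64_t *sublist_bytes, size_t nsublists)
{
	assert(sublist_bytes != NULL || nsublists == 0);

	uint64_t bytes_evicted = 0;
	for (size_t i = 0; i < nsublists; i++) {
		/* Stand-in for evicting every buffer on sublist i. */
		bytes_evicted += sublist_bytes[i];

		/* Userspace stand-in for cond_resched(). */
		(void) sched_yield();
	}
	return (bytes_evicted);
}
```

Yielding once per sublist, rather than once per buffer, keeps the overhead of the yield small relative to the work done between yields.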

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
External-issue: DLPX-70703
Closes openzfs#10496
ahrens authored Jun 26, 2020
1 parent 221e670 commit 67c0f0d
Showing 3 changed files with 29 additions and 2 deletions.
1 change: 1 addition & 0 deletions include/sys/arc_impl.h
@@ -894,6 +894,7 @@ extern int arc_lotsfree_percent;
 extern void arc_reduce_target_size(int64_t to_free);
 extern boolean_t arc_reclaim_needed(void);
 extern void arc_kmem_reap_soon(void);
+extern boolean_t arc_is_overflowing(void);

 extern void arc_lowmem_init(void);
 extern void arc_lowmem_fini(void);
18 changes: 18 additions & 0 deletions module/os/linux/zfs/arc_os.c
@@ -268,6 +268,24 @@ __arc_shrinker_func(struct shrinker *shrink, struct shrink_control *sc)
 	 */
 	if (pages > 0) {
 		arc_reduce_target_size(ptob(sc->nr_to_scan));
+
+		/*
+		 * Repeated calls to the arc shrinker can reduce arc_c
+		 * drastically, potentially all the way to arc_c_min. While
+		 * arc_c is below arc_size, ZFS can't process read/write
+		 * requests, because arc_get_data_impl() will block. To
+		 * ensure that arc_c doesn't shrink faster than the adjust
+		 * thread can keep up, we wait for eviction here.
+		 */
+		mutex_enter(&arc_adjust_lock);
+		if (arc_is_overflowing()) {
+			arc_adjust_needed = B_TRUE;
+			zthr_wakeup(arc_adjust_zthr);
+			(void) cv_wait(&arc_adjust_waiters_cv,
+			    &arc_adjust_lock);
+		}
+		mutex_exit(&arc_adjust_lock);
+
 		if (current_is_kswapd())
 			arc_kmem_reap_soon();
 #ifdef HAVE_SPLIT_SHRINKER_CALLBACK
12 changes: 10 additions & 2 deletions module/zfs/arc.c
@@ -853,7 +853,6 @@ static void arc_free_data_impl(arc_buf_hdr_t *hdr, uint64_t size, void *tag);
 static void arc_hdr_free_abd(arc_buf_hdr_t *, boolean_t);
 static void arc_hdr_alloc_abd(arc_buf_hdr_t *, boolean_t);
 static void arc_access(arc_buf_hdr_t *, kmutex_t *);
-static boolean_t arc_is_overflowing(void);
 static void arc_buf_watch(arc_buf_t *);

 static arc_buf_contents_t arc_buf_type(arc_buf_hdr_t *);
@@ -3995,6 +3994,15 @@ arc_evict_state_impl(multilist_t *ml, int idx, arc_buf_hdr_t *marker,

 	multilist_sublist_unlock(mls);

+	/*
+	 * If the ARC size is reduced from arc_c_max to arc_c_min (especially
+	 * if the average cached block is small), eviction can be on-CPU for
+	 * many seconds. To ensure that other threads that may be bound to
+	 * this CPU are able to make progress, make a voluntary preemption
+	 * call here.
+	 */
+	cond_resched();
+
 	return (bytes_evicted);
 }

@@ -4992,7 +5000,7 @@ arc_adapt(int bytes, arc_state_t *state)
  * Check if arc_size has grown past our upper threshold, determined by
  * zfs_arc_overflow_shift.
  */
-static boolean_t
+boolean_t
 arc_is_overflowing(void)
 {
 	/* Always allow at least one block of overflow */
