-
Notifications
You must be signed in to change notification settings - Fork 28
/
ChangeLog.highlights
151 lines (116 loc) · 5.9 KB
/
ChangeLog.highlights
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
[Current status: 2015 11 08]
Much of the spl memory mechanism has been removed
(e.g., kmem_avail(), kmem_num_pages_wanted()) or is radically
changed.
This new system starts with the premise that xnu's
variables vm_page_free_wanted, vm_page_free_count,
vm_page_speculative_count and vm_page_free_min are
sufficient to determine how much memory arc should be told
is available. It also retains and acts on the 80%
deflation of total_mem that originated with Lundman in the
hybrid branch.
An spl_free variable is exposed to the zfs layer by a
wrapper function. This variable is meant to approximate
Illumos's (freemem - lotsfree - needfree - desfree).
spl_free is SIGNED, and will be negative when arc should
give back memory.
A further variable, spl_free_manual_pressure, is also
exposed to arc.c, to allow a sysctl to force an arc shrink
by arc_reclaim_thread(); a further wrapper allows
arc_reclaim_thread() to reset the pressure when that
thread has acted on it. When spl_free_manual_pressure is
positive, arc will quickly give back memory.
Writes to these two variables are mutexed.
spl_minimal_physmem_p() is used to determine when memory
is really low. it wraps spl_minimal_physmem_p_logic() for
reasons which existed in previous incarnations.
A trio of low-frequency threads, structured like
arc_reclaim_thread() and similar threads in arc, each do a
share of the main work. They can be low frequency even on
a busy system with lots of changes in non-zfs memory
usage, mainly because ARC itself, by design, suppresses
small and high-frequency memory availability information
(e.g., Illumos's freemem variable).
(The threads each use a shutdown variable, a mutex for the
shutdown variable, and a condvar for controlling the while
loop and acting as a callback target).
spl_free_thread() manipulates the spl_free variable. It
sets spl_free to free memory + half of speculative mem,
and then mostly decreases spl_free from there, for example
when segkmem_total_mem_allocated is above thresholds, or
spl_free is nearing 90% of real_total_memory. ARC will
grow to arc_max (or a large fraction thereof when DEBUG is
enabled) but will not trigger OOM or glitching.
Additionally, ARC will compete gently with the xnu UBC
cache; heavy HFS use will cause ARC to shrink back for a
while, whereas heavy ZFS use will cause UBC to shrink if
there is only very light use of other filesystems.
There is no non-negligible advantage in practice even on a
heavy system with highly variable memory to running
spl_free_thread more frequently, nor being more precise
about actual memory availability even on a
restricted-memory system. ARC itself damps out such things.
However, the VM_PAGE_FREE_MIN macro incorporates a pair of
sysctl tunables that can vary the aggressiveness of spl
with respect to other system memory users. Discussed below.
reap_thread() calls kmem_reap() and kmem_reap_idspace()
periodically. It will also call them when
segkmem_total_mem_allocated has grown above 90% of the
(deflated to 80% of physmem) total_memory, and when
signalled to do so by another thread or by sysctl (both
via the mutexed reap_now boolean variable), provided that
calling kmem_reap* is likely to actually release memory.
spl_mach_pressure_monitor_thread() calls
mach_vm_pressure_monitor(). This call can block for
minutes or even hours on lightly loaded systems, so it is
sequestered into its own thread. When a system is in
tight memory, the call returns much more quickly. This
thread may adjust spl_mem, and may cause spl_mem to go
negative, thus causing ARC to give back memory. In
previous incarnations, this thread did more (it represents
the residue of the core of the memory_monitor_thread in
hybrid and master).
memory_monitor_thread() currently holds policy that
affects reap_thread()'s behaviour. Mostly it does work
during the "background 80% real memory check". In
previous incarnations it did MUCH more.
Differences from hybrid branch:
* KMEM_QUANTUM is 128k rather than 512k
* spl_cv_destroy() was noop, now it calls spl_cv_broadcast() defensively
Debugging:
SPL_DEBUG_MUTEX is on, DEBUG is enabled in spl-mutex.c
dprintf() is a macro in spl-kmem.c
DEBUG is enabled in spl-kmem.c
DEBUG is enabled in spl-osx.c to enable stack backtrace
* when DEBUG is true in spl-kmem.c, randomly reap all the kmem caches
(and more often than Illumos kemem DEBUG does)
bunches of printfs and dprintfs in spl-kmem.c
Tunables:
* vm_page_free_min_multiplier
* vm_page_free_min_min
(used for determining how much free memory
overhead to leave compared to vm_page_free_min
which comes from xnu)
Adjusting them printfs a "headroom" message.
When headroom goes up, non-ZFS consumers of memory see less pressure.
When headroom goes down, system memory pressure increases.
Small numbers of pages worth of difference can have an enormous impact
on system performance on a busy, tight-memory system.
xnu likes being in the region where vm_page_free_count is between
vm_page_free_min and vm_page_free_target, which on all systems
of all memory sizes I've seen 3500 and 4000 pages, respecitvely.
By default, spl has "incursions" into that region, otherwise ARC melts away
to nothingness. However, it's only small incursions, otherwise
system performance tanks (glitches, OOMs, etc.)
* several tunables and kstats related to the operation of the three work threads
* several tunables related to the new manual pressure mechanism
(in paricular, spl_spl_free_manual_pressure kstat stays nonzero until arc deals
with the manual pressure input)
* [temporarily, probably DEBUG only eventually] an exponential moving average of the
amount of policy-based adjustment of memory from "actual free memory" to the
spl_free variable
CAVEAT:
I don't unload the spl kext much. Right at this moment kextunload might not work.
OTHER CHANGES from master not in hybrid:
Bring back VNOP_LOOKUP (lundman)
Display SPL version on module unload (lundman)