@@ -14,16 +14,16 @@ A family of instructions has the following fundamental properties:
   it executes the non-adaptive instruction.
 * It has at least one specialized form of the instruction that is tailored
   for a particular value or set of values at runtime.
-* All members of the family have access to the same number of cache entries.
-  Individual family members do not need to use all of the entries.
+* All members of the family must have the same number of inline cache entries,
+  to ensure correct execution.
+  Individual family members do not need to use all of the entries,
+  but must skip over any unused entries when executing.

 The current implementation also requires the following,
 although these are not fundamental and may change:

-* If a family uses one or more entries, then the first entry must be a
-  `_PyAdaptiveEntry` entry.
-* If a family uses no cache entries, then the `oparg` is used as the
-  counter for the adaptive instruction.
+* All families use one or more inline cache entries;
+  the first entry is always the counter.
 * All instruction names should start with the name of the non-adaptive
   instruction.
 * The adaptive instruction should end in `_ADAPTIVE`.
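
The rule that every family member reserves the same number of inline cache entries, and skips the ones it does not use, can be illustrated with a small sketch. Everything here is hypothetical: the opcode values, the `FAMILY_CACHE_ENTRIES` constant, the 16-bit code-unit layout with the opcode in the low byte, and the `count_instructions` helper are assumptions made for illustration, not CPython's actual definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for a single family; not CPython's real numbering. */
enum { OP_HALT = 0, OP_LOAD = 1, OP_LOAD_ADAPTIVE = 2, OP_LOAD_FAST_PATH = 3 };

/* Assumed: every member of the family is followed by this many 16-bit
   inline cache entries, whether or not it reads all of them. */
#define FAMILY_CACHE_ENTRIES 3

static int is_family_member(uint16_t opcode)
{
    return opcode == OP_LOAD || opcode == OP_LOAD_ADAPTIVE ||
           opcode == OP_LOAD_FAST_PATH;
}

/* Walk a code array and count the instructions up to and including OP_HALT.
   The walk stays aligned only because each family member skips exactly
   FAMILY_CACHE_ENTRIES code units, regardless of how many it uses. */
size_t count_instructions(const uint16_t *code, size_t len)
{
    size_t count = 0;
    size_t i = 0;
    while (i < len) {
        uint16_t opcode = code[i] & 0xFF;  /* assumed: opcode in the low byte */
        i += 1;
        count += 1;
        if (opcode == OP_HALT) {
            break;
        }
        if (is_family_member(opcode)) {
            /* The cache entries must fit inside the code array. */
            assert(i + FAMILY_CACHE_ENTRIES <= len);
            i += FAMILY_CACHE_ENTRIES;     /* skip used and unused entries */
        }
    }
    return count;
}
```

If one member skipped fewer entries than the others, a cache value such as `0` would be decoded as an opcode, which is why the entry count has to be uniform across the family.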
@@ -76,6 +76,10 @@ keeping `Ti` low which means minimizing branches and dependent memory
 accesses (pointer chasing). These two objectives may be in conflict,
 requiring judgement and experimentation to design the family of instructions.

+The size of the inline cache should be as small as possible,
+without impairing performance, to reduce the number of
+`EXTENDED_ARG` jumps, and to reduce pressure on the CPU's data cache.
+
 ### Gathering data

 Before choosing how to specialize an instruction, it is important to gather
@@ -106,7 +110,7 @@ This can be tested quickly:
 * `globals->keys->dk_version == expected_version`

 and the operation can be performed quickly:
-* `value = globals->keys->entries[index].value`.
+* `value = entries[cache->index].me_value;`.

 Because it is impossible to measure the performance of an instruction without
 also measuring unrelated factors, the assessment of the quality of a
@@ -119,8 +123,7 @@ base instruction.

 In general, specialized instructions should be implemented in two parts:
 1. A sequence of guards, each of the form
-   `DEOPT_IF(guard-condition-is-false, BASE_NAME)`,
-   followed by a `record_cache_hit()`.
+   `DEOPT_IF(guard-condition-is-false, BASE_NAME)`.
 2. The operation, which should ideally have no branches and
    a minimum number of dependent memory accesses.

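
The two-part shape described in this hunk (guards first, then a branch-free operation) might look roughly like the following. This is a minimal sketch under stated assumptions: the `DEOPT_IF` macro is a simplified stand-in that returns `0` instead of jumping to the base instruction, and `entry_t`, `keys_t`, `cache_t` and `load_global_specialized` are hypothetical types and names loosely modelled on the `dk_version` and `me_value` fields mentioned in the text.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the dictionary-keys structures. */
typedef struct { const char *me_key; int me_value; } entry_t;
typedef struct { uint32_t dk_version; entry_t *entries; } keys_t;

/* Hypothetical inline cache: counter first, then the specialization data. */
typedef struct { uint16_t counter; uint16_t index; uint32_t version; } cache_t;

/* Stand-in for the interpreter macro: the real one jumps to the base
   instruction; here the function simply reports a deoptimization. */
#define DEOPT_IF(cond, base_name) do { if (cond) return 0; } while (0)

/* Returns 1 and stores the value on the fast path, 0 if a guard failed. */
int load_global_specialized(const keys_t *keys, const cache_t *cache,
                            int *value)
{
    /* Part 1: guards. Each one bails out to the base instruction. */
    DEOPT_IF(keys->dk_version != cache->version, LOAD_GLOBAL);

    /* Part 2: the operation. No branches, one dependent load. */
    *value = keys->entries[cache->index].me_value;
    return 1;
}
```

Keeping all checks in the guard section means the operation itself can assume the cached `index` is still valid for the current keys version.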
@@ -129,3 +132,11 @@ can be re-used in the operation.

 If there are branches in the operation, then consider further specialization
 to eliminate the branches.
+
+### Maintaining stats
+
+Finally, take care that stats are gathered correctly.
+After the last `DEOPT_IF` has passed, a hit should be recorded with
+`STAT_INC(BASE_INSTRUCTION, hit)`.
+After an optimization has been deferred in the `ADAPTIVE` form,
+that should be recorded with `STAT_INC(BASE_INSTRUCTION, deferred)`.
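
As a rough illustration of where the two counters are bumped, the sketch below records a hit only after the final guard has passed and a deferral only when the adaptive form decides not to specialize yet. The `STAT_INC` macro here is a simplified stand-in backed by a local table, and `specialized_form`, `adaptive_form` and the countdown `counter` are assumptions for illustration rather than the interpreter's actual code.

```c
#include <stdint.h>

/* Hypothetical one-entry opcode table for the sketch. */
enum { BASE_INSTRUCTION = 0, N_OPCODES = 1 };

typedef struct {
    unsigned long hit;
    unsigned long deferred;
} specialization_stats;

static specialization_stats stats[N_OPCODES];

/* Simplified stand-in for the interpreter's stats macro. */
#define STAT_INC(opname, name) (stats[(opname)].name++)

/* Specialized form: returns 1 on the fast path, 0 on deoptimization. */
int specialized_form(int guard_holds)
{
    if (!guard_holds) {
        return 0;                      /* stand-in for a failing DEOPT_IF */
    }
    /* All guards have passed: this is the point to record the hit. */
    STAT_INC(BASE_INSTRUCTION, hit);
    return 1;
}

/* Adaptive form: returns 1 when it would try to specialize, 0 when the
   attempt is deferred because the counter has not reached zero. */
int adaptive_form(uint16_t *counter)
{
    if (*counter == 0) {
        return 1;                      /* specialization attempt goes here */
    }
    STAT_INC(BASE_INSTRUCTION, deferred);
    (*counter)--;
    return 0;
}
```

Recording the hit before the last guard would count executions that later deoptimize, which inflates the hit rate used to judge the specialization.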