Shrinking the inline caches #396
-
We can further shrink the |
-
Another way to shrink the We probably don't want to use the global type cache, as we need to recheck the version, perform a redundant check on the name and handle misses due to collisions. If this is effective, we could claim back some memory by shrinking the global method cache. |
-
How do we ensure that the borrowed references remain valid? (Especially in light of all the discussion about changes to refcounting.)
-
They are borrowed references, so they are always invalid by themselves. It is only the combination of use and cache that may be valid. The references are only valid at point of use if the type versions match, as that means there is a strong reference to the attribute in the class or one of its superclasses. The type cache used in |
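As a rough illustration of the rule in this reply, here is a toy model in Python rather than the real C code. `TypeModel`, `CacheEntry` and `load_cached` are hypothetical names invented for the sketch; the only idea taken from the reply is that a cached reference may be used only when the cached type version still matches the type's current version.

```python
class TypeModel:
    """Toy stand-in for a class: attributes plus a version tag that changes on mutation."""

    _next_version = 1

    def __init__(self, **attrs):
        self.attrs = dict(attrs)
        self.version = TypeModel._new_version()

    @classmethod
    def _new_version(cls):
        version = cls._next_version
        cls._next_version += 1
        return version

    def set_attr(self, name, value):
        # Any modification gives the type a fresh version, which
        # invalidates every cache entry recorded under the old one.
        self.attrs[name] = value
        self.version = TypeModel._new_version()


class CacheEntry:
    """Toy inline cache entry: a version tag plus a reference to the attribute."""

    def __init__(self, tp, name):
        self.type_version = tp.version
        # In C this would be a borrowed reference. Python always holds a
        # strong reference here, so this only models the validity check.
        self.descr = tp.attrs[name]


def load_cached(entry, tp, name):
    """Return (value, hit). The cached reference is used only on a version match."""
    if entry.type_version == tp.version:
        return entry.descr, True
    return tp.attrs[name], False  # stale entry: fall back to the slow lookup


t = TypeModel(f="old")
entry = CacheEntry(t, "f")
first = load_cached(entry, t, "f")   # versions match, cached value is used
t.set_attr("f", "new")
second = load_cached(entry, t, "f")  # version changed, cached value is ignored
```

In this model the stale value is never read after the type changes, which is the property the version check is meant to guarantee.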
-
Currently inline caches take up about two thirds of the space used by the bytecode.
We get good performance from inline caches, so obviously we want to keep them, but they could be smaller.
For hardware reasons caches must be a power-of-two size, which is a bit unfortunate as we often don't need that much data.
We should look into ways to reduce the amount of information needed in the cache to fit into a smaller power-of-two.
We can do that by merging two values into a single cache entry, or by accepting that some things won't get cached.
Some examples
Pointers
We currently store pointers as 64-bit values. We could store them as 32-bit offsets instead. That would prevent caching any pointer more than 2 GiB away from the base pointer. For the vast majority of applications that should be a win, if we are caching objects created early, like functions or classes.
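A minimal sketch of the offset idea, modelling addresses as plain integers. The helper names and the example base address are made up for illustration, and the choice of a signed offset (so that objects on either side of the base are reachable) is an assumption; the post does not say what the base pointer would be.

```python
INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1


def encode_offset(base, target):
    """Return target as a signed 32-bit offset from base, or None if out of range."""
    offset = target - base
    if INT32_MIN <= offset <= INT32_MAX:
        return offset
    return None  # too far from the base: this pointer is simply not cached


def decode_offset(base, offset):
    """Recover the original address from the base and a cached offset."""
    return base + offset


base = 0x7F00_0000_0000      # hypothetical base address
near = base + 0x1234_5678    # well within 2 GiB of the base
far = base + (1 << 33)       # 8 GiB away from the base

near_offset = encode_offset(base, near)
far_offset = encode_offset(base, far)
```

Here `near_offset` fits in 32 bits and round-trips through `decode_offset`, while `far_offset` is `None`, which corresponds to "accepting that some things won't get cached".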
Combining function version and minimum args in _PyCallCache
Currently we store the function version in 32 bits, where 26 bits should be enough. The minimum argument count is given 16 bits, when 6 bits are enough; functions with 64 or more parameters are extremely rare. 26 + 6 == 32, so both values fit in a single 32-bit entry.
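The packing arithmetic could look roughly like this. The function names and the bit order (version in the high bits, minimum args in the low bits) are assumptions for the sketch, not something the post specifies.

```python
VERSION_BITS = 26
MIN_ARGS_BITS = 6
VERSION_MAX = (1 << VERSION_BITS) - 1    # largest version that fits in 26 bits
MIN_ARGS_MAX = (1 << MIN_ARGS_BITS) - 1  # 63, largest count that fits in 6 bits


def pack_call_cache(func_version, min_args):
    """Pack both values into one 32-bit integer, or return None if either is too big."""
    if not 0 <= func_version <= VERSION_MAX:
        return None
    if not 0 <= min_args <= MIN_ARGS_MAX:
        return None  # e.g. a function with 64 or more parameters is not cached
    return (func_version << MIN_ARGS_BITS) | min_args


def unpack_call_cache(packed):
    """Split a packed entry back into (func_version, min_args)."""
    return packed >> MIN_ARGS_BITS, packed & MIN_ARGS_MAX


packed = pack_call_cache(12345, 3)
roundtrip = unpack_call_cache(packed)
```

Values that do not fit are rejected rather than truncated, so an out-of-range function would just go unspecialized instead of being cached incorrectly.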
LOAD_METHOD
The LOAD_METHOD instruction has a 20 byte cache. We can reduce its size to 12 bytes by changing the pointer to an offset, ditching LOAD_METHOD_WITH_DICT, which is mostly ineffective, and shrinking the cached keys version from 32 bits to 16 bits.
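To check that the numbers add up, here is a sketch using Python's `struct` module to compute unpadded sizes. The field breakdown is an assumption: the post only gives the 20 and 12 byte totals and names the three changes, so the 16-bit counter, 32-bit type version and 16-bit dict offset are one plausible layout, not a statement of the actual struct.

```python
import struct

# "<" selects standard sizes with no padding.
# H = unsigned 16-bit, I = unsigned 32-bit, Q = unsigned 64-bit, i = signed 32-bit.

# Assumed current layout:
#   counter (H), type version (I), dict offset (H), keys version (I), pointer (Q)
CURRENT_LAYOUT = "<HIHIQ"

# Assumed proposed layout:
#   counter (H), type version (I), keys version (H), offset instead of pointer (i)
# The dict offset field disappears along with LOAD_METHOD_WITH_DICT.
PROPOSED_LAYOUT = "<HIHi"

current_size = struct.calcsize(CURRENT_LAYOUT)
proposed_size = struct.calcsize(PROPOSED_LAYOUT)
saved = current_size - proposed_size
```

Under these assumptions the saving splits as 4 bytes from the pointer, 2 bytes from the dropped dict offset and 2 bytes from the narrower keys version.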