Add symbol caches to opcodes that do dynamic resolution of names #28
Merged
High-level overview and preliminary tests
This technique is usually known as inline caching.
Basically, we optimistically cache the result of a method/field/global variable name resolution inside the runtime state of the VM, in the hope that, on the next execution of the opcode, types won't have changed and we can directly return the cached value (the value itself in the case of methods, the field offset inside the `ObjInstance` in the case of fields) instead of doing name resolution again, which involves a hashtable lookup in the J* VM.

Final Results
Achieved a ~30% speedup on code that heavily relies on method, field or global variable lookups. A speedup is also achieved across the board, as most code uses at least global variables (e.g. module imports or module-level calls) fairly extensively.
Benchmark charts:
- Heavy method calls
- Heavy use of fields (and some method calls)
- Heavy use of global vars
- Moderate use of global vars
- (Almost) no use of global vars
All benchmarks above are microbenchmarks. Here we also present a bench on a more realistic project: https://github.com/bamless/pulsar (arguments to pulsar are shortened with `...` as there are lots of them).

Implementation overview
The inline caching implementation presented in this PR differs slightly from classical inline cache implementations found in other VMs (as far as I can tell).
In some respects, it resembles the name resolution performed by the HotSpot JVM when resolving a field or class name for the first time. Unlike Java, which can statically check the type of a value at compile time, our implementation is guarded by checks on a key (typically the last type, i.e. `ObjClass`, the opcode has seen) to prevent erroneously resolving a name from another class when the type of a variable changes for the same opcode.

Implementation in detail
One key aspect in which this implementation differs from classical approaches is the fact that caches are not actually stored inline in the bytecode. Instead, a new array of `Symbol`s has been added to the runtime representation of compiled code. In the base case, for example when first trying to resolve a name, these symbols function as a proxy to a constant String: everything works as before, but with an extra level of indirection. Then, an extra and crucial step is added: after a successful resolution, the result is cached in the symbol.
The next time that opcode is executed, we first check the symbol for a cached value, and if the types match we directly return the resolved value without doing a full name resolution.
To give a better idea, this is the new `code.h`:
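As a rough sketch of the idea (the exact J* definitions differ; the enum, field and struct names below are assumptions for illustration):

```c
#include <stddef.h>
#include <stdint.h>

typedef struct ObjClass ObjClass; /* actual J* type, opaque here */
typedef uint64_t Value;           /* placeholder for J*'s value type */

typedef enum SymbolType {
    SYM_FIELD,  /* caches a field offset inside the instance */
    SYM_METHOD, /* caches the resolved method value directly */
    SYM_GLOBAL, /* caches a global variable offset inside the module */
} SymbolType;

typedef struct Symbol {
    uint16_t constant; /* constant-pool index of the name (the proxy) */
    SymbolType type;   /* what kind of resolution this symbol caches */
    ObjClass* key;     /* guard: last type this opcode has seen, or NULL */
    union {
        int offset;    /* cached field/global offset */
        Value method;  /* cached method */
    } as;
} Symbol;

/* The runtime representation of compiled code grows a symbol array. */
typedef struct Code {
    uint8_t* bytecode;
    size_t bytecodeLen;
    /* ...constants, line info, etc. as before... */
    Symbol* symbols;   /* NEW: one entry per name-resolving opcode */
    size_t symbolCount;
} Code;
```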
This is the new `OP_GET_FIELD` implementation:
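A sketch of the dispatch, under the same assumptions; `READ_SYMBOL`, `peek`, `push`, `getConstantName`, `runtimeError` and `DISPATCH` are hypothetical VM helpers, not the actual J* ones:

```c
/* Inside the interpreter loop's opcode switch. */
case OP_GET_FIELD: {
    Symbol* sym = READ_SYMBOL(); /* read the symbol index from the bytecode */

    /* extra indirection: the field's name is reached through the symbol */
    ObjString* name = getConstantName(vm, sym->constant);

    Value field;
    /* the symbol is forwarded to getValueField, which consults the cache
     * on a hit and refills it after a full resolution on a miss */
    if(!getValueField(vm, peek(vm, 0), name, sym, &field)) {
        return runtimeError(vm, "No such field");
    }

    push(vm, field);
    DISPATCH();
}
```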
Note the extra indirection to get the field's name from the symbol, and the fact that the symbol is forwarded to `getValueField`. The `getCached` function performs the magic: if it finds that the key of the cache satisfies the preconditions, it directly returns the value:
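A minimal sketch of the fast path, reusing the hypothetical `Symbol` layout from above (`ObjInstance` here already stores its fields in a plain array, as discussed in the next section):

```c
#include <stdbool.h>

/* Returns true and writes the cached value on a hit; false means the
 * caller must fall back to full name resolution and refresh the cache. */
static inline bool getCached(const Symbol* sym, const ObjClass* cls,
                             const ObjInstance* inst, Value* out) {
    /* the guard: the cache is valid only if the receiver's class matches
     * the key, i.e. the last class this opcode has seen */
    if(sym->key != cls) {
        return false; /* cache miss */
    }

    switch(sym->type) {
    case SYM_FIELD:
        *out = inst->fields[sym->as.offset]; /* field: cached offset */
        return true;
    case SYM_METHOD:
        *out = sym->as.method; /* method: the cached value directly */
        return true;
    default:
        return false;
    }
}
```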
On cache hits, this gives us a massive boost in performance, coming from not having to do one (and possibly multiple) hashtable lookups.

Implications of the new implementation
One pretty big change coming with this PR is the fact that `ObjInstance` and `ObjModule` will have a different struct layout. These two objects now store their values in a plain array, indexed by a `HashTable<String, int>` mapping names to offsets in this array. This means that, in the case of a cache miss, we pay the cost of an extra indirection for looking up the field (we first look up the offset using the name, and then we index into the array with it). This could impact performance when we have lots of cache misses. In practice, though, it doesn't seem to cost us much, and the performance gains on cache hits vastly justify a couple of extra memory reads in the worst case.

Also, the object layout change is not a problem for the binary compatibility of the library. J* has never exposed internal types to embedding libraries and instead relies fully on the stack-based protocol for embedding. This means that everything should behave exactly as before.
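For illustration, the new layout and its slow path might look roughly like this (the actual structs and hashtable API differ; `hashTableGetInt` is a hypothetical int-valued lookup):

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct Obj Obj;             /* common object header (opaque here) */
typedef struct HashTable HashTable; /* the HashTable<String, int> (opaque) */
typedef struct ObjString ObjString;

typedef struct ObjInstance {
    Obj* base;          /* placeholder for the embedded object header */
    HashTable* offsets; /* name -> offset into the values array */
    Value* fields;      /* NEW: field values stored in a plain array */
    size_t fieldCount;
} ObjInstance;

/* Hypothetical int-valued lookup; the real hashtable API differs. */
bool hashTableGetInt(HashTable* table, ObjString* key, int* out);

/* Slow path on a cache miss: two steps instead of one direct lookup. */
static bool getFieldSlow(ObjInstance* inst, ObjString* name, Value* out) {
    int offset;
    /* 1. first look up the offset using the name (hashtable lookup) */
    if(!hashTableGetInt(inst->offsets, name, &offset)) {
        return false; /* no such field */
    }
    /* 2. then index into the array storing the values */
    *out = inst->fields[offset];
    return true;
}
```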
The only thing that will be done is extending the J* embedding interface with new functions mirroring `jsrGetField`, `jsrInvoke`, etc. that take a symbol as input, in order to give the embedder the option to cache lookups from the extension side.
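Purely as an illustration of the idea, such symbol-taking mirrors could look like this (hypothetical names and signatures, not the actual API additions):

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct JStarVM JStarVM;         /* the J* VM handle */
typedef struct JStarSymbol JStarSymbol; /* hypothetical opaque symbol */

/* Like jsrGetField, but caches the resolution in `sym` so repeated
 * lookups of the same field can skip the hashtable (hypothetical). */
bool jsrGetFieldCached(JStarVM* vm, int slot, const char* name,
                       JStarSymbol* sym);

/* Like jsrInvoke, but resolves the method through the symbol cache
 * (hypothetical). */
bool jsrInvokeCached(JStarVM* vm, const char* name, uint8_t argc,
                     JStarSymbol* sym);
```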
Progress

- Make the name-to-offset hashtable a proper `int` version. Better yet, rewrite the hashtable implementation to be generic using macros.

Further work
Once this is fully implemented, it would probably be worthwhile to try implementing quickening (rewriting generic opcodes into specialized versions at runtime). The two techniques play well together and would probably result in a further speed-up.