-
Notifications
You must be signed in to change notification settings - Fork 285
[TG-2478] Make Bootstrap methods available in method/instruction conversion #1937
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Unit tests will follow in the next few days. |
|
|
||
| const java_class_typet::java_lambda_method_handlet &lambda_method_handle = | ||
| lambda_method_handles.at( | ||
| std::stoi(id2string(code_type.get(ID_java_lambda_method_handle_id)))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
index rather than id since I think that more strongly suggests it is a zero indexed counter rather than some unique identifier.
src/java_bytecode/java_types.h
Outdated
| return set(ID_access, access); | ||
| } | ||
|
|
||
| class java_lambda_method_handlet : public exprt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noting my reservations about inheriting from exprt here (my preference would be irept since it isn't a node in an AST but we still need irep functionality to store it in a class_typet). However interested in seeing other perspectives on this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After an offline discussion, a reasonable solution seems to be holding the method handle as a symbol_exprt (which only has an identifier of the referred method).
| // now do lambda method handles (bootstrap methods) | ||
| for(const auto &lambda_entry : c.lambda_method_handle_map) | ||
| { | ||
| const reference_typet lambda_ref = java_reference_type( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now I think about it a bit more, seems odd that this is a reference_typet, why not just directly store either:
symbol_typetsymbol_exprtof type the type of the method (e.g. acode_typet)- just an
irep_idtas the identifier
Again interested in other opinions on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason why I opted for a reference here is that method handles are defined as references to methods: https://docs.oracle.com/javase/8/docs/api/java/lang/invoke/MethodHandle.html
I'm happy to change it to any of the suggested types if that would make it semantically clearer or more fitting to the CBMC logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting - I think you're right that my suggestions aren't quite right.
However, I think what you have might be a bit of a mismatch. It seems like the type of the method handle is a java_reference_type(code_typet{params and returns}) (e.g. kind of like Predicate<Integer> and the value is a address_of_exprt(symbol_exprt{method_name})) (e.g. Integer::equals).
I don't think it will cause problems, but for example I would expect if I look up a symbol_typet in the symbol table, then what I'd get back would have is_type set to true, wheras in this case it will be is_function() that returns true.
Not sure - thinking in comments - do you have any thoughts on this @mgudemann ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After an offline discussion with @thk123 the type of a method handle is now symbol_exprt. Any further thoughts on this choice would be appreciated.
| const java_class_typet::java_lambda_method_handlet &lambda_method_handle = | ||
| lambda_method_handles.at( | ||
| std::stoi(id2string(code_type.get(ID_java_lambda_method_handle_id)))); | ||
| const symbolt &lambda_method_symbol = symbol_table.lookup_ref |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I can build directly on this - could this be moved into a method on/taking lambda_method_handles that gives me java_lambda_method_handlet and lambda_method_symbol for a given index (so for example I'm protected from changes in exactly how this works: type().subtype().get(ID_identifier))
df3085b to
061a72c
Compare
|
This is now ready for (re)review. |
|
@thk123 @thomasspriggs @smowton Can I please get two reviews for this? |
thk123
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks very good, some suggestions but I don't think anything blocking or major.
src/util/irep_ids.def
Outdated
| IREP_ID_TWO(generic_types, #generic_types) | ||
| IREP_ID_TWO(implicit_generic_types, #implicit_generic_types) | ||
| IREP_ID_TWO(type_variables, #type_variables) | ||
| IREP_ID_TWO(java_lambda_method_handle_index, #lambda_method_handle_index) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments by convention are called ID_C_..., though I wonder if this really should be a comment, if two invokedynamics refer to two different method handles they probably should be not-equal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This index is stored in the invoke dynamic instruction together with it's type which is of the form Execute:(<arguments>)<lambda's_interface_name> which I think should be just as unique as the index. So there's probably no harm in putting the index in comment but I'm not entirely sure about it. What would be the benefit of having it as a comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't clear - it currently is a comment (since the name starts with a # ), I agree with you - it should not be a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see, will change it.
|
|
||
| const optionalt<symbolt> &lambda_method_symbol = get_lambda_method_symbol( | ||
| lambda_method_handles, | ||
| code_type.get(ID_java_lambda_method_handle_index)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably better to use get_int here and have get_lambda_method_symbol take an int as this handles the conversion for you.
| }); | ||
|
|
||
| REQUIRE(num_matches == 1); | ||
| INFO("Entry found"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
INFO is only printed when a require fails so I don't know if this is actually useful since it will only appear when the wrong entry is found? (e.g. count_if > == 1 but find_if == false)
| const auto num_matches = std::count_if( | ||
| parsed_class.lambda_method_handle_map.begin(), | ||
| parsed_class.lambda_method_handle_map.end(), | ||
| [&method_type, &lambda_method_ref](const lambda_method_entryt &entry) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps store this predicate as a is_match or remove the second loop by using copy_if and checking there is exactly one (bit clunky I know)
| ? "StaticLambdas.lambda$1:(ILjava/lang/Object;LDummyGeneric;)V" | ||
| : "StaticLambdas.lambda$static$1:(ILjava/lang/" | ||
| "Object;LDummyGeneric;)V"; | ||
| const std::string method_type = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you pull the string up a line you can suffix it to both method name variants to reduce duplication/stop the lines wrapping making it harder to follow what is going on:
const std::string lambda_method_ref =
compiler == "eclipse"
? "StaticLambdas.lambda$1:" + method_type
: "StaticLambdas.lambda$static$1:" + method_type;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also why change from descriptor to method_type, is descriptor not more precise?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The names for both variables come from the names of the fields of lambda_method_handlet which (mostly) come from the bytecode specification. The thing with using method_type in declaration of lambda_method_ref would only work if the lambda does not capture anything so it would only work for some of the examples in the tests. I would prefer to keep it consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also the reason why I introduced the lambda_method_ref as a field, because it is unambiguous unlike method_type.
|
|
||
| #include <java_bytecode/java_bytecode_parse_tree.h> | ||
| #include <java_bytecode/java_types.h> | ||
| #include <testing-utils/run_test_with_compilers.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sweet one less thing for me to do for my PR 🎉
unit/testing-utils/require_type.cpp
Outdated
| return generic_base_type; | ||
| } | ||
|
|
||
| /// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing doc line
| #include <testing-utils/run_test_with_compilers.h> | ||
| #include <testing-utils/require_type.h> | ||
|
|
||
| SCENARIO( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These cover checking that the method handle table ends up in the symbol table, it would be great to have tests that check the indexs for the instruction end up in the instruction, but maybe this is better covered by the next step when we've actually converted that instruction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think it will be easier in the next step. If you would prefer to have tests at this stage, let me know.
|
|
||
| void add_lambda_method_handle(const irep_idt &identifier) | ||
| { | ||
| lambda_method_handles().emplace_back(identifier); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't appear to be quite right, due to the mix an match of move semantics and const references. My understanding is that emplace_back will move its parameter into the vector, leaving the original copy it was passed in, in an invalid/undefined state. However the type of identifier is const reference, which means we are promising not to update it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
emplace_back takes arguments to pass to the constructor of the vector type (in this case const irep_idt &identifier as an input to symbol_exprt constructor):
http://en.cppreference.com/w/cpp/container/vector/emplace_back
I don't think the identifier is left in an invalid/undefined state, maybe the original copy of symbol_exprt that is moved to the vector? Although the above source says that the new element is constructed 'in-place at the location provided by the container'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, thank you for explaining how this works, I was wondering where the conversion from irep_idt to symbol_exprt was happening. I had previously been reading about && as part of move semantics. Why is the type of the identifier parameter not irep_idt &&identifier to match the type needed by emplace_back?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
emplace_back and std::move are different things, with similar aims. emplace_back just tries to avoid constructing a temporary object to pass as an argument, copy-constructing an object to go in the collection, and then destructing the original temporary. It can only be used to create a new object, but it does it directly in the place it's needed.
| /// unknown type | ||
| optionalt<symbolt> java_bytecode_convert_methodt::get_lambda_method_symbol( | ||
| const java_class_typet::java_lambda_method_handlest &lambda_method_handles, | ||
| const irep_idt &index) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this rather be an integer (size_t)?
| // any symbol (it is a symbol expression with empty identifier) | ||
| if(!lambda_method_handle.get_identifier().empty()) | ||
| return symbol_table.lookup_ref(lambda_method_handle.get_identifier()); | ||
| else |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
else could be omitted
| /// \param descriptor: the descriptor the lambda method should have | ||
| /// \param entry_index: the number to skip (i.e. if multiple entries with the | ||
| /// same descriptor | ||
| /// \param lambda_method_ref: the reference/descriptor of the lambda method |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Side comment: I think we need a coding guideline on whether there should be a colon after the param name or not (both versions co-exist in this PR). I see the advantage of using a colon to improve readability in the source code, however, doxygen lays it out as follows: http://cprover.diffblue.com/invariant_8h.html#a66872166e18e5b1c3a087f532078b803
Ie, it seems that the colon is part of the description, which may look a bit weird.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I created a PR to propose this in our coding standard #1959
8ad281a to
fab5952
Compare
|
@thk123 @peterschrammel Thank you, the changes according to reviews are in the commits with 'cont' at the end and they will be squashed into the commits with the name before 'cont.' before merging. I rebased on top of develop so all commits are shown after the reviews though. |
2e250fe to
f7f7901
Compare
| @@ -0,0 +1,262 @@ | |||
| /*******************************************************************\ | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file does not appear to be in the Makefiles
9144912 to
3d6854e
Compare
When parsing invokedynamic instructions in pool table, store the index of the corresponding lambda method handle (bootstrap method) in the bootstrap methods attribute table
The method ref can differ from the interface_type and method_type when, e.g., local variables are used in the lambda method.
To test for a new field lamba_method_ref for a lambda method handle
Keeping the data members const deletes the default copy assignment operator and causes the existing unit tests crash for Microsoft Visual Studio.
5e0f38a to
212da75
Compare
|
@thk123 @peterschrammel Can I please get a re-review? CI should be passing now. |
| REQUIRE(matching_lambda_entry != parsed_class.lambda_method_handle_map.end()); | ||
|
|
||
| return matching_lambda_entry->second; | ||
| INFO("Number of matching lambda method entries: " << matches.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't this already be printed if the matches.size() != 1 anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is true, but in a less readable way.
peterschrammel
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm
| /// \param lambda_method_handles Vector of lambda method handles (bootstrap | ||
| /// methods) of the class where the lambda is called | ||
| /// \param index Index of the lambda method handle in the vector | ||
| /// \return Symbol of the lambda method if the method handle does not have an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"... does not have an unknown ..." maybe better "... have a known ..." ?
| /// unknown type | ||
| optionalt<symbolt> java_bytecode_convert_methodt::get_lambda_method_symbol( | ||
| const java_class_typet::java_lambda_method_handlest &lambda_method_handles, | ||
| const size_t &index) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
index should be passed by value
| // "new" when it is a class variable, instantiated in <init> | ||
| lambda_method_handle.lambda_method_name = | ||
| name_and_type.get_name(pool_entry_lambda); | ||
| lambda_method_handle.lambda_method_ref = method_ref; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could be a std::move
| } | ||
| void add_unknown_lambda_method_handle() | ||
| { | ||
| lambda_method_handles().emplace_back(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you please add a comment that explains why this works? I imagine it is because UNKNOWN is the default
|
Merging to avoid CI round trips, @svorenova has agreed to apply these non-blocking changes in a separate commit. |
This passes the parsed bootstrap methods information to a class symbol and subsequently to method conversion. As a result, when
invokedynamicinstruction is to be converted, we have all necessary information to look up the corresponding lambda method.