Skip to content

Conversation

@majakusber
Copy link

This passes the parsed bootstrap methods information to a class symbol and subsequently to method conversion. As a result, when invokedynamic instruction is to be converted, we have all necessary information to look up the corresponding lambda method.

@majakusber
Copy link
Author

Unit tests will follow in the next few days.


const java_class_typet::java_lambda_method_handlet &lambda_method_handle =
lambda_method_handles.at(
std::stoi(id2string(code_type.get(ID_java_lambda_method_handle_id))));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

index rather than id since I think that more strongly suggests it is a zero indexed counter rather than some unique identifier.

return set(ID_access, access);
}

class java_lambda_method_handlet : public exprt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noting my reservations about inheriting from exprt here (my preference would be irept since it isn't a node in an AST but we still need irep functionality to store it in a class_typet). However interested in seeing other perspectives on this

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After an offline discussion, a reasonable solution seems to be holding the method handle as a symbol_exprt (which only has an identifier of the referred method).

// now do lambda method handles (bootstrap methods)
for(const auto &lambda_entry : c.lambda_method_handle_map)
{
const reference_typet lambda_ref = java_reference_type(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I think about it a bit more, seems odd that this is a reference_typet, why not just directly store either:

  • symbol_typet
  • symbol_exprt of type the type of the method (e.g. a code_typet)
  • just an irep_idt as the identifier

Again interested in other opinions on this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason why I opted for a reference here is that method handles are defined as references to methods: https://docs.oracle.com/javase/8/docs/api/java/lang/invoke/MethodHandle.html
I'm happy to change it to any of the suggested types if that would make it semantically clearer or more fitting to the CBMC logic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting - I think you're right that my suggestions aren't quite right.

However, I think what you have might be a bit of a mismatch. It seems like the type of the method handle is a java_reference_type(code_typet{params and returns}) (e.g. kind of like Predicate<Integer> and the value is a address_of_exprt(symbol_exprt{method_name})) (e.g. Integer::equals).

I don't think it will cause problems, but for example I would expect if I look up a symbol_typet in the symbol table, then what I'd get back would have is_type set to true, wheras in this case it will be is_function() that returns true.

Not sure - thinking in comments - do you have any thoughts on this @mgudemann ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After an offline discussion with @thk123 the type of a method handle is now symbol_exprt. Any further thoughts on this choice would be appreciated.

const java_class_typet::java_lambda_method_handlet &lambda_method_handle =
lambda_method_handles.at(
std::stoi(id2string(code_type.get(ID_java_lambda_method_handle_id))));
const symbolt &lambda_method_symbol = symbol_table.lookup_ref
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I can build directly on this - could this be moved into a method on/taking lambda_method_handles that gives me java_lambda_method_handlet and lambda_method_symbol for a given index (so for example I'm protected from changes in exactly how this works: type().subtype().get(ID_identifier))

@majakusber
Copy link
Author

This is now ready for (re)review.

@majakusber
Copy link
Author

@thk123 @thomasspriggs @smowton Can I please get two reviews for this?

Copy link
Contributor

@thk123 thk123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very good, some suggestions but I don't think anything blocking or major.

IREP_ID_TWO(generic_types, #generic_types)
IREP_ID_TWO(implicit_generic_types, #implicit_generic_types)
IREP_ID_TWO(type_variables, #type_variables)
IREP_ID_TWO(java_lambda_method_handle_index, #lambda_method_handle_index)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments by convention are called ID_C_..., though I wonder if this really should be a comment, if two invokedynamics refer to two different method handles they probably should be not-equal

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This index is stored in the invoke dynamic instruction together with it's type which is of the form Execute:(<arguments>)<lambda's_interface_name> which I think should be just as unique as the index. So there's probably no harm in putting the index in comment but I'm not entirely sure about it. What would be the benefit of having it as a comment?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't clear - it currently is a comment (since the name starts with a # ), I agree with you - it should not be a comment.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see, will change it.


const optionalt<symbolt> &lambda_method_symbol = get_lambda_method_symbol(
lambda_method_handles,
code_type.get(ID_java_lambda_method_handle_index));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably better to use get_int here and have get_lambda_method_symbol take an int as this handles the conversion for you.

});

REQUIRE(num_matches == 1);
INFO("Entry found");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

INFO is only printed when a require fails so I don't know if this is actually useful since it will only appear when the wrong entry is found? (e.g. count_if > == 1 but find_if == false)

const auto num_matches = std::count_if(
parsed_class.lambda_method_handle_map.begin(),
parsed_class.lambda_method_handle_map.end(),
[&method_type, &lambda_method_ref](const lambda_method_entryt &entry) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps store this predicate as a is_match or remove the second loop by using copy_if and checking there is exactly one (bit clunky I know)

? "StaticLambdas.lambda$1:(ILjava/lang/Object;LDummyGeneric;)V"
: "StaticLambdas.lambda$static$1:(ILjava/lang/"
"Object;LDummyGeneric;)V";
const std::string method_type =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you pull the string up a line you can suffix it to both method name variants to reduce duplication/stop the lines wrapping making it harder to follow what is going on:

const std::string lambda_method_ref =
  compiler == "eclipse"
    ? "StaticLambdas.lambda$1:" + method_type
    : "StaticLambdas.lambda$static$1:" + method_type;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also why change from descriptor to method_type, is descriptor not more precise?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The names for both variables come from the names of the fields of lambda_method_handlet which (mostly) come from the bytecode specification. The thing with using method_type in declaration of lambda_method_ref would only work if the lambda does not capture anything so it would only work for some of the examples in the tests. I would prefer to keep it consistent.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also the reason why I introduced the lambda_method_ref as a field, because it is unambiguous unlike method_type.


#include <java_bytecode/java_bytecode_parse_tree.h>
#include <java_bytecode/java_types.h>
#include <testing-utils/run_test_with_compilers.h>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sweet one less thing for me to do for my PR 🎉

return generic_base_type;
}

///
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing doc line

#include <testing-utils/run_test_with_compilers.h>
#include <testing-utils/require_type.h>

SCENARIO(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These cover checking that the method handle table ends up in the symbol table, it would be great to have tests that check the indexs for the instruction end up in the instruction, but maybe this is better covered by the next step when we've actually converted that instruction.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think it will be easier in the next step. If you would prefer to have tests at this stage, let me know.


void add_lambda_method_handle(const irep_idt &identifier)
{
lambda_method_handles().emplace_back(identifier);
Copy link
Contributor

@thomasspriggs thomasspriggs Mar 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't appear to be quite right, due to the mix an match of move semantics and const references. My understanding is that emplace_back will move its parameter into the vector, leaving the original copy it was passed in, in an invalid/undefined state. However the type of identifier is const reference, which means we are promising not to update it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

emplace_back takes arguments to pass to the constructor of the vector type (in this case const irep_idt &identifier as an input to symbol_exprt constructor):
http://en.cppreference.com/w/cpp/container/vector/emplace_back
I don't think the identifier is left in an invalid/undefined state, maybe the original copy of symbol_exprt that is moved to the vector? Although the above source says that the new element is constructed 'in-place at the location provided by the container'

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thank you for explaining how this works, I was wondering where the conversion from irep_idt to symbol_exprt was happening. I had previously been reading about && as part of move semantics. Why is the type of the identifier parameter not irep_idt &&identifier to match the type needed by emplace_back?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

emplace_back and std::move are different things, with similar aims. emplace_back just tries to avoid constructing a temporary object to pass as an argument, copy-constructing an object to go in the collection, and then destructing the original temporary. It can only be used to create a new object, but it does it directly in the place it's needed.

/// unknown type
optionalt<symbolt> java_bytecode_convert_methodt::get_lambda_method_symbol(
const java_class_typet::java_lambda_method_handlest &lambda_method_handles,
const irep_idt &index)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this rather be an integer (size_t)?

// any symbol (it is a symbol expression with empty identifier)
if(!lambda_method_handle.get_identifier().empty())
return symbol_table.lookup_ref(lambda_method_handle.get_identifier());
else
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

else could be omitted

/// \param descriptor: the descriptor the lambda method should have
/// \param entry_index: the number to skip (i.e. if multiple entries with the
/// same descriptor
/// \param lambda_method_ref: the reference/descriptor of the lambda method
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side comment: I think we need a coding guideline on whether there should be a colon after the param name or not (both versions co-exist in this PR). I see the advantage of using a colon to improve readability in the source code, however, doxygen lays it out as follows: http://cprover.diffblue.com/invariant_8h.html#a66872166e18e5b1c3a087f532078b803
Ie, it seems that the colon is part of the description, which may look a bit weird.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I created a PR to propose this in our coding standard #1959

@majakusber majakusber force-pushed the lambda_tg2711 branch 2 times, most recently from 8ad281a to fab5952 Compare March 22, 2018 15:49
@majakusber
Copy link
Author

@thk123 @peterschrammel Thank you, the changes according to reviews are in the commits with 'cont' at the end and they will be squashed into the commits with the name before 'cont.' before merging. I rebased on top of develop so all commits are shown after the reviews though.

@majakusber majakusber force-pushed the lambda_tg2711 branch 2 times, most recently from 2e250fe to f7f7901 Compare March 23, 2018 16:20
@@ -0,0 +1,262 @@
/*******************************************************************\
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file does not appear to be in the Makefiles

@majakusber majakusber force-pushed the lambda_tg2711 branch 2 times, most recently from 9144912 to 3d6854e Compare March 26, 2018 11:31
svorenova added 5 commits March 26, 2018 13:40
When parsing invokedynamic instructions in pool table, store the index of
the corresponding lambda method handle (bootstrap method) in the
bootstrap methods attribute table
The method ref can differ from the interface_type and method_type when,
e.g., local variables are used in the lambda method.
@majakusber
Copy link
Author

@thk123 @peterschrammel Can I please get a re-review? CI should be passing now.

REQUIRE(matching_lambda_entry != parsed_class.lambda_method_handle_map.end());

return matching_lambda_entry->second;
INFO("Number of matching lambda method entries: " << matches.size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this already be printed if the matches.size() != 1 anyway?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is true, but in a less readable way.

Copy link
Member

@peterschrammel peterschrammel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm

/// \param lambda_method_handles Vector of lambda method handles (bootstrap
/// methods) of the class where the lambda is called
/// \param index Index of the lambda method handle in the vector
/// \return Symbol of the lambda method if the method handle does not have an
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"... does not have an unknown ..." maybe better "... have a known ..." ?

/// unknown type
optionalt<symbolt> java_bytecode_convert_methodt::get_lambda_method_symbol(
const java_class_typet::java_lambda_method_handlest &lambda_method_handles,
const size_t &index)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

index should be passed by value

// "new" when it is a class variable, instantiated in <init>
lambda_method_handle.lambda_method_name =
name_and_type.get_name(pool_entry_lambda);
lambda_method_handle.lambda_method_ref = method_ref;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could be a std::move

}
void add_unknown_lambda_method_handle()
{
lambda_method_handles().emplace_back();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you please add a comment that explains why this works? I imagine it is because UNKNOWN is the default

@thk123
Copy link
Contributor

thk123 commented Mar 27, 2018

Merging to avoid CI round trips, @svorenova has agreed to apply these non-blocking changes in a separate commit.

@thk123 thk123 merged commit 5cbb758 into diffblue:develop Mar 27, 2018
@majakusber majakusber deleted the lambda_tg2711 branch March 27, 2018 14:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants