-
Notifications
You must be signed in to change notification settings - Fork 285
Improvement in check_axioms method of the string solver #1241
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improvement in check_axioms method of the string solver #1241
Conversation
dbd7bad to
f5a69a7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you're using std::size_t at the callsite, might as well do the same here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the line below I guess this must be *(j-1) not (*j)-1. Add parens to indicate which is intended, and note that this is a bad deref on the first iteration if it's *(j-1)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add an invariant to check the concrete_array size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename this, vector looks too much like the type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understand, this will turn e.g. { 1, x, 2, x, 3 } into { 1, 2, 2, 3, 3 } ? If so then that's not really padding (padding would make it longer). How about fill_in_vector?
Also is it right that the vector continues to have the extra default entry in the return value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename this after what it does, e.g. rewrite_array_expressions)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about concretize_array_expression?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sp literally
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Give a concrete example of the sort of change this produces (e.g. array_of(0) with 1 := 2 -> { 0, 2, 0} ?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't follow the end-of-line comment? Please elaborate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least comment on the kind of difference that can currently occur
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually the statement was wrong, it should have been index.type()==with_expr.where().type. I fixed this and now it works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not constraint -> no constraint
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This introduces n · log(n) complexity but it should not be a problem if the strings are limited in length. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I implemented a new version that works with map and should not have this problem. I changed the code to use the new version. Only one usage remains in concretize_strings but we should normally not have to use concretize_strings since it's not called when the model checked is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interprete_solver_result -> interpret_solver_result?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed to concretize_array_expression
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe not necessary to declare a new variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@LAJW Does this loop look alright? Or is there a better way of calling the function on the operands?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you document what we do in this loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
!error is not necessary here because of the precondition above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+2 is not in keeping with the comment above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we using long here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might produce a huge nesting of expressions. See if we can replace it with a fixed-size array in the case where we have a non-empty index set (which should include the last element).
allredj
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Some minor comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove added line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The else case is not clear to me. Can put asserts on the kind of object we get from the solver here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need to break after the first (.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also say here that we return a with expression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It now returns a array expression instead of with
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the returned array is a with_exprt, it should be stated as such here, instead of something of the form { 24, 24, 24, 42, 42 } which implies the function returns an ID_array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should indeed be an ID_array expression and with_exprt is expected at the input so I think the documentation correctly reflects the intention of the function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yes right!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above. Make it clear that the nature of the array doesn't change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it does change now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add more assertions on the nature of the input array?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually the input expression should be any expression, we will just replace inside it the arrays that needs to be.
I can update the documentation to make that clearer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh OK, then the function could maybe be renamed. It looks like it takes array expressions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes I suggest concretize_arrays_in_expression
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to have unit tests for that function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It certainly does
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So are we adding one? :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added one for concretize_string which should cover this function https://github.com/diffblue/cbmc/pull/1241/files#diff-da82b969a8fb1d04473386fa029dcc4fR17
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
argument should be const
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a unit test would be useful here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added unit test for concretize_array_expression with should cover this function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leftmost index_to pad -> leftmost_index_to_pad or leftmost index to pad
232de69 to
d44a061
Compare
d44a061 to
884c1ae
Compare
smowton
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there are still problems with the pad functions
| { | ||
| exprt index=arr_val.operands()[i*2]; | ||
| unsigned idx; | ||
| if(!to_unsigned_integer(to_constant_expr(index), idx)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Braces around multi-line if
| const with_exprt with_expr=to_with_expr(it); | ||
| const exprt &then_expr=with_expr.new_value(); | ||
| mp_integer index; | ||
| PRECONDITION(to_with_expr(it).where().id()==ID_constant); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is redundant (to_constant_expr already asserts this)
| /// \param expr: expression to interpret | ||
| /// \param string_max_length: maximum size of arrays to consider | ||
| /// \return the interpreted expression | ||
| exprt concretize_arrays_in_expression( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is probably a good place to use @LAJW's new tree-walking copy-minimising expression iterator: https://github.com/diffblue/cbmc/blob/test-gen-support/src/util/expr_iterator.h
Basically it walks over the tree using const iterators, then when you find an ID_index node with an ID_with child you call .mutate() to get a writable iterator for that particular subexpression, cf. using non-const operands() below which will break all sharing even when there are no changes necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've done that, it required implementing a new operation in expression iterator, because we want to skip to the next sibling once we changed an expression as it does not make sense to visit children of the old expression.
We also added unit tests for this next_sibling_or_parent operation.
| T1 last_concretized) const | ||
| /// \param initialized: set containing the indices of already concrete values | ||
| template <typename T> | ||
| void fill_in_vector( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of repeated .find calls here, use an explicit iterator and *std::prev(it) to get the previous initialized index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed that now, I was thinking the set was unordered but it is in fact
| const std::size_t leftmost_index_to_pad= | ||
| pair!=initial_map.rend()?pair->first:0; | ||
| // pad down to the leftmost index to pad | ||
| for(std::size_t j=i; j+1!=leftmost_index_to_pad; j--) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
? This pads down to leftmost_index_to_pad-1, doesn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If "down to" is inclusive then it should be correct as the last where the inequality holds is leftmost_index_to_pad
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes sorry. Nontheless is inclusive correct? This means if we have adjacent map entries (e.g. 0 -> 10, 1 -> 20, 2 -> 30) then we'll write 30 -> {1, 2} then 20 -> {0, 1} then 10 -> {0}, which gives the correct result but wastefully, and in a way that seems error-prone
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes you are right the value of left_most_index_to_pad should be the next index +1, I've corrected that
| /// \return the interpreted expression | ||
| exprt concretize_arrays_in_expression(exprt expr, std::size_t string_max_length) | ||
| { | ||
| for(auto op=expr.depth_begin(); op!=expr.depth_end(); ++op) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++op must only happen if next_sibling_or_child was not called. Remove from here and below add else ++op
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
smowton
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, one comment to add
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment something like Conditon would be j>=leftmost_index_to_pad, but it's an unsigned type and leftmost_index_to_pad might be 0.
062f445 to
6b4675b
Compare
|
@LAJW can you check my changes to |
6b4675b to
e58d4da
Compare
src/util/expr_iterator.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Decent.
Replace (*this)++ with ++(*this) to avoid unnecessary copies.
I knew I should ignore the standard and just forbid the post-increment operator.
d7f4fff to
0161397
Compare
bc21958 to
a697c62
Compare
allredj
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still looks good! Just a couple of suggestions.
| if(arr_val.id()=="array-list") | ||
| { | ||
| std::set<unsigned> initialized; | ||
| std::map<std::size_t, exprt> initial_map; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to add: DATA_INVARIANT(arr_val.operands().size%2==0)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So are we adding one? :-)
| if(op->id()==ID_with && op->type().id()==ID_array) | ||
| { | ||
| op.mutate()=fill_in_array_with_expr(*op, string_max_length); | ||
| op.next_sibling_or_parent(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was quite suspicious about that thing but I now believe it's OK on the basis that we can safely interrupt the depth-first search when we find an array, as we should not find nested arrays after the solver run (even though we don't really check that).
| std::vector<T1> &concrete_array, | ||
| std::set<T2> &initialized, | ||
| T1 last_concretized) const | ||
| /// \param initialized: set containing the indices of already concrete values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we just be modify:
/// \param initialized: set containing the indices of concrete_array at which we already have a concrete value
This makes it clear that concrete_array[*it] (below) should indeed be initialised.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I think pad_array is no longer used so it can be removed. The function fill_in_map_as_array should be used instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be concretize
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should document what this function does
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Already documented the template itself - T expr_cast template (just above that snippet).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you should make this change
bb0f902 to
b4fd43f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should go into util/expr_utils
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, indeed, but I'm unsure whether that should go with this PR or with another (we've got refactor incoming, I have to stop adding changes at some point).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should use DATA_INVARIANT... (but see below).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should throw. INVARIANT can be put in the catch section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a workaround to hide a bug in the solver.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's exactly how it worked in the original - failure was used as a filter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We cannot prevent (for now) the underlying solver to give value to arrays at indexes that have no meaning for us (<0 and >max_string_length). These values should be ignored but there is no bug here.
The clean solution would be to convert the expression to a mp_integer and then ignore it if it is negative or too big.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would it take to implement that clean solution you're talking about? Could that be put in a refactoring task? Or can it perhaps be part of this one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'd need to cherry pick some commits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if this is a genuine check (that cannot be avoided by fixing an alleged bug in the solver -- see below) I'd rather refactor this function to check_index (or similar) and make it return a boolean so that exceptions are not used as part of normal operation. Exceptions should be used for exceptional behaviour only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've got two problems: concealing errors because they're returned as error codes and lack of unified way of handling those. Returning error code without any guards is extremely error prone - it's very easy not to check for the error (there is a solution for that, but that should be another task). Also it's not always obvious whether false means success or true. I'm just trying to be consistent with std::dynamic_cast<ref> and boost::lexical_cast - they do throw on failure.
In this case throwing terminates the application, as exception is not handled. That call could be wrapped as invariant but overall result is the same. What I'd like to avoid is having 10 different versions of the "cast exprt to int" each forgetting to check different invariants and having its own set of bugs and each taking 5-10 lines of code and polluting the codebase.
I'd like to swap those true/false returns and exceptions with boost/std::optional and/or expected, but I'd prefer not to include them here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a comment that ++op; is intentionally omitted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
length
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
break after = and put the whole thing on the next line
|
@LAJW, the cosmetic changes do not touch any code created outside this PR. Please squash these changes into the respective commits in this PR that introduced the code that is being fixed. |
7610c58 to
1576d2a
Compare
f4cd015 to
90e7dbb
Compare
Cbmc can potentialy run out of memory if no maximum string length is set. This happens more often with the new version of check axioms because a concretization step is made to be more precise in the check.
This allows to skip childrens of a node in iteration. Useful when substituting an expression with another on which we do not want to iterate.
This function takes a map as argument which is more appropriate for the operation to perform. This also refactors `concretize_string` and `get_array` to use fill_in_map_as_vector
This interpret the result by propagating values that are set, to the left.
This interpret them in a way that makes sense to the string solver, by propagating values to the left.
This makes the string solver more likely to come up with a solution to the formulas that are passed to it.
90e7dbb to
8f51b01
Compare
This improves the method
check_axiomsof the string solver by padding the result gotten from the solver in a similar way to what is used in theconcretizemethod.In practice this means that the solver can find more quickly strings that are correct solutions once concretized.