Skip to content

Conversation

@TGWDB
Copy link
Contributor

@TGWDB TGWDB commented Jul 26, 2021

This adds conversion for the rol and ror operators from CBMC
internals into C code. This is done by bit-twiddling the values
to achieve the outcome.

Fixes #6241

  • Each commit message has a non-empty body, explaining why the change was made.
  • Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • White-space or formatting changes outside the feature-related changed lines are in commits of their own.

@TGWDB TGWDB added bugfix aws Bugs or features of importance to AWS CBMC users labels Jul 26, 2021
@TGWDB TGWDB force-pushed the Output_c_rol_ror branch from fbabfd2 to 313ec2d Compare July 26, 2021 10:53
@codecov
Copy link

codecov bot commented Jul 26, 2021

Codecov Report

Merging #6253 (4763289) into develop (3a2de48) will increase coverage by 0.01%.
The diff coverage is 96.22%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #6253      +/-   ##
===========================================
+ Coverage    76.16%   76.18%   +0.01%     
===========================================
  Files         1484     1485       +1     
  Lines       162164   162217      +53     
===========================================
+ Hits        123516   123579      +63     
+ Misses       38648    38638      -10     
Impacted Files Coverage Δ
src/ansi-c/expr2c_class.h 100.00% <ø> (ø)
src/ansi-c/expr2c.cpp 65.32% <92.00%> (+0.37%) ⬆️
unit/ansi-c/expr2c.cpp 100.00% <100.00%> (ø)
.../goto-instrument/goto_instrument_parse_options.cpp 68.25% <0.00%> (+0.23%) ⬆️
src/util/config.cpp 56.29% <0.00%> (+0.86%) ⬆️
src/util/message.cpp 83.92% <0.00%> (+5.35%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 932d41d...4763289. Read the comment docs.


else if(src.id() == ID_rol)
{
// AAAA rol n == (AAAA << n) | (AAAA >> (width(AAAA) - n))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if n is greater than bit width?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also note the cases when shift can give undefined behaviour.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is done with modulo now.

@TGWDB TGWDB force-pushed the Output_c_rol_ror branch 5 times, most recently from 51f1e45 to 8d70761 Compare July 27, 2021 11:00
@TGWDB TGWDB marked this pull request as ready for review July 27, 2021 11:01
Copy link
Contributor

@thomasspriggs thomasspriggs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is good that we have a fix for the issue and that we have both regression and unit tests. However I think there are some issues which should be addressed for the sake of maintainability. The issues which I think really need to be addressed before we merge are marked with 🚫 Hopefully having the tests should make it easy to check the refactorings are sound as you address the issues.

}
else if(src.id() == ID_ror)
{
// AAAA rol n == (AAAA >> n % width(AAAA))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚫 This duplicated code will make maintenance more difficult. I suggest deduplicating it by parameterising the new function I suggested in my previous comment.
💡 I suggest creating a single use left_or_rightt enum class with left and right values to use as the type of the left_or_right parameter of the new function. This could alternatively be done without using a new enum with something like bool is_right, with true = right and false = left. But I think the enum would make for more self explanatory code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I now consider this to be sorted.

{
// Shifts in C are arithmetic and care about sign, by forcing
// a cast to unsigned we force the shifts to perform rol/r
size_t ws = op0.type().get_size_t("width");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⛏️ This and all other variables in this new code which are not mutated should be declared const.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚫 Use bitvector_typet::get_width() instead of get_size_t("width"). By using get_size_t you are directly accessing the underlying data structure of the type. Fixing the accessibility to underlying irept would be a non-trivial refactor, but we should still use the well defined external interface where possible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⛏️ What is ws supposed to be short for? Reminder - the coding standards state that we should "Avoid abbreviations".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks to be sorted, with the exception of the unreachable code.

{
// AAAA rol n == (AAAA << n % width(AAAA))
// | (AAAA >> (width(AAAA) - n % width(AAAA)))
auto bin_expr = to_binary_expr(src);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⛏️ I suggest renaming bin_expr to rotate_expr, because at this point bin_expr can only be a rotate expression, even if the type only specifies that it is a binary_exprt.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⛏️ I suggest swapping out to_binary_expr(src) with expr_checked_cast<shift_exprt>(src). That way we are using a more specific type of expression and we are using the checks which are defined alongside that expression.

}
// Get n because we need its type to construct the "width(AAAA)"
// value to use in operations below.
exprt &op1 = bin_expr.op1();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⛏️ I suggest renaming op1 to distance.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⛏️ Use .distance() of shift_exprt instead of .op1(), because this is the specifically named getter for this data member.

auto bin_expr = to_binary_expr(src);
// Get AAAA and if it's signed wrap it in a cast to remove
// the sign since this messes with C shifts
exprt &op0 = bin_expr.op0();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⛏️ Use .op() of shift_exprt rather than op0(), because this is the specifically named getter for this data member.

// value to use in operations below.
exprt &op1 = bin_expr.op1();
// Construct the "width(AAAA)" constant
irep_idt width_val = op0.type().get("width");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⛏️ Duplicated extraction of getting the width from op0. It should be possible to move the previous getting of the width out of the if statement and reuse it. We can expect the type to be a bitvector_typet whether or not it is signed. So extracting the width can be - const size_t width = type_checked_cast<bitvector_typet>(op0.type()).get_width();

auto width = constant_exprt(width_val, op1.type());
// Apply modulo to n since shifting will overflow
// That is: 0001 << 4 == 0, but 0001 rol 4 == 0001
op1 = mod_exprt(op1, width);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⛏️ I suggest creating a new variable named distance_modulo_width instead, in order to better specify what this is and in order to avoid mutating.

TGWDB added 5 commits July 27, 2021 21:20
This adds conversion for the rol and ror operators from CBMC
internals into C code. This is done by bit-twiddling the values
to achieve the outcome.

Fixes diffblue#6241
This adds regression tests for rol/ror using symtba2gb and
goto-instrument.
Adds unit test for expr2c of rol and ror operators, testing both
signed and unisgned conversions.
Tidies up the style and addresses reviewer comments:
- use to_shift_exprt before call to convert_rox
- remove "kind" path to try and guess bit width
@TGWDB TGWDB force-pushed the Output_c_rol_ror branch from 8d70761 to 1e79dd9 Compare July 28, 2021 09:35
Copy link
Contributor

@chrisr-diffblue chrisr-diffblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looks OK, though I think @thomasspriggs has some blocking review comments that it would be good to resolve. Also, how cross-platform/cross-architecture/cross endian is this work? In particular, are we confident this works on, say ARM architecture where its switchable big/little endian? I think cbmc has --little-endian and --big-endian options.

mod_exprt(rotate_expr.distance(), width_expr);
// Now put the pieces together
// width(AAAA) - (n % width(AAAA))
const auto complement_expr = minus_exprt(width_expr, distance_modulo_width);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is actually the complement expression, is the complement width expression ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed


TEST_CASE("rol_2_c_conversion_unsigned", "[core][ansi-c][expr2c]")
{
auto lhs = from_integer(31, unsignedbv_typet(32));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personal preference here :-) But for tests used for bit-twiddling operations, I'd be rather tempted to use hex representations of the various immediate constants in tests like this. Might just be my background, but I find that easier to mentally model, rather than also adding a decimal->hex->binary mental translation :-)

"[core][ansi-c][expr2c][convert_with_precedence]")
{
config.ansi_c.mode = configt::ansi_ct::flavourt::GCC;
config.ansi_c.set_arch_spec_i386();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth a comment explaining why this is necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@TGWDB TGWDB force-pushed the Output_c_rol_ror branch 2 times, most recently from 21ef742 to 2e1b6ac Compare August 2, 2021 08:46
@TGWDB TGWDB requested a review from peterschrammel as a code owner August 2, 2021 08:46
@TGWDB TGWDB force-pushed the Output_c_rol_ror branch 2 times, most recently from 227e044 to cca8198 Compare August 2, 2021 10:37
// | (AAAA rhs_op (width(AAAA) - n % width(AAAA)))
// Where lhs_op and rhs_op depend on whether it is rol or ror
// auto rotate_expr = to_binary_expr(src);
auto rotate_expr = expr_checked_cast<shift_exprt>(src);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In keeping with other functions I'd prefer for convert_rox to take a shift_exprt as first argument and to_shift_expr being used at the call site.

Comment on lines 2319 to 2321
// Type that can't be represented as bitvector
// get some kind of width in a potentially unsafe way
type_width = op0.type().get_size_t("width");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should just give up in that else case. You might well get back a nil irep, which will make the from_integer fail. You're trying to be very kind, but I don't think you're doing the user any favour.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I'd just put an UNREACHABLE here. I think the comments from codecov suggest that it is in fact unreachable with the way it is currently used.

Copy link
Contributor

@thomasspriggs thomasspriggs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My blocking comments have been addressed. Therefore I am going to approve this PR.

// AAAA << complement_expr
rhs_expr = shl_exprt(op0, complement_expr);
// AAAA << complement_width_expr
rhs_expr = shl_exprt(op0, complement_width_expr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⛏️ These changes appear to have been squashed into the wrong commit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@TGWDB TGWDB force-pushed the Output_c_rol_ror branch from cca8198 to f85b0ec Compare August 3, 2021 08:37
@TGWDB TGWDB mentioned this pull request Aug 3, 2021
7 tasks
@TGWDB TGWDB merged commit b249ac4 into diffblue:develop Aug 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

aws Bugs or features of importance to AWS CBMC users bugfix

Projects

None yet

Development

Successfully merging this pull request may close these issues.

goto-instrument --dump-c does not propery generate for rol or ror operators.

5 participants