Add conversion for rol and ror operators to c output #6253

TGWDB · 2021-07-26T09:41:44Z

This adds conversion for the rol and ror operators from CBMC
internals into C code. This is done by bit-twiddling the values
to achieve the outcome.

Fixes #6241

Each commit message has a non-empty body, explaining why the change was made.
Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
My commit message includes data points confirming performance improvements (if claimed).
My PR is restricted to a single feature or bugfix.
White-space or formatting changes outside the feature-related changed lines are in commits of their own.

codecov · 2021-07-26T10:59:44Z

Codecov Report

Merging #6253 (4763289) into develop (3a2de48) will increase coverage by 0.01%.
The diff coverage is 96.22%.

@@             Coverage Diff             @@
##           develop    #6253      +/-   ##
===========================================
+ Coverage    76.16%   76.18%   +0.01%     
===========================================
  Files         1484     1485       +1     
  Lines       162164   162217      +53     
===========================================
+ Hits        123516   123579      +63     
+ Misses       38648    38638      -10

Impacted Files	Coverage Δ
src/ansi-c/expr2c_class.h	`100.00% <ø> (ø)`
src/ansi-c/expr2c.cpp	`65.32% <92.00%> (+0.37%)`	⬆️
unit/ansi-c/expr2c.cpp	`100.00% <100.00%> (ø)`
.../goto-instrument/goto_instrument_parse_options.cpp	`68.25% <0.00%> (+0.23%)`	⬆️
src/util/config.cpp	`56.29% <0.00%> (+0.86%)`	⬆️
src/util/message.cpp	`83.92% <0.00%> (+5.35%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 932d41d...4763289.

martin-cs · 2021-07-26T21:28:14Z

src/ansi-c/expr2c.cpp


+  else if(src.id() == ID_rol)
+  {
+    // AAAA rol n == (AAAA << n) | (AAAA >> (width(AAAA) - n))


What if n is greater than bit width?

Also note the cases when shift can give undefined behaviour.

This is done with modulo now.
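
The modulo-based form mentioned above can be sketched in plain C++ for a fixed 32-bit width. This is an illustrative sketch only, not the code that expr2c emits: the names rotate_left_32 and rotate_right_32 are hypothetical, and the extra "% width" on the complementary shift is an assumption added here so that a distance that is a multiple of the width never turns into a shift by the full width, which is undefined behaviour in C and C++.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration; not part of CBMC.
constexpr unsigned width = 32;

// value rol n == (value << n % width) | (value >> (width - n % width))
std::uint32_t rotate_left_32(std::uint32_t value, unsigned n)
{
  const unsigned distance = n % width;
  // The second "% width" maps a complementary distance of 32 back to 0,
  // because shifting a 32-bit value by 32 is undefined behaviour.
  return (value << distance) | (value >> ((width - distance) % width));
}

// value ror n == (value >> n % width) | (value << (width - n % width))
std::uint32_t rotate_right_32(std::uint32_t value, unsigned n)
{
  const unsigned distance = n % width;
  return (value >> distance) | (value << ((width - distance) % width));
}
```

With this shape, a distance of 36 behaves like a distance of 4, and a distance of 0 or 32 returns the value unchanged.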

thomasspriggs

It is good that we have a fix for the issue and that we have both regression and unit tests. However, I think there are some issues which should be addressed for the sake of maintainability. The issues which I think really need to be addressed before we merge are marked with 🚫. Hopefully, having the tests will make it easy to check that the refactorings are sound as you address the issues.

thomasspriggs · 2021-07-27T15:22:21Z

src/ansi-c/expr2c.cpp

+  }
+  else if(src.id() == ID_ror)
+  {
+    // AAAA rol n ==   (AAAA >> n % width(AAAA))


🚫 This duplicated code will make maintenance more difficult. I suggest deduplicating it by parameterising the new function I suggested in my previous comment.
💡 I suggest creating a single use left_or_rightt enum class with left and right values to use as the type of the left_or_right parameter of the new function. This could alternatively be done without using a new enum with something like bool is_right, with true = right and false = left. But I think the enum would make for more self explanatory code.

I now consider this to be sorted.

thomasspriggs · 2021-07-27T15:26:33Z

src/ansi-c/expr2c.cpp

+    {
+      // Shifts in C are arithmetic and care about sign, by forcing
+      // a cast to unsigned we force the shifts to perform rol/r
+      size_t ws = op0.type().get_size_t("width");


⛏️ This and all other variables in this new code which are not mutated should be declared const.

🚫 Use bitvector_typet::get_width() instead of get_size_t("width"). By using get_size_t you are directly accessing the underlying data structure of the type. Fixing the accessibility to underlying irept would be a non-trivial refactor, but we should still use the well defined external interface where possible.

⛏️ What is ws supposed to be short for? Reminder - the coding standards state that we should "Avoid abbreviations".

This looks to be sorted, with the exception of the unreachable code.
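
The comment in the hunk above says the operand is cast to unsigned before shifting. A minimal sketch of why that matters, assuming a 32-bit operand; the function name is hypothetical and this is not the code that expr2c generates:

```cpp
#include <cstdint>

// Hypothetical helper for illustration; not part of CBMC.
// Rotates the bit pattern of a signed 32-bit value right by n bits.
std::uint32_t rotate_right_signed_bits(std::int32_t value, unsigned n)
{
  // Converting to unsigned keeps the two's-complement bit pattern and makes
  // ">>" a logical shift, so zeros rather than copies of the sign bit are
  // shifted in from the left.
  const std::uint32_t bits = static_cast<std::uint32_t>(value);
  const unsigned distance = n % 32;
  // "% 32" on the complementary distance avoids a shift by the full width.
  return (bits >> distance) | (bits << ((32 - distance) % 32));
}
```

For the operand -2 (bit pattern 0xFFFFFFFE), a signed shift right by 1 typically yields -1, whereas the rotation computed through the unsigned cast is 0x7FFFFFFF.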

thomasspriggs · 2021-07-27T15:28:43Z

src/ansi-c/expr2c.cpp

+  {
+    // AAAA rol n ==   (AAAA << n % width(AAAA))
+    //               | (AAAA >> (width(AAAA) - n % width(AAAA)))
+    auto bin_expr = to_binary_expr(src);


⛏️ I suggest renaming bin_expr to rotate_expr, because at this point bin_expr can only be a rotate expression, even if the type only specifies that it is a binary_exprt.

⛏️ I suggest swapping out to_binary_expr(src) with expr_checked_cast<shift_exprt>(src). That way we are using a more specific type of expression and we are using the checks which are defined alongside that expression.

thomasspriggs · 2021-07-27T15:33:24Z

src/ansi-c/expr2c.cpp

+    }
+    // Get n because we need its type to construct the "width(AAAA)"
+    // value to use in operations below.
+    exprt &op1 = bin_expr.op1();


⛏️ I suggest renaming op1 to distance.

⛏️ Use .distance() of shift_exprt instead of .op1(), because this is the specifically named getter for this data member.

thomasspriggs · 2021-07-27T15:37:17Z

src/ansi-c/expr2c.cpp

+    auto bin_expr = to_binary_expr(src);
+    // Get AAAA and if it's signed wrap it in a cast to remove
+    // the sign since this messes with C shifts
+    exprt &op0 = bin_expr.op0();


⛏️ Use .op() of shift_exprt rather than op0(), because this is the specifically named getter for this data member.

thomasspriggs · 2021-07-27T16:01:03Z

src/ansi-c/expr2c.cpp

+    // value to use in operations below.
+    exprt &op1 = bin_expr.op1();
+    // Construct the "width(AAAA)" constant
+    irep_idt width_val = op0.type().get("width");


⛏️ The width is extracted from op0 twice. It should be possible to move the earlier extraction out of the if statement and reuse it. We can expect the type to be a bitvector_typet whether or not it is signed, so extracting the width can be: const size_t width = type_checked_cast<bitvector_typet>(op0.type()).get_width();

thomasspriggs · 2021-07-27T16:10:30Z

src/ansi-c/expr2c.cpp

+    auto width = constant_exprt(width_val, op1.type());
+    // Apply modulo to n since shifting will overflow
+    // That is: 0001 << 4 == 0, but 0001 rol 4 == 0001
+    op1 = mod_exprt(op1, width);


⛏️ I suggest creating a new variable named distance_modulo_width instead, in order to better specify what this is and in order to avoid mutating.
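
The example in the hunk's comment (0001 << 4 == 0, but 0001 rol 4 == 0001) can be reproduced with a small sketch that models a narrow bitvector inside a uint32_t. The function names and the masking approach are assumptions made for illustration; they are not taken from the PR.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration; not part of CBMC.
// Models a bitvector of `width` bits (1 <= width <= 31) in a uint32_t.
std::uint32_t mask_for(unsigned width)
{
  assert(width >= 1 && width <= 31);
  return (std::uint32_t{1} << width) - 1;
}

// Plain left shift truncated to `width` bits: bits shifted out are lost.
// The caller must keep n below 32.
std::uint32_t shift_left_narrow(std::uint32_t value, unsigned n, unsigned width)
{
  return (value << n) & mask_for(width);
}

// Rotate left: the distance is reduced modulo the width first, so bits that
// leave on the left re-enter on the right.
std::uint32_t rotate_left_narrow(std::uint32_t value, unsigned n, unsigned width)
{
  const std::uint32_t mask = mask_for(width);
  const unsigned distance_modulo_width = n % width;
  value &= mask;
  return ((value << distance_modulo_width) |
          (value >> (width - distance_modulo_width))) & mask;
}
```

Here shift_left_narrow(0x1, 4, 4) gives 0, while rotate_left_narrow(0x1, 4, 4) gives 0x1, matching the comment in the hunk.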

chrisr-diffblue

Mostly looks OK, though I think @thomasspriggs has some blocking review comments that it would be good to resolve. Also, how cross-platform/cross-architecture/cross-endian is this work? In particular, are we confident this works on, say, the ARM architecture, where endianness is switchable between big and little? I think cbmc has --little-endian and --big-endian options.

chrisr-diffblue · 2021-07-30T13:35:03Z

src/ansi-c/expr2c.cpp

+    mod_exprt(rotate_expr.distance(), width_expr);
+  // Now put the pieces together
+  // width(AAAA) - (n % width(AAAA))
+  const auto complement_expr = minus_exprt(width_expr, distance_modulo_width);


I don't think this is actually the complement expression; is it the complement width expression?

chrisr-diffblue · 2021-07-30T13:40:41Z

unit/ansi-c/expr2c.cpp

+
+TEST_CASE("rol_2_c_conversion_unsigned", "[core][ansi-c][expr2c]")
+{
+  auto lhs = from_integer(31, unsignedbv_typet(32));


Personal preference here :-) But for tests used for bit-twiddling operations, I'd be rather tempted to use hex representations of the various immediate constants in tests like this. Might just be my background, but I find that easier to mentally model, rather than also adding a decimal->hex->binary mental translation :-)
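
To illustrate the readability point, here is a hypothetical sketch (not taken from the PR's unit tests) that writes the same operand in decimal and in hex. The rotation distance of 4 and the helper name are assumptions.

```cpp
#include <cstdint>

// Hypothetical helper for illustration; not part of CBMC or its unit tests.
constexpr std::uint32_t rotate_left(std::uint32_t value, unsigned n)
{
  return (value << (n % 32)) | (value >> ((32 - n % 32) % 32));
}

// 31 and 0x1F are the same value, but the hex form shows the five set bits.
static_assert(31 == 0x0000001F, "same operand");
// In hex, rotating by 4 visibly moves the operand by one digit.
static_assert(rotate_left(0x0000001F, 4) == 0x000001F0, "rotated by a nibble");
// The decimal spelling of the same check is harder to read at a glance.
static_assert(rotate_left(31, 4) == 496, "same check in decimal");
```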

chrisr-diffblue · 2021-07-30T13:41:38Z

unit/ansi-c/expr2c.cpp

+  "[core][ansi-c][expr2c][convert_with_precedence]")
+{
+  config.ansi_c.mode = configt::ansi_ct::flavourt::GCC;
+  config.ansi_c.set_arch_spec_i386();


Might be worth a comment explaining why this is necessary?

tautschnig · 2021-08-02T15:00:49Z

src/ansi-c/expr2c.cpp

+  //               | (AAAA rhs_op (width(AAAA) - n % width(AAAA)))
+  // Where lhs_op and rhs_op depend on whether it is rol or ror
+  // auto rotate_expr = to_binary_expr(src);
+  auto rotate_expr = expr_checked_cast<shift_exprt>(src);


In keeping with other functions I'd prefer for convert_rox to take a shift_exprt as first argument and to_shift_expr being used at the call site.

tautschnig · 2021-08-02T15:04:48Z

src/ansi-c/expr2c.cpp

+    // Type that can't be represented as bitvector
+    // get some kind of width in a potentially unsafe way
+    type_width = op0.type().get_size_t("width");


I think you should just give up in that else case. You might well get back a nil irep, which will make the from_integer fail. You're trying to be very kind, but I don't think you're doing the user any favour.

Agreed. I'd just put an UNREACHABLE here. I think the comments from codecov suggest that it is in fact unreachable with the way it is currently used.

thomasspriggs

My blocking comments have been addressed. Therefore I am going to approve this PR.

thomasspriggs · 2021-08-02T19:24:35Z

src/ansi-c/expr2c.cpp

-    // AAAA << complement_expr
-    rhs_expr = shl_exprt(op0, complement_expr);
+    // AAAA << complement_width_expr
+    rhs_expr = shl_exprt(op0, complement_width_expr);


⛏️ These changes appear to have been squashed into the wrong commit.

TGWDB added the bugfix and aws (Bugs or features of importance to AWS CBMC users) labels Jul 26, 2021

TGWDB force-pushed the Output_c_rol_ror branch from fbabfd2 to 313ec2d Compare July 26, 2021 10:53

martin-cs reviewed Jul 26, 2021

TGWDB force-pushed the Output_c_rol_ror branch 5 times, most recently from 51f1e45 to 8d70761 Compare July 27, 2021 11:00

TGWDB marked this pull request as ready for review July 27, 2021 11:01

TGWDB requested review from chrisr-diffblue, kroening and tautschnig as code owners July 27, 2021 11:01

thomasspriggs suggested changes Jul 27, 2021

TGWDB added 5 commits July 27, 2021 21:20

Add conversion for rol and ror operators to c output

c750f5e

This adds conversion for the rol and ror operators from CBMC internals into C code. This is done by bit-twiddling the values to achieve the outcome. Fixes diffblue#6241

Add regression tests for rol/ror

9a26a63

This adds regression tests for rol/ror using symtab2gb and goto-instrument.

Add unit tests for expr2c for rol/ror

e690c5f

Adds unit tests for expr2c of the rol and ror operators, testing both signed and unsigned conversions.

Tidy up and address comments

f85b0ec

Tidies up the style and addresses reviewer comments:
- use to_shift_exprt before call to convert_rox
- remove "kind" path to try and guess bit width

Clang format

4763289

TGWDB force-pushed the Output_c_rol_ror branch from 8d70761 to 1e79dd9 Compare July 28, 2021 09:35

chrisr-diffblue approved these changes Jul 30, 2021

TGWDB force-pushed the Output_c_rol_ror branch 2 times, most recently from 21ef742 to 2e1b6ac Compare August 2, 2021 08:46

TGWDB requested a review from peterschrammel as a code owner August 2, 2021 08:46

TGWDB force-pushed the Output_c_rol_ror branch 2 times, most recently from 227e044 to cca8198 Compare August 2, 2021 10:37

tautschnig approved these changes Aug 2, 2021

tautschnig assigned TGWDB Aug 2, 2021

thomasspriggs approved these changes Aug 2, 2021

thomasspriggs reviewed Aug 2, 2021

TGWDB force-pushed the Output_c_rol_ror branch from cca8198 to f85b0ec Compare August 3, 2021 08:37

TGWDB mentioned this pull request Aug 3, 2021

Dump c var overlap #6257

Merged

7 tasks

TGWDB merged commit b249ac4 into diffblue:develop Aug 3, 2021
