Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Rewrite unions such that all members are of the same size #5738

Closed
wants to merge 1 commit into from

Conversation

tautschnig
Copy link
Collaborator

@tautschnig tautschnig commented Jan 12, 2021

In the C front-end, rewrite union components with types smaller than the
union's size to anonymous structs. Each such struct contains the
original union component plus padding. Assignments to union members thus
always assign all bytes that make up the object representation of a
union. The use of an anonymous struct ensures that member accesses can
still be resolved.

As this is a change in the semantics of goto programs, the goto binary
version is incremented. rewrite_union is no longer necessary, but bugs
(hidden by rewrite_union) in handling endianness in simplify_expr_member
and convert_member surfaced and had to be fixed.

Although this is stable and works well in all the testing I have done over the last weeks [edit: in a branch that has other bugfixes, most of which can be found in pull requests already], I am marking this [RFC] as this is a substantial change that could do with input especially from @kroening @martin-cs. A very natural further step would be making configurable how we handle the behaviour left unspecified by the C standard (cf. #5704 and #5705). Performance appears to be ~10% better on some large benchmarks, but I cannot currently claim having done a precise performance evaluation.

  • Each commit message has a non-empty body, explaining why the change was made.
  • Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • n/a White-space or formatting changes outside the feature-related changed lines are in commits of their own.

zero_initializer(component.type(), value.source_location(), *this);
if(!zero.has_value())
// 6.2.6.1(7)). In practice, objects of static lifetime are fully zero
// initialized.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is part one we could make configurable to cover all behaviours permitted by the C standard.

// a.c=e
// into
// a'== { .c=e }
// as the front-end guarantees that .c will have the full size of the union
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is part two to make configurable: symex_assignt::assign_struct_member will have generated a with_exprt that selectively updates a single member (of what now is a struct with padding), keeping the rest intact. We could instead populate the rest of the struct with nondets.

@codecov
Copy link

codecov bot commented Jan 12, 2021

Codecov Report

Merging #5738 (5d95919) into develop (0dfbd1e) will increase coverage by 0.01%.
The diff coverage is 93.10%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #5738      +/-   ##
===========================================
+ Coverage    69.72%   69.73%   +0.01%     
===========================================
  Files         1242     1242              
  Lines       100897   100988      +91     
===========================================
+ Hits         70347    70426      +79     
- Misses       30550    30562      +12     
Flag Coverage Δ
cproversmt2 43.45% <68.96%> (+0.07%) ⬆️
regression 66.69% <93.10%> (+<0.01%) ⬆️
unit 32.25% <1.72%> (-0.04%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
jbmc/src/jdiff/jdiff_parse_options.cpp 68.25% <ø> (-0.25%) ⬇️
src/cbmc/cbmc_parse_options.cpp 77.62% <ø> (-0.05%) ⬇️
src/goto-diff/goto_diff_parse_options.cpp 65.97% <ø> (-0.24%) ⬇️
src/goto-symex/symex_assign.h 100.00% <ø> (ø)
src/solvers/flattening/bv_endianness_map.h 100.00% <ø> (ø)
src/ansi-c/c_typecheck_initializer.cpp 76.12% <63.63%> (-0.29%) ⬇️
src/ansi-c/c_typecheck_expr.cpp 75.54% <83.33%> (+0.06%) ⬆️
src/ansi-c/c_typecheck_type.cpp 78.78% <96.61%> (+1.69%) ⬆️
src/goto-symex/symex_assign.cpp 84.25% <100.00%> (+2.52%) ⬆️
src/pointer-analysis/value_set.cpp 77.98% <100.00%> (-0.41%) ⬇️
... and 16 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0dfbd1e...5d95919. Read the comment docs.

@martin-cs
Copy link
Collaborator

I haven't had a chance to look in detail yet but in principle this sounds like a good idea as it should simplify some of the handling of padding bits and how they are modified or not. I think this may need a major version number change though. Do you have a particular use-case in mind for this or is it a general "I think this would be simpler"?

@tautschnig
Copy link
Collaborator Author

I haven't had a chance to look in detail yet but in principle this sounds like a good idea as it should simplify some of the handling of padding bits and how they are modified or not. I think this may need a major version number change though. Do you have a particular use-case in mind for this or is it a general "I think this would be simpler"?

There probably isn't a need for doing in-depth looks, I'm happy to discuss the high-level idea first. What I have done is bumping the goto-program version, because previously compiled programs wouldn't have seen the changes now done in the front-end.

Why am I proposing to make this change? union expressions are a major source of byte_extract expressions being introduced, which are very general and ensure we get the semantics right, but often block constant propagation. It is my understanding that verification tasks involving unions are deemed performance bottlenecks for this very reason. I had also thought we could thereby get a grip on unsoundness when involving pointers, but that turned out to be much simpler (cf. #5713).

@martin-cs
Copy link
Collaborator

Fair point; byte_extract and the like are cases where the memory typing model breaks and for performance for anything it's best if we avoid them. I can see how it could help during symex.

I guess one question is how common are the cases where this actually helps. It seems there are two (kinds?) of use cases for unions, one where you actually want type-unsafe access because you want to manipulate the representation of some data in a way that is not normally possible and one where you have a some kind of enum or discriminant that fixes which view of the union you will take. It seems that only the later would benefit from this.

I presume during symex you are tracking which 'view' of the union is being used and then only using byte_extract when changing view? If so then unions should have the same 'cost' as a struct.

@tautschnig
Copy link
Collaborator Author

[...]

I presume during symex you are tracking which 'view' of the union is being used and then only using byte_extract when changing view? If so then unions should have the same 'cost' as a struct.

symex treats the entire union as one object, but WITH expressions will effectively yield that concept of a "view."

In the C front-end, rewrite union components with types smaller than the
union's size to anonymous structs. Each such struct contains the
original union component plus padding. Assignments to union members thus
always assign all bytes that make up the object representation of a
union. The use of an anonymous struct ensures that member accesses can
still be resolved.

As this is a change in the semantics of goto programs, the goto binary
version is incremented. rewrite_union is no longer necessary, but bugs
(hidden by rewrite_union) in handling endianness in simplify_expr_member
and convert_member surfaced and had to be fixed.
@tautschnig
Copy link
Collaborator Author

Closing in favour of #7230.

@tautschnig tautschnig closed this Oct 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants