[RFC] Rewrite unions such that all members are of the same size #5738
zero_initializer(component.type(), value.source_location(), *this);
if(!zero.has_value())
// 6.2.6.1(7)). In practice, objects of static lifetime are fully zero
// initialized.
This is part one we could make configurable to cover all behaviours permitted by the C standard.
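To make the "in practice" remark in the code comment concrete, here is a minimal C sketch. It is not code from this PR; the names `small_first`, `global_u`, and `all_bytes_zero` are made up for illustration. It checks that a union with static storage duration and no initializer has an all-zero object representation, which is what mainstream toolchains are assumed to produce here.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical example type: the first member is smaller than the union. */
union small_first
{
  char c;
  int i;
};

/* Static storage duration, no explicit initializer. */
static union small_first global_u;

/* Returns 1 if every byte of global_u's object representation is zero.
   Assumption: the toolchain zero-fills the whole object, as the comment
   under review says happens in practice. */
int all_bytes_zero(void)
{
  static const unsigned char zeros[sizeof(union small_first)];
  return memcmp(&global_u, zeros, sizeof global_u) == 0;
}
```

A configurable variant, as suggested in the comment above, would only differ in what the bytes outside `c` are allowed to be.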
// a.c=e
// into
// a'== { .c=e }
// as the front-end guarantees that .c will have the full size of the union
This is part two to make configurable: symex_assignt::assign_struct_member will have generated a with_exprt that selectively updates a single member (of what now is a struct with padding), keeping the rest intact. We could instead populate the rest of the struct with nondets.
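The two policies mentioned here can be modelled on a plain byte buffer. This is a rough sketch only, not how symex or with_exprt is implemented: `assign_member_keep` and `assign_member_havoc` are hypothetical helpers, and the `fill` byte merely stands in for a nondeterministic value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Policy A (current behaviour as described above): overwrite only the bytes
   of the assigned member and keep the remaining bytes of the object. */
void assign_member_keep(
  unsigned char *object, const void *value, size_t value_size)
{
  memcpy(object, value, value_size);
}

/* Policy B (the suggested alternative): overwrite the member bytes and
   replace every remaining byte. `fill` is a stand-in for nondet; a verifier
   would use a fresh unconstrained value instead of a constant. */
void assign_member_havoc(
  unsigned char *object, size_t object_size,
  const void *value, size_t value_size, unsigned char fill)
{
  memcpy(object, value, value_size);
  memset(object + value_size, fill, object_size - value_size);
}
```

Under policy A a later read of a wider member still sees the old trailing bytes; under policy B it cannot rely on them.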
Codecov Report
@@            Coverage Diff             @@
##           develop    #5738     +/-   ##
===========================================
+ Coverage    69.72%   69.73%   +0.01%
===========================================
  Files         1242     1242
  Lines       100897   100988     +91
===========================================
+ Hits         70347    70426     +79
- Misses       30550    30562     +12
I haven't had a chance to look in detail yet, but in principle this sounds like a good idea, as it should simplify some of the handling of padding bits and whether or not they are modified. I think this may need a major version number change, though. Do you have a particular use case in mind for this, or is it a general "I think this would be simpler"?
There probably isn't a need for an in-depth look yet; I'm happy to discuss the high-level idea first. What I have done is bump the goto-program version, because previously compiled programs wouldn't have seen the changes now made in the front-end. Why am I proposing to make this change?
Fair point; I guess one question is how common the cases are where this actually helps. It seems there are two kinds of use cases for unions: one where you actually want type-unsafe access because you want to manipulate the representation of some data in a way that is not normally possible, and one where you have some kind of enum or discriminant that fixes which view of the union you will take. It seems that only the latter would benefit from this. I presume during symex you are tracking which 'view' of the union is being used and then only using [...]
[...]
symex treats the entire union as one object, but [...]
Closing in favour of #7230.
In the C front-end, rewrite union components with types smaller than the
union's size to anonymous structs. Each such struct contains the
original union component plus padding. Assignments to union members thus
always assign all bytes that make up the object representation of a
union. The use of an anonymous struct ensures that member accesses can
still be resolved.
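As a source-level picture of the rewrite described above, the following C11 sketch writes out by hand what a union might look like before and after. It is an illustration under assumptions, not the front-end's actual output: the type names, the member name `pad`, and the exact padding layout are invented, and `sizeof(int) > 1` is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The union as a programmer would write it. */
union original
{
  char c;
  int i;
};

/* Hand-written picture of the rewritten form: the smaller component is
   wrapped in an anonymous struct together with explicit padding, so every
   component spans the whole union. `pad` is a hypothetical name. */
union rewritten
{
  struct
  {
    char c;
    unsigned char pad[sizeof(int) - sizeof(char)];
  };
  int i;
};

_Static_assert(
  sizeof(union rewritten) == sizeof(union original),
  "the rewrite must not change the size of the union");

/* Size of the wrapped component, computed from its last member. */
size_t wrapped_component_size(void)
{
  return offsetof(union rewritten, pad) +
         sizeof(((union rewritten *)0)->pad);
}

/* Member access still resolves through the anonymous struct: u.c works
   exactly as it did on the original union. */
char read_c_after_write(char value)
{
  union rewritten u;
  memset(&u, 0xff, sizeof u);
  u.c = value;
  return u.c;
}
```

Because the wrapped component is as large as the union, an assignment to it can be treated as an assignment to the whole object representation.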
As this is a change in the semantics of goto programs, the goto binary
version is incremented. rewrite_union is no longer necessary, but bugs
(hidden by rewrite_union) in handling endianness in simplify_expr_member
and convert_member surfaced and had to be fixed.
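The text above does not say what exactly went wrong in simplify_expr_member and convert_member, so the following is only a sketch of why a member access on a union is endianness-sensitive at all. All names are hypothetical, and mixed-endian hosts are ignored.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

union word
{
  uint32_t i;
  unsigned char c;
};

/* Returns 1 on a little-endian host, 0 otherwise (assumes the host is
   either little- or big-endian). */
int host_is_little_endian(void)
{
  const uint32_t probe = 1;
  unsigned char first;
  memcpy(&first, &probe, 1);
  return first == 1;
}

/* Value seen through member c after storing `value` into member i. */
unsigned char first_byte_via_union(uint32_t value)
{
  union word u;
  u.i = value;
  return u.c;
}

/* The byte a correct member access must yield: the least significant byte
   on little-endian hosts, the most significant byte on big-endian ones. */
unsigned char expected_first_byte(uint32_t value)
{
  return host_is_little_endian() ? (unsigned char)(value & 0xffu)
                                 : (unsigned char)(value >> 24);
}
```

Any simplification that turns `u.c` into a byte extraction from `u.i` has to pick the offset according to the configured endianness, which is the kind of code path mentioned above.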
Although this is stable and works well in all the testing I have done over the last weeks [edit: in a branch that has other bugfixes, most of which can be found in pull requests already], I am marking this [RFC] as this is a substantial change that could do with input, especially from @kroening @martin-cs. A very natural further step would be to make configurable how we handle the behaviour left unspecified by the C standard (cf. #5704 and #5705). Performance appears to be ~10% better on some large benchmarks, but I cannot currently claim to have done a precise performance evaluation.