Consider using a discriminating "tracking variable" or "discriminating function" for untagged unions in testing/debug builds. #1907
Replies: 2 comments 5 replies
-
(I'm converting this issue to a discussion because at present it doesn't seem like there's any action we can take to resolve it.) I was thinking along somewhat similar lines in #139, an early proposal for untagged unions. Under that proposal, you can't type-pun through a union, so there would always be a definite answer to "which union member (if any) is currently active?", although that answer isn't available to Carbon code. In principle a debug or sanitizer build could track that state and use it to diagnose errors. However, it's hard to see how to do that efficiently when a union member is accessed through a pointer, for roughly the same reasons that the "discriminating function" approach isn't a complete solution (see the "Safety" section of the proposal doc).
I think it will be simpler and safer to keep the language-internal discriminator completely separate from any discriminator mechanism that's accessible to Carbon code. See p0157 for details on how we plan to support sum types with user-defined layout and discrimination -- I think that should extend reasonably well to wrapping sum types defined in C++.
-
Just a couple of perspectives on tagged and untagged unions: We can view a tagged union as an "allocator-optimization" of a class hierarchy. The unpopulated union is the superclass, and the union members are the attributes of the subclasses. The tag is the class type. For an untagged union we can take a similar "allocator-optimization" perspective. We can often replace a record with a union-of-N-fields with N records. If we interpret this case with the same modelling mindset as when we apply TypeScript types to JSON data, then we can view this as safely reinterpret-casting a memory pointer. If you add type-state analysis you get some interesting and more flexible alternative takes on records-with-unions. (The concept of unions might be considered somewhat superfluous semantically, so it might be interesting to think about to what extent it has to be a type, and to what extent it can just be viewed as an "optimization problem". I sense that this is the angle the |
-
One problem with untagged unions is triggering destructors correctly and preventing type-punning (writing as one type, reading as another). Testing and debugging would be easier if there were a mechanism that tracked construction, access, and destruction of each type stored in a union.
This "implicit" state might be tracked by a variable outside the program proper. A ghost variable is an "imagined" variable that is used for reasoning about a program (e.g. verification), not for the program's actual computations. The "tracking variable" is similar, except that it might have a physical representation. It still cannot affect the program, so it is used only to check the program's correctness during testing or debugging. This tracking should not be part of the program; it is purely a testing or debugging feature. We can think of it as a feature of the machine executing the program rather than as part of the program.
How this "tracking variable" is realized (or whether it is needed at all) should be up to the compiler. If the compiler can find free space for it in a struct that cannot be overwritten, or can place it at an offset, that is one option. Another option is simply to use a hash table.
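The hash-table option can be sketched in C++ as a side table keyed by the union's address. This is only an illustration under stated assumptions: `UnionTracker`, `Member`, and the `WriteInt`/`WriteFloat`/`ReadInt` accessors are hypothetical names standing in for instrumentation a debug build might generate, not an existing Carbon or C++ facility.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// An ordinary untagged union; its layout is not changed by the tracking.
union Value {
  std::int32_t i;
  float f;
};

// Hypothetical identifiers for the union's members, used only by the tracker.
enum class Member { kNone, kInt, kFloat };

// Debug-only side table mapping a union's address to its active member.
// It lives outside the tracked objects, so the program cannot observe it
// through the union itself.
class UnionTracker {
 public:
  void Activate(const void* u, Member m) { active_[u] = m; }
  void Deactivate(const void* u) { active_.erase(u); }
  Member ActiveMember(const void* u) const {
    auto it = active_.find(u);
    return it == active_.end() ? Member::kNone : it->second;
  }

 private:
  std::unordered_map<const void*, Member> active_;
};

// Instrumented writes: store the value, then record which member is active.
inline void WriteInt(UnionTracker& t, Value& v, std::int32_t x) {
  v.i = x;
  t.Activate(&v, Member::kInt);
}

inline void WriteFloat(UnionTracker& t, Value& v, float x) {
  v.f = x;
  t.Activate(&v, Member::kFloat);
}

// True if reading `i` would not be a type pun.
inline bool CanReadInt(const UnionTracker& t, const Value& v) {
  return t.ActiveMember(&v) == Member::kInt;
}

// Instrumented read: diagnoses a type-punning access in debug builds.
inline std::int32_t ReadInt(const UnionTracker& t, const Value& v) {
  assert(CanReadInt(t, v) && "read of inactive union member");
  return v.i;
}
```

In this sketch every access goes through an instrumented accessor. An access made through a plain pointer to a member would bypass the table, which is the efficiency problem raised earlier in the thread.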
A more flexible solution is to add a "discriminating function" that takes a context parameter, from which it determines which type is active in the union. Usually the context would be the record the union is embedded in, as this record often contains an enumeration value or some other state carrier that indicates which type may be used in the union.
Complication: Finding the indicator could also require a lookup in a container object or some other structure, which is more complicated when there is no back-pointer. In that case it would require either passing a hidden parameter to the container, giving the record that contains the union a back-pointer at a negative offset, or some other mechanism, any of which could be tricky.
As such, a "discriminating function" cannot be a general solution, but it has more potential for becoming part of the computation than the "tracking variable" approach. In some cases it can be added to existing C++ codebases without changing the record or touching the existing code that uses the record, which could be a welcome feature for Carbon users who want to interact with C++ libraries without modifying them.
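A minimal C++ sketch of the simple case, where the enclosing record already carries an enumeration that determines the active member. All names here (`Shape`, `ShapeKind`, `ActiveMemberOf`, `CheckedRadius`, `Area`) are hypothetical. `Shape` stands in for a record from an existing library, and the discriminating function is written next to it without modifying it.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for a record from an existing C++ library, left unmodified.
enum ShapeKind : std::uint8_t { kCircle, kRect };

struct Rect {
  float w;
  float h;
};

union ShapeData {
  float radius;  // meaningful when kind == kCircle
  Rect rect;     // meaningful when kind == kRect
};

struct Shape {
  ShapeKind kind;  // state carrier that already exists in the record
  ShapeData data;
};

// Hypothetical identifiers for the members of ShapeData.
enum class ActiveMember { kRadius, kRect };

// Discriminating function: the context is the enclosing record, and the
// result says which union member that record allows to be used.
inline ActiveMember ActiveMemberOf(const Shape& s) {
  return s.kind == kCircle ? ActiveMember::kRadius : ActiveMember::kRect;
}

// A checked access that a debug build could derive from the function above.
inline float CheckedRadius(const Shape& s) {
  assert(ActiveMemberOf(s) == ActiveMember::kRadius);
  return s.data.radius;
}

// Ordinary code can also use the function as part of its computation.
inline float Area(const Shape& s) {
  switch (ActiveMemberOf(s)) {
    case ActiveMember::kRadius:
      return 3.14159f * s.data.radius * s.data.radius;
    case ActiveMember::kRect:
      return s.data.rect.w * s.data.rect.h;
  }
  return 0.0f;  // unreachable for valid ShapeKind values
}
```

The sketch covers only the case where the indicator sits in the same record as the union. It does not address the complication above, where the indicator has to be found through a container or a back-pointer.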