Remove canonical option bytes and split up option interpretation into two phases (bufbuild#261)

There were several Editions-specific issues with the "canonical bytes" feature, in cases where the custom option value being interpreted is in the same file that defines the custom option. In order to know how to encode "delimited" message fields, we have to have already interpreted the options for that field, but when they are defined in the same file, we may not have done so yet. There was also a related pre-existing issue: knowing whether or not a repeated field is packed also relies on interpreting options. Finally, there was an unrelated pre-existing issue: message literals that used the expanded `Any` form were completely absent from the canonical bytes representation 🤦.

In addition to the "canonical bytes" encoding issues described above, there was a related issue with the (non-canonical) encoding of `Any` messages in an option value: if an expanded `Any` literal refers to a packed field, it could previously have been encoded in non-packed form, since we may not yet know if it's packed (because we're still interpreting options).

This issue of not yet knowing how to encode or handle a type because we're still interpreting options also came up in bufbuild#260, where we needed to know if an enum was open or closed. In an Editions file, this requires interpreting the enum's options. In that PR, we deferred the decision until the end: we accumulate the set of enums that need to be open, and validate that they are open after all options have been interpreted.

I looked into solving the other issues in the same way: defer checks until all options have been interpreted. But the real problem with this approach is that the text format (which is what is used in message literals) is influenced by whether a field is a group or not.
So if we don't yet know if the field is a group (because, in an Editions file, we have to interpret the `features.message_encoding` option first), we can't even process message literals.

One possible solution I thought of would be to interpret options in dependency order (so go interpret options for the element if we need to). This would be a bit tricky to implement, and might incur a performance hit because we'd need more book-keeping to track what's been interpreted and what hasn't. But even this approach couldn't handle cases where there is a **dependency cycle**: where interpreting the options for an element depends on that element already having its options interpreted 🤯 (see comment below for such an example).

This PR does two things.

1. The main solution to the "needing options in the same file to already be interpreted" problem is to interpret the **non**-custom options first. That way features and other relevant options get interpreted first, so we can then assume, when interpreting **custom** options, that we know enough to correctly interpret and serialize everything. This is also [the strategy used by `protoc`](https://github.com/protocolbuffers/protobuf/blob/main/src/google/protobuf/descriptor.cc#L5899-L5943) and is adequate for now (not necessarily bullet-proof, but will work as long as Google doesn't make certain kinds of changes to `descriptor.proto`).
2. The bigger change in here, which accounts for the vast majority of lines of code changed/deleted, is removal of the "canonical option bytes" feature. This was a feature of protocompile to generate the same output as `protoc`, byte-for-byte. Special handling was needed to do so because the protobuf-go library serializes the option messages differently than the way they are serialized by the C++ `libprotoc`. So we basically re-implemented serialization in the `option` sub-package with the `interpreted*` types.
Since this was a source of bugs and Editions-related issues, as described in the first paragraph above, and the feature is basically unused (it was never actually incorporated into the buf CLI), it will be a quality-of-life improvement to not maintain this complicated code anymore.

(cherry picked from commit 869ef58)
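
The two-phase ordering from item 1 can be illustrated with a small, self-contained sketch. This is not protocompile's actual code: the `option` type, its `refersTo` field, and `interpretTwoPhase` are hypothetical names invented for illustration, and the "interpretation" is reduced to recording whether a field is delimited. The point is only that phase 1 (non-custom options) runs to completion before phase 2 (custom options), so the answer does not depend on the order in which options appear in the file.

```go
package main

import (
	"fmt"
	"strings"
)

// option is a hypothetical, simplified stand-in for an uninterpreted option.
type option struct {
	element  string // element the option is attached to
	name     string // custom option names are parenthesized, e.g. "(my.opt)"
	value    string // raw value (unused for custom options in this sketch)
	refersTo string // for custom options: a field named inside the message literal
}

// isCustom reports whether the option name is an extension name.
func isCustom(o option) bool { return strings.HasPrefix(o.name, "(") }

// interpretTwoPhase returns, for each custom option, how the field named in
// its message literal would be encoded.
func interpretTwoPhase(opts []option) map[string]string {
	// Phase 1: non-custom options only. Record which fields are delimited.
	delimited := map[string]bool{}
	for _, o := range opts {
		if !isCustom(o) && o.name == "features.message_encoding" {
			delimited[o.element] = o.value == "DELIMITED"
		}
	}
	// Phase 2: custom options. Everything learned in phase 1 is now available,
	// even if the custom option appeared earlier in the file.
	encodings := map[string]string{}
	for _, o := range opts {
		if !isCustom(o) {
			continue
		}
		key := o.element + ":" + o.name
		if delimited[o.refersTo] {
			encodings[key] = "delimited"
		} else {
			encodings[key] = "length-prefixed"
		}
	}
	return encodings
}

func main() {
	// The custom option is listed first on purpose.
	opts := []option{
		{element: "Foo.holder", name: "(my.opt)", refersTo: "Foo.child"},
		{element: "Foo.child", name: "features.message_encoding", value: "DELIMITED"},
	}
	fmt.Println(interpretTwoPhase(opts)["Foo.holder:(my.opt)"])
}
```

A single pass in file order would reach `(my.opt)` before learning that `Foo.child` is delimited. Splitting the work into two passes avoids that without any dependency tracking, which is presumably why it is cheaper than interpreting options in dependency order.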