Skip to content

Commit

Permalink
Remove canonical option bytes and split up option interpretation into…
Browse files Browse the repository at this point in the history
… two phases (bufbuild#261)

There were several Editions-specific issues with the "canonical bytes"
feature, in cases where the custom option value being interpreted is in
the same file that defines the custom option. In order to know how to
encode "delimited" message fields, we have to have already interpreted
the options for that field, but when they are defined in the same file,
we may have not have done so yet. There was also a pre-existing issue
that was related: knowing whether or not a repeated field is packed also
relies on interpreting options. Finally, there was a pre-existing issue
that was unrelated: message literals that used the expanded `Any` form
were completely absent from the canonical bytes representation 🤦.

In addition to the "canonical bytes" encoding issues described above,
there was a related issue with the (non-canonical) encoding of `Any`
messages in an option value: if an expanded `Any` literal refers to a
packed field, it could previously have been encoded in non-packed form
since we may not yet know if it's packed (because we're still
interpreting options).

This issue of not yet knowing how to encode or handle a type because
we're still interpreting options also came up in bufbuild#260, where we needed
to know if an enum was open or closed. But in an Editions file, this
requires interpreting the enum's options. In that PR, we deferred the
decision until the end. So we could accumulate the set of enums that
needed to be open, and validate that they are open after all options
have been interpreted. I looked into solving the other issues in the
same way: defer checks until all options have been interpreted. But the
real problem with this approach is that the text format (which is what
is used in message literals) is influenced by whether a field is a group
or not. So if we don't yet know if the field is a group (because, in an
editions file, we have to interpret the `features.message_encoding`
option first), we can't even process message literals. One possible
solution I thought of would be to interpret options in dependency order
(so go interpret options for the element if we need to). This would be a
bit tricky to implement, and might incur a performance hit because we'd
need more book-keeping to track what's been interpreted and what hasn't.
But even this approach couldn't handle cases where this is a
**dependency cycle**: where interpreting the options for an element
depend on that element already having its options interpreted 🤯 (see
comment below for such an example).

This PR does two things.

1. The main solution to the "needing options in the same file to already
be interpreted" problem is to interpret the **non**-custom options
first. That way features and other relevant options get interpreted
first, so we can then assume, when interpreting **custom** options, that
we know enough to correctly interpret and serialize everything. This is
also [the strategy used by
`protoc`](https://github.com/protocolbuffers/protobuf/blob/main/src/google/protobuf/descriptor.cc#L5899-L5943)
and is adequate for now (not necessarily bullet-proof, but will work as
long as Google doesn't make certain kinds of changes to
`descriptor.proto`).

2. The bigger change in here, that accounts for the vast majority of
lines of code changed/deleted, is removal of the "canonical option
bytes" feature. This was a feature of protocompile to generate the same
output as `protoc`, byte-for-byte. Special handling was needed to do so
because the protobuf-go library serializes the option messages
differently than the way they are serialized by the C++ `libprotoc`. So
we basically re-implemented serialization in the `option` sub-package
with the `interpreted*` types. Since this was a source of bugs and
Editions-related issues, as described in the first paragraph above, and
the feature is basically unused (it was never actually incorporated into
the buf CLI), it will be a quality of life improvement to not maintain
this complicated code anymore.

(cherry picked from commit 869ef58)
  • Loading branch information
jhump authored and kralicky committed Apr 12, 2024
1 parent c14f422 commit 191b710
Show file tree
Hide file tree
Showing 22 changed files with 1,080 additions and 1,736 deletions.
5 changes: 5 additions & 0 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -168,6 +168,9 @@ internal/testdata/options/options2/test.protoset: $(PROTOC) internal/testdata/op
internal/testdata/options/test_proto3.protoset: $(PROTOC) internal/testdata/options/test_proto3.proto
cd $(@D) && $(PROTOC) --descriptor_set_out=$(@F) -I. $(filter-out protoc,$(^F))

internal/testdata/options/test_editions.protoset: $(PROTOC) internal/testdata/options/test_editions.proto
cd $(@D) && $(PROTOC) --experimental_editions --descriptor_set_out=$(@F) -I. $(filter-out protoc,$(^F))

.PHONY: test-descriptors
test-descriptors: internal/testdata/all.protoset
test-descriptors: internal/testdata/desc_test_complex.protoset
Expand All @@ -177,5 +180,7 @@ test-descriptors: internal/testdata/descriptor_impl_tests.protoset
test-descriptors: internal/testdata/descriptor_editions_impl_tests.protoset
test-descriptors: internal/testdata/editions/all.protoset
test-descriptors: internal/testdata/source_info.protoset
test-descriptors: internal/testdata/options/options.protoset
test-descriptors: internal/testdata/options/test.protoset
test-descriptors: internal/testdata/options/test_proto3.protoset
test-descriptors: internal/testdata/options/test_editions.protoset
31 changes: 7 additions & 24 deletions internal/benchmarks/benchmark_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -46,7 +46,6 @@ import (
"github.com/kralicky/protocompile"
"github.com/kralicky/protocompile/ast"
"github.com/kralicky/protocompile/internal/protoc"
"github.com/kralicky/protocompile/linker"
"github.com/kralicky/protocompile/parser"
"github.com/kralicky/protocompile/parser/fastscan"
"github.com/kralicky/protocompile/protoutil"
Expand Down Expand Up @@ -235,7 +234,7 @@ func downloadAndExpand(url, targetDir string) (e error) {
}

func BenchmarkGoogleapisProtocompile(b *testing.B) {
benchmarkGoogleapisProtocompile(b, false, func() *protocompile.Compiler {
benchmarkGoogleapisProtocompile(b, func() *protocompile.Compiler {
return &protocompile.Compiler{
Resolver: protocompile.WithStandardImports(&protocompile.SourceResolver{
ImportPaths: []string{googleapisDir},
Expand All @@ -246,20 +245,8 @@ func BenchmarkGoogleapisProtocompile(b *testing.B) {
})
}

func BenchmarkGoogleapisProtocompileCanonical(b *testing.B) {
benchmarkGoogleapisProtocompile(b, true, func() *protocompile.Compiler {
return &protocompile.Compiler{
Resolver: protocompile.WithStandardImports(&protocompile.SourceResolver{
ImportPaths: []string{googleapisDir},
}),
SourceInfoMode: protocompile.SourceInfoStandard,
// leave MaxParallelism unset to let it use all cores available
}
})
}

func BenchmarkGoogleapisProtocompileNoSourceInfo(b *testing.B) {
benchmarkGoogleapisProtocompile(b, false, func() *protocompile.Compiler {
benchmarkGoogleapisProtocompile(b, func() *protocompile.Compiler {
return &protocompile.Compiler{
Resolver: protocompile.WithStandardImports(&protocompile.SourceResolver{
ImportPaths: []string{googleapisDir},
Expand All @@ -270,13 +257,13 @@ func BenchmarkGoogleapisProtocompileNoSourceInfo(b *testing.B) {
})
}

func benchmarkGoogleapisProtocompile(b *testing.B, canonicalBytes bool, factory func() *protocompile.Compiler) {
func benchmarkGoogleapisProtocompile(b *testing.B, factory func() *protocompile.Compiler) {
for i := 0; i < b.N; i++ {
benchmarkProtocompile(b, factory(), googleapisSources, canonicalBytes)
benchmarkProtocompile(b, factory(), googleapisSources)
}
}

func benchmarkProtocompile(b *testing.B, c *protocompile.Compiler, sources []string, canonicalBytes bool) {
func benchmarkProtocompile(b *testing.B, c *protocompile.Compiler, sources []string) {
resolvedSources := make([]protocompile.ResolvedPath, 0, len(sources))
for _, source := range sources {
resolvedSources = append(resolvedSources, protocompile.ResolvedPath(source))
Expand All @@ -286,11 +273,7 @@ func benchmarkProtocompile(b *testing.B, c *protocompile.Compiler, sources []str
var fdSet descriptorpb.FileDescriptorSet
fdSet.File = make([]*descriptorpb.FileDescriptorProto, len(fds.Files))
for i, fd := range fds.Files {
if canonicalBytes {
fdSet.File[i] = fd.(linker.Result).CanonicalProto()
} else {
fdSet.File[i] = protoutil.ProtoFromFileDescriptor(fd)
}
fdSet.File[i] = protoutil.ProtoFromFileDescriptor(fd)
}
// protoc is writing output to file descriptor set, so we should, too
writeToNull(b, &fdSet)
Expand Down Expand Up @@ -488,7 +471,7 @@ func BenchmarkGoogleapisProtocompileSingleThreaded(b *testing.B) {
// need to run a single-threaded compile
MaxParallelism: 1,
}
benchmarkProtocompile(b, c, googleapisSources, false)
benchmarkProtocompile(b, c, googleapisSources)
}
})
}
Expand Down
Binary file added internal/testdata/all.protoset
Binary file not shown.
6 changes: 3 additions & 3 deletions internal/testdata/formatting_tests.proto
Original file line number Diff line number Diff line change
Expand Up @@ -2,8 +2,8 @@ syntax = "proto2";

package testprotos;

import "github.com/kralicky/protocompile/internal/testdata/desc_test_complex.proto";
import "github.com/kralicky/protocompile/internal/testdata/desc_test_options.proto";
import "desc_test_complex.proto";
import "desc_test_options.proto";
import "google/protobuf/descriptor.proto";

option go_package = "github.com/kralicky/protocompile/internal/testprotos";
Expand Down Expand Up @@ -66,7 +66,7 @@ extend google.protobuf.FieldOptions {

message KeywordCollisionOptions {
optional uint64 id = 1 [
(foo.bar.float) = inf,
// (foo.bar.float) = inf,
(foo.bar.syntax) = true,
(foo.bar.import) = true,
(foo.bar.public) = true,
Expand Down
42 changes: 42 additions & 0 deletions internal/testdata/options/options.proto
Original file line number Diff line number Diff line change
Expand Up @@ -7,8 +7,32 @@ syntax = "proto2";

package bufbuild.protocompile.test3;

import "google/protobuf/any.proto";
import "google/protobuf/descriptor.proto";

option (any) = {
[type.googleapis.com/bufbuild.protocompile.test3.AllTypes]: {
pr_i32: [0, 1, 2, 3],
str: "foo",
},
};

option (filegroup) = {
array: [1, 2, 3, 4, 5, 6, 7, 8],
};
option (filegroup).i32 = -123;
option (filegroup).s = "abc";

option (filegroups) = {
s: "abc",
i32: 123,
};

option (filegroups) = {
s: "xyz",
i32: 456,
};

message Extendable {
optional string foo = 1;
optional int32 bar = 2;
Expand Down Expand Up @@ -186,3 +210,21 @@ extend google.protobuf.MethodOptions {
optional AllTypes rpc = 3001;
repeated int32 rpc_i = 3002 [packed = true];
}

// Also test encoding of options where option is defined in
// the same file as the usage. This makes sure we correctly
// defer computation of option bytes until we know enough
// to do so correctly (since option bytes encoding could
// depend on interpretation of other options in this file).
extend google.protobuf.FileOptions {
optional group FileGroup = 1003 {
optional string s = 1;
optional int32 i32 = 2;
repeated int32 array = 3 [packed = true];
}
repeated group FileGroups = 1004 {
optional string s = 1;
optional int32 i32 = 2;
}
optional google.protobuf.Any any = 1005;
}
Binary file added internal/testdata/options/options.protoset
Binary file not shown.
83 changes: 83 additions & 0 deletions internal/testdata/options/test_editions.proto
Original file line number Diff line number Diff line change
@@ -0,0 +1,83 @@
edition = "2023";

package bufbuild.protocompile.test3.editions;

import "google/protobuf/any.proto";
import "google/protobuf/descriptor.proto";

option (delimited).foo.children = {
name: "abc-1",
};
option (delimited).foo.children = {
name: "abc-2",
};
option (delimited).foo.name = "abc";
option (delimited).foo.val = VAL1;

option (delimited).name = "123";
option (delimited).other.name = "xyz";
option (delimited).other.val = VAL0;

option (delimiteds) = {
name: "ABC",
val: 1,
};
option (delimiteds) = {
name: "XYZ",
val: 1,
};
option (delimiteds) = {
name: "1234",
val: 0,
};

option (other) = {
name: "123",
val: VAL0,
Foo: <name: "456">,
// NOTE: We can't currently refer to children in here
// because referring to delimited-encoded fields whose
// name != lower-case(type-name) inside a message
// literal is currently broken in protoc :(
// https://github.com/protocolbuffers/protobuf/issues/16239
};

option (others) = {
name: "123",
val: 0,
};

message Foo {
string name = 1;
Foo foo = 2 [
(any) = {
[type.googleapis.com/bufbuild.protocompile.test3.editions.Foo]: {
Foo: {
name: "abc",
Foo: {name: "xyz"},
},
},
},
features.message_encoding = DELIMITED
];
Foo other = 3;
Val val = 4;
repeated Foo children = 5 [features.message_encoding = DELIMITED];
}

enum Val {
option features.enum_type = CLOSED;
VAL0 = 0;
VAL1 = 1;
}

extend google.protobuf.FileOptions {
Foo delimited = 10101 [features.message_encoding = DELIMITED];
Foo other = 10102;
repeated Foo delimiteds = 10103 [features.message_encoding = DELIMITED];
repeated Foo others = 10104;
}

extend google.protobuf.FieldOptions {
google.protobuf.Any any = 10101;
}
Binary file added internal/testdata/options/test_editions.protoset
Binary file not shown.
Loading

0 comments on commit 191b710

Please sign in to comment.