Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/collectproto: a new tool for collecting protobuf #659

Closed
wants to merge 4 commits into from

Conversation

husio
Copy link
Contributor

@husio husio commented May 22, 2019

A new command line tool for collecting all protobuf declarations was
added.

fix #618


From the README.md:

Collect, combine and produce a single protobuf file with all declarations.

This program reads provided protobuf declarations and:

  • removes all non standard plugin information (ie gogoproto)
  • removes all package declaration
  • removes all but one syntax declaration
  • removes all import declarations, inlines all weave protobuf imports

The combined result is written to stdout.

Example usage:

$ go run cmd/collectproto/collectproto.go cmd/bnsd/app/codec.proto | wc -l
930

Please let me know if the output is as expected.

A new command line tool for collecting all protobuf declarations was
added.

From the `README.md`:

Collect, combine and produce a single protobuf file with all declarations.

This program reads provided protobuf declarations and:
- removes all non standard plugin information (ie gogoproto)
- removes all package declaration
- removes all but one syntax declaration
- removes all import declarations, inlines all weave protobuf imports

Combined result is written to stdout.

Example usage:
```
$ go run cmd/collectproto/collectproto.go cmd/bnsd/app/codec.proto | wc -l
930
```
@husio husio requested a review from ethanfrey May 22, 2019 10:48
@codecov-io

This comment has been minimized.

Copy link
Contributor

@ethanfrey ethanfrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ambitious.
I agree on the desired result.

I will review it tomorrow when the gov / batch stuff is not blocking.
But some solid tests cases is great and feel free to continue on this path. I have not seen tools that do such well, and go is more stable than shell scripts.


Collect, combine and produce a single protobuf file with all declarations.

This program reads provided protobuf declarations and:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first I thought you found this program. Then I realized you wrote it.
Above and beyond the call of duty!

Copy link
Contributor Author

@husio husio May 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first I only typed package main. The rest just happened 😉

Protobuf is having a trivial syntax to parse. After implementing the cutting of the plugin information I have added more and more. This code is not using regular expressions and is still below 200 lines of a very verbose code! 💪

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No regexp? How dare you...

husio added a commit that referenced this pull request May 23, 2019
We no longer use `GOPATH` and `dep` for dependency management. Using Go
modules requires changes to how protobuf files are declared and
included.

- `protofmt` command no longer works as it does not accept `-I` flag
- `protodoc` is disabled for now as it requires a single protobuf. This
   is waiting for #659
- some packages cannot be compiled unless weave top level `codec.proto`
  is included. Compiler is printing a warning that this file is not
  used, but it does not compile without it. Weird.

fix #568
@ethanfrey
Copy link
Contributor

First comments when trying it out (precurser to code review for most scripts):

  • Not built by make install, nor any reference to how to run it (besides your go run command in this PR).
  • When looking at the output from go run cmd/collectproto/collectproto.go cmd/bnsd/app/codec.proto, I noticed that (a) message Metadata is defined twice and (b) references were still package delimited (weave.Metadata metadata = 1;)
  • It works well even for other files. eg. go run cmd/collectproto/collectproto.go x/cash/codec.proto
  • There are some message-level gogoprotobuf definitons still there. Eg.:
message Coin {
  option (gogoproto.goproto_stringer) = false;

I will look at the code soon, but I think those are good places to start to fix

@ethanfrey
Copy link
Contributor

ethanfrey commented May 23, 2019

Trying to compile the output (with gogoproto... which should be more forgiving that other languages):

go run cmd/collectproto/collectproto.go x/cash/codec.proto > foo.proto
protoc --gogoproto_out=. foo.proto

I get foo.proto:1:8: Expected "=".

Top of the output file looks like:

syntax proto3;

// definitions from github.com/iov-one/weave/x/cash/codec.proto

It should look like:

syntax = "proto3";

package bnsd;

(I guess the package name can be taken from the last directory name in the path of the input file?)

It would be nice for a simple test script there, which tries to generate output from bnsd and compile it (must not be done in the CI, but at least demo the above works)

Note that after I fixed the issue with syntax = "proto3", I get the following error I mentioned above:

protoc --gogoproto_out=. foo.proto
foo.proto:12:3: "weave.Metadata" is not defined.
foo.proto:13:12: "coin.Coin" is not defined.
foo.proto:22:3: "weave.Metadata" is not defined.
foo.proto:25:3: "coin.Coin" is not defined.
foo.proto:35:3: "weave.Metadata" is not defined.
foo.proto:37:3: "coin.Coin" is not defined.
foo.proto:44:3: "coin.Coin" is not defined.

@husio
Copy link
Contributor Author

husio commented May 23, 2019

I have fixed syntax = "proto3" declaration and removal of the stringer option.

I am not sure what to do about the import scope removal. This tool is not using proper parser and lexer but just some tricks to ignore the content. To add functionality to remove package namespace (because it is all in one file) it would take much more time. It would be also a reason to change the current approach to using a tokenizer and lexer.
Is the combining worth implementing or should I remove import inlining and keep only removal of nonstandard protobuf declarations?

@@ -18,7 +18,7 @@ func main() {

// Syntax declaration is never rewritten. Initialize combined file with
// one as it is required.
fmt.Fprintln(&out, "syntax proto3;")
fmt.Fprintln(&out, "syntax = proto3;")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please test this with protoc??

It is syntax = "proto3";

@ethanfrey
Copy link
Contributor

With regards to the namespacing, there is a quite simple trick with no parser:

go run cmd/collectproto/collectproto.go cmd/bnsd/app/codec.proto > foo.proto
grep '\w\.\w' foo.proto  | grep -v '//'

Returns:

  cash.FeeInfo fees = 1;
  repeated sigs.StdSignature signatures = 2;
    cash.SendMsg send_msg = 51;
    escrow.CreateEscrowMsg create_escrow_msg = 52;
    escrow.ReleaseEscrowMsg release_escrow_msg = 53;
    escrow.ReturnEscrowMsg return_escrow_msg = 54;
    escrow.UpdateEscrowPartiesMsg update_escrow_msg = 55;
    multisig.CreateContractMsg create_contract_msg = 56;
    multisig.UpdateContractMsg update_contract_msg = 57;
    validators.SetValidatorsMsg set_validators_msg = 58;
    currency.NewTokenInfoMsg new_token_info_msg = 59;
    nft.AddApprovalMsg add_approval_msg = 61;
    nft.RemoveApprovalMsg remove_approval_msg = 62;
    username.IssueTokenMsg issue_username_nft_msg = 63;
    username.AddChainAddressMsg add_username_address_nft_msg = 64;
    username.RemoveChainAddressMsg remove_username_address_msg = 65;
    distribution.NewRevenueMsg new_revenue_msg = 66;
    distribution.DistributeMsg distribute_msg = 67;
    distribution.ResetRevenueMsg reset_revenue_msg = 68;
    migration.UpgradeSchemaMsg upgrade_schema_msg = 69;
    aswap.CreateSwapMsg create_swap_msg = 70;
    aswap.ReleaseSwapMsg release_swap_msg = 71;
    aswap.ReturnSwapMsg return_swap_msg = 72;
  nft.NonFungibleToken base = 1;
  repeated nft.ActionApprovals approvals = 3 ;
  weave.Metadata metadata = 1;
  weave.Metadata metadata = 1;
  weave.Metadata metadata = 1;
  weave.Metadata metadata = 1;
  repeated coin.Coin amount = 5;
  weave.Metadata metadata = 1;
  weave.Metadata metadata = 1;
  weave.Metadata metadata = 1;
  repeated coin.Coin coins = 2;
  weave.Metadata metadata = 1;
  coin.Coin amount = 4;
  weave.Metadata metadata = 1;
  coin.Coin fees = 3;
  coin.Coin minimal_fee = 4 ;
  weave.Metadata metadata = 1;
  weave.Metadata metadata = 1;
  weave.Metadata metadata = 1;
  weave.Metadata metadata = 1;
  weave.Metadata metadata = 1;
  weave.Metadata metadata = 1;
  weave.Metadata metadata = 1;
  weave.Metadata metadata = 1;
  repeated coin.Coin amount = 5;
  weave.Metadata metadata = 1;
  repeated coin.Coin amount = 3;
  weave.Metadata metadata = 1;
  weave.Metadata metadata = 1;
  weave.Metadata metadata = 1;
  weave.Metadata metadata = 1;
  weave.Metadata metadata = 1;
  weave.Metadata metadata = 1;
  weave.Metadata metadata = 1;
  weave.Metadata metadata = 1;
  weave.Metadata metadata = 1;
  crypto.PublicKey pubkey = 2;
  weave.Metadata metadata = 1;
  crypto.PublicKey pubkey = 3;
  crypto.Signature signature = 4;
  weave.Metadata metadata = 1;
  weave.Metadata metadata = 1;
  weave.Metadata metadata = 1;
  weave.Metadata metadata = 1;
  weave.Metadata metadata = 1;
  weave.Metadata metadata = 1;

I will make a push to this branch

@ethanfrey
Copy link
Contributor

Okay, I pushed a simple fix for the syntax issue.
Then I couldn't change your parse code, so ran this simple sed script on the output:

go run cmd/collectproto/collectproto.go cmd/bnsd/app/codec.proto > foo.proto
sed -i -E 's/[a-z]+\.([a-zA-Z])/\1/' foo.proto
protoc --gogoproto_out=. foo.proto

Got a bunch more errors...
Some like: foo.proto:874:9: "Metadata" is already defined. are due to multiple imports of the same file

$ grep 'message Metadata' foo.proto | wc -l
11

Others may be due to the fact we use the same name (Configuration) in multiple files: foo.proto:257:9: "Configuration" is already defined

Unless you have a clever idea to fix those issues, I am not sure it is worth continuing this road

@ethanfrey
Copy link
Contributor

ethanfrey commented May 23, 2019

What about https://github.com/tallstoat/pbparser ?

This (go library) can parse the protobuf files.
You can then just output text, even using go templates.

One idea I had was to mangle all names to be <original package>_<original message>. Then this could easily be flat.

Is the combining worth implementing or should I remove import inlining and keep only removal of nonstandard protobuf declarations?

I would say it is not worth more than a few hours of work.
Just do your current script file by file and export them into an output directory, either preserving the tree structure, or collapsing the paths into long names eg. x_cash_codec.proto.

I spent some time and actually combining these is going to be very hard. The readability cleanup is nice though. So let's just simplify the ambitions and we can use this as a nice alternative to flaky shell scripts.

@husio
Copy link
Contributor Author

husio commented May 23, 2019

I thought this through and I would back off from the idea of merging files. I think we should focus on minimal functionality here, without trying to do too much.
How about the scope of this will be
Create a command line tool in Go that reads from stdin and writes to stdout. This tool should accept a protobuf file and do several oprations:

  • remove all plugin options declared for an attribute (content between [..])
  • remove all plugin options declared per message
  • remove go_module option

This is much smaller scope. Almost exactly what #618 was about. What do you think?

@ethanfrey
Copy link
Contributor

Yes I agree with the last comment. Just add

  • remove gogo/protobuf import lines

So it will make a clean version of our protobuf files. And we can use that in some script (find, etc) but avoid crazy sed/python/perl mangling

@husio
Copy link
Contributor Author

husio commented May 24, 2019

Continued in #666 🤘

@husio husio deleted the protobuf_cleanup_tool_issue_618 branch May 27, 2019 06:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Keep protobuf fields on one line
3 participants