This repository has been archived by the owner on Feb 8, 2022. It is now read-only.

Parser Improvements #48

Open
1 of 6 tasks
jhugard opened this issue Jul 2, 2016 · 8 comments
@jhugard
Collaborator

jhugard commented Jul 2, 2016

The parser currently turns proto2 and proto3 source into an AST correctly, but performs no further processing or semantic validation.

Proposed additions:

Process import statements
Parser methods need to be enhanced with, e.g., a function parameter that can satisfy includes. For Parse.fromFile, this could be an include path or paths. How to provide this for strings and streams is an open question.
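
The resolver parameter could look something like this (a minimal Python sketch for illustration; `parse_imports` is a toy stand-in for the real parser, and every name here is hypothetical, not froto's API):

```python
import re

def parse_imports(text):
    # Toy stand-in for the real parser: collect `import "name";` statements.
    return re.findall(r'import\s+"([^"]+)"\s*;', text)

def parse_with_imports(name, source, resolve, seen=None):
    # `resolve(import_name) -> source text` is supplied by the caller:
    # a search over include paths for Parse.fromFile, or a lookup into an
    # in-memory table when parsing strings or streams.
    # Returns an ordered list of (filename, imports) pairs, one per file parsed.
    seen = set() if seen is None else seen
    if name in seen:
        return []
    seen.add(name)
    imports = parse_imports(source)
    results = [(name, imports)]
    for dep in imports:
        results += parse_with_imports(dep, resolve(dep), resolve, seen)
    return results

files = {
    "a.proto": 'import "b.proto"; message A {}',
    "b.proto": 'message B {}',
}
parsed = parse_with_imports("a.proto", files["a.proto"], files.__getitem__)
```

The same shape would carry over to F#: the resolver becomes a function passed alongside the existing parse entry points, and the return type grows into a list of filename/AST pairs.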

Normalize Type Names
Convert various naming conventions into a single normalized form. Should support outputting in caller-selectable camelCase, PascalCase, snake_case, etc.
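
The conversion between conventions is mechanical once an identifier is split into words; a sketch (Python for illustration, function names hypothetical):

```python
import re

def split_words(name):
    # Split an identifier in snake_case, camelCase, or PascalCase into words.
    name = name.replace("_", " ")
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)
    return name.lower().split()

def to_snake(name):
    return "_".join(split_words(name))

def to_camel(name):
    words = split_words(name)
    return words[0] + "".join(w.title() for w in words[1:])

def to_pascal(name):
    return "".join(w.title() for w in split_words(name))
```

The caller-selectable part is then just a choice among these functions, e.g. an enum-valued option passed to the normalization step.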

Flatten Type Names
Messages can contain inner message and enum definitions. These are publicly visible and can be referenced from other types. It will probably be convenient to flatten these symbols, moving them from an inner scope to the top-level scope and renaming each type with its fully qualified scope name; e.g., "Outer.Inner".

In addition, Record types cannot contain inner types. Since .NET allows "+" in a symbol name, the names could be normalized from "Outer.Inner" to "Outer+Inner", either in this step or when doing code-gen for Records.
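
Flattening is a straightforward recursive walk; sketched here in Python with `(name, children)` pairs standing in for real AST nodes (all names hypothetical):

```python
def flatten(messages, scope=""):
    # Hoist nested message/enum definitions to the top level, renaming each
    # with its fully qualified scope name ("Outer.Inner").
    flat = []
    for name, children in messages:
        qualified = f"{scope}.{name}" if scope else name
        flat.append(qualified)
        flat += flatten(children, qualified)
    return flat

ast = [("Outer", [("Inner", [])]), ("Other", [])]
names = flatten(ast)
# The .NET nested-type spelling is then a trivial rename:
clr_names = [n.replace(".", "+") for n in names]
```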

Validate Type Usage
Ensure that every message used as a field type and every enum used as a constant is defined somewhere. Must handle use before definition (so cannot be single-pass). Must also handle references qualified with a package name, as well as unqualified references resolved against the current .proto file's package.
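
The use-before-definition requirement is why this takes two passes: collect all declarations first, then check every reference. A minimal sketch (Python; the flat `files` map is a hypothetical stand-in for the parsed ASTs):

```python
def validate_types(files):
    # `files` maps package name -> (defined type names, referenced type names).
    # Pass 1: build the full symbol table of fully qualified definitions,
    # so references can legally precede their definitions.
    symbols = {f"{pkg}.{t}" for pkg, (defined, _) in files.items() for t in defined}
    # Pass 2: resolve each reference, either fully qualified ("pkg.Msg")
    # or relative to the referencing file's own package.
    errors = []
    for pkg, (_, used) in files.items():
        for ref in sorted(used):
            if ref not in symbols and f"{pkg}.{ref}" not in symbols:
                errors.append((pkg, ref))
    return errors

errors = validate_types({
    "a": ({"Msg"}, {"b.Other", "Msg", "Missing"}),
    "b": ({"Other"}, set()),
})
```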

Resolve Constants
An enum can be used as a constant in an Option definition. Consider either resolving these, or exposing the symbol table used during type-usage validation so that client code can do the substitution.
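
Either way, the substitution itself is a lookup into that symbol table; a sketch (Python, hypothetical names):

```python
def resolve_options(options, enum_constants):
    # Substitute enum constant names appearing as option values with their
    # numeric values, using the symbol table built during validation.
    # Unknown names are left untouched, so client code could equally well
    # perform this step itself if the table is exposed instead.
    return {key: enum_constants.get(val, val) for key, val in options.items()}

enum_constants = {"SPEED": 1, "CODE_SIZE": 2}
resolved = resolve_options(
    {"optimize_for": "SPEED", "java_package": "com.example"},
    enum_constants,
)
```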

Capture Comments
Google's protobuf compiler can capture comments related to a symbol. This can be quite useful for generating documentation, such as in a type provider. It is also needed to create a full FileDescriptorSet (a binary protobuf representation of a set of .proto files).
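
One simple scheme is to attach the run of leading comments immediately above a declaration to that symbol, roughly what protoc records in a FileDescriptorSet's source info; a toy sketch (Python, not froto's design):

```python
def attach_comments(lines):
    # Associate each non-blank, non-comment line with the `//` comments
    # directly above it. A blank line breaks the association.
    pending, attached = [], {}
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("//"):
            pending.append(stripped[2:].strip())
        elif stripped:
            attached[stripped] = " ".join(pending)
            pending = []
        else:
            pending = []
    return attached

doc = attach_comments([
    "// A greeting message.",
    "message Hello {}",
    "",
    "message Undocumented {}",
])
```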

@jhugard
Collaborator Author

jhugard commented Jul 2, 2016

Open question: does this belong in the Parser project? Or, in a separate Compiler project?

@jhugard
Collaborator Author

jhugard commented Oct 11, 2016

@ctaggart @bhandfast @agbogomolov @takemyoxygen Starting to take a look at the current Parser, and I'm afraid adding the above functionality will mean either breaking the existing API in several ways or extending it:

  • Resolving import statements will require passing in a function which returns a string, stream, or file based on the import name.
    • Parsing a file + imports also means returning a list of tuples containing the filename(s) and associated PProto ASTs; e.g., ( string * PProto ) list.
    • It might be useful to return the symbol table, which is needed to resolve constants.
  • Capturing comments involves either adding terminals to the AST, or returning a separate table of comments (like Google's protoc.exe).
  • Normalizing type names requires passing in an option (or function) to select camelCase, PascalCase, or snake_case.
  • Flattening and validating won't change the API.

I'm not really in favor of adding to the public Parser.xxx functions, as the current API is fairly clean: there are only 3 functions - parse strings, streams, and files - each with a variation taking either an explicit parser or implicitly using the default parser (Parsers.pProto). This, IMHO, makes the API readily discoverable and easy to use. Requiring that the client call a "postProcess" function (or set of functions) does not make usage obvious.

I could add a Compile module (or some-such) to hold the new functionality.

Or, I could redefine the Parse module functions to handle all the above.

Any opinions?

@takemyoxygen
Collaborator

@jhugard, I see the process as a sequence of two steps:

  1. parsing + lexical analysis. I guess that's what ProtoFile.fromFile does now.
  2. semantic analysis: type usage validation, symbol lookup tables, etc.

And then, the results of the 2nd step can be used by the code generator or the type provider.
So, it looks to me like the second step should probably live in a separate module.

The question is: is it necessary to expose the results of the first step as a public API? Maybe it would be better to expose a single high-level function Compile.file: string -> CompiledProtobuf * LookupTable (maybe a few overloads to deal with streams etc.) and make the current ProtoFile model internal?
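
The proposed two-step shape could be sketched like this (Python stand-ins for illustration; `parse` and `analyze` are trivial placeholders for froto's real lexing and semantic passes, and `compile_source` plays the role of the suggested Compile.file):

```python
def parse(text):
    # Step 1 (stand-in): lexical analysis + parsing into a raw model.
    return [line.split()[1] for line in text.splitlines()
            if line.startswith("message ")]

def analyze(ast):
    # Step 2 (stand-in): semantic analysis producing a lookup table.
    return {name: index for index, name in enumerate(ast)}

def compile_source(text):
    # Single high-level entry point: runs both steps and returns the
    # compiled result together with its lookup table, hiding the
    # intermediate model behind one call.
    ast = parse(text)
    return ast, analyze(ast)

compiled, table = compile_source("message A {}\nmessage B {}")
```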

@jhugard jhugard self-assigned this Oct 13, 2016
@jhugard
Collaborator Author

jhugard commented Oct 13, 2016

Starting to iterate on this in branch parser-improvements-48.

@7sharp9
Collaborator

7sharp9 commented Sep 28, 2018

So, are import statements currently processed, or was that never completed?
cc @jhugard @ctaggart

@ctaggart
Owner

ctaggart commented Sep 28, 2018

@7sharp9 I don't remember. Since you are asking, I'm guessing not. I'll help clean up these projects you are using soon.

@jhugard
Collaborator Author

jhugard commented Sep 29, 2018

I think I had it working on this branch:
https://github.com/ctaggart/froto/compare/parser-improvements-48

May have been holding the branch open to implement the other items on the list above, but you'll need to look over the code to be sure...

@ctaggart
Owner

@7sharp9 I've reopened #65 and changed the title for review.
