Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a General Purpose 'MetadataValidator' to the Slice Parser #3067

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

InsertCreativityHere
Copy link
Member

This PR lays the groundwork for implementing #117.

Currently, each language has it's own completely separate machinery for validating metadata, if it has any at all.
This is not a great setup.

To fix this, I added a new MetadataValidator class to the parser, which all the individual languages can easily re-use.
When you construct it, you pass in the language prefix to check, and a list of validation functions it can use.
It will automatically visit all the Slice definitions, and call the provided validation functions when appropriate.

These validation functions take the metadata to be validated, and a reference to what it's applied to.
They all return an optional<string>.nullopt indicates the validation succeeded. If a string is returned, we emit it to the user, and remove the offending metadata (filter it out).


Note that this PR doesn't actually use this new class in the language compilers yet.
I first wanted to make sure that the initial concept was good and sounded okay to everyone.
If it is, then I'll start switching over the languages, a few at a time to try and keep the PRs from being too huge.

<ClCompile Include="..\Parser.cpp" />
<ClCompile Include="..\Preprocessor.cpp" />
<ClCompile Include="..\Scanner.cpp" />
<ClCompile Include="..\SliceUtil.cpp" />
<ClCompile Include="..\StringLiteralUtil.cpp" />
<ClCompile Include="..\FileTracker.cpp" />
Copy link
Member Author

@InsertCreativityHere InsertCreativityHere Nov 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what was going on in this file, but it seems super outdated.
There were tons of duplicate references to files, and references to files that were removed from this folder.

If you remember back in the day, we used to have all the PythonUtil, RubyUtil, etc. files in the Slice folder.

I removed all the duplicates and the files that were moved from here, and haven't noticed anything breaking.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Best is to update it from Visual Studio.

Copy link
Member Author

@InsertCreativityHere InsertCreativityHere Nov 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried opening and re-saving it from VScode but it didn't remove the duplicates.
So I thought to just remove them by hand.

I'm using Visual Studio for Mac, which is probably not as good as the windows counterpart.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want, I could revert this and leave it to you.
I'm certainly not as familiar with the VS build system!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems fine to keep this changes.


namespace Slice
{
/// TODO
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's difficult to review a proposed new API when the description is TODO.

class MetadataValidator final : public ParserVisitor
{
public:
MetadataValidator(std::string language, std::map<std::string, ValidationFunc> validators);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this API very odd. This map of functions is like reimplementing dynamic dispatch "by hand".

A much simpler and idiomatic API is to make MetadataValidator an abstract base class with a pure virtual function such as:

std::optional<string> validate(const MetadataPtr& metadata, const SyntaxTreeBasePtr& p) = 0;

And the derived class can use a map or case/switch statement to select the "validator" it wants to.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the switch as you suggested. I guess I've gotten to used to avoiding class inheritance with Rust!

It is nice not to have this cumbersome std::function type, and gives implementations more flexibility.
But it does lead to more boilerplate in each of the compilers, since now they each need to:

  1. Declare their own subclass of this type,
  2. Implement their own 'switch' statement (or equivalent).

Both are very minor though.

If you have any other ideas on the API, let me know.
I want to get it nailed down before I spend any time rolling it out to all the languages.

}

// Check each of the language-agnostic metadata directives that the parser is aware of.
if (directive == "amd")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks a lot like the logic we have for dispatching operations: we receive an operation name ("op") and we need to select the correct handler that unmarshals the parameters and more. It would make sense to use the same "dispatch" logic. See for example:

Ice::Object::dispatch(IncomingRequest& request, std::function<void(OutgoingResponse)> sendResponse)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The operation dispatch logic is definitely more performant than my code,
but I also think it's much more complicated to follow, and there's more places for things to go wrong (we need to keep indices, the list, and it's length in sync)...

So, personally I'd prefer to keep the simpler logic I have here, since the Slice compiler's performance is a secondary concern.

{
// If a warning message was returned from the validation function, it means the metadata was invalid.
// We remove it from the list and emit a warning to the user about it.
metadata.remove(meta);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works because MetadaList is a std::list and not a std::vector.

Since you are using an iterator, you should remove this element using this iterator.

// Iterate through the provided metadata and check each one for validity.
for (MetadataList::const_iterator i = metadata.begin(); i != metadata.end(); ++i)
{
const MetadataPtr& meta = *i++;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you're incrementing the iterator twice.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Fixed.
My fingers are too used to typing the ++i in the for loop...

@@ -796,6 +796,7 @@ namespace Slice
MetadataList typeMetadata);
TypePtr type() const;
MetadataList typeMetadata() const;
void setTypeMetadata(MetadataList metadata);
Copy link
Member

@bernardnormier bernardnormier Nov 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a hard time following this API wrt metadata.

The gist of this PR seems to be that we only "set" metadata on a SyntaxTreeBase element after validating it. However here, the constructor takes a MetadatList named typeMetadata (not validated??). And Sequence has apparently two metadatalist: the plain metadata inherited from Contained, and this type metadata.

How does all this fit together?

What the rationale for having two different kinds of metadata for sequence definitions?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is all much nicer in icerpc.


Sequence, Dictionary, and Const are special because there's multiple places where metadata can appear:

[foo] sequence<[bar] string> StringSeq;

All Slice definitions support having metadata placed right before the definition (foo). This comes from the Contained class, which Sequence (and everything else) inherit from.

But due to the syntax of the language and structure of the AST, then we have this extra set of metadata with nowhere good to live (bar). So for these types, we create extra fields for storing this metadata:
typeMetadata for sequence and keyMetadata/valueMetadata for dictionary.

As to why we keep these two separate: the real reason is: "we always have". But I guess one benefit is to have more tight metadata validation (even if we haven't done it...) since some metadata only makes sense on the sequence as a whole (deprecated, cs:identifier (in the future)), and others only on the inner type (cpp:wstring).


As for the weird order of things, this is probably temporary. But there's 2 reasons for it:

  1. Parsing is a fragile time for the compiler. We go through Slice files line by line, so until parsing is fully completed, we don't have a full picture of their Slice. So we store the un-validated metadata until parsing is finished. Then we perform the validation after parsing is over, and we have all the information we may need.
  2. We don't have a way to 'inject' the MetadataVisitor from the languages into the core Slice library.

2 can be addressed. Just a matter of adding the functionality for it.
But 1 depends on whether there's any metadata validation that needs extra context about it's surroundings.
ie. Anything beyond "are my arguments correct", "am I applied to the right thing", and other very limited things.

None of the parser metadata need this, but it's possible one of the language ones does.
If none of them do, then yeah, we can delete the setMetadata functions and validate at construction time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this PR is about adding the "metadata validation framework" to the Slice compilers, it would be helpful to first describe the steps that you envision when parsing and later validating the metadata. It's not clear to me.

It sounds like it's:

  • parser first adds metadata to SyntaxTreeBase elements (or Contained??), unvalidated
  • then we give both metadata and these elements (weird) to Slice-compiler-supplied validators
  • then based on the validator return value, the Parser code (visitor base class?) keeps or filters-out the metadata in the elements

I find this pretty confusing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're getting caught up on the smaller details that aren't important to the overall picture. But it is:

  1. Parsing: The parser parses metadata and stores it as-is (without validating it)
  2. Validating: After parsing finished, then we go back and validate the metadata
  3. Filtering: If any metadata failed validation, we remove it.

Note that none of this is changing in my PR. In the languages that validate metadata, this is already what they do:

slice2cpp:

ice/cpp/src/slice2cpp/Gen.cpp

Lines 1110 to 1114 in a2cbac1

MetadataList
Slice::Gen::MetadataVisitor::validate(const SyntaxTreeBasePtr& cont, MetadataList metadata, bool operation)
{
for (MetadataList::const_iterator q = metadata.begin(); q != metadata.end();)
{
slice2java:
MetadataList validateType(const SyntaxTreeBasePtr& p, const MetadataList& metadata)
{
MetadataList newMetadata;
for (const auto& m : metadata)
slice2cs:
void
Slice::CsGenerator::MetadataVisitor::validate(const ContainedPtr& cont)
{
MetadataList newLocalMetadata;
for (const auto& metadata : cont->getMetadata())
(This one returns void because it calls setMetadata itself, instead of letting the caller do it. This means it skips metadata inside sequences, dictionaries, and consts; this is a bug).

This PRs primary goal is to centralize and simplify all these separate validators.

Copy link
Member Author

@InsertCreativityHere InsertCreativityHere Nov 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But for the sake of completeness!
Here's a full answer to your questions, explaining how/why the parser handles metadata the way it does:

In the Ice compiler, we never use the term 'AST'. But if we did, SyntaxTreeBase is the base type for everything that lives within the AST.

Contained is any type that is 'contained' within a container of some kind. Since all Slice definitions have to appear within a module, almost everything inherits from Contained except for smaller things like Comment and Metadata, or special things like Unit (the whole compilation job) and DefinitionContext (a single file).
Because of this Contained is effectively synonymous with "A Slice definition" (even including things like operations, parameters, enumerators, and data members).

Slice supports metadata in lots of places:

  • At the top of a Slice file (a DefinitionContext)
  • In front of a Slice definition (a Contained)
  • Or in special places like on the inner type of sequences, dictionaries, and constants.

Note that the least common denominator of all these types is SyntaxTreeBase.


Now, let's say we start up the parser. Bison parses tokens from the inside out.
What I mean is if we take this grammar rule (paraphrased): definitions : metadata definition
Before this rule will be called, the parser has already parsed the metadata and the definition separately.
Only after they've both been parsed, will the 'outer' definitions rule be called.

Also, our parser goes from the top of files to the bottom.
So, it's possible that we'll encounter types before they're fully defined yet.

Because of these two facts, when the parser first sees metadata directives, there's information that we don't have access to, simply because the parser hasn't seen the whole file yet, or some outer containers haven't been fully constructed yet.

So. The parser parses the metadata, without performing validation, and stores it, wherever it's supposed to be stored (DefinitionContext, Contained, Sequence, etc).

Then, eventually, the parser finishes. We've now seen the entire file, we have all the information, and everything has been fully constructed. So now, with this full picture of everything, we go back and validate the metadata.

During validation, if we find any metadata that is invalid, we retroactively remove it. So that when the code-gen runs, it can assume that all the metadata it sees is correct.


And finally, there's the actual metadata validation itself.
In icerpc, the AST is a simple Vec<Node> that we can iterate through. We don't have this in Ice.
The only way to 'iterate' through Slice definitions is with a Visitor.

So, we create a subclass of visitor, that visits every single place where metadata can exist, and calls a function for each place. This function runs the actual validation (so that the validation can be customized by overriding it).

This function needs to take the metadata (since that's what we're validating), and it also needs a reference to what the metadata has been applied to (since Metadata doesn't hold this information internally).
And the function needs some way to communicate back to it's caller - whether the metadata was okay.
We could just use a bool. But optional<string> lets us provide a nice message too.


Is this the only way to structure metadata validation? Of course not.
But it's the best way I thought of that doesn't involve a single PR that changes thousands of lines of code.

Copy link
Member

@bernardnormier bernardnormier Nov 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then, eventually, the parser finishes. We've now seen the entire file, we have all the information, and everything has been fully constructed. So now, with this full picture of everything, we go back and validate the metadata.

During validation, if we find any metadata that is invalid, we retroactively remove it. So that when the code-gen runs, it can assume that all the metadata it sees is correct.

Let's try to be clear here. There is code in the Slice library and code in the Slice compiler.

I assume both perform some level of validation: the code in the Slice library validates amd and other non-language specific metadata while slice2cpp validates cpp: metadata.

And, this validation is combined in some manner with filtering: we warn about invalid metadata and remove it before the actual code generation. Which means slice2cpp better instantiate and run its metadata visitor (/validator/filter) first.

Questions:

  • does it make sense to use a single Visitor instance--created by slice2cpp--to perform the amd and cpp:xxx metadata validation+filtering? For me, the answer is no. We create many visitors during a compilation; I don't see a reason to optimize the validation+filtering by using a single instance for everything.

  • when slice2cpp validates and filters metadata using a visitor, is the visiting API really so inappropriate that we need a different API? And is a single validate function the best API for the job? It's a legitimate question--I don't find the amd etc. validation in this PR particularly elegant.

And I completely agree we need to figure this out before you update all the compilers.

cpp/src/Slice/MetadataValidator.cpp Outdated Show resolved Hide resolved
cpp/src/Slice/MetadataValidator.h Outdated Show resolved Hide resolved
<ClCompile Include="..\Parser.cpp" />
<ClCompile Include="..\Preprocessor.cpp" />
<ClCompile Include="..\Scanner.cpp" />
<ClCompile Include="..\SliceUtil.cpp" />
<ClCompile Include="..\StringLiteralUtil.cpp" />
<ClCompile Include="..\FileTracker.cpp" />
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems fine to keep this changes.

Co-authored-by: Jose  <pepone@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants