Add a General Purpose 'MetadataValidator' to the Slice Parser #3067

InsertCreativityHere · 2024-11-05T18:26:23Z

This PR lays the groundwork for implementing #117.

Currently, each language has it's own completely separate machinery for validating metadata, if it has any at all.
This is not a great setup.

To fix this, I added a new MetadataValidator class to the parser, which all the individual languages can easily re-use.
When you construct it, you pass in the language prefix to check, and a list of validation functions it can use.
It will automatically visit all the Slice definitions, and call the provided validation functions when appropriate.

These validation functions take the metadata to be validated, and a reference to what it's applied to.
They all return an optional<string>.nullopt indicates the validation succeeded. If a string is returned, we emit it to the user, and remove the offending metadata (filter it out).

Note that this PR doesn't actually use this new class in the language compilers yet.
I first wanted to make sure that the initial concept was good and sounded okay to everyone.
If it is, then I'll start switching over the languages, a few at a time to try and keep the PRs from being too huge.

InsertCreativityHere · 2024-11-05T18:28:09Z

cpp/src/Slice/msbuild/slice.vcxproj.filters

    <ClCompile Include="..\Parser.cpp" />
    <ClCompile Include="..\Preprocessor.cpp" />
    <ClCompile Include="..\Scanner.cpp" />
    <ClCompile Include="..\SliceUtil.cpp" />
    <ClCompile Include="..\StringLiteralUtil.cpp" />
-    <ClCompile Include="..\FileTracker.cpp" />


I don't know what was going on in this file, but it seems super outdated.
There were tons of duplicate references to files, and references to files that were removed from this folder.

If you remember back in the day, we used to have all the PythonUtil, RubyUtil, etc. files in the Slice folder.

I removed all the duplicates and the files that were moved from here, and haven't noticed anything breaking.

Best is to update it from Visual Studio.

I tried opening and re-saving it from VScode but it didn't remove the duplicates.
So I thought to just remove them by hand.

I'm using Visual Studio for Mac, which is probably not as good as the windows counterpart.

If you want, I could revert this and leave it to you.
I'm certainly not as familiar with the VS build system!

seems fine to keep this changes.

bernardnormier · 2024-11-05T19:15:21Z

cpp/src/Slice/MetadataValidator.h

+
+namespace Slice
+{
+    /// TODO


It's difficult to review a proposed new API when the description is TODO.

bernardnormier · 2024-11-05T19:27:39Z

cpp/src/Slice/MetadataValidator.h

+    class MetadataValidator final : public ParserVisitor
+    {
+    public:
+        MetadataValidator(std::string language, std::map<std::string, ValidationFunc> validators);


I find this API very odd. This map of functions is like reimplementing dynamic dispatch "by hand".

A much simpler and idiomatic API is to make MetadataValidator an abstract base class with a pure virtual function such as:

std::optional<string> validate(const MetadataPtr& metadata, const SyntaxTreeBasePtr& p) = 0;

And the derived class can use a map or case/switch statement to select the "validator" it wants to.

I made the switch as you suggested. I guess I've gotten to used to avoiding class inheritance with Rust!

It is nice not to have this cumbersome std::function type, and gives implementations more flexibility.
But it does lead to more boilerplate in each of the compilers, since now they each need to:

Declare their own subclass of this type,

Implement their own 'switch' statement (or equivalent).

Both are very minor though.

If you have any other ideas on the API, let me know.
I want to get it nailed down before I spend any time rolling it out to all the languages.

bernardnormier · 2024-11-05T20:25:23Z

cpp/src/Slice/MetadataValidator.cpp

+    }
+
+    // Check each of the language-agnostic metadata directives that the parser is aware of.
+    if (directive == "amd")


This looks a lot like the logic we have for dispatching operations: we receive an operation name ("op") and we need to select the correct handler that unmarshals the parameters and more. It would make sense to use the same "dispatch" logic. See for example:

ice/cpp/src/Ice/Object.cpp

Line 103 in 40171af

Ice::Object::dispatch(IncomingRequest& request, std::function<void(OutgoingResponse)> sendResponse)

The operation dispatch logic is definitely more performant than my code,
but I also think it's much more complicated to follow, and there's more places for things to go wrong (we need to keep indices, the list, and it's length in sync)...

So, personally I'd prefer to keep the simpler logic I have here, since the Slice compiler's performance is a secondary concern.

bernardnormier · 2024-11-05T20:41:00Z

cpp/src/Slice/MetadataValidator.cpp

+        {
+            // If a warning message was returned from the validation function, it means the metadata was invalid.
+            // We remove it from the list and emit a warning to the user about it.
+            metadata.remove(meta);


This works because MetadaList is a std::list and not a std::vector.

Since you are using an iterator, you should remove this element using this iterator.

bernardnormier · 2024-11-05T20:41:45Z

cpp/src/Slice/MetadataValidator.cpp

+    // Iterate through the provided metadata and check each one for validity.
+    for (MetadataList::const_iterator i = metadata.begin(); i != metadata.end(); ++i)
+    {
+        const MetadataPtr& meta = *i++;


Looks like you're incrementing the iterator twice.

Good catch. Fixed.
My fingers are too used to typing the ++i in the for loop...

bernardnormier · 2024-11-05T20:58:36Z

cpp/src/Slice/Parser.h

@@ -796,6 +796,7 @@ namespace Slice
            MetadataList typeMetadata);
        TypePtr type() const;
        MetadataList typeMetadata() const;
+        void setTypeMetadata(MetadataList metadata);


I have a hard time following this API wrt metadata.

The gist of this PR seems to be that we only "set" metadata on a SyntaxTreeBase element after validating it. However here, the constructor takes a MetadatList named typeMetadata (not validated??). And Sequence has apparently two metadatalist: the plain metadata inherited from Contained, and this type metadata.

How does all this fit together?

What the rationale for having two different kinds of metadata for sequence definitions?

This is all much nicer in icerpc.

Sequence, Dictionary, and Const are special because there's multiple places where metadata can appear:

[foo] sequence<[bar] string> StringSeq;

All Slice definitions support having metadata placed right before the definition (foo). This comes from the Contained class, which Sequence (and everything else) inherit from.

But due to the syntax of the language and structure of the AST, then we have this extra set of metadata with nowhere good to live (bar). So for these types, we create extra fields for storing this metadata:
typeMetadata for sequence and keyMetadata/valueMetadata for dictionary.

As to why we keep these two separate: the real reason is: "we always have". But I guess one benefit is to have more tight metadata validation (even if we haven't done it...) since some metadata only makes sense on the sequence as a whole (deprecated, cs:identifier (in the future)), and others only on the inner type (cpp:wstring).

As for the weird order of things, this is probably temporary. But there's 2 reasons for it:

Parsing is a fragile time for the compiler. We go through Slice files line by line, so until parsing is fully completed, we don't have a full picture of their Slice. So we store the un-validated metadata until parsing is finished. Then we perform the validation after parsing is over, and we have all the information we may need.

We don't have a way to 'inject' the MetadataVisitor from the languages into the core Slice library.

2 can be addressed. Just a matter of adding the functionality for it.
But 1 depends on whether there's any metadata validation that needs extra context about it's surroundings.
ie. Anything beyond "are my arguments correct", "am I applied to the right thing", and other very limited things.

None of the parser metadata need this, but it's possible one of the language ones does.
If none of them do, then yeah, we can delete the setMetadata functions and validate at construction time.

Since this PR is about adding the "metadata validation framework" to the Slice compilers, it would be helpful to first describe the steps that you envision when parsing and later validating the metadata. It's not clear to me.

It sounds like it's:

parser first adds metadata to SyntaxTreeBase elements (or Contained??), unvalidated

then we give both metadata and these elements (weird) to Slice-compiler-supplied validators

then based on the validator return value, the Parser code (visitor base class?) keeps or filters-out the metadata in the elements

I find this pretty confusing.

I think you're getting caught up on the smaller details that aren't important to the overall picture. But it is:

Parsing: The parser parses metadata and stores it as-is (without validating it)

Validating: After parsing finished, then we go back and validate the metadata

Filtering: If any metadata failed validation, we remove it.

Note that none of this is changing in my PR. In the languages that validate metadata, this is already what they do:

slice2cpp:

ice/cpp/src/slice2cpp/Gen.cpp

Lines 1110 to 1114 in a2cbac1

MetadataList

Slice::Gen::MetadataVisitor::validate(const SyntaxTreeBasePtr& cont, MetadataList metadata, bool operation)

{

for (MetadataList::const_iterator q = metadata.begin(); q != metadata.end();)

{

slice2java:

ice/cpp/src/slice2java/JavaUtil.cpp

Lines 354 to 357 in a2cbac1

MetadataList validateType(const SyntaxTreeBasePtr& p, const MetadataList& metadata)

{

MetadataList newMetadata;

for (const auto& m : metadata)

slice2cs:

ice/cpp/src/slice2cs/CsUtil.cpp

Lines 2075 to 2079 in a2cbac1

void

Slice::CsGenerator::MetadataVisitor::validate(const ContainedPtr& cont)

{

MetadataList newLocalMetadata;

for (const auto& metadata : cont->getMetadata())

(This one returns void because it calls setMetadata itself, instead of letting the caller do it. This means it skips metadata inside sequences, dictionaries, and consts; this is a bug).

This PRs primary goal is to centralize and simplify all these separate validators.

But for the sake of completeness!
Here's a full answer to your questions, explaining how/why the parser handles metadata the way it does:

In the Ice compiler, we never use the term 'AST'. But if we did, SyntaxTreeBase is the base type for everything that lives within the AST.

Contained is any type that is 'contained' within a container of some kind. Since all Slice definitions have to appear within a module, almost everything inherits from Contained except for smaller things like Comment and Metadata, or special things like Unit (the whole compilation job) and DefinitionContext (a single file).
Because of this Contained is effectively synonymous with "A Slice definition" (even including things like operations, parameters, enumerators, and data members).

Slice supports metadata in lots of places:

At the top of a Slice file (a DefinitionContext)

In front of a Slice definition (a Contained)

Or in special places like on the inner type of sequences, dictionaries, and constants.

Note that the least common denominator of all these types is SyntaxTreeBase.

Now, let's say we start up the parser. Bison parses tokens from the inside out.
What I mean is if we take this grammar rule (paraphrased): definitions : metadata definition
Before this rule will be called, the parser has already parsed the metadata and the definition separately.
Only after they've both been parsed, will the 'outer' definitions rule be called.

Also, our parser goes from the top of files to the bottom.
So, it's possible that we'll encounter types before they're fully defined yet.

Because of these two facts, when the parser first sees metadata directives, there's information that we don't have access to, simply because the parser hasn't seen the whole file yet, or some outer containers haven't been fully constructed yet.

So. The parser parses the metadata, without performing validation, and stores it, wherever it's supposed to be stored (DefinitionContext, Contained, Sequence, etc).

Then, eventually, the parser finishes. We've now seen the entire file, we have all the information, and everything has been fully constructed. So now, with this full picture of everything, we go back and validate the metadata.

During validation, if we find any metadata that is invalid, we retroactively remove it. So that when the code-gen runs, it can assume that all the metadata it sees is correct.

And finally, there's the actual metadata validation itself.
In icerpc, the AST is a simple Vec<Node> that we can iterate through. We don't have this in Ice.
The only way to 'iterate' through Slice definitions is with a Visitor.

So, we create a subclass of visitor, that visits every single place where metadata can exist, and calls a function for each place. This function runs the actual validation (so that the validation can be customized by overriding it).

This function needs to take the metadata (since that's what we're validating), and it also needs a reference to what the metadata has been applied to (since Metadata doesn't hold this information internally).
And the function needs some way to communicate back to it's caller - whether the metadata was okay.
We could just use a bool. But optional<string> lets us provide a nice message too.

Is this the only way to structure metadata validation? Of course not.
But it's the best way I thought of that doesn't involve a single PR that changes thousands of lines of code.

Then, eventually, the parser finishes. We've now seen the entire file, we have all the information, and everything has been fully constructed. So now, with this full picture of everything, we go back and validate the metadata.

During validation, if we find any metadata that is invalid, we retroactively remove it. So that when the code-gen runs, it can assume that all the metadata it sees is correct.

Let's try to be clear here. There is code in the Slice library and code in the Slice compiler.

I assume both perform some level of validation: the code in the Slice library validates amd and other non-language specific metadata while slice2cpp validates cpp: metadata.

And, this validation is combined in some manner with filtering: we warn about invalid metadata and remove it before the actual code generation. Which means slice2cpp better instantiate and run its metadata visitor (/validator/filter) first.

Questions:

does it make sense to use a single Visitor instance--created by slice2cpp--to perform the amd and cpp:xxx metadata validation+filtering? For me, the answer is no. We create many visitors during a compilation; I don't see a reason to optimize the validation+filtering by using a single instance for everything.

when slice2cpp validates and filters metadata using a visitor, is the visiting API really so inappropriate that we need a different API? And is a single validate function the best API for the job? It's a legitimate question--I don't find the amd etc. validation in this PR particularly elegant.

And I completely agree we need to figure this out before you update all the compilers.

cpp/src/Slice/MetadataValidator.cpp

cpp/src/Slice/MetadataValidator.h

pepone · 2024-11-06T16:37:20Z

cpp/src/Slice/msbuild/slice.vcxproj.filters

    <ClCompile Include="..\Parser.cpp" />
    <ClCompile Include="..\Preprocessor.cpp" />
    <ClCompile Include="..\Scanner.cpp" />
    <ClCompile Include="..\SliceUtil.cpp" />
    <ClCompile Include="..\StringLiteralUtil.cpp" />
-    <ClCompile Include="..\FileTracker.cpp" />


seems fine to keep this changes.

Co-authored-by: Jose <pepone@users.noreply.github.com>

InsertCreativityHere added 3 commits November 5, 2024 09:59

Added metadata validator class.

8e68d5d

Added methods for setting metadata on certain elements.

8e5d579

Added validation functions for parser metadata.

b850eef

InsertCreativityHere requested review from pepone and bernardnormier November 5, 2024 18:26

InsertCreativityHere commented Nov 5, 2024

View reviewed changes

InsertCreativityHere added 2 commits November 5, 2024 13:30

Forgot to include a header.

1caf671

formatting...

7fecd38

bernardnormier reviewed Nov 5, 2024

View reviewed changes

Use class inheritance instead of function dispatch.

e153196

InsertCreativityHere requested a review from bernardnormier November 5, 2024 20:14

Good catch by @pepone.

431e143

InsertCreativityHere force-pushed the add-metadata-validation branch from 282dd17 to 431e143 Compare November 5, 2024 20:56

bernardnormier reviewed Nov 5, 2024

View reviewed changes

pepone approved these changes Nov 6, 2024

View reviewed changes

InsertCreativityHere and others added 2 commits November 6, 2024 11:39

Fix include guard.

fce08e3

Update cpp/src/Slice/MetadataValidator.cpp

aa342d3

Co-authored-by: Jose <pepone@users.noreply.github.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add a General Purpose 'MetadataValidator' to the Slice Parser #3067

Add a General Purpose 'MetadataValidator' to the Slice Parser #3067

InsertCreativityHere commented Nov 5, 2024

InsertCreativityHere Nov 5, 2024 •

edited

Loading

pepone Nov 5, 2024

InsertCreativityHere Nov 5, 2024 •

edited

Loading

InsertCreativityHere Nov 5, 2024

pepone Nov 6, 2024

bernardnormier Nov 5, 2024

bernardnormier Nov 5, 2024

InsertCreativityHere Nov 5, 2024

bernardnormier Nov 5, 2024

InsertCreativityHere Nov 6, 2024

bernardnormier Nov 5, 2024

bernardnormier Nov 5, 2024

InsertCreativityHere Nov 6, 2024

bernardnormier Nov 5, 2024 •

edited

Loading

InsertCreativityHere Nov 6, 2024

bernardnormier Nov 6, 2024

InsertCreativityHere Nov 6, 2024

InsertCreativityHere Nov 6, 2024 •

edited

Loading

bernardnormier Nov 6, 2024 •

edited

Loading

pepone Nov 6, 2024

	MetadataList
	Slice::Gen::MetadataVisitor::validate(const SyntaxTreeBasePtr& cont, MetadataList metadata, bool operation)
	{
	for (MetadataList::const_iterator q = metadata.begin(); q != metadata.end();)
	{

	MetadataList validateType(const SyntaxTreeBasePtr& p, const MetadataList& metadata)
	{
	MetadataList newMetadata;
	for (const auto& m : metadata)

	void
	Slice::CsGenerator::MetadataVisitor::validate(const ContainedPtr& cont)
	{
	MetadataList newLocalMetadata;
	for (const auto& metadata : cont->getMetadata())

Add a General Purpose 'MetadataValidator' to the Slice Parser #3067

Are you sure you want to change the base?

Add a General Purpose 'MetadataValidator' to the Slice Parser #3067

Conversation

InsertCreativityHere commented Nov 5, 2024

InsertCreativityHere Nov 5, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

InsertCreativityHere Nov 5, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bernardnormier Nov 5, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

InsertCreativityHere Nov 6, 2024 • edited Loading

Choose a reason for hiding this comment

bernardnormier Nov 6, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

InsertCreativityHere Nov 5, 2024 •

edited

Loading

InsertCreativityHere Nov 5, 2024 •

edited

Loading

bernardnormier Nov 5, 2024 •

edited

Loading

InsertCreativityHere Nov 6, 2024 •

edited

Loading

bernardnormier Nov 6, 2024 •

edited

Loading