How are macros passed names of declarations to produce? #2093

Closed
munificent opened this issue Feb 4, 2022 · 17 comments · Fixed by #2094
Labels
static-metaprogramming Issues related to static metaprogramming

Comments

@munificent
Member

munificent commented Feb 4, 2022

Some macros might want to create a declaration whose name is chosen by the caller. The approach we're currently leaning towards is that the macro takes the name as an argument, like:

@CreateClass(Foo)
library;

Here, the @CreateClass macro generates a new class whose name is Foo, based on the argument passed to the macro. The argument is syntactically just an identifier expression.

When are macro arguments resolved?

That raises the question of what that Foo expression means. In most cases, the macro argument expression is treated as an actual expression that when interpolated into generated code produces the result of evaluating that expression. For example:

var foo = 'global';

@GeneratePrint(foo)
void f() {}

Here, this contrived macro generates a function body that looks like:

{
  var foo = 'local';
  print(<macro argument>);
}

In the example above, it generates this for f():

{
  var foo = 'local';
  print(prefix1.foo);
}

Where prefix1 is an import prefix of an import to this same library so that the expression reliably refers to the actual top-level foo declaration. This way it doesn't inadvertently instead evaluate to a reference to the local variable foo in the body generated by the macro.

In other words, identifiers in macro argument expressions are interpreted as resolved references to actual declarations and not simply meaningless syntactic identifiers that are resolved after they get inserted into generated code.
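
As a rough illustration of the difference, here is a plain Dart sketch that models the generated body as a string. It is not the macro API: `generateBody`, `spliceAsSyntax`, and `spliceAsResolved` are hypothetical names, and `prefix1` is just the prefix used in the example above.

```dart
// Hypothetical sketch: models the macro's output as a plain string.
// Nothing here is part of the proposed macro API.

/// Builds the body that the contrived @GeneratePrint macro emits.
/// [argumentSource] is the text spliced in for the macro argument.
String generateBody(String argumentSource) => '''
{
  var foo = 'local';
  print($argumentSource);
}''';

/// Pure-syntax splice: the bare name is inserted as written, so inside the
/// generated body it would bind to the local `foo`.
String spliceAsSyntax(String name) => generateBody(name);

/// Resolved splice: the name is qualified with an import prefix for the
/// library of the macro application, so it still means the top-level `foo`.
String spliceAsResolved(String name, {String prefix = 'prefix1'}) =>
    generateBody('$prefix.$name');

void main() {
  print(spliceAsSyntax('foo')); // body contains print(foo);
  print(spliceAsResolved('foo')); // body contains print(prefix1.foo);
}
```

The first form is what a purely syntactic argument would produce; the second is what the resolved interpretation is meant to guarantee.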

Macro arguments for names

That interpretation doesn't naturally make sense for the first example. In that example there is no Foo declaration that the macro argument can resolve to. It doesn't exist.

The current thinking is that that's OK. The implementation of @CreateClass will be careful to access the name of that identifier argument and insert that as bare syntax into the class declaration it's generating. Then the macro generates a class with that name. And now, after the macro has run, the identifier expression being passed to @CreateClass does exist, because @CreateClass created it. Once the macro is done, the identifier can be resolved and things like go to definition in an IDE can take you from that macro argument to the generated class.

This feels honestly pretty sketchy to me. I understand that it's consistent with other places in hand-authored code where users can refer to identifiers that won't exist until macros have run. But this feels different because the identifier that refers to a non-existent declaration is passed to the very macro that creates it.

We're passing an argument that isn't meaningful until after the macro receiving it has run. It looks like we're passing a reference to the thing being declared to the macro but we're actually passing a reference to the thing it will declare, which the macro then conveniently conjures into being.

Macro arguments for non-top level names

Let's consider a different case:

@GenerateMethod(foo)
class C {}

Here, the @GenerateMethod macro takes an identifier argument. It creates an instance method with that name and declares it in the class. Does this work? Unfortunately, no.

That's because the model we have is that identifier arguments to macros are resolved expressions. We may need to defer resolving them until after macros have run (so that the declaration they resolve to exists) but the assumption is that eventually they can be resolved. But that's not the case here. Even after the macro runs, there will be no top level declaration named foo that the macro argument expression can resolve to. There's only an instance method, but that name isn't in scope. So after all the macros run, this program will have a compile time error in the macro application because the foo argument refers to an unknown declaration.

This implies that you can't pass a name to a macro as an identifier expression unless it produces a top level declaration with that exact name. You'd have to instead do something like:

@GenerateMethod('foo')
class C {}

Or maybe:

@GenerateMethod(#foo)
class C {}

An analogous problem is when a macro does produce a top level declaration but the name isn't verbatim identical to the identifier being passed. If the macro were to modify the name in any way (for example, functional_widget capitalizes the name), then using an identifier expression no longer works.

This seems like a footgun to me.
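
To make the failure modes concrete, here is a small plain Dart model of the check that would run after all macros finish. It is only a sketch under the assumption that identifier arguments resolve against the library's top-level names; `unresolvedArguments` is a hypothetical helper, not part of any API.

```dart
// Hypothetical model, not the macro API: after all macros have run, every
// identifier argument must name something in the library's top-level scope.

/// Returns the identifier arguments that still fail to resolve.
List<String> unresolvedArguments(
  List<String> identifierArguments,
  Set<String> topLevelNames,
) =>
    [
      for (final name in identifierArguments)
        if (!topLevelNames.contains(name)) name,
    ];

void main() {
  // @CreateClass(Foo) adds a top-level class Foo, so `Foo` resolves.
  print(unresolvedArguments(['Foo'], {'Foo'})); // []

  // @GenerateMethod(foo) on class C adds C.foo, which is not top level.
  print(unresolvedArguments(['foo'], {'C'})); // [foo]

  // A macro that capitalizes the name declares `MyWidget`, not `myWidget`.
  print(unresolvedArguments(['myWidget'], {'MyWidget'})); // [myWidget]
}
```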

Two interpretations

Overall, I feel like we are on a very unstable foundation with regard to what the arguments passed to a macro mean. I can see two straightforward models:

  1. An argument is a thunk, a deferred wrapped expression that can be injected into generated code and will evaluate to the same thing that the expression would if you were to execute it eagerly where it appears at the macro application site. This is consistent with how metadata annotation arguments work. They are expressions evaluated right there. (Not quite the same, though, since macro arguments don't necessarily have to be const expressions).

    Since macros work at the metalevel, we can't actually eagerly evaluate the expression to a value before passing it to the macro. Instead we pass an object representing the code for an expression that will produce that value if inserted into code where an expression is expected.

  2. An argument is pure syntax. It's more or less just an AST node. When it gets interpolated back into generated code, it means whatever it would mean at that location. A macro can introspect over it and do whatever it likes with it (subject to how transparent we want the Code API to be). For example, if it's an identifier, the macro could decide that that identifier means a class name, an instance method name to define, a function to call, or just an arbitrary piece of string to print.

When using identifier expressions as arguments to specify the name of a generated declaration, it feels to me like we are mixing these two together in a way that gives me the heebie-jeebies. It might technically work if a macro is carefully authored such that a given macro argument does behave as it would under both interpretations, but that feels like a brittle boundary.

Thoughts, @jakemac53 @srawlins @leafpetersen (or anyone else who wants to chime in)?

@munificent munificent added the static-metaprogramming Issues related to static metaprogramming label Feb 4, 2022
@leafpetersen
Member

@CreateClass(Foo)
library;

This feels off to me as well. I personally would expect, in that case, to be passing something which is purely syntactic, like a string or a symbol. Foo here is not a reference to anything, it's just a name.

Is there not a sort of option 3 inherent here, which is that the argument is a resolved ast node, from a restricted grammar? That is, I can imagine saying that arguments to macros must be one of:

  • Resolved identifier (can interpolate reliably without capture)
  • String (can do string things)
  • Number (can do number things)
  • .....

@munificent
Member Author

That is, I can imagine saying that arguments to macros must be one of:

  • Resolved identifier (can interpolate reliably without capture)
  • String (can do string things)
  • Number (can do number things)
  • .....

Yeah, I think this is basically my option 1, I just didn't spell out the special support for other literals. The key question is whether an identifier is resolved or not. If it is, then it's weird to use identifiers as a way to pass in the name of a declaration you'll create because at the point in time that the macro is executed, the name isn't resolved. If it's not resolved, then it breaks the other use cases we have where interpolating an argument expression into generated code should avoid capture.

@leafpetersen
Member

Yeah, I think this is basically my option 1, I just didn't spell out the special support for other literals. The key question is whether an identifier is resolved or not. If it is, then it's weird to use identifiers as a way to pass in the name of a declaration you'll create because at the point in time that the macro is executed, the name isn't resolved.

This seems right to me. Note that allowing unresolved identifiers also seems error prone to me. It feels to me to be part of the contract of the macro: either "this macro expects to receive a pointer to an identifier which is in scope, and please give me a static error if I typo it" or, "this macro expects to receive a string which will be used to generate a specific new name". Having a macro with the first contract silently accept an unresolved identifier and then just go off the rails feels maybe bad?

@munificent
Member Author

munificent commented Feb 4, 2022

Here's a strawman solution: We could support something like "generated declarations". This is a new kind of declaration syntax that statically specifies the name of the declaration, but delegates generating the implementation to a macro. We'd have syntax for the various kinds of declarations: class, function, member, etc.

Something like (complete strawman syntax):

class MyWidget = SomeClassMacro();

class C {
  void foo() = @SomeMethodMacro();
}

Here, SomeClassMacro is a macro class that is responsible for filling in the header and body of the class. Likewise, SomeMethodMacro is a macro class that fills in the body of the method. It can probably generate or modify the signature too. I'm being hand-wavey, but you get the idea.

The important part is that now you aren't passing the name to the macro as an identifier expression argument. Instead, it's written explicitly using a declaration syntax. It's clear to the macro implementation and the human reader that this application creates a declaration with that name instead of being passed one.

The macro implementation implicitly knows to create a declaration with that name from the generated code the macro returns. (If the macro also needs to know the name for its own purposes, like creating constructors, it can be provided through an API.)

For cases where a macro needs to do some more interesting computation to generate a name, then it can still produce declarations with arbitrary names. If that process needs some kind of parameter, it gets passed to the macro as a string literal. So a macro that produced a class with a given name, but capitalized the name, would do:

@LoudClass("someClass")
library;

What this means then is that arguments passed to macros are either:

  1. Literals of primitive types. Macros can see those as values. They can do whatever they want with them, including interpolating strings into generated code as raw syntax.
  2. Code objects wrapping resolved expressions. Totally black box. Macros can only use them to interpolate them into generated code where an expression is expected.
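
A minimal sketch of that split, in plain Dart 3. All of the types and functions here (`MacroArgument`, `LiteralArgument`, `ExpressionArgument`, `loudClass`, `printCall`) are hypothetical and only model the idea; the real Code API may look quite different.

```dart
// Hypothetical types for illustration; the real Code API may look different.

sealed class MacroArgument {}

/// A literal of a primitive type: the macro sees the value itself.
final class LiteralArgument extends MacroArgument {
  LiteralArgument(this.value);
  final Object value;
}

/// A resolved expression: opaque to the macro, usable only by splicing it
/// where an expression is expected.
final class ExpressionArgument extends MacroArgument {
  ExpressionArgument(this._source);
  final String _source;
  String spliceAsExpression() => _source;
}

/// What @LoudClass("someClass") might do with its string literal:
/// capitalize it and use the result as raw syntax for the class name.
/// Assumes the literal is a non-empty string.
String loudClass(LiteralArgument name) {
  final raw = name.value as String;
  final className = raw[0].toUpperCase() + raw.substring(1);
  return 'class $className {}';
}

/// What a macro can do with an opaque expression: interpolate it, nothing more.
String printCall(ExpressionArgument expression) =>
    'print(${expression.spliceAsExpression()});';

void main() {
  print(loudClass(LiteralArgument('someClass'))); // class SomeClass {}
  print(printCall(ExpressionArgument('prefix1.foo'))); // print(prefix1.foo);
}
```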

@Levi-Lesches

Here's a strawman solution: We could support something like "generated declarations". This is a new kind of declaration syntax that statically specifies the name of the declaration, but delegates generating the implementation to a macro.

I'm not too experienced in this area, but this sounds like a job for external.

/// Here we can just make an empty class that the macro can normally augment.
@SomeClassMacro()
class MyWidget { }

class C {
  /// Here we declare the method external and the macro can take over.
  @SomeMethodMacro()
  external void foo();
}

@jakemac53
Contributor

jakemac53 commented Feb 4, 2022

@CreateClass(Foo)
library;

I don't think we need to endorse this pattern, but I do think we should support it, if only because support for it will naturally fall out of the general support for Identifier objects as arguments.

We do have a concrete need for the latter - consider for instance mockito:

@GenerateMocks([ThisType, ThatType])
library my_library;

This needs to be able to resolve ThisType and ThatType, in order to generate the mocks for those types. That requires us to give access to the actual Identifier objects for those types.

Once we add that support, somebody can just as easily use it to take a non-existent Identifier, and generate it (by just grabbing the name off the identifier). This should not be generally problematic - you can reference generated identifiers anyways anywhere else in code, so why not in macro annotations?

Two interpretations

We should have both of these, there are good use cases for both. If you want an arbitrary piece of "syntax" you should express that by accepting a parameter of type Code. If you want an introspectable identifier, you should express that with a parameter of type Identifier. And we likely will want support for some form of type literals as well.

This feels off to me as well. I personally would expect, in that case, to be passing something which is purely syntactic, like a string or a symbol. Foo here is not a reference to anything, it's just a name.

It is a reference to something though. It is just a reference to a generated thing, and it will even be clickable through to the declaration (it is no different than any other Identifier pointing at any other generated declaration). The fact that it was generated directly as a result of this macro is a bit interesting (and possibly unexpected) I agree, but I don't think we should go out of our way to block it, unless it causes implementation or specification problems.

Macro authors can also choose to use a Symbol or String instead, it's up to them.

Yeah, I think this is basically my option 1, I just didn't spell out the special support for other literals. The key question is whether an identifier is resolved or not. If it is, then it's weird to use identifiers as a way to pass in the name of a declaration you'll create because at the point in time that the macro is executed, the name isn't resolved. If it's not resolved, then it breaks the other use cases we have where interpolating an argument expression into generated code should avoid capture.

Identifiers are not resolved, they are resolve-able, and only after phase one. This is a key aspect of how they work. An Identifier passed to a macro constructor is no different than any other Identifier. In phase one macros can only read its name, and thus they could define a new class (or member) with that name. In phase 2 and beyond they can attempt to resolve it (note that we only allow resolving type identifiers today, attempting to resolve a different kind of identifier will fail).

If by the time macros are done running any identifiers in the library cannot be resolved, then that is an error, and should be reported as such (there is nothing special about identifiers in macro applications here).
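
Roughly, the phase rules above could be modeled like this in plain Dart. The `Identifier` and `TypeResolver` classes below are hypothetical stand-ins written for this sketch, not the real macro API types.

```dart
// Hypothetical model of the phase rules described above; these are not the
// real macro API types.

class Identifier {
  const Identifier(this.name);
  final String name;
}

/// Available to macros only in phase 2 and later.
class TypeResolver {
  TypeResolver(this._declaredTypes);
  final Set<String> _declaredTypes;

  /// Throws if [identifier] does not name a known type.
  String resolve(Identifier identifier) {
    if (!_declaredTypes.contains(identifier.name)) {
      throw StateError('Unresolved identifier: ${identifier.name}');
    }
    return 'type ${identifier.name}';
  }
}

/// Phase 1: only the name is readable, which is enough to declare a type.
String declareInPhaseOne(Identifier identifier) =>
    'class ${identifier.name} {}';

void main() {
  const foo = Identifier('Foo');
  final declared = <String>{};

  // Phase 1 receives no resolver, so it cannot ask what `Foo` refers to.
  print(declareInPhaseOne(foo)); // class Foo {}
  declared.add(foo.name);

  // Phase 2: resolution succeeds because phase 1 declared the type.
  print(TypeResolver(declared).resolve(foo)); // type Foo
}
```

An identifier that nothing declares would make `resolve` throw, which corresponds to the ordinary unresolved-identifier error.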

@munificent
Member Author

I don't think we need to endorse this pattern, but I do think we should support it, if only because support for it will naturally fall out of the general support for Identifier objects as arguments.

I don't think we get support for this "for free". Right now, the semantics of arguments passed to macro invocations are underspecified and pinning that down will mean either explicitly figuring out how to define valid semantics for this use case, or ruling it out.

We should have both of these, there are good use cases for both. If you want an arbitrary piece of "syntax" you should express that by accepting a parameter of type Code.

I don't think we can support both use cases. Consider:

@MyMacro(foo)
library;

There is no foo already declared anywhere. Also, MyMacro does not declare it. Is there an "unresolved identifier" compile error in this program after all macros have run or not?

To support case 1 where all arguments are understood to be eventually-resolvable real code, the answer must be "yes". To support case 2 where you can treat arguments as pure syntax, the answer must be "no". I think we have to pick.

We do have a concrete need for the latter - consider for instance mockito:

@GenerateMocks([ThisType, ThatType])
library my_library;

This needs to be able to resolve ThisType and ThatType, in order to generate the mocks for those types. That requires us to give access to the actual Identifier objects for those types.

Our use cases may have to meet the language in the middle. We don't have to exactly support Mockito's current API and way of doing things, we just need some usable way for Mockito to use macros. We could potentially have Mockito look something like:

@Mock
class MockThisType implements ThisType {}

@Mock
class MockThatType implements ThatType {}

Or maybe:

class MockThisType = @Mock;

class MockThatType = @Mock;

(In this latter example, @Mock would presumably generate the implements clause by parsing the name of the mock class.)

Supporting a lot of use cases is definitely important, but we have to balance that against not contorting the language into something we'll struggle to maintain and users will struggle to understand. Just because we can technically implement something doesn't mean it's a good design. JavaScript specified with and VMs implemented it. It was still a bad feature.

Macro authors can also choose to use a Symbol or String instead, it's up to them.

My point is that it's not up to them. If the declaration they are generating doesn't end up in the top level scope, then if they try to accept an identifier, they'll end up with an unresolved name compile error after macros run.

Identifiers are not resolved, they are resolve-able, and only after phase one. This is a key aspect of how they work.

Sure, I get how the system works. I think it's entirely reasonable (and necessary) to say that code in a library that has macro applications may have unresolved names before the macro applications have run. But it seems like a big stretch to say that arguments passed to macros themselves may contain unresolved names. Once a piece of code is an actual input to a macro which may introspect over it as a value, then it seems very sketchy to me to have that value possibly depend on the output of other macros or even the macro that it was itself passed to.

Even if it's technically possible for us to specify and implement this, I don't think it's a particularly usable feature. One of our goals with macros is that users can read code and for the most part understand what it means. But with what we're talking about here, an identifier passed to a macro could mean very different things, entirely up to the macro's discretion.

It seems like a base level of usability is that someone reading a macro application should know which things are inputs and which are outputs. Consider:

@MyMacro(Foo, Bar)
library;

This macro could look up Foo to introspect over it and then create a class named Bar. Or it could look up Bar to introspect over it and create a class named Foo. Or it could create a type Foo and a top-level function Bar. Or it could concatenate them to make a class named FooBar. Or it could introspect over both. Or it could generate a function that just prints the result of evaluating both of them. Or it could do any of those sometimes and do other things other times based on other arguments to the macro.

It feels like we've reinvented dynamic scoping, but worse.

@jakemac53
Contributor

There is no foo already declared anywhere. Also, MyMacro does not declare it. Is there an "unresolved identifier" compile error in this program after all macros have run or not?

To support case 1 where all arguments are understood to be eventually-resolvable real code, the answer must be "yes". To support case 2 where you can treat arguments as pure syntax, the answer must be "no". I think we have to pick.

Sorry yes I misread option 2 there, we should go with option 1 imo.

So not actually "pure syntax", but actually a valid piece of code, resolved where it was written (not where it is interpolated). To the macro author it is an opaque chunk of Code that they can pass to other Code objects.

If you want pure syntax, resolved wherever it is interpolated into a Code object, just accept a String.

@jakemac53
Contributor

jakemac53 commented Feb 4, 2022

I don't think we get support for this "for free". Right now, the semantics of arguments passed to macro invocations are underspecified and pinning that down will mean either explicitly figuring out how to define valid semantics for this use case, or ruling it out.

I was digging around and we do actually have the scope specified for Code objects here. This needs to move (or be re-iterated) in other sections I think. It says:

Any bare identifiers in the argument expression are converted to Identifier instances whose scope is the library of the macro application.

I think this should also be tweaked to say the scope is the scope of the macro annotation itself - so if it is on a member of a class you could reference static members unqualified. In other words, the scope is the same as if it was a normal annotation.

Supporting a lot of use cases is definitely important, but we have to balance that against not contorting the language into something we'll struggle to maintain and users will struggle to understand. Just because we can technically implement something doesn't mean it's a good design. JavaScript specified with and VMs implemented it. It was still a bad feature.

I don't see this feature (type literals and identifiers as macro constructor parameters) as being something we will regret or something that will be difficult to implement. We already have to support Identifiers in general, and a TypeLiteral is just really a bag of identifiers?

So while I agree we can't support everything, I think this feature has a good cost/benefit tradeoff.

My point is that it's not up to them. If the declaration they are generating doesn't end up in the top level scope, then if they try to accept an identifier, they'll end up with an unresolved name compile error after macros run.

The macro authors do know what they expect to generate. If they aren't going to generate something into the top level scope matching the identifier they are given, then yes they cannot accept an Identifier as the name for that thing. The identifier won't resolve to the correct place. Also if the identifier given is a private name that wouldn't work (the name they generate would be private to the augmentation, and not visible to the library).

Those are all good reasons for them not to accept an Identifier and use it in this way, and we could document the pitfalls, I just don't think we need to try to totally block them from doing it. That would actually be more complicated than allowing it as far as I can tell.

But it seems like a big stretch to say that arguments passed to macros themselves may contain unresolved names. Once a piece of code is an actual input to a macro which may introspect over it as a value, then it seems very sketchy to me to have that value possibly depend on the output of other macros or even the macro that it was itself passed to.

What is the distinction between arguments passed in the constructor, versus arguments passed to a method of that macro? To run a macro we pass it unresolved Identifier objects as well.

Even if it's technically possible for us to specify and implement this, I don't think it's a particularly usable feature. One of our goals with macros is that users can read code and for the most part understand what it means. But with what we're talking about here, an identifier passed to a macro could mean very different things, entirely up to the macro's discretion.

I think it is concretely very useful to be able to reflect over the types you are given in a macro constructor. This is a common practice in code generators.

If it isn't feasible to implement then the implementation teams can push back, but I see no reason to believe it would be any less feasible than providing an Identifier instance to a phase 1 macro.

It seems like a base level of usability is that someone reading a macro application should know which things are inputs and which are outputs. Consider:

People can write any manner of bad, unreadable, unusable code and/or apis. That is and always will be the case. I don't think we should block known useful patterns just because you could technically abuse them in a way users might not like. If you do that, people just won't use your package.

@Levi-Lesches

@munificent:

But it seems like a big stretch to say that arguments passed to macros themselves may contain unresolved names. Once a piece of code is an actual input to a macro which may introspect over it as a value, then it seems very sketchy to me to have that value possibly depend on the output of other macros or even the macro that it was itself passed to.

@jakemac53:

What is the distinction between arguments passed in the constructor, versus arguments passed to a method of that macro? To run a macro we pass it unresolved Identifier objects as well.

It seems like there are two different types of "arguments" you're talking about here:

  1. Values passed to the macro's constructor ("arguments passed to the macros themselves")
  2. Identifiers the macro can introspect during code generation ("arguments passed to a method of that macro")

class ClassA { int get number => 0; }

/// There are (at least) two values here the [Mock] macro can see: `"my test class"` and [ClassA].
@Mock(debugName: "my test class")
class ClassB implements ClassA { }

@Mock()  // to show what happens with no arguments
class ClassC implements ClassA { }

// ----- generates ----- 
class ClassB implements ClassA {
  int get number => 0;
  String toString() => "Mocked instance of my test class";  // name is overridden
}

class ClassC implements ClassA {
  int get number => 0;
  String toString() => "Mocked instance of ClassA";
}

I think that's an important distinction to make because of the mental model users have when working with macros and code generation. Conceptually, a macro -- as currently proposed -- is a class that modifies a declaration. Slapping a macro on a class, library, function, or variable declaration can modify or add to that item. The macro needs to see what it's modifying or adding to ("augmenting").

But sometimes, the macro needs some more info not found in the source code. In my above example, it's the name I want ClassB to return in its .toString() (I'm sure there are better examples). It wouldn't make sense to put this information in the code of ClassB itself because it's irrelevant. It's more relevant to Mock, which would otherwise generate its own .toString() that we want to modify.

So it's fair to distinguish between whether identifiers in the macro constructor are resolved versus whether identifiers in the source code are resolved. Since the source code is being modified by the macro, it makes sense for it to be unable to compile until the macro is run. But the macro constructor itself is an instruction telling the macro how to augment the declaration. It wouldn't make much sense for that to contain unresolved code, as that would make the macro's purpose itself ambiguous. If you want some string of text to be injected into the code, it would be natural to use a string literal, which we use for that purpose anyway (the only difference is I/O -- macros output code and print outputs English).

@jakemac53
Contributor

jakemac53 commented Feb 4, 2022

It seems like there are two different types of "arguments" you're talking about here:

  1. Values passed to the macro's constructor ("arguments passed to the macros themselves")
  2. Identifiers the macro can introspect during code generation ("arguments passed to a method of that macro")

I think we can really reduce the original question ("How are macros passed names of declarations to produce?") to a separate question: "Should we add Identifier as a valid type for macro constructor parameters?"

Those objects are really at the core of what allows you to introspect on types (after https://dart-review.googlesource.com/c/sdk/+/231327 lands which is imminent).

If we allow them, then there is really no difference between 1 and 2. We already effectively have to support them anyways based on the current spec, because the Code instances you get for Code parameters must be able to contain them (the Code objects are resolved in the library scope).

The difference (I believe) is just whether we provide you direct access to those identifiers, when that is what you really want/need. Giving you that access allows you to potentially generate the thing that identifier refers to, but that is really just a side effect, and not something we need to endorse as a pattern.

So it's fair to distinguish between whether identifiers in the macro constructor are resolved versus whether identifiers in the source code are resolved.

No identifiers are resolved, you can only resolve them through a separate API, which is only provided to you in the appropriate phase. This is why it is safe to pass an Identifier which resolves to a generated declaration to a macro constructor, and why I see no need to make a distinction. It is no different than putting an identifier which resolves to a generated type on a declaration (ie: in a type annotation).

@jakemac53
Contributor

See #2094 for my attempt at resolving this (the question of how macros should actually be passed names is still up to them ultimately, but taking an Identifier would technically be an option available).

@munificent
Member Author

The use cases we're talking about for identifier arguments to macros that I know of are:

  1. The identifier is resolved to a type at macro execution time so that the macro can introspect over the type being referred to. This is what we want for Mockito and DI.

  2. The identifier is inserted into generated code as an expression. When evaluated later at runtime, it should evaluate to the declaration that it would resolve to at the point where the macro application appears. It may not be possible to resolve to a declaration while the macro is running because it may be produced by some other macro.

  3. The identifier is used to tell the macro what name to use for a top-level declaration the macro itself creates. After the macro runs, the identifier argument now resolves to that generated declaration.

  4. The identifier is used purely syntactically. The macro might use it as the name of a non-top level declaration or in some other way. The argument is not intended to resolve to anything, and may never resolve to anything.

Do I have that right? If so, here's where I'm at:

  1. This is a change from the initial design of the feature. When we first came up with phases, the main goal was to ensure that macros couldn't see the evaluation order of unrelated macros. If we aren't careful, allowing a macro to resolve an arbitrary identifier passed to it will break that. Consider:

    @Generate(Foo)
    @Introspect(Bar)
    class A {}
    
    @Generate(Bar)
    @Introspect(Foo)
    class B {}

    Here, @Generate creates a top level declaration whose name is its argument and @Introspect tries to resolve its argument and introspect on the resulting type. Since these pairs of macros are applied to unrelated classes, the evaluation order between the pairs is unspecified.

    In a naïve implementation, if the macros on A run first, then its @Introspect(Bar) macro will fail to resolve Bar since it doesn't exist yet. But when the macros on B run, the @Introspect(Foo) macro will succeed. If the macros are run in a different order, different failures will occur.

    We might be able to fix this by having the resolution API fail if it resolves to a top level declaration produced by another macro in the same library. In other words, you can only introspect on declarations that are either imported from another library cycle or hand-authored in the current library. I think that would support the mocking and DI use cases while still hiding evaluation order.

  2. This is fine. The identifier will get serialized with some generated prefix to ensure that we avoid capturing a local variable and everything works out.

  3. I still really don't like this. I find it to be extremely spooky to have an identifier that a user expects to be resolved and passed as an input actually be the output of the macro. When I see examples using this, it looks to me like:

    main() {
      foo(bar);
      print(bar); // Works! Because `foo` *declared* it.
    }

    If a user wants to write some code that produces a declaration with a certain name, the natural way to do that is to write a declaration with that name. If they want to end up with a class named Foo, then class Foo ... is the most familiar, clear way to say that.

    Can we consider not letting macros produce top level declarations with arbitrary user-visible names? The API would let them create top level declarations with hidden gensymed names for the macro's own use. But if a user wants a top level declaration whose name appears in hand-authored code, they write an actual top level declaration that appears in the code. They can then use a macro to fill in the definition of that declaration. We could use something like my "generated declaration" suggestion or even just external like @Levi-Lesches suggests.

  4. We can't support this and the other use cases at the same time. If we assume identifiers in macro arguments are valid expressions just like they are in metadata annotations, then this would lead to an unresolved identifier compile error. For use cases like this, the name has to be passed in as a string or symbol literal.

If I'm reading the comments right, I think we're in agreement on 2 and 4. I think we can reach agreement on 1 if you're OK with restricting the resolution API to fail on cases like I mention. I believe we all want to ensure that unrelated macro evaluation order isn't user visible, so this is just plugging an unintended hole.

So is it just 3 where we have significant disagreement?
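
To make the restriction suggested in point 1 concrete, here is a minimal sketch of it as a toy model. Everything in it (`Origin`, `LibraryScope`, `resolveNaive`, `resolveRestricted`, `run`) is a hypothetical name invented for illustration and is not part of any macro API. It also assumes each class's `@Generate` runs before its own `@Introspect`, which does not affect the outcome here.

```dart
// Hypothetical model of the example above; not a real macro API.
enum Origin { handAuthored, imported, macroGenerated }

class LibraryScope {
  final Map<String, Origin> _declarations;

  LibraryScope(Map<String, Origin> initial) : _declarations = {...initial};

  // Records a top-level declaration produced by a macro.
  void addGenerated(String name) {
    _declarations[name] = Origin.macroGenerated;
  }

  // Naive resolution: succeeds as soon as the name exists.
  bool resolveNaive(String name) => _declarations.containsKey(name);

  // Restricted resolution: macro-generated declarations in this library
  // are treated as unresolvable, whether or not they exist yet.
  bool resolveRestricted(String name) {
    final origin = _declarations[name];
    return origin != null && origin != Origin.macroGenerated;
  }
}

// Applies the macros on classes A and B in [order] and reports whether
// each @Introspect argument resolved.
Map<String, bool> run(List<String> order, {required bool restricted}) {
  final scope = LibraryScope({
    'A': Origin.handAuthored,
    'B': Origin.handAuthored,
  });
  const generates = {'A': 'Foo', 'B': 'Bar'};
  const introspects = {'A': 'Bar', 'B': 'Foo'};
  final results = <String, bool>{};
  for (final cls in order) {
    scope.addGenerated(generates[cls]!);
    final target = introspects[cls]!;
    results[target] = restricted
        ? scope.resolveRestricted(target)
        : scope.resolveNaive(target);
  }
  return results;
}

void main() {
  print(run(['A', 'B'], restricted: false)); // {Bar: false, Foo: true}
  print(run(['B', 'A'], restricted: false)); // {Foo: false, Bar: true}
  print(run(['A', 'B'], restricted: true)); // {Bar: false, Foo: false}
  print(run(['B', 'A'], restricted: true)); // {Foo: false, Bar: false}
}
```

With naive resolution the result depends on which class is processed first. With the restriction, both orders fail in the same way, so the evaluation order stays hidden.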

@TimWhiting

TimWhiting commented Feb 5, 2022

The use cases for resolved identifiers and unresolved syntax are distinct. This is why, when I suggested 'Code' parameters on the prototype repo (jakemac53/macro_prototype#26), I suggested that macros could accept regular parameters, SyntaxCode, and ResolvedCode (called UnresolvedExpression and ResolvedExpression in my proposal, but essentially the same thing). This is similar to what the Nim language does. When the author of the macro is writing the macro, they know whether they want a resolved or unresolved identifier during a particular phase, and whether the syntax should make sense in the surrounding context or is just syntax to be placed in a generated definition. I can see why this could confuse users (if unresolved syntax parameters look just like resolved parameters, you might try to use a variable from the syntactic scope that is not valid in the generated scope), but I think the design could account for that.

Tagged strings could be used to pass syntactic code blocks that highlight properly in the IDE, but are clearly not intended to be resolved in the surrounding context, but rather in the generated context, i.e.
syntax"myIdentifier"... syntax"myFunctionalWidget(){ doSomething() + doSomethingElse(); return SomeWidget();}"

As far as ordering, I would assume that both of the @Generate macros would be run in an earlier phase than the @Introspect macro, and so would not be dependent on ordering. Of course it is up to the macro author to assign the phase for the macro, but I assume that they will pick the earliest phase that makes sense and wouldn't cause ordering issues.

So I guess my suggestion is to treat the two use-cases separately and make them distinct from each other so that users understand the difference. Macros can introspect on resolvable code (not in the first phase of course), accept basic types / parameters, and finally accept tagged syntax strings, which could just be treated as strings for now, but eventually should support syntax highlighting for syntax prefixed strings. Macro authors could then parse syntax strings into an AST if they need to do more complex manipulation for translation / caching etc, and it would be nice if they could take a raw string from that parsed AST and try to resolve it in later phases, but that would be a stretch goal.
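
As a rough illustration of keeping the two kinds of argument distinct, here is a small sketch. The types `MacroArgument`, `ResolvedIdentifier`, and `SyntaxString` and the function `emit` are hypothetical names made up for this example, not an existing or proposed API, and the sketch assumes Dart 3 for sealed classes and switch expressions.

```dart
// Hypothetical argument kinds; the names are illustrative, not a real API.
sealed class MacroArgument {
  const MacroArgument();
}

// An identifier that is resolved in the scope of the macro application.
class ResolvedIdentifier extends MacroArgument {
  final String name;
  final String library;
  const ResolvedIdentifier(this.name, this.library);
}

// Raw source text that only has meaning inside the generated code.
class SyntaxString extends MacroArgument {
  final String source;
  const SyntaxString(this.source);
}

// Renders an argument into generated code.
String emit(MacroArgument argument, {required String prefix}) {
  return switch (argument) {
    // Prefix resolved names so a generated local cannot capture them.
    ResolvedIdentifier(:final name) => '$prefix.$name',
    // Syntax is pasted as-is and resolved later in the generated scope.
    SyntaxString(:final source) => source,
  };
}

void main() {
  const resolved = ResolvedIdentifier('foo', 'package:app/app.dart');
  const syntax = SyntaxString('myIdentifier');
  print(emit(resolved, prefix: 'prefix1')); // prefix1.foo
  print(emit(syntax, prefix: 'prefix1')); // myIdentifier
}
```

Because the two kinds are separate types, the macro author has to decide up front which one a parameter accepts, and the switch cannot silently treat one as the other.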

@jakemac53
Contributor

jakemac53 commented Feb 7, 2022

Some sort of special tagged string which is understood by the IDE to be Dart syntax seems reasonable - I don't think we want it to be literally a Code object just because I don't think we want user code to know about the Code class at all - and we don't want it leaking into user programs.

We don't have plans at this point to make Code an introspectable object in general (although it's still a possibility), so there wouldn't really be any benefit over just a String for unresolved code. It is much simpler to keep it as a totally opaque class which doesn't attempt to do anything with its parts until needed.

jakemac53 added a commit that referenced this issue Feb 8, 2022
#2094)

Attempt to close #2093, and #2092. Related to #2012.

- Adds `Identifier`, `List`, and `Map` as valid parameter types for macro constructors (and thus valid arguments for macro applications).
  - List and Map are allowed to have type arguments that are any of the supported types. This allows for `List<Identifier>`, etc.
- Specify the scope for identifiers in macro application arguments better (both bare and in code objects).
- Some other unrelated cleanup (can remove if desired).
  - Fixed up some old links
  - Removed the section on `Fragment` (you can just use `Code` for this now).
@jakemac53
Contributor

jakemac53 commented Feb 8, 2022

See the PR #2094 which closed this issue. The TL;DR is:

We now allow Identifier as an argument (for reasons other than this topic), which means it would be possible to create a declaration by that same name, and the passed in Identifier would resolve to it.

This pattern should probably be discouraged: it won't work for private names, and it may be unexpected or confusing. We decided that blocking it would be more complex than allowing it, so it is allowed but discouraged.

@munificent
Member Author

Also:

  1. If we aren't careful, allowing a macro to resolve an arbitrary identifier passed to it will break that. Consider:

    @Generate(Foo)
    @Introspect(Bar)
    class A {}
    
    @Generate(Bar)
    @Introspect(Foo)
    class B {}

    Here, @Generate creates a top level declaration whose name is its argument and @Introspect tries to resolve its argument and introspect on the resulting type. Since these pairs of macros are applied to unrelated classes, the evaluation order between the pairs is unspecified.
    In a naïve implementation, if the macros on A run first, then its @Introspect(Bar) macro will fail to resolve Bar since it doesn't exist yet. But when the macros on B run, the @Introspect(Foo) macro will succeed. If the macros are run in a different order, different failures will occur.

Jake explained to me offline that we don't have to worry about this scenario. Macros can resolve the identifiers passed as arguments, but only in phase 2 or later. So in the phase where top level declarations can be added, no identifier arguments can be resolved. That should ensure that unrelated macro application order is still hidden from users.
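
A toy model of that phase split, for anyone following along. The `Application` class and `expand` function are hypothetical names invented for this sketch and do not mirror the real implementation. The only rule they encode is the one described above: declarations are added in the first phase, and identifier arguments are resolved only in a later phase.

```dart
// Hypothetical record of one macro application, e.g. `@Generate(Foo)`.
class Application {
  final String macro; // 'Generate' or 'Introspect'
  final String argument;
  const Application(this.macro, this.argument);
}

// Expands all applications in two phases and reports whether each
// @Introspect argument resolved.
Map<String, bool> expand(List<Application> applications) {
  final topLevel = <String>{'A', 'B'};
  final resolved = <String, bool>{};

  // Phase 1: declarations may be added, but nothing may be resolved.
  for (final app in applications.where((a) => a.macro == 'Generate')) {
    topLevel.add(app.argument);
  }

  // Phase 2 or later: the set of top-level names is now fixed, so
  // resolution no longer depends on which application ran first.
  for (final app in applications.where((a) => a.macro == 'Introspect')) {
    resolved[app.argument] = topLevel.contains(app.argument);
  }
  return resolved;
}

void main() {
  const onA = [
    Application('Generate', 'Foo'),
    Application('Introspect', 'Bar'),
  ];
  const onB = [
    Application('Generate', 'Bar'),
    Application('Introspect', 'Foo'),
  ];

  print(expand([...onA, ...onB])); // {Bar: true, Foo: true}
  print(expand([...onB, ...onA])); // {Foo: true, Bar: true}
}
```

Both orders resolve both identifiers, so the order in which the macros on A and B are applied is not observable.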
