Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New quoteAst #122

Closed
krux02 opened this issue Jan 21, 2019 · 17 comments · Fixed by nim-lang/Nim#17426
Closed

New quoteAst #122

krux02 opened this issue Jan 21, 2019 · 17 comments · Fixed by nim-lang/Nim#17426

Comments

@krux02
Copy link
Contributor

krux02 commented Jan 21, 2019

The list of issues for macros.quote is long. And I think it is broken beyond repair. So my suggestion is to introduce a new source quoting mechanism that works reliably. But before getting to that part I would like to point out the biggest problem of current quote.

The problem

The biggest problem of quote do is that it tries, when it can, to capture locally defiend symbols, even though the name "quote" suggest that it takes everything literal. This causes several issues, but the biggest of them is that the programmer needs to take a lot of care about locally introduced variables. It can happen easily by accident that the quote grabs variables from the local scope when it shouldn't.

For example the following macro is supposed to grab the variable x from where it is expanded.

import macros

macro genPrintX(): untyped =
  result = quote do:
    echo x

proc main(): void =
  let x = 123
  genPrintX()

main()

Now the code evolves and might be less simple and for some reason a local variable x is introduced into the macro body. The new quoted ast binds to the local variable automatically and a weird internal error happens:

import macros

macro genPrintX(): untyped =
  let x = "mystring"
  result = quote do:
    echo x # Error: internal error: environment misses: x

proc main(): void =
  let x = 123
  genPrintX()

main()

This is not helpful. It even gets worse with the result variables. Since quote do is based internally on templates nim-lang/Nim#7791 also causes problems in macros. But the bigger problem here is, that it is not fixable. People expect that quote captures local variables. I just don't agree that it should ever be done automatically, it should only work with explicit binding of symbols or unquoting of expressions.

Another issue is that the second argument of macros.quote cannot be passed to quote at all. It was never possible. With the current syntax it can also not be made to work.

The solution

My suggestion is a new implementation of quote, lets call it quoteAst to prevent name collision, that it
doesn't rely on automatic symbol binding. It should replace the old macros.quote API. It should create the AST always literally, without trying to bind any symbols. This way it will never have problems with any locally defined symbols. It should be possible to insert variables/expressions that are evaluated in the macro local scope any any time. It should substitute generator expressions as well.

Here are some example how I imagine the API of the new quoteAst:

# a simple quote
result = quoteAst:
  echo "Hello world!"
# inject subtrees from local scope, like `` in quote do:
let world = newLit("world")
result = quoteAst:
  echo "Hello ", uq(world), "!"
# inject subtree from expression, not possible in old quote do:
result = quoteAst:
  echo "Hello ", uq(newLit("world")), "!"
# Custom name for unquote in case `uq` should collide with anything.
# Broken in quote do (never worked)
let x = newLit(123)
result = quoteAst myUnquote:
  echo "abc ", myUnquote(x), " ", myUnquote(newLit("xyz")), " ", myUnquote(arg)
# preventing local overloads of foo in the expanded code
result = quoteAst:
  echo "abc", uq(bindSym"foo")(123, "def")

Limitations

As I am working on this, I found out about a few limitations. As it seems, the uq(...) expression is not allowed everywhere. The following example does not pass the parsing step of Nim, because an unquote expression is now allowed as the name of a new type:

macro abc(name: untyped): untyped =
  result = quoteAst:
    type
      uq(name) = object
        # ...

A call is only allowed where an arbitrary expression is allowed in the language. In some places the parser expects an Identifier or a quoted identifier. Using an unquote here doesn't go through the parses. To fix this problem the parser needs to be changes.

  • Fall back to using backticks (only identifier unquoting, no normal usage of backticks in quoted syntax trees).
  • Use backticks, but allow arbitrary complex (limited) expressions in them (does not allow nesting)
  • Change the parser to allow arbitrary expressions everywhere (no idea what harm it could unleash)
  • Use the currently unused token pairs for unquoting, for example (. and .) or [. and .].

Currently I prefer the third option most. First of all, the tokens are already reserved tokens in the language. They just need to be allowed. They won't conflict with anything else in the language, so no escaping is necessary. And unquoting will be a very visible operator in the quoted ast. But the disadvantage is still that the parser needs to be changed, and the semantc checking pass needs to expect these new expression nodes and reject them.

macro fboobar(Y: untyped): untyped =
  result = quoteAst:
    let x = (.Y.) # who can resist this sexy way of unquoting?
    # ...

Lisp also uses a special operator for unquoting: http://cl-cookbook.sourceforge.net/macros.html#LtohTOCentry-2

Here the `(expr) means the same as quoteAst: expr() here and ,(expr) would be the uq(expr()) or with this change (. expr() .)

@narimiran narimiran transferred this issue from nim-lang/Nim Jan 22, 2019
@mratsim
Copy link
Collaborator

mratsim commented Jan 22, 2019

Isn't x supposed to be interpolated?

import macros

macro genPrintX(): untyped =
  let x = "mystring"
  result = quote do:
    echo `x` # mystring

proc main(): void =
  let x = 123
  genPrintX()

main()

I see the issue with the result variable and the issue that auto-capturing will cause but we only need to enforce use of the quasi-quoting symbol (which cannot be changed true).

@alehander92
Copy link

alehander92 commented Jan 22, 2019

I also wonder, is it possible to just fix ` to do the same thing uq is supposed to do in quoteAst context?

@Araq
Copy link
Member

Araq commented Jan 22, 2019

I don't like the backticks, they already have other purposes in Nim (like calling an operator symbol like an identifier). But I think the existing quote do can be fixed to use .dirty templates under the hood. It will break code though. We could have an --oldQuote switch though or a new quoteQuirky do operation that keeps the old behaviour.

@LemonBoy
Copy link

A lovely proposal that's simple, elegant and clean. I can't wait to see it implemented :)

People expect that quote captures local variables.

They do? I always found this behaviour extremely confusing and a constant source of errors.

I just don't agree that it should ever be done automatically, it should only work with explicit binding of symbols or unquoting of expressions.

100% agree on this.

Another issue is that the second argument of macros.quote cannot be passed to quote at all.

The following works, bar the obvious limitations due to how quote is implemented.

quote("@@"):
  # code...

I also wonder, is it possible to just fix ` to do the same thing uq is supposed to do in quoteAst context?

The backtick syntax is heavily limited by the lexer, eg. it will reject foo(), and becomes awkward to use pretty quickly since you have to introduce many temporary variables to hold your nodes.

@alehander92
Copy link

alehander92 commented Jan 22, 2019

@LemonBoy so , can the backtick lexer support be improved/changed? After all backticks are also used in templates, so they already have this "unquote" meaning, they also have some unexpected limitations there: I am a bit worried that we introduce too many different methods for the same thing: templates unquoting with ` , quotes(old quotes) with template-like behavior using `, quoteAst(new quotes): looking similar, but different behavior than templates using uq , manual tree building.

Otherwise, ignoring existing code it looks like a nice design(but it would be cool to also provide a simple tool that at least rewrites some quote: to new rule quoteAst: : a lot of the existing macros use quote)

@krux02
Copy link
Contributor Author

krux02 commented Jan 22, 2019

@alehander42 I have a regular expression for you that does the replacement: s/`\([^`]*\)`/uq(\1)/g

@alehander92
Copy link

alehander92 commented Jan 22, 2019

@krux02 you're correct: in this case a regexp is probably enough. in the future it would be nice if we can do custom <port_x> <filename> for autofixable syntax/api changes which appear in a lot of usages: we have the compiler api, we have nimpretty, most of the needed parts are there.

@LemonBoy
Copy link

@alehander42 The limitations don't apply in that case since you only quote "bare" identifiers there.
Changing the lexer means additional complexity has to be brought in (and bugs, the code now assumes that quoted identifiers are proper nkIdent) for, IMO, little benefit: the uq syntax is much explicit and less magic and requires no change to the compiler.

For a short period we may also have quoteAst translate x -> uq(x) to ease the transition.

@mratsim
Copy link
Collaborator

mratsim commented Jan 22, 2019

I need an interpolating symbol.

"Hello ", uq(x), "!" is too noisy.

@zah
Copy link
Member

zah commented Jan 22, 2019

FWIW, the original implementation of quote was producing raw AST. It got changed over the years.

My hope has been that we'll eventually fix the default parameter handling in macros as described here and quote will be able to have a nice flexible interface where you control the behaviour with flags.

I too think that the lexer and parser can be fixed to allow arbitrary expressions within the backticks.

@Araq
Copy link
Member

Araq commented Jan 22, 2019

@zah I think its implementation didn't change all that much, but how template works was changed and so quote got the worse behaviour, simply by our changes to template. But maybe I misremember.

@krux02
Copy link
Contributor Author

krux02 commented Jan 22, 2019

@mratsim: I am not 100% sure what you want. But touching string iterpolation (if you mean that) will not be touched by this RFC. I think that will be too much hassle for too little gain. But existing string interpolation should work within quoteAst expressions. I have a test to make the (adapted) example from #8220 work with quoteAst.

@krux02
Copy link
Contributor Author

krux02 commented Jan 22, 2019

@zah the problem with the backticks is that it cannot allow arbitrary expressions within the backticks, because backticks are unlike braces () not paired. You cannot nest them.

@rayman22201
Copy link

I get the convenience argument @mratsim,
but I have to agree with @krux02, explicit is generally better imo; this will probably result in better macros.

That type name limitation seems pretty severe. I hope it can be solved. other than that, cool.

@alehander92
Copy link

alehander92 commented Jan 23, 2019

@krux02 but why would you need to nest unquoted expressions in other unquoted code? this wouldn't be desirable anyway. the usecase is to have
`expression without further unquote`

The only thing that is a problem is stuff like escaped names, but this happens less often and it's not such a big limitation as type names and other possible invalid contexts. In this case, to have (myUnquote) and use it like a nkCall like in your example would be a good workaround indeed

@LemonBoy EDIT: You can quote a serie of idents, which is overally just weird [I don't get it: one can quote non-ident stuff too: at least in macros you can't just assume it's an ident even currently. ]

But even if that's so, it's a much less breaking change than rewriting macros to use uq instead of the quoting symbol

@zah
Copy link
Member

zah commented Jan 23, 2019

I can recall few occasions where I had to fight the limited capabilities of backticks in templates and elsewhere. This usually arises when you work with C libraries that can be wrapped in a generic interface, but the underlying C procs have different names (because C doesn't have function overloading). In this situation, it's very convenient to insert a computed identifier derived from some of your generic parameters. I've already added a little bit of support for this in the compiler, but allowing arbitrary expressions within the backticks would be a further improvement.

Here is some old code that I wasn't able to turn into a more generic interface because it was written before the enhancements:
https://github.com/status-im/nim-keccak-tiny/blob/master/keccak_tiny.nim#L46

@krux02
Copy link
Contributor Author

krux02 commented Jan 24, 2019

@LemonBoy I am interested in what you think about it now. I have an implementation now but I figured out that the original idea doesn't work out as easy as I thought it would.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
7 participants