Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scoping and overload resolution #5

Open
Araq opened this issue Apr 30, 2020 · 6 comments
Open

Scoping and overload resolution #5

Araq opened this issue Apr 30, 2020 · 6 comments

Comments

@Araq
Copy link
Member

Araq commented Apr 30, 2020

This issue aims to be a more precise description about how Nim's scoping rules and overloading interact and also how syntactic constructions should be resolved. The current implementation does not follow these rules, but should unless experiments show how the rules outlined here can be improved in order to match the reality so that most Nim code out there continues to work as before.

Affected code in the compiler:

  • semexprs.nim
  • semcalls.nim
  • sigmatch.nim

Simple call

For a call expression f(args) first a set of candidates C(f)
is created. Candidates are symbols in the current scope that are callable.
A routine (proc, template, macro, iterator, etc.) is callable. Variables
are callable if they have the type proc. Types are also considered callable.

Named parameters inside args lead to a further filtering step within overload
solution. Then for C(f) the overload resolution is performed as described in the manual.

Module call

For a call expression module.f(args) first a set of candidates C(f) is created. Candidates are symbols in the current "module" that are callable. A routine (proc, template, macro, iterator, etc.) is callable. Variables are callable if they have the type proc.

Note: In a first analysis step it is to be decided whether module can be interpreted as a module identifier. Variables can shadow a module name.

Simple call with explicit generic instantiation

f[a](args) Candidate set as before, but further filtered against generic routines which have the right number of generic parameters. Only if this set is empty, it is interpreted as array indexing followed by the call notation ().

module.f[a](args): Analogous to the "Module call" case.

Object call

obj.f(args)

Condition: obj cannot be interpreted as a module identifier.
Condition: obj.f does not have a field f or field f is not of type proc.

Rewrite rule: obj.f(args) --> f(obj, args)

Impossible: obj.f[T](args), instead the notation obj.f[:T](args) needs to be used. obj.f[:T](args) is rewritten to f[T](obj, args).

Object call without ()

obj.f

Condition: obj cannot be interpreted as a module identifier.
Condition: obj.f does not have a field f or field f is not of type proc.

Rewrite rule: obj.f --> f(obj)

Array indexing without ()

x[args]:

x is only converted to a symchoice if there is not
variable (or let or const) of this name.

Rule: Prefer generic instantiation if possible. Otherwise interpret the
expression as [](x, args)

@timotheecour
Copy link
Member

timotheecour commented May 11, 2020

describing sigmatch precisely would require a dedicated page in the advanced section of the manual; there are so many rules (but they mostly all make sense, apart from a few eg optional params followed by untyped doesn't work)

  • > The current implementation does not follow these rules

can you summarize in top post the difference bw this spec and current implementation?

  • one thing missing is that (currently at least) overload resolution happens left to right and is greedy; this explains the whole comment in setops.contains:
proc contains*[T](x: set[T], y: T): bool {.magic: "InSet", noSideEffect.}
  ## [...] The parameters are in reverse order! ``a in b`` is a template for ``contains(b, a)``.
  ## This is because the unification algorithm that Nim uses for overload resolution works from left to right [...] If ``in`` had been declared as ``[T](elem: T, s: set[T])`` then ``T`` would have been bound to ``char``. But ``s`` is not compatible to type ``set[char]``! The solution is to bind ``T`` to ``range['a'..'z']``. This is achieved by reversing the parameters for ``contains``; ``in`` then passes its arguments in reverse order.
  • somewhat relevant to this is how current implementation implicitly resolves 0-arg templates as a call; IMO it's a design mistake that should be fixed, and it should require a pragma {.implicitcall.} eg:
template fun(): string = "foo"
doAssert fun() == "foo" # ok
doAssert fun == "foo" # should CT error
template fun2(): string {.implicitcall.} = "foo"
doAssert fun2() == "foo" # should CT error
doAssert fun2 == "foo" # ok
  • this description should mention {.experimental:implicitDeref|dotOperators|callOperator.} (also, right now the error msg is bad when dotOperators is enabled and a sigmatch error happens); iterators (living in a different namespace; IMO a design mistake); untyped (there are some design mistakes to fix here wrt overloading + optional params followed by untyped being unsupported)

  • this description should mention getters and setters, eg:

proc `foo=`(a: var A, b: B)
proc `foo`(a: A): B

and the subtle semantics involved when resolving a.foo in a module where both a getter/setter and a field foo exists

accidental ambiguity: generics vs array indexing

  • C++ templates can't be resolved at the parsing stage because of ambiguities bw < and << operators vs < as template
  • nim generics can't be resolved at parsing stage because of mod.foo[T] vs obj.foo[T], thus requiring [: special case
  • D doesn't have this ambiguity, and identifying D templates is resolved during parsing eg foo!T foo!(T1, T2). Ugly or not is subjective, but I think it's worth considering as it would simplify things; it doesn't have to be !

@Araq
Copy link
Member Author

Araq commented May 11, 2020

Ugly or not is subjective, but I think it's worth considering as it would simplify things; it doesn't have to be !

Agreed, but the syntax [: ] is not ambiguous either.

@Araq
Copy link
Member Author

Araq commented May 11, 2020

can you summarize in top post the difference bw this spec and current implementation?

Well the implementation does it in an undisciplined manner. See for example semexprs.shouldBeBracketExpr, or this whole section of code

  of nkCall, nkInfix, nkPrefix, nkPostfix, nkCommand, nkCallStrLit:
    # check if it is an expression macro:
    checkMinSonsLen(n, 1, c.config)
    #when defined(nimsuggest):
    #  if gIdeCmd == ideCon and c.config.m.trackPos == n.info: suggestExprNoCheck(c, n)
    let mode = if nfDotField in n.flags: {} else: {checkUndeclared}
    var s = qualifiedLookUp(c, n[0], mode)
    if s != nil:
      #if c.config.cmd == cmdPretty and n[0].kind == nkDotExpr:
      #  pretty.checkUse(n[0][1].info, s)
      case s.kind
      of skMacro, skTemplate:
        result = semDirectOp(c, n, flags)
      of skType:
        # XXX think about this more (``set`` procs)
        let ambig = contains(c.ambiguousSymbols, s.id)
        if not (n[0].kind in {nkClosedSymChoice, nkOpenSymChoice, nkIdent} and ambig) and n.len == 2:
          result = semConv(c, n)
        elif ambig and n.len == 1:
          errorUseQualifier(c, n.info, s)
        elif n.len == 1:
          result = semObjConstr(c, n, flags)
        elif s.magic == mNone: result = semDirectOp(c, n, flags)
        else: result = semMagic(c, n, s, flags)
      of skProc, skFunc, skMethod, skConverter, skIterator:
        if s.magic == mNone: result = semDirectOp(c, n, flags)
        else: result = semMagic(c, n, s, flags)
      else:
        #liMessage(n.info, warnUser, renderTree(n));
        result = semIndirectOp(c, n, flags)
    elif (n[0].kind == nkBracketExpr or shouldBeBracketExpr(n)) and
        isSymChoice(n[0][0]):
      # indirectOp can deal with explicit instantiations; the fixes
      # the 'newSeq[T](x)' bug
      setGenericParams(c, n[0])
      result = semDirectOp(c, n, flags)
    elif isSymChoice(n[0]) or nfDotField in n.flags:
      result = semDirectOp(c, n, flags)
    else:
      result = semIndirectOp(c, n, flags)

and then semIndirectOp also handles nkDotExpr (which is not an indirect call at all), and re-runs semExpr if semFieldAccess produces a nkDotCall. I'm not saying that the code is wrong (it surely is most ugly though), I'm saying there is no spec for it, no clear rules how this really works.

@disruptek
Copy link

Any reason not to start with the tests? Then we can hash out all the concerns and agree on desirable behaviors.

@Araq
Copy link
Member Author

Araq commented May 14, 2020

What do you mean by "start with the tests"? We have tests, they are green. They don't help us all that much to write the spec.

@timotheecour
Copy link
Member

timotheecour commented May 14, 2020

I've clarified what I meant above by untyped (there are some design mistakes to fix here wrt overloading + optional params followed by untyped being unsupported), see nim-lang/Nim#14346

that's near the top of my wish list for sigmatch improvements.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants