Should calling functions in a submodule requires a use? #13536

Closed
mppf opened this issue Jul 24, 2019 · 39 comments

Comments

@mppf
Member

mppf commented Jul 24, 2019

This is related to #13523 and #13524.

If we have a module and a submodule, like this:

module M {
  module L {
    proc lFunction() { }
  }
  L.lFunction(); // here
}

Should a use/import be required for L.lFunction() to work?

  • yes: L.lFunction() is not allowed without a use L / import L
  • no: L.lFunction() works without a use L / import L

Arguments in favor of "yes":

  • it would make use and import actually the only things that interact directly with module names - although some of these would then re-export the module name. This might make it easier/more robust to do things like import M as OtherModule.
  • it makes modules more consistent, because they always require import / use, submodule or no.
  • it might be clearer to require the import / use when the submodule is defined in another file (a-la Submodules in different files #13524) and this would give a clear place to indicate public import vs private import.

Arguments in favor of "no":

  • When inspecting the source code, it seems apparent that the module L declaration creates a symbol that is visible and in scope within M. Changing this would make the module declarations less consistent with variable, type, and proc declarations.
  • In the submodule case specifically, renaming modules isn't that interesting, because presumably the module author is also the author of submodules
  • It would feel strange for the M using things in L to require a use but not for L using things in M to do so, e.g.:
 module M {
   proc mFunction() { }

   module L {
     proc lFunction() { }
     ... M.mFunction() ... mFunction() ... // can see mFunction because of nesting
   }

   import L;
   L.lFunction(); // not legal without the import above?
 }
@mppf
Member Author

mppf commented Jul 24, 2019

Regarding this reason for "yes"

  • it makes modules more consistent, because they always require import / use, submodule or no.

Early in my history of learning Chapel, I was personally pretty confused by the fact that putting many modules in a single file actually makes them submodules within an implicit module, i.e.

// in M.chpl
module L {
  proc lFunction() { }
}
module N {
  proc main() {
    L.lFunction();  // why does this work?
  }
}

I remember being very surprised by L.lFunction() working, since if L and N were in separate files it would not work. It works because M.chpl makes an implicit module M, which contains L and N. Therefore N can see L through its parent module. I think new Chapel programmers regularly don't understand the implicit file-level module.

Choosing "yes" here would reduce the potential for this particular case of confusion. (another strategy to handle that case of confusion would be to create more errors for implicit modules containing certain patterns - as with #6813 - e.g. implicit modules that contain only multiple module statements).

@mppf mppf changed the title from "Calling functions in a submodule without any use" to "Should calling functions in a submodule requires a use?" Jul 24, 2019
@lydia-duncan
Member

I'm following this conversation, but I'm not sure I have a clear opinion just yet

@e-kayrakli
Contributor

My immediate response was to draw parallels from enums and what use means for them. I understand that it is somewhat different, but I think they should be more or less consistent.

enum TestEnum { Foo, Bar }

writeln(TestEnum.Foo);  // you can do this without `use`
use TestEnum; //You need the `use` to use the enum without its name
writeln(Foo);

So, maybe a module could use its submodule's functions as long as the submodule name is there, but with use/import they can just directly use it?

@mppf
Member Author

mppf commented Jul 24, 2019

@e-kayrakli - I don't think it makes any sense to import an enum; the only useful thing to do is use it to bring the symbols within the enum into scope. I've started thinking that perhaps we want a difference between import and use that would arguably address your consistency desire here with the enum case.

import L; would make the module L visible itself, so you could e.g. L.someFunction. In contrast, I'm thinking use L; would bring the contents of L into scope without affecting whether or not L itself was visible.

In that event, if you want to write a call like L.someFunction(), you would import L in some way so that L was available as a symbol. But import will not exist for enums, since there is nothing to do, and so there is no need to import TestEnum in order to use TestEnum.Foo. However use TestEnum and use L would continue to be corresponding features.
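A minimal Python sketch of the split proposed here, as a toy model rather than anything the Chapel compiler does. The `MODULES` table and the `apply_import` / `apply_use` helpers are hypothetical names made up for illustration: `import L;` adds only the module name to the scope, while `use L;` adds only the module's contents.

```python
# Toy model of the proposed split between `import` and `use`.
# A "scope" is a dict mapping visible names to what they refer to.
# MODULES, apply_import and apply_use are illustrative names only.

MODULES = {"L": {"someFunction": "L.someFunction"}}


def apply_import(scope, module_name):
    """Model `import L;`: the module name becomes visible, its contents do not."""
    new_scope = dict(scope)
    new_scope[module_name] = MODULES[module_name]
    return new_scope


def apply_use(scope, module_name):
    """Model `use L;`: the module's contents become visible, its name does not."""
    new_scope = dict(scope)
    new_scope.update(MODULES[module_name])
    return new_scope


after_import = apply_import({}, "L")
after_use = apply_use({}, "L")

print(sorted(after_import))  # ['L']
print(sorted(after_use))     # ['someFunction']
```

Under this model a qualified call such as `L.someFunction()` resolves only after the import, and an unqualified `someFunction()` resolves only after the use.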

@BryantLam

BryantLam commented Jul 31, 2019

Some thoughts. My main argument for "no" is that it would be annoying for library authors.

public module MyLibrary {
  public module Component {}
  private module Details {}

  proc callApi() {
  }

  proc callApi2() {
  }
}

If I was disciplined, I would put my import declarations in each callApi function depending on what was needed from the submodules. But if I have 100 of these calls, it's more effective to put them at module scope. Then, my submodule definitions effectively become:

module X {
  import Y; module Y {
    // ...
  }
}

and the problem explodes from there for each descendant submodule.

In favor of "yes" is that the usage of modules would be more consistent, but I think it'll just be annoying / a usability issue.

Also minor note: renaming via import M as OtherModule would work whether import becomes a requirement or not. I don't see that as a valid argument for "yes".

@mppf
Member Author

mppf commented Jul 31, 2019

Also minor note: renaming via import M as OtherModule would work whether import becomes a requirement or not. I don't see that as a valid argument for "yes".

If M is always available as in M.someFunction() then how can import M as OtherModule; prevent collisions with another variable/module called M? M will still refer to the submodule, it's just that now OtherModule is another name for M.

Vs. if the import is required, OtherModule would be the only name for it after that import M as OtherModule; .

@BryantLam

Ah thanks. I didn't understand what argument you were making. That's a good point, though I agree with you that it's a somewhat weak argument as the authors of these conflicting modules are usually the same person/team.

@bradcray
Member

As a point of clarification, it wasn't entirely clear to me in the OP what the "yes" and "no" positions referred to since the original question was phrased as "Should we do a, or b?" (and in fact, I think they are backwards from my assumption which is that "yes" would mean "we should do a").

Starting with the "no" arguments:

because presumably the module author is also the author of submodules

I don't know if I buy this argument. If a sub-module can be defined in another file as we hope, it seems just as likely that I would pull in a non-inline (sub)module that someone else had written as write it myself (?).

It would feel strange for the M using things in L to require a use but not for L using things in M to do so

Is that strange? Can't L see things from M simply through lexical scoping?

To me, the main argument in favor of "no" (assuming it means "you can't simply refer to the submodule") is that by inspecting the source code, it seems apparent that the symbol in question is visible and in scope, so it seems a little artificial not to be able to refer to it. I think of var x as introducing a new variable symbol 'x' that I can refer to if it's in scope; of proc foo() as introducing a new procedure symbol 'foo' that I can refer to if it's in scope; and (traditionally) of module M as introducing a new module symbol 'M' that I can refer to if it's in scope. So to choose the "yes" path makes module statements seem non-orthogonal to other symbol-declaration statements in this regard. (I think that this observation is similar to Engin's about orthogonality with enums above. The enum color { ... } statement gives me a new symbol named color that I can refer to directly if it's in scope without any additional work).

It works because M.chpl makes an implicit module M, which contains L and N. Therefore N can see L through its parent module. I think new Chapel programmers regularly don't understand the implicit file-level module.

I hate to say it, but it sounds like you still don't have this rule right. :) The compiler only inserts an implicit module in the case that a file contains module-level statements (e.g., any statements other than module declarations and comments) at file scope. Such statements have to belong to some module, so the compiler treats the file as their module (which also supports conveniences like being able to write an entire program as file-scope code as in scripting languages). If the file only contains module declarations and comments at file scope, no implicit module is inserted. So code like your example above:

M.chpl

module L { }
module N { }

results in a module namespace hierarchy like this:

./  # global scope
  L/
  N/

whereas if you were to write:

M.chpl

writeln("Hi");
module L { }
module N { }

then you'd get the module hierarchy:

./  # global scope
  M/
    L/
    N/

That said, I think your argument here probably still stands in that the reason your program wouldn't work if the modules were in separate files is that there's nothing in your definition of N or chpl command-line arguments suggesting that a module L is required and should be searched for. When they're in the same file, L is known simply by virtue of the fact that you asked the compiler to compile a module named L by handing it its source code (and doing so puts it into the global scope as illustrated above, so N can see it, as with any other top-level module).
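The implicit-module rule described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the function name `module_hierarchy` and the `("module" | "comment" | "statement", text)` encoding of file-scope items are hypothetical, and the real compiler of course works on parsed Chapel code rather than tagged tuples.

```python
from pathlib import Path


def module_hierarchy(filename, file_scope_items):
    """Return the module namespace hierarchy for one source file.

    file_scope_items is a list of (kind, text) pairs, where kind is
    "module", "comment", or "statement" (hypothetical encoding).
    """
    declared = {name: {} for kind, name in file_scope_items if kind == "module"}
    has_statements = any(kind == "statement" for kind, _ in file_scope_items)
    if has_statements:
        # File-scope statements need an owner, so the file itself becomes
        # a module named after the file, and the declared modules nest in it.
        return {Path(filename).stem: declared}
    # Only module declarations and comments: no implicit module is inserted.
    return declared


only_modules = [("comment", "// in M.chpl"), ("module", "L"), ("module", "N")]
with_statement = [("statement", 'writeln("Hi");'), ("module", "L"), ("module", "N")]

print(module_hierarchy("M.chpl", only_modules))    # {'L': {}, 'N': {}}
print(module_hierarchy("M.chpl", with_statement))  # {'M': {'L': {}, 'N': {}}}
```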

[For historical context: The most traditional point of confusion w.r.t. implicit modules was that users coming from C tended to equate them with #include statements, so would put use statements at file scope like this:

use M;
module R {
  writeln("In R");
}

and then be confused that R wasn't a top-level module. We put in a warning for this case based on user feedback similar to (and prior to) PR #6813 that Michael refers to above. I feel like the amount of confusion around implicit modules has dropped significantly since these changes; i.e., I'm not aware of cases where additional warnings like this would prevent confusion, though I could also imagine that any program that contains both top-level code and a module statement might be worthy of an informative note saying "the compiler inserted an implicit module M to represent your file-scope code for file M.chpl." Style-wise, I'd say that code like this really should include an explicit module rather than relying on a file-scope module, similar to the approach I believe we've taken for the modules/ directory.]

On the "yes" side, you say:

it would make use and import actually the only things that interact directly with module names

Maybe I'm not understanding what you mean by "interact directly with", but if I wrote:

use M except *;
M.foo();

it seems to me as though M.foo() is interacting with the module name (?).

@mppf
Member Author

mppf commented Aug 14, 2019

As a point of clarification, it wasn't entirely clear to me in the OP what the "yes" and "no" positions referred to since the original question was phrased as "Should we do a, or b?" (and in fact, I think they are backwards from my assumption which is that "yes" would mean "we should do a").

Sorry that was confusing, "yes" and "no" were answering the question in the title. I've updated it to hopefully be clearer.

To me, the main argument in favor of "no" (assuming it means "you can't simply refer to the submodule")

Now I got confused by the parenthetical... But, I updated the "no" section with your argument.

I hate to say it, but it sounds like you still don't have this rule right. :)

Thanks for correcting me :) I think this part of the issue has more to do with #13523 and so I've moved further discussion of that example to #13523 (comment) .

Maybe I'm not understanding what you mean by "interact directly with", but if I wrote:

use M except *;
M.foo();

it seems to me as though M.foo() is interacting with the module name (?).

Right, but I'm saying the M in M.foo refers to a symbol that is created by the use statement. In other words, my viewpoint is that use creates local symbols for the imported things. (I know it is not implemented that way, but I think that's a reasonable conceptual viewpoint).

Note that if we did my proposal in #13119 (comment) this code would have to be

import M;
M.foo();

since use would not actually create a local symbol called M.

Put another way, the reason that M.foo() works is that import M was present. This gives import M the opportunity to control the visibility of M if this code is in a module used by other code.

Along these lines, one reason it is appealing to require submodules be import/used has to do with module name collisions and was discussed above in #13536 (comment) . The point is that if the submodules are automatically visible (the "no" case above) then programmers won't be able to resolve naming conflicts with e.g. import L as MyL. Because even if they did, that submodule will always also still be visible as L.
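To make the collision argument concrete, here is a small Python sketch. It is a toy model, not Chapel semantics: `visible_module_names` and its `require_import` flag are hypothetical, with `require_import=True` standing for the "yes" answer and `False` for "no".

```python
def visible_module_names(submodules, imports, require_import):
    """Names by which modules can be referenced in the parent's scope.

    submodules: names of submodules declared in the parent module
    imports: list of (module, alias) pairs; alias is None for a plain import
    require_import: True models "yes", False models "no"
    """
    names = set()
    if not require_import:
        # Under "no", every declared submodule is visible under its own name.
        names.update(submodules)
    for module, alias in imports:
        names.add(alias if alias is not None else module)
    return names


renaming_import = [("L", "MyL")]  # models `import L as MyL;`

print(sorted(visible_module_names(["L"], renaming_import, require_import=False)))
# ['L', 'MyL']
print(sorted(visible_module_names(["L"], renaming_import, require_import=True)))
# ['MyL']
```

In the "no" case the rename adds a second name but cannot free up `L`, so a conflict with another symbol called `L` remains.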

@mppf
Member Author

mppf commented Aug 14, 2019

I was curious about what Python does here. As I understand it, in Python, you can (only) declare a submodule as something within a package. And, once you do that, the submodule is only visible within the parent module/package if there is an import:

spam/foo.py

def Foo():
  print("In Foo")

spam/__init__.py

foo.Foo()

$ python
Python 3.7.3 (default, Apr  3 2019, 05:39:12) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import spam
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mppf/pythonex/spam/__init__.py", line 1, in <module>
    foo.Foo()
NameError: name 'foo' is not defined

This can be fixed by changing spam/__init__.py to either

import spam.foo
foo.Foo()

(using the fact that package paths are not relative by default) or

from . import foo
foo.Foo()

using an explicit relative import.
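For anyone who wants to reproduce this without setting up files by hand, a self-contained sketch follows. It writes three throwaway packages into a temporary directory and imports each one; the package names (`spam_bare`, `spam_relative`, `spam_absolute`) and the `RESULT` variable are made up for the demonstration.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_package(root, name, init_source):
    """Create package `name` under `root` with a foo.py submodule."""
    pkg = root / name
    pkg.mkdir()
    (pkg / "foo.py").write_text("def Foo():\n    return 'In Foo'\n")
    (pkg / "__init__.py").write_text(init_source)


def try_import(name):
    """Import a package and report RESULT, or the NameError it raised."""
    try:
        module = importlib.import_module(name)
    except NameError as err:
        return "NameError: " + str(err)
    return module.RESULT


def run_demo():
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # No import at all: the submodule name is not in scope.
        make_package(root, "spam_bare", "RESULT = foo.Foo()\n")
        # Explicit relative import.
        make_package(root, "spam_relative",
                     "from . import foo\nRESULT = foo.Foo()\n")
        # Absolute import of the submodule from inside the package.
        make_package(root, "spam_absolute",
                     "import spam_absolute.foo\nRESULT = foo.Foo()\n")
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            names = ("spam_bare", "spam_relative", "spam_absolute")
            return {name: try_import(name) for name in names}
        finally:
            sys.path.remove(tmp)


results = run_demo()
for name, outcome in results.items():
    print(name, "->", outcome)
```

The absolute form works inside `__init__.py` because importing a submodule also binds it as an attribute of its parent package, and the package's namespace is the `__init__.py` namespace.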

The Python example makes me wonder if there is a connection between the choice in this issue and whether or not imports are relative or absolute by default... but I don't see a tight connection between these two.

@bradcray
Member

bradcray commented Aug 22, 2019

I'm saying the M in M.foo refers to a symbol that is created by the use statement. In other words, my viewpoint is that use creates local symbols for the imported things.

I've been mentally playing with this idea (as suggested by my latest comment on #13523) and am trying to get my head around what its implications would be. As a result, this comment is going to span a few of these related issues a bit (but I'm putting it here since this is where I first grokked Michael's idea). Starting with the case of sibling modules from #13523:

module M {
  proc foo() { } 
}
module N {
  M.foo();
}

Traditionally, we've said that N could refer to M by virtue of the fact that the module M declaration is visible in its lexical scope. Here, I think you're saying that only use statements (and import statements if/when they come along) can refer to module X symbols via normal scoping rules, so M is essentially invisible to N until we add a use M; (arguably equivalent to use M as M;) within module N's declaration:

module M {
  proc foo() { } 
}
module N {
  use M /* as M */;
  M.foo();
}

at which point the M in M.foo() refers to the M introduced by the use statement which in turn can access the M in module M. This essentially gives us rule 1 in issue #13523. Conversely, leaving the use M; off makes the M.foo(); fail because we don't find any traditional symbols (records, classes, enums, procedures, etc.) named M in the lexical scope and can't refer to the M introduced by module M directly.

By analogy, this suggests that submodules could not refer to their siblings without a use:

module M {
  module S1 {
    proc foo() { }
  }
  module S2 {
    S1.foo();  // requires a `use S1 /* as S1 */;` in order to be resolvable
  }
}

(where today they can by similar rules: "looking up in my scope, I see module S1 so can refer to S1").

Presumably, we'd make a special case for a module's contents being able to refer to itself (?). That is, the following would be legal:

module M {
  proc foo() { ... }
  M.foo();
}

even though I haven't done a use M within module M? Maybe we could think of the declaration module M { ... } as implying an automatic self-use? That is, it's equivalent to:

module M {
  use M /* as M */;
  ...
}

which also arguably rationalizes why a sub-module could refer to its parent module's symbols:

module M {
  module S {
     M.foo();  // OK due to implicit `use M;` within `module M`
  }
  proc foo() { ... }
}

I think this pattern suggests to me that using a sub-module's symbols would also require a use statement since the code could not refer to the sub-module's symbol without one. Thus:

module M {
  // implicit `use M as M;` here
  module S {
    // implicit `use S as S;` here
    proc foo() { ... }
  }
  S.foo();  // I can see `module S` but that isn't good enough to let me refer to `S`, so error
}

and:

module M {
  // implicit `use M as M;` here
  module S {
    // implicit `use S as S;` here
    proc foo() { ... }
  }
  use S;
  S.foo();  // I can see the `S` introduced by `use S` which in turn refers to `module S` so this works
}

So that suggests to me taking the "yes" approach for consistency. And I do think that it helps with the "submodules may be defined in other files and be invisible-ish" problem (more on that in a sec).

Then again, as Bryant notes, it may get really annoying and really old fast.

If we went with this philosophy (nothing can refer to a module's symbols other than use/import) and I had to decide today, I'd choose "yes", knowing that I could relax to "no" later if/when I had more experience with it and was being driven out of my mind (which might be as soon as I had to update all of the existing tests?)

A somewhat middle ground solution (that might be too weird to argue passionately for) might be to say that the implicit use S that appears within any module declaration goes not right after the module S declaration itself, but after the outermost module declaration within the same file.

Thus, given:

M.chpl:

module M {
  module S {
  }
  // auto-inject-files-from-the-magical-directory;
}

and N.chpl (which happens to be in the magical directory):

module N {
}

would turn into:

module M {
  // implicit `use M as M;` -- put here since it's the topmost SW module in M.chpl where M was defined
  // implicit use S as S except *;  — put here since it's the topmost SW module in M.chpl where S was defined; the except `*` is necessary to avoid making all of S's contents directly available.
  module S {
  }
  module N {
    // implicit use N as N;  — put here since it was the topmost SW module in N.chpl where N was defined
  }
}

Thus, within M's scope, we could refer to symbols within M directly and to M and S, but we would have to use N in order to refer to N's symbols. This would also be a good way to prevent hijacking and surprises by magically using files from a directory whose contents you weren't really all that familiar with (as would the straight "yes" approach, obviously).

@BryantLam

BryantLam commented Aug 28, 2019

If we went with this philosophy (nothing can refer to a module's symbols other than use/import) and I had to decide today, I'd choose "yes", knowing that I could relax to "no" later if/when I had more experience with it and was being driven out of my mind (which might be as soon as I had to update all of the existing tests?)

I'd be okay with this approach. I don't mind a more restrictive implementation that leaves room to relax later. Once you break submodules into separate files, the consistency makes sense, if only to make certain to the compiler which modules or submodules it should look to use. In fact...

Thus, within M's scope, we could refer to symbols within M directly and to M and S, but we would have to use N in order to refer to N's symbols. This would also be a good way to prevent hijacking and surprises by magically using files from a directory whose contents you weren't really all that familiar with (as would the straight "yes" approach, obviously).

This observation is incredibly relevant to #13524 (comment). If we "inject-known-dir", one way to write that statement (instead of include or inject) is to simply state module N; to denote that N exists and the compiler should look for it. But if the user is forced to still use N; to access N's symbols, how is that any different? This is the crux of the argument for one of the proposals in Rust's Revisiting modules, take 3 from my collection of links in #12923 (comment). I highly recommend reviewing this blog post in its entirety, but for brevity, the relevant section is repeated here:

Modules

Mod statements would no longer be necessary to pick up a file [sic] a new file in the crate. Instead, rustc would walk the files it knows to walk (see next section for more info), and mount a module tree from that, possibly before parsing any Rust code.

Files mounted this way would have a pub(crate) visibility, if you wish to change that publicity, add an export statement to their parent.

pub export submodule1;
pub(self) export submodule2; // if you are very concerned about using something
                             // from this submodule elsewhere in the crate

Though the names of modules are mounted automatically, they are not imported into their parent, and so they are not visible to relative paths from their parent unless they are imported with use or export. That is, you cannot use the name of a submodule without somhow bringing it into scope through use or export statements.

Modules of the form mod foo { /* code */ } would still exist, with no change to their semantics. [for backwards compatibility]

@mppf
Member Author

mppf commented Aug 28, 2019

@BryantLam - interesting point. Another way to say it is -

If we went with this philosophy (nothing can refer to a module's symbols other than use/import)

We are basically saying that use and import have a special capability to find other modules which possibly involve finding them in the filesystem. Given that, why would it be surprising for a module M that has a use L to look for L.chpl in M/L.chpl, and also in other module path locations (e.g. modules/standard/L.chpl etc)?

bradcray added a commit that referenced this issue Aug 30, 2019
Require 'use' of top level modules rather than lexical scoping

[reviewed by @mppf and in part by @lydia-duncan]

This PR implements an idea proposed on issue #13523 in which top-level modules are not visible to user code unless a `use` of that top-level module is within lexical scope.  Traditionally, top-level modules have been considered to be placed into the global / program / top-level scope, and therefore their bodies have been able to refer directly to one another through normal lexical scoping.  In essence, this new world makes that program scope "dimly lit" such that you can't see into it to refer to the module unless a `use` statement is present to light your way.

As discussed on issue #13523, the net effect is a less surprising language because expressions like `M.foo()` don't happen to resolve or not based on whether a file defining module M was named on the compile line, happened to be in a file with another module, or was found in the search path when another module `use`d it.  Now if you don't "see" a `use` of a top-level M when you look up through your lexical scope, you can't refer to it.

At present, this new rule is only implemented for top-level modules (I believe), though issue #13536 asks whether it should be applied to submodules as well.

The main outcome of this change is that many modules and tests needed to be updated because they were relying on the old world rules.  The net result is cleaner code.  In the modules directory, I strived to take the approach of using local `use` statements when possible and there were just a few references to the module and module-scope `private use` statements when it was not possible or there were many references to the module.  For modules that were using fully qualified references, I tended to use `only ;` to make the full qualification still be meaningful; in others, I didn't.

The heart of the change is a simple conditional in the lookupNameLocally() routine that essentially pretends it doesn't see a top-level module name unless we're resolving uses:

```cpp
   // don't consider top-level modules to be visible unless this is a use
   if (toModuleSymbol(sym) == NULL || this != rootScope || isUse) {
     retval = sym;
   }

```

Probably the most impactful change to the compiler is that I began threading the `isUse` boolean of the above code through a number of the routines used for lookup to determine whether resolving their symbols could "see" the top-level module symbols or not.  This is fairly mechanical, but ends up touching many lines of code.  I also added a similar `isTopLevel` boolean to the ResolveScope::extend() routine in order to preserve the behavior of generating an error when two top-level modules with the same name were created.  This bool gets threaded into the `isUse` logic above, and without it, the compiler would arbitrarily pick the last such similarly-named module.
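As a rough illustration of what that conditional does to name lookup, here is a Python toy model. It is not the compiler's implementation: the `Scope` class, the string "kinds", and the `lookup` helper are hypothetical stand-ins, and only the three-way condition mirrors the snippet above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Scope:
    symbols: dict = field(default_factory=dict)  # name -> kind, e.g. "module"
    parent: Optional["Scope"] = None


def lookup_name_locally(scope, name, root, is_use):
    """Return the symbol kind if `name` is visible in this one scope."""
    kind = scope.symbols.get(name)
    if kind is None:
        return None
    # Same shape as the condition above: hide top-level modules
    # unless the lookup is on behalf of a `use`.
    if kind != "module" or scope is not root or is_use:
        return kind
    return None


def lookup(scope, name, root, is_use=False):
    """Walk outward through enclosing scopes until the name is found."""
    while scope is not None:
        found = lookup_name_locally(scope, name, root, is_use)
        if found is not None:
            return found
        scope = scope.parent
    return None


root = Scope(symbols={"M": "module", "N": "module"})
n_body = Scope(symbols={"S": "module", "foo": "proc"}, parent=root)

print(lookup(n_body, "M", root))               # None
print(lookup(n_body, "M", root, is_use=True))  # module
print(lookup(n_body, "S", root))               # module
```

The last line shows the gap this issue asks about: in the model, as in the PR, a submodule declared in a non-root scope is still found by plain lexical lookup.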

Another big change was to have the skipSymbolSearch() routine return a pointer to the module symbol in the event that (a) the only clause did not list the name being searched for and (b) the module name matches.  This permits a case like `use M only ;` to match against "M" where before the match would be done by looking upward in the lexical scope.

I also:
* added a utility function in UseStmt that checks to see whether the name of the module being used matches a particular string
* stored the rootModule's ResolveScope in a global variable because it seemed too useful to remain a local variable (and I did make use of it, although only once)
* made a minor adjustment to a method resolution error to name the base type in question
* removed a double-`use` of `LAPACK` in `LinearAlgebra.chpl` that didn't seem to be adding any value, and which causes a problem when using fully qualified references currently.  I filed a future asking how we wanted to handle such cases.  If we think of `use M` as adding a symbol `M` to the current scope, then doing it twice could be considered an error.  But I don't know if that's natural / surprising and it seems like it could be...
* made minor adjustments to the description of `use` in the spec and to one of the introductory examples
* tried to improve an error message which seemed confusing because it referred to the "use of a module" but was not referring to the `use` statement...
* retired a future fillRandom.chpl
* added a number of new tests distilling cases that tripped me up in larger tests down to their simplest form
* added variants of existing tests whose behaviors had changed in order to lock things in even better

I think one of the biggest questions for me with this PR is whether I added a special case for top-level modules in a place that later fixes or improvements rendered unnecessary.  To that end, in the coming week, I'd like to remove certain segments to see whether that is the case (and add better comments for all that remain).

Resolves #13925 
Resolves #13523
@mppf
Member Author

mppf commented Sep 5, 2019

I view the examples brought up in #11262 in this comment as evidence that requiring a use for submodules is a good idea.

module M {
  module M {
    proc whatev() {
      writeln("whee");
    }
  }
}

use M only M;
M.whatev();  // currently, this M refers to the outermost sub-module M above

I think it's confusing that the use M has no effect on what M.whatev() means.

@lydia-duncan
Member

I think it's confusing that the use M has no effect on what M.whatev() means.

But this behavior is consistent with that of the following code:

class Foo {
  proc type whatev() {
    writeln("In class Foo's whatev");
  }
}

module Bar {
  module Foo {
    proc whatev() {
      writeln("In module Foo's whatev");
    }
  }
}

use Bar only Foo;
Foo.whatev(); // prints "In class Foo's whatev"

An import Bar.Foo; would have the same impact. You're not going to solve this problem by requiring the submodule is used because the symbols defined at scope are still closer to Foo.whatev than the symbols brought in by the use or import statement.

@lydia-duncan
Member

I think what will really help is allowing us to rename modules when they are in a use or import statement

@mppf
Member Author

mppf commented Sep 5, 2019

I think it's confusing that the use M has no effect on what M.whatev() means.

But this behavior is consistent with that of the following code:

I don't think that use or modules have to behave the same as classes. I don't find it surprising that your example would call the class method.

You're not going to solve this problem by requiring the submodule is used because the symbols defined at scope are still closer to Foo.whatev than the symbols brought in by the use or import statement.

Suppose that we required submodules be used in the original example and we made use no longer expose the module name itself for qualified access (which is proposed in #13978). Then the example would work:

module M {
  module M {
    proc whatev() {
      writeln("whee");
    }
  }
}
// M not in scope after the above because submodules must be `use`d

use M only M; // outer M not in scope after this because use does not bring in module symbol 
M.whatev();  // here, M can refer only to the inner module

Even if we decided to go a different way on #13978, we could still get this example to work if the submodules had to be used. We would just need to control whether the module itself from a use or the module's contents brought in by the use were considered nearer in scope. At the very least I would expect an ambiguity error if we did nothing else there.

@lydia-duncan
Member

I don't think that use or modules have to behave the same as classes. I don't find it surprising that your example would call the class method.

The point I'm trying to make is that there being consistency between modules and classes in this way is a way to explain away what you found confusing. Having them behave differently will cause confusion for people that think of classes as basically modules that you can make instances of (or modules as basically singleton classes, take your pick). I think that viewpoint is useful and something we should avoid breaking which is part of why I object to making module names be treated differently from other symbols defined in the same scope. To convince me otherwise, you're going to have to argue why that paradigm is not valuable.

@mppf
Member Author

mppf commented Sep 6, 2019

A long time ago I proposed that modules and classes be more similar and mixable, so that you could say use a class or a module inside another class as a mix-in. Or new a module. But these ideas in practice were rejected pretty quickly as being too crazy / unrelated to Chapel's main goals of parallelism.

So in the current language, I don't think we have any obligation at all to make modules and classes behave the same. They don't behave the same now. You can't (and as far as I know, never will be able to) use a class at all; you can't inherit from a module in a class or a module; you can't new a module.

I do think that your argument is reasonable. I would rephrase it as this:

  • scoping rules can be really confusing...
  • ... so we should make the scoping rules as simple to understand and as repeatable as possible.

However, I don't agree with this argument. I think that the current situation is that the scoping rules for modules specifically are already too confusing, and that making the scoping for modules have fewer cases (e.g. module names are only made available by import) will reduce the level of confusion. I think this is a worthwhile trade, even if modules become less similar to classes in scoping rules.

However this is a judgement call (we are deciding - which is more confusing?) and I suspect we aren't going to convince each other at all.

For my part, I would be more convinced by your argument if you could use a class; or if the typical use-case for modules did not involve use (or in the future, import). But I don't see either of those things changing.

As the language stands now, if we continue to prefer to "keep module scoping like class scoping" rather than "simplify module scoping with use/import statements" then I think we are building the language around what is more of an unusual/corner case for modules - when there are submodules that aren't used/imported - rather than the common case - in which modules are in different files and are used/imported.

@bradcray
Member

bradcray commented Sep 6, 2019

I've lost track... If we were to take a quick straw poll on this issue, where are people currently falling? Please give one of the following a thumbs-up.

@bradcray
Member

bradcray commented Sep 6, 2019

Upvote this comment if you think that one shouldn't be able to refer to a submodule's name without a use of it first.

@bradcray
Member

bradcray commented Sep 6, 2019

Upvote this comment if you think that one should be able to refer to a submodule's name if simply visible via lexical scoping / without first having a use of it.

@bradcray
Member

bradcray commented Sep 6, 2019

Upvote this comment if you're still undecided.

@bradcray
Member

bradcray commented Sep 6, 2019

I've started on a branch to implement this, just to see what kinds of impacts it has.

@vasslitvinov
Member

I voted for allowing references to a submodule's name when it is visible, because that is consistent with what we do with other things, like variable and function names.

Analogously, it should be legal for a submodule to refer to symbols in the enclosing scopes, to match the behavior of other kinds of nested scopes.

In the following code, I'd draw a parallel between x being a variable vs. a submodule. Another parallel is between the {...} being a nested scope of a conditional vs. that of a module.

var x = 1;
writeln(x);  // no need to 'use x'
if ... {
  writeln(x);  // x comes from the enclosing scope
}

@mppf
Member Author

mppf commented Sep 17, 2019

@vasslitvinov - I am curious if you'd apply the same argument to #13523, e.g. if the below is in one file:

module L {
  proc lFunction() { }
}
module N {
  proc main() {
    L.lFunction();  // should this work?
  }
}

then I would imagine the same argument would argue that "L is in scope". Put another way, the parallels-with-other-declarations argument does not apply for the above case, so why should it apply for submodules?

My view on this matter is that modules are fundamentally different from other symbols in terms of scoping (see #13536 (comment) for some elaboration on that) and that it's more of a benefit to usability to limit the ways a module's symbol name is in scope than it is to make it similar to the other cases.

@vasslitvinov
Member

@mppf either I am not following all the intricacies, or I am just outright not convinced.

In the example of two modules side by side:

module L { ... }
module N {
  // is L visible here?
}

I propose to apply my parallel-to-var-decl principle -- so yes, L should be visible from within N. (Still, we need to use L to reference L's symbols without qualification.)

To me this offers clean and simple semantics.

Remark: in this setup L and N are always submodules, either of an explicit enclosing module (if present) or of an implicit one (since they are in the same file).

Remark: inability to use a class is not a strong argument. We already allow use of an enum, to everyone's joy. If we see an important scenario where use of a class or a record helps, I think we will allow that as well.

@lydia-duncan
Member

Remark: in this setup L and N are always submodules, either of an explicit enclosing module (if present) or of an implicit one (since they are in the same file).

I believe this is only correct if there are other symbols at the top level besides module definitions (I played with this recently for other reasons). When only modules are defined at the top level, there is no overarching file module scope inserted.

@bradcray
Member

bradcray commented Sep 17, 2019

I propose to apply my parallel-to-var-decl principle -- so yes, L should be visible from within N. (Still, we need to use L to reference L's symbols without qualification.)

Vass, I think Michael's point is that his example is precisely what we decided to stop supporting in issue #13523 and PR #13930 because it was too fragile and fraught with confusion (note that Michael's example only uses top-level modules, not sub-modules). So then the natural question is "given the decision in #13523, why should sub-modules be different?" (I think there could be an argument there for why they should be different, but it may need to be something different than "you can see the module name looking upwards in its lexical scope" since that rule arguably applies to top-level modules as well).

I believe this is only correct if there are other symbols at the top level besides module definitions (I played with this recently for other reasons). When only modules are defined at the top level, there is no overarching file module scope inserted.

Lydia's correct in this. I missed that you assumed that there was a parent module to L and N.

@vasslitvinov
Member

I did not realize there was no implicit module above L+N, as Lydia describes. At least I am upholding my view for submodules.

For top-level L and N, the case when they are in separate files is different IMO than when they are in the same file. I suspect much of the motivation in #13523 does not apply when the modules are in the same file. In any case, we are not revisiting #13523 here.

So, I will rephrase the question "why should sub-modules be different?" to "why should top-level modules be different?" My answer being "because top-level modules can be in different files".

Maybe when sub-modules start being in different files, maybe they, too, will need to be use-d or import-ed when they are not in the same file. That remains to be seen.

@mppf
Member Author

mppf commented Sep 18, 2019

Maybe when sub-modules start being in different files, maybe they, too, will need to be use-d or import-ed when they are not in the same file.

In fact that is part of the reason that this issue exists. It was in some ways spun off of #13524 which proposes submodules in different files. I think we should make the decision about this issue knowing that we expect to support submodules in different files.

That remains to be seen.

I'm not sure what more information we'd need around the submodules-in-different-files proposal to answer this question (the main part, IMO, is that such functionality is likely to be added). What information would you want to know about submodules-in-different-files to inform this issue specifically?

I think you are proposing that top-level modules / sub-modules in different files have different behavior than if they are in a single file. I do not think that is a good idea, since moving a module/submodule out to a different file shouldn't change the way the program behaves (it should be possible to do simply as a code reorganization).

@bradcray
Member

I think you are proposing that top-level modules / sub-modules in different files have different behavior than if they are in a single file. I do not think that is a good idea, since moving a module/submodule out to a different file shouldn't change the way the program behaves (it should be possible to do simply as a code reorganization).

I agree with this and almost said so yesterday before waiting to see how the conversation went on its own. I think that a pile of code should generally behave the same regardless of its organization into files. That said, I'm also open to exceptions to that rule for convenience, such as the use of a filename as an implicit module name when a file contains file-scope code other than module declarations.

@vasslitvinov
Member

To motivate my approach, think of use MyModule only; as the request "go look in another file if MyModule is not already visible". If MyModule is already visible using normal scoping rules, that's a no-op.

Likewise, if a submodule is already visible, use-ing it is a no-op.

moving a module/submodule out to a different file shouldn't change the way the program behaves

Sure. However, adding a little use MySubmodule; when cutting and pasting the actual text seems benign to me. Some of the proposals in #13524 include such a requirement already, just using different syntax. Even for a top-level module, requiring the addition of use MyModule; when moving parts of a program to another file is fair game.

When I see something like:

var x ...;
module Y {...}

it does not make sense that I have to use one and not the other. It makes sense to add an adjustment when splitting these into separate files. This is fundamentally where I come from.

@bradcray
Member

bradcray commented Oct 1, 2019

You lose me at "in another file" because I don't think language semantics should be so deeply related to the code's organization into files.

it does not make sense that I have to use one and not the other.

I think it does make sense and wouldn't seem as surprising if we'd been living with these rules for years. I think it takes some getting used to given our history with the language because it fundamentally says "module symbols are treated differently than other symbols." But I think that that's OK.

@mppf
Member Author

mppf commented Oct 4, 2019

The bug report in #11868 is related. Of course we could seek to solve it another way, but requiring a use/import of the submodule would fix the bug. It's not the strongest reason but it does suggest that requiring a use for submodules might help the implementation effort.

@mppf
Member Author

mppf commented Mar 19, 2020

Note, we don't expect to give the same treatment to parent modules, so parent modules won't need to be used/imported, so the "import is the only thing interacting with module names" Pro does not apply.

@bradcray
Member

PR #15278 takes a quick stab at this if anyone wants to see the impact, think about it, weigh the tradeoffs, etc. I haven't put in the effort to get all tests working, but have gotten most that aren't specifically wrestling with corner cases of visibility and scoping working.

@bradcray
Member

I'm currently leaning against requiring submodules to be used/imported in order to access their names. It feels unnecessarily laborious, and there will always be something within the parent module to indicate the submodule's name and that it is there (either its declaration or a statement that brings it in from a file). So adding the requirement that it also be used/imported feels like busywork to me at present.

@bradcray
Member

Today we effectively decided not to pursue this.


6 participants