Local Module Search Paths #12923

Closed
ben-albrecht opened this issue Apr 30, 2019 · 29 comments

Comments

@ben-albrecht
Member

This issue is a proposal for a solution to the duplicate module names in mason
packages issue (#8470):

The problem is that 2 mason packages can both use a module name (say, Core
for example). This causes problems if both of these mason packages are used
as libraries by a single application, or if one of the mason packages depends
upon the other.

This issue could be solved by introducing a concept of local module search
paths, i.e. each module contains its own module search path rather than
using a single global module search path for all modules.

Consider the following example:

Example 1

Directory Layout:

  main/
    main-module.chpl # Uses M
  M/
    src/
      M.chpl  # Uses L
      L.chpl

We would like a main module in a different directory to be able to use M
directly and not L and we need to somehow provide the compiler with the
location of M.chpl.

Compilation of Main Module:

chpl main/main-module.chpl M/src/M.chpl

Today, the global module search path looks something like:

$CHPL_HOME/modules/*    # standard library
main/                   # local path of source file
M/src/

Therefore, L is still accessible to the main module.

In this proposal we'd like for the local module search paths to be as follows:

main-module.chpl:
  $CHPL_HOME/modules/*  # standard library
  main/                 # local path of source file
  M/src/M.chpl
M.chpl:
  $CHPL_HOME/modules/*  # standard library
  M/src/                # local path of source file

Therefore, only M can access L directly.
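
To make the difference concrete, here is a minimal Python sketch of the two lookup schemes for Example 1. It is only a model: the `FILES` set, the `resolve` helper, and the convention that a trailing `/` marks a directory entry are assumptions made for illustration, not how the Chapel compiler is implemented. The standard library entry is left out.

```python
import posixpath

# Files from Example 1 (standard modules omitted for brevity).
FILES = {"main/main-module.chpl", "M/src/M.chpl", "M/src/L.chpl"}

def resolve(module, search_path):
    """Return the file that `use <module>` would find, or None.

    A search-path entry is either a directory (ends with '/'), in which
    case any <module>.chpl inside it is visible, or a single .chpl file,
    in which case only that one module is visible.
    """
    for entry in search_path:
        if entry.endswith("/"):
            candidate = posixpath.join(entry, module + ".chpl")
        elif posixpath.basename(entry) == module + ".chpl":
            candidate = entry
        else:
            continue
        if candidate in FILES:
            return candidate
    return None

# Today: one global path shared by every module.
GLOBAL_PATH = ["main/", "M/src/"]

# Proposal: one path per module.
LOCAL_PATHS = {
    "main-module": ["main/", "M/src/M.chpl"],
    "M": ["M/src/"],
}

print(resolve("L", GLOBAL_PATH))                 # M/src/L.chpl
print(resolve("L", LOCAL_PATHS["main-module"]))  # None
print(resolve("L", LOCAL_PATHS["M"]))            # M/src/L.chpl
```

With the global path, `L` resolves no matter who asks. With per-module paths, the main module can still reach `M` through the file entry but gets nothing for `L`.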

Example 2

Suppose the main-module from before now requires a mason package, Pkg@1.0.0:

chpl main/main-module.chpl
     M/src/M.chpl
     ~/.mason/src/Pkg-1.0.0/src/Pkg.chpl

The local module search paths under this proposal would be as follows:

main-module.chpl
  main/                                 # local path of source file
  M/src/M.chpl
  ~/.mason/src/Pkg-1.0.0/src/Pkg.chpl
M.chpl
  M/src/                                # local path of source file
  ~/.mason/src/Pkg-1.0.0/src/Pkg.chpl
Pkg.chpl
  ~/.mason/src/Pkg-1.0.0/src/           # local path of source file
  M/src/M.chpl

# Note: $CHPL_HOME/modules/* implied in all search paths
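
The rule behind that table is not spelled out, so the following Python sketch states one reading of it as an assumption: every file named on the command line sees its own directory plus every other named file, except that nothing sees the main module (taken here to be the first file). The function name `local_search_paths` is hypothetical, and each module's own entry is written as a directory, as in the `main/` and `M/src/` rows.

```python
import posixpath

def local_search_paths(command_line_files):
    """Build a per-module search path from the files named on the command line.

    Assumed rule, read off the Example 2 table: each file sees its own
    directory plus every other file named on the command line, except the
    main module (the first file), which no other module sees.
    """
    main_file, *libraries = command_line_files
    paths = {}
    for path in command_line_files:
        own_dir = posixpath.dirname(path) + "/"
        others = [lib for lib in libraries if lib != path]
        paths[posixpath.basename(path)] = [own_dir] + others
    return paths

paths = local_search_paths([
    "main/main-module.chpl",
    "M/src/M.chpl",
    "~/.mason/src/Pkg-1.0.0/src/Pkg.chpl",
])
for module_file, entries in paths.items():
    print(module_file)
    for entry in entries:
        print("  " + entry)
```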

Subdirectories

What if there are subdirectories? To support this case, we will need new
compilation flags that can modify local module search paths.

The proposed compilation flags for modifying module search paths are:

  • --include-package <moduleFile> adds a module (<moduleFile>) to the local
    module search path of the main module.

  • --include-subpackage <moduleFile> adds a module (<moduleFile>) to the
    local module search path of the last module listed in an --include-package
    or --include-subpackage flag.

  • --package-private-M <path> adds a path or module file (<path>) to the local
    module search path of the last module listed in an --include-package or
    --include-subpackage flag.

  • -M <path> adds a path or module file (<path>) to the local module search
    path of all modules being compiled, i.e. the global module search path.

Example 3

Directory Layout:

  main/
    main-module.chpl # Uses M
  M/
    src/
      M.chpl # Uses L and K
      L.chpl
      subdir/
        K.chpl

Compilation Command:

chpl main/main-module.chpl
     --include-package M/src/M.chpl
        --package-private-M M/src/subdir

The local module search paths under this proposal would be as follows:

main-module.chpl
  main/                                 # local path of source file
  M/src/M.chpl                          # from --include-package
M.chpl
  M/src/                                # local path of source file
  M/src/subdir                          # from --package-private-M
L.chpl
  M/src/                                # local path of source file
K.chpl
  M/src/subdir                          # local path of source file

# Note: $CHPL_HOME/modules/* implied in all search paths

Example 4

Directory Layout:

  main/
    main-module.chpl # Uses M and A
  M/
    src/
      M.chpl # Uses L and K
      L.chpl
      subdir/
        K.chpl
  A/
    src/
      A.chpl # Uses B and C
      subdir/
        B.chpl # Uses C
        subsubdir/
          C.chpl

Compilation Command:

chpl main/main-module.chpl
     --include-package M/src/M.chpl
        --package-private-M M/src/subdir
     --include-package A/src/A.chpl
        --package-private-M A/src/subdir/subsubdir/C.chpl
        --include-subpackage A/src/subdir/B.chpl
            --package-private-M A/src/subdir/subsubdir/C.chpl

The local module search paths under this proposal would be as follows:

main-module.chpl
  main/
  M/src/M.chpl                          # from --include-package
  A/src/A.chpl                          # from --include-package
M.chpl
  M/src/
  M/src/subdir                          # from --package-private-M
L.chpl
  M/src/
K.chpl
  M/src/subdir
A.chpl
  A/src/
  A/src/subdir/B.chpl                   # --include-subpackage
  A/src/subdir/subsubdir/C.chpl         # --package-private-M
B.chpl
  A/src/subdir/
  A/src/subdir/subsubdir/C.chpl         # --package-private-M
C.chpl
  A/src/subdir/subsubdir

# Note: $CHPL_HOME/modules/* implied in all search paths
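
The ordering rules of the four proposed flags can be sketched as a small Python argument walker. This is an illustration of the semantics described above, not an existing `chpl` feature: the function name `parse_search_paths` and the choice to treat the first positional file as the main module are assumptions. It only produces entries for modules named on the command line, so the rows for L.chpl, K.chpl, and C.chpl (found later through `use`) do not appear, and entries come out in command-line order.

```python
import posixpath

def parse_search_paths(args):
    """Map each module file named in `args` to its local search path."""
    paths = {}            # module file -> ordered search-path entries
    global_entries = []   # from -M; appended to every module at the end
    main = None           # first positional .chpl file
    last = None           # module named by the latest --include-(sub)package

    def declare(module_file):
        # Every module starts with the directory of its own source file.
        paths.setdefault(module_file, [posixpath.dirname(module_file) + "/"])

    it = iter(args)
    for arg in it:
        if arg in ("--include-package", "--include-subpackage"):
            module_file = next(it)
            owner = main if arg == "--include-package" else last
            if owner is None:
                raise ValueError(arg + " has no module to attach to")
            declare(module_file)
            paths[owner].append(module_file)
            last = module_file
        elif arg == "--package-private-M":
            if last is None:
                raise ValueError(arg + " must follow an include flag")
            paths[last].append(next(it))
        elif arg == "-M":
            global_entries.append(next(it))
        else:
            declare(arg)
            if main is None:
                main = arg
    for entries in paths.values():
        entries.extend(global_entries)
    return paths

# The Example 4 command line.
paths = parse_search_paths([
    "main/main-module.chpl",
    "--include-package", "M/src/M.chpl",
    "--package-private-M", "M/src/subdir",
    "--include-package", "A/src/A.chpl",
    "--package-private-M", "A/src/subdir/subsubdir/C.chpl",
    "--include-subpackage", "A/src/subdir/B.chpl",
    "--package-private-M", "A/src/subdir/subsubdir/C.chpl",
])
for module_file, entries in paths.items():
    print(module_file, entries)
```

Because `--package-private-M` and `--include-subpackage` both attach to "the last module listed", reordering the flags changes which module receives each entry.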

Note: The proposed flag names here are placeholders (especially --package-private-M) so feedback is welcome on those.

@mppf
Member

mppf commented Apr 30, 2019

I was wondering if one could always use --include-subpackage instead of --package-private-M. The main issue there is that if a module wanted to work with 2 subdirectories, it'd want to add them to its paths (without adding a subpackage as well).

I don't really like the flag name --package-private-M and would propose some alternative names for it:

  • --include
  • --include-subdir
  • --private-include

@bradcray
Member

This proposal strikes me as being pretty complex in its use of a very delicate set of ordered flags to define a hierarchy. And I have to admit that I tend to get pretty skittish about proposals that require the language or compiler to have some sort of concept of a (mason) package (is there precedent for this in other languages?). To me, from the language/compiler perspective, a mason package is just a module that uses whatever modules it needs to and defines whatever submodules it wants to, so I worry about the need to teach the compiler about packages if it's not necessary.

I know I've gotten pushback on this before, but are we sure that such module-specific dependences shouldn't be specified in require statements instead so that a given module can specify what it relies upon locally in a way that naturally maps to the module hierarchy rather than requiring that information to be pushed onto the compiler's command line, disassociating it from the module in question?

@mppf
Member

mppf commented Apr 30, 2019

@bradcray -

This proposal strikes me as being pretty complex in its use of a very delicate set of ordered flags to define a hierarchy.

Did you see that Example 1 and Example 2 did not use any new flags at all? (I.e. I think the solution to Example 1 and Example 2 is the main part of the proposal; the new flags add additional functionality and if they're the only thing you don't like then it's worth separating that). In particular I'd like to know if you object somehow to Example 1 or Example 2.

Besides that, there is nothing about this proposal that ties it directly to mason. The names of the flags use the term "package" to reflect their expected common use. @ben-albrecht and I discussed another option for the flag names, --begin-group and --end-group but it seemed worse to have the flag be super abstract than for it to appear to be connected to mason when it is not.

are we sure that such module-specific dependences shouldn't be specified in require statements instead so that a given module can specify what it relies upon locally in a way that naturally maps to the module hierarchy rather than requiring that information to be pushed onto the compiler's command line

That might be possible but I don't think it would solve the problem for Mason packages using modules with the same name by itself. Something about how the compiler handles the module paths will have to change to solve that problem. Or we could conceivably insist that Mason packages using more than one .chpl file use a require statement from the main package .chpl file.

In any case I think there is more to your require statement idea that I don't know about (I.e. to my understanding, require only applies to C-level include paths and libraries). Are you imagining that by "require"ing a directory, you could add that to the module search path just for the current module? That seems to be a bit fraught to me (require "-Msubdir/" would suddenly behave differently from the command line flag -M normally does; require "subdir/" would apply only to Chapel paths and locally to the module but I think of require as being mainly for C dependencies and global to the program being built).

Anyway, I view the main idea of this proposal to be this:

This issue could be solved by introducing a concept of local module
search paths, i.e. each module contains its own module search path rather than
using a single global module search path for all modules.

Wouldn't having require statements that are module-local also rely on the same idea? Isn't it just a matter of having an alternative to the command line to specify it? (In which case we should talk about the relative merits of specifying paths on the command line vs in source code).

@bradcray
Member

bradcray commented May 2, 2019

Anyway, I view the main idea of this proposal to be this:

This issue could be solved by introducing a concept of local module
search paths, i.e. each module contains its own module search path rather than
using a single global module search path for all modules.

I don't think I'm objecting to this aspect of the proposal; more to the use of command-line flags to specify per-module behavior; and somewhat to the elevation of packages to a compiler-/language-level concept if it can be avoided (not using the term "package" in the flags seems like a dodge... is there no way we can relate these concepts to modules directly?).

to my understanding, require only applies to C-level include paths and libraries

Today, require statements also permit the specification of .chpl files. For example:

testit.chpl:

require "M.chpl";

proc main() {
  M.foo();
}

M.chpl:

proc foo() {
  writeln("In M.foo()");
}

Works if you do:

$ chpl testit.chpl

I don't think we've ever had support for -M path requirements, though the compiler doesn't seem to complain about them; but it doesn't seem to do anything useful with them either that I can tell. However, you can use relative paths to refer to modules as well. For example, the require statement in testit.chpl above could be written:

require "subdir/M.chpl";

I think we could also consider adding support for require-ing -M paths (and making those module-private / local) if that was considered desirable. The pushback I referred to above is that I think there has generally been concern among some about specifying too much about files/directories in source code (e.g., some expressed concerns about the original support for -I, -L, -l options, and over in a discussion about FFTW, there's a proposal to stop require-ing the library name in the sources as I understand it). And if the relative paths of the previous idea were sufficient, that's definitely easier (in that it's already implemented).

I also wonder sometimes about specifying paths using config params, though this is harder for module paths than -I / -L paths because the current compiler architecture wants/needs to know those long before param resolution has occurred (but perhaps if it were restricted to simple string literals and compile-line config params?).

What the require example above doesn't do is make module M in any way invisible to other modules: it's like putting subdir/M.chpl on the command-line where M will be parsed into the program-level scope of modules, such that M will be visible to everyone. To have it be hidden, it would somehow need to be considered a submodule of another module (which takes my mind in the direction of include statements—"i.e., literally stick this module's definition into the current scope"—though I know those have generally been panned as a solution for this kind of problem).

I could imagine potentially needing to make other flavors of require statements module-private as well. For example, maybe your module relies on a helper.h file, and so does mine, but they are two different header files that are found along different include paths. So this need for different modules to access files with the same names in different places without affecting other modules seems as though it may be more general than being just for Chapel code (?). (That said, it can obviously only go so far. E.g., if your C header defines a foo() function and so does mine, I don't think there's much we can do to help avoid them conflicting in the C linker... And arguably we could address this by having the user do the C-level compilations outside of Chapel and just add requirements for .o files or libraries?)

Wouldn't having require statements that are module-local also rely on the same idea? Isn't it just a
matter of having an alternative to the command line to specify it? (In which case we should talk
about the relative merits of specifying paths on the command line vs in source code).

I think that's right. The kernel of what I like about thinking about this in the context of require statements is that it puts the module's dependences and requirements close to the source code that is generating those dependences and making them part of the programmatic hierarchy (modules contain modules which contain modules, each of which may have some requirements). Whereas trying to mimic that hierarchy on the command-line seems a bit fraught and fragile to me, not to mention verbose, and disassociated with the source code.

@mppf
Member

mppf commented May 2, 2019

I could imagine potentially needing to make other flavors of require statements module-private as well.

We could think about making all require statements private by default (except as I said, we can't do this for C things). We could also consider supporting private require.

Anyway, I view the main idea of this proposal to be this:

This issue could be solved by introducing a concept of local module
search paths, i.e. each module contains its own module search path rather than
using a single global module search path for all modules.

I don't think I'm objecting to this aspect of the proposal;

That's good to know. I think we could immediately implement the change to fix Example 1 and Example 2.

more to the use of command-line flags to specify per-module behavior;

I think we should talk more about command-line vs. source code for these things, but my position is that we'll ultimately want to support both.

(not using the term "package" in the flags seems like a dodge... is there no way we can relate these concepts to modules directly?).

Sure, the flags could be called --include-module and --include-submodule.

Anyway, let's talk more (maybe in another issue) about how you'd imagine supporting submodules in a different file from a module. Such a functionality will be important in the event that the submodule wishes to refer to private functions in a parent module (since just making them both top-level modules will not allow access to the private functions).

@bradcray
Member

bradcray commented May 2, 2019

After discussing offline with Michael: I remain unconvinced that this feature is a necessity in Chapel, though I'll admit I'm not certain about that. The direction I'd prefer to invest in for the short-term is to explore the ability to break a module and its submodules up across multiple files, and then come back to this issue.

Specifically, for example 1, it seems to me that if L is intended to be a module that helps M define its behavior but that nobody else should know about, that L should be a sub-module of M rather than a top-level module that somehow only M knows about. Or, put another way, I don't think there should be a way to inject module names into the top-level namespace that some modules can see but others can't (any more than I think there should be a way to declare a module-scope variable that some functions can see but other functions cannot).

So to me, the question example 1 poses is "Did the author actually want L to be a sub-module of M?" and if so "Is the issue really that they want a way to split module M and its submodules across multiple files to avoid having to define L within the M.chpl file?"

@ben-albrecht
Member Author

ben-albrecht commented May 2, 2019

Or, put another way, I don't think there should be a way to inject module names into the top-level namespace that some modules can see but others can't (any more than I think there should be a way to declare a module-scope variable that some functions can see but other functions cannot).

@bradcray - Earlier you stated you were not opposed to the local module search paths part of the proposal, but this reads like you do now. Have I interpreted your current stance correctly?

So to me, the question example 1 poses is "Did the author actually want L to be a sub-module of M?" and if so "Is the issue really that they want a way to split module M and its submodules across multiple files to avoid having to define L within the M.chpl file?"

Yes, the author could make L a sub-module of M if we had a solution to splitting sub-modules across multiple files, as you suggest.

However, there are some challenges with the sub-module approach, e.g. when a project has a diamond-shaped dependency:

  main/
    main-module.chpl # Uses M
  M/
    src/
      M.chpl     # Uses L & K
      L.chpl     # Uses Utils
      K.chpl     # Uses Utils
      Utils.chpl # Is this a submodule of L and K?

In any case, I think a good next step would be to explore the separated-submodule idea a bit more in a new issue and understand how we might handle some of the challenging cases, so that we have more concrete ideas to compare against each other - as @mppf mentions above.

@bradcray
Member

bradcray commented May 2, 2019

Earlier you stated you were not opposed to the local module search paths part of the proposal, but this reads like you do now. Have I interpreted your current stance correctly?

I guess that's accurate and that the off-line discussion made me more skeptical about their importance than I had been previously. It might be most accurate to say that I'd like to see whether supporting the ability to break nested modules across multiple files + minor tweaks to directory/file organizations and conventions would obviate the need to support module-specific search paths.

I don't see your diamond-shape case as presenting a problem for sub-modules. I think the module structure you're saying you want is:

module M {
  private use L, K;

  private module L {
    private use Utils;
  }

  private module K {
    private use Utils;
  }

  private module Utils {
  }
}

And then the question becomes "How would we permit you to break this structure up across multiple files?"
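
The visibility this structure is meant to give can be modelled with a short Python sketch of lexical lookup. It is a simplification under stated assumptions: it covers only unqualified names found by walking outward through enclosing modules, it ignores qualified access such as `M.Utils` (which the `private` declarations are meant to rule out), and the `NESTED` table and `visible_by_name` helper are hypothetical names.

```python
# Module tree from the structure above: each key is a module, each value
# its directly nested submodules. "<top>" stands for the program-level scope.
NESTED = {
    "<top>": ["main", "M"],
    "M": ["L", "K", "Utils"],
    "main": [],
    "L": [],
    "K": [],
    "Utils": [],
}

# Reverse mapping: module -> enclosing module.
PARENT = {child: parent for parent, children in NESTED.items() for child in children}

def visible_by_name(user, target):
    """Can `user` name `target` by plain lexical lookup?

    Lookup walks outward from `user` through its enclosing modules and
    succeeds if `target` is declared directly in one of those scopes.
    """
    scope = user
    while scope is not None:
        if target in NESTED[scope]:
            return True
        scope = PARENT.get(scope)
    return False

print(visible_by_name("L", "Utils"))     # True: Utils is a sibling inside M
print(visible_by_name("K", "Utils"))     # True
print(visible_by_name("main", "Utils"))  # False: Utils is not at the top level
print(visible_by_name("main", "M"))      # True
```

In this model the diamond poses no problem: `Utils` is declared once, in `M`, and both `L` and `K` reach it through their shared enclosing scope.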

@bradcray
Member

bradcray commented May 2, 2019

Here's one answer to my question (albeit one that's generally been met with negative reviews, but just to start somewhere...):

  main/
    main-module.chpl # Uses M
  M/
    src/
      M.chpl  # wants sub-modules L, K, Util
      subdir/
        L.chpl
        K.chpl
        Util.chpl

M.chpl:

module M {
  private use L, K;
  include "subdir/L.chpl", "subdir/K.chpl", "subdir/Util.chpl";
}

(where L.chpl, K.chpl, and Util.chpl each define the respective module from my previous comment).

Properties:

  • modules L, K, and Util are all private sub-modules of M and are only privately used, so no other modules that use M will be able to access them (accidentally or purposefully)
  • files L.chpl, K.chpl, and Util.chpl are all stored in directory M/src/subdir, which is never added to the compiler's module search path, so other uses of L, K, and Util will never accidentally find them.

@mppf
Member

mppf commented May 3, 2019

I think that some of the criticism of include has to do with its similarity to the C preprocessor. And yet perhaps all we need is "I'd like to write a submodule in a different file".

I'm not sure I'm on board with the requirement that the subdir exist in this situation. (Two ways to remove that requirement - first, the local module search path strategy; second, use a different filename extension for snippets of Chapel code to be included). However I agree that it would resolve the duplicate module problem if the strategy were followed.

Additionally the need to put a path like subdir/L.chpl in the source code will run into the concern you mentioned above:

generally been concern among some about specifying too much about files/directories in source code

So here is a straw-person counter-proposal:

  main/
    main-module.chpl # Uses M
  M/
    src/
      M.chpl 
      L.chpl // intended to be private

module M {
  private module L;
}

Here the compiler could interpret module L; as "Please find L.chpl in the local module search path and include its contents here". I would expect that the compiler would allow (but not require) L.chpl to wrap all of its code in a module L { } declaration.
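
As a rough illustration of that interpretation, here is a Python sketch of how a `module L;` stub might be expanded. Everything in it is hypothetical: the `expand_module_stub` name, the in-memory `files` mapping that stands in for the file system, and the regular expression used to detect an existing wrapper. It models the straw-person idea only, not any behaviour the compiler has.

```python
import re

def expand_module_stub(name, local_dirs, files):
    """Return the source text that a `module <name>;` stub would stand for.

    `local_dirs` is the local module search path of the enclosing module
    (directory entries ending in '/'); `files` maps path -> source text.
    """
    for directory in local_dirs:
        path = directory + name + ".chpl"
        if path in files:
            body = files[path].strip()
            # Allow, but do not require, an explicit `module <name> { ... }` wrapper.
            if re.match(r"module\s+" + re.escape(name) + r"\s*\{", body):
                return body
            return "module " + name + " {\n" + body + "\n}"
    raise LookupError("no " + name + ".chpl on the local module search path")

# L.chpl written without a wrapper, then with one.
bare = {"M/src/L.chpl": "proc helper() { }\n"}
wrapped = {"M/src/L.chpl": "module L {\n  proc helper() { }\n}\n"}

print(expand_module_stub("L", ["M/src/"], bare))
print(expand_module_stub("L", ["M/src/"], wrapped))
```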

@bradcray
Member

bradcray commented May 4, 2019

I'm not sure I'm on board with the requirement that the subdir exist in this situation.

Sorry, I didn't mean to imply that subdir had to exist. I think you could just as easily write include "L.chpl", "K.chpl", "Util.chpl"; after moving them up a directory level and not involve subdir at all. I wouldn't expect an include of a file to add that file's directory to the global module search path any more than subdir was in my example (so ./ wouldn't be either). In writing the example, I was imagining that M/src/ might already be in the global module search path which is why I pushed them down a level. More on that just below.

So, starting with your preferred directory structure:

  main/
    main-module.chpl # Uses M
  M/
    src/
      M.chpl 
      L.chpl // intended to be private

I'm wondering how main-module.chpl found M.chpl to begin with. (The third answer below is what I think this issue is assuming, but for completeness...)

One possibility is that it's in the module search path. But if that's the case, then L.chpl is also in the module search path suggesting that any other module's use L is also going to find it. That is, there's nothing about L.chpl that's private if we store it in a directory that's in the module search path. And if it is, adding a module-local search path wouldn't do anything to hide it any better.

A second possibility is that main-module had require "M/src/M.chpl"; in its source. Taking this approach keeps M/src/ out of the global module search path and would prevent any other modules from finding L unless they also knew to name M/src/ in some way. So taking this approach doesn't require a local-module search path to keep L.chpl hidden because it already is. That said, this approach seems unlikely to be attractive because your main-module probably doesn't want to embed M's location in its source code.

A third possibility is that M/src/M.chpl was named on the command-line (which is the approach implied by the issue description above). Today this does add M/src/ to the global module search path, but perhaps it shouldn't. Most require statements are designed to behave similarly to adding their contents to the command-line, so this is arguably inconsistent with how the previous case was handled (or vice-versa). I believe we decided to add this feature as a convenience so that M could more easily use sibling modules in the same directory structure (see two paragraphs below for additional info). But perhaps we've actually created an inconvenience since it doesn't provide a way to name specific files without also adding their directories to the search path (and if you had really wanted to do that, perhaps you should've just specified the -M flag in which case you presumably wouldn't have had to name the specific file for a case like this anyway).

So this makes me think that we should look into no longer having command-line Chapel files affect the global module search path, see what tests break, and whether we find them compelling. If not, we can change this behavior to not affect the global module search path, and not require a local module search path either (at least for this case/reason). This is a simple change to make (see https://github.com/bradcray/chapel/tree/relative-chpl-dont-affect-modpath) and it looks like < 75 tests use the feature, so I'll run a spot-check on them tonight and do a full run to make sure I didn't miss anything when nightly testing isn't about to run (failures due to spot-check: modules/bradc/printModStuff/foo.chpl, modules/bradc/srcDirImpliesPath/foo.chpl, modules/sungeun/ambiguous/ambiguous2.chpl, studies/hpcc/FFT/fft-testPow4.chpl... I'll need to look into whether I think these are motivating or not another day).

(For historical purposes: Why did we take this behavior? I think it's because if a file a/b/c.c is specified to a C compiler then any #includes within that file are searched for from a/b so we thought we were being symmetric. But this arguably makes more sense for a require or include statement which names a file than it does for a use statement that names a language-level identifier...)

Anyway, if we were to change this, then I think we wouldn't need a module-local search path for this case either (and at this point I want to foreshadow an important sidebar that makes up the final three paragraphs of this comment).

Am I missing any other ways that main-module could know about M's location?

Additionally the need to put a path like subdir/L.chpl in the source code will run into the concern you mentioned above:

generally been concern among some about specifying too much about files/directories in source code

Just to be clear, I don't share this concern, at least for cases like this. I think it's reasonable for an author of a big Chapel module who wants to break it into separate files to organize those files using subdirectories and specify relative paths to get to the files where they live. I'm also not sure that those who have objected to putting paths into sources in the past would object to cases like this either; what I recall hearing objections to was more around putting library search paths or include paths into sources for system-wide packages. But maybe there is a reason to avoid even simple relative paths like this when creating little code clusters that I'm not seeing.

I think that some of the criticism of include has to do with its similarity to the C preprocessor.

It's also similar to LaTeX's \input feature, which I find invaluable (I can't imagine having to put the entire Chapel language specification into a single file... Why would we require Chapel programmers writing huge module structures to do the same?)

The main criticism I've heard about include is that while it might be useful, it's not sufficient for everything users want when breaking things across multiple files because they want some sort of separate compilation ability and include just creates a way to give the compiler more source at once rather than some sort of pre-compiled thing. I think this feature request is a reasonable one, but I don't think it means that include isn't useful/valuable in and of itself. Particularly given that we don't have separate compilation yet; and once we did, presumably there'd be a way to say "include or input this precompiled module as a submodule to my current scope" as an alternative to "this uncompiled source code."

I don't mean to imply that having an include / input statement is the only way to solve the nested modules in different files problem, but it's a familiar one and doesn't seem inherently problematic to me. It can be abused of course, but most things can if you push them hard enough; and I think there are plenty of clean uses and preferred styles that make sense (e.g., included files should define entire modules, functions, or variables, not parts of lines that will be glommed together with parts of other lines). For example, in LaTeX I could put arbitrary text into each file, but I don't... I usually map each section or figure to a file by convention which is helpful to me and clean to understand.

All that said, I'm far more happy to wrestle with counterproposals to the "how do we break a module across multiple files" question than the "how do we create module-specific search directories" question because I think it solves two problems: (1) how to avoid huge monolithic files in Chapel and (2) how to encapsulate private modules so that they don't pollute the top-level program namespace.

That said, I have to admit that I'm not crazy about Michael's counterproposal:

module M {
  private module L;
}

As Chapel stands today, I interpret this as: "I'm defining a private module named L. It has no body / contents" (similar to how extern proc foo(); has no body). Nothing about this statement (as compared to the current form private module L { ... }) suggests to me "look around the file system for something that defines a module named L and inject its contents here." To me, it would be surprising if such a concept did not name a file.

[One historical note that I've brushed up against a few times in this issue and want to get out in the open again: The current behavior in which use L; causes the compiler to go look for files named L.chpl was considered a poor hack the day it was introduced, and isn't something that I think we should particularly cling to or emulate. The original intention which we never had time to implement was to traverse the module search path looking for files that define module L regardless of the file's name. For instance, L-1.1.chpl or MyFilename.chpl would be parsed if they defined module L { ... }. Why did we take the current approach? Because it was simple and got us running and in many cases the two things do / did match (particularly when using implicit module names).

We made a start at doing something better with a grammar called modulefinder.ypp (that can be found in the git archives) which was meant to be a clone of chapel.ypp that mostly just dropped code on the ground but knew how to navigate comments and strings to avoid false positives. Then the idea was to create little index files that would say which modules were defined by each .chpl file and to use those indices to resolve use statements rather than leaning on the assumption that the filename had to be the same.

I think this model still has merit (lots more than the current system), though there are challenges as well: For example, if the modules that are defined by a file depend on the settings of a config param then the index files couldn't simply be updated based on the timestamps of the .chpl files, but rather would have to be sensitive to the specifics of the compilation; so perhaps rather than storing index files, the compiler should just have an ultra-fast way to find candidate .chpl files (via grep?), parse them, and see whether they defined the module or not (dropping the code on the ground if the answer was "not" and searching onward...)]
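
The index idea in this note can be sketched in a few lines of Python. This is a deliberately naive stand-in for the modulefinder grammar: it strips `//` comments and matches `module Name {` with a regular expression, so it would be fooled by block comments and string literals, it does not tell top-level modules from nested ones, and it ignores implicit file-level modules. The `lib/` directory, the module `K`, and the `build_module_index` name are hypothetical.

```python
import re

# Matches an explicit declaration such as `module L {` at the start of a line.
MODULE_DECL = re.compile(
    r"^\s*(?:(?:private|public|prototype)\s+)*module\s+([A-Za-z_]\w*)\s*\{",
    re.MULTILINE,
)

def build_module_index(files):
    """Map each declared module name to the files that declare it.

    `files` maps path -> source text and stands in for a directory scan.
    """
    index = {}
    for path, source in files.items():
        # Crude comment stripping; a real scanner must also skip
        # /* ... */ comments and string literals.
        code = re.sub(r"//[^\n]*", "", source)
        for name in MODULE_DECL.findall(code):
            index.setdefault(name, []).append(path)
    return index

files = {
    "lib/L-1.1.chpl": "module L {\n  proc f() { }\n}\n",
    "lib/MyFilename.chpl": "// module Decoy {\nmodule K {\n}\n",
}
index = build_module_index(files)
print(index)  # {'L': ['lib/L-1.1.chpl'], 'K': ['lib/MyFilename.chpl']}
```

A `use L;` would then be resolved through `index["L"]` rather than by looking for a file named L.chpl.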

@bradcray
Member

bradcray commented May 6, 2019

The main criticism I've heard about include is that

I'd forgotten that Bryant also gave it a thumbs-down for other reasons in issue #10909.

@mppf
Member

mppf commented May 6, 2019

Am I missing any other ways that main-module could know about M's location?

Not that I know of. Indeed, the Mason case is that M/src/M.chpl is named on the command-line. However, I personally would rather have module-local search paths than to not be able to implicitly use another module in the same directory as M.chpl. I use this feature all the time when running tests.

For example, if the modules that are defined by a file depend on the settings of a config param then the index files couldn't simply be updated based on the timestamps of the .chpl files, but rather would have to be sensitive to the specifics of the compilation

I'm not seeing how which modules a file defines could depend on a config param currently. Are you imagining that some other feature is introduced?

A third possibility is that M/src/M.chpl was named on the command-line (which is the approach implied by the issue description above). Today this does add M/src/ to the global module search path, but perhaps it shouldn't.

Right, I think we need to choose one (or more) of these:
a. Naming M/src/M.chpl on the command line doesn't add M/src/ to the global module search path
b. Naming M/src/M.chpl on the command line does add M/src/ to a local module search path (within M.chpl, files in M/src can be used) but not to the global module search path.
c. We change mason to use a different (perhaps not yet available) way of communicating a package module path to the compiler. This different way would not affect the global module search path.

The original intention which we never had time to implement was to traverse the module search path looking for files that define module L regardless of the file's name.

I have in the past bristled at the way that use M works today and I agree with you that something about it probably needs to change. However, I view the feature currently missing as a way to explicitly indicate which file you want to gather a module from. Perhaps the require syntax followed by a use would do it. But, if we used require that way, we'd still need some sort of "local search path". Why? Because the require Something.chpl would say "Go find Something.chpl please and allow modules defined in it to be used from this module". In particular it would not say "Please make modules in Something.chpl available to use from all modules".

I think we still have a problem that requires module-local search paths. The reason for that is that if we allow a (more explicit) way to indicate where a module is coming from, then it needs to be checked before the global module search path and not apply to other modules.

I tried using require in this way with the current compiler and here is an example and the problems I ran into:

  main/
    main-module.chpl # Uses M
  M/
    src/
      M.chpl  # wants to privately use L / have submodule L
      subdir/
        L.chpl
chpl main-module.chpl  M/src/M.chpl 
// main-module.chpl
use M;

proc main() {
  mfunction();

  use L; // currently compiles but I want it to be an error
         // because L is intended to be private to the package M.
}
// M/src/M.chpl
module M {
  //require "L.chpl"; // doesn't find L.chpl
  //require "./L.chpl"; // doesn't find L.chpl
  require "M/src/subdir/L.chpl"; // works but requires specific working directory
  proc mfunction() {
    use L only;
    L.lfunction();
  }
}
// M/src/subdir/L.chpl
module L {
  writeln("initing L");

  proc lfunction() {
    writeln("in lfunction");
  }
}

An idea of module-local search paths would solve 2 problems in this example:

  1. it can make the use L in main-module.chpl an error, since L won't be findable in the global path.
  2. it can allow M.chpl to require "L.chpl" in a way that doesn't assume anything about the current directory of the compilation call.
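A rough model of those two points, written in Python rather than as compiler code. Everything here is an assumption made for illustration: the function names are hypothetical, the standard library entry is left out, and the rule that files named on the command line are visible to every module is just one of the options (a-c) listed earlier.

```python
import posixpath

def resolve_require(path, from_file):
    """Resolve a require relative to the requiring file, not the cwd."""
    base = posixpath.dirname(from_file)
    return posixpath.normpath(posixpath.join(base, path))

def build_paths(command_line_files, requires):
    """Give every file its own search path: its directory plus its requires."""
    paths = {}
    for f in command_line_files:
        entries = [posixpath.dirname(f)]
        for req in requires.get(f, []):
            entries.append(resolve_require(req, f))
        paths[f] = entries
    return paths

def resolve_use(module, from_file, paths, all_files):
    """Return the file that satisfies `use module` in from_file, or None."""
    wanted = module + ".chpl"
    # Local entries are checked first, then the files named on the command line.
    for entry in paths[from_file] + list(paths):
        if entry.endswith(".chpl"):
            if posixpath.basename(entry) == wanted:
                return entry
        elif posixpath.join(entry, wanted) in all_files:
            return posixpath.join(entry, wanted)
    return None

all_files = {"main/main-module.chpl", "M/src/M.chpl", "M/src/subdir/L.chpl"}
command_line = ["main/main-module.chpl", "M/src/M.chpl"]
requires = {"M/src/M.chpl": ["subdir/L.chpl"]}  # hypothetical relative require
paths = build_paths(command_line, requires)

print(resolve_use("L", "M/src/M.chpl", paths, all_files))
print(resolve_use("L", "main/main-module.chpl", paths, all_files))
```

In this model M.chpl finds L through its own path regardless of the working directory, while the same use L from main-module.chpl resolves to nothing and could be reported as an error.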

@mppf
Member

mppf commented May 6, 2019

That said, I have to admit that I'm not crazy about Michael's counterproposal:

module M {
  private module L;
}

As Chapel stands today, I interpret this as: "I'm defining a private module named L. It has no body / contents" (similar to how extern proc foo(); has no body). Nothing about this statement (as compared to the current form private module L { ... }) suggests to me "look around the file system for something that defines a module named L and inject its contents here." To me, it would be surprising if such a concept did not name a file.

Sure, we can address that. What about

module M {
  private module L in "L.chpl";
}

Anyway I think the big question is if we want submodules-in-different-files to be handled by:

@BryantLam

BryantLam commented May 20, 2019

From the original post, I completely agree with Examples 1 and 2. Because that would be only a behavioral change with the compiler with no new compiler options, I don't see a reason not to do it today.

For Subdirectories and Examples 3 and 4: nope. I don't want to be in perpetual servitude to an arbitrary layout of my filesystem directories. The user should not be allowed to arbitrarily put files in subdirectories only to then go about representing that layout differently in their actual code. The code should dictate where to put modules, not the other way around. That way, we don't get into this mess with require statements, include statements, or whatnot.

For the following code:

module MyModule {
  use Submodule1;
  use Submodule2;
}

There should only be a few known layouts for it, including a few combinatorial layouts between 2 and 3.

1
├── main.chpl # Uses MyModule
├── MyModule.chpl
├── Submodule1.chpl
└── Submodule2.chpl

2
├── main.chpl
└── MyModule
    ├── MyModule.chpl
    ├── Submodule1.chpl
    └── Submodule2.chpl

3
├── main.chpl
└── MyModule
    ├── MyModule.chpl
    ├── Submodule1
    │   └── Submodule1.chpl
    └── Submodule2
        └── Submodule2.chpl

All three of these are compiled with the same line:
chpl main.chpl

Local module search paths:
main.chpl:
  $CHPL_HOME/modules/*                          # standard library
  <current directory of main.chpl>              # Find MyModule in 1
  If MyModule not found: <current directory>/MyModule/MyModule.chpl   # Find MyModule in 2 and 3
MyModule.chpl:
  $CHPL_HOME/modules/*                          # standard library
  <current directory of MyModule.chpl>          # Find Submodule{1,2} in 2
  If Submodule1 not found: <current directory>/Submodule1/Submodule1.chpl
  If Submodule2 not found: <current directory>/Submodule2/Submodule2.chpl
Submodule1.chpl
  $CHPL_HOME/modules/*                          # standard library
  <current directory of Submodule1.chpl>
Submodule2.chpl
  $CHPL_HOME/modules/*                          # standard library
  <current directory of Submodule2.chpl>

# The if-conditionals could be considered an optimization.
#
# As an aside, Rust represents the top-level module with file `lib.rs`. The local module search
# paths listed would go to e.g., <current directory>/MyModule/lib.chpl instead of fully naming
# the already known name.

Look at the directory tree. Can you tell which are the parent modules and which are the submodules? The compiler should be able to do this too without any new compiler options.
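To make the lookup rule concrete, here is a small Python sketch of it. It is only a model under stated assumptions: `find_module` is a hypothetical name, the standard library entry is omitted, and the file system is represented as a plain set of paths.

```python
import posixpath

def find_module(name, using_file, existing_files):
    """Look next to the using file first, then in a same-named subdirectory."""
    here = posixpath.dirname(using_file)
    candidates = [
        posixpath.join(here, name + ".chpl"),        # layouts 1 and 2
        posixpath.join(here, name, name + ".chpl"),  # layouts 2 and 3
    ]
    for candidate in candidates:
        if candidate in existing_files:
            return candidate
    return None

# Layout 3 from the listing above.
layout3 = {
    "main.chpl",
    "MyModule/MyModule.chpl",
    "MyModule/Submodule1/Submodule1.chpl",
    "MyModule/Submodule2/Submodule2.chpl",
}

print(find_module("MyModule", "main.chpl", layout3))
print(find_module("Submodule1", "MyModule/MyModule.chpl", layout3))
print(find_module("Submodule1", "main.chpl", layout3))
```

Because each lookup starts from the directory of the file doing the use, main.chpl can reach MyModule but has no path to Submodule1 on its own.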

Note: Here be dragons.

So, Example 3 is turned into:

  # Original Post
  main/
    main-module.chpl # Uses M
  M/
    src/
      M.chpl # Uses L and K
      L.chpl
      subdir/
        K.chpl

  # My Proposed Layout
  main/
    main-module.chpl # Uses M
  M/
    src/
      M.chpl # Uses L and K
      L.chpl
      K/
        K.chpl

compiled with:
# Same as Example 1
chpl main/main-module.chpl M/src/M.chpl

and Example 4:

  # Original Post
  main/
    main-module.chpl # Uses M and A
  M/
    src/
      M.chpl # Uses L and K
      L.chpl
      subdir/
        K.chpl
  A/
    src/
      A.chpl # Uses B and C
      subdir/
        B.chpl # Uses C
        subsubdir/
          C.chpl

  # My Proposed Layout
  main/
    main-module.chpl # Uses M and A
  M/
    src/
      M.chpl # Uses L and K
      L.chpl
      K/
        K.chpl
  A/
    src/
      A.chpl # Uses B and C
      B/
        B.chpl # Uses C
        C/
          C.chpl

compiled with:
# Essentially same as Example 1
chpl main/main-module.chpl M/src/M.chpl A/src/A.chpl

Most of the comments in this thread talk about the Original Examples 3 and 4, which I am not in favor of supporting because of the unnecessary complexity and discussion it has generated. This solution seems way cleaner.

@BryantLam

Reading through the comments more carefully, Brad's responses have actually been pushing back against the idea of local module search paths. Global module search paths are right in line with the current status quo where module members have default public visibility and use statements import symbols into the global namespace. 👍 (But seriously. I hate all three of these behaviors.)

@bradc: Specifically, for example 1, it seems to me that if L is intended to be a module that helps M define its behavior but that nobody else should know about, that L should be a sub-module of M rather than a top-level module that somehow only M knows about. Or, put another way, I don't think there should be a way to inject module names into the top-level namespace that some modules can see but others can't (any more than I think there should be a way to declare a module-scope variable that some functions can see but other functions cannot).

So to me, the question example 1 poses is "Did the author actually want L to be a sub-module of M?" and if so "Is the issue really that they want a way to split module M and its submodules across multiple files to avoid having to define L within the M.chpl file?"

Example 1 is good to me. I want to expose a set of public APIs through a top-level module called M. I don't mind defining a submodule L in a separate file if it means I can break away some of those components into a logical grouping. If I can't break that module into a logical grouping called L, then they deserve to be in a monolithic file because it's all related functionality anyway.

One place where this helps is with #12712 in a refactor of all stable standard modules into top-level module std.

@mppf: So here is a straw-person counter-proposal:

  main/
    main-module.chpl # Uses M
  M/
    src/
      M.chpl 
      L.chpl // intended to be private
module M {
  private module L;
}

Here the compiler could interpret module L; as "Please find L.chpl in the local module search path and include its contents here". I would expect that the compiler would allow (but not require) L.chpl to wrap all of its code in a module L { } declaration.

While just a straw-person, the problem with this approach is that it doesn't work for #12712.

@bradc: One possibility is that it's in the module search path. But if that's the case, then L.chpl is also in the module search path suggesting that any other module's use L is also going to find it. That is, there's nothing about L.chpl that's private if we store it in a directory that's in the module search path. And if it is, adding a module-local search path wouldn't do anything to hide it any better.

The local module search paths would need to learn how to look for files too. The problem doesn't occur if main.chpl only knows how to find exactly M/src/M.chpl and not look all inside M/src/.

A second possibility is that main-module had require "M/src/M.chpl"; in its source. Taking this approach keeps M/src/ out of the global module search path and would prevent any other modules from finding L unless they also knew to name M/src/ in some way. So taking this approach doesn't require a local-module search path to keep L.chpl hidden because it already is. That said, this approach seems unlikely to be attractive because your main-module probably doesn't want to embed M's location in its source code.

I don't want to embed directory paths to files at all. It's unnecessary information if we outright forbid that directory structure from existing with local module search paths. More critically, semantic imports through use statements will help incremental compilation. C++ has decades of experience with #include statements causing all kinds of nightmares for incremental compilation; granted, Chapel has not gotten to those same problems (yet) because the language doesn't have the equivalent of #define or #include.

So this makes me think that we should look into no longer having command-line Chapel files affect the global module search path, see what tests break, and whether we find them compelling.

I wouldn't put too much emphasis on the test suite given that there likely aren't a lot of tests with large hierarchical module dependencies in there.

Just to be clear, I don't share this concern, at least for cases like this. I think it's reasonable for an author of a big Chapel module who wants to break it into separate files to organize those files using subdirectories and specify relative paths to get to the files where they live. I'm also not sure that those who have objected to putting paths into sources in the past would object to cases like this either; what I recall hearing objections to was more around putting library search paths or include paths into sources for system-wide packages. But maybe there is a reason to avoid even simple relative paths like this when creating little code clusters that I'm not seeing.

I dislike all of the stated behaviors. 😄 Maybe that's because I like controlling Chapel things from within Chapel source and not controlling C or command-line things from within Chapel source. The compiler can control the local module search paths without having to resort to filesystem-level constructs and the duplication of functionality when nested use statements could make the compiler smarter.

// A_Include.chpl
module A {
  include "B.chpl"
  include "C.chpl"
  include "D.chpl"
}

// A_Include_Bad.chpl
module A {
  module B {
    include "B.chpl" // Did I just wrap B in an outer B?
  }
  module C {
    include "C.chpl"
  }
  module D {
    include "C.chpl" // Whoops.
  }
}

// A_LocalModuleSearchPaths.chpl
module A {
  use B;
  use C;
  use D;
}

It's also similar to LaTeX's \input feature, which I find invaluable (I can't imagine having to put the entire Chapel language specification into a single file...

While I don't want to go up against the behemoth that is LaTeX, this is one of the first results regarding \input versus \include. The biggest benefit for the more restrictive \include is incremental compilation.

... Why would we require Chapel programmers writing huge module structures to do the same?)

Because we can be better! Do you want faster horses or a car? 👍

I think this feature request is a reasonable one, but I don't think it means that include isn't useful/valuable in and of itself.

I'd personally like whatever solution occurs with modules to be useful enough that include wouldn't be needed. Again, incremental compilation is going to be problematic for include. (Where do you create the compilation boundaries when you can potentially include arbitrary files in a directory structure?) But I can't say I fully understand all the relevant concerns especially around generics.

Particularly given that we don't have separate compilation yet; and once we did, presumably there'd be a way to say "include or input this precompiled module as a submodule to my current scope" as an alternative to "this uncompiled source code."

I'd advocate for use statements to be some of this. Semantic imports!

I don't mean to imply that having an include / input statement is the only way to solve the nested modules in different files problem, but it's a familiar one and doesn't seem inherently problematic to me.

See above regarding C++ modules after decades of experience with #include.

It can be abused of course, but most things can if you push them hard enough; and I think there are plenty of clean uses and preferred styles that make sense (e.g., included files should define entire modules, functions, or variables, not parts of lines that will be glommed together with parts of other lines). For example, in LaTeX I could put arbitrary text into each file, but I don't... I usually map each section or figure to a file by convention which is helpful to me and clean to understand.

Unless the language enforces it, someone will do something tricky with it, which is where #include and #define directives have been an issue. Do you like manually specifying source-file dependencies in Makefiles? I sure don't.

I'd advocate for something much more restrictive that provides the necessary information for compilers to do their jobs while giving enough flexibility to the programmer to lay out their directory structure without low-level hooks to let humans be the creative entities that they are in doing said tricky things. I'd argue that this isn't new ground; modern languages are all gravitating away from these low-level behaviors regarding package directory structures. Though taken to the extreme, something to avoid is Python's behavior where it's stat-ing paths all over the filesystem.

The original intention which we never had time to implement was to traverse the module search path looking for files that define module L regardless of the file's name. For instance, L-1.1.chpl or MyFilename.chpl would be parsed if they defined module L { ... }. Why did we take the current approach? Because it was simple and got us running and in many cases the two things do / did match (particularly when using implicit module names).

Uh, so if I wanted to edit someone else's module L, I'd have to grep all source files to find it? I'm glad that's not the world we ended up in; restrictive is better for the compiler! (From my previous point, that also sounds pretty expensive for the filesystem.)

@mppf: I have in the past bristled at the way that use M works today and I agree with you that something about it probably needs to change. However, I view the feature currently missing as a way to explicitly indicate which file you want to gather a module from. Perhaps the require syntax followed by a use would do it. But, if we used require that way, we'd still need some sort of "local search path". Why? Because the require Something.chpl would say "Go find Something.chpl please and allow modules defined in it to be used from this module". In particular it would not say "Please make modules in Something.chpl available to use from all modules".

I agree; use M's behavior should change to become more powerful.

@BryantLam

Handy references to consider:

@BryantLam

BryantLam commented May 20, 2019

Another example. I'll use use but pretend that we're not worried about global namespace pollution in the active scope (i.e., import, or advocating for use to do more/different).

module Foo {
  public use Bar.Baz;
  private use Details; // Only available to Foo and any of its submodules like Bar or Bar.Baz.
}

module Bar {
  public use Baz; // If we don't do this, then Foo can't `use Bar.Baz` either.
}

proc main() {
  use Foo;
//use Foo.Bar; // No! You can't do this. Foo didn't `public use` Bar.
  use Foo.Baz; // Ok. Foo has Baz in scope. main doesn't need to know that it occurs through Bar.
//use Foo.Bar.Baz; // Nope!
//use Foo.Details; // Can't do this either. Details is private.
}
# One notional directory structure.
root
├── Foo
│   ├── Bar
│   │   ├── Bar.chpl
│   │   └── Baz.chpl
│   ├── Details
│   │   ├── DetailsA.chpl
│   │   ├── DetailsB.chpl
│   │   ├── DetailsC.chpl
│   │   └── Details.chpl
│   └── Foo.chpl
└── main.chpl

Local module search paths:
main.chpl
  <std>
  # Where is Baz? Look in module Foo
  <curdir>
  If Foo not found: <curdir>/Foo/Foo.chpl
Foo
  <std>
  # Where is Baz? Look in module Bar
  # Where is Details?
  <curdir>
  If Bar not found: <curdir>/Bar/Bar.chpl
  If Details not found: <curdir>/Details/Details.chpl
Bar
  <std>
  # Where is Baz?
  <curdir>
  If Baz not found: <curdir>/Baz/Baz.chpl
Baz
  <std>
  # Uses Foo.DetailsA
  <curdir>
  <path to DetailsA> # already known?
Details
  <std>
  # Uses Details{A,B,C}; Where are they?
  <curdir>
  If DetailsA not found: <curdir>/DetailsA/DetailsA.chpl
  If DetailsB not found: <curdir>/DetailsB/DetailsB.chpl
  If DetailsC not found: <curdir>/DetailsC/DetailsC.chpl

But note the dragons. I'm not sure how Details will get exposed to Bar or Baz. Rust can use a top-of-level absolute path to handle this case. So maybe this isn't quite the right model and we have to be even more restrictive and start our local search elsewhere like the root of the tree. Clearly, Rust has thought through more of these issues than we have.

Edit: Added search paths for modules under Foo looking for Details. Maybe? I don't know.
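The visibility rules in the Foo / Bar / Baz example can be modeled in a few lines of Python. This is a sketch of the intended semantics only, not of any existing Chapel behavior: `TOP_LEVEL`, `PUBLIC_NAMES`, and `can_use` are hypothetical names, and the model assumes that Foo is the only module main.chpl can reach directly.

```python
# Assumption: only Foo is reachable from main.chpl's own search path.
TOP_LEVEL = {"Foo"}

# Names each module re-exports with `public use`; private uses are omitted.
PUBLIC_NAMES = {
    "Foo": {"Baz"},   # from `public use Bar.Baz`; Details stays private
    "Bar": {"Baz"},   # from `public use Baz`
    "Baz": set(),
    "Details": set(),
}

def can_use(path):
    """Would `use <path>` be accepted in main under this model?"""
    head, *rest = path.split(".")
    if head not in TOP_LEVEL:
        return False
    current = head
    for name in rest:
        # Each step must be a name the current module publicly exposes.
        if name not in PUBLIC_NAMES.get(current, set()):
            return False
        current = name
    return True

for path in ["Foo", "Foo.Baz", "Foo.Bar", "Foo.Bar.Baz", "Foo.Details"]:
    print(path, can_use(path))
```

The model does not answer the open question above of how Details becomes visible to Bar or Baz; it only covers what main is allowed to name.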

@mppf
Member

mppf commented May 20, 2019

@BryantLam - wow that's a lot of comments :)

I wanted to bring up some things related to your proposal in #12923 (comment) but please note that I'm not yet trying to express an opinion about it.

Difference from directory-is-a-module?

First, I want to understand how your proposal is different from what I proposed in #10946. I think that the difference is that your proposal doesn't actually create submodules. Instead, it is a module search adjustment. In particular, when looking for M.chpl, the compiler will be willing to look for M/M.chpl in current search paths in addition to looking for M.chpl. Is that right?

Does the module L; strategy work for standard modules in std namespace?

Second,

@mppf: So here is a straw-person counter-proposal:

  main/
    main-module.chpl # Uses M
  M/
    src/
      M.chpl 
      L.chpl // intended to be private
module M {
  private module L;
}

Here the compiler could interpret module L; as "Please find L.chpl in the local module search path and include its contents here". I would expect that the compiler would allow (but not require) L.chpl to wrap all of its code in a module L { } declaration.

While just a straw-person, the problem with this approach is that it doesn't work for #12712.

I don't understand why this couldn't work for #12712. Certainly in that situation, we wouldn't want all of the submodules to be private, but I was here trying to propose that module L; would look for L.chpl somewhere and the private part was entirely optional. So in particular a sketch of the standard library would be this:

module std {
  module Sort;
  module Random;
  ...
}

Keeping in mind that this is a straw-person proposal, I do have sympathy for Brad's objection to it that module SomeName; appears to define an empty module. However I don't think it's that different from a function signature, which says that the function exists but does not define its body. (We have this pattern now for extern functions, but it'll probably come up with interfaces, and C/C++ programmers are definitely familiar with it). Anyway my hypothesis here is that it is possible to adjust the syntax for this idea to address the objection.

Privacy control and submodules

Lastly, I wanted to bring up the interaction of privacy control with submodules.

Back to the first example, we have

  main/
    main-module.chpl # Uses M
  M/
    src/
      M.chpl  # Uses L
      L.chpl

and the desire is to arrange it so that L is private to M. The original issue description above proposes a search-path way of doing that - where L is not visible because it's not in the global module search path (in normal usage that names M/src/M.chpl on the compile line, anyway).

However I do think it's also reasonable to wonder - what if M had private functions/types/variables? Since here L is part of the implementation of M, what if the author of that module wanted L to be able to access these private functions/types/variables?

If L is literally a submodule to M, it can use private things in M, because it is part of M:

module M {
  private proc privateProc() { }
  module L {
     privateProc();
  }
}

I can see two strategies here:

  1. Tie module search paths to privacy rules to allow this pattern
  2. Support (somehow) explicitly creating a submodule in a different file.

I think that the module L; proposal as well as #10946 both use approach 2. I can't tell if your proposal solves this problem at all, or if you would use approach 1.
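The difference between the two strategies comes down to whether L sits inside M's scope. A minimal Python model of that scoping rule, with a hypothetical `Module` class that is not part of any compiler, looks like this:

```python
class Module:
    """Toy module scope: a name, an optional parent, and declared symbols."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.symbols = {}  # symbol name -> True if private

    def declare(self, symbol, private=False):
        self.symbols[symbol] = private

    def can_see(self, symbol, owner):
        """May code in this module refer to `symbol` declared in `owner`?"""
        if symbol not in owner.symbols:
            return False
        if not owner.symbols[symbol]:
            return True  # public symbols are visible everywhere
        # Private symbols are visible in the owner and in modules nested in it.
        scope = self
        while scope is not None:
            if scope is owner:
                return True
            scope = scope.parent
        return False

M = Module("M")
M.declare("privateProc", private=True)

L_as_submodule = Module("L", parent=M)  # approach 2: L really is inside M
L_as_top_level = Module("L")            # L only hidden via search paths

print(L_as_submodule.can_see("privateProc", M))
print(L_as_top_level.can_see("privateProc", M))
```

Hiding L through search paths alone leaves it a top-level module, so under this model it would still be locked out of M's private symbols unless privacy rules were tied to the search path as in approach 1.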

What does Rust do?

I did look at your "Here be dragons" link for Rust and didn't find anything in there that surprised me. (But maybe I don't know what to look for). However I did notice that Rust uses exactly the corresponding syntax from my straw-person proposal: https://doc.rust-lang.org/book/ch07-02-modules-and-use-to-control-scope-and-privacy.html#separating-modules-into-different-files

Using a semicolon after mod sound instead of a block tells Rust to load the contents of the module from another file with the same name as the module.

Additionally, I've tried to understand other elements of Rust's design here.

  • A module MyModule can be found in MyModule.rs or in MyModule/mod.rs (which is similar to what @BryantLam proposed) -- https://doc.rust-lang.org/reference/items/modules.html#module-source-filenames
  • I think they must have some rule like the local module search path proposed here for this to work (but I haven't found this in the documentation). In particular, if you have MyModule/mod.rs and that declares a module in another file Impl.rs, doesn't the Rust compiler need to know to look for it in MyModule/ and not just in the global search path? Maybe it is just that a mod Impl; declaration always looks in the directory storing the source code for the module it is contained in? That is, the file MyModule/mod.rs contains mod Impl; and therefore it looks for Impl.rs in MyModule/ ?
  • Rust does allow one to indicate where to search for a module with the path attribute - https://doc.rust-lang.org/reference/items/modules.html#path-attribute - I think this might be providing a similar functionality to require "SomeDirectory/SomeFile.chpl" or to the proposed module L in "L.chpl";.
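As far as I can tell from the Rust reference, the lookup for a mod name; declaration is always relative to the file containing the declaration, which is effectively a per-module search path. A simplified Python sketch of that rule follows; `rust_mod_candidates` is a hypothetical helper, it treats only mod.rs, lib.rs, and main.rs as directory-owning files, and it ignores the path attribute.

```python
import posixpath

def rust_mod_candidates(declaring_file, mod_name):
    """Files rustc would consider for `mod <mod_name>;` in declaring_file."""
    directory, filename = posixpath.split(declaring_file)
    stem = posixpath.splitext(filename)[0]
    if stem not in ("mod", "lib", "main"):
        # A file such as src/cli.rs keeps its child modules in src/cli/.
        directory = posixpath.join(directory, stem)
    return [
        posixpath.join(directory, mod_name + ".rs"),
        posixpath.join(directory, mod_name, "mod.rs"),
    ]

print(rust_mod_candidates("src/main.rs", "cli"))
print(rust_mod_candidates("src/cli.rs", "args"))
print(rust_mod_candidates("src/my_module/mod.rs", "imp"))
```

Nothing outside the declaring file's own directory tree is consulted, so a sibling module cannot pick up imp by accident.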

@BryantLam

@mppf: First, I want to understand how your proposal is different from what I proposed in #10946. I think that the difference is that your proposal doesn't actually create submodules. Instead, it is a module search adjustment. In particular, when looking for M.chpl, the compiler will be willing to look for M/M.chpl in current search paths in addition to looking for M.chpl. Is that right?

I think that's right. My proposal doesn't create submodules; rather it limits the module search paths instead. After looking at #10946 more closely, I didn't grasp the distinction between a submodule versus useing another module until now, so I can see why my proposal and some of my examples might not be solving the core problem of representing submodules. Your comment has helped a lot to clarify my misunderstanding.

Does the module L; strategy work for standard modules in std namespace?

I don't understand why this couldn't work for #12712. Certainly in that situation, we wouldn't want all of the submodules to be private, but I was here trying to propose that module L; would look for L.chpl somewhere and the private part was entirely optional.

Whoops! Sorry, you're right! I read your code snippet too quickly and thought that private was the relevant portion when it was actually the module L; piece. Your approach looks good to me since [sub]modules already feel like they could be a kind of extern anyway when you break them into separate files.

Privacy control and submodules

I can see two strategies here:

1. Tie module search paths to privacy rules to allow this pattern

2. Support (somehow) explicitly creating a submodule in a different file.

I think that the module L; proposal as well as #10946 both use approach 2. I can't tell if your proposal solves this problem at all, or if you would use approach 1.

I think my proposal--being an extension of the original post at the top--will likely not address the submodules distinction unless strategy 1 enables that effect. The proposal only handles the local search paths, so I think if we substitute module L; for use L;, as in your straw-person proposal, it will achieve the intended effect. #10946 might be fine too, but I feel like that approach has less control since I can't explicitly say a module is private in the supermodule (?). I'm a bit fuzzier on that approach and would want to see more if seriously considered.

What does Rust do?

More (hopefully relevant) references:

@mppf
Member

mppf commented May 21, 2019

@BryantLam - Thanks especially for the links to Rust's previous work. I see also Revisiting Rust's modules, part 2 and - from a different author - The Rust module system is too confusing.

Also, it looks like what was actually agreed upon and implemented from those blog posts is in RFC 2126. As I understand it, that RFC does support the idea that a directory can represent a module - but it doesn't make it mandatory. (In particular, if you search for mod cli;, you will see that the file cli.rs can refer to the files in cli/ and as a result the module cli is substantially stored in the directory cli).

One thing that is clear to me is that just as this is one of the few Chapel issues where I see people putting 👎 on ideas... the module system discussions for Rust were pretty contentious. Somehow it seems inherent to the topic.

Anyway, it looks to me like the authors of the blog posts mentioned above would like for Rust modules to more closely map to files and directories. However AFAIK this is not what Rust has done, at least in part due to backwards compatibility issues. I think that is a reasonable direction for us to go - or to at least seriously consider. Certainly one could view #10946 as a starting point in that direction. Note that some of the blog posts even argue for deprecating mod L; type declarations - which is what we have been discussing in the straw-person proposal. These would not be necessary with the idea of a directory-as-a-module.

There is an important difference from #10946 and the Rust proposals around directory-as-module. In #10946, I proposed that the files within a directory would be submodules. But in Revisiting Rust's modules, the files in a directory collectively create a module, and submodules are stored in their own subdirectory.

Why do they think that it's generally better for the directory structure to match the module hierarchy? Because it's less confusing (especially for beginners) and also because it allows one to know where to look for a particular piece of code in a larger project.

So, I think the main question at this point is this - should the recommended style for submodules in different files involve an idea of directories representing submodules? That is, that a module can be represented by a directory, with submodules represented by subdirectories?

What could this look like, in terms of Example 1 from this issue?

Example 1 using M.chpl and M

Directory Layout:

  main/
    main-module.chpl # Uses M
  M.chpl
  M/
      L.chpl

Compilation of Main Module:

chpl main/main-module.chpl M.chpl

M.chpl could private use L. The compiler would know that when compiling code in M.chpl, it can also look in M/*.chpl to satisfy use. Additionally, M.chpl could have a call like L.foo() which would be allowed in M.chpl even without a use statement. (We get that behavior today if L.chpl is included on the command line - here the files in M/*.chpl would be similarly treated, but only when handling code in M.chpl and not in say main-module.chpl). main-module.chpl would not be able to use L or to refer to it unless M.chpl includes public use L. (I think we are planning to move towards private use being the default but that's another issue). Lastly, the compiler would consider L to be a submodule of M for privacy / scoping purposes.
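A small Python sketch of the "M.chpl plus M/" rule, run against a throwaway directory tree. The helper name `visible_submodules` is hypothetical, and the sketch only covers which files become candidates for use; it says nothing about how the compiler would then treat L as a submodule for privacy purposes.

```python
from pathlib import Path
import tempfile

def visible_submodules(source_file):
    """Module names a file may `use` from its same-named sibling directory."""
    source_file = Path(source_file)
    companion = source_file.with_suffix("")  # M.chpl -> M/
    if not companion.is_dir():
        return []
    return sorted(p.stem for p in companion.glob("*.chpl"))

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Recreate the layout from the example above.
    (root / "main").mkdir()
    (root / "main" / "main-module.chpl").write_text("use M;\n")
    (root / "M.chpl").write_text("module M { private use L; }\n")
    (root / "M").mkdir()
    (root / "M" / "L.chpl").write_text("module L { }\n")

    from_m = visible_submodules(root / "M.chpl")
    from_main = visible_submodules(root / "main" / "main-module.chpl")

print(from_m, from_main)
```

Only M.chpl has a companion directory, so L is a candidate when compiling M.chpl and is not when compiling main-module.chpl.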

@BryantLam

BryantLam commented May 24, 2019

More links from Rust:

@mppf: One thing that is clear to me is that just as this is one of the few Chapel issues where I see people putting -1 on ideas... the module system discussions for Rust were pretty contentious. Somehow it seems inherent to the topic.

I wrestled with this topic myself coming from a C/C++ background, but I do believe it is better that Chapel enforces a packaging standard in the long run. The majority of programmers are reading/maintaining code way more often than writing code. I'd personally want any user of Chapel to be able to quickly learn/scan through any package source because all packages/codebases would have a consistent filesystem layout, whatever that layout may be.

Why do they think that it's generally better for the directory structure to match the module hierarchy? Because it's less confusing (especially for beginners) and also because it allows one to know where to look for a particular piece of code in a larger project.

I completely agree. This rationale deserves repeating because Python source is laid out in a similar way. I'd like to think that, just as code formatters and style guides made the spaces-vs-tabs debate go away, Python's rigid packaging hierarchy removed a similar debate for new code being written.

There is an important difference from #10946 and the Rust proposals around directory-as-module. In #10946, I proposed that the files within a directory would be submodules. But in Revisiting Rust's modules, the files in a directory collectively create a module, and submodules are stored in their own subdirectory.

Why do they think that it's generally better for the directory structure to match the module hierarchy? Because it's less confusing (especially for beginners) and also because it allows one to know where to look for a particular piece of code in a larger project.

I agree, especially since it is easier to grok by a new user. While I do empathize with @bradc's desire for what amounts to multiple files inlined into a module { ... }, I don't think that's the common case, or at least not common enough to justify deviating from simply having the filesystem represent the module hierarchy.

If such a feature were desired, the proposal in #10946 actually notes inlined-files-as-module as a possibility from the original Rust proposal where files in a directory were concatenated/inlined into that module and submodules must be directories. This model would not be that hard to understand either, but it is different enough from Python and Rust that it has to be taught. I'm okay with either option since the inlining/concatenating approach affords an additional capability. It does, however, deviate from Chapel's notion of file-level modules, though that problem is also present with #10909's include statement.


More questions related to module search paths:

Question: Will Chapel need the same pathing distinctions that Rust and Python have?

Ambiguities? Visibility of name conflicts between user modules and Mason packages?

For example, how would you specify between the two Foos?

module Baz {
  ...
}

module Foo {
  module Bar {
    module Foo {}

    use Foo;    // ambiguous; or (my preference) child-Foo using relative paths

    // Python-like syntax.
    use .Foo;   // child-Foo referencing `self::Foo` module in Rust
    use ..;     // parent-Foo referencing `super` module in Rust

    use Baz;    // Today, this would work. Should it? What about other packages?
    use /Baz;   // Unambiguous from top. My fake syntax that starts from "root".
                // .. What does "top" even mean?
  }
}

Another example in #10946 (comment).

Question: What's the default search-path behavior?

Absolute vs. relative pathing. Is there one? Relative pathing is more natural. This would affect the ambiguous search case and Baz.

Question: What's the visibility for items in a "package"?

The compilation boundary in Rust is a crate. The default visibility for items in a crate was changed from private to pub(crate) in order to facilitate easier reuse of modules within the crate among sibling modules. Before that change, there was excessive re-exporting of items that could make the apparent module hierarchy (the filesystem layout) significantly different than the actual module hierarchy.

Chapel doesn't really have a notion for a package, but I think the question of item visibility will also be a concern in order to minimize re-exports. Python doesn't have this problem because everything is public visibility (for better or worse), so maybe it's not a big concern since Chapel already has default public behavior; the main downside to this behavior is when someone uses a deep submodule of a package that was not intended to be exposed outside and their code later breaks because that submodule changed/was deleted (but maybe that's on them).

@mppf
Member

mppf commented May 24, 2019

Question: Will Chapel need the same pathing distinctions that Rust and Python have?

I think so - I think we'll need a way to specify the difference between an absolute path and a relative one, at the least. I think this is only about use statements though, to be clear.

Question: What's the default search-path behavior?

I would agree that relative paths are more natural. However I'm open to considering the alternative.

Question: What's the visibility for items in a "package"?

We could introduce a visibility like package as an alternative to public and private and make that default. That would amount to following Rust's rules most closely.

But either way, if there is some module M that wants to also export M.Detail, it would need to public use M.Detail; (likely it would "use only" but IIRC we are thinking about changing that default). If it did private use M.Detail or just used things like M.Detail.someFunction() (with no use of Detail at all), I would not expect that M.Detail would be available to code using M.

That leads me to wonder if it would be good enough to rely on that property to control whether or not Detail is exported at all from M. In that event, functions eligible for export in Detail would be marked public, but they wouldn't necessarily be available if Detail were not exported or if the functions were not included in a public use bringing in symbols in addition to the module name.

@bradcray
Member

bradcray commented Jun 7, 2019

I've ignored this issue for a month because it was driving me a little crazy when I was active on it, and then the conversation snowballed to the point where I was unable to keep up (which then sapped my motivation to even try to catch up). I started into an attempt to catch up with it today and quickly felt overwhelmed again, so ended up just taking a really quick first pass through it, mostly skimming for the sake of time, and trying not to get too hung up on details. I have a feeling that what we're going to need at some point (maybe now, but more likely not quite yet) is a new issue proposing a strawperson plan that wouldn't require everyone to digest the discussion on this one in order to understand it.

As a baby step towards catching up and re-engaging on this topic, let me try and state the concern that I was left with a month ago and that's been rattling around in my head since dropping off. It seems relevant to a few of the comments that caught my eye today as I was trying to catch up, like this one:

Bryant: I didn't grasp the distinction between a submodule versus useing another module until now

In doing so, I'm going to ignore (for now) a bunch of other questions that were asked of me and comments that seemed like they wanted a response to try and keep this manageable. Moreover, I'm going to do this without talking about files and directories at all because I think my concern is unrelated to that aspect of the issue (which is unfortunate, since that's the topic of the issue! :) ).

My mental model of Chapel's namespaces and scoping (which I believe matches what is implemented today) goes something like this:

  • all symbols are defined within some scope; that scope is sometimes something named like a module or an enum or a function; sometimes it's something anonymous like a compound statement.
  • all modules are either top-level modules or sub-modules of some other module
  • if there is any notion of a global scope in the language, it is the collection of all of the top-level module symbols and nothing else. This is why I've been trying to re-train everyone, including myself, to use the term "module-scope variables" rather than "global variables" lately... I don't think that Chapel actually has any global variables other than the names of top-level modules. (This is also why I get concerned when Bryant says things like "since Chapel is a single namespace language..." in contexts that suggest that everything gets lumped together into a single global scope, and why I tried to clarify what we mean when we say that here.)
  • Chapel's modules are overly porous / infectious today due to the fact that all uses are transitive (or public) and private use has not been implemented yet (Lydia is taking a look at that in this sprint). This may contribute to why it feels like there is a global or single (in the bad sense) namespace.
  • In addition, Chapel's automatic use of certain modules like IO and Math combined with the previous bullet also tends to make things far more porous and global-seeming than they ought to be (as in issue #13118, "module use pollutes local scopes due to public/transitive use ChapelStandard").

Given that, when I think of a Chapel program, I tend to think of its structure as being formed around the nesting or hierarchy of modules. For example, given the code:

module M1 {
  module S1 { ... }
  module S2 { ... }
}
module M2 {
  module S1 { ... }
  module S2 { ... }
}

my mind pictures the following (and apologies, but I'm going to use a directory hierarchy notation for convenience, though I'm not trying to tie this back to directories and files in any way):

/
  M1/
    S1/
    S2/
  M2/
    S1/
    S2/

Moreover, when I think of use statements, I tend to imagine symbolic links (ugh, file system analogies again) that point from the scope where the use occurred to the symbol or symbols that it makes available (or * if it's not filtered at all) permitting them to be referenced as though they were defined within that scope.
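A minimal sketch of that "symbolic link" picture, reusing the M1/S1 names from the code above. The variables x and y and the Main module are invented for illustration.

```chapel
module M1 {
  module S1 {
    var x = 1;
    var y = 2;
  }
}

module Main {
  proc main() {
    // Filtered use: only x is linked into this scope.
    use M1.S1 only x;

    writeln(x);  // refers to M1.S1.x as though it were defined here
    // y was not named in the 'only' list, so it is not visible
    // unqualified in this scope.
  }
}
```

Dropping the only clause would correspond to the unfiltered * case, where every public symbol of S1 becomes visible in the scope containing the use.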

OK, so where a lot of this conversation hung me up a month ago is due to what seemed to me like a recurring theme of "I want to use a module / make it known to the Chapel compiler, but I don't want anyone else to be able to see it, yet I don't want to make it a submodule." And to my thinking, that seems inelegant and counter to Chapel's design.

As a specific example to talk about, let's go all the way back to example 1:

  main/
    main-module.chpl # Uses M
  M/
    src/
      M.chpl  # Uses L
      L.chpl

In my mind, regardless of the arrangement of files and directories here, the options for the module hierarchy on master today are either:

case 1: sub-module

/
  main-module/
  M/
    L/  # L is a sub-module of M

in which case nobody can get to L without going through M (assuming M lets them by making L public)

or:

case 2: sibling module

/
  main-module/
  M/
  L/  # L is a sibling of M

in which case we shouldn't be surprised if others can see L because we haven't done anything to hide it from them.

What I worry about is that it felt like the original post and several of the comments have been wanting something new and different like the following:

case 3: private sibling module

/
  main-module/
  M/
  L-but-private-to-M/  # L is a top-level module but nobody other than M is allowed to know about it

To me, this feels like a new, complicated, and unnecessary concept that I'd like to avoid if at all possible. That is, I believe that if you don't want others to know about L, it should be a sub-module of M; and that if you're not willing to do that, it should be OK with you for others to refer to L when it isn't shadowed, since it's defined at the top-level. Maybe put another way, I'd like to avoid injecting a notion of permissions onto the module hierarchy such that some modules can see certain top-level modules while others cannot.

So my baby step for today is to pause at this point and see whether anyone (but particularly @mppf and @BryantLam) disagrees with what I've written here (where you're welcome to point out "Yeah, we'd already come to this same conclusion midway through that huge conversation you skimmed"). Most specifically:

  • Does anyone believe Chapel needs to support case 3?
  • If so, why? (i.e., why isn't making L a private sub-module of M sufficient?)

@mppf
Member

mppf commented Jun 7, 2019

My viewpoint today is that I'd be pretty happy with a system emulating some of the Rust proposals (e.g. what I outlined in #12923 (comment) ) which provides for easy submodules but not for the private-sibling-module pattern. However I view this as having some of the same features of the original proposal in terms of customizing how the compiler "finds" modules (since e.g. M.chpl can access M/L.chpl but the rest of the source code cannot). But yes, it does so with submodules rather than private-sibling-modules.

I think that it would be reasonable for an author trying to achieve Case 3 from the original issue description to put L.chpl inside of M somehow. I have some concern that if doing so is not really easy / intuitive in terms of files and directories, users won't do it. Additionally, I think the question of whether or not it should be an error for Mason packages is a bit fraught. Perhaps mason should merely print out the names of the modules that are being exported.

@BryantLam

BryantLam commented Jun 14, 2019

I agree with Michael. Admittedly, this issue went off on a tangent for a bit, but it's all related to the original post's issue of conflicting same-named modules in the global module path (#8470). Reusing Example 1 from the original post (modified to include K) and ignoring the proposed solution in #12923 (comment) that has partially forked into a separate discussion in #10946:

# Filesystem Structure
# Example 1B

  main/
    main-module.chpl # Uses M, K
  M/
    src/
      M.chpl  # Uses L
      L.chpl
  K/
    src/
      K.chpl  # Uses a completely different L
      L.chpl

The module hierarchy is:

# Module Structure
# case 1: sub-module
/
  main-module/
  M/
    L/  # L is a sub-module of M
  K/
    L/  # This independently developed L is a sub-module of K

Today, this program cannot be compiled without conflicting-module errors.

chpl main/main-module.chpl -M M/src -M K/src
# error about redefinition of L
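For comparison, the intended module hierarchy (case 1 above) can be written as nested modules in a single file, where the two L modules do not collide because each lives in its parent's scope. This is only a sketch of the target structure, not of how the compiler would locate L.chpl on disk; the procedures name and describe are hypothetical.

```chapel
module M {
  module L {
    proc name() { return "M.L"; }
  }
  use L;
  proc describe() { return L.name(); }  // resolves to M's own L
}

module K {
  // An independently developed L; no conflict with M.L.
  module L {
    proc name() { return "K.L"; }
  }
  use L;
  proc describe() { return L.name(); }  // resolves to K's own L
}

module Main {
  use M, K;

  proc main() {
    // Qualified calls keep the two libraries apart.
    writeln(M.describe());
    writeln(K.describe());
  }
}
```

The open question in this issue is how to get the same result when each L sits in its own file next to M.chpl and K.chpl.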

How do you solve this issue without local module search paths? One option is to do it using low-level primitives like the include statement that you proposed (#12923 (comment)), but that is both too flexible and duplicative of use statements if we are to take a strategic view and consider what it means to package and distribute Chapel libraries. We have capable language features (use and/or import) and can impose some arbitrary—but well-intentioned—restrictions to the file/directory layout (#10946) with the end goal of still requiring local module search paths, but now we would have fewer paths to actually search through.

Edit: I do agree with you. I don't think case 3 of private sibling modules is something I'm overly concerned with in the discussion regarding how to lay out code. In libraries, there will be a package-level module (i.e., Mason package) that has to be exposed as the entry module into that library, similar to a main module of an application. It's why these questions are particularly relevant regarding visibility of symbols within a package boundary.

Edit2: Part of the debate which eventually led to #10946 was how to split submodules into other files so the compiler can still find them.

@bradcray
Member

Thanks for (eventually) asking the simple yes/no questions I was looking for with only minimal (5ish?) other unrelated paragraphs. I'll get back to the topic at hand soon.

@mppf
Member

mppf commented Jul 23, 2019

I've split off the specific concrete proposal I think we might have some agreement on into #13524.

@mppf
Member

mppf commented Apr 24, 2020

We implemented something along the lines of #13524. Closing this one.
