Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

module search path ordering and name conflicts with bundled modules #25306

Open
mppf opened this issue Jun 17, 2024 · 4 comments
Open

module search path ordering and name conflicts with bundled modules #25306

mppf opened this issue Jun 17, 2024 · 4 comments

Comments

@mppf
Copy link
Member

mppf commented Jun 17, 2024

In my recent work to bring more of the logic about what to parse to the frontend library in PR #25125, I ran into the test userInsteadOfStandard. In some discussion with @bradcray I learned about the purpose of this test and that brought up some design questions & I'm creating this issue to discuss these questions. The questions have to do with the order in which something like use Foo searches for Foo in the standard/internal/package modules or in user paths.

Search Path Ordering

The dyno frontend is searching for modules in the following order:

  • bundled module paths
    • includes internal module paths; standard module paths; modules/packages; distributions
  • user paths
    • includes -M, current dir, and directories containing files named on command line

This is different from the production compiler's search order:

  • user paths
    • includes -M, current dir, and directories containing files named on command line
  • bundled module paths
    • includes standard, package, and internal library paths

This leads to the change in behavior for userInsteadOfStandard with PR #25125.

Key Questions

  1. What behavior do we want when a user inadvertently uses a module name that conflicts with a bundled module?
  2. Should it be possible for the compiler to work with two different top-level modules with the same name?
  3. Is it a breaking change to add a new module to the bundled modules/ directory (including standard, internal, package modules)?
  4. Should modules in the bundled modules/ be searched for before or after user modules from -M / . / same directory as other code?

A Brief History of Time userInsteadOfStandard

test/modules/bradc/userInsteadOfStandard/ was added in 2009 in 0cbefbd. The purpose of this test is to check that we have behavior we like for the scenario in which a user unwittingly names a module the same as a standard/internal/package module. Originally, the test had a Math.chpl that the test is imagining was created by a user who didn't know that there is a Math.chpl in the standard library. The test has a foo2.chpl that uses the module with the conflicting name.

Note that the purpose of this test is not to test that we can replace a standard library module in the field. We have --prepend-standard-module-dir / --prepend-internal-module-dir for that & as far as I know, there is agreement that these flags (or similar flags) should be required to get the replace-standard-library behavior. (I and others have been confused about this in the past).

It is not immediately obvious to me if this test is intended to focus only on automatically included modules. However, the questions it raises apply either way.

Commit 557eba1 (also in 2009) added a mechanism to the compiler to process standard/internal modules separately from user modules, so that the use statements in the standard/internal modules only search in the standard/internal module paths (and not the user directories). The idea was that in a case like this, the compiler could actually have two Math modules; one from the user's directory and one from the standard library. It would rename one of these, internally.

I think this logic worked for this test until PR #19306, which accepted a temporary change in behavior for this test, in order to make progress on other issues. After that:

  • PR Split the Math module into two parts, one of which is included by default #19849 changed it from Math.chpl to AutoMath.chpl on the justification that the test was added to exercise a name conflict with an included-by-default module & so this preserved the behavior while we were pulling out the automatically included part of Math to AutoMath.
  • PR Add workarounds to get userInsteadOfStandard working for now #20188 added various workarounds to get this test working. The workarounds included a special case in the compiler for AutoMath, Errors, ChapelIO and Types; and also adding various procs to the user's AutoMath.chpl to get things to compile (because there was only one AutoMath module & the user's one was replacing the standard one; likely due to my own confusion about the purpose of this test)
  • PR Replace c_*alloc functions with unstable allocate #22358 changed it to ChapelIO for the conflicting module name due to changes in behavior resulting from changes in module initialization order. It continued to add some procs to keep standard library code compiling since the single ChapelIO module the compiler gets is the one from this test rather than the standard library.
  • In working on PR Have the frontend library drive the process of parsing #25125, I saw failures with this test which led me to realize that the dyno frontend is behaving differently from the production compiler in terms of the order of searching standard/internal modules vs user modules.

Issue #23100 describes a related issue with a user module named search.chpl conflicting with modules/packages/Search.chpl on case-insensitive filesystems.

Summary of Discussion

Recall the key questions:

  1. What behavior do we want when a user inadvertently uses a module name that conflicts with a bundled module?
  2. Should it be possible for the compiler to work with two different top-level modules with the same name?
  3. Is it a breaking change to add a new module to the bundled modules/ directory (including standard, internal, package modules)?
  4. Should modules in the bundled modules/ be searched for before or after user modules from -M / . / same directory as other code?

I'm aware of 3 strategies here. I'll assume that the conflicting module will be named XYZ.chpl for the purpose of discussion below (although it has been Math, AutoMath, and ChapelIO at various times for userInsteadOfStandard).

A: Work with the bundled module

The idea here is that there can only be one module with a given name. Even if there is a user module with that same name, use XYZ; needs to refer to the bundled one because otherwise other bundled code or other library code (say in another mason package) depending on that bundled module will break.

Pros:

  • consistent behavior
  • simple to implement and understand

Cons:

  • adding a new standard/internal/package module is arguably a breaking change
    • perhaps this can be addressed with an idea like Rust Editions

Compiling userInsteadOfStandard/foo2.chpl would result in this compiler output:

<command line>: warning: ambiguous module source file -- using /home/mppf/w/12/modules/standard/ChapelIO.chpl over ChapelIO.chpl
foo2.chpl:1: In module 'foo2':
foo2.chpl:31: error: unresolved call 'testchapelio()'
foo2.chpl:31: note: because no functions named testchapelio found in scope
foo2.chpl:31: note: unresolved call had id 308230

B: Work with the user's module

The idea here is that there can only be one module with a given name, but that the user's code needs to keep working if possible, and therefore the user's module should be used. Note that this means that other bundled code or other library code (say in another mason package) depending on that bundled module will break because they will try to use things from the bundled module that don't exist in the user's module. This is what we have now for 2.1 & in my opinion this option is untenable.

Pros:

  • works OK if the bundled module is never used by anything (e.g. I'd expect that to be the case for a little-used package module such as module/packages/Buffers.chpl)

Cons:

  • fails for any internal/standard module used automatically or used in the application (potentially within a 3rd party library used by the application)
  • adding a new standard/internal/package module is still arguably a breaking change, because if user's code has that same module name, it can cause compilation failures elsewhere
  • using certain file / module names can lead to mysterious breakage apparently from the internal/standard modules

Compiling userInsteadOfStandard/foo2.chpl would result in this compiler output:

warning: Ambiguous module source file -- using ./ChapelIO.chpl over $CHPL_HOME/modules/standard/ChapelIO.chpl
$CHPL_HOME/modules/standard/IO.chpl:841: error: Bad identifier in 'only' clause, no known 'write' defined in 'ChapelIO'
$CHPL_HOME/modules/standard/IO.chpl:841: error: Bad identifier in 'only' clause, no known 'writeln' defined in 'ChapelIO'
$CHPL_HOME/modules/standard/IO.chpl:841: error: Bad identifier in 'only' clause, no known 'writef' defined in 'ChapelIO'

(The test as-written adds procs to ./ChapelIO.chpl to avoid the errors above, but I don't think that's reasonable to expect in the inadverdent-same-name case; more generally with a module XYZ this will work as long as the bundled XYZ is not used in the compilation).

C: Work with both

The compiler should work with two modules with the same name, where one is a bundled module, and one is not. This strategy was used in 557eba1 but has since stopped working on main. The idea here would be to improve the compiler in some way to support this.

Pros:

  • adding a new bundled module is not a breaking change

Cons:

  • goes against the current design "there can only be one top-level module with a given name in a given compilation" (more on this below)
    • complications for separate compilation
    • compilation for understandable error messages (if one of the modules is internally renamed inside the compiler, we have to un-rename it in the error messages; additionally, error messages mentioning the modules might be very confusing since the error message can't distinguish between the two)
  • a library making use of this property will stop working if combined in certain ways; consider for example a Mason package named Math; that might work at first but will cause problems in an application that uses that library and also wants to use proc sin from the bundled Math library
  • potential for "hijacking" if an application is written using the bundled XYZ and depends on another Chapel library that later adds a top-level module named XYZ
  • significant implementation complexity, especially with the query-based dyno frontend
  • only helps with bundled modules; does not help when multiple libraries / mason packages try to use the same top-level module name. Relatedly, it would still not be possible for a library to add a top-level module as a non-breaking change for users of that library.

Compiling userInsteadOfStandard/foo2.chpl would result in this compiler output:

warning: Ambiguous module source file -- using ./ChapelIO.chpl over $CHPL_HOME/modules/standard/ChapelIO.chpl
About "there can only be one top-level module with a given name in a given compilation"

We have made a number of decisions recently that seem to double-down on the idea that there can only be one top-level module with a given name. Here are a few examples:

From #7847 (comment) :

"I want to have two modules with the same name in a single Chapel program." But that isn't possible / legal in Chapel as it's defined today (and I don't have a vision as to what it would mean to attempt to support it.

From #8470 (comment)

[regarding a "multiple definitions" error in a similar situation] I think this is appropriate—I tried to compile a program with two modules with the same name at the same scope and that doesn't make sense. It's hard for me to imagine that the language should do any more than this.

Issue #12923 proposed having different module search paths per-module, but this proposal was dismissed.

Issue #19312 concluded with the decision that it's not possible to define a user module that shadows an automatically-included symbol. That issue is saying that it's not generally possible to make a symbol with the same name as something from the automatically included modules; it is similar to the question of if it's possible to make a module with the same name as a bundled module.

@lydia-duncan
Copy link
Member

  1. What behavior do we want when a user inadvertently uses a module name that conflicts with a bundled module?

I think as much as we can, we should stick with the behavior stabilized in 2.0.

  1. Should it be possible for the compiler to work with two different top-level modules with the same name?

My personal opinion (if we were starting fresh on this decision) is:

  • When all files are included in the same compilation stage, probably not
  • When multiple libraries are linked together using separate compilation, I'm open to allowing two different top-level modules with the same name but they should be completely limited to the library they were originally compiled with in that case (which might involve giving them a munged name to avoid conflicts in the future or something). This might even be essential for supporting Mason when one package uses an older version of Chapel (if that's even something we want to support). But given that I'm not implementing separate compilation, I'm also okay with it not being allowed.
  1. Is it a breaking change to add a new module to the bundled modules/ directory (including standard, internal, package modules)?

I have a vague memory of us deciding that "yes, adding new things (functions, modules) will break user codes but no, that doesn't mean we aren't allowed to do it in the future", but I can't seem to track down that conversation. I remember it being distinct from things like new keywords (where it's a lot more fuzzy if we're allowed to do that), and possibly more okay when in a module that isn't included by default (and thus more limited uses/imports of the module's contents are possible to maintain old behavior, but of course that doesn't help with top-level module adds).

  1. Should modules in the bundled modules/ be searched for before or after user modules from -M / . / same directory as other code?

Any opinion I want to form on this is more strongly influenced by what we've already stabilized. I don't think a particular strategy is more important than having made a decision.

@mppf
Copy link
Member Author

mppf commented Jun 18, 2024

Thanks for your thoughts, @lydia-duncan. Regarding whether or not the current behavior needs to be preserved for stabilization reasons, I'm not so sure. Here are the two parts of my rationale for "maybe not":

  • the warning: Ambiguous module source file warnings would probably deter people from thinking they had reasonable (or stable) code
  • the times I'm aware of where this has come up (e.g. Module import ambiguity bug when conflicting with package module #23100 or the userInsteadOfStandard test itself) have lead to catastrophic compilation failure & we're only concerned about stabilizing working program, not compilation failures. (userInsteadOfStandard adds some stuff to its ChapelIO.chpl to prevent the compilation failure, but this isn't reasonable to expect of a user)

@mppf
Copy link
Member Author

mppf commented Jun 24, 2024

Another point about whether or not the current behavior is stable: the details of the module search path are currently only documented in a technote: https://chapel-lang.org/docs/technotes/module_search.html . Whether we added unstable warnings for it or not, it being in a technote rather than the spec is meant to communicate it is not stable.

@mppf
Copy link
Member Author

mppf commented Jun 27, 2024

PR #25125 changes userInsteadOfStandard to better test what happens if an bundled module name is inadvertently used, and it changes the behavior in that case to prefer the bundled module. It also adds an unstable warning for such ambiguities.

We should continue to discuss here what we want to happen in these cases.

mppf added a commit that referenced this issue Jul 1, 2024
This PR adjusts the integration between the frontend library and the
rest of the compiler. It changes it so that only live modules are
converted from uAST to AST and so that this process occurs in the module
initialization order. This is expected to reduce the number of fixups
required.

There are a few user-facing changes in various edge cases:
* This PR also changes how `require "something.chpl";` interacts with
the process of determining the main module & live modules. `require`
statements no longer impact the determination of the main module & a
module brought in by `require` will actually be dead-code-eliminated if
it is not `use` or `import`ed. See also issue #25112 for discussion of
these changes.
* This PR changes the behavior of
`test/modules/bradc/userInsteadOfStandard` which is meant to show what
happens if a user inadvertently creates a module with the same name as a
standard module. This PR changes the ambiguous module chosen from
`./ChapelIO.chpl` to the standard library one and adds an unstable
warning for such ambiguous module situations. See issue #25306 for
discussion about what should happen in these cases.
* The compiler now gives "surprising shadowing" warnings for cases where
an identifier could refer to an enclosing module name or something
brought in by a `use` statement.
* In some cases, errors in modules that are not `use` or `import`ed are
no longer emitted. The reason for this is that these modules are
eliminated earlier in compilation.
* In some cases, the compiler will emit an error about not knowing what
is the main module earlier in compilation before / in addition to other
errors.

More detail about implementation changes in this PR:
* This PR adjusts the dyno scope resolver to allow resolution of things
like `module M { M; }` -- here `M` can refer to the enclosing module
name even without `use` or `import`. This case was previously being
handled by the production scope resolver.
* This PR changes the way that certain built-in types are resolved. That
was necessary to make the process more able to handle convert-uast being
run on various modules in a different order from what the production
compiler uses. Since we already had a list of 7 builtin types that
needed to be wired up when their record/class declarations were found, I
went ahead and moved this mechanism to something mediated by "well-known
types" processing in wellknown.h / wellknown.cpp. Then, I added `_tuple`
to the list of things that are processed in this way. Note that this
involved moving several global variables for well-known types to
wellknown.h / wellknown.cpp from various other locations.
* Since I removed the global variable `currentModuleType`, pass the
current module tag to `buildManageStmt`.
* I moved the `nestedName` warning from `checkUast.cpp` to
`post-parse-checks.cpp` (note that `checkUast.cpp` checks the production
AST just after it was created by converting the uAST to it & shoul[d be
renamed to `check-generated-ast.cpp`). Since this warning works with the
`ImplicitFileModule` warning, I adjusted `parseAndConvert.cpp` to delay
them both, so they can be issued together.
* I added a global variable `gConvertFilterModuleIds` for the live
module analysis to communicate to `convert-uast.cpp` which modules
should be converted. Using a global variable in this way is not ideally
and ideally this would be managed by having convert-uast.cpp only
convert one module at a time.
* Most of the changes are in `parseAndConvert.cpp`. Now the dyno code /
frontend is in charge of deciding what to parse and convert and that
process works with `chpl::parsing::findMainAndCommandLineModules` and
`chpl::resolution::moduleInitializationOrder` to decide the order in
which modules should be converted. I ran into some challenges where
other parts of the compiler assume that certain modules appear early in
`allModules` or that their DefExprs appear early in a traversal over
`theProgram`. `processInternalModules` works around these issues by
rearranging the key modules to appear in an order that matches what the
rest of the compiler is used to.
* I added some string utility functions (`startsWith` and
`replacePrefix`) in a new `string-utils.h` / `string-utils.cpp` in the
frontend library.
* I added a `Context::error` & related methods to various Error handling
classes to accept an `IdOrLocation` in order to wire up errors that
indicate `<command line>` as the location for the error.
* I added a `libraryMode` to `findMainAndCommandLineModules` to hide
errors about the main module when creating a library. I also added some
helper functions `checkFileExists` `getExistingFileInDirectory`
`getExistingFileInModuleSearchPath` to support some operations using the
module search path.
* I adjusted `Scope` to track if it contains a `require` statement. This
allows `resolveVisibilityStmts` to more accurately filter on which
`Scope`s have meaningful work to do in order to resolve
use/import/require.
* I added `deduplicateSamePaths` to `filesystem.h` / `filesystem.cpp`
that uses LLVM Support library functions to detect and remove paths that
are redundant because they refer to the same filesystem element.
* I adjusted `findMainModuleImpl` to consult the modules loaded as
library files as a potential source for the main module. I also improved
the error messages from this function.
* I tidied up `addCommandLineFileDirectories` & adjusted it to put the
command line file directories and `-M` paths before the paths from
`CHPL_MODULE_PATH`.
* I added `doLookupEnclosingModuleName` to implement lookup for cases
like `module M { M; }`. I ran into problems with certain files named a
keyword, so this uses `isReservedIdentifier` to ignore module name
lookups for implicit modules with those names. It would be nice to
adjust the lookup process to be more robust in this way, but this is the
simple solution. See also #19197 which tracks problems in this area.
* I adjusted `resolveVisibilityStmts` process `require` with the module
search directory if there are no `/`s in the path. I also adjusted it to
process modules loaded by `require`. This addresses a pattern currently
used in `test/parsing/errors/nameLength` where the main file `require`s
another, which `require`s a third file; and then the main file `use`s a
module expected to be made available by this nested `require`.
 
Reviewed by @DanilaFe - thanks!

- [x] full comm=none testing
- [x] full `CHPL_COMM=gasnet` testing
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants