Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add compiler warning when use statement is not at start of scope #13041

Open
BryantLam opened this issue May 17, 2019 · 19 comments
Open

Add compiler warning when use statement is not at start of scope #13041

BryantLam opened this issue May 17, 2019 · 19 comments

Comments

@BryantLam
Copy link

Feature request for a compiler warning. Low priority until someone else runs into this behavior.

Create a warning for #12744 when a use statement is not at the start of scope. Otherwise, bugs will occur when users encounter this behavior.

module A {}
module B { var x = 13; }

var x = 42;

proc main() {
  writeln(x); // 13
  use A;
  use B;
}
@bradcray
Copy link
Member

To me, this seems more related to the plan to support a Python-style import that makes nothing available by default than to where the use statement occurs. I.e., I don't think putting the use statement at the top of the scope would make a user any less surprised that they were referring to the wrong x if they didn't realize that module B defined one? Or are you thinking that they're well-aware that B defines an x but didn't expect it to go into effect until the use was reached? If that's the case then issue #13042 seems much more useful to me.

Maybe put another way, I think it's often useful to put a use M only foo; statement into the middle of a block near where foo is needed similar to how it's useful to declaring variables where they're needed rather than being required to declare them at the top of a scope as in old-style C, so wouldn't want a warning barking at me if I chose to use that style. It could be an opt-in warning of course, but then it's likely that users wouldn't know to opt into it until they'd been bitten (again making #13042 seem more useful).

@BryantLam
Copy link
Author

BryantLam commented May 17, 2019

To me, this seems more related to the plan to support a Python-style import that makes nothing available by default than to where the use statement occurs.

Agreed.

Maybe put another way, I think it's often useful to put a use M only foo; statement into the middle of a block near where foo is needed similar to how it's useful to declaring variables where they're needed rather than being required to declare them at the top of a scope as in old-style C, so wouldn't want a warning barking at me if I chose to use that style. It could be an opt-in warning of course, but then it's likely that users wouldn't know to opt into it until they'd been bitten (again making #13042 seem more useful).

This is where I disagree with the rationale for why use or import statements should be allowed in the middle of a block. Is a naive user going to be surprised about the behavior of this code?

module A {
  var keepWorking = false;
}
var keepWorking = true;
proc main() {
  while keepWorking {
    writeln("critical operation!");
    var someCondition = true;
    if someCondition {
      break;
    }
  }

  use A only keepWorking; // Remark: Why, user?! Why? Such bad code...
  if keepWorking {
    writeln("operation just for A");
  }
}
// Hint: Nothing gets executed.

The proposed warning would force the user to introduce an indented scope if they want a use statement in the middle of their code. In fact, that could even be part of the warning; tell them to introduce a scope if they really want to use a module where they put that statement.

@BryantLam
Copy link
Author

BryantLam commented May 28, 2019

Ironically, I hit a problem when abiding by this guideline. The following code broke with a nested scope:

var e: [1..8] int = 1..8;
{
  use UnorderedCopy;
  // UnorderedCopy or its dependencies has a symbol conflict on `e`.

  ref eArray = e; // error: Cannot set a reference to a param variable.
  writeln(eArray);
}

This error took a bit for me to debug because the compiler error didn't have a note on where the param variable was. This error is easily remedied:

  1. Recommended workaround: use M only
  2. Feature request: use with private by default / Python-style import
  3. De-scope the block much to my disappointment in this instance.

@bradcray
Copy link
Member

I agree that this example is bad news / super confusing.

This is a pretty interesting case because UnorderedCopy doesn't have any (explicit) dependencies (i.e., no use clauses within it). So then what's going wrong? The Chapel compiler auto-inserts uses into every typical module (user, standard, or package) to result in the auto-use of things like I/O, Math, etc. And then this bites you because the Math module defines a param e... and because uses are currently always transitive.

I think the fix here is to implement issue #6093 (adding support for private, non-transitive uses), and then to have the compiler-generated uses of the standard modules be a private use.

@bradcray
Copy link
Member

I've forked off issue #13118 to capture the case Bryant ran into just above.

@bradcray
Copy link
Member

bradcray commented Jul 8, 2019

Just noting for anyone who didn't subscribe to my forked-off #13118 that the issue Bryant ran into above is now resolved on master due to @lydia-duncan 's PRs #13372 and #13253 (as well as other supporting PRs).

@mppf
Copy link
Member

mppf commented Oct 4, 2019

Somehow

proc f() {
  innerFn();
  proc innerFn() { }
}

seems fundamentally less confusing to me than

proc f() {
  someFn();
  use M; // assuming M defines someFn
}

It looks like a trade-off between reducing one arguably surprising use of modules and making them in some way more consistent with locally-defined symbols. Which would be more useful to easy understanding of the language itself and code written in it?

Personally, I think the use M case above is fundamentally different from the proc innerFn case, because the use M does not have to specifically name what symbols it makes visible. (For this reason, import M; does not seem to be nearly as problematic).

Anyway, the rule can't possibly be "use has to be at start of scope" because interpreted literally that would disallow

proc f() {
  use A;
  use B; // not at start of scope
  ...
}

I think instead the proposal I would advocate is that the compiler emit a warning if code relies on a symbol brought it by a later use statement. Or, alternatively, to not allow the code before the use statement to have access to the symbols it brought in at all. Either of these would still allow a use statement in the middle of a block near where it is needed.

@BryantLam
Copy link
Author

BryantLam commented Oct 4, 2019

I think instead the proposal I would advocate is that the compiler emit a warning if code relies on a symbol brought it by a later use statement. Or, alternatively, to not allow the code before the use statement to have access to the symbols it brought in at all. Either of these would still allow a use statement in the middle of a block near where it is needed.

I'm okay with both proposed solutions. I would advocate for the second. This issue was a consequence from the confusing behavior I found in #12744. Maybe there's more support for changing this behavior now that module symbols are treated a bit uniquely as of 1.20.

@mppf
Copy link
Member

mppf commented Jul 1, 2021

I am wondering if we are still thinking of changing this? Some of the proposed changes would be breaking language changes, even if they adjust a pattern that is confusing and probably not occurring very frequently for that reason.

@bradcray
Copy link
Member

bradcray commented Jul 1, 2021

I continue not to be much of a fan of this proposal, at least from the perspective of having it be on by default. I wouldn't object to an "opt-in" feature for it.

Some of the proposed changes would be breaking language changes

Do you mean "breaking" in the sense of "your code will generate warnings where it didn't before?" or in the sense of "your code won't work anymore"? If the latter, what's an example? I think I missed that the first time around.

@mppf
Copy link
Member

mppf commented Jul 1, 2021

In #13041 (comment) @BryantLam was advocating for "not allow the code before the use statement to have access to the symbols it brought in at all." which would not be a warning but rather a fatal error.

@bradcray
Copy link
Member

bradcray commented Jul 1, 2021

Ah OK, I was focusing too much on the OP's request when interpreting "changing this" in your comment. I personally don't have much appetite for doing more than a warning here because it feels like a big change (placement of uses matter) that introduces a non-orthogonality (non-executable statement orders now matter sometimes) in order to address a case that I don't think comes up very often (I put my use/import into the center of a block and also referred to a symbol it introduced before it), and probably less so in a world with import (where I'm not going to be bringing anything in without explicitly asking for it).

I'd be less opposed to:

  • an opt-in style-oriented warning for use or import statements that follow other statements in a block (except for maybe require?)
  • an on-by-default warning that occurs when an identifier could be resolved either by one obtained from a use or import appearing later in the block or by referring to an outer-scope symbol (i.e., "Chapel's not like C, are you confused?").

@mppf
Copy link
Member

mppf commented Jul 3, 2021

When I first was looking at this issue I was just wondering if we could close it but now I'm thinking we actually need to make the breaking change.

it feels like a big change (placement of uses matter) that introduces a non-orthogonality (non-executable statement orders now matter sometimes)

But doesn't the order of use statements already matter?

// Example A
module Library {
  module SubModule {
    var str = "str from submodule";
  }
}

module Program {
  // this does not currently compile, but swap these two statements and it does
  use SubModule;
  use Library;

  proc main() {
    writeln("str=", str);
  }
}

I care about this because I have been hoping for the reworked compiler to combine scope resolve and resolution into a single "pass" that generally operates in program order (departing from that to go resolve a called function etc). I understand that we'd like to be able to refer to a class, module, or function declared later in a scope. But use statements themselves seem to be more complicated themselves and that seems to mean that at the very least the use statements are processed in order.

I think my mental model of the production compiler has been that it handles non-function identifiers in one pass (scopeResolve) and then function later on. But here I think the behavior is implying that there are currently actually 3 passes:

  1. Resolve use statements (i.e. figure out what say 'Library' means in the above example)
  2. Resolve other variable names etc (what I think of as scopeResolve)
  3. Resolve function calls.

Now, why would I care about combining these into one "pass"? There are features that are quite hard to build in the current construction. One is having a class inheriting from an instantiated generic (e.g. class IntImplementation : MyGenericClass(int)) which seems like something we'll want or (more of a reach) using a type function in the inherits list (e.g. class MyClass : computeParentType()). I can resolve these issues by combining what are traditionally two passes (scopeResolve and resolve). But these examples don't care about the "use" statement order or really the order in general, other than that we would need to be able to resolve some functions while establishing the class hierarchy.

However there is a similar functionality that seems related to this ordering issue. Say we wanted to support something along the lines of #17438. Here's an example:

// Example B
param favoriteModule = "Math";
proc computeRequiredModule() param {
  ... uses of other identifier possibly from a use stmt ...
  return favoriteModule;
}
use computeRequiredModule();

This example seems intuitive and straightforward, but if we process the use statements before resolving any functions, I don't see how it could work because of a circular dependency.

Here is another related example that shows that computeRequiredModule could need symbols from use statements before and after use computeRequiredModule() if we were to preserve the current behavior:

// Example C
module Helper {
  param favoriteModule = "Math";
}
module OtherModule {
  param favoritePrefix = "";
}
module Program {
  use Helper;
  use computeRequiredModule();
  use OtherModule;
  proc computeRequiredModule() param {
    return favoritePrefix + // favoritePrefix comes from a use after the one calling this fn
               favoriteModule; // favoriteModule comes from a use before the one calling this fn
  }
}

Now, this can't possibly compile, because we have to process the use statements in order (see Example A). But if we have the language design approach that function bodies etc are all resolved assuming that all use statements are processed, this would seem like it should compile. But it's possible to handle cases like this with the idea that the use statements (and the things they mentioned) are processed only in order (which allow Example A and a variation of this example with use Helper; use OtherModule; use computeRequiredModule();).

Other variants have a similar problem. This version uses a config param:

// Example D
config param moduleName = "bla";
use moduleName;

This example is reasonable to handle with in idea of "resolve the use statements and dependencies in order" but if use moduleName can affect the way statements above it are resolved then how can we resolve the config param since there is a circularity?

Here is the example from #17438 (comment) :

// Example E
use Version;
use (if chplVersion >= createVersion(1, 24) then Memory.Diagnostics else Memory);

How can we resolve / evaluate the conditional if all use statements can provide identifiers for it? There's a circularity problem. At the very least the contents of the expression in the use statement would need to be resolved in order of the use statements. What I'm trying to show in the other cases here is the way that the current "resolve use statements in order" rule could extend naturally to resolving other things - or it could conflict with the idea that other things are always resolved assuming the use statements are processed.

@lydia-duncan
Copy link
Member

Yeah, if we were to enable use and import to get modules by resolving more complicated syntax, we'd definitely need some sort of tracking of which use/import statements rely on which symbols that rely on which other use/import statements to determine what code combinations are and are not valid. Allowing use/import statements to occur at any point in a scope complicates that, but so does allowing functions to be called from any point in a scope.

I think in practice there aren't very many codes that rely on use/import statements behaving this way. However, I'm cautious about deciding to do away with it as a result - the concept of keeping a strict order for code is not new and so was definitely considered when we originally decided use statements should be possible at any point in the scope and affect symbols defined before it lexically. It's important to me to understand the reasons that caused us to make this decision, even if we decide to move away from it.

@bradcray
Copy link
Member

bradcray commented Jul 6, 2021

But doesn't the order of use statements already matter?
[example of use referring simply to sub-module of module already used]

That's a good point. If I were arguing for the status quo, I might say that relative order between use (and import) statements matter, but that relative order between use and non-use doesn't (at least, I can't think of examples where it does; can you?).

I need to stew on the rest of your proposal. The concept of supporting conditional use statements is attractive to me, but I remain skeptical about whether this can / should be done in a way that's unified with the rest of the language (vs. done in an early pre-processor style step or sub-language). As a specific point of skepticism, the notion of use-ing param strings as a way of computing module names is pretty undesirable to me (similar to how giving users the ability to "compute" variable names by doing string computations in the language would be).

@mppf
Copy link
Member

mppf commented Jul 7, 2021

But doesn't the order of use statements already matter?
[example of use referring simply to sub-module of module already used]

That's a good point. If I were arguing for the status quo, I might say that relative order between use (and import) statements matter, but that relative order between use and non-use doesn't (at least, I can't think of examples where it does; can you?).

Not exactly, but I was thinking about how the ordering requirements on use fit in with other language choices. As you know, we can put calls to functions first and function declarations later in a file. But we can't mention of module-level variables before its definition (e.g. writeln(xx); var xx = 1;). So one viewpoint is use statements are more like functions today but I'm proposing that they behave more like variable declarations.

In a conversation with @lydia-duncan I learned that the production compiler works by resolving the use/import statements first and then proceeding on to the other declarations (in scope resolve, so we're not talking about functions yet). I think this implementation detail supports your conclusion that the order between use and non-use doesn't matter.

However I think that the use statements being processed in order (and only affecting things below them) makes more intuitive sense than the alternative that we have today. Obviously that's subjective. I'd even go so far as to say that, with the current rules, relying on the ability to resolve a symbol brought in by a later use statement is bad style and an anti-pattern.

I need to stew on the rest of your proposal. The concept of supporting conditional use statements is attractive to me, but I remain skeptical about whether this can / should be done in a way that's unified with the rest of the language (vs. done in an early pre-processor style step or sub-language). As a specific point of skepticism, the notion of use-ing param strings as a way of computing module names is pretty undesirable to me (similar to how giving users the ability to "compute" variable names by doing string computations in the language would be).

I'm pretty sure it could be done (at least, with the ordering change we are discussing) in a way that's reasonable and unified with the rest of the language. Whether it should be done is question that needs more investigation.

I'm not particularly tied to the use-a-param feature; I was just trying to show some potential future directions and problems we are likely to run in to. I suppose some other languages have "first class modules" and I'd argue that any features along those lines will run into the same problems. (And there are almost certainly other ways to solve these problems beyond my proposal - but it's hard to say without talking about a specific problem.)

@bradcray
Copy link
Member

bradcray commented Jul 7, 2021

So, thinking about this en route to dinner I found myself wondering whether you were also going to propose as part of this change that a use-before-def of a variable within a given scope be changed to attempt to refer to an outer-scope variable of the same name rather than the local scope variable of that name, as in languages like C:

var x: int;
{
  writeln(x);  // currently refers to the `real` version and results in an error; you might be about to propose it refers to the `int` version?
  var x: real;
}

If not, how does the single-pass approach work? (a possible answer: variable resolution is done at the same time as function resolution, which would also unify paren-less functions and variables better).

But I'm not entirely clear how the single-pass approach works for functions either... For example, if I have:

proc foo(x: int) { ... }
{
  foo(1.2);
  use Foo; // introduces a `foo(x: real)` overload
}

is the implication that foo(1.2) would necessarily call the int version of foo() (because we hadn't processed use Foo; yet to f), or that it would also consider the Foo.foo() overload? If the latter, how does that work in a single pass compilation structure? Or maybe foo(1.2) would generate an error once use Foo was processed because we'd find additional definitions of foo() and make the original call more of a "use before all defs in this scope were found" type of error? (but that wouldn't seem to jibe with being consistent with simply putting a proc foo(x: real) { ... } in that spot given that you're obviously still trying to permit functions to be called before they're used.

Last night, trying to argue your side, I was going to suggest that since use statements can introduce variables, they should be more similar to inlining the declaration of the variable there, but this didn't hold up for me. Specifically, the reason we disallow references to variables before they're used is that the variable won't have been initialized yet. But for variables being obtained via a use/import, the variable should already be set up (at least, for programs that don't have an inherent circularity between their module-scope use statements and variable uses). So I ended up backing away from this and feeling like keeping them behaving more like functions seemed more reasonable.

But use statements themselves seem to be more complicated themselves

I suppose I agree that they're more complicated than the (typical) variable, procedure, or class definition in that they're making pre-existing symbols available rather than creating new ones, but at the end of the day, like var, proc, class decls, they make symbols available to a given scope that weren't before. What are you thinking of when you call them complicated?

@mppf
Copy link
Member

mppf commented Jul 7, 2021

So, thinking about this en route to dinner I found myself wondering whether you were also going to propose as part of this change that a use-before-def of a variable within a given scope be changed to attempt to refer to an outer-scope variable of the same name rather than the local scope variable of that name, as in languages like C:

var x: int;
{
  writeln(x);  // currently refers to the `real` version and results in an error; you might be about to propose it refers to the `int` version?
  var x: real;
}

Nope, I'm happy with the current behavior here.

If not, how does the single-pass approach work? (a possible answer: variable resolution is done at the same time as function resolution, which would also unify paren-less functions and variables better).

Actually "single pass" is a bit of an approximation. I'm imagining that the compiler can gather the declarations immediately in a scope (so when processing the block in this example, it will gather the var x as a declaration named x). This gathering process is much simpler though than processing use statements, and from a language design point of view, is less problematic IMO because the declarations are visible there in the code (i.e. there is no need to figure out some transitive closure of symbols available in use statements; it's just the declarations right there).

But I'm not entirely clear how the single-pass approach works for functions either... For example, if I have:

proc foo(x: int) { ... }
{
  foo(1.2);
  use Foo; // introduces a `foo(x: real)` overload
}

is the implication that foo(1.2) would necessarily call the int version of foo() (because we hadn't processed use Foo; yet to f), or that it would also consider the Foo.foo() overload?

In this example, I'd expect that the foo(real) overload is not available for that call.

(But it's also not quite accurate to say that all function resolution would occur in order - things like generics still need to be resolved according to when they are called. As we are talking about having more order constraints on use statements, if a use statement used a function, that function would need to be declared before the use).

But use statements themselves seem to be more complicated themselves

I suppose I agree that they're more complicated than the (typical) variable, procedure, or class definition in that they're making pre-existing symbols available rather than creating new ones, but at the end of the day, like var, proc, class decls, they make symbols available to a given scope that weren't before. What are you thinking of when you call them complicated?

Two things:

  1. Which symbols they bring in is not necessarily as obvious at the location of the use (for a human or for the compiler)
  2. Use statements have a bunch of features like as / only / except that further complicate the question of "which names are available due to this statement?"

Vs. for a function or variable declaration, it has a name which is right there and immediately obvious in the syntax.

@bradcray
Copy link
Member

bradcray commented Jul 8, 2021

In this example, I'd expect that the foo(real) overload is not available for that call.

So, if we were to go that route, then I think we should definitely either:

  • go back to the OP suggestion to warn when the use is not at the start of the scope; or
  • *simply require all use/import statements to be at the top of the scope (likely permitting require statements to precede them as well).

The reason is that it seems unintuitive to me that a use that brought a symbol into scope would act differently than defining the symbol in place of that use (which is what is happening in this example, essentially). Similarly, I'm guessing that:

var x: int;
{
  writeln(x);
  use Foo;  // defines `var x: real;`
}

would act differently than the previous version of the example that defined x in place of use Foo;? (i.e., that x would resolve to the int version?) And I think that would be similarly confusing / problematic.

In saying this, I'm not saying that my preference is to go this route rather than what we have today, simply to say that if we were to re-interpret use in this way, I think we'd want/need to make one of those changes to avoid this kind of trap which could occur when refactoring code.

Use statements have a bunch of features like as / only / except that further complicate the question of "which names are available due to this statement?"

I'd argue that as and only simplify the question by making it apparent what symbols are available due to the statement, but I agree that except and vanilla use are non-obvious. That said, while though use may be non-obvious, it still seems quite mechanical (i.e., "what compilers are fore"), and I think that "transitive closure" shouldn't come up often or result in a lot of extra work in well-structured modules (where our internal modules are an obvious poster child for the opposite, though they've been getting better with time).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants