Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should range be a non-defaulted generic type? #18215

Closed
bradcray opened this issue Aug 12, 2021 · 24 comments
Closed

Should range be a non-defaulted generic type? #18215

bradcray opened this issue Aug 12, 2021 · 24 comments

Comments

@bradcray
Copy link
Member

Traditionally, range has been a generic type that is fully defaulted, with:

  • int indices
  • fully bounded
  • not capable of storing non-unit strides
    The argument being that this was the common case and the one we wanted to optimize most for.

However, this has also resulted in a few stumbling blocks:

  • users are often confused that var r: range = 1..n by 2; doesn't work and get frustrated at having to change it to var r = 1..n by 2; or var r: range(stridable=false) = 1..n by 2;
  • arguments of range type need to be declared as : range(?) rather than : range to take "any range type", making them asymmetrical with domain (which does mean "any domain type" because domains aren't fully defaulted)

While reviewing the range type today, @e-kayrakli proposed that perhaps ranges shouldn't have default values. My immediate reaction was "That's ridiculous!" but on reflection, I'm having trouble coming up with arguments for how the defaults are helping us. Just because they're the common case doesn't mean that we rely on the defaults in that way at all (e.g., 1..n gets us that same common case without causing pain for anyone else).

To that end, this issue proposes we look into making range partially or completely non-defaulted in its generic fields to see what the impacts are.

This relates somewhat to #18214 due to the asymmetry between domain and range w.r.t. being fully defaulted.

@e-kayrakli
Copy link
Contributor

This is similar to a decision we made about the regex type: #14693

The only limitation that I can think of by making range have non-default fields is that you cannot do var r: range;, which you can today. I don't see much value in that given that ranges are immutable and r is just an empty range in that scenario.

Other than that, I think dropping the defaults will make things more permissive, such that you can pass any range to proc foo(r: range), which you can't today.

This permissiveness is generally positive for me. However, there has been subtleties with stridable ranges where you'd have to use alignedLow instead of low for example. So, status quo can protect us against those. Today, if you have proc foo(r: range), then that r is for sure non-stridable, and you can use low and high correctly. And if a non-stridable range is passed to foo that's a compile error, so it is less likely for you to bump into that subtlety. With other proposed changes to alignedLow/low, I think that's an even smaller concern, though.

@bradcray
Copy link
Member Author

I don't see much value in that given that ranges are immutable and r is just an empty range in that scenario.

I don't think it's that bad. You can still re-assign r, you just can't change it in-place.

But I agree that var r: range; is not a super high-value declaration. The main place I'd want to do it is as a field of a class or record, but in that context, it would still "work", but would simply make the object more generic than it is today (which is arguably a good thing).

Question: If we made range only partially defaulted (say we defaulted idxType to int, but nothing else), does a declaration like:

var r: range = ...;

require the RHS to be an int range, or is it like the formal types in #18214 where all of the defaults are ignored? (I can obviously experiment with this, just haven't tried yet). And does that answer change our opinion at all?

@bradcray
Copy link
Member Author

bradcray commented Aug 18, 2021

I'm taking a look at this, starting by simply removing stridable=false by default. Within five minutes I was able to get this classic stumbling block working, which is promising:

var r: range = 1..10 by 2;

writeln(r);
writeln(r.type:string);

@bradcray
Copy link
Member Author

Looks like ~100 tests need edits to keep working with a generic range type, which is... not bad. Some questions as far as next steps go:

  • is there any way to make this change incrementally, like other deprecated behavior, or is it a case of just holding our nose, pulling the plug, and hoping not too many users suffer?
  • should all three of range's generic fields have no default? Or just some of them?
  • should domains also stop defaulting to stridable=false?
    • upside: would be more consistent
    • downside: they're already generic (so it's not required to make them generic), it's not as much a known pain-point for users, and it's more code that will need to be updated (and longer type signatures required to define a concrete domain type)
    • midside: if we move away from using a boolean for stridable, they'll have to change in some way anyway.

@bradcray
Copy link
Member Author

#18268 has an implementation for this, though I don't expect this to go into the 1.25 release because I think it's likely that we'll want to do all changes to the range type (likely changing the boundedness enum; reconsidering the name / value of the stridability param) all at once. And because I don't have a vision as to how to make the change gradually.

@vasslitvinov
Copy link
Member

The idea of turning range into a generic type appeals to me a lot, primarily because it makes range and domain similar in this regard.

What I find missing is a handy way to specify the common-case concrete range type, i.e., today's range. This would be a range counterpart for domain(1). Towards that, I propose to provide a type alias / convenience:

  • srange (for "simple range") = an alias for today's concrete range
  • srange(idxType) = range(idxType, bounded, stridable=false)

Analogously, we can introduce xrange (for "extensible range"), which is the same as srange except with stridable=true:

  • xrange = alias for range(int, bounded, stridable=true)
  • xrange(idxType) = range(idxType, bounded, stridable=true)

... as well as b-range for a variety of fully-bounded ranges. //admittedly this is heading towards the weeds

@bradcray
Copy link
Member Author

srange

I'm not a fan of this, it looks "strange" to me (ha ha). Of course, users who liked/wanted type aliases like this could always create them themselves (or type functions).

@bradcray bradcray changed the title Exploring making range a non-defaulted generic type? Should range be a non-defaulted generic type? Sep 27, 2021
@vasslitvinov
Copy link
Member

srange

I'm not a fan of this, it looks "strange" to me (ha ha). Of course, users who liked/wanted type aliases like this could always create them themselves (or type functions).

I agree this is not the best name. However, I still believe it will make a difference if we give the users an easy-to-use alias for today's range. For example, we have int, even though we don't have to and the user could always create their own alias for int(64).

So, to me it comes down to picking a good name.

Or perhaps follow int's lead and define range to mean the simple range and domain to mean domain(1). Although I am unexcited about this and I know why: because the common use of the range is the simple range, whereas I doubt that the common use of the domain type is domain(1), at least outside of simple "I am learning Chapel" examples.

@bradcray
Copy link
Member Author

For example, we have int, even though we don't have to and the user could always create their own alias for int(64).

They could, but I'd argue that it's very common for one to declare var i: int; and also common for int in languages to have a default size. I also often compute on it from there, relying on its default value (e.g., having i += 1; as a potential next statement). In contrast, I think it's much rarer for someone to want a default-initialized range where they'll rely on that default value; and much more common for people to type : range and then be confused about why they can't assign stridable things into it. It's been awhile since I implemented #18268, but my rough memory was that there were not many existing codes that benefitted from range being fully defaulted.

Do you have a favorite pattern that demonstrates the utility of having range be fully defaulted?

@vasslitvinov
Copy link
Member

Do you have a favorite pattern that demonstrates the utility of having range be fully defaulted?

I don't think I can find such a pattern that stands out. Using it as the type of a formal, for example? Any use of fully-defaulted range in the code will support the proposal of providing a convenient way to write such a type.

This question though led me to the following experiment: how many times do we use range with parens vs. without parens in the code?

I looked at all our module code, as well as test/release/examples and test/studies. I ignored the other subdirectories of test as they are less likely to contain the kind of code that a user would write. I did my best to limit the counts only to the uses of range as a type and exclude comments, strings, and chpldocs. So (drum roll...):

modules test
range followed by parens 237 32
range no parens 28 22
proc range. ... 84 not detected

@bradcray
Copy link
Member Author

bradcray commented Jan 5, 2022

Using it as the type of a formal, for example? Any use of fully-defaulted range in the code will support the proposal of providing a convenient way to write such a type.

I'm not sure I agree with either of those statements. Apart from what you and I are used to from years of working with the language in its current state, it doesn't seem to me that a formal x: range is more likely to obviously mean "This argument must be a range of integers without a stride" than x: range(int) or x: range(stridable=false) (or x: range(stridability=unit)) would. And even when we find such uses in the code, it's not obvious to me that the user meant to constrain it in that way vs. meant for it to be general, but failed to realize that they'd overly constrained their formal.

This question though led me to the following experiment: how many times do we use range with parens vs. without parens in the code?

What do you conclude from this experiment? (I'm curious whether it matches my conclusions).

To me, the more interesting experiment is to look at which codes had to change in #18268 and to see whether any of them seem like a significant step backwards.

@vasslitvinov
Copy link
Member

I agree that a newbie can perceive range as a generic type, especially when domain is. BTW such a newbie may find it surprising that domain has stridable=false by default whereas range does not. Alas I don't know how to improve this aspect.

I also agree that in some cases today's code that says range may benefit from being changed to range(?).

What I am proposing is a shorthand for the simple range type, so I don't have to spell out range(stridable=false) when I mean it. To my intuition such use is common. My experiment supports this intuition (this is my conclusion, if you will). Although it depends on how you read it. Simple range is uncommon in the modules, which mostly describe generic functionality and so need to work with all range flavors. However simple range accounts for over a third of range uses in examples+studies, which is more typical for end-user code.

Looking at the changes in #18268, if we un-default stridable, I'd love that to be the first generic field so I can write simply range(false), if there is no better shorthand. It's like int(64) vs. int(numbits=64). No big deal, but makes the code more pleasant.

The great majority of the changes in #18268 replace range with range(stridable=false), most of the rest replace range(idxType) with range(idxType, stridable=false), or the same with int instead of idxType. I consider it a small step backwards in readability, having no impact on semantics.

Also, #18268 makes only ~6 changes in test/studies and no changes in test/release/examples. Which tells me that most of the 54 uses of range type that I counted in my experiment are in the formals' types, and those occurrences can be converted to the generic range just fine.

@bradcray
Copy link
Member Author

bradcray commented Jan 6, 2022

such a newbie may find it surprising that domain has stridable=false by default whereas range does not. Alas I don't know how to improve this aspect.

Is there any reason not to make domain the same as range in this regard? That is, if we make stridable not have a default for ranges, make it not have one for domains either? (other than it being a change)

What I am proposing is a shorthand for the simple range type, so I don't have to spell out range(stridable=false) when I mean it. To my intuition such use is common. My experiment supports this intuition (this is my conclusion, if you will). Although it depends on how you read it. Simple range is uncommon in the modules, which mostly describe generic functionality and so need to work with all range flavors. However simple range accounts for over a third of range uses in examples+studies, which is more typical for end-user code.

But are these cases using range because they are trying to assert "I want a non-stridable range of default-sized ints and nothing else!" or are they saying it as a constraint / documentation feature where other things will fill in the details? I.e., in the proposal here, I may say:

var r: range = 1..10;
var r: range(?) = 1..10;
var r = 1..10;
var r: range(stridable=false) = 1..10;

and effectively end up with the same thing. So it seems to me to only really be important in the case of a variable that does not have an initializer.

so I can write simply range(false)

I don't think this is particularly attractive in its lack of saying what the false does. If the stridability property became an enum, as proposed, then I think things are better for this form. But even if other fields were also generic, we could also support arbitrary orders through type functions or initializers or the like, presumably? (or we could just reorder the fields)

@vasslitvinov
Copy link
Member

vasslitvinov commented Jan 6, 2022

But are these cases using range because they are trying to assert "I want a non-stridable range of default-sized ints and nothing else!" or are they saying it as a constraint / documentation feature where other things will fill in the details?

I don't have a good sense of that. I just see that there is some number of cases in the code where the simple-range type is used as the type of a record field, array element, or a tuple component and there is no default value.

Another thought is that allowing range to have non-unit stride may result in latent bugs in existing code or code written by Chapel users who are used to the opposite. Consider a function with formal(s) of type range. Such functions, I conjecture, account for most uses of non-generic range in test/release and test/studies. When we switch range from concrete to generic, that function does not need to change because the now-generic type will be instantiated for each callsite, resulting in the same concrete code as before.

However, latent bugs come about when such a function assumes stride==1 and some future invocations will pass it range(s) with a non-unit stride. Is this a big deal?

Is there any reason not to make domain the same as range in this regard? That is, if we make stridable not have a default for ranges, make it not have one for domains either?

That will make a very good sense, theoretically. Also, applying this a change to our code base will be a good stress-test for how impactful it is.

@bradcray
Copy link
Member Author

Just noting that Vass's arguments in the previous comment would arguably form a rationale for adopting Michael's recent proposals that perhaps generic type constraints must make their genericity more visible in the source code via (?), say. I.e., #19120 or #19121.

@vasslitvinov
Copy link
Member

vasslitvinov commented Apr 15, 2022

Should 'range' become generic, I propose to introduce xrange as a predefined alias for today's 'range', i.e., the bounded unstridable range. "xrange" used to be an efficient range type in Python.

@bradcray
Copy link
Member Author

I find the name xrange non-Chapeltastic, and am not sure "efficient range type" is a good motivator to mimic since it implies that the normal range is inefficient / less efficient (when, in common cases, var x: range(?) = 1..10; would be no less efficient than var r: xrange = 1..10;. I'd counter-propose that if a user is suffering for lack of this alias, they can define it themselves quite easily and choose the name that appeals to them.

@vasslitvinov
Copy link
Member

vasslitvinov commented Apr 15, 2022

Just noting that Vass's arguments in the previous comment would arguably form a rationale for adopting Michael's recent proposals that perhaps generic type constraints must make their genericity more visible in the source code via (?), say. I.e., #19120 or #19121.

My interpretation: this is the other direction we could choose -- make both range and domain concrete unless indicated otherwise.

// Then we'd need to make regexp concrete as well for consistency, which some users preferred not to: #14693 (comment)

@bradcray
Copy link
Member Author

// Then we'd need to make regexp concrete as well for consistency

I'm not following this comment: What's the link between regex and domain/range? (i.e., how would a decision on domain/range affect the design of regex or any other type?)

make both range and domain concrete

Do you mean fully concrete, not generic at all? If so, how are you proposing we avoid throwing away performance w.r.t. 1D vs. nD domains, strided vs. unstrided arrays, etc.?

@vasslitvinov
Copy link
Member

What's the link between regex and domain/range?

The link is similarity. My preference that similar things behave in similar ways.

make both range and domain concrete

Yes, fully concrete. Just like today I can declare a variable of the type range, I'd be able to declare a variable of the type domain. Which would mean today's domain(1).

To avoid performance impact, the user would write domain(?) to mean today's domain.

@bradcray
Copy link
Member Author

The link is similarity. My preference that similar things behave in similar ways.

In what way is a range or domain similar to a regex?

Yes, fully concrete. Just like today I can declare a variable of the type range, I'd be able to declare a variable of the type domain. Which would mean today's domain(1).

To avoid performance impact, the user would write domain(?) to mean today's domain.

Oh, so you don't mean to make the range/domain types themselves concrete, but to make the interpretation of the keywords without modifiers or additional argument names concrete. That wasn't clear to me.

@bradcray
Copy link
Member Author

Here's a case today that I wrote, the compiler complained about, and then I rewrote, realizing that it was a case where a more generic range would've made it easier to write:

var r: range(bool) = ..;

@ghbrown
Copy link
Contributor

ghbrown commented Jan 10, 2023

BTW such a newbie may find it surprising that domain has stridable=false by default whereas range does not.

This specific fact surprised me today, and I've already been caught by and internalized the "domains are generic, ranges aren't" discussion. So I was expecting I could do

var d : domain(1) = {1..5 by 1};

but alas I hadn't internalized it as well as I thought.

Thankfully it seems there is agreement that domain with no default stridability would be nice (if I'm reading the thread correctly). Just putting my two cents in that same bucket for full genericity.

@vasslitvinov
Copy link
Member

We decided to keep the generic aspects of range as-is. See #17122 (comment)

This decision resolves this issue. Closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants