[RFC] Make the behavior of calling range primitives clear #4511
dnadlinger merged 1 commit into dlang:master from
|
LGTM. |
|
I'm flagging @andralex on this, since it's an additional design constraint to a core piece of the library. |
std/range/primitives.d
Outdated
| is allowed only if calling `r.empty` has, or would have, | ||
| returned `false`.) | ||
| $(LI Calling `r.front` multiple times without calling | ||
| `r.popFront` yields the same result for each call.) |
Just a formatting nitpick: it seems that previously every method/property had its own point. That seems reasonable; can we keep this?
|
@wilzbach fixed |
|
I'm all for clarifying the semantics (in fact, I think these details were a major oversight in the range design, although I'm aware Andrei disagrees). However, I think the currently proposed wording still leaves too many uncertainties – what does "yield the same result" mean? The same return value? Are side-effects evaluated multiple times? |
Where have you seen that? If he disagrees, we probably should just close this now and move on, because it won't happen.
It means that when I call How this is achieved is not relevant or necessary to specify. You can re-calculate, you can use a cache, you can have whatever side effects you need to do, you can do whatever you want. Just make sure that the range's |
|
Perhaps I should change the wording to " as calling the set version of front will change the value of front. |
Actually, your rewording still doesn't imply that calling the set version of front is verboten. Another strawman: "front should return or refer to the same element until popFront is called". Note that empty shouldn't advance the range either. |
Oh, what I was thinking of is not related to this PR in particular or indeed any other specific change proposal. In one of the more epic range semantics discussions – I think it was one related to transient ranges, maybe to byLine –, Andrei rather nonchalantly dismissed the possibility that these correctness considerations would have an impact on the basic design (e.g. splitting
Okay, so let's say |
Yes, they should. If it has side effects, that is up to the range author whether it's important. The range API isn't concerned with side effects, and neither should the user of the range be concerned.
I don't think generic code should care. Performance of getting the front element is not technically part of the range definition, but I think everyone will expect the same performance as an array, where front is pretty cheap. If you find your performance is lacking, then profile and optimize. If this means you make a copy, then you make a copy. Even with an array, continually indexing the first element is going to cost more than making a copy to go in a register; I've made improvements to code by doing just that. |
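To make the point about copying `front` concrete, here is a minimal sketch. `Squares`, its `evaluations` counter, and `sumTwice` are hypothetical names made up for illustration; they are not part of Phobos or of this PR. The range recomputes its element on every read, so repeated reads agree in value but repeat the work, while generic code that takes a local copy reads `front` only once per element.

```d
import std.range.primitives : isInputRange;

// Hypothetical range: squares of i .. n, recomputed on every read of `front`.
struct Squares
{
    int i, n;
    int evaluations; // counts how often `front` did its work

    @property bool empty() const { return i >= n; }
    @property int front() { ++evaluations; return i * i; }
    void popFront() { ++i; }
}

static assert(isInputRange!Squares);

// Adds every element twice, but reads `front` only once per element.
int sumTwice(R)(R r)
{
    int total;
    for (; !r.empty; r.popFront())
    {
        auto x = r.front; // local copy; later uses do not touch the range
        total += x + x;
    }
    return total;
}

void main()
{
    auto r = Squares(0, 4);
    assert(r.front == r.front);  // repeated reads agree in value
    assert(r.evaluations == 2);  // but each read redid the computation
    assert(sumTwice(Squares(0, 4)) == 28); // 2 * (0 + 1 + 4 + 9)
}
```

Whether the copy is worth it depends on the element type and on profiling, as the comment above says; the sketch only shows that the choice is available to the caller without changing the range.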
|
I've thought about this a little, and realized that it's possible to interconvert between the current range API ( Perhaps what we need is the generator-to-range construct wrapped in a convenient Phobos function, so that it's easy to generate ranges from generators. |
|
And by "generators" I include everything that might be problematic based on the updated range API proposed in this PR, such as |
|
Yes, a generator can be turned into a range quite easily. But I think we don't need a new concept to do that. A generator is really just a delegate (the "empty" method is mostly useless). |
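As a rough sketch of the generator-to-range idea discussed here: the adapter below is hypothetical (`FromGenerator` is an invented name, not a Phobos function), and it assumes the generator is a delegate that writes the next value into its argument and returns `false` once it is exhausted. The delegate runs only inside `popFront`, so `front` stays stable however often it is read.

```d
import std.range.primitives : isInputRange;

// Hypothetical adapter, not a Phobos API.
// `next` writes the next value into its argument and returns false when done.
struct FromGenerator(T)
{
    private bool delegate(out T) next;
    private T current;
    private bool done;

    this(bool delegate(out T) next)
    {
        this.next = next;
        popFront(); // prime the first element
    }

    @property bool empty() const { return done; }
    @property T front() { assert(!done); return current; }
    void popFront() { done = !next(current); } // the only place the generator runs
}

static assert(isInputRange!(FromGenerator!int));

void main()
{
    int n = 0;
    auto r = FromGenerator!int((out int x) { x = n++; return x < 3; });

    assert(r.front == 0 && r.front == 0); // stable between popFront calls
    assert(n == 1);                       // the generator ran exactly once so far

    int[] seen;
    foreach (x; r)
        seen ~= x;
    assert(seen == [0, 1, 2]);
}
```

The cost of this design is one cached element per range and an eager first call in the constructor; a lazier variant would defer that call to the first evaluation of `empty` or `front`.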
|
@quickfur |
|
With |
The author of which range?
My claim: What you are saying here is that the range API is a leaky abstraction by design.
You are right, generic code (i.e. every single range algorithm) shouldn't have to care. But for that to be possible, it needs to be clear what the code can and cannot (or should not) expect. For example, if I grant you that people will expect |
The author of the range that is creating the side effects.
Leaky how? I can write code that has side effects anywhere, no design can stop that. Unless you want to make all ranges pure?
map should not concern itself with preventing unexpected behavior from someone who creates such a range. Indeed, I have shown how to create an unexpected range with map as shown on the forum post linked from this PR that has nothing to do with caching. The thing is, caching does not affect the API, just the performance. And premature optimization is not usually desired. It has its own effects (for example, if the underlying data changes through other means, the cached copy is no longer valid). If you want to cache with map, use
I think it's more important to identify the aspects of performance/behavior in the documentation of the higher-order range. Do whatever the minimum is to get it to be a valid range, then allow the user to build on top of that if necessary. The reality is, as long as we create algorithms that use arbitrary function calls to build the data, we cannot be certain that any range construction has well-behaved semantics. There is a certain degree of responsibility that the function author has to assume, map (and the compiler) can only do so much. |
Leaky as in: If the design doesn't specify how side-effect/number of evaluation concerns are to be handled, then I need to crack open the abstraction and review all the source code individually to figure out whether my client code is correct or not.
The question is not whether you can or not, but whether you should be allowed to in "well-formed" range code, i.e. whether you can then expect those side effects to be evaluated in a certain way.
This is somewhat of a distraction from my main point, but I disagree. If the user writes
If you want this to become the official stance (which is reasonable enough), then let's document it that way: Ranges must expect their
Agreed. My point is that when I write a range algorithm, which consumes a range but also offers a range interface itself, I can only assume responsibility if I know what guarantees/characteristics client code will expect, and what I can in turn expect from the range I'm passed. |
|
@klickverbot It seems your reasoning comes down to -- can we force algorithms (either by mechanical verification or convention) to require calling front a certain number of times? The answer is no. Any range author and/or function author for range-using code can call front as many times as they want. Does this mean ranges can't have side effects? I don't think it does. But the side effects are the responsibility of the caller, not the algorithm that's calling front N times. Yes, it means that if you are using a function that creates those side effects via calls to front/popFront/empty, then you have to examine the code to see what exactly is going to happen. In essence, you are relying on an implementation detail for correct code. To put it another way -- I don't think it's invalid to have I don't think we need a specific note on map, just a general rule concerning callable parameters and side effects. |
|
Walter agrees with the essence of the rule: https://forum.dlang.org/post/nl4b8h$2jds$1@digitalmars.com |
|
@schveiguy: That's a fair summary, although I'm not just concerned with strict validity conditions, but also "recommendations" for usage, or best practices if you will. For example, a summary of your position could be: Try to implement algorithms with the minimum number of calls to range primitives they naturally require, but don't do any caching internally to reduce the number of You can definitely make the claim that there is a qualitative difference between that and the minimal statement that this PR is concerned with. My point is that writing down (and ideally formalising) these higher-level considerations apart from just "is (not) allowed" is essential to making ranges work in a composable and predictable way. Just specifying that Regarding map, I don't see how there could be a general rule for "callable parameters". The semantics of a callable template argument, and consequently also the user expectations, depend of course on what the algorithm in question is. For example, it seems reasonable for a (In case this isn't obvious: I'm not against this PR by itself, since it seems the most solid design from many perspectives, and matches common expectations. But while people are actively thinking and debating about this, it might also make sense to tackle the bigger issue at hand.) |
|
Added Walter's rules and made the docs a little clearer. |
|
Since we're all here, the rules currently allow |
std/range/primitives.d
Outdated
| available in the range.) | ||
| $(LI `r.empty` called multiple times, without calling | ||
| `r.popFront`, or otherwise mutating the range object, | ||
| yields the same result for every call.) |
I don't like the term "calling", because it implies that empty must be a function. That's not necessarily the case, since it can be a member variable of the range that gets updated by popFront. Maybe "evaluated" is a better word?
Also, multiple evaluations of empty without calling popFront should not cause the value of front to change. Or is this already covered by the following rules?
Also, multiple evaluations of empty without calling popFront should not cause the value of front to change. Or is this already covered by the following rules?
See above comment
|
@quickfur fixed |
std/range/primitives.d
Outdated
| $(LI `r.front` returns the current element in the range. | ||
| It may return by value or by reference.) | ||
| $(LI `r.front` can be legally called iff calling | ||
| `r.empty` has, or would have, returned `false`.) |
Hmm. What about s/call/evaluate/? Since front technically could be a member variable updated by popFront, not necessarily a function.
|
@quickfur Sorry, missed that. Fixed. |
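The reason "evaluate" is the safer word can be shown with a small sketch. `Countdown` is a hypothetical type invented for this example; its `front` and `empty` are plain member variables that `popFront` updates, and it still satisfies `isInputRange`.

```d
import std.range.primitives : isInputRange;

// Hypothetical range whose `front` and `empty` are fields, not functions.
struct Countdown
{
    int front;
    bool empty;

    this(int start)
    {
        front = start;
        empty = start <= 0;
    }

    // The only primitive that mutates the range.
    void popFront()
    {
        --front;
        empty = front <= 0;
    }
}

static assert(isInputRange!Countdown);

void main()
{
    import std.array : array;

    auto r = Countdown(3);
    assert(!r.empty && r.front == 3);
    assert(r.front == 3);          // evaluating a field twice trivially agrees
    assert(r.array == [3, 2, 1]);  // works with generic range code
}
```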
|
Thanks! LGTM once @andralex 's comments are addressed. |
I think that it's pretty clear that all of the range primitives should be O(1) with regards to the number of elements in the range. Exactly what that amounts to in terms of how expensive the operations end up being within O(1) is very much implementation-dependent and not really a concern of the range API IMHO, though it should be expected that
I don't see any reason to allow And as for concerns about what returning the same result means, if we want to specify it more concretely, I think that the requirement should be that the same result means equality. There's too much useful code that does stuff like allocate - e.g. Overall, I think that this looks good. |
Should this be added too? Something like, "Phobos calls range primitives with the assumption that they are O(1) with respect to the number of elements in the range. Therefore, it's best practice to restrict the range primitives you do provide to those that are O(1). For example, it's not a good idea to implement a singly linked list as a bidirectional range, as the back and popBack attributes would be O(n), and they would be called many times in Phobos code if they are present."
I will clarify this. |
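The singly linked list example can be sketched as follows. `Node` and `ListRange` are hypothetical types written for illustration, not the Phobos `SList`. The range offers only the primitives that are O(1) for this data structure and deliberately omits `back` and `popBack`, so generic code sees it as a forward range but not a bidirectional one.

```d
import std.range.primitives : isForwardRange, isBidirectionalRange;

// Hypothetical singly linked list node.
struct Node
{
    int value;
    Node* next;
}

// Exposes only the O(1) primitives; `back`/`popBack` would need an O(n) walk.
struct ListRange
{
    Node* head;

    @property bool empty() const { return head is null; }
    @property int front() const { return head.value; }
    void popFront() { head = head.next; }
    @property ListRange save() { return this; }
}

static assert(isForwardRange!ListRange);
static assert(!isBidirectionalRange!ListRange);

void main()
{
    import std.algorithm.comparison : equal;

    auto c = new Node(3, null);
    auto b = new Node(2, c);
    auto a = new Node(1, b);
    assert(ListRange(a).equal([1, 2, 3]));
}
```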
|
I think BTW, typically "sub-linear" is acceptable for fast operations. |
|
I think it's clear that any generic algorithm that takes arbitrary code as parameter (e.g. a lambda, a range whose range API implementation comes from the user) will have a complexity that's keyed on the complexity of the provided code/implementation. E.g., Similarly, But it seems unreasonably onerous to require the docs to specify the complexity as "O(f(n) log f(n)) where f(n) is the complexity of parameter X"; we usually assume that range primitives, lambdas, etc., are "cheap" in the sense that we can make the possibly-not-so-accurate assumption that they are O(1) and still derive an overall big-O complexity that gives a good idea of the actual complexity. If the user breaks the O(1) assumption, it's their own problem to deal with the resulting change in complexity; Phobos should simply state that the given big-O complexities are based on the assumption that certain things are O(1), so if your range doesn't obey that, then you're on your own to figure out what the actual complexity will be. I don't think we should outright reject such a range as a non-range. |
|
@JackStouffer I didn't say I agreed with everything @andralex said, just that, being the official "head honcho" of Phobos, his comments do need to be addressed one way or another (either concede to him and implement what he proposed, or win the argument and get his agreement on doing it another way) before we merge. And for the record, I agree that we should use "if and only if" instead of "iff", even though personally I actually prefer the latter. For public-facing docs using the full phrase doesn't hurt and can only help reduce the likelihood of alienating would-be readers who may or may not know what "iff" means. It's not as though people who understand "iff" wouldn't understand "if and only if", so there is no harm in catering to the wider audience. |
|
Actually, upon reflection, Regardless, it's certainly not the case that a range that has ridiculously expensive range operations isn't a range. It's just a horribly performing one that will violate the complexity assumptions made by any generic, range-based algorithm. It will work perfectly soundly with them but with performance that is worse than those algorithms are supposed to have. |
|
Performance is an important part of the range elevator pitch, and complexity is mentioned elsewhere in Phobos. I think I should add a small note, outside of the rules section, about Phobos' assumptions about complexity so it's clear. |
|
@jmdavis In principle, I agree. Creating a range with expensive |
|
Addressed comments |
std/range/primitives.d
Outdated
| ) | ||
|
|
||
| Also, note that Phobos code assumes that the primitives `r.front` and | ||
| `r.empty` are $(BIGOH 1) complexity wise or "cheap" in terms of running |
O(1) doesn't make much sense without stating what the considered parameters are (length of range, ...). "Complexity" can also refer to more than time complexity (space, …).
|
@klickverbot Done. I think this is ready to merge. |
|
Auto-merge toggled on |
|
The language, particularly regarding the time complexity statements, could still be tightened down a bit, but this can be done separately. |
Based on this post, many core devs were in agreement that one basic rule of ranges is that many calls to `front` without a call to `popFront` should all yield the same value. Some ranges break this rule. This PR formalizes this rule so as to make it crystal clear that these violations are bugs that must be fixed:
Ping @schveiguy @jmdavis
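For illustration, here is a sketch of the kind of violation the description refers to. `Leaky` and `repeatEach` are hypothetical names, not code from Phobos or from this PR. `Leaky` passes `isInputRange`, because the trait checks only syntax, yet every read of `front` consumes an element, so generic code that reads `front` twice per element silently skips data.

```d
import std.range.primitives : empty, front, popFront, isInputRange;

// Hypothetical rule-breaking range: every read of `front` advances the range.
struct Leaky
{
    int[] data;

    @property bool empty() const { return data.length == 0; }
    @property int front()
    {
        auto x = data[0];
        data = data[1 .. $]; // side effect that belongs in popFront
        return x;
    }
    void popFront() { } // nothing left to do; `front` already advanced
}

// Passes: the rule is semantic and cannot be checked by the trait.
static assert(isInputRange!Leaky);

// Generic code that relies on the rule: it reads `front` twice per element.
int[] repeatEach(R)(R r)
{
    int[] result;
    for (; !r.empty; r.popFront())
    {
        result ~= r.front;
        result ~= r.front;
    }
    return result;
}

void main()
{
    // A conforming range (an array) behaves as expected.
    assert(repeatEach([1, 2]) == [1, 1, 2, 2]);

    // The rule-breaking range yields each element once and pairs up neighbours.
    assert(repeatEach(Leaky([1, 2, 3, 4])) == [1, 2, 3, 4]);
}
```

With an odd number of elements the second read of `front` would index an empty array, which is one more way this kind of bug surfaces.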