Added std.range.orElse which provides a default value for empty ranges #5154

wilzbach · 2017-02-19T20:29:25Z

As mentioned on #5153 if an API method could potentially return an "undefined response (aka null), there are a couple of strategies:

Method	Uses in Phobos
return Nullable	no precedent in Phobos (?)
use `enforce` (aka exceptions)	generally disliked, because exceptions allocate (and everybody wants to be `@nogc` nowadays
add an overload with a `seed`	`std.algorithm.iteration.maxElement`

As the third way seems to be the current strategy (e.g. fold, reduce, ...), it creates a lot of "redundant" code. Here I propose to add a seed(<range>, <defaultValue>) method that can be combined with any range method to solve this problem. The gist is very simple:

import std.range : iota;
assert(0.iota(10).seed(42).front == 0);
assert(0.iota.seed(42).front == 42);

The simplest way to add such a orElse method would be to combine choose and only like this:

auto orElse(Range, RangeElementType = ElementType!Range)(Range r, RangeElementType fallback)
if (isInputRange!(Unqual!(Range)) &&
    !is(CommonType!(ElementType!Range, RangeElementType) == void))
{
    import std.range : choose, only;
    return choose(!r.empty, r, only(seed));
}

However this has a couple of problems:

no custom type - e.g. static if (!hasDefaultValue!Range) enforce(!r.empty)
choose doesn't work in CTFE (#14660)
not fully @safe (it uses union and void[])
doesn't work with ranges of const elements
hasn't all opSlice overloads (e.g. InfiniteRanges)
no toString for user convenience as that's generic
would require a custom only that allows assigns, otherwise hasAssignableElements and hasSwappableElements won't work.

While most of these points are solvable with e.g. template mixins, I am not sure about all of them and hence this PR contains its an implementation of a OrElseRange.

Destroy this idea!

edit: renamed from seed to orElse

pineapplemachine · 2017-02-19T22:00:12Z

Please note that this comment is to continue a discussion here.

You're confusing a seed with a fallback. Seed is a term used when referring reduction functions, e.g. max, sum, fold, reduce. The seed is the initial value of an accumulator. Technically, you can seed a reduction function by unconditionally prepending to the sequence of inputs - and this is important - but not by only prepending when the input is empty, as this implementation does.

Furthermore, though technically walkBack could be expressed as a reduction HOF, and this seed function would technically achieve your goal if the prepending was unconditional, and prepending to the input isn't a totally unreasonable approach to seeding a reduction function, it's important to understand that it's not an apt way to perceive what's going on.

A fallback is a default value returned when an operation received an input that would otherwise cause it to fail, as opposed to throwing an exception or (e.g. when the assert in walkBack is omitted) silently resulting in undefined behavior. Probably the best analogy is the associativearray.get(key, fallback) method. Of course, the fallback is returned when no such key exists in the array. The fallback argument is not a seed in this case, and could not possibly be implemented as one.

My mach library implements a function similar to walkBack, accepting a fallback argument. It also defines a walkindex function, similar in purpose to walkLength and walkBack, which takes a range and an index. As you'd expect, it traverses the range until it hits that index, and it returns the element there. It optionally accepts a fallback. When the range is shorter than the given index, the fallback is returned rather than a fatal error being produced. This fallback also is not a seed, and could not possibly be implemented as one.

However - this isn't necessarily an argument against including a well-implemented function serving this purpose. (Though I don't think seed is an appropriate name for what it does) Rather, this is specifically an argument against using a function such as this one rather than simply having an optional fallback argument for walkBack.

DmitryOlshansky · 2017-02-20T14:45:52Z

The gist is very simple:

import std.range : iota;
assert(0.iota(10).seed(42).front == 0);
assert(0.iota.seed(42).front == 42);

This cries to me to call it "fallback" or "orElse" because that is what it does. "seed" doesn't tell me anything - all I can think of is random number generators.

wilzbach · 2017-02-21T03:45:06Z

This cries to me to call it "fallback" or "orElse" because that is what it does.

I really like the name orElse!
(renamed and moved to std.range)

DmitryOlshansky · 2017-02-21T16:08:05Z

I really like the name orElse!

Shamelessly stolen from Scala Option ;)
Now imagine if we make Nullable follow range interface this all can tick together very nicely.

9il · 2017-02-27T05:47:40Z

Phobos-idiomatic D code is going to be a language for the initiated. We need to think about rite of Passage

9il · 2017-02-27T05:57:33Z

Phobos-idiomatic D code is going to be a language for the initiated. We need to think about rite of Passage

Something like that: a candidate should repent about all accidents with Go. Then he should bring bones of three Scala engineers and read Stroustrup: The C++ Programming Language all night without rest. Then we will exam this persons about Phobos internals

andralex

This is a terrific idea. I preapprove the name addition. Thanks!

andralex · 2017-02-28T18:47:37Z

std/range/package.d

+Returns: if the range is non-empty the original range and otherwise a range of length
+with the fallback.
+*/
+auto orElse(Range, RangeElementType = ElementType!Range)(Range r, RangeElementType fallback)


Why is there a need for a default in the second template parameter?

andralex · 2017-02-28T18:51:47Z

std/range/package.d

+{
+    static struct OrElseRange
+    {
+        private


I'm edging close to preferring individual private with declarations because:

labels (private:) affect things at distance and make reading and reviewing more complicated

scopes (like here) force an extra indent level

refactoring/blame is by far best with individual labels

not a matter of scaling (number of symbols is tractable)

andralex · 2017-02-28T18:52:37Z

std/range/package.d

+    {
+        private
+        {
+            Range r;


Traditional name in decorating ranges: source.

andralex · 2017-02-28T19:01:51Z

std/range/package.d

+        }
+    }
+
+    return OrElseRange(r, fallback);


I suggest a dramatically simpler and smaller implementation. Use r.chain(fallback) as the range type, and pair it with a bool that tells you whether you need to output the last element.

Sadly we don't have a primitive akin to Unix head (http://www.computerhope.com/unix/uhead.htm), That utility, if passed a negative number, outputs lines except a last few. If we did have that, the implementation would be a one-liner:

return r.chain(only(fallback)).head(r.empty ? -1 : ssize_t.max);

Anyhow, using chain should simplify matters quite a bit.

andralex · 2017-02-28T19:12:16Z

One more thing: orElse should also accept a full range that is returned if the first range is empty.

wilzbach · 2017-03-05T06:18:58Z

From the NG:

Yep. Additions to std.range like orElse will make idiomatic Phobos code slower then C++ and Scala. It is not clear when you do a benchmark for single implementation, but idiomatic combinations of D Ranges will became slow.

To bring this a bit into context: mir-algorithm uses C++-like iterators to avoid this - an example:

IotaIterator!int iota;
RetroIterator!(IotaIterator!int) retro;

++iota;
--retro;
assert(*retro == *iota);

http://docs.algorithm.dlang.io/latest/mir_ndslice_iterator.html

JackStouffer · 2017-03-05T07:35:31Z

std/range/package.d

+
+        auto opSlice() { return this; }
+
+        string toString()


This is odd. Custom toString definitions aren't the norm in phobos because most printing functions recognize ranges

9il · 2017-03-05T07:56:47Z

To bring this a bit into context: mir-algorithm uses C++-like iterators to avoid this - an example:

The main orElse performance problem is that D does not have virtual function calls for structs like C++ does. With virtual cals we can avoid if conditions.
chain performance problem is if condition too. It can be solved by introducing a generic analog of opApply. It will affect basic Dlang iteration idioms like chain(...).until, chain(...).array and chain(...).filter.
bitwise problem is that it is implementation for random access range and input range APIs is the same. 2 separate implementations would be faster, few times smaller and clearer (plus less bugs). Recently bitpack (for few bits integers) was added into mir-algorithm. It was very easy to add and to test it. Both bitpack and bitwise require only two methods to implement: opIndex and opIndexAssign. That is all, 31 LOC and 42 LOC respectively. A type with opIndex is called field, on top of a field a random access iterator (in terms of C++) can be created using FieldIterator wrapper. On top of any random access iterator (including pointers) can be created a Slice type that implements all random access range primitives including multidimensional features. This approach simplifies multiple times random access range implementations. For already implemented features please see ndslice scheme. Please note that input range API for bitwise is buggy because random number generators should have special this(this) implementation. As input range bitwise alternative for RNGs Bernoulli2Variable can be used.

andralex · 2017-03-05T12:53:16Z

The main orElse performance problem is that D does not have virtual function calls for structs like C++ does. With virtual cals we can avoid if conditions.

I find this claim surprising. Replacing a virtual call having 2-7 possible landing sites with a switch or cascade of if/else statements is a well known and venerable optimization, which to my knowledge has been first documented by a few seminal papers in the 90s, notably:

"Profile-Guided Receiver Class Prediction" by Grove et al (http://www.research.ibm.com/people/d/dgrove/papers/oopsla95.pdf)
"Optimization of Object-Oriented Programs Using Static Class Hierarchy Analysis" by Dean et al (http://web.cs.ucla.edu/~palsberg/tba/papers/dean-grove-chambers-ecoop95.pdf)

The optimization called "guarded devirtualization" or "type casing" is commonly taught in graduate classes in compiler design and replaces essentially code that you claim is fast, with code that you claim is slow. The win comes from replacing indirect calls (not inlinable and difficult to speculate) with branches and regular calls (easy to inline and speculate).

Of course, this technique, although considered canon, is of venerable age and may have become stale due to recent advances in CPU technology. However the entire HHVM team has used it in recent times systematically in their JIT PHP compiler implemented in C++. Replacing virtual calls with typecased switch statements has always been fruitful, and can be seen everywhere in the code. (I also found an easy to digest recent independent benchmark: http://dimitri-christodoulou.blogspot.com/2011/09/case-for-replacing-polymorphism-with.html.)

For chain and orElse in particular, we're looking at a branch where speculation may fail only once or (respectively) never. Replacing these with indirect calls and obtaining a performance improvement would be a very remarkable result. I'd be in your debt if you could produce such an experiment.

Regarding bitwise, any improvements would be welcome! I recall you submitted some and/or promised more. Please advise. Thanks!

andralex · 2017-03-05T13:14:14Z

std/range/package.d

+            else
+                alias UT = RetType;
+
+            bool hasFallback; // if the default fallback should be used


Better name: useFallback

9il · 2017-03-06T12:45:12Z

The main orElse performance problem is that D does not have virtual function calls for structs like C++ does. With virtual cals we can avoid if conditions.
I find this claim surprising. Replacing a virtual call having 2-7 possible landing sites with a switch or cascade of if/else statements is a well known and venerable optimization, which to my knowledge has been first documented by a few seminal papers in the 90s, notably:

"Profile-Guided Receiver Class Prediction" by Grove et al (http://www.research.ibm.com/people/d/dgrove/papers/oopsla95.pdf)
"Optimization of Object-Oriented Programs Using Static Class Hierarchy Analysis" by Dean et al (http://web.cs.ucla.edu/~palsberg/tba/papers/dean-grove-chambers-ecoop95.pdf)
The optimization called "guarded devirtualization" or "type casing" is commonly taught in graduate classes in compiler design and replaces essentially code that you claim is fast, with code that you claim is slow. The win comes from replacing indirect calls (not inlinable and difficult to speculate) with branches and regular calls (easy to inline and speculate).

Of course, this technique, although considered canon, is of venerable age and may have become stale due to recent advances in CPU technology. However the entire HHVM team has used it in recent times systematically in their JIT PHP compiler implemented in C++. Replacing virtual calls with typecased switch statements has always been fruitful, and can be seen everywhere in the code. (I also found an easy to digest recent independent benchmark: http://dimitri-christodoulou.blogspot.com/2011/09/case-for-replacing-polymorphism-with.html.)

For simple benchmarks orElse maybe faster then vtable. But it ads if conditions while vtable just replace pointers. For example complex pipeline based on chain, choose, chooseAmong and orElse will became slower after each pipeline addition. In other hand vtables plus special iteration patter for chain allow to generate fast and tiny code for any kind of function pipeline.

The advantage of devirtualisation is that it is done by optimiser after virtual tables were constructed.

For chain and orElse in particular, we're looking at a branch where speculation may fail only once or (respectively) never. Replacing these with indirect calls and obtaining a performance improvement would be a very remarkable result. I'd be in your debt if you could produce such an experiment.

stack function was added to mir-algorithm today. It is multidimensional chain analog for ndslice.

import mir.ndslice.allocation: slice;
import mir.ndslice.topology: iota;

// 0, 1, 2
// 3, 4, 5
auto a = iota(2, 3);
// 0, 1
// 2, 3
auto b = iota(2, 2);
// 0, 1, 2, 3, 4
auto c = iota(1, 5);

// 0, 1, 2, | 0, 1
// 3, 4, 5, | 2, 3
// ---------------
// 0, 1, 2,   3, 4
// construction phase
auto s = stack(stack!1(a, b), c);

// allocation phase
auto d = s.slice;
assert(d == [
    [0, 1, 2, 0, 1],
    [3, 4, 5, 2, 3],
    [0, 1, 2, 3, 4],
    ]);

To work with stack special until function was added. Its idea is similar to opApply concept. If a predicate always returns false then until just iterates all elements in a stack. stack + until does not require additional conditions to iterate all elements comparing with chain + Range API.

The simplest performance optimisations that can be added to Phobos are array, copy and put specialisations for chain.

Regarding bitwise, any improvements would be welcome! I recall you submitted some and/or promised more. Please advise. Thanks!

I think that we should move forward with single random access range declaration (ndslice) and have separate implementation for random access ranges and non-random access ranges. I propose to freeze Phobos additions and add only improvements and bugfixes. Other my thoughts about Phobos you already know :-)

andralex · 2017-03-06T13:20:47Z

For simple benchmarks orElse maybe faster then vtable. But it ads if conditions while vtable just replace pointers. For example complex pipeline based on chain, choose, chooseAmong and orElse will became slower after each pipeline addition. In other hand vtables plus special iteration patter for chain allow to generate fast and tiny code for any kind of function pipeline.

That doesn't sound right. Composition with virtual functions (Java-style) or indirect calls (Haskell style) adds one extra indirect call per composition, which are notoriously difficult to optimize.

Do you have any evidence to support the claim that you can compose freely with virtuals and have them all gone by the time the code is run?

To work with stack special until function was added. Its idea is similar to opApply concept. If a predicate always returns false then until just iterates all elements in a stack. stack + until does not require additional conditions to iterate all elements comparing with chain + Range API.

But that's a different design than what you discussed previously.

The simplest performance optimisations that can be added to Phobos are array, copy and put specialisations for chain.

Yah, range methods that specialize certain generic functions would be good to add. I recall you did have a pull request to that effect, what came of it?

9il · 2017-03-06T13:34:36Z

That doesn't sound right. Composition with virtual functions (Java-style) or indirect calls (Haskell style) adds one extra indirect call per composition, which are notoriously difficult to optimize.

I mean D interface style. choose will just return one of its arguments for example, so it will add nothing.

Do you have any evidence to support the claim that you can compose freely with virtuals and have them all gone by the time the code is run?

Nope, we can not remove all virtual calls without advanced JIT, but we can reduce pipeline of virtual calls to single one in expression likechoose(cond1, a, b).orElse(c).chooseAmong(cond2, d, e, f). The result will be just one of a, b, c, d, e, or f. But all ranges should be interfaces/classes (in D).

I recall you did have a pull request to that effect, what came of it?

A PR was for map specialisation. I implemented our ideas for ndslice iterators and fields.

9il · 2017-03-06T13:36:37Z

To work with stack special until function was added. Its idea is similar to opApply concept. If a predicate always returns false then until just iterates all elements in a stack. stack + until does not require additional conditions to iterate all elements comparing with chain + Range API.

But that's a different design than what you discussed previously.

Citation:
chain performance problem is if condition too. It can be solved by introducing a generic analog of opApply

andralex · 2017-03-08T13:12:24Z

@9il yah the relative tradeoffs of internal and external iteration are well trodden ground. A good read: http://gafter.blogspot.com/2007/07/internal-versus-external-iterators.html (and of course the GoF book).

@wilzbach phobos uses external iteration, but you'd do good here to improve some cases by having orElse implement each like this:

void each(alias fun)()
{
     if (useElse) fun(other);
     else source.each!fun;
}

This composes properly and makes for a simple pass through everything. Thx!

andralex · 2017-03-18T07:22:38Z

ping @wilzbach

trikko · 2017-03-23T15:20:21Z

I wonder whether or not a coalesce function would be more useful.

andralex · 2017-03-23T20:33:13Z

@trikko what would be the semantics of coalesce? thx!

trikko · 2017-03-23T20:40:41Z

Just like in SQL. coalesce(a, b, c, d, ...) select the first non-null. I would select a.front if a is a range or only(a).front if not.

dnadlinger · 2017-03-23T20:45:29Z

Isn't this rather like chain?

andralex · 2017-03-23T20:46:43Z

@klickverbot it's a conditional chain

dnadlinger · 2017-03-23T20:51:18Z

More like chain(…).front with automatic application of only I suppose, yes. (I was referring to the more general coalesce; trying to figure out where the design sweet spot in terms of simplicity versus functionality lies.)

trikko · 2017-03-23T21:02:35Z

Isn't coalesce just a orelse with more than two args?

aliak00 · 2018-01-01T23:03:59Z

I'm certainly no expert in D, but is there too much wrong with the following implementation of orElse?:

auto orElse(Range, T)(Range r, T defaultValue) if (isInputRange!Range && is(T : ElementType!Range)) {
  return r.empty ? r.chain([defaultValue]) : r.chain((T[]).init);
}

But you lose a few things so maybe that's not a viable alternative. From the unittests you defined, the following fails:

slicing doesn't work on the returned orElse range, so your tests on infinite ranges fails with Error: no [] operator overload for type Result
Your hasSwappableElements, hasLvalueElements, and hasAssignableElements fail
The @nogc tests fail.

So chain doesn't seem to give you a range that's similar to what you put in. The docs say "If all input ranges offer random access and length, Chain offers them as well."

Maybe an alternative is to have orElse implemented with chain, and "fix" chain to offer opSlice if all input ranges have slicing, etc? Also in the implementation of chain I see an attempt to handle hasSwappableElements but also a comment that says @@@BUG@@@ (not sure what the bug is though)

PS: first post here for me in D's github, so I claim the right to ignorance on D's internals and open source directions incase the above is just way too silly ;)

Edit:

Wait or isn't this just: choose(r.empty, [defaultValue], r) ?

dlang-bot · 2018-04-06T15:03:00Z

Thanks for your pull request, @wilzbach!

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub fetch digger
dub run digger -- build "master + phobos#5154"

LightBender · 2024-10-26T23:18:56Z

PR is ancient. Closing as stalled.

wilzbach added the Review:Needs Decision label Feb 19, 2017

wilzbach mentioned this pull request Feb 19, 2017

Fix 12564, 14493: add std.range.walkBack to get the last element of an input range #5153

Closed

wilzbach changed the title ~~Added std.range.seed which provides a default value for empty ranges~~ Added std.range.orElse which provides a default value for empty ranges Feb 21, 2017

DmitryOlshansky added the @andralex label Feb 21, 2017

wilzbach force-pushed the add-std-range-only branch from 8168971 to a1e700f Compare February 22, 2017 04:50

andralex requested changes Feb 28, 2017

View reviewed changes

wilzbach removed the @andralex label Mar 1, 2017

wilzbach added this to the 2.075.0 milestone Mar 4, 2017

JackStouffer reviewed Mar 5, 2017

View reviewed changes

andralex reviewed Mar 5, 2017

View reviewed changes

std/range/package.d

else

alias UT = RetType;

bool hasFallback; // if the default fallback should be used

Copy link

Member

andralex Mar 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better name: useFallback

andralex removed the Review:Needs Decision label Mar 18, 2017

dlang-bot added Merge:Needs Rebase Review:Needs Work Merge:stalled and removed Merge:stalled labels Jun 22, 2017

MartinNowak removed this from the 2.075.0 milestone Jun 25, 2017

wilzbach mentioned this pull request Jul 8, 2017

[RFC] Make Nullable @optional by default vibe-d/vibe.d#1827

Closed

dlang-bot added Merge:stalled Merge:Needs Rebase and removed Merge:Needs Rebase labels Dec 29, 2017

dlang-bot removed the Merge:stalled label Jan 2, 2018

wilzbach mentioned this pull request Jan 16, 2018

Fix issue 17440: Nullable!Class should redirect to Nullable!(Class,null) #6038

Closed

wilzbach mentioned this pull request Mar 25, 2018

Allow custom package logos & add OpenGraph dlang/dub-registry#286

Merged

aliak00 mentioned this pull request Mar 25, 2018

How to avoid the (x.empty ? x.unwrap : someDefault) antipattern? aliak00/optional#11

Closed

Added std.range.orElse which provides a default value for empty ranges

01670f5

wilzbach force-pushed the add-std-range-only branch from a1e700f to 01670f5 Compare April 6, 2018 15:02

wilzbach requested a review from PetarKirov as a code owner April 6, 2018 15:02

dlang-bot removed Merge:Needs Rebase Review:Needs Work labels Apr 6, 2018

dlang-bot added Merge:Needs Rebase Review:Needs Work Merge:stalled labels May 27, 2021

LightBender added the Review:Phantom Zone Has value/information for future work, but closed for now label Oct 26, 2024

LightBender closed this Oct 26, 2024

Uh oh!

Added std.range.orElse which provides a default value for empty ranges #5154

Added std.range.orElse which provides a default value for empty ranges #5154

Uh oh!

Conversation

wilzbach commented Feb 19, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pineapplemachine commented Feb 19, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

DmitryOlshansky commented Feb 20, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

wilzbach commented Feb 21, 2017

Uh oh!

DmitryOlshansky commented Feb 21, 2017

Uh oh!

9il commented Feb 27, 2017

Uh oh!

9il commented Feb 27, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

andralex left a comment

Choose a reason for hiding this comment

Uh oh!

andralex Feb 28, 2017

Choose a reason for hiding this comment

Uh oh!

andralex Feb 28, 2017

Choose a reason for hiding this comment

Uh oh!

andralex Feb 28, 2017

Choose a reason for hiding this comment

Uh oh!

andralex Feb 28, 2017

Choose a reason for hiding this comment

Uh oh!

andralex commented Feb 28, 2017

Uh oh!

wilzbach commented Mar 5, 2017

Uh oh!

JackStouffer Mar 5, 2017

Choose a reason for hiding this comment

Uh oh!

9il commented Mar 5, 2017

Uh oh!

andralex commented Mar 5, 2017

Uh oh!

andralex Mar 5, 2017

Choose a reason for hiding this comment

Uh oh!

9il commented Mar 6, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

andralex commented Mar 6, 2017

Uh oh!

9il commented Mar 6, 2017

Uh oh!

9il commented Mar 6, 2017

Uh oh!

andralex commented Mar 8, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

andralex commented Mar 18, 2017

Uh oh!

trikko commented Mar 23, 2017

Uh oh!

andralex commented Mar 23, 2017

Uh oh!

trikko commented Mar 23, 2017

Uh oh!

dnadlinger commented Mar 23, 2017

Uh oh!

andralex commented Mar 23, 2017

Uh oh!

dnadlinger commented Mar 23, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

trikko commented Mar 23, 2017

wilzbach commented Feb 19, 2017 •

edited

Loading

pineapplemachine commented Feb 19, 2017 •

edited

Loading

DmitryOlshansky commented Feb 20, 2017 •

edited

Loading

9il commented Feb 27, 2017 •

edited

Loading

9il commented Mar 6, 2017 •

edited

Loading

andralex commented Mar 8, 2017 •

edited

Loading

dnadlinger commented Mar 23, 2017 •

edited

Loading

aliak00 commented Jan 1, 2018 •

edited

Loading