Extend std.range.chunks to work with non-forward input ranges.#5624
dlang-bot merged 2 commits into dlang:master
std/range/package.d
Outdated
| private size_t curSizeLeft; | ||
| bool empty; | ||
|
|
||
| this(Source _r, size_t _chunkSize) |
There was a problem hiding this comment.
I think these parameters are publicly exposed, so it's better to use this.x = x below (even if it's a bit uglier).
There was a problem hiding this comment.
No longer applicable in the new code.
std/range/package.d
Outdated
| private Source r; | ||
| private size_t chunkSize; | ||
| private size_t curSizeLeft; | ||
| bool empty; |
There was a problem hiding this comment.
This is writable by the user, so it should be private and wrapped.
std/range/package.d
Outdated
| { | ||
| private Chunks* impl; | ||
|
|
||
| @property bool empty() { return impl.curSizeLeft == 0 || impl.r.empty; } |
There was a problem hiding this comment.
Could be const if impl.r.empty is const. This is typically checked with something like:
static if (is(typeof((cast(const typeof(impl.r))impl.r).empty)))
but it's not very clean and thus not done at many places in Phobos ...
There was a problem hiding this comment.
Doesn't dmd infer attributes on these methods, since the outer struct is a template?
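For reference, a minimal sketch of the const check discussed above. As far as I know, attribute inference on templated code covers attributes such as @safe, pure, nothrow and @nogc, but it does not add a const qualifier to a member function, so that part has to be forked by hand. `Wrapper` is a hypothetical stand-in, not the code in this PR.

```d
import std.range.primitives : empty, isInputRange;

// Hypothetical wrapper, for illustration only (not the Chunks code in this PR).
struct Wrapper(R)
if (isInputRange!R)
{
    private R r;

    // Offer a const `empty` only if `empty` can be called on a const R.
    static if (is(typeof((cast(const R) R.init).empty)))
    {
        @property bool empty() const { return r.empty; }
    }
    else
    {
        @property bool empty() { return r.empty; }
    }
}

unittest
{
    // int[] has a const-callable `empty`, so a const Wrapper can be queried.
    const w = Wrapper!(int[])([1, 2]);
    assert(!w.empty);
}
```

The duplication of the method body is the "not very clean" part mentioned above.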
std/range/package.d
Outdated
| empty = r.empty; | ||
| } | ||
|
|
||
| @property Chunk front() { return Chunk(&this); } |
There was a problem hiding this comment.
I'm a bit worried about this. Won't the wrapping Chunks get destroyed when you leave the scope, e.g. by returning its front?
(we should at least test this)
There was a problem hiding this comment.
Haha, just saw this comment... but this is already causing a breakage when compiled with -dip25 -dip1000 because the new isInputRange tests front with typeof((R r) => r.front), which will escape the reference to this, and causes it to reject this as being a range.
I suppose I should turn this into a class rather than a struct... in the original non-Phobos version of this in my own code, the wrapping Chunks is always allocated on the heap. The problem with -dip25 -dip1000 still exists, though.
There was a problem hiding this comment.
OK, just reworked the code a bit to move the implementation bits into a heap-allocated struct, so that the outer Chunks is just a wrapper around a pointer to it. This makes it cleaner and also fixes the problems with -dip25 -dip1000.
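To illustrate the idea (only a sketch under my own assumptions, not the actual code in this PR): the outer struct holds nothing but a pointer to heap-allocated state, so copying the wrapper copies the pointer and every copy sees the same position. `SharedTake` and its `Impl` are hypothetical names.

```d
import std.range.primitives : empty, front, popFront, isInputRange;

// Hypothetical wrapper: all mutable state lives in a heap-allocated Impl.
struct SharedTake(R)
if (isInputRange!R)
{
    private static struct Impl
    {
        R r;
        size_t left;
    }

    private Impl* impl;

    this(R r, size_t n) { impl = new Impl(r, n); }

    @property bool empty() { return impl.left == 0 || impl.r.empty; }
    @property auto front() { return impl.r.front; }
    void popFront() { impl.r.popFront(); impl.left--; }
}

unittest
{
    auto a = SharedTake!(int[])([1, 2, 3], 2);
    auto b = a;           // copies only the pointer
    b.popFront();
    assert(a.front == 2); // `a` observes the advance made through `b`
}
```

The price is a GC allocation per wrapper, which is what the @nogc discussion further down is about.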
Btw two simple things that we could do to improve the status quo:
Opinions? |
|
Sounds like a good idea, kill off the autoindex (which is very ugly anyway) and undocument the wrapper structs. Don't make them private just yet, to avoid breaking any existing code that may actually refer to their names. But hide them from the docs so that no new code will continue to rely on them. |
Great that we are in agreement -> #5625 |
|
|
||
| @safe unittest | ||
| { | ||
| import std.algorithm.comparison : equal; |
There was a problem hiding this comment.
import std.internal.test.dummyrange : ReferenceInputRange;
auto data = [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ];
auto r = new ReferenceInputRange!int(data);
std/range/package.d
Outdated
|
|
||
| private this(Source r, size_t chunkSize) | ||
| { | ||
| impl = new Impl(r, chunkSize, chunkSize); |
There was a problem hiding this comment.
Is putting this on the heap really necessary? It's just that you're throwing out @nogc.
There was a problem hiding this comment.
It's a shame that RefCounted is still unsafe...
There was a problem hiding this comment.
I'm on the fence about this. Ideally, we should use RefCounted here, I think byChunk does that, but then it's a toss-up between whether you want to keep @nogc or @safe, but you can't have both.
I'm OK either way -- if you guys feel it's more consistent to use RefCounted ala byChunk, then it's a relatively easy change.
There was a problem hiding this comment.
Two more arguments for RefCounted:
- making safe RC (core.rc) is on the agenda, so if we use RefCounted now, chances are good that it might turn @safe in the future
- it's easier to use @trusted on your code than the hacky and rather unknown assumeNoGc.
In any case it's probably a good idea to open an issue about it, s.t. it's not forgotten.
std/range/package.d
Outdated
| fewer than $(D chunkSize) elements. | ||
|
|
||
| If `Source` is an input range but not a forward range, the resulting range and | ||
| chunks will be single-pass only: iterating over `front` will shrink the chunk |
There was a problem hiding this comment.
IMO there's no need for the long explanation; people should already know what an input range is if they're reading this
If Source is an input range and not a forward range, each chunk will also be an input range. Any calls to popFront on Chunks will also invalidate any lingering references to previous values in each chunk.
There was a problem hiding this comment.
Well, I just wanted to make sure people come with no expectations that we will cache range elements or otherwise make the resulting range "better behaved". A lot of Phobos algorithms will act funny when given a non-forward input range (try it sometime), and I have heard complaints in the past about why something doesn't behave "intuitively", even though that is impossible given input range semantics.
There was a problem hiding this comment.
"If Source is a forward range, the resulting chunks will be forward ranges as well. Otherwise (i.e. Source is an input range), the resulting chunks will be input ranges consuming the same input."
| @@ -7147,6 +7208,43 @@ if (isForwardRange!Source) | |||
| assert(equal(retro(array(chunks)), array(retro(chunks)))); | |||
There was a problem hiding this comment.
I would add the following example to show the new functionality
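import std.algorithm.comparison : equal;
import std.range : chunks, generate, take;
int i;
// The generator does not save state, so it is not a forward range.
auto inputRange = generate!(() => ++i).take(10);
auto chunked = inputRange.chunks(2);
assert(chunked.front.equal([1, 2]));
assert(chunked.front.empty); // iterating the chunk has consumed it
chunked.popFront;
assert(chunked.front.equal([3, 4]));
or something similar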
|
Hmm, I tried converting the heap allocation to use Any ideas what might be wrong? |
|
TBH when I asked about putting it on the heap, I was talking about in general and not about using the GC. Is there a specific reason why you're using an indirection here? |
|
Very good question. The original code I drew this from actually did not have the indirection... but I quickly discovered that mixing non-forward input ranges with by-value semantics was a very bad combination, because in a complex UFCS pipeline, you easily end up inadvertently passing part of the state by value, thus causing the range wrapper to go out of sync with the input range being wrapped (esp. when the latter has by-reference semantics, or references external state). After hacking the code with various workarounds, eventually I just gave up and decided that the cleanest way to guarantee correct behaviour was to simply make the whole thing by-reference, including the Even that turned out to be insufficient in the face of structs' pass-by-value semantics, so eventually all state has to be stored in the wrapper's state, and |
|
I'd very much like the |
|
Gah, I'm an idiot. Was trying to assign |
|
Hmm, do we want |
|
Haha, that was also my reaction. :-P Ideally both, but that will have to wait for |
|
Copy/pasting my PoV from the sub-comments:
|
IIRC it requires the DIP1000 stuff to be fully working |
std/range/package.d
Outdated
|
|
||
| Returns: Forward range of all chunks with propagated bidirectionality, | ||
| conditional random access and slicing. | ||
| Returns: Input range of all chunks with propagated forwardness, |
There was a problem hiding this comment.
"forwardness" is a bit awkward, rephrase? Just say what you mean, you've already explained the details: "Returns: Range of chunks"
std/range/package.d
Outdated
| bidirectionality, conditional random access and slicing. | ||
| */ | ||
| struct Chunks(Source) | ||
| if (isForwardRange!Source) |
There was a problem hiding this comment.
The fact that implementation is distinct for input and forward ranges is an implementation detail. Use the weakest constraint here and use static if inside the struct to fork the implementation.
There was a problem hiding this comment.
Originally, this is what I did but it made the diff look really ugly, that's why I went with splitting it into two overloads. But since you asked for it, I'll go back to doing it that way.
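A rough sketch of what the single-constraint shape could look like, assuming a trivial pass-through wrapper rather than the real Chunks implementation. `Passthrough` and `Counter` are hypothetical names used only for illustration.

```d
import std.range.primitives;

// Hypothetical input-only range: it has no `save`, so it is not a forward range.
struct Counter
{
    int n;
    @property bool empty() { return n >= 3; }
    @property int front() { return n; }
    void popFront() { ++n; }
}

// One public constraint (the weakest one); the fork happens inside the struct.
struct Passthrough(Source)
if (isInputRange!Source)
{
    private Source r;

    @property bool empty() { return r.empty; }
    @property auto front() { return r.front; }
    void popFront() { r.popFront(); }

    static if (isForwardRange!Source)
    {
        // Only offered when the source can be saved.
        @property typeof(this) save() { return typeof(this)(r.save); }
    }
}

static assert(isForwardRange!(Passthrough!(int[])));
static assert(isInputRange!(Passthrough!Counter));
static assert(!isForwardRange!(Passthrough!Counter));
```

Callers see one template with one constraint; which primitives exist depends on the source, which is the point of the request above.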
std/range/package.d
Outdated
| size_t _chunkSize; | ||
| } | ||
|
|
||
| /// ditto |
std/range/package.d
Outdated
| private Source r; | ||
| private size_t chunkSize; | ||
| private size_t curSizeLeft; | ||
| private bool _empty; |
There was a problem hiding this comment.
To prevent being overwritten from outside, as pointed out by @wilzbach's review.
std/range/package.d
Outdated
| private this(Source r, size_t chunkSize) | ||
| { | ||
| impl = RefCounted!Impl(r, chunkSize, chunkSize); | ||
| impl._empty = r.empty; |
There was a problem hiding this comment.
does impl = RefCounted!Impl(r, chunkSize, chunkSize, r.empty); work?
| { | ||
| impl.curSizeLeft--; | ||
| impl.r.popFront(); | ||
| } |
There was a problem hiding this comment.
Is this impl.curSizeLeft -= impl.r.popFrontN(impl.curSizeLeft);?
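For context, popFrontN from std.range.primitives returns the number of elements it actually popped, which may be fewer than requested if the range runs out. A small sketch of the suggested one-liner, using a hypothetical free function `consumeChunk` rather than the PR's member code:

```d
import std.range.primitives : popFrontN;

// Hypothetical helper: drop up to `sizeLeft` elements from `r` and
// return how many of the requested elements could NOT be dropped.
size_t consumeChunk(R)(ref R r, size_t sizeLeft)
{
    sizeLeft -= r.popFrontN(sizeLeft);
    return sizeLeft;
}

unittest
{
    int[] r = [1, 2, 3];
    assert(consumeChunk(r, 2) == 0); // a full chunk was available
    assert(r == [3]);
    assert(consumeChunk(r, 2) == 1); // the source ran out one element early
    assert(r.length == 0);
}
```

Note that, unlike a loop that stops at zero, the remaining count can stay nonzero when the source is exhausted, so the surrounding logic has to tolerate that.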
std/range/package.d
Outdated
| if (!impl.r.empty) | ||
| impl.curSizeLeft = impl.chunkSize; | ||
| else | ||
| impl._empty = true; |
There was a problem hiding this comment.
Looks like impl._empty is just a cache for impl.r.empty, why not eliminate? Alternatively, put a 0 in impl.chunkSize when done.
There was a problem hiding this comment.
Can't eliminate it because the original range may have been exhausted after the last chunk is iterated, but we still need to allow one more call to popFront to pop off the last chunk (conceptually, of course it's a no-op).
But your idea of setting impl.chunkSize to 0 when empty is a good one, it lets us eliminate _empty after all. I'll go with that.
andralex
left a comment
There was a problem hiding this comment.
thx for a solid piece of work
|
please submit a changelog entry too |
std/range/package.d
Outdated
| } | ||
|
|
||
| /// Non-forward input ranges are supported, but with limited semantics. | ||
| /*@safe*/ unittest // FIXME: can't be @safe because RefCounted isn't. |
There was a problem hiding this comment.
Ping @quickfur, this needs to be marked @system in order to pass circle.
std/range/package.d
Outdated
| assert(chunked.front.equal([3, 4])); | ||
| } | ||
|
|
||
| /*@safe*/ unittest |
changelog/std-range-input-chunks.dd
Outdated
| `std.range.chunks` was extended to support non-forward input ranges. | ||
|
|
||
| Now `std.range.chunks` can be used with input ranges that are not forward | ||
| ranges, albeit with limited semantics as imposed by the underlying range. |
…ut ranges. Add input range example.
Rationale: sometimes all you can get is an input range (e.g., File.byLine), and it is undesirable to have to buffer the entire input range just to be able to call chunks on it. This extension of chunks to support non-forward input ranges allows for the possibility of caching such ranges by chunks so that you process it incrementally while offering better range primitives on the cached portion of the range.
For example:
For now, I'm implementing this as a separate overload of Chunks, due to the unfortunate historical accident that Chunks was exposed as a public API rather than encapsulated as a Voldemort or private module-global struct. The implementation is fundamentally different, since without forward range primitives the current code simply cannot be made to work correctly. Originally I tried to use static if inside the struct to separate the two implementations, but it was very messy and the diff was a sight to behold. So I'm keeping it as a separate overload for now.