Fix Issue 15798 - std.algorithm.mutation.copy takes target by value #6039

Closed
JackStouffer wants to merge 2 commits into dlang:master from JackStouffer:issue15798

@JackStouffer
Contributor

copy is designed in such a way that passing in a generic output range as the target makes no sense, because copy is supposed to return the "remainder" of target. As such, its target should be restricted to input ranges with assignable elements or slices.

Consider the following example from Stack Overflow:

import std.stdio;
import std.digest.digest;
import std.digest.md;
import std.algorithm;

void main() {
    string s = "Hello!\n";
    auto d1 = makeDigest!MD5;
    auto d2 = makeDigest!MD5;
    foreach (ubyte b; s) {
        d1.put(b);
    }
    s.copy(d2);
    writeln(digest!MD5(s).toHexString);
    writeln(d1.finish().toHexString);
    writeln(d2.finish().toHexString);
}

This outputs

E134CED312B3511D88943D57CCD70C83
E134CED312B3511D88943D57CCD70C83
D41D8CD98F00B204E9800998ECF8427E

The third digest is the MD5 of empty input: copy takes the target by value, so d2 itself never receives the data. What the user should have done is use std.range.primitives.put, which takes the output range by reference, to copy an input range to an output range.

This change adds a deprecation message that alerts the user. Any code that uses copy in this way is very likely already broken, so the deprecation mostly affects broken code.
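
The by-value problem described above can be reproduced without the digest API. The following is a minimal sketch, not code from the PR: `Counter` and `fillByValue` are hypothetical names, and `fillByValue` only stands in for any algorithm that takes its target by value.

```d
import std.range.primitives : put;

// Hypothetical value-type output range that keeps its state inline.
struct Counter
{
    size_t count;
    void put(int) { ++count; }
}

// Hypothetical stand-in for an algorithm that takes its target by value.
void fillByValue(Sink)(int[] source, Sink sink)
{
    foreach (e; source)
        sink.put(e); // only the local copy is updated
}

void main()
{
    Counter c;
    fillByValue([1, 2, 3], c);
    assert(c.count == 0); // the caller's counter never saw the elements

    put(c, [1, 2, 3]);    // std.range.primitives.put takes c by ref
    assert(c.count == 3);
}
```

An output range that only wraps a pointer or handle to external state would not show this difference, which is why the problem is easy to miss.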

@dlang-bot
Contributor

Thanks for your pull request, @JackStouffer!

Bugzilla references

Auto-close Bugzilla Description
15798 std.algorithm.mutation.copy takes target by value

std/stdio.d Outdated
.sort() // sort the lines
.copy( // copy output of .sort to an OutputRange
stdout.lockingTextWriter()); // the OutputRange
put(
Contributor Author

@JackStouffer commented Jan 16, 2018

This is a rare case where copy works as expected, the only reason being stdout.lockingTextWriter represents an output range which has only one global output destination, and therefore can be passed by value.

But generally it's a very bad idea to accept output ranges by value

Member

Boo, this breaks the UFCS chain. Is there a UFCS-able put in Phobos?

std/stdio.d Outdated
stdin // read from stdin
.byLineCopy(Yes.keepTerminator) // copying each line
.array() // convert to array of lines
.sort() // sort the lines
Contributor

I'm going to repeat @CyberShadow's comment as it got hidden with the last rebase:

Boo, this breaks the UFCS chain. Is there a UFCS-able put in Phobos?

Member

Yah, that's a problem.

Member

This is an existential problem with put, because the member hook is also called put.

@JackStouffer
Contributor Author

@CyberShadow In this specific instance you can call lockingTextWriter.put. But it's better to encourage people to use the global put, as it handles cases such as an output range that only defines a char-accepting put being passed a string.

@JackStouffer
Contributor Author

I get that using std.range.put isn't pretty, but it is more correct. This code worked by chance and not by design.

@CyberShadow
Member

But it's not UFCS-able :(

@wilzbach
Contributor

I get that using std.range.put isn't pretty,
But it's not UFCS-able :(

Wasn't this example one of the few prime examples of how elegant D code can look? 😞

@CyberShadow
Member

CyberShadow commented Jan 16, 2018

UFCS is one of the more beautiful parts of D. Even though the code worked by chance, it saddens me that code like this now has to become uglier.

I often try to structure my code to improve its UFCS-ability, e.g. [1] [2]. I remember using this very pattern (for copying a range to a file) some time recently as well.

@JackStouffer
Contributor Author

There is a solution here: define some function X (we can bikeshed this) that just forwards to put, but because it's not called put we don't have to worry about name collision with the method of the type. That way it can be used in UFCS chains.

@CyberShadow
Member

Yep, + invert args. I guess .reverseArgs!put will work now, but it's not really idiomatic. Anyway not a blocker I guess.
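
A minimal sketch of the forwarding function discussed above, with the arguments inverted so the source comes first. The name `putInto` is hypothetical (no such function is claimed to exist in Phobos), and taking the sink by `auto ref` is an assumption made here so that both lvalue and rvalue sinks are accepted.

```d
import std.algorithm.sorting : sort;
import std.array : appender;
import std.range.primitives : put;

// Hypothetical helper: forwards to std.range.primitives.put with the
// arguments swapped. Because it is not named `put`, it cannot collide
// with a sink's own `put` member and so works in a UFCS chain.
void putInto(Source, Sink)(Source source, auto ref Sink sink)
{
    put(sink, source);
}

void main()
{
    auto sink = appender!(int[])();
    [3, 1, 2].sort().putInto(sink); // source first, sink last
    assert(sink.data == [1, 2, 3]);
}
```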

@schveiguy
Member

I'm not 100% convinced this is the right path. "it should only work for arrays" seems short-sighted. It should only work for output ranges that don't copy their state by value. However, we have no mechanism to test for this. Even if we made it a rule that output ranges must be reference types, there's no static way to check for this.

Is there no other path forward here?

@JackStouffer
Contributor Author

@schveiguy it also works for ranges with assignable values a-la the retro example with copy.

@schveiguy
Member

schveiguy commented Jan 16, 2018

it also works for ranges with assignable values

That's not necessarily a requirement:

struct HasAssignableElements
{
   int front;
   void popFront() {front = 0;}
   enum empty = false;
}

@JackStouffer
Contributor Author

@schveiguy I don't understand what that's supposed to illustrate. This would be treated as an output range by copy before this change and would discard all the values it's given except the last.

@schveiguy
Member

many output ranges will fail silently

I don't think there are many output ranges that have this problem. Most reference the data anyway, or are lvalues.

I am on the side of not breaking working, valid, reasonable code like the example stdin.byLineCopy(Yes.keepTerminator).array.sort.copy(stdout.lockingTextWriter).

Member

@andralex left a comment

This doesn't look too good. @JackStouffer have you considered changing copy such that it takes its target by auto ref?

std/stdio.d Outdated
output, // writing directly to stdout
stdin // read from stdin
.byLineCopy(Yes.keepTerminator) // copying each line
.array() // convert to array of lines
Member

as an aside, the parens here must go

@schveiguy
Member

have you considered changing copy such that it takes its target by auto ref?

We can't do that completely, because that changes the semantics of copy with forward ranges that are input ranges with assignable elements:

auto buffer = new int[100];
iota(15).copy(buffer);
writeln(buffer[0 .. 15]);

Currently, this will print the numbers from 0 to 14. If copy is auto-ref, it will print 15 zeros.

I think we can use @JackStouffer's identification of the types of ranges that copy has problems with in this PR and just make THAT overload auto-ref. See #6039 (comment)
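
For reference, a small sketch of the by-value array behaviour that the buffer example above relies on: the caller's slice keeps its full length, and the unfilled remainder is only available through the return value of copy.

```d
import std.algorithm.mutation : copy;
import std.range : iota;

void main()
{
    auto buffer = new int[100];
    auto rest = iota(15).copy(buffer);

    // The caller's slice is untouched because it was passed by value.
    assert(buffer.length == 100);
    assert(buffer[0 .. 3] == [0, 1, 2]);

    // The unfilled part is what copy returns.
    assert(rest.length == 85);
}
```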

@wilzbach
Contributor

As no one pointed this out before - Jenkins failure is at tools:

changed.d(122): Deprecation: function std.algorithm.mutation.copy!(MapResult!(to, FilterResult!(__lambda5, Splitter!(cast(Flag)false, char[], Wrapper))), Appender!(int[])).copy is deprecated - std.algorithm.mutation.copy to non-arrays should be replaced with std.range.primitives.put

https://github.com/dlang/tools/blob/master/changed.d#L122

@JackStouffer
Contributor Author

@schveiguy @andralex @CyberShadow Removed the deprecation and made the new overload take the output range by auto ref. For the new overload, I also made it return void to avoid (heh) confusion with returning "unfilled" portions of output ranges when that doesn't apply.
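
A rough sketch of what a void-returning overload taking its target by `auto ref` could look like. This is not the code from the PR: `copyTo` and `Counter` are hypothetical names, and the template constraints are omitted for brevity.

```d
import std.range.primitives : put;

// Hypothetical value-type output range.
struct Counter
{
    size_t count;
    void put(int) { ++count; }
}

// Hypothetical sketch: `auto ref` binds lvalues by reference and
// rvalues by value, so lvalue sinks see every element.
void copyTo(Source, Sink)(Source source, auto ref Sink sink)
{
    foreach (e; source)
        put(sink, e);
}

void main()
{
    Counter c;
    [1, 2, 3].copyTo(c);      // lvalue: passed by reference
    assert(c.count == 3);

    [4, 5].copyTo(Counter()); // rvalue: still accepted, state is discarded
}
```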

@schveiguy
Member

I like this direction better.

I have considered further that it's possible an output range accepted by reference may break some code. Namely an output range that does put elements into an external location, but isn't expected to affect the lvalue that was used.

It might be possible to deprecate this functionality, but it might be too clever. If you pass in an lvalue by reference, you can pragma(msg, "Warning, in version 2.0XX, this call will accept the target by reference, and affect the original"), and then make a copy before using it.

Member

@schveiguy left a comment

I think this is solid, but we don't need to ban usage of copy now that it should work for lvalues.

File file = File(filename);

//As digests imlement OutputRange, we could use std.algorithm.copy
//As digests imlement OutputRange, we could use std.range.put
Member

Can we revert this? copy should work for this now. Alternatively, the comment looks odd since you are calling hash.put, which I first thought was a UFCS call of std.range.put. If we leave the comment as calling std.range.put, maybe include the full call to show how it's different.

File file = File(filename);

//As digests implement OutputRange, we could use std.algorithm.copy
//As digests implement OutputRange, we could use std.range.put
Member

ditto

auto oneMillionRange = repeat!ubyte(cast(ubyte)'a', 1000000);
auto ctx = makeDigest!MD5();
copy(oneMillionRange, &ctx); //Note: You must pass a pointer to copy!
put(ctx, oneMillionRange);
Member

@schveiguy commented Jan 17, 2018

Can revert this, and remove the &. Leave the note, but just say how ctx must be an lvalue for this to work.

Hash hash;
hash.start();
copy(range, &hash);
put(hash, range);
Member

Can revert, and also switch to UFCS if you want.

Contributor Author

Since this one isn't a public example, and std.range is more likely to have already been imported, I'll leave this

Member

OK, I didn't notice that it wasn't an example.

auto oneMillionRange = repeat!ubyte(cast(ubyte)'a', 1000000);
auto ctx = new MD5Digest();
copy(oneMillionRange, ctx);
put(ctx, oneMillionRange);
Member

ditto.

{
copy(data, File(fileName, "wb").lockingBinaryWriter);
auto w = File(fileName, "wb").lockingBinaryWriter;
put(w, data);
Member

I'd prefer reverting this, but I'm fine with the new version as well.

Contributor Author

I left this one to remove the global std.algorithm.mutation import.


Before this release, $(REF copy, std, algorithm, mutation) took all
$(REF_ALTTEXT output ranges, isOutputRange, std, range, primitives) by value.
Leads to some issues when output ranges aren't designed around a reference to
Contributor

This leads

Before this release, $(REF copy, std, algorithm, mutation) took all
$(REF_ALTTEXT output ranges, isOutputRange, std, range, primitives) by value.
Leads to some issues when output ranges aren't designed around a reference to
heap memory, e.g. the hash ranges in $(MREF std,digest,package):
Contributor

I think you need to drop the package

Member

Is there a place to see what this actually produces?

Contributor

Yes, DAutoTest will show a preview, but it's failing atm due to trailing whitespace:

Searching for trailing whitespace
./changelog/pending.dd:180:With this release, when the target output range in a call to 
./changelog/pending.dd:188:Before this release, $(REF copy, std, algorithm, mutation) took all 
posix.mak:875: recipe for target 'test' failed

assert(h2.finish().toHexString == "D41D8CD98F00B204E9800998ECF8427E");
-------

This issue no longer occurs as long as `target` is an lvalue.
Contributor

Just for clarity I would copy the same test with the behavior in >= 2.079

Member

You can just note that in 2.079 going forward, h1 and h2's results will be identical.

isInputRange!SourceRange &&
isOutputRange!(TargetRange, ElementType!SourceRange) &&
!isArray!TargetRange &&
!hasAssignableElements!TargetRange)
Contributor

Only the last two overloads differ, so static if could simplify things and provide better error messages here.
I assume you wanted to make it explicit that void is returned? The docs already do so.

Member

Not sure what you are meaning here, but I don't know how you do auto ref correctly without an overload.

Contributor

Ah fair enough :/

Returns:
The unfilled part of target
The unfilled part of `target` if `target` is not a `put` defining output
range.
Contributor

void otherwise


import std.range.primitives;
import std.traits : isArray, isBlitAssignable, isNarrowString, Unqual, isSomeChar;
import std.traits : isArray, isBlitAssignable, isNarrowString, Unqual, isSomeChar, isDynamicArray;
Contributor

Why is this import needed?

Member

I think this is what the test originally was (not isArray). It may be leftover.

isInputRange!SourceRange &&
isOutputRange!(TargetRange, ElementType!SourceRange))
isOutputRange!(TargetRange, ElementType!SourceRange) &&
(isArray!TargetRange || hasAssignableElements!TargetRange))
Contributor

Shouldn't this be just hasAssignableElements!TargetRange?

string a;
isArray!(typeof(a)).writeln; // true
isOutputRange!(typeof(a), char).writeln; // false
a[0] = "b"; // cannot modify immutable expression

https://run.dlang.io/is/ivpUfM

Member

Hm.. I think hasAssignableElements would fail on a char[], but not sure if that's a valid output range anyway.

Member

tested, char[] is not an output range for char, which is... strange. It looks like there's special handling for characters in put, so I would have expected this to work.

$(REF copy, std, algorithm, mutation) is any `put` defining output range, the
target is taken as `auto ref`. This new overload is also `void` returning
vs the other overloads which return the unfilled portion of the given range
or array.
Member

This isn't exact enough. It's really if the output range is not an input range with assignable elements. By deduction, this means it defines put, which is a necessary condition, but not a sufficient condition to trigger the auto-refness.

Member

e.g. one could define an input range with assignable elements that ALSO defines put for some optimization. In this case, it won't be auto-ref.

Contributor Author

Gah! If someone defines an output range which defines put and also has assignable elements, logically one would assume that any library function would prioritize put because that's supposedly the fundamental operation of output ranges. But copy prioritizes the r.front assignment, which according to the principle of least surprise is wrong. But "fixing" it would be a breaking change!

Thankfully, an output range which is also an input range with assignable elements is really rare.

Member

@schveiguy commented Jan 19, 2018

Well, I would expect that if it does define assignable elements, either interface will properly work in this scenario (copy by value). So I think it's OK.

The real problem would be if you somehow had an assignable element input range of T, which was also an output range of U ;) But I won't worry about that case...

@schveiguy
Member

I would like to hear from @andralex about the implications of using auto ref here, particularly surrounding this scenario: #6039 (comment)

But other than that, LGTM.

@JackStouffer
Contributor Author

Fixed

assert(h1.finish().toHexString == "E134CED312B3511D88943D57CCD70C83");
assert(h2.finish().toHexString == "D41D8CD98F00B204E9800998ECF8427E");
// this fails in 2.079 and later
assert(h1.finish().toHexString != h2.finish().toHexString)
Member

Are you allowed to call finish more than once? I'm not familiar with the API of digests. Also the assert is trivially obvious (the two strings are different per the above asserts).

I think in the suggestion I was saying, you could just put a note after the code saying the two digests will be the same in 2.079 and later.

Alternatively, you could leave out the 3rd assert and put as a comment: "In 2.079 and later, both digests will have the same value".

@JackStouffer
Contributor Author

Ping @andralex

@schveiguy
Member

Just wanted to say I am good with the rewording, thanks.

Member

@andralex left a comment

Let's integrate the two versions into one that takes auto ref. Then distinguish cases with static if inside.

This will simplify the constraints. For the arrays, the auto ref won't matter anyway. Thanks!

@schveiguy
Copy link
Member

@andralex sorry, I'm not sure if you saw my previous comment. Switching to auto-ref does matter for arrays -- now you will advance the array along.

I guarantee there is code out there that does something like this:

auto buffer = new int[100];
iota(15).copy(buffer);
writeln(buffer[0 .. 15]);

Which will be semantically different if copy takes its target via ref.

@andralex
Member

andralex commented Jan 20, 2018

@schveiguy just create a local copy of the array inside the copy function
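
One way to read this suggestion, as a sketch only: a single function takes the target by `auto ref`, and a `static if` makes a local copy of the slice in the array case so the caller's array is not advanced. `copyUnified` and `Counter` are hypothetical names, the constraints are omitted, and the real copy also handles non-array ranges with assignable elements, which this sketch does not.

```d
import std.range : iota;
import std.range.primitives : put;
import std.traits : isDynamicArray;

// Hypothetical value-type output range.
struct Counter
{
    size_t count;
    void put(int) { ++count; }
}

// Hypothetical unified version: one auto ref signature, cases split inside.
auto copyUnified(Source, Sink)(Source source, auto ref Sink sink)
{
    static if (isDynamicArray!Sink)
    {
        auto local = sink;  // local slice, so the caller's slice is not advanced
        foreach (e; source)
            put(local, e);
        return local;       // the unfilled part, as before
    }
    else
    {
        foreach (e; source)
            put(sink, e);   // lvalue sinks are updated in place; returns void
    }
}

void main()
{
    auto buffer = new int[10];
    auto rest = iota(3).copyUnified(buffer);
    assert(buffer.length == 10);
    assert(buffer[0 .. 3] == [0, 1, 2]);
    assert(rest.length == 7);

    Counter c;
    [1, 2].copyUnified(c);
    assert(c.count == 2);
}
```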

@JackStouffer
Contributor Author

Closing this as I'm no longer interested. If someone else wants to shepherd these changes, feel free.
