Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.
/ corefx Public archive

Fix LargeArrayBuilder.CopyTo returning incorrect end-of-copy position #23817

Merged
merged 4 commits into from
Sep 7, 2017
Merged

Fix LargeArrayBuilder.CopyTo returning incorrect end-of-copy position #23817

merged 4 commits into from
Sep 7, 2017

Conversation

OmarTawfik
Copy link
Contributor

Fixes #23680
Replaces #23730

cc @stephentoub @VSadov @jamesqo

@@ -421,5 +421,18 @@ public void GetEnumerableOfConcatCollectionChainFollowedByEnumerableNodeShouldBe
Assert.Equal(0xf00, en.Current);
}
}

[Fact]
public void CollectionInterleavedWithLazyEnumerables_ToArray_Regression()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Do we need the "_Regression" suffix on these tests? In a year it won't matter to someone reading the code whether we added a test proactively or in response to a bug.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stephentoub I personally added _Regression because someone might think "what a weird test. What is this supposed to cover?" / etc. in the future. Admittedly, it might be more helpful to remove the suffix and just link to the github issue in a comment:

        public void CollectionInterleavedWithLazyEnumerables_ToArray()
        {
            // This is a regression test for https://github.com/dotnet/corefx/issues/23680
            ...
        }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what a weird test. What is this supposed to cover?

That's why test naming is important, to qualify what the test is for. Comments are good, too, of course. But "_Regression" as a suffix is so moment-in-time. If you were to follow true TDD, for example, everything would need to be suffixed with "_Regression" because the tests are all written before any code is and thus all fail initially.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add a comment with the bug URL.

{
buffer = GetBuffer(++row);
copied = CopyToCore(buffer, 0);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have any tests that enter this loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added tests with larger data sets

// end of the current buffer.
int copyCount = Math.Min(buffer.Length, count);
// Copy until we satisfy `count` or reach the end of the current buffer.
int copyCount = Math.Min(sourceBuffer.Length - sourceIndex, count);

if (copyCount > 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have any tests where copyCount == 0.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@OmarTawfik I don't think this can happen anymore. Add these asserts and remove the conditional:

+Debug.Assert(count > 0);
 Debug.Assert(sourceBuffer.Length > sourceIndex);

 // Copy until we satisfy `count` or reach the end of the current buffer.
 int copyCount = Math.Min(sourceBuffer.Length - sourceIndex, count);
+Debug.Assert(copyCount > 0);

public void CollectionInterleavedWithLazyEnumerables_ToArray_Regression()
{
var results = new[] { 4, 5, 6 }
.SelectMany(i => i == 5 ?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like we should have tests with other operators and with varying sizes, not just SelectMany/Concat and not just with one or a few items.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, since various operators specialize on the source collection type, seems like we need tests on various source types.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not just SelectMany/Concat

I explained in my last PR's description that since Append/Prepend never use the incorrect returned CopyPosition, they shouldn't be affected by this bug.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I explained in my last PR's description that since Append/Prepend never use the incorrect returned CopyPosition, they shouldn't be affected by this bug.

Even if that's true of the current implementation, you're assuming the current implementation will never change. Tests are important both to verify existing behavior and to help prevent future regressions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bug is affecting only SelectMany() and Concat(). I've added tests for small and large data sets.

@@ -149,8 +149,11 @@ public void CopyTo(T[] array, int arrayIndex, int count)
count -= reservedCount;
}

// Finish copying after the final marker.
_builder.CopyTo(position, array, arrayIndex, count);
if (count > 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is beneficial for scenarios when we insert a marker at the very end, e.g. nonCollection.Append().Append()./* a bunch of times */ or nonCollection.Concat(collection), so 👍

? new CopyPosition(row + 1, 0)
: new CopyPosition(row, copied);

int CopyToCore(T[] sourceBuffer, int sourceIndex)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Is it customary in corefx to put a local func at the very end of the containing method?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have very few local methods; there's no convention yet. But unless there's an obviously better place, end of the method makes sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what is the custom for this repo. I've not seen a lot of local functions used here yet. But that pattern is common.

{
T[] buffer = GetBuffer(index: row);
Debug.Assert(sourceBuffer.Length > sourceIndex);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mentioned previously this assert failed. It's good to see that's no longer the case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, because of the if (count > 0) check added to the calling method.

// end of the current buffer.
int copyCount = Math.Min(buffer.Length, count);
// Copy until we satisfy `count` or reach the end of the current buffer.
int copyCount = Math.Min(sourceBuffer.Length - sourceIndex, count);

if (copyCount > 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@OmarTawfik I don't think this can happen anymore. Add these asserts and remove the conditional:

+Debug.Assert(count > 0);
 Debug.Assert(sourceBuffer.Length > sourceIndex);

 // Copy until we satisfy `count` or reach the end of the current buffer.
 int copyCount = Math.Min(sourceBuffer.Length - sourceIndex, count);
+Debug.Assert(copyCount > 0);

@@ -421,5 +421,18 @@ public void GetEnumerableOfConcatCollectionChainFollowedByEnumerableNodeShouldBe
Assert.Equal(0xf00, en.Current);
}
}

[Fact]
public void CollectionInterleavedWithLazyEnumerables_ToArray_Regression()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stephentoub I personally added _Regression because someone might think "what a weird test. What is this supposed to cover?" / etc. in the future. Admittedly, it might be more helpful to remove the suffix and just link to the github issue in a comment:

        public void CollectionInterleavedWithLazyEnumerables_ToArray()
        {
            // This is a regression test for https://github.com/dotnet/corefx/issues/23680
            ...
        }

public void CollectionInterleavedWithLazyEnumerables_ToArray_Regression()
{
var results = new[] { 4, 5, 6 }
.SelectMany(i => i == 5 ?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not just SelectMany/Concat

I explained in my last PR's description that since Append/Prepend never use the incorrect returned CopyPosition, they shouldn't be affected by this bug.


return copied == buffer.Length
? new CopyPosition(row + 1, 0)
: new CopyPosition(row, copied);
Copy link
Contributor

@jamesqo jamesqo Sep 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I believe the previous setup I had in place which looked like

/* first copy */
if (count == 0)
{
    // Note it's `column + copied` instead of `copied`
    return new CopyPosition(row, column + copied).Normalize(buffer.Length);
}

do
{
    /* subsequent copies */
}
while (count > 0);

return new CopyPosition(row, copied).Normalize(buffer.Length);

was correct. With the current code, something like

new TE<int>(new[] { 1 }).Concat(new TE<int>(new[] { 2 })).Concat(new TE<int>(new[] { 3 })).Concat(new[] { 4 }).ToArray()

or

new[] { 1, 2, 3, 4 }.SelectMany(i => i == 4 ? new[] { i } : new TE<int>(new[] { i })).ToArray()

should result in 1 2 2 4. Can you verify that's true?

If it is, you can add a regression test for them named NonCollectionThatDoesNotBeginAtStartOfRow_FollowedByOtherNonCollections. What happens is the LAB the non-collections are buffered into looks like this:

    C0 C1 C2
R0:  1  2  3

The first LAB.CopyTo takes (row = 0, column = 0) as input, copies 1 item (1), and returns (0, 1). The 2nd LAB.CopyTo takes (0, 1) as input, copies 1 item (2) and returns (0, 1) when it should return (0, 2). The 3rd LAB.CopyTo then takes (0, 1) as input and copies 2 instead of 3.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous code is not needed because we now assert that count is never zero. However, I'll add the tests mentioned.

@stephentoub
Copy link
Member

I have to ask: is all of this complexity and potential bug farm really worth it, vs just reverting #14675?
cc: @VSadov

@jamesqo
Copy link
Contributor

jamesqo commented Sep 6, 2017

@stephentoub I can see where you're coming from, and the practical part of me agrees with you. However, a part of me would be sad to see the code go... 😢 If you want, I can try cleaning up the code and renaming things after this PR to make it easier for other people to grasp what it does, to make any future bugs intuitive/obvious to fix for others. So far, although I've tried explaining it a lot, nobody seems to understand my code but me.

I am waiting to hear what @VSadov has to say though.

@jamesqo
Copy link
Contributor

jamesqo commented Sep 7, 2017

@OmarTawfik OK, I was wrong in my earlier comment when I said that

new TE<int> { 1 }.Concat(new TE<int> { 2 }).Concat(new TE<int> { 3 }).Concat(new[] { 4 }).ToArray()

would fail. I've debugged a little to see why. Here's a test case that actually does fail:

new TE<int> { 1 }.Concat(new[] { 2 }).Concat(new TE<int> { 3 }).Concat(new[] { 4 }).Concat(new TE<int> { 5 }).ToArray()
// Expected: 1 2 3 4 5
// Actual: 1 2 3 4 3

The fix is to go back to the previous setup in LAB.CopyTo which looked like

/* first copy */
if (count == 0)
{
    // Note it's `column + copied` instead of `copied`.
    return new CopyPosition(row, column + copied).Normalize(buffer.Length);
}

do
{
    /* subsequent copies */
}
while (count > 0);

return new CopyPosition(row, copied).Normalize(buffer.Length);

@jamesqo
Copy link
Contributor

jamesqo commented Sep 7, 2017

@OmarTawfik After you apply the above change to avoid the bug I pointed out, this PR looks fine to me. I have some minor nits but those can wait until a later PR.

if (copyCount > 0)
{
Array.Copy(buffer, column, array, arrayIndex, copyCount);
int CopyToCore(T[] sourceBuffer, int sourceIndex)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could be simpler without a local function, but this will work too.

Copy link
Member

@VSadov VSadov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@OmarTawfik
Copy link
Contributor Author

@jamesqo thanks for the additional test case!

T[] buffer = GetBuffer(row);
int copied = CopyToCore(buffer, column);

if(count == 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: in case you end up making other edits, there's a missing space between if and (

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, FWIW, I find it very confusing that CopyToCore is capturing count and modifying it. There's no indication at the call site that's happening, which means it looks like the only way count could be 0 is if it was passed in as 0, and yet there's a debug assert for that above.

{
// See https://github.com/dotnet/corefx/issues/23680

var concats = arrays[0];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: we try to use var only when the type is obvious from the right hand side, which it isn't here or below

@stephentoub
Copy link
Member

If you believe this fully addresses the bug, with all corner cases covers, then LGTM (other than my comments).

@OmarTawfik
Copy link
Contributor Author

@stephentoub addressed the nit and added a couple more cases. I believe this is ready. I'll merge this as soon as it is green and prepare a PR for the release branch.

@OmarTawfik
Copy link
Contributor Author

@dotnet-bot test Tizen armel Debug Build

@OmarTawfik OmarTawfik merged commit 9b88997 into dotnet:master Sep 7, 2017
@OmarTawfik OmarTawfik deleted the fix-to-array branch September 7, 2017 21:28
@karelz karelz added this to the 2.1.0 milestone Sep 7, 2017
danmoseley pushed a commit that referenced this pull request Sep 9, 2017
…#23865)

* Fix LargeArrayBuilder.CopyTo returning incorrect end-of-copy position (#23817)

* Fix #23680

* PR Feedback

* More tests

* More tests

* Not using a local function
marek-safar pushed a commit to mono/corefx that referenced this pull request Jan 19, 2018
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
…dotnet/corefx#23817)

* Fix dotnet/corefx#23680

* PR Feedback

* More tests

* More tests


Commit migrated from dotnet/corefx@9b88997
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Linq bug Product bug (most likely)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants