Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.
/ corefx Public archive

Fix LargeArrayBuilder.CopyTo returning incorrect end-of-copy position #23817

Merged
merged 4 commits into from
Sep 7, 2017
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
31 changes: 21 additions & 10 deletions src/Common/src/System/Collections/Generic/LargeArrayBuilder.cs
Original file line number Diff line number Diff line change
Expand Up @@ -198,7 +198,7 @@ public void CopyTo(T[] array, int arrayIndex, int count)
public CopyPosition CopyTo(CopyPosition position, T[] array, int arrayIndex, int count)
{
Debug.Assert(arrayIndex >= 0);
Debug.Assert(count >= 0 && count <= Count);
Debug.Assert(count > 0 && count <= Count);
Debug.Assert(array?.Length - arrayIndex >= count);

// Go through each buffer, which contains one 'row' of items.
Expand All @@ -216,25 +216,36 @@ public CopyPosition CopyTo(CopyPosition position, T[] array, int arrayIndex, int
int row = position.Row;
int column = position.Column;

for (; count > 0; row++, column = 0)
T[] buffer = GetBuffer(row);
int copied = CopyToCore(buffer, column);

while (count > 0)
{
buffer = GetBuffer(++row);
copied = CopyToCore(buffer, 0);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have any tests that enter this loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added tests with larger data sets


return copied == buffer.Length
? new CopyPosition(row + 1, 0)
: new CopyPosition(row, copied);
Copy link
Contributor

@jamesqo jamesqo Sep 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I believe the previous setup I had in place which looked like

/* first copy */
if (count == 0)
{
    // Note it's `column + copied` instead of `copied`
    return new CopyPosition(row, column + copied).Normalize(buffer.Length);
}

do
{
    /* subsequent copies */
}
while (count > 0);

return new CopyPosition(row, copied).Normalize(buffer.Length);

was correct. With the current code, something like

new TE<int>(new[] { 1 }).Concat(new TE<int>(new[] { 2 })).Concat(new TE<int>(new[] { 3 })).Concat(new[] { 4 }).ToArray()

or

new[] { 1, 2, 3, 4 }.SelectMany(i => i == 4 ? new[] { i } : new TE<int>(new[] { i })).ToArray()

should result in 1 2 2 4. Can you verify that's true?

If it is, you can add a regression test for them named NonCollectionThatDoesNotBeginAtStartOfRow_FollowedByOtherNonCollections. What happens is the LAB the non-collections are buffered into looks like this:

    C0 C1 C2
R0:  1  2  3

The first LAB.CopyTo takes (row = 0, column = 0) as input, copies 1 item (1), and returns (0, 1). The 2nd LAB.CopyTo takes (0, 1) as input, copies 1 item (2) and returns (0, 1) when it should return (0, 2). The 3rd LAB.CopyTo then takes (0, 1) as input and copies 2 instead of 3.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous code is not needed because we now assert that count is never zero. However, I'll add the tests mentioned.


int CopyToCore(T[] sourceBuffer, int sourceIndex)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Is it customary in corefx to put a local func at the very end of the containing method?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have very few local methods; there's no convention yet. But unless there's an obviously better place, end of the method makes sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what is the custom for this repo. I've not seen a lot of local functions used here yet. But that pattern is common.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could be simpler without a local function, but this will work too.

{
T[] buffer = GetBuffer(index: row);
Debug.Assert(sourceBuffer.Length > sourceIndex);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mentioned previously this assert failed. It's good to see that's no longer the case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, because of the if (count > 0) check added to the calling method.


// During this iteration, copy until we satisfy `count` or reach the
// end of the current buffer.
int copyCount = Math.Min(buffer.Length, count);
// Copy until we satisfy `count` or reach the end of the current buffer.
int copyCount = Math.Min(sourceBuffer.Length - sourceIndex, count);

if (copyCount > 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have any tests where copyCount == 0.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@OmarTawfik I don't think this can happen anymore. Add these asserts and remove the conditional:

+Debug.Assert(count > 0);
 Debug.Assert(sourceBuffer.Length > sourceIndex);

 // Copy until we satisfy `count` or reach the end of the current buffer.
 int copyCount = Math.Min(sourceBuffer.Length - sourceIndex, count);
+Debug.Assert(copyCount > 0);

{
Array.Copy(buffer, column, array, arrayIndex, copyCount);
Array.Copy(sourceBuffer, sourceIndex, array, arrayIndex, copyCount);

arrayIndex += copyCount;
count -= copyCount;
column += copyCount;
}
}

return new CopyPosition(row: row, column: column);
return copyCount;
}
}

/// <summary>
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -149,8 +149,11 @@ public void CopyTo(T[] array, int arrayIndex, int count)
count -= reservedCount;
}

// Finish copying after the final marker.
_builder.CopyTo(position, array, arrayIndex, count);
if (count > 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is beneficial for scenarios when we insert a marker at the very end, e.g. nonCollection.Append().Append()./* a bunch of times */ or nonCollection.Concat(collection), so 👍

{
// Finish copying after the final marker.
_builder.CopyTo(position, array, arrayIndex, count);
}
}

/// <summary>
Expand Down
13 changes: 13 additions & 0 deletions src/System.Linq/tests/ConcatTests.cs
Original file line number Diff line number Diff line change
Expand Up @@ -421,5 +421,18 @@ public void GetEnumerableOfConcatCollectionChainFollowedByEnumerableNodeShouldBe
Assert.Equal(0xf00, en.Current);
}
}

[Fact]
public void CollectionInterleavedWithLazyEnumerables_ToArray_Regression()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Do we need the "_Regression" suffix on these tests? In a year it won't matter to someone reading the code whether we added a test proactively or in response to a bug.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stephentoub I personally added _Regression because someone might think "what a weird test. What is this supposed to cover?" / etc. in the future. Admittedly, it might be more helpful to remove the suffix and just link to the github issue in a comment:

        public void CollectionInterleavedWithLazyEnumerables_ToArray()
        {
            // This is a regression test for https://github.com/dotnet/corefx/issues/23680
            ...
        }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what a weird test. What is this supposed to cover?

That's why test naming is important, to qualify what the test is for. Comments are good, too, of course. But "_Regression" as a suffix is so moment-in-time. If you were to follow true TDD, for example, everything would need to be suffixed with "_Regression" because the tests are all written before any code is and thus all fail initially.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add a comment with the bug URL.

{
var results = new TestEnumerable<int>(new[] { 1 })
.Concat(new[] { 2 })
.Concat(new TestEnumerable<int>(new[] { 3 }))
// Do not omit this ToArray()! There was a previous bug where the ToArray() implementation
// was incorrect, while iterating with foreach produced the correct results.
.ToArray();

Assert.Equal(new[] { 1, 2, 3 }, results);
}
}
}
14 changes: 14 additions & 0 deletions src/System.Linq/tests/SelectManyTests.cs
Original file line number Diff line number Diff line change
Expand Up @@ -477,5 +477,19 @@ public void ThrowOverflowExceptionOnConstituentLargeCounts(int[] counts)
IEnumerable<int> iterator = counts.SelectMany(c => Enumerable.Range(1, c));
Assert.Throws<OverflowException>(() => iterator.Count());
}

[Fact]
public void CollectionInterleavedWithLazyEnumerables_ToArray_Regression()
{
var results = new[] { 4, 5, 6 }
.SelectMany(i => i == 5 ?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like we should have tests with other operators and with varying sizes, not just SelectMany/Concat and not just with one or a few items.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, since various operators specialize on the source collection type, seems like we need tests on various source types.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not just SelectMany/Concat

I explained in my last PR's description that since Append/Prepend never use the incorrect returned CopyPosition, they shouldn't be affected by this bug.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I explained in my last PR's description that since Append/Prepend never use the incorrect returned CopyPosition, they shouldn't be affected by this bug.

Even if that's true of the current implementation, you're assuming the current implementation will never change. Tests are important both to verify existing behavior and to help prevent future regressions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bug is affecting only SelectMany() and Concat(). I've added tests for small and large data sets.

(IEnumerable<int>)new List<int>() { i } :
new TestEnumerable<int>(new int[] { i }))
// Do not omit this ToArray()! There was a previous bug where the ToArray() implementation
// was incorrect, while iterating with foreach produced the correct results.
.ToArray();

Assert.Equal(new[] { 4, 5, 6 }, results);
}
}
}