Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.
/ corefx Public archive

[WIP] Fix LargeArrayBuilder.CopyTo returning incorrect end-of-copy position #23730

Closed
wants to merge 4 commits into from

Conversation

jamesqo
Copy link
Contributor

@jamesqo jamesqo commented Sep 1, 2017

Note: This only affects the LAB.CopyTo overload that returns a CopyPosition. The other void overload isn't affected. The only type that uses this method is SparseArrayBuilder (SAB), so really only consumers of SAB are affected. Only LINQ uses SAB even though it's in Common, specifically in the functions Concat, SelectMany, Append, and Prepend. Append and Prepend weren't affected by this bug for reasons that will be explained later.


The job of LAB.CopyTo is to accept a CopyPosition to start copying from, and to return a CopyPosition representing the position copied up to. Previously, it was always returning an incorrect CopyPosition because of this line:

for (; count > 0; row++, column = 0)

The loop statement here, row++, column = 0, is the cause of the bug; it should not be running on the very last iteration of the loop. To give a better understanding of what it's causing, consider the following repro from @OmarTawfik:

var results = new TestEnumerable<int>(new[] { 1 })
    .Concat(new[] { 2 })
    .Concat(new TestEnumerable<int>(new[] { 3 }))
    .ToArray();

Assert.Equal(new[] { 1, 2, 3 }, results); // Actual: { 1, 2, 1 }

First, the SAB buffers the contents of all non-collections into a LAB, and stores references to the collections in an array (without copying the contents of those collections). It also maintains a list of "markers", which are indices at which the contents of the collections will be inserted in the future. So the above calls will create this:

LAB: [ [1, 3] ]
Markers: [ [ Index: 1, CollectionCount: 1 ] ]
CollectionReferences: [ [ 2 ] ]

Then in SAB.ToArray comes the bug. When a marker is sandwiched between the contents of two non-collections, this bug occurs.

  • We start out with [ 0 0 0 ] for the output array.

  • What happens in the above example is SAB.ToArray (correctly) calls LAB.CopyTo to copy the contents of the first collection, with position = (0, 0) and count = 1. Now the output array is [ 1 0 0 ].

    • It receives back the position (1, 0) due to aforementioned bug, when it should really receive (0, 1).
  • Then we (correctly) reserve space for the second collection, skipping over the 2nd slot.

  • Then we continue copying, starting from the position we received from the last LAB.CopyTo call.

    • Since the LAB doesn't even have 2 buffers, you would think starting the copy from (1, 0) (which means index 0 in the buffer at index 1) should throw an exception/raise an assert/etc. However, due to the way it's currently written, GetBuffer(1) returns the same as GetBuffer(0), the first buffer which is [ 1 3 ].
    • So then we copy 1 item starting at index 0 of [ 1 3 ] to the output array, resulting in [ 1 0 1 ].
  • Then we fill in the collections' contents, finally resulting in [ 1 2 1 ].


This bug doesn't affect Append or Prepend because they don't use the faulty position from CopyTo; to put it another way, there's no way they can sandwich a marker between 2 non-collections. If we call nonCollection.Append(item).Prepend(item).ToArray(), we get this during the first stage above:

Marker | NonCollection | Marker

This PR fixes the implementation, refactors some loop logic into a CopyToCore method and adds regression tests.

@OmarTawfik
Copy link
Contributor

@dotnet-bot test this please

@OmarTawfik
Copy link
Contributor

Fixes #23680

@mmitche
Copy link
Member

mmitche commented Sep 1, 2017

@dotnet-bot test OSX x64 Debug Build (machine out of disk space)

@mmitche
Copy link
Member

mmitche commented Sep 1, 2017

@OmarTawfik Actually i'm not sure about that anymore....It looks like the tests may simply have eaten like the entire swap space. As soon as the job was kiklled it went back to have 60GB free.

@OmarTawfik
Copy link
Contributor

@mmitche I think there is an issue with this change. From the logs:

14:53:39      at System.Diagnostics.Debug.Assert(Boolean condition, String message, String detailMessage)
14:53:39      at System.Collections.Generic.LargeArrayBuilder`1.<CopyTo>g__CopyToCore15_0(T[] sourceBuffer, Int32 sourceIndex, <>c__DisplayClass15_0& ) in /Users/dotnet-bot/j/workspace/dotnet_corefx/master/osx-TGroup_netcoreapp+CGroup_Debug+AGroup_x64+TestOuter_false_prtest/src/Common/src/System/Collections/Generic/LargeArrayBuilder.cs:line 216

Looking into it.

@@ -197,6 +211,24 @@ public void CopyTo(T[] array, int arrayIndex, int count)
/// <returns>The position in this builder that was copied up to.</returns>
public CopyPosition CopyTo(CopyPosition position, T[] array, int arrayIndex, int count)
{
int CopyToCore(T[] sourceBuffer, int sourceIndex)
{
Debug.Assert(sourceBuffer.Length > sourceIndex);
Copy link
Contributor

@OmarTawfik OmarTawfik Sep 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jamesqo it seems that this is false sometimes, even with simple cases like: new int[] { 1 }.Concat(new int[] { 2 }).ToArray(), these two are equal. I believe the problem is having no elements to copy
(count: 0). The function should simply return.

@stephentoub
Copy link
Member

@jamesqo, @OmarTawfik, @VSadov, what is the status of this fix? The bug is quite impactful.

@OmarTawfik
Copy link
Contributor

OmarTawfik commented Sep 5, 2017

@stephentoub, If @jamesqo has no time for that today, I'll cherry-pick the fix, fix up the tests, and submit a PR later.

@stephentoub
Copy link
Member

stephentoub commented Sep 5, 2017

I'll cherry-pick the fix, fix up the tests, and submit a PR

@OmarTawfik, please do, thanks. It'd be good to get this fix in today or tomorrow at the latest.

@stephentoub
Copy link
Member

Replaced by #23817

@stephentoub stephentoub closed this Sep 6, 2017
@karelz karelz modified the milestone: 2.1.0 Sep 6, 2017
@jamesqo jamesqo deleted the lab-toarray branch September 6, 2017 19:02
@jamesqo
Copy link
Contributor Author

jamesqo commented Sep 6, 2017

Ugh, I really wish I had time to check my notifications yesterday. Regardless thanks @OmarTawfik for picking this up.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants