Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

StringBuilderTests.<FailureOnLargeString> [InlinedCallFrame] (System.Buffer.__Memmove) test timeout #40625

Open
benaadams opened this issue Aug 10, 2020 · 16 comments
Labels
area-System.Runtime disabled-test The test is disabled in source code against the issue
Milestone

Comments

@benaadams
Copy link
Member

Libraries Test Run release coreclr Windows_NT x64 Debug

https://helix.dot.net/api/2019-06-17/jobs/c34b0e31-15cf-4557-819a-06d9c812b956/workitems/System.Runtime.Nls.Tests/console

      Process: 7628 dotnet PrivateMemory: 8100003840
      Timed out at 8/10/2020 5:56:34 PM after 60000ms waiting for remote process.
      Wrote mini dump to: C:\helix\work\workitem\uploads\7628.tpwrerhj.lnu.dmp
      	Process ID: 7628
      	Handle: 1000
      	Name: dotnet
      	MainModule: C:\helix\work\correlation\dotnet.exe
      	StartTime: 8/10/2020 5:55:34 PM
      	TotalProcessorTime: 00:01:00.1875000
      	Threads:
      		Thread #1 (OS 0x6A0)   [MTA]
      			[InlinedCallFrame] (System.Buffer.__Memmove)
      			[InlinedCallFrame] (System.Buffer.__Memmove)
      			System.Buffer._Memmove(Byte*, Byte*, UIntPtr)
      			System.String.wstrcpy(Char*, Char*, Int32)
      			System.Text.StringBuilder.ThreadSafeCopy(Char*, Char[], Int32, Int32)
      			System.Text.StringBuilder.Append(Char*, Int32)
      			System.Text.StringBuilder.Append(Char[])
      			System.Text.Tests.StringBuilderTests+<>c.<FailureOnLargeString>b__168_1()
      			System.AssertExtensions.ThrowsAnyInternal(System.Action, System.Type[])
      			System.AssertExtensions.ThrowsAny[[System.__Canon, System.Private.CoreLib],[System.__Canon, System.Private.CoreLib]](System.Action)
      			System.Text.Tests.StringBuilderTests+<>c.<FailureOnLargeString>b__168_0()
      			[DebuggerU2MCatchHandlerFrame]
      			[HelperMethodFrame_PROTECTOBJ] (System.RuntimeMethodHandle.InvokeMethod)
      			System.Reflection.MethodBase.Invoke(System.Object, System.Object[])
      			Microsoft.DotNet.RemoteExecutor.Program.Main(System.String[])
      		Thread #2 (OS 0xB98) [Finalizer] [Background] [MTA]
      			[DebuggerU2MCatchHandlerFrame]
      		Thread #3 (OS 0x2210) [Thread pool worker] [Background] [MTA]
      
      Stack Trace:
        /_/src/Microsoft.DotNet.RemoteExecutor/src/RemoteInvokeHandle.cs(225,0): at Microsoft.DotNet.RemoteExecutor.RemoteInvokeHandle.Dispose(Boolean disposing)
        /_/src/Microsoft.DotNet.RemoteExecutor/src/RemoteInvokeHandle.cs(58,0): at Microsoft.DotNet.RemoteExecutor.RemoteInvokeHandle.Dispose()
        /_/src/libraries/System.Runtime/tests/System/Text/StringBuilderTests.cs(2237,0): at System.Text.Tests.StringBuilderTests.FailureOnLargeString()
  Finished:    System.Runtime.Nls.Tests

Seen in #40326 (comment)

@Dotnet-GitSync-Bot
Copy link
Collaborator

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Aug 10, 2020
@hoyosjs
Copy link
Member

hoyosjs commented Aug 10, 2020

This was present on 11 of the last 100 builds:

763756 https://dev.azure.com/dnceng/public/_build/results?buildId=763756
  net5.0-Windows_NT-Debug-x64-CoreCLR_release-(Windows.Nano.1809.Amd64.Open)windows.10.amd64.serverrs5.open@mcr.microsoft.com/dotnet-buildtools/prereqs:nanoserver-1809-helix-amd64-08e8e40-20200107182504
    System.Text.Tests.StringBuilderTests.FailureOnLargeString
763761 https://dev.azure.com/dnceng/public/_build/results?buildId=763761
  net5.0-Windows_NT-Debug-x64-CoreCLR_release-Windows.10.Amd64.Server19H1.ES.Open
    System.Text.Tests.StringBuilderTests.FailureOnLargeString
763731 https://dev.azure.com/dnceng/public/_build/results?buildId=763731
    System.Text.Tests.StringBuilderTests.FailureOnLargeString
763737 https://dev.azure.com/dnceng/public/_build/results?buildId=763737
  net5.0-Windows_NT-Debug-x64-CoreCLR_release-Windows.10.Amd64.Server19H1.ES.Open
    System.Text.Tests.StringBuilderTests.FailureOnLargeString
763700 https://dev.azure.com/dnceng/public/_build/results?buildId=763700
  net5.0-Windows_NT-Debug-x64-CoreCLR_release-Windows.10.Amd64.Server19H1.ES.Open
    System.Text.Tests.StringBuilderTests.FailureOnLargeString
763638 https://dev.azure.com/dnceng/public/_build/results?buildId=763638
  net5.0-Windows_NT-Debug-x64-CoreCLR_release-Windows.10.Amd64.Server19H1.ES.Open
    System.Text.Tests.StringBuilderTests.FailureOnLargeString
763617 https://dev.azure.com/dnceng/public/_build/results?buildId=763617
  net5.0-Windows_NT-Debug-x64-CoreCLR_release-Windows.10.Amd64.Server19H1.ES.Open
    System.Text.Tests.StringBuilderTests.FailureOnLargeString
    System.Text.Tests.StringBuilderTests.FailureOnLargeString
763443 https://dev.azure.com/dnceng/public/_build/results?buildId=763443
  net5.0-Windows_NT-Debug-x64-CoreCLR_release-Windows.10.Amd64.Server19H1.ES.Open
    System.Text.Tests.StringBuilderTests.FailureOnLargeString
762936 https://dev.azure.com/dnceng/public/_build/results?buildId=762936
  net5.0-Windows_NT-Debug-x64-CoreCLR_release-Windows.10.Amd64.Server19H1.ES.Open
    System.Text.Tests.StringBuilderTests.FailureOnLargeString
762695 https://dev.azure.com/dnceng/public/_build/results?buildId=762695
  net5.0-Windows_NT-Debug-x64-CoreCLR_release-Windows.10.Amd64.Server19H1.ES.Open

@hoyosjs hoyosjs added the blocking-clean-ci Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms' label Aug 10, 2020
@hoyosjs
Copy link
Member

hoyosjs commented Aug 10, 2020

Introduced on #40500 cc: @danmosemsft

Sample dump from #39640 (it's 7.5GB so it can take a bit to download)

@danmoseley
Copy link
Member

Odd. Thanks for the context @hoyosjs . I don't think this was introduced by the product change, but looks like a test issue. I will disable the test now.

@danmoseley
Copy link
Member

Could you please sign off on my PR?

@danmoseley
Copy link
Member

On my machine, it takes ~6 seconds.

      <test name="System.Text.Tests.StringBuilderTests.FailureOnLargeString" type="System.Text.Tests.StringBuilderTests" method="FailureOnLargeString" time="6.0387396" result="Pass" />

net5.0-Windows_NT-Debug-x64-CoreCLR_release-Windows.10.Amd64.Server19H1.ES.Open -- do you know whether this implies release libraries on debug CLR (and thus debug CRT)? Or maybe the problem is the machine happens to be short of physical memory.

I could maybe increase the timeout from 60 seconds to 5 minutes, say.

@benaadams
Copy link
Member Author

Might be extra time due to swapping; can you conditionally disable based on machine memory? (e.g. < 16GB don't run)

@hoyosjs
Copy link
Member

hoyosjs commented Aug 10, 2020

net5.0-Windows_NT-Debug-x64-CoreCLR_release-Windows.10.Amd64.Server19H1.ES.Open -- do you know whether this implies release libraries on debug CLR (and thus debug CRT)? Or maybe the problem is the machine happens to be short of physical memory.

No, this is Debug libraries on top of retail coreclr (this is a libraries test):
image

The dump is inside VC\Tools\MSVC\14.28.29115\crt\src\x64\memcpy.asm:114:

rep movs byte ptr [rdi], byte ptr [rsi]

with: RCX: 0x00000001C7E1810, RDI: 0x0000021324CB2000, RSI: 0x0000021232502020

The Q/call was

System.Private.CoreLib.dll!System.Buffer._Memmove(byte* dest = 0x0000021252de1030, byte* src = 0x0000021160631050, nuint len = 0x00000000ee6b27e0) Line 499

That seems to be working to me, it's just resource constrained (the pointers add up, the processor is just looping the copy. The dump was taken amidst the operation). The memory load on that machine is 96%, and this process is be far the biggest hog (8GB) and the most memory is ~9.15GB. Not sure if the new PR will really help here.

@hoyosjs hoyosjs added area-System.Runtime disabled-test The test is disabled in source code against the issue and removed blocking-clean-ci Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms' untriaged New issue has not been triaged by the area owner labels Aug 10, 2020
@danmoseley
Copy link
Member

Hmm, so what's special about the configurations where it works OK? Do those machines/containers have more RAM?

@hoyosjs
Copy link
Member

hoyosjs commented Aug 10, 2020

I've seen it fail only in two queues: Windows.10.Amd64.Server19H1.ES.Open and windows.10.amd64.serverrs5.open inside a nano server container. @dotnet/dnceng, are these two queues somehow different? I tried to get details on the queue, but I can only see they have a scale limit of 20 for the Spanish queue. However, no information on cores, processor, or memory available in them, and I don't know how the container is initialized nor if they use Job Objects.

@danmoseley
Copy link
Member

Happy to make it conditional on >16GB (it would still run locally) but it would be good to get the answer to your question.

@benaadams
Copy link
Member Author

Happy to make it conditional on >16GB (it would still run locally)

I'd guess >= 16GB would be better 😉

@jakubstilec
Copy link

@hoyosjs checking now

@jakubstilec
Copy link

@hoyosjs both Windows.10.Amd64.Server19H1.ES.Open and windows.10.amd64.serverrs5.open are Standard_D2_v3 - 2CPU, 8GB RAM.
We have another queue Windows.10.Amd64.Server19H1.Open similar to Windows.10.Amd64.Server19H1.ES.Open. Only difference is that it's using a little older version of image and US version. But VMs are the same.

@danmoseley
Copy link
Member

OK I'll make it >=16GB

@danmoseley danmoseley self-assigned this Aug 11, 2020
@danmoseley danmoseley added this to the 6.0.0 milestone Aug 11, 2020
@danmoseley danmoseley modified the milestones: 6.0.0, Future Aug 12, 2021
@danmoseley
Copy link
Member

Looks like I forgot about this. Well, this does not block shipping. Moving to future.

@danmoseley danmoseley removed their assignment Jun 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-System.Runtime disabled-test The test is disabled in source code against the issue
Projects
None yet
Development

No branches or pull requests

5 participants