Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.
/ corefx Public archive

fixes for ARM32 #34559

Merged
merged 2 commits into from
Jan 18, 2019
Merged

fixes for ARM32 #34559

merged 2 commits into from
Jan 18, 2019

Conversation

wfurt
Copy link
Member

@wfurt wfurt commented Jan 11, 2019

fixes #33991 [arm32/Linux] System.Net.Http.Functional.Tests failures on arm32 linux

The change has three distinct parts. When I ewas investigating this on machine given to me, there were some tests always failing like this:

* Hostname was NOT found in DNS cache
  *   Trying 127.0.0.1...
  * TCP_NODELAY set
  * Connected to 127.0.0.1 (127.0.0.1) port 50729 (#0)
  > GET /foo HTTP/1.1
  Host: 127.0.0.1:50729
  Accept: */*
  Cookie: cookie1=value1
  Connection: close

  < HTTP/1.1 302 Found
  < Date: Wed, 09 Jan 2019 18:52:40 GMT
  < Content-Length: 0
  < Location: /bar
  < Connection: close
  <
  * Closing connection 0
  * Issue another request to this URL: 'http://127.0.0.1:50729/bar'
  * Hostname was found in DNS cache
  *   Trying 127.0.0.1...
  * TCP_NODELAY set
  * Connection timed out after -39 milliseconds
  * Closing connection 1

Increasing times did not help and we clearly timed out before we should. I also try same tests on same OS/Curl version on x64 and everything was working fine. I think this is bug in Curl as I verified that we pass proper values and types in debugger. I don't know if this is because time_t is 32bit or if this is ARM architecture but when passing in infinite (int.MaxValue) timeout value it somehow overflows. I was ready to disable offending tests but I found out that if I pass infinity - 1s it works as expected. This is clearly workaround for curl but it makes curl work fine on ARM32.

After that I was getting random timeouts when running tests in parallel. They were not specific one and the failure was always timeout. I bumped timeout to 20s and it was OK for inner loop with occasional timeouts when running outerloop. With 30, I did no see any single failure with many runs. There may be better ways how to do that is this clearly seems to be related to host CPU performance.
But for now using the ARCH is easy and solves the problem without counting cortes or trying to compute system computational power.

Last part of this change is improvement to NtAuthServer.
I have seen occasional failures on other platforms as well but on my ARM it was failing consistently. There seems to be race condition in Dispose() when test was skipped and we are still trying to start task. This change allows to swallow two more exception caused by early dispose/late task start and it also increases timeout on Wait()

With this did many successful outerloop runs on ARM32.

  === TEST EXECUTION SUMMARY ===
     System.Net.Http.Functional.Tests  Total: 6601, Errors: 0, Failed: 0, Skipped: 50, Time: 389.613s
  /ssd/toweinfu/wfurt-corefx-arm/src/System.Net.Http/tests/FunctionalTests
  ----- end 01:15:13 ----- exit code 0 ----------------------------------------------------------
  exit code 0 means Exited Successfully

@wfurt wfurt self-assigned this Jan 11, 2019
@wfurt wfurt requested a review from a team January 11, 2019 23:23
catch (InvalidOperationException)
{
// Ignore.
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it's hiding a symptom. Is there an underlying problem we should be fixing instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could be wrong but I don't think so. We initialize instances when the class loads. And then we skip the test and we immediately hit Dispose() Now, on busy/slow system this happens before the task runs end executes the listener call. That can either lead to timeout, object disposed exception or invalid operation based on exact timing. In that case I saw on ARM32 the actual test was skipped but hit "Error: (not Fail) just loading and unloading the test class.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on busy/slow system this happens before the task runs end executes the listener call

What's the stack of the exceptions you're seeing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is from my x64 VM Machine

       Condition(s) not met: "IsSelectedSitesTestEnabled"
      [Test Class Cleanup Failure (System.Net.Http.Functional.Tests.NtAuthTests)] System.AggregateException
        System.AggregateException : One or more errors occurred. (Cannot access a disposed object.
        Object name: 'listener'.)
        ---- System.ObjectDisposedException : Cannot access a disposed object.
        Object name: 'listener'.
        Stack Trace:
          /root/coreclr/src/System.Private.CoreLib/src/System/Threading/Tasks/Task.cs(2843,0): at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken)
          /root/coreclr/src/System.Private.CoreLib/src/System/Threading/Tasks/Task.cs(2714,0): at System.Threading.Tasks.Task.Wait()
          /home/furt/git/wfurt-corefx/src/System.Net.Http/tests/FunctionalTests/NtAuthTests.cs(57,0): at System.Net.Http.Functional.Tests.NtAuthServer.Dispose()
          /home/furt/git/wfurt-corefx/src/System.Net.Http/tests/FunctionalTests/NtAuthTests.cs(95,0): at System.Net.Http.Functional.Tests.NtAuthServers.Dispose()
          ----- Inner Stack Trace -----
          /home/furt/git/wfurt-corefx/src/System.Net.HttpListener/src/System/Net/Managed/ListenerAsyncResult.Managed.cs(197,0): at System.Net.ListenerAsyncResult.GetContext()
          /home/furt/git/wfurt-corefx/src/System.Net.HttpListener/src/System/Net/Managed/HttpListener.Managed.cs(347,0): at System.Net.HttpListener.EndGetContext(IAsyncResult asyncResult)
          /home/furt/git/wfurt-corefx/src/System.Net.HttpListener/src/System/Net/HttpListener.cs(290,0): at System.Net.HttpListener.<>c.<GetContextAsync>b__44_1(IAsyncResult iar)
          /root/coreclr/src/System.Private.CoreLib/src/System/Threading/Tasks/FutureFactory.cs(529,0): at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)
          --- End of stack trace from previous location where exception was thrown ---
          /home/furt/git/wfurt-corefx/src/System.Net.Http/tests/FunctionalTests/NtAuthTests.cs(34,0): at System.Net.Http.Functional.Tests.NtAuthServer.<.ctor>b__4_0()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, ok. GetContext documents both InvalidOperationException and ObjectDisposedException as occurring in cases like this, where the HttpListener may not be listening or has been disposed of already, so... ok. Subsequently it might be nice to try to restructure the tests to avoid this, but I agree your tactical fix is good to go ahead with now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally started on path to add synchronization.
But than it looked like unnecessary complexity for cases when we do not run the test anyway.

@wfurt
Copy link
Member Author

wfurt commented Jan 15, 2019

@dotnet-bot test Packaging All Configurations x64 Debug please

// Set timeout limit on the connect phase.
SetCurlOption(CURLoption.CURLOPT_CONNECTTIMEOUT_MS, int.MaxValue);
// Set timeout limit on the connect phase. curl has bug on ARM so use max - 1s.
SetCurlOption(CURLoption.CURLOPT_CONNECTTIMEOUT_MS, int.MaxValue - 1000);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably worth opening a bug against curl, even if we have a work around. Their bug tracker is here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about it. I would need to probably isolate C repo and show behavior with some public server.

@wfurt
Copy link
Member Author

wfurt commented Jan 17, 2019

@dotnet-bot test this please

Copy link
Member

@stephentoub stephentoub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

@stephentoub stephentoub merged commit 4eccf3f into dotnet:master Jan 18, 2019
@karelz karelz added this to the 3.0 milestone Mar 18, 2019
@wfurt wfurt deleted the http_arm_33991 branch June 15, 2020 18:21
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
* arm fixes

* use PassingTestTimeoutMilliseconds instead of hardcoded number


Commit migrated from dotnet/corefx@4eccf3f
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[arm32/Linux] System.Net.Http.Functional.Tests failures on arm32 linux
5 participants