Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

On Fedora 23 (Docker), dotnet restore fails with timeouts for dotnet/core-setup #3190

Closed
dagood opened this issue Jul 21, 2016 · 3 comments
Closed

Comments

@dagood
Copy link

dagood commented Jul 21, 2016

When running build.sh for core-setup in a Fedora 23 container, it fails during restore with timeouts like these:

log  : Failed to download package from 'https://dotnetmyget.blob.core.windows.net/artifacts/dotnet-core/nuget/v3/flatcontainer/system.collections/4.0.11-rc2-23931/system.collections.4.0.11-rc2-23931.nupkg'.
log  : The HTTP request to 'GET https://dotnetmyget.blob.core.windows.net/artifacts/dotnet-core/nuget/v3/flatcontainer/system.collections/4.0.11-rc2-23931/system.collections.4.0.11-rc2-23931.nupkg' has timed out after 100000ms.
log  : Retrying 'FindPackagesByIdAsync' for source 'https://dotnetmyget.blob.core.windows.net/artifacts/dotnet-core/nuget/v3/flatcontainer/microsoft.codeanalysis.analyzers/index.json'.
log  : The HTTP request to 'GET https://dotnetmyget.blob.core.windows.net/artifacts/dotnet-core/nuget/v3/flatcontainer/microsoft.codeanalysis.analyzers/index.json' has timed out after 100000ms.
log  : Retrying 'FindPackagesByIdAsync' for source 'https://api.nuget.org/v3-flatcontainer/microsoft.codeanalysis.analyzers/index.json'.
log  : The HTTP request to 'GET https://api.nuget.org/v3-flatcontainer/microsoft.codeanalysis.analyzers/index.json' has timed out after 100000ms.
log  : Failed to download package from 'https://dotnetmyget.blob.core.windows.net/artifacts/dotnet-core/nuget/v3/flatcontainer/system.diagnostics.debug/4.0.11-rc2-23931/system.diagnostics.debug.4.0.11-rc2-23931.nupkg'.
log : The HTTP request to 'GET https://dotnetmyget.blob.core.windows.net/artifacts/dotnet-core/nuget/v3/flatcontainer/system.diagnostics.debug/4.0.11-rc2-23931/system.diagnostics.debug.4.0.11-rc2-23931

I've set up a somewhat minimal repro Dockerfile. This is based on what happens when you call build.sh in core-setup. I first repro'd the issue by rerunning build.sh on a core-setup Linux build machine running the Fedora container, then I created this repro Dockerfile on OSX. It seems to consistently fail.

Here are some logs for build.sh runs where I tweaked the restore command to have --verbosity debug. At the end are stack traces:

Something that looks odd to me is (from the disable-parallel log):

(Where {url} = https://dotnetmyget.blob.core.windows.net/artifacts/dotnet-core/nuget/v3/flatcontainer/system.buffers/4.0.0-rc2-24008/system.buffers.4.0.0-rc2-24008.nupkg)
Line 3044: info :   GET {url}
Line 3395: info :   OK {url} 78703ms
Line 3509: log  : Failed to download package from '{url}'.
Line 3510: log  : The download of '{url}' timed out because no data was received for 60000ms.

That 78s is between the 60s retry timeout and a 100s "general" timeout I saw a few other places in the log. I reproduced a 70s delay on Windows with Fiddler and it wasn't a problem, though. Could be a coincidence.

Env info:

dotnet --version = 1.0.0-preview2-003121
Also confirmed with 1.0.0-preview3-003150 with and without --disable-parallel.

@dagood
Copy link
Author

dagood commented Jul 21, 2016

Looks similar to #2945 to me, any ideas @joelverhagen?

@dagood
Copy link
Author

dagood commented Jul 22, 2016

Using NuGet.CommandLine.XPlat 3.5.0-rc1-1639 with --disable-parallel works. It looks like the network really was just timing out enough times to fail, and reducing HTTP concurrency to 1 (the CLI's older NuGet dependency doesn't/can't do this) makes each individual request go fast enough.

I may have figured out how OK {url} can take 78s but Failed to download says a timeout happened at 60s. It looks like the OK is just for the initial response, and downloading the contents takes some additional time--and that's what's timing out at 60s. (https://github.com/NuGet/NuGet.Client/blob/91abff5/src/NuGet.Core/NuGet.Protocol.Core.v3/HttpSource/HttpRetryHandler.cs#L89-L105) I think the 100s timeouts I was seeing were for the OKs.

I was overthinking it--this issue is identical to #2945. (Closing.)

@dagood dagood closed this as completed Jul 22, 2016
@joelverhagen
Copy link
Member

Using NuGet.CommandLine.XPlat 3.5.0-rc1-1639 with --disable-parallel works.

Yes, this was going to be my first suggestion. We have seen various platforms have issues with high HTTP parallelism. We plan to in the future allow an arbitrary level of HTTP parallelism.

I may have figured out how OK {url} can take 78s but Failed to download says a timeout happened at 60s. It looks like the OK is just for the initial response, and downloading the contents takes some additional time--and that's what's timing out at 60s.

This is a correct assessment. We wait 100 seconds for HTTP headers to come back (source) before timing out. During the download of the content body, we fail if we can't fill our buffer within 60 seconds (source).

This behavior differs slightly in nuget push because you can specify the timeout as a command line argument.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants