Pass token for moved files on GitHub #3387
Merged
Problem
Since the merge of #3384, the bot has been spamming mysterious errors about rate limits:
There is also some substantial metadata "churn" as the intermittent (un)availability of files causes properties to toggle back and forth between values or existence and non-existence:
Cause
This has happened before, so we knew that these were for files in repos that had been moved or renamed. Previously we addressed this by updating enough of our references to such files to make the errors stop, see KSP-CKAN/NetKAN#8488, KSP-CKAN/NetKAN#8489, KSP-CKAN/NetKAN#8490, KSP-CKAN/NetKAN#8491, KSP-CKAN/NetKAN#8492. However, this time the culprits are all remote version file URLs contained inside of mod downloads; we have submitted pull requests to update them, but it will take a while for enough authors to merge those pull requests and release new versions to stop the error spam.
What we did not understand previously was why redirected URLs were triggering these errors. We thought maybe GitHub inherently penalized redirects. The .NET documentation has finally given us a clue:
https://docs.microsoft.com/en-us/dotnet/api/system.net.httpwebrequest.allowautoredirect?view=netframework-4.7.2
When we request a file from a GitHub repo that has been moved or renamed, two HTTPS requests occur:
1. Our original request to the old URL, which GitHub answers with a redirect.
2. WebClient's automatic attempt to retrieve the redirected URL. As per the above quoted paragraph, our token is not used here, so this request counts against the bot's unauthenticated rate limit (60 per hour): https://docs.github.com/en/rest/reference/rate-limit
So every time the author of an indexed mod changes their user name, or the name of a repo, or moves a repo from one owner to another (like an organization), this makes the bot perform an unauthenticated request, and as soon as 60 of them happen in one hour, we start getting these errors. Presumably a significant fraction of the repositories that we now query after #3384 have been moved or renamed.
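To make the arithmetic concrete, here is a rough Python sketch (not the bot's C# code) that counts how many fetches would fail in one hour. The function name, the inputs, and the simplification of a single fixed one-hour window are all assumptions made for illustration; only the limit of 60 comes from the text above.

```python
UNAUTHENTICATED_LIMIT = 60  # requests per hour, the figure quoted above


def count_rate_limit_errors(urls, moved):
    """Count fetches that fail once the unauthenticated budget is spent.

    `urls` is a hypothetical list of files fetched within one hour and
    `moved` is the hypothetical set of URLs whose repo was moved or renamed.
    """
    unauthenticated_used = 0
    errors = 0
    for url in urls:
        if url not in moved:
            # No redirect: the token is sent, so this budget is untouched.
            continue
        # Redirected: the automatic follow-up request goes out without the token.
        if unauthenticated_used < UNAUTHENTICATED_LIMIT:
            unauthenticated_used += 1
        else:
            errors += 1
    return errors
```

Under these assumptions, 75 moved files fetched in one hour would produce 15 errors, while 60 or fewer would produce none, which matches the intermittent behaviour described above.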
Other fix ideas
The .NET documentation quoted above sent me on a wild goose chase, see dotnet/dotnet-api-docs#6778. The System.Net.AuthenticationManager thing is designed for a different use case and doesn't work for us. So we won't be doing that, and hopefully they'll update the docs to save other users that trouble.
On the other hand, obeying permanent redirects is a good idea; once we receive a 301 from GitHub, we could store the before and after URLs in a table and skip the first request the next time. I want to fix the token passing issue separately first though, so we could confirm that it solves the error spam by itself. A future PR can improve handling of permanent redirects.
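The permanent redirect table could look roughly like the following Python sketch. This is only an illustration of the idea, not something this PR implements; the class and method names are hypothetical, and the loop guard is an extra assumption to avoid spinning on a redirect cycle.

```python
class PermanentRedirectTable:
    """Remember 301 redirects so later fetches can skip the first request."""

    def __init__(self):
        self._moved = {}  # maps the "before" URL to the "after" URL

    def record(self, status, before, after):
        # Only permanent redirects are safe to remember.
        if status == 301:
            self._moved[before] = after

    def resolve(self, url):
        # Follow chains of remembered redirects, stopping if one loops back.
        seen = set()
        while url in self._moved and url not in seen:
            seen.add(url)
            url = self._moved[url]
        return url
```

A caller would pass every URL through resolve() before fetching it and call record() whenever a response turns out to be a redirect.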
Changes
Now TimeoutWebClient is renamed RedirectingTimeoutWebClient.
MakeDefaultHttpClient is deleted and its functionality is rolled into RedirectingTimeoutWebClient, as is the mimeType stuff.
The GetWebRequest function now sets the user agent and MIME type for every request, to make sure they're available for redirects. Then it generates a request object using the parent class, turns off auto redirect, and sets the timeout.
The GetWebResponse function is overridden and handles redirects by checking the Location header. If set, it calls GetWebRequest and GetWebResponse again to get the redirected location. If the URL's host changes, then the Authorization header is purged so we don't accidentally pass our GitHub API token to some random other host.
This should mean that we now always send our token when communicating with the GitHub API, so the error spam should end.
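The redirect handling described above can be sketched in Python as follows. This is a language-neutral illustration under stated assumptions, not the C# code in this PR: `send` is a hypothetical transport that never follows redirects itself, the status code check and the `max_redirects` cap are additions of this sketch, and response headers are assumed to be a plain dict keyed by "Location".

```python
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def fetch_following_redirects(send, url, headers, max_redirects=5):
    """Follow redirects by hand, keeping request headers on each hop.

    `send(url, headers)` is a hypothetical callable returning a tuple
    (status, response_headers, body) without following redirects.
    """
    headers = dict(headers)  # copy so the caller's dict is not modified
    for _ in range(max_redirects + 1):
        status, response_headers, body = send(url, headers)
        location = response_headers.get("Location")
        if status not in REDIRECT_CODES or not location:
            return status, body
        target = urljoin(url, location)  # Location may be a relative URL
        if urlsplit(target).hostname != urlsplit(url).hostname:
            # Different host: drop the token so it is not leaked.
            headers.pop("Authorization", None)
        url = target
    raise RuntimeError("too many redirects")
```

Because the same headers are reused for every hop on the same host, the token, user agent, and MIME type all reach the redirected URL, which is the behaviour the automatic redirect handling did not provide.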