Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pass token for moved files on GitHub #3387

Merged
merged 1 commit into from
May 29, 2021

Conversation

HebaruSan
Copy link
Member

@HebaruSan HebaruSan commented May 28, 2021

Problem

Since the merge of #3384, the bot has been spamming mysterious errors about rate limits:

image

There is also some substantial metadata "churn" as the intermittent (un)availability of files causes properties to toggle back and forth between values or existence and non-existence:

image

Cause

This has happened before, so we knew that these were for files in repos that had been moved or renamed. Previously we addressed this by updating enough of our references to such files to make the errors stop, see KSP-CKAN/NetKAN#8488, KSP-CKAN/NetKAN#8489, KSP-CKAN/NetKAN#8490, KSP-CKAN/NetKAN#8491, KSP-CKAN/NetKAN#8492. However, this time the culprits are all remote version file URLs contained inside of mod downloads; we have submitted pull requests to update them, but it will take a while for enough authors to merge those pull requests and release new versions to stop the error spam.

What we did not understand previously was why redirected URLs were triggering these errors. We thought maybe GitHub inherently penalized redirects. The .NET documentation has finally given us a clue:

https://docs.microsoft.com/en-us/dotnet/api/system.net.httpwebrequest.allowautoredirect?view=netframework-4.7.2

The Authorization header is cleared on auto-redirects and HttpWebRequest automatically tries to re-authenticate to the redirected location. In practice, this means that an application can't put custom authentication information into the Authorization header if it is possible to encounter redirection. Instead, the application must implement and register a custom authentication module. The System.Net.AuthenticationManager and related class are used to implement a custom authentication module. The AuthenticationManager.Register method registers a custom authentication module.

When we request a file from a GitHub repo that has been moved or renamed, two HTTPS requests occur:

  1. The initial request of the URL, usually from a netkan or version file; GitHib responds with a 301 status and sets the Location field to a URL that it thinks is better. Our token is used here, so this request counts against the bot's authenticated rate limit (5000 / hour).
  2. WebClient's automatic attempt to retrieve the redirected URL. As per the above quoted paragraph, our token is not used here!!, so this request counts against the bot's unauthenticated rate limit (60 / hour!).

https://docs.github.com/en/rest/reference/rate-limit

So every time the author of an indexed mod changes their user name, or the name of a repo, or moves a repo from one owner to another (like an organization), this makes the bot perform an unauthenticated request, and as soon as 60 of them happen in one hour, we start getting these errors. Presumably a significant fraction of the repositories that we now query after #3384 have been moved or renamed.

Other fix ideas

The .NET documentation quoted above sent me on a wild goose chase, see dotnet/dotnet-api-docs#6778. The System.Net.AuthenticationManager thing is designed for a different use case and doesn't work for us. So we won't be doing that, and hopefully they'll update the docs to save other users that trouble.

On the other hand, obeying permanent redirects is a good idea; once we receive a 301 from GitHub, we could store the before and after URLs in a table and skip the first request the next time. I want to fix the token passing issue separately first though, so we could confirm that it solves the error spam by itself. A future PR can improve handling of permanent redirects.

Changes

Now TimeoutWebClient is renamed RedirectingTimeoutWebClient. MakeDefaultHttpClient is deleted and its functionality is rolled into RedirectingTimeoutWebClient, as is the mimeType stuff.

The GetWebRequest function now sets the user agent and MIME type for every request, to make sure they're available for redirects. Then it generates a request object using the parent class, turns off auto redirect, and sets the timeout.

The GetWebResponse function is overridden and handles redirects by checking the Location header. If set, it calls GetWebRequest and GetWebResponse again to get the redirected location. If the URL's host changes, then the Authorization header is purged so we don't accidentally pass our GitHub API to some random other host.

This should mean that we now always send our token when communicating with the GitHub API, so the error spam should end.

@HebaruSan HebaruSan added Bug Something is not working as intended Core (ckan.dll) Issues affecting the core part of CKAN Pull request Netkan Issues affecting the netkan data Network Issues affecting internet connections of CKAN labels May 28, 2021
@HebaruSan HebaruSan requested a review from DasSkelett May 28, 2021 23:07
@HebaruSan

This comment has been minimized.

@HebaruSan

This comment has been minimized.

@HebaruSan
Copy link
Member Author

I just realized that we can fix a few of the upstream values by updating outdated source code links on SpaceDock; done for these:

@DasSkelett
Copy link
Member

Hm, should we really always execute a redirect if the Location header is given, even if it's not a 3XX status code?
Some badly implemented web server might cause issues with this.

@HebaruSan
Copy link
Member Author

I guess checking the status shouldn't hurt anything...

Copy link
Member

@DasSkelett DasSkelett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks you so much! The redirect rate limit mystery (probably) solved after all these years :P
Let's fight back against the notification and commit flood!

@DasSkelett DasSkelett merged commit a01971a into KSP-CKAN:master May 29, 2021
@HebaruSan HebaruSan deleted the fix/redir-auth branch May 29, 2021 00:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Something is not working as intended Core (ckan.dll) Issues affecting the core part of CKAN Netkan Issues affecting the netkan data Network Issues affecting internet connections of CKAN
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants