Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Coerce GitHub URLs into the authenticated API in Netkan #2946

Merged
merged 1 commit into from
Dec 17, 2019

Conversation

HebaruSan
Copy link
Member

@HebaruSan HebaruSan commented Dec 17, 2019

Problem

The bot intermittently reports "GitHub API rate limit exceeded" for various modules, most commonly Telemachus, but also recently many of the KerboKatz utilities and the Half/QuarterRSS textures. This is a custom error message created in #2904 that indicates the server headers include X-RateLimit-Remaining: 0.

Cause

As @techman83 noted in #2945, the rate limit API reported 0 remaining unauthenticated API usages (but we are nowhere near the limit for authenticated usages):

{
  "resources": {
    "core": {
      "limit": 60,
      "remaining": 0,
      "reset": 1576563098
    },
    "search": {
      "limit": 10,
      "remaining": 10,
      "reset": 1576561641
    },
    "graphql": {
      "limit": 0,
      "remaining": 0,
      "reset": 1576565181
    },
    "integration_manifest": {
      "limit": 5000,
      "remaining": 5000,
      "reset": 1576565181
    }
  },
  "rate": {
    "limit": 60,
    "remaining": 0,
    "reset": 1576563098
  }
}

This means that when this error occurs, the bot has accessed too many GitHub URLs without using a token. Normally these would consist of:

  • Metanetkans (121 of 122 hosted on GitHub)
  • KSP-AVC remote version files (many hosted on GitHub, others on ksp-avc.cybutek.net)
  • KSP-AVC version file used as a kref (not hosted on GitHub because we would just use a github kref)

Unlike krefs where we construct GitHub API calls from other known properties, these URLs are read out of string properties that can contain any full URL, but often one from GitHub. Up to now we only use generic code to retrieve the URL normally, without any special processing.

Changes

Now when Netkan retrieves such URLs, it checks for the known GItHub hosts and URL patterns. If we find a match, we retrieve the file via the API instead. If we have a token, it will be used. This will reduce the number of unauthenticated GitHub calls we make.

Note that the core and clients are functionally unchanged and will still access all URLs the same way as before. We're just updating Netkan.

Fixes #2945.

@HebaruSan HebaruSan added Enhancement Pull request Netkan Issues affecting the netkan data Network Issues affecting internet connections of CKAN labels Dec 17, 2019
string ghRepo = null;
string ghBranch = null;
string ghPath = null;
if (TryGetGitHubPath(url, out ghOwner, out ghRepo, out ghBranch, out ghPath))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to save some lines, you can also do inline declaration:

Suggested change
if (TryGetGitHubPath(url, out ghOwner, out ghRepo, out ghBranch, out ghPath))
if (TryGetGitHubPath(url, out string ghOwner, out string ghRepo, out string ghBranch, out string ghPath))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a nice language feature!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is!

@DasSkelett
Copy link
Member

I don't understand the GitHub /rate_limit endpoint. It tells me two different values in the header and in the body:

$ curl -v -H "Authorization: token <token>" "https://api.github.com/rate_limit"

< X-RateLimit-Limit: 5000
< X-RateLimit-Remaining: 4964
< X-RateLimit-Reset: 1576623587

"rate": {
"limit": 5000,
"remaining": 5000,
"reset": 1576625285
}

Copy link
Member

@DasSkelett DasSkelett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But works fine. The unauthorized request pool is no longer touched as long as an API token is provided, neither for AVC vrefs nor metanetkans.

@HebaruSan HebaruSan merged commit 2bbfce6 into KSP-CKAN:master Dec 17, 2019
@techman83
Copy link
Member

Brilliant! We can put a check into our scheduler and skip runs that are below X remaining API calls.

@HebaruSan HebaruSan deleted the feature/avc-metanetkan-gh-api branch December 18, 2019 03:44
@HebaruSan
Copy link
Member Author

Hmm, there was no way this was ever going to help with "GitHub API rate limit exceeded", because that error message is only raised for (authenticated) API calls already!

@techman83
Copy link
Member

Is that correct? Whenever I've checked it was only our un-authenticated calls that were exhausted, not authenticated. Unless we're hitting some other throttling?

@HebaruSan
Copy link
Member Author

Yup, the error message is in GitHubApi.Call, which is specifically for the API and always passes the token:

private string Call(string path, string mimeType = null)
{
var url = new Uri(ApiBase, path);
Log.DebugFormat("Calling {0}", url);
try
{
return _http.DownloadText(url, _oauthToken, mimeType);
}
catch (NativeAndCurlDownloadFailedKraken k)
{
if (k.responseStatus == 403 && k.responseHeader.Contains("X-RateLimit-Remaining: 0"))
{
throw new Kraken($"GitHub API rate limit exceeded: {path}");
}
throw;
}
}

Other throttling is quite possible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Netkan Issues affecting the netkan data Network Issues affecting internet connections of CKAN Pull request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] NetKAN being rate limited by unauthenticated requests
3 participants