Github badges are intermittently inaccessible #1245
cc @espadrine |
The badges are working again, and I think our "main" rate limit just reset:
Generated with https://github.com/paulmelnikow/github-limited |
Now intermittently broken, though plenty of rate limit left.
Here's an example: https://img.shields.io/github/tag/expressjs/express.svg |
I sent this to @espadrine about an hour ago:
I like getting to the bottom of things, and want to fix this, however my options are limited. |
I set up a status page: https://status.shields-server.com/ It checks a static badge, the GitHub license badge, and the npm license badge, and for each one looks for some of the expected right-hand text. I'm happy to cover the cost for a couple of months ($5.50), but it might be good to migrate to something else soon. When I created shields-server.com, I set up CNAMEs for s0.shields-server.com, s1.shields-server.com, s3.shields-server.com, though it'd be better to make these subdomains of shields.io and drop the extra domain. |
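For anyone who wants to reproduce this kind of check locally, here is a minimal sketch in Python. It is not the status page's actual implementation: the function names, the regular expression, and the idea of matching on `<text>` elements are all assumptions about how a badge SVG could be inspected.

```python
import re
from urllib.request import urlopen


def badge_texts(svg: str) -> list[str]:
    """Return the contents of every <text> element in a badge SVG."""
    return [t.strip() for t in re.findall(r"<text[^>]*>([^<]*)</text>", svg)]


def badge_is_healthy(svg: str, expected_right: str) -> bool:
    """True when some <text> element contains the expected right-hand text."""
    return any(expected_right in t for t in badge_texts(svg))


def check_url(url: str, expected_right: str, timeout: float = 10.0) -> bool:
    """Fetch a badge and apply the same check; a network error counts as down."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            svg = resp.read().decode("utf-8", errors="replace")
    except OSError:  # URLError, HTTPError and socket timeouts all derive from OSError
        return False
    return badge_is_healthy(svg, expected_right)
```

Keeping the text check separate from the fetch means the check itself can be tested without touching the network.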
Nice page. It should help give some insight into what's going wrong. |
Yea, thanks, it should help. The code on the three servers should be the same. There are interesting patterns in the downtime: https://status.shields-server.com/779605524 The three servers had correlated downtime around 15:30 (that’s NY time). One of them also had downtime an hour earlier, around 14:30. Two had downtime around 13:33 / 13:43. The duration of the downtime varies from server to server. For example, s0 was down from 15:28 to 15:49, s1 from 15:30 to 15:36, and s2 from 15:32 to 15:48. Correlated downtime suggests there is some shared state, pointing to rate limit exhaustion as a factor. Downtime about an hour apart might correlate with rate limit resets. The skew in recovery time might be explained by caching, though there might be other explanations too. |
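The overlap in the 15:30 outage can be worked out directly from the times quoted above. This is only a sketch of the arithmetic: the helper names are made up, and the windows are the three quoted in the comment.

```python
from datetime import datetime


def parse(hhmm: str) -> datetime:
    """Parse a wall-clock time such as '15:28' (the date part is irrelevant here)."""
    return datetime.strptime(hhmm, "%H:%M")


def duration_minutes(start: str, end: str) -> int:
    """Length of one downtime window in whole minutes."""
    return int((parse(end) - parse(start)).total_seconds() // 60)


def common_outage(windows):
    """Intersect all (start, end) windows; None if they do not all overlap."""
    start = max(parse(s) for s, _ in windows.values())
    end = min(parse(e) for _, e in windows.values())
    if start >= end:
        return None
    return start.strftime("%H:%M"), end.strftime("%H:%M")


windows = {
    "s0": ("15:28", "15:49"),
    "s1": ("15:30", "15:36"),
    "s2": ("15:32", "15:48"),
}
print({name: duration_minutes(*w) for name, w in windows.items()})
# {'s0': 21, 's1': 6, 's2': 16}
print(common_outage(windows))
# ('15:32', '15:36')
```

All three servers were down together for four minutes, while the individual outages lasted between 6 and 21 minutes, which is the skew in recovery time mentioned above.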
Yeah, it's quite strange that the downtimes are so similar. Would setting a very low max-age help with possible caching issues? |
As far as I can tell, |
I just wanted to clarify that the caching I think might be involved is the Shields internal vendor cache in
Interesting that we're still seeing hourly downtime, though less correlated between servers. I wonder if it is related to hours since uptime. |
Still seems to be failing ~20% of the time, |
Just bumping to say I experience non-loading badges for several days now. Every other refresh I get |
Indeed, this has happened with a good chunk of requests over the last few days. https://status.shields-server.com/ Things have been much worse the last 22 hours because of #1263, which is unrelated service-provider downtime that took out one of our servers. |
Good to know. That explains why the stats for s1 are sometimes slightly worse. |
To re-summarize:
I just emailed this plan:
|
@paulmelnikow I was checking the links: https://img.shields.io/codecov/c/github/bragful/ephp.svg They work, but they take too long to load (around 15 seconds). GitHub uses a proxy to retrieve images like these, so the error the browser receives is a 504 (Gateway Timeout). Have you checked how many requests your system is receiving to generate the badges? If I can help with something, just let me know. |
@manuel-rubio Yea, that's unfortunate. See #1263. |
While working on
I found the issue. It's a dumb thing I introduced in #1118. Fixed in #1266. AFAICT production has been running with anonymous quota. I'm shocked this has been working as well as it has. Admittedly, not that well, though I'd have expected what we have to work only for the first few seconds of every hour. Either the server is using a different GitHub secret from the one I expect, or, just as likely, the Shields IPs do indeed have special treatment from GitHub. I'm still eager to get the new code shipped, as it has a lot more tests. And of course #1263 remains an issue. |
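One way to spot this situation from the outside is to look at the rate limit headers GitHub returns with each REST response (`X-RateLimit-Limit`, `X-RateLimit-Remaining`, `X-RateLimit-Reset`). The sketch below is illustrative only: `quota_summary` is a hypothetical helper, and the threshold of 60 assumes GitHub's documented hourly limit for unauthenticated core requests, which may change.

```python
# Assumption: 60 requests/hour is the unauthenticated limit for core REST calls.
ANONYMOUS_CORE_LIMIT = 60


def quota_summary(headers):
    """Summarize GitHub rate limit headers from a single API response."""
    h = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    limit = int(h["x-ratelimit-limit"])
    return {
        "limit": limit,
        "remaining": int(h["x-ratelimit-remaining"]),
        "reset_epoch": int(h["x-ratelimit-reset"]),  # seconds since the Unix epoch
        # A limit this low suggests the request went out without valid credentials.
        "looks_anonymous": limit <= ANONYMOUS_CORE_LIMIT,
    }
```

Logging `looks_anonymous` alongside each upstream request would make a misconfigured secret visible quickly.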
Opened #1267 with an auth debug endpoint + logging. |
I didn't really answer this question @manuel-rubio! There are four ways you can help:
|
I'm not sure whether this is due to one of the recent changes…
#1142 #1117 #1195 #1186 #1118
…or simply #1119, which is a bug that causes a token to erroneously be considered exhausted once it's used for a search request.
People can't add new tokens either (#1243), exacerbating this slightly, but that will be fixed in #1038.
The first report was roughly 16 hours after deploy.
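To illustrate the kind of bug described in #1119: GitHub reports search and core rate limits separately, so a token pool needs to track exhaustion per resource. The following is a minimal sketch of that idea, not the Shields implementation; `Token`, `TokenPool`, and the resource names passed in are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    value: str
    # Last known remaining requests, keyed by resource class ("core", "search", ...).
    remaining: dict = field(default_factory=dict)

    def update(self, resource: str, remaining: int) -> None:
        """Record the remaining quota reported for one resource class."""
        self.remaining[resource] = remaining

    def is_exhausted(self, resource: str) -> bool:
        # A resource we have no data for is treated as usable.
        return self.remaining.get(resource, 1) <= 0


class TokenPool:
    def __init__(self, values):
        self.tokens = [Token(v) for v in values]

    def pick(self, resource: str):
        """Return the first token with quota left for this resource, or None."""
        for token in self.tokens:
            if not token.is_exhausted(resource):
                return token
        return None
```

With this shape, running out of search quota on a token does not stop the same token from being used for core requests.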