Instances showing 502s #94
Okay, so this was quite nasty. I looked up the most recent copy in the CDX index, and grabbed the WARC records from the WARC Server (this should probably be a helper Jupyter notebook, as it's just a few pretty straightforward calls to internal APIs!). CDX: Visit http://cdx.api.wa.bl.uk/data-heritrix?url=https%3A%2F%2Fwww.mind.org.uk&sort=reverse&limit=1
WARC: Take the filename and offset/length (55649806/23663). Convert to a start-end range (55649806 to 55649806+23663-1, i.e. 55649806-55673468). Run a range request, like this:
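The two lookups above can be sketched in Python. This is only an illustration under stated assumptions: the CDX endpoint and query parameters are copied from the URL above, but `fetch_record` and its `warc_url` argument are hypothetical, since the real WARC Server address and path layout are not given here. WARC records are often stored gzip-compressed per record, so the returned bytes may need decompressing before reading.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# CDX endpoint taken from the URL quoted in this comment.
CDX_ENDPOINT = "http://cdx.api.wa.bl.uk/data-heritrix"


def cdx_latest_url(target_url: str) -> str:
    """Build the CDX query that returns only the most recent capture."""
    query = urlencode({"url": target_url, "sort": "reverse", "limit": 1})
    return f"{CDX_ENDPOINT}?{query}"


def byte_range(offset: int, length: int) -> str:
    """Convert a CDX offset/length pair into an inclusive HTTP Range value."""
    return f"bytes={offset}-{offset + length - 1}"


def fetch_record(warc_url: str, offset: int, length: int) -> bytes:
    """Fetch one WARC record with a range request.

    warc_url is a placeholder (assumption): substitute whatever URL the
    WARC Server exposes for the filename found in the CDX result.
    """
    request = Request(warc_url, headers={"Range": byte_range(offset, length)})
    with urlopen(request) as response:
        return response.read()
```

For the capture discussed here, `byte_range(55649806, 23663)` gives `bytes=55649806-55673468`, matching the range worked out by hand above.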
That showed that we do have proper 200 content for that URL. It also indicated that there was an extremely large cookie in the original response.
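One way to spot this kind of problem quickly is to measure each header line in the archived response. The sketch below is not what was actually run here; `header_sizes` is a hypothetical helper that assumes you already have the raw (decompressed) HTTP response bytes from the WARC record.

```python
def header_sizes(raw_http: bytes) -> list[tuple[str, int]]:
    """Return (header name, line length in bytes) pairs, largest first."""
    # The header block ends at the first blank line.
    head, _, _ = raw_http.partition(b"\r\n\r\n")
    # Skip the status line, e.g. b"HTTP/1.1 200 OK".
    lines = head.split(b"\r\n")[1:]
    sizes = []
    for line in lines:
        name, _, _ = line.partition(b":")
        sizes.append((name.decode("latin-1"), len(line)))
    return sorted(sizes, key=lambda item: item[1], reverse=True)
```

Comparing the largest entry against the buffer limits of each proxy in the chain gives a rough idea of which layer is likely to reject the response.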
So, the suspicion fell on such large headers breaking some buffers or limits. But where? These requests go through a few proxies.... The 'raw' service is available at port Running
After quite a lot of guesswork and experimentation, where just upping the overall buffer size wasn't helping, I discovered an (AFAICT) undocumented configuration parameter However, then the layers of proxies were still throwing errors. In this case, it seems the initial proxy needs:
and the front-end NGINX proxy needed to match this (128k was not enough!):
So these same changes need rolling out, along with a new This was such a nightmare that I think we should open an issue in
Note that, having rolled out these changes on the DEV system, this now works: https://dev.webarchive.org.uk/wayback/archive/20220720112219/https://www.mind.org.uk/ But at the time of writing, prod is broken: https://www.webarchive.org.uk/act/wayback/archive/20220720112219/https:/www.mind.org.uk/
Thanks for finding the solution, Andy, and thank you for the detailed breakdown. These steps are really helpful, another thing I can try when investigating issues.
I was hoping you'd find that useful! Under ukwa/ukwa-services#100 I've worked through better tracing of these kinds of things, and BETA now works too. Still need to roll out to PROD.
Rolled out now.
Gah, of course, need to do the same for QA Wayback... e.g. https://www.webarchive.org.uk/act/wayback/archive/20220723104224/https://www.mind.org.uk/
Changes in be607d4 mean https://dev.webarchive.org.uk/act/wayback/archive/20220723104224/https://www.mind.org.uk/ now works. Needs rolling out by @GilHoggarth to BETA and then PROD.
Rolled out ukwa-services/ingest/w3act master onto beta swarm.
Tagged the code in ingest/w3act and released onto production.
This relates to www.mind.org.uk captures: https://www.webarchive.org.uk/act/wayback/archive/*/http://www.mind.org.uk/