Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

strategy#page_content: allow cURL to --fail-with-body #16741

Merged
merged 2 commits into from
Feb 24, 2024

Conversation

mavenor
Copy link
Contributor

@mavenor mavenor commented Feb 23, 2024

TL;DR

livecheck for sites behind DDoS protection is broken. We need an error code from cURL for #11517 to work; #15351 seems to have broken that. This explicitly adds --fail-with-body to the PAGE_CONTENT_CURL_ARGS so that strategy#page_content calls return an error code, but sidestep potentially misguiding HTTP codes from… confused servers (see this part of @samford’s comment on #15437).


  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same change?
  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your changes? Here's an example. No. Should I? 😅 (I’d need help coming up with them)
  • Have you successfully run brew style with your changes locally?
  • Have you successfully run brew typecheck with your changes locally?
  • Have you successfully run brew tests with your changes locally?

Huh?

  • Try getting page content with both headers #11517 addressed the issue of livecheck failing for some sites that are protected by e.g. Cloudflare, by having us try the URL again with a fake Safari-like UA string. ([:default, :browser].each do)
  • At the time, strategy#page_content and strategy#page_headers used the (relatively) raw curl_with_workarounds function, and not curl_output (introduced in Use curl_headers and curl_output for Livecheck strategies. #15351).
  • curl_output by default doesn’t pass --fail unlike its incumbent curl_with_workarounds: I’m guessing this is since --fail omits printing any body received at all — I’m led to believe from the struck-out portion of @samford’s comments on Strategy: Pass --max-redirs to #curl_headers #15437 that this is undesirable for some servers, which may provide useful content despite a 400+ HTTP status code. (While the struck-out portion was, in fact, inapplicable to page_headers — which the PR is all about, it seems like it might have an impact on page_content?)*
    strategy#page_headers doesn’t face this issue at all, since curl_headers doesn’t set show_output, and in turn, the —-fail flag is passed to cURL.
  • Not passing any variant of --fail[...] means that 400+ error codes don’t trigger non-zero exit codes. From curl(1),

22: HTTP page not retrieved. The requested URL was not found or returned another error with the HTTP error code being 400 or above. This return code only appears if -f, --fail is used.

  • --fail-with-body allows cURL to exit with an error code, while still printing whatever body it received; the servers that don’t play nice that @samford spoke of should still be fetched from, while livecheck will at least try again on error, which it doesn’t do right now.

🧪?

I’ve tested this with vitalsource-bookshelf (which brought me to it in the first place). On forcing verbose: true in the curl_output call, I noticed that only a single cURL call was made, despite a 403: status.success? was returning true when it shouldn’t! Unfortunately, I couldn’t test this elsewhere, as none of the original examples in #11517 hold good anymore; e.g., camo-studio’s URL seems perfectly happy responding to cURL with Homebrew’s :default UA (e.g. Homebrew/4.2.9-92-gd0a3f09-dirty...), keycue doesn’t seem to mind receiving no UA at all, and the other two casks were removed.
I also ran grep -r ':fake' <cask-tap> and tried a few random samples, but everything I tried fell under the above two categories. TL;DR I’m unsure how to properly test this out 🥲

*If that isn’t the case, we could probably consider simply adding --fail

I feel silly, like I’m literally making a huge deal of a single line of code, so if you’ve reached here, thanks for bearing with my rant xD. I want to help, not be a burden, so I did as much digging as I could to compensate for me being new and my lack of Ruby skill. I hope this is, indeed, helpful like I intended it to be

since `curl_output` (introduced in Homebrew#15351) sets `show_output: true`,
it doesn’t let it unless some form of --fail[...] is passed explicitly
@samford
Copy link
Member

samford commented Feb 23, 2024

This appears to work as expected with the proposed vitalsource-bookshelf livecheck block but I just need to make sure that this doesn't negatively affect any existing checks before approving. I'm currently doing comparative runs (with/without the --fail-with-body flag) across homebrew/core and homebrew/cask to confirm and I'll come back to review this afterward.

@mavenor
Copy link
Contributor Author

mavenor commented Feb 24, 2024

Thanks @samford! 😁

Copy link
Member

@samford samford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I finished doing comparative runs across homebrew/core and homebrew/cask and this looks like a beneficial change overall. I'm happy to merge after we work out the comment language.

As you mentioned in the PR description, this allows a few checks to benefit from the user agent fallback logic and I saw that in a few checks. I didn't see any new failures that weren't already occurring without the --fail-with-body option.

The other difference was that livecheck surfaces a better error (from curl) in some cases where we were simply returning an Unable to get versions error before. The error from curl includes the HTTP response code, which gives us an immediate idea why the check is failing. Since --fail-with-body continues to return the body content like normal, we can benefit from the better error messages while maintaining livecheck's existing behavior.

The differences I observed are below. [Note that I was using a VPN and some of these checks only failed for that reason.]

homebrew/core

Fixed by change

acpica: needs a :browser user agent to work (Unable to get versions error before)

Same failure but better error

ansifilter: curl: (22) The requested URL returned error: 502
highlight: curl: (22) The requested URL returned error: 502
libsmi: curl: (22) The requested URL returned error: 403

homebrew/cask

Fixed by change

jt-bridge: needs a :browser user agent to work (555 response before, with "555 Security Incident Detected" in body HTML content)
outguess: needs a :browser user agent to work (403 (Forbidden) response before)

Same failure but better error

elasticwolf: curl: (22) The requested URL returned error: 404
fing-cli: curl: (22) The requested URL returned error: 404
firestorm: curl: (22) The requested URL returned error: 403
gamemaker: curl: (22) The requested URL returned error: 403
grids: curl: (22) The requested URL returned error: 403
opal-composer: curl: (22) The requested URL returned error: 403
preference-manager: curl: (22) The requested URL returned error: 403

Library/Homebrew/livecheck/strategy.rb Outdated Show resolved Hide resolved
@samford is working on the ability to specify custom UA (among other things)
in livecheck blocks; "retrying" will cease to be relevant

Co-authored-by: Sam Ford <1584702+samford@users.noreply.github.com>
@mavenor
Copy link
Contributor Author

mavenor commented Feb 24, 2024

Great! I’m glad I brought about something positive (however small 😂). Thanks for doing those long comparative runs across all of core and cask! xD
I’ve made the comment language changes you brought up 👍🏼

Copy link
Member

@samford samford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @mavenor!

@samford samford merged commit e9cb65b into Homebrew:master Feb 24, 2024
26 checks passed
@mavenor mavenor deleted the fix-livecheck-page-content branch February 24, 2024 21:28
@MikeMcQuaid
Copy link
Member

Thanks @mavenor, great work here!

@mavenor
Copy link
Contributor Author

mavenor commented Feb 26, 2024

@MikeMcQuaid my pleasure! Shout out to y'all for making Homebrew such a welcoming place for prospective contributors xD (and also for the mere existence of Homebrew lol)

@github-actions github-actions bot added the outdated PR was locked due to age label Mar 29, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 29, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
outdated PR was locked due to age
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants