-
-
Notifications
You must be signed in to change notification settings - Fork 9.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GitHub::API: Allow passing options to curl #15364
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @samford!
Library/Homebrew/utils/github/api.rb
Outdated
@@ -181,7 +181,7 @@ def self.credentials_error_message(response_headers, needed_scopes) | |||
end | |||
|
|||
def self.open_rest( | |||
url, data: nil, data_binary_path: nil, request_method: nil, scopes: [].freeze, parse_json: true | |||
url, data: nil, data_binary_path: nil, request_method: nil, scopes: [].freeze, parse_json: true, **options |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add named, keyword parameters instead of just an untyped Hash like options
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that explicit keyword parameters are necessary to be able to add meaningful type signatures but this is a somewhat complicated situation at the moment:
#curl_with_workarounds
has a gnarly method signature, with an*args
splat (curl arguments like the URL,--location
, etc.), eight keyword parameters, and an**options
double splat (#curl_args
arguments).#curl_args
(which#curl_with_workarounds
calls) has eight keyword parameters (different from#curl_with_workarounds
eight keyword parameters).#curl_output
(which#open_rest
calls), simply uses an**options
double splat to handle the keyword arguments for#curl_with_workarounds
and#curl_args
.- Though I only mentioned
debug
above, what we really need to do in livecheck is to pass**Strategy::DEFAULT_CURL_OPTIONS
to#open_rest
. This is a hash with eight values, five of which are arguments for#curl_with_workarounds
and three are arguments for#curl_args
. - The
GitHub
module doesn't have any type signatures at the moment and there are ten methods using a double splat there.
To avoid using **options
, we would have to add 16 keyword parameters (i.e., the existing parameters for #curl_with_workarounds
and #curl_args
) to any method where we want to be able to change those parameters. We would also then have to explicitly pass those arguments in the #curl_output
call. If we add only the keyword parameters we currently need, this would still be eight and we would have to modify this method again if we ever add any more to DEFAULT_CURL_OPTIONS
. This adds a lot of overhead and wouldn't immediately benefit us because GitHub
(and Utils::Curl
) methods don't have type signatures yet.
In the longer-term, one potential solution may be to create [separate] objects/structs for the keyword arguments for #curl_with_workarounds
and #curl_args
. With that setup, we could expect a specific type (e.g., CurlArgsOptions
) and get rid of the related double splats. However, that's just a thought (I haven't tested the idea) and there may be challenges in making it work.
With that in mind, I agree that this isn't what we should be doing but it feels like the only reasonable way of dealing with this until the situation is improved in Utils::Curl
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid using
**options
, we would have to add 16 keyword parameters (i.e., the existing parameters for#curl_with_workarounds
and#curl_args
) to any method where we want to be able to change those parameters.
This seems better for a typing perspective. In this case specifically, though, presumably you don't need to pass all 16? Can you consider just adding the ones you need?
In the longer-term, one potential solution may be to create [separate] objects/structs for the keyword arguments for
#curl_with_workarounds
and#curl_args
.
Agreed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems better for a typing perspective. In this case specifically, though, presumably you don't need to pass all 16? Can you consider just adding the ones you need?
We currently need eight for livecheck (across #curl_with_workarounds
and #curl_args
). I can add just those but my concern is that it will bloat the #open_rest
method signature and #curl_output
call without buying us anything meaningful until Utils::Curl
and GitHub
have type signatures. I don't think anyone will attempt to type GitHub
until after Utils::Curl
is typed and that won't happen until the Curl
arguments situation is improved. When that happens, we will need to replace either **options
or the explicit keyword parameters in #open_rest
and it's a bit easier to replace **options
.
That said, it's not a huge issue either way, so if you really want to move forward with explicit keyword parameters instead of **options
here, I'll update this accordingly.
Edit: I've updated this to handle the eight options that we currently use in livecheck.
Are all of these options really necessary when interacting with GitHub? I understand we need them for random websites that are failing so requests don't take too long, but GitHub should usually work without timing out, no? |
Basically, yes. Some of them are for controlling output and the others control curl's behavior when a request doesn't work correctly. When everything's working correctly at GitHub, the latter options won't matter. However, if we don't set the |
I just thought that there wouldn't be much of a need for running |
It's more of a concern for runs with multiple formulae/casks, where you may have a mix of working and failing checks. Tap runs involve thousands of checks in our case, so we need to be sure that the limits are appropriate for livecheck, otherwise failing checks would make those runs take even longer than they already do.
While technically possible, what we need in livecheck may not make sense for other areas where I think it makes more sense to simply use the explicit options in livecheck, as that will guarantee that all the curl-based checks are consistent/predictable. |
f6f868c
to
5118faa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me when @reitermarkus comment is addressed!
5118faa
to
7502871
Compare
We previously didn't use I've rebased this and updated it to:
If you would prefer for the |
7502871
to
8cb6260
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking better, thanks! One more comment.
release = GitHub.get_latest_release(generated[:username], generated[:repository]) | ||
release = GitHub::API.open_rest( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't this use GitHub.get_latest_release
? That seems much nicer than using open_rest
here. If it means more arguments need added there: I think make those changes there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, it's to avoid having to add all the same keyword parameters to #get_latest_release
and pass them into the method's #open_rest
call. As it stands, that method is a very slim wrapper around #open_rest
, where all it does is create a URL from the user
and repo
arguments before calling #open_rest
(we already do this in GithubLatest#generate_input_values
):
def self.get_latest_release(user, repo)
url = "#{API_URL}/repos/#{user}/#{repo}/releases/latest"
API.open_rest(url, request_method: :GET)
end
Assuming we won't use **options
, the required changes would turn the method into something like this:
def self.get_latest_release(
user,
repo,
debug: nil,
verbose: nil,
timeout: nil,
connect_timeout: nil,
max_time: nil,
retries: nil
)
url = "#{API_URL}/repos/#{user}/#{repo}/releases/latest"
curl_options = {
debug: debug,
verbose: verbose,
timeout: timeout,
connect_timeout: connect_timeout,
max_time: max_time,
retries: retries,
}.compact
API.open_rest(url, request_method: :GET, **curl_options)
end
This is the method without a type signature, so it will cover even more lines if/when this is eventually typed.
The method call in GithubLatest#find_versions
would be:
release = GitHub.get_latest_release(
generated[:username],
generated[:repository]),
**Strategy::DEFAULT_CURL_OPTIONS,
)
The only difference is that we would be calling a GitHub
method and the arguments would be different (it's the same number either way).
As mentioned, we already generate the relevant API URL in GithubLatest
, so we don't benefit from having #get_latest_version
generate the URL for us. We would only benefit from it if/when the API URL changes in the future (i.e., GithubLatest
would automatically get any fix in GitHub
) but that seems pretty unlikely and fixing it in GithubLatest
is very easy. We can get most of that benefit by using GitHub::API_URL
in the generated URL string in GithubLatest
and I can push a commit to address that.
[For what it's worth, we generate the URL in the strategy because we need to print it in debug runs and include it in verbose JSON output, so it's not something that we can omit from the strategy and we can't get the URL that was used back from #open_rest
/#github_latest_release
.]
Past that, this would be the only GitHub
method that supports curl options and it would only support the ones that we've explicitly added here. If we ever need more, we would have to modify #open_rest
and update any affected GitHub
methods. So, we would end up with a mishmash of option support that's a challenge to maintain and reason about.
So, while using GitHub#get_latest_release
may feel a little nicer on some level, everything that's required to make it happen feels decidedly not nice. Using #open_rest
allows us to avoid making more of a mess and we don't lose anything meaningful in the process. [The proposed GithubReleases
strategy also uses an #open_rest
call in its #find_versions
method, so using #open_rest
in GithubLatest
isn't that different.]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the required changes would turn the method into something like this:
Maybe a stupid question: why does Livecheck need all of these options?
Past that, this would be the only
GitHub
method that supports curl options and it would only support the ones that we've explicitly added here.
This seems better to me. I also think it's worth adding typing as you go along here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a stupid question: why does Livecheck need all of these options?
Always happy to explain if/when it's helpful. Some of the options are explained in comments above the related Strategy
constants but I'll go over them all here, for the sake of completion:
debug: false
:brew livecheck --debug
output is carefully crafted but curl methods print additional output for debugging when the--debug
flag is present. Withoutdebug: false
, we would see thecurl
command in the middle of the debug output when we runbrew livecheck --debug
on a check that uses curl underneath. This is what's currently happening forGithubLatest
checks when--debug
is used and is part of the impetus for this PR.verbose: false
: I believe I originally addedverbose
mostly as a precaution alongsidedebug
, with a similar rationale. Basically, we use--verbose
for--debug
and--json
runs to include more information in the output and I didn't want the flag to causecurl
to print output that we don't want/need (or more than we want/need). However, I did some testing withverbose
removed and didn't see any difference, so I'll update this to drop that option (and we can always add it back if/when there's a confirmed need for it).connect_timeout: CURL_CONNECT_TIMEOUT
(10): The default--connect-timeout
value in curl (the application) can be up to two minutes, so we need to set a more reasonable duration to avoid a lengthy wait when a connection can't be established.max_time: CURL_MAX_TIME
(15): curl (the application) doesn't have a default--max-time
value, so we set a value to ensure curl will time out in a reasonable amount of time.timeout: CURL_PROCESS_TIMEOUT
(20): Thecurl
process will sometimes hang indefinitely (despite setting the--max-time
argument) and it needs to be quit for livecheck to continue.retries: 0
: Retrying a failed request makes sense in critical parts ofbrew
(e.g., downloading a file for installation, etc.) but it's arguably not as useful for livecheck, in my experience. Some servers that livecheck checks are a little flaky and will randomly/periodically time out but an immediate retry is pretty much guaranteed to fail (though they sometimes work if you wait a while to retry or simply get lucky). With that in mind, my thinking is that retrying failed requests in livecheck would typically just be a waste of effort (i.e., unnecessarily using resources on the host and remote server), so we don't bother because we can just run the check again later. If it's a long-term issue, we simply modify the check. If you need evidence to either support or oppose this position I can try to gather data but it's a bit tricky because this retry would happen within curl itself, so we would only be able to see whether retrying potentially allows a check to work that otherwise may not. To get a better sense, I would have to do something like implement some temporary retry logic inUtils::Curl
that we can keep track of and check those statistics. That would probably be left to a follow-up PR, though.
This seems better to me. I also think it's worth adding typing as you go along here.
Modifying #get_latest_release
doesn't feel like a worthwhile tradeoff to me but I've already explained that in detail (and outlined the benefits/detriments), so I'll go ahead and update this accordingly. In the grand scheme of things, this is [theoretically] only a temporary situation, as the added parameters will be replaced if/when we improve the args/options situation in Utils::Curl
methods in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
debug: false: brew livecheck --debug output is carefully crafted
Without debug: false, we would see the curl command in the middle of the debug output when we run brew livecheck --debug on a check that uses curl underneath.
This feels like a smell to me for this method to behave consistently except when used by livecheck? I think livecheck output should be consistent with the rest of Homebrew rather than have special-case exceptions here.
connect_timeout: CURL_CONNECT_TIMEOUT (10): The default --connect-timeout value in curl (the application) can be up to two minutes, so we need to set a more reasonable duration to avoid a lengthy wait when a connection can't be established.
max_time: CURL_MAX_TIME (15): curl (the application) doesn't have a default --max-time value, so we set a value to ensure curl will time out in a reasonable amount of time.
timeout: CURL_PROCESS_TIMEOUT (20): The curl process will sometimes hang indefinitely (despite setting the --max-time argument) and it needs to be quit for livecheck to continue.
These all seem like things where our current curl
usage can have better defaults rather than special-casing livecheck here.
retries: 0: Retrying a failed request makes sense in critical parts of brew (e.g., downloading a file for installation, etc.) but it's arguably not as useful for livecheck, in my experience.
This seems very sensible, good call 👍🏻
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels like a smell to me for this method to behave consistently except when used by livecheck? I think livecheck output should be consistent with the rest of Homebrew rather than have special-case exceptions here.
While I understand the general desire for consistency, we do this because the curl debug text wouldn't provide any meaningful benefit for livecheck debugging probably 99+% of the time and it appears right in the middle of livecheck's debug output. For example, this is what we currently see when running brew livecheck --debug
with a GithubLatest
check:
/opt/homebrew/Library/Homebrew/brew.rb (Formulary::FormulaLoader): loading /opt/homebrew/Library/Taps/homebrew/homebrew-core/Formula/adios2.rb
Formula: adios2
Livecheckable?: Yes
URL (stable): https://github.com/ornladios/ADIOS2/archive/v2.9.0.tar.gz
Strategy: GithubLatest
/usr/bin/env /opt/homebrew/Library/Homebrew/shims/shared/curl --disable --cookie /dev/null --globoff --show-error --user-agent Homebrew/4.0.17-65-g082017e\ \(Macintosh\;\ arm64\ Mac\ OS\ X\ 13.3.1\)\ curl/7.87.0 --header Accept-Language:\ en --retry 3 --location https://api.github.com/repos/ornladios/ADIOS2/releases/latest --header Accept:\ application/vnd.github\+json --write-out '
'\%\{http_code\} --header Authorization:\ token\ ****** --header X-GitHub-Api-Version:2022-11-28 --dump-header /private/tmp/github_api_headers20230511-56768-1ig1ga4
URL (strategy): https://api.github.com/repos/ornladios/ADIOS2/releases/latest
Regex (strategy): /v?(\d+(?:\.\d+)+)/i
Matched Versions:
2.9.0
adios2: 2.9.0 ==> 2.9.0
The desired output is:
/opt/homebrew/Library/Homebrew/brew.rb (Formulary::FormulaLoader): loading /opt/homebrew/Library/Taps/homebrew/homebrew-core/Formula/adios2.rb
Formula: adios2
Livecheckable?: Yes
URL (stable): https://github.com/ornladios/ADIOS2/archive/v2.9.0.tar.gz
Strategy: GithubLatest
URL (strategy): https://api.github.com/repos/ornladios/ADIOS2/releases/latest
Regex (strategy): /v?(\d+(?:\.\d+)+)/i
Matched Versions:
2.9.0
adios2: 2.9.0 ==> 2.9.0
There have been a handful of times over the years where seeing the raw curl command is a little helpful (though not essential) but it's not worth having at the expense of livecheck's default debug output. I could maybe understand having it available for --debug --verbose
(where we already adapt the debug output to include a little more information, like printing the underlying Version
objects for matched versions instead of only the text). However, it would come at the expense of readability, one way or another.
Since we can't control when/how debug output is printed in Utils::Curl
(e.g., we can't collect that output and print it in a more appropriate location), the only way to make that possible with the current setup would be to not print any output for a given check until after it completes. In that situation, the curl output would appear before the "URL", "Strategy", etc. output.
That's a poor experience for users because they wouldn't know what check is being run until after it's done. The curl output would provide an indication that the check is being run but it's very dense and identifying the URL in that text is a chore. Additionally, that would only be the case for checks that use curl. Some checks (e.g., Git
) would show nothing until after the check completes.
The reason why we don't do this is because printing information as soon as it's available provides feedback to the user that livecheck is working (and curl output is just noise the vast majority of the time). It's technically possible to print information as we go while also printing the curl output but it would require a fundamental reworking of the debug output and I think it would end up being more verbose and take more effort to visually parse compared to the current formatting.
These all seem like things where our current
curl
usage can have better defaults rather than special-casing livecheck here.
What makes that tricky is Utils::Curl
has to support a variety of use cases, which sometimes have opposing requirements. We may be able to have somewhat tighter defaults in Curl
methods but they would likely be higher than the values that we're using in livecheck.
Kind of like the retry situation, having longers timeouts/max values may make sense for critical parts of brew (installation, etc.), where it's preferable to wait and potentially succeed rather than fail in a shorter amount of time. The values for livecheck are oriented towards not having any single check hold up the run for too long (a homebrew/core tap run already takes around an hour as it is).
So, while I agree that there may be some value in investigating whether we can have better details in Utils::Curl
, livecheck will almost certainly still have to set different values that are appropriate for its specific context. That said, I'm not familiar enough with other parts of brew to say what would be good default values, so it would probably be good to create an issue and ping Homebrew/brew for ideas if we want to pursue it outside of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, this is what we currently see when running brew livecheck --debug with a GithubLatest check:
I think it's desirable/better to see the curl
call there, albeit with better whitespace usage.
That's a poor experience for users because they wouldn't know what check is being run until after it's done.
Users aren't using --debug
for typical usage. Non-contributor users aren't really using livecheck
for that matter, either.
and curl output is just noise the vast majority of the time
This is the case of most debug output most of the time. It's not useful until it is.
We may be able to have somewhat tighter defaults in Curl methods but they would likely be higher than the values that we're using in livecheck.
This seems fine.
a homebrew/core tap run already takes around an hour as it is
I don't think this is worth optimising for. It's unclear that anyone except you is doing this.
I'm not familiar enough with other parts of brew to say what would be good default values, so it would probably be good to create an issue and ping Homebrew/brew for ideas if we want to pursue it outside of this PR.
I am familiar: let's talk about it here and bring in @Homebrew/brew for more thoughts.
`#page_headers` and `#page_content` are both now using `#curl_output` either directly or indirectly, so the default values for the `print_stdout` and `print_stderr` options should align with the existing values in `DEFAULT_CURL_OPTIONS` and can now be omitted. I ran some tests with `verbose` removed and it doesn't *appear* to have any visible effect. I can't remember if I originally set it in response to a situation that we experience or if it was a precaution (after experiencing issues with `debug`) but I think it was only the latter. If this removal causes issues, we can always add it back.
This commit makes it possible to pass additional curl options to `GitHub#get_latest_release` and `GitHub::API#open_rest`. These options are passed on to `#curl_output` and `#curl_args` in turn. This is only the subset of `#curl_with_workarounds` and `#curl_args` options that `livecheck` currently uses. This is intended to ensure that curl behaves in a way that's appropriate for livecheck. For example, `brew livecheck --debug` output is carefully crafted but `Utils::Curl` methods print additional debug output in the middle unless we override it with `debug: false`. In this scenario, we need to be able to pass `debug: false` to `#curl_options` to prevent its debug output from printing and that's one use case that this supports.
Passing `DEFAULT_CURL_OPTIONS` to `#get_latest_release` ensures that curl operates in a fashion that's appropriate for livecheck. The most visible effect is that this will prevent `#curl_output` from printing the command in the middle of carefully-crafted `brew livecheck --debug` output.
This will ensure that the strategy automatically inherits any change to `API_URL` in the future. This currently doesn't make a functional difference (i.e., the URL we actually use is created in `GitHub#get_latest_release`) but it's still an improvement.
669b44a
to
bb78b3c
Compare
The latest push:
I tested this again and livecheck continues to work as expected. For what it's worth, I created type signatures for some additional |
This adds type signatures to `#get_latest_release` and `#open_rest`. I'm not deeply familiar with these methods but this is what I found when looking at current usage throughout brew and confirmed by running `brew typecheck` until all the issues were resolved. This also includes changes in `GitHub::API` methods to address areas where arguments differed from the expected type.
bb78b3c
to
2a51ff0
Compare
Thanks @samford, looking forward to it! |
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. |
brew style
with your changes locally?brew typecheck
with your changes locally?brew tests
with your changes locally?This PR makes it possible to pass additional options to
GitHub::API#open_rest
, which are passed on to#curl_output
(and also to#curl_args
as a result).For example, we carefully control debug output in livecheck but sometimes we call methods from other parts of Homebrew that call a
Utils::Curl
method and the curl debug output is printed in the middle of livecheck's debug output. In this scenario, we need to be able to passdebug: false
to#curl_options
to prevent its debug output from printing and that's one use case that this supports.With that in mind, this is primarily intended for use in #15270 and #15260. We'll have to switch to using a direct
#open_rest
call inGithubLatest
(instead ofGitHub::get_latest_release
) but the strategy already generates the appropriate API URL, so it's a very easy change.For what it's worth, it's technically possible to add
**options
to some of theGitHub
methods and pass them through to the#open_rest
call. However, 10 out of ~30 are already using the double splat for another purpose (e.g.,issues(repo:, **filters)
,search_issues(query, **qualifiers)
, etc.). Since it's not a change that can be uniformly rolled out to all theGitHub
methods, I decided to keep this PR simple for now.