Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add helper download.InMemory for downloading files without writing to disk #3693

Merged
merged 4 commits into from
Jun 9, 2023

Conversation

anjannath
Copy link
Member

this helper is put in a separate package pkg/download/inmemory to avoid import cycles which occurs due to the network package being imported in the pkg/download package, the cycle is network->ssh->constants->version as a result we'll not be able to import pkg/download in pkg/crc/version

currently the helper is used for downloading release-info.json file and the signed bundle hashes file sha256sum.txt.sig

@openshift-ci openshift-ci bot requested review from evidolob and praveenkumar June 1, 2023 11:39
@openshift-ci
Copy link

openshift-ci bot commented Jun 1, 2023

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from anjannath. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@anjannath anjannath requested a review from cfergeau June 1, 2023 11:39
@cfergeau
Copy link
Contributor

cfergeau commented Jun 1, 2023

this helper is put in a separate package pkg/download/inmemory to avoid import cycles which occurs due to the network package being imported in the pkg/download packag

This import cycle already caused issues in the past, I think another way of breaking the cycle is by moving the proxy code (HTTPTransport()) to its own package

pkg/crc/machine/bundle/metadata.go Outdated Show resolved Hide resolved
// Download takes an endpoint in the form of a string (URL) or a *http.Request
// allowing caller to pass a custom http request to the downloader library, it
// takes a http.RoundTripper as the second argument to be used by the client
func Download[T string | *http.Request](endpoint T, transport http.RoundTripper) (io.ReadCloser, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can avoid the use of the generics by always setting the user-agent in this method, which makes sense in my opinion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think in zabbix we ignore requests with http agent crc * from the download data count, with the current use of this (for getting release info and bundle hashes) it should be okay, will remove the type parameters and set the user-agent in the function itself

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more import cycles, to set the user agent to crc/<crc_version>we have to import the version package in inmemory package and the version package has to import the inmemory package to download the release-info.json

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still want to be consistent imo and always use the same user agent. I guess for now you could add a userAgent string parameter to Download, and a version.UserAgent(), and pass it explicitly. Otherwise this needs more code movement.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it functionally change anything? what are you planning to do with the 'useragent'; track it as telemetry?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Downloading the crc version file is done with user agent set, downloading the signature file is done without a user agent set, and there are 2 code paths in the download code to deal with both situations.
I don't mind dropping the user agent or always setting it, but it would be better to always do the same thing.

See also #3693 (comment)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we request the release-info.json file from the CRC codebase to check if a new version exists, we set the User-Agent for that call as, e.g crc 2.20.0 this was added to make sure that we are not skewing the download counts by our own tool making http requests to the download location.

and in zabbix where this download data is collected filters the requests having user agent crc * so these are not added to the download count

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

User-Agent seems is not used to filter out http requests from crc, not sure how i reached that, but it doesn't make sense also as for collecting the download data from splunk we can use a query language and can filter out the release-info.json or the sha256sum.txt.sig files.

@gbraad
Copy link
Contributor

gbraad commented Jun 2, 2023

I think another way of breaking the cycle is by moving the proxy code (HTTPTransport()) to its own package

is this the alternative you prefer or is this something that can ALSO be done.

@cfergeau
Copy link
Contributor

cfergeau commented Jun 2, 2023

I think another way of breaking the cycle is by moving the proxy code (HTTPTransport()) to its own package

is this the alternative you prefer or is this something that can ALSO be done.

That's merely an alternative suggestion which can be ignored or done, no strong feeling. However, it's at least the 3rd time we get import cycles because of HTTPTransport being in the network package (happened to me in some experiments, quite sure Praveen saw that in one of his PRs, but avoided the cycle differently, now Anjan hits it too). So we could try to solve this once and for all.

@anjannath
Copy link
Member Author

i haven't looked at solving this in a better way so not sure how much work it'll be, created #3697 for planning/tracking

Copy link
Contributor

@cfergeau cfergeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, there are several mentions of inmemory.Download in the commit logs which no longer are accurate.
You should also mention in the last commit log that this should help with bug xxx (the one with the download timeouts).

@anjannath anjannath force-pushed the bundle-hash branch 3 times, most recently from b4c452f to 33746ad Compare June 7, 2023 10:08
@anjannath
Copy link
Member Author

Looks good to me, there are several mentions of inmemory.Download in the commit logs which no longer are accurate.
You should also mention in the last commit log that this should help with bug xxx (the one with the download timeouts).

Thanks, updated the commit logs!

@anjannath anjannath changed the title Add inmemory.Download for downloading files without writing to disk Add helper DownloadInMemory for downloading files without writing to disk Jun 8, 2023
anjannath added 3 commits June 8, 2023 10:33
… disk

Currently we have separate code that downloads the release-info.json file
and separate code that downloads the signed bundle hashes file, both  the
files are downloaded in memory this helper will unify that code and we'll
use this helper to download both of these files

It sets the http User-Agent to `crc/<version>` which is currently used for
downloading the file release-info.json
as a result of using the helper the User-Agent for the http request
is `crc/<version>`

this was not the case before, but since we were setting the user-agent
for downloading the `release-info.json` file it makes sense to use the
same User-Agent for downloading the file `sha256sum.txt.sig`

this also fixes crc-org#3686
@anjannath
Copy link
Member Author

Changed the function name to download.InMemory, as otherwise, gorevive complains about the name..

Error: pkg/download/download.go:91:6: exported: func name will be used as download.DownloadInMemory by other packages, and that stutters; consider calling this InMemory (revive)
func DownloadInMemory(url, userAgent string, transport http.RoundTripper) (io.ReadCloser, error) {
     ^

@anjannath anjannath changed the title Add helper DownloadInMemory for downloading files without writing to disk Add helper download.InMemory for downloading files without writing to disk Jun 8, 2023
@anjannath anjannath added the has-to-be-in-release This PR need to go in coming release. label Jun 9, 2023
@praveenkumar praveenkumar merged commit ef416c4 into crc-org:main Jun 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
has-to-be-in-release This PR need to go in coming release.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

CRC Setup is failing with "unable to get verified hash for default bundle"
4 participants