(Leech) Logs file type in gcs does not match file extension. #141

Open
pdewilde opened this issue Sep 28, 2023 · 6 comments
Labels
bug Something isn't working


@pdewilde
Contributor

pdewilde commented Sep 28, 2023

TL;DR

Logs are being saved in GCS with the file extension .tar.gz, but the archives are actually zip files. Either the file extension should be changed to .zip, or the archives should be compressed in .tar.gz format and the existing files re-compressed.

Observed behavior

$ gunzip  logs.tar.gz 
gzip: logs.tar.gz has more than one entry -- unchanged

$ file logs.tar.gz 
logs.tar.gz: Zip archive data, at least v2.0 to extract, compression method=deflate

$ unzip logs.tar.gz
Archive:  logs.tar.gz
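The `file` result above comes from the archive's leading magic bytes. A rough Go equivalent that could be used to check which objects already in the bucket are affected; `archiveKind` is a hypothetical name, and only the two formats relevant here are recognized.

```go
package main

import "bytes"

// Leading "magic" bytes of the two formats involved in this issue.
var (
	zipMagic  = []byte("PK\x03\x04") // zip local file header signature
	gzipMagic = []byte{0x1f, 0x8b}   // gzip member header ID1/ID2 (RFC 1952)
)

// archiveKind reports which of the two formats data starts with. An empty
// zip archive (only an end-of-central-directory record) reports "unknown".
func archiveKind(data []byte) string {
	switch {
	case bytes.HasPrefix(data, zipMagic):
		return "zip"
	case bytes.HasPrefix(data, gzipMagic):
		return "gzip"
	default:
		return "unknown"
	}
}
```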
pdewilde added the bug (Something isn't working) label Sep 28, 2023
pdewilde changed the title from "(Leech) Logs file extension in gcs does not match file extension." to "(Leech) Logs file type in gcs does not match file extension." Sep 28, 2023
@pdewilde
Contributor Author

gcsPath := fmt.Sprintf("gs://%s/%s/%s/artifacts.tar.gz", f.LogsBucketName, event.RepositorySlug, event.DeliveryID)

It seems we hardcoded the GCS object name, so we ignore the extension of the log file we download from GitHub.
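One way to stop hardcoding the extension is to derive it from the Content-Type of the download response. A sketch under two assumptions: that GitHub sets a meaningful Content-Type on the logs download, and that any gzip payload is a tarball. `extensionFor` and `logsObjectPath` are hypothetical helpers that take plain strings instead of the real `f.LogsBucketName`, `event.RepositorySlug` and `event.DeliveryID` fields.

```go
package main

import (
	"fmt"
	"mime"
)

// extensionFor maps a response Content-Type to an object extension.
// The mapping is an assumption for illustration, not Leech's behavior.
func extensionFor(contentType string) string {
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		return ".bin" // missing or malformed Content-Type
	}
	switch mediaType {
	case "application/zip", "application/x-zip-compressed":
		return ".zip"
	case "application/gzip", "application/x-gzip":
		return ".tar.gz" // assumes gzip payloads are tarballs
	default:
		return ".bin"
	}
}

// logsObjectPath mirrors the hardcoded format string above, but picks the
// extension from the content type instead of always using .tar.gz.
func logsObjectPath(bucket, repoSlug, deliveryID, contentType string) string {
	return fmt.Sprintf("gs://%s/%s/%s/artifacts%s",
		bucket, repoSlug, deliveryID, extensionFor(contentType))
}
```

Sniffing the first bytes of the body instead of trusting the header would also work and does not depend on what GitHub reports.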

@pdewilde
Contributor Author

pdewilde commented Nov 2, 2023

It seems there may be more to this than I thought. We say we accept "application/vnd.github+json", but unless we explicitly request gzip encoding, my understanding is that the HTTP client should transparently decompress the body.

There are a few possibilities I need to look into:

  1. Specifying the content type makes Go's net/http library assume that whatever comes back is what I asked for, even if the response headers specify a zip transport encoding.
  2. We are somehow compressing with zip during the upload process.
  3. GCS is compressing for us, but not transparently.

I'll need to get some GitHub credentials to reproduce the actual HTTP requests locally and figure out exactly what is going on.

@pdewilde
Contributor Author

pdewilde commented Nov 2, 2023

@sethvargo
Contributor

GCS will apply gzip compression in transit if the client accepts it. Here's an example of writing a tgz object to GCS.

@pdewilde
Contributor Author

pdewilde commented Nov 2, 2023

OK, then I suspect it's the body from the GitHub API that is getting zipped but not unzipped by the HTTP client for some reason. I'll have to take a closer look.

I wouldn't expect that, since the content type we said we accept is a JSON type, not application/zip.

@sethvargo
Contributor

It's pretty nuanced, but see https://cloud.google.com/storage/docs/transcoding. Content-Encoding is probably more relevant here than Content-Type; similarly, Accept-Encoding rather than Accept.
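Our reading of the transcoding page, reduced to the one rule that matters here: an object stored with `Content-Encoding: gzip` metadata is decompressed by GCS on the way out unless the request's `Accept-Encoding` includes gzip, while an object without that metadata (for example a .tar.gz uploaded as an opaque blob) is not decompressed by GCS. This is a toy model with hypothetical function names, not GCS code; it ignores q-values, `Cache-Control: no-transform`, and any compression GCS applies in transit to other objects.

```go
package main

import "strings"

// acceptsGzip reports whether an Accept-Encoding header value lists gzip.
// Simplification: q-values are ignored, so "gzip;q=0" still counts.
func acceptsGzip(acceptEncoding string) bool {
	for _, part := range strings.Split(acceptEncoding, ",") {
		coding, _, _ := strings.Cut(strings.TrimSpace(part), ";")
		if strings.EqualFold(strings.TrimSpace(coding), "gzip") {
			return true
		}
	}
	return false
}

// decompressedInTransit models decompressive transcoding: only objects
// whose Content-Encoding metadata is gzip are decompressed, and only for
// requests that do not accept gzip.
func decompressedInTransit(objectContentEncoding, acceptEncoding string) bool {
	return strings.EqualFold(objectContentEncoding, "gzip") && !acceptsGzip(acceptEncoding)
}
```

Under this model, a Go client that lets net/http add `Accept-Encoding: gzip` itself gets the compressed bytes from GCS and net/http decodes them, so the result still looks transparent to the caller.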
