http/fetch: Optimise memory consumption #378
Conversation
http/fetch/archive_fetcher.go
Outdated
```go
}

// Extracts the tar file.
if err = tar.Untar(f, dir, tar.WithMaxUntarSize(-1)); err != nil {
```
Not really keen on exposing `MaxUntarSize`, as that does not feel like Fetch's responsibility. Forcing an unlimited max untar size is also not good, as that can cause issues with highly compressible files. It would be good to get some ideas for this.
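For illustration, a minimal sketch of what a capped extraction could look like, using the `tar.Untar`/`tar.WithMaxUntarSize` API from the diff above; the 256MiB limit is an assumed example value, not a project default:

```go
package main

import (
	"fmt"
	"io"

	"github.com/fluxcd/pkg/tar"
)

// untarBounded extracts an archive while keeping a decompressed-size cap in
// place, so highly compressible archives ("tar bombs") cannot exhaust disk
// or memory.
func untarBounded(f io.Reader, dir string) error {
	// 256MiB is an illustrative value, not a project default.
	const maxUntarSize = 256 << 20
	if err := tar.Untar(f, dir, tar.WithMaxUntarSize(maxUntarSize)); err != nil {
		return fmt.Errorf("failed to extract archive: %w", err)
	}
	return nil
}
```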
I see nothing wrong with having `MaxUntarSize` as an option here; it's the Fetcher's responsibility to download and extract the archive.
The previous `http/fetch` logic would load the tar file into memory, so large files increased the likelihood of concurrent reconciliations causing OOM. The Fetch func downloads a file, hashes its content, and if the checksum matches, goes ahead and extracts its contents. The `resp.Body` is not an `io.ReadSeeker`, which means that to avoid loading the full file into memory, we need to save it into a temporary file and then read that file in the subsequent operations. With this approach the memory consumption per operation was reduced from 23MB to 2.1MB:

```
Benchmark_Fetch-16       5   227630480 ns/op   23003358 B/op   19511 allocs/op
Benchmark_FetchNew-16    5   227570375 ns/op    2106795 B/op   19504 allocs/op
```

The tar size used was 7MB.

Expanding on defensive programming, the download process and subsequent operations are short-circuited after the max download size is reached. With a max limit set to 100 bytes, the error message yielded is: `artifact is 7879239 bytes greater than the max download size of 100 bytes`

Signed-off-by: Paulo Gomes <paulo.gomes@weave.works>
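To make the described flow concrete, here is a hypothetical sketch of the temp-file approach; `fetchArchive` and its signature are illustrative, not the actual `ArchiveFetcher` API:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"net/http"
	"os"

	"github.com/fluxcd/pkg/tar"
)

// fetchArchive sketches the approach described above: stream the body into a
// temporary file while hashing it, verify the checksum, then extract from the
// file. resp.Body is read exactly once and never held fully in memory.
func fetchArchive(url, checksum, dir string, maxDownloadSize int64) error {
	resp, err := http.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	// Short-circuit before downloading when the advertised size exceeds the limit.
	if maxDownloadSize > 0 && resp.ContentLength > maxDownloadSize {
		return fmt.Errorf("artifact is %d bytes greater than the max download size of %d bytes",
			resp.ContentLength, maxDownloadSize)
	}

	f, err := os.CreateTemp("", "fetch-*.tar.gz")
	if err != nil {
		return err
	}
	defer os.Remove(f.Name())
	defer f.Close()

	// TeeReader hashes the content as it is written to the temporary file,
	// so the download is streamed rather than buffered in memory.
	h := sha256.New()
	if _, err := io.Copy(f, io.TeeReader(resp.Body, h)); err != nil {
		return err
	}
	if sum := hex.EncodeToString(h.Sum(nil)); sum != checksum {
		return fmt.Errorf("checksum mismatch: got %s, want %s", sum, checksum)
	}

	// Rewind the temporary file so extraction reads it from the start.
	if _, err := f.Seek(0, io.SeekStart); err != nil {
		return err
	}
	return tar.Untar(f, dir)
}
```

The key design choice is `io.TeeReader`: the body is hashed and persisted in a single pass, so peak memory stays at the copy-buffer size rather than the artifact size.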
LGTM
Thanks @pjbgf 🏅
- update fluxcd/pkg/tar to v0.2.0 (fluxcd/pkg#377)
- update fluxcd/pkg/http/fetch to v0.2.0 (fluxcd/pkg#378)

Signed-off-by: Stefan Prodan <stefan.prodan@gmail.com>
The previous `http/fetch` logic would load the tar file into memory, so large files increased the likelihood of concurrent reconciliations causing OOM. The Fetch func downloads a file, hashes its content, and if the checksum matches, goes ahead and extracts its contents. Given that `resp.Body` is not an `io.ReadSeeker`, to avoid loading the full file into memory we need to save it into a temporary file and then use an `io.Reader` to read it in the subsequent operations. With this approach the memory consumption per operation was reduced from 23MB to 2.1MB; the tar size used was 7MB.

Expanding on defensive programming, the download process and subsequent operations are short-circuited after the max download size is reached. With a max limit set to 100 bytes, the error message yielded is:

`artifact is 7879239 bytes greater than the max download size of 100 bytes`
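As a hedged sketch of how such a short-circuit can be enforced while streaming (useful when the response carries no Content-Length), assuming nothing about the actual implementation beyond the behaviour described above:

```go
package main

import (
	"fmt"
	"io"
)

// copyBounded streams src into dst while enforcing a maximum size. It copies
// at most maxSize+1 bytes and fails as soon as the limit is crossed, so the
// rest of the body is never read.
func copyBounded(dst io.Writer, src io.Reader, maxSize int64) error {
	n, err := io.Copy(dst, io.LimitReader(src, maxSize+1))
	if err != nil {
		return err
	}
	if n > maxSize {
		return fmt.Errorf("artifact exceeds the max download size of %d bytes", maxSize)
	}
	return nil
}
```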
Further details of the improvements:
- `github.com/fluxcd/pkg/http/fetch.(ArchiveFetcher).Fetch`
- `github.com/fluxcd/pkg/http/fetch.(ArchiveFetcher).FetchNew`
Should further decrease the likelihood of: fluxcd/kustomize-controller#725
PS: It is important to note that fetch/untar are not the highest memory consumers on KC, and therefore any further optimisations shall yield only marginal gains.