Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix data race issue on copyFn when there are multiple containers or initContainers #422

Merged
merged 3 commits into from
Jan 26, 2023
Merged

Fix data race issue on copyFn when there are multiple containers or initContainers #422

merged 3 commits into from
Jan 26, 2023

Conversation

Caomoji
Copy link
Contributor

@Caomoji Caomoji commented Dec 30, 2022

Fixes #421

Creates a new ImageCopier object with its own imagePullPolicy to avoid a data race on the container object.

This new object is dedicated to the image copy process as it was done before but with its own subcontext and abstracted from the core mutation webhook processing.

A new configurable imageCopyDeadline parameter has been added alongside an implementation of timeout throughout the different steps of the image copy. Every function called as part of the copy has been updated to include context evaluation and support abortion in case of context timeout.

@estahn
Copy link
Owner

estahn commented Jan 23, 2023

@Caomoji Thanks, much appreciated. Extracting the ImageCopier into its own thing was on my backlog somewhere :) ... Looking good. Also thanks for updating the documentation with the new params.

@Izzette Any further concerns on the code?

@codecov
Copy link

codecov bot commented Jan 23, 2023

Codecov Report

Base: 59.74% // Head: 67.34% // Increases project coverage by +7.60% 🎉

Coverage data is based on head (b46c7cd) compared to base (e84b623).
Patch coverage: 74.17% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #422      +/-   ##
==========================================
+ Coverage   59.74%   67.34%   +7.60%     
==========================================
  Files           5        6       +1     
  Lines         313      392      +79     
==========================================
+ Hits          187      264      +77     
  Misses        103      103              
- Partials       23       25       +2     
Impacted Files Coverage Δ
pkg/config/config.go 0.00% <ø> (ø)
pkg/secrets/kubernetes.go 57.69% <60.00%> (ø)
pkg/webhook/image_swapper.go 62.24% <74.07%> (+5.64%) ⬆️
pkg/webhook/image_copier.go 74.57% <74.57%> (ø)
pkg/secrets/dummy.go 100.00% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@estahn
Copy link
Owner

estahn commented Jan 23, 2023

@Caomoji Would you mind rebasing and solving the conflicts?

Comment on lines +46 to +48
The option `imageCopyDeadline` (default: `8s`) defines the duration after which the image copy if aborted.

This option only applies for `immediate` and `force` image copy strategies.
Copy link

@Izzette Izzette Jan 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the default 8 really? Since the parameter is an integer of seconds? (Maybe I'm wrong about the configuration parsing). It might also be worth noting that a value of 0 will be treated as if no value was provided (although that's pretty normal for Go projects, the user may already expect this).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding parsing of this field, if an integer is passed then it will be interpreted as a duration expressed in nanoseconds, while if a string with a correct format (see https://pkg.go.dev/time#ParseDuration) is passed, then it's interpreted using that unit. For instance 8s is effectively converted to a duration of 8 seconds, just as 8000000000 would be too.
Please let me know if you think we should be more explicit about this in the documentation.

Copy link

@Izzette Izzette left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Izzette
Copy link

Izzette commented Jan 23, 2023

@Izzette Any further concerns on the code?

Notably, I've already done a very thorough review of this diff with @Caomoji in our fork, if you want to see what was discussed in the past. https://github.com/BackMarket/k8s-image-swapper/pull/4

@estahn
Copy link
Owner

estahn commented Jan 24, 2023

@Caomoji Let me know if you need help resolving the conflicts.

@Caomoji
Copy link
Contributor Author

Caomoji commented Jan 24, 2023

@estahn Sorry for the delay, I am monitoring this fix in one of our live environments to make sure it effectively works before rebasing here, but I will do so shortly 👍, thank you!

@estahn
Copy link
Owner

estahn commented Jan 24, 2023

@Caomoji All good, thanks so much for your efforts.

@estahn estahn added the bug Something isn't working label Jan 24, 2023
@Caomoji
Copy link
Contributor Author

Caomoji commented Jan 25, 2023

@estahn We discovered that the context passed in the Mutate function would get canceled quite rapidly, thus causing unexpected timeout behavior in delayed imageCopyPolicy mode, and unexpectedly short deadline in other modes.
To avoid cancellation propagation, and as it seems no data from this parent context is really being used, we decided to create a new one without inheriting from any value from it. Please let us know if you think we should still keep some information from this context.

Besides, some updates were made on logs to adapt messages that were present on the main branch.

We will resolve conflicts that this update will cause with this PR, please do not hesitate to provide additional feedback on those changes that, we believe, would open the path to a more registry-agnostic approach while allowing authentication to source registries 🙏

@estahn
Copy link
Owner

estahn commented Jan 26, 2023

@Caomoji A bit confused, are you saying this PR should not be merged?

@estahn estahn merged commit a133c78 into estahn:main Jan 26, 2023
@Izzette
Copy link

Izzette commented Jan 26, 2023

🥳 🎊 Thanks for your help and support @estahn

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Data race issue during image copy with multiple containers
3 participants