Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use background context rather than cancelled context #216

Merged
merged 2 commits into from
Jul 19, 2022

Conversation

terryyylim
Copy link
Collaborator

@terryyylim terryyylim commented Jul 19, 2022

Bug

Turing's dependency on Kubernetes API was updated in #183 and most Kubernetes APIs now require context to be passed which caused a regression. When ctx.Done() is called, the context was cancelled and ctx.Err() returns a non-nil error. Thus, when passed to ListPods API, when the ctx.Err() value is first checked, the call exits given a non-nil error and pods that satisfy the provided labels are never returned, causing termination logs to NOT be propagated to the user.

Fix

The fix uses a new empty context via context.Background(), which will not be cancelled and that allows Kubernetes Go Client ListPods API to retrieve the correct pods based on provided labels. Then, the container error logs from Terminated state can be propagated as event logs to the user.

image

Enhancements

In addition, this PR standardizes how context is being used for the following workflows:

  • Creation/Updation: Use ctx with Deployment timeout as specified in config
  • Deletion: Use context.Background() with no timeouts (Some functions in api/turing/cluster/controller.go have their timeout parameter removed)
    • GracePeriodSeconds exposed by Kubernetes Go client's metav1.DeleteOptions{} is optional

@terryyylim terryyylim requested a review from a team July 19, 2022 08:37
@terryyylim terryyylim self-assigned this Jul 19, 2022
@krithika369
Copy link
Collaborator

  1. Could we indicate in the description that this is a regression?
  2. Have you also checked the calling code in api/turing/service/router_deployment_service.go as I see we are using context.Background() in some places and original context with deployment timeout in the others.

@terryyylim
Copy link
Collaborator Author

  1. Could we indicate in the description that this is a regression?

Done.

  1. Have you also checked the calling code in api/turing/service/router_deployment_service.go as I see we are using context.Background() in some places and original context with deployment timeout in the others.

In functions where original context are being passed to functions, a defer statement has been specified for the context cancel call. In cases where ctx.Done() is called, the context is no longer used and errors are being returned. Unless you mean something else here like standardising how context is being initialised and passed? (i.e along the lines of if original context is not required, we use context.Background() instead of original context)

@krithika369
Copy link
Collaborator

We have 2 deployment-related timeout values here:

  • Timeout is for the deployment of the new version (so the original ctx can be used for all creation / update operations; this can help cancel the deployment when we hit timeout)
  • DeletionTimeout is for the cleaning up of old resources (such as from the previous version; so this can be applied to all deletion actions). At the same time, it's probably ok to also use context.Background here and use the timeout value only when the API specifically needs it (example).

Lastly, I think context.Background() is also appropriate when we have operations that are mandatory (like getting the pod description).

@terryyylim terryyylim force-pushed the Fix_incorrect_context branch from d15ceb1 to 1b52b70 Compare July 19, 2022 10:11
Copy link
Collaborator

@krithika369 krithika369 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code changes LGTM. Please feel free to merge once the test cases are addressed.

@terryyylim terryyylim force-pushed the Fix_incorrect_context branch from 1b52b70 to 13ac62d Compare July 19, 2022 11:54
@terryyylim terryyylim merged commit 2a7eb84 into caraml-dev:main Jul 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants