Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TEP-0135] Revert PVC creation #6893

Merged

Conversation

QuanZhang-William
Copy link
Member

@QuanZhang-William QuanZhang-William commented Jun 30, 2023

Part of #6740 and unblocks #6635. PVCs are created if a user specifies VolumeClaimTemplate as the source of a pipelinerun workspace. Prior to this change, such pvcs are created via affinity assistant statefulset when affinity assistant is enabled (in both workspaces or pipelineruns coschedule mode).

To delete such pvcs when the owning pipelinerun is completed, we must explicitly delete those pvcs in the reconciler since the owner of such pvcs is the affinity assistant statefulset instead of the pipelinerun. The problem of such deletion mechanism is that such pvcs are left in terminating state when the owning pipelinerun is completed but not deleted. This is because the pvcs are protected by kubernetes.io/pvc-protection finalizer, which does not allow the pvcs to be deleted until the pipelinerun consuming the pvc is deleted. Leaving pvcs in terminating state is confusing to cluster operators. Instead of changing the pvc deletion behavior in such backward incompatible way, it is better to make it configurable so that it is backward compatible, as prototyped in #6635.

This commit reverts the pvc creation behavior per-workspace coschedule mode, which changes the owner of the pvcs back to the pipelinerun instead of the affinity assistant statefulset. After the commit, the pvcs created by specifying VolumeClaimTemplate are left in bounded state instead of terminating. This commit is the prerequisite of #6635.

This commit does NOT reverts the pvc creation behavior per-pipelinerun coschedule mode as we will enforce the deletion of pvcs when the owning pipelinerun is completed. This is a better practice and there is no backward compatability concern since per-pipelinerun coschedule mode is a new feature.

Changes

Submitter Checklist

As the author of this PR, please check off the items in this checklist:

  • Has Docs if any changes are user facing, including updates to minimum requirements e.g. Kubernetes version bumps
  • Has Tests included if any functionality added or changed
  • Follows the commit message standard
  • Meets the Tekton contributor standards (including functionality, content, code)
  • Has a kind label. You can add one by adding a comment on this PR that contains /kind <type>. Valid types are bug, cleanup, design, documentation, feature, flake, misc, question, tep
  • Release notes block below has been updated with any user facing changes (API changes, bug fixes, changes requiring upgrade notices or deprecation warnings). See some examples of good release notes.
  • Release notes contains the string "action required" if the change requires additional action from users switching to the new release

Release Notes

TEP-0135: Revert the owner of `PVCs` created by `pipelinerun VolumeClaimTemplate` back to `pipelinerun`. The `PVCs` bounded to the `pipelinerun` is now in `bounded` state when the `pipelinerun` is completed but not deleted.

@tekton-robot
Copy link
Collaborator

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@tekton-robot tekton-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesnt merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 30, 2023
@QuanZhang-William
Copy link
Member Author

/test all

@tekton-robot tekton-robot added release-note-action-required Denotes a PR that introduces potentially breaking changes that require user action. release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesnt merit a release note. release-note-action-required Denotes a PR that introduces potentially breaking changes that require user action. labels Jun 30, 2023
@QuanZhang-William
Copy link
Member Author

/kind feature

@tekton-robot tekton-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Jun 30, 2023
@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/affinity_assistant.go 98.2% 95.2% -2.9
pkg/reconciler/pipelinerun/pipelinerun.go 91.5% 91.1% -0.4

@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/affinity_assistant.go 98.2% 95.2% -2.9
pkg/reconciler/pipelinerun/pipelinerun.go 91.5% 91.1% -0.4

@QuanZhang-William QuanZhang-William marked this pull request as ready for review June 30, 2023 15:30
@tekton-robot tekton-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 30, 2023
@tekton-robot tekton-robot requested a review from jerop June 30, 2023 15:30
@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/affinity_assistant.go 98.2% 95.2% -2.9
pkg/reconciler/pipelinerun/pipelinerun.go 91.5% 91.1% -0.4

@QuanZhang-William
Copy link
Member Author

cc @lbernick @pritidesai @afrittoli

@tekton-robot tekton-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 30, 2023
pkg/reconciler/pipelinerun/pipelinerun.go Show resolved Hide resolved
pkg/reconciler/pipelinerun/pipelinerun.go Outdated Show resolved Hide resolved
pkg/reconciler/pipelinerun/affinity_assistant_test.go Outdated Show resolved Hide resolved
@@ -88,7 +88,7 @@ func (c *Reconciler) createOrUpdateAffinityAssistantsPerAABehavior(ctx context.C
}
for claimTemplate, workspaceName := range claimTemplatesToWorkspace {
aaName := getAffinityAssistantName(workspaceName, pr.Name)
err := c.createOrUpdateAffinityAssistant(ctx, aaName, pr, []corev1.PersistentVolumeClaim{*claimTemplate}, nil, unschedulableNodes)
err := c.createOrUpdateAffinityAssistant(ctx, aaName, pr, nil, []corev1.PersistentVolumeClaimVolumeSource{{ClaimName: claimTemplate.Name}}, unschedulableNodes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this works because it assumes that pvcs have already been created from the volume claim template and that the function affinityAssistantStatefulSet only uses the volume claim templates in the statefulset spec. This is a bit hard to understand, and if we change affinityAssistantStatefulSet this will result in incorrect behavior. Maybe affinityAssistantStatefulSet needs to have some awareness of the affinity assistant mode, or we could use some explanatory comments + better variable naming of args passed into createOrUpdateAffinityAssistant?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good point. I think affinityAssistantStatefulSet should have the sole responsibility to create an StatefulSet based on the given input (so usually we don't change it), and all our business logic should go to createOrUpdateAffinityAssistantPerAABehavior and createOrUpdateAffinityAssistant.

Let me try to find some better variable name in createOrUpdateAffinityAssistantsPerAABehavior and put some comments

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the variable names are accurate, so I added some doc for this part, PTAL 😁 !

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's how I would refactor this (involves rewriting CreatePVCsForWorkspacesWithoutAffinityAssistant):

case aa.AffinityAssistantPerWorkspace:
                 ...
		for claimTemplate, workspaceBinding := range claimTemplatesToWorkspace {
			aaName := getAffinityAssistantName(workspaceBinding, pr.Name)
                         pvc, err := c.pvcHandler.GetOrCreatePVC(workspaceBinding, pr)
                         < handle err>
                         err := c.createOrUpdateAffinityAssistant(ctx, aaName, pr, nil, []corev1.PersistentVolumeClaimVolumeSource{pvc}, unschedulableNodes)
                         < handle err>

However, I know this PR is just meant to revert deletion behavior for the existing affinity assistant, and I want to keep this PR scoped to the minimal set of changes to do that. (I recognize I've already asked for some refactoring, but this chunk of code can be hard to grapple with :)) Maybe refactoring changes could be split into a separate PR?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the advice! I think we can put the refactoring to another PR to keep this one scoped.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm approving this PR but I have a strong preference that refactoring changes happen first. typically refactoring changes that get deferred to later just get deprioritized.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Lee, I did some investigation and realized that doing such refactoring in this PR will introduce a bunch of code changes. This makes this PR quite big and harder to review.

To address your concern, I have created an issue tracking it and assigned to myself: #6915 (will make sure to prioritize). Merging this PR as it is can help unblocking the followup PRs where I will add e2e support for coschedule pipelineruns mode, and I can work on the refactoring work in parallel (and the reviews for the followup PRs can happen in parallel).

WDTY?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! I appreciate it!

@tekton-robot tekton-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 5, 2023
@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/affinity_assistant.go 98.2% 98.1% -0.1
pkg/reconciler/pipelinerun/pipelinerun.go 91.5% 91.1% -0.4

@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/affinity_assistant.go 98.2% 98.1% -0.1
pkg/reconciler/pipelinerun/pipelinerun.go 91.5% 91.1% -0.4

@QuanZhang-William
Copy link
Member Author

/test pull-tekton-pipeline-beta-integration-tests

@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/affinity_assistant.go 98.2% 97.2% -1.0
pkg/reconciler/pipelinerun/pipelinerun.go 91.5% 91.2% -0.3

@QuanZhang-William
Copy link
Member Author

  1. The reason we didn't catch this behavior change originally is due to a lack of integration tests, and I think the way k8s handles deletion of PVCs when the PV still exists would be hard to mock in a unit test without being very brittle. I think we should add integration tests that ensure PVCs are in the desired state after completion of the pipelinerun.

I think we add the integration test when we work on changing the PVC deletion behavior PR since the deletion behavior is pending change and we need to modify the integration test at time anyways.

I disagree-- the purpose of this PR is to change the deletion behavior, and I feel I'd have to test it out locally to ensure the PVCs are still in "bound" state on completion of the pipelinerun. This signals to me we need an integration test for this behavior. When the deletion behavior is modified, the tests can be easily modified as well; this will make it even easier to understand the impacts of the change.

SGTM, I have added an integration test to test the AA behavior from end to end. PTAL

@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/affinity_assistant.go 98.2% 97.2% -1.0
pkg/reconciler/pipelinerun/pipelinerun.go 91.5% 91.2% -0.3

@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/affinity_assistant.go 98.2% 97.2% -1.0
pkg/reconciler/pipelinerun/pipelinerun.go 91.5% 91.2% -0.3

@QuanZhang-William
Copy link
Member Author

/test pull-tekton-pipeline-integration-tests
status code 500 Internal Server Error when retrieving image

knativetest "knative.dev/pkg/test"
)

func TestAffinityAssistant_PerWorkspace(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! Thanks Quan.

// When Affinity Assistant is enabled, the PersistentVolumeClaims will be created by the Affinity Assistant StatefulSet VolumeClaimTemplate instead.
// This function is only called when the Affinity Assistant behavior is AffinityAssistantPerWorkspace or AffinityAssistantDisabled.
// When the Affinity Assistant behavior is AffinityAssistantPerPipelineRun or AffinityAssistantPerPipelineRunWithIsolation,
// the PersistentVolumeClaims will be created by the Affinity Assistant StatefulSet VolumeClaimTemplate instead.
func (c *defaultPVCHandler) CreatePVCsForWorkspacesWithoutAffinityAssistant(ctx context.Context, wb []v1.WorkspaceBinding, ownerReference metav1.OwnerReference, namespace string) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it make sense to change this function name now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SG, renamed it to CreatePVCsForWorkspaces as the usage of this function to create PVC means it is separated from Affinity Assistant StatefulSet

Comment on lines 58 to 60
// This function is only called when the Affinity Assistant behavior is AffinityAssistantPerWorkspace or AffinityAssistantDisabled.
// When the Affinity Assistant behavior is AffinityAssistantPerPipelineRun or AffinityAssistantPerPipelineRunWithIsolation,
// the PersistentVolumeClaims will be created by the Affinity Assistant StatefulSet VolumeClaimTemplate instead.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a big fan of docstrings that explain how a function is called. I think it would be sufficient to explain that it creates PVCs for workspaces with volumeClaimTemplates

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed the usage docstring for now. Will further look into the explanation in the followup refactor PR

@@ -88,7 +88,7 @@ func (c *Reconciler) createOrUpdateAffinityAssistantsPerAABehavior(ctx context.C
}
for claimTemplate, workspaceName := range claimTemplatesToWorkspace {
aaName := getAffinityAssistantName(workspaceName, pr.Name)
err := c.createOrUpdateAffinityAssistant(ctx, aaName, pr, []corev1.PersistentVolumeClaim{*claimTemplate}, nil, unschedulableNodes)
err := c.createOrUpdateAffinityAssistant(ctx, aaName, pr, nil, []corev1.PersistentVolumeClaimVolumeSource{{ClaimName: claimTemplate.Name}}, unschedulableNodes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm approving this PR but I have a strong preference that refactoring changes happen first. typically refactoring changes that get deferred to later just get deprioritized.

@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 10, 2023
@JeromeJu
Copy link
Member

/assign

@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/affinity_assistant.go 98.2% 97.2% -1.0
pkg/reconciler/pipelinerun/pipelinerun.go 92.0% 91.8% -0.3

@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/affinity_assistant.go 98.2% 97.2% -1.0
pkg/reconciler/pipelinerun/pipelinerun.go 92.0% 91.8% -0.3

knativetest "knative.dev/pkg/test"
)

func TestAffinityAssistant_PerWorkspace(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Thanks for the comments step by step in the e2e test. Wondering it might be beneficial to add a summary to the docstring here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM, docstring added

@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/affinity_assistant.go 98.2% 97.2% -1.0
pkg/reconciler/pipelinerun/pipelinerun.go 92.0% 91.8% -0.3

@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/affinity_assistant.go 98.2% 97.2% -1.0
pkg/reconciler/pipelinerun/pipelinerun.go 92.0% 91.8% -0.3

@@ -184,14 +185,16 @@ func TestCreateAndDeleteOfAffinityAssistantPerPipelineRun(t *testing.T) {

// TestCreateAndDeleteOfAffinityAssistantPerWorkspace tests to create and delete an Affinity Assistant
// per workspace for a given PipelineRun
func TestCreateAndDeleteOfAffinityAssistantPerWorkspace(t *testing.T) {
func TestCreateAndDeleteOfAffinityAssistantPerWorkspaceOrDisabled(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to update the docstring for this test

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, fixed!

Part of [tektoncd#6740] and unblocks [tektoncd#6635]. `PVCs` are created if a user specifies `VolumeClaimTemplate` as the source of a `pipelinerun workspace`.
Prior to this change, such `pvcs` are created via `affinity assistant statefulset` when `affinity assistant` is enabled (in both `workspaces` or `pipelineruns`
coschedule mode).

To delete such pvcs when the owning `pipelinerun` is completed, we must explicitly delete those pvcs in the reconciler since the owner of such pvcs is the `affinity assistant statefulset`
instead of the `pipelinerun`. The problem of such deletion mechanism is that such `pvcs` are left in `terminating` state when the owning `pipelinerun` is `completed` but not `deleted`.
This is because the pvcs are protected by `kubernetes.io/pvc-protection` `finalizer`, which does not allow the `pvcs` to be deleted until the `pipelinerun` consuming the `pvc` is deleted.
Leaving pvcs in `terminating` state is confusing to cluster operators. Instead of changing the pvc deletion behavior in such backward incompatible way,
it is better to make it configurable so that it is backward compatible, as prototyped in [tektoncd#6635].

This commit reverts the pvc creation behavior `per-workspace` coschedule mode, which changes the owner of the `pvcs` back to the `pipelinerun` instead of the
`affinity assistant statefulset`. After the commit, the pvcs created by specifying `VolumeClaimTemplate` are left in `bounded` state instead of `terminating`.
This commit is the prerequisite of [tektoncd#6635].

This commit does NOT reverts the pvc creation behavior `per-pipelinerun` coschedule mode as we will enforce the deletion of pvcs when the owning `pipelinerun` is completed.
This is a better practice and there is no backward compatability concern since `per-pipelinerun` coschedule mode is a new feature.

[tektoncd#6740]: tektoncd#6740
[tektoncd#6635]: tektoncd#6635
@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/affinity_assistant.go 98.2% 97.2% -1.0
pkg/reconciler/pipelinerun/pipelinerun.go 92.0% 91.8% -0.3

@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/pipelinerun/affinity_assistant.go 98.2% 97.2% -1.0
pkg/reconciler/pipelinerun/pipelinerun.go 92.0% 91.8% -0.3

Copy link
Member

@JeromeJu JeromeJu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for answering the concerns on testing offline and adding the excellent commit message. Discussed couple of implementation choices with @QuanZhang-William offline.

/lgtm

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 11, 2023
@tekton-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JeromeJu, lbernick, Yongxuanzhang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot merged commit b769b56 into tektoncd:main Jul 11, 2023
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/feature Categorizes issue or PR as related to a new feature. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants