[DEVHAS-53] Use a Kubernetes Job to Generate GitOps Resources #256

Closed
johnmcollier wants to merge 27 commits

johnmcollier (Member) commented Feb 2, 2023

What does this PR do?:

Note: This PR supersedes #231 and #241

This PR updates HAS to use a Kubernetes Job by default to generate GitOps resources.

In all, the following is done:

  • By default now in HAS, when GitOps resources need to be generated (either base or overlays) during the reconciliation of a Component/SEB, HAS will launch a Kubernetes Job that will execu
    • If the Job fails, HAS will attempt to retrieve the job's logs and report back the error message in the component/seb status. If logs cannot be retrieved, or the job never launched, a separate error message is reported instead
    • The Job will be considered "failed" after a 5 minute timeout, or 5 successive failures.
  • If HAS is launched with the DO_LOCAL_GITOPS_GEN environment variable, HAS will generate the GitOps resources locally instead of using a Job
  • Additionally, if HAS is launched with the ALLOW_LOCAL_GITOPS_GEN environment variable, any Component or SEB with the allowLocalGitOpsGen annotation set to true will have its GitOps resources generated locally
  • The gitops/ package and subpackages have been moved to the new gitops-generator/ submodule.
  • Testing:
    • Tests added for the new gitops-generator/ submodule
    • GitHub actions to build and test the new module, along with pushing its image up to quay.io/redhat-appstudio/gitops-generator
    • Controller tests updated to test both generation-locally, and generation-by-job.
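
To make the two startup switches in the list above concrete, here is a minimal sketch of reading them at startup. The PR text does not say how HAS parses the variables, so treating them as boolean strings is an assumption, and `genConfig`, `envBool`, and `loadGenConfig` are hypothetical names, not part of HAS.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// genConfig holds the two switches from the PR description.
// The struct and its field names are hypothetical.
type genConfig struct {
	DoLocal    bool // DO_LOCAL_GITOPS_GEN: always generate locally
	AllowLocal bool // ALLOW_LOCAL_GITOPS_GEN: local generation only for annotated resources
}

// envBool reports whether the named variable holds a true-like value.
// Assumption: unset or unparsable values count as false.
func envBool(lookup func(string) (string, bool), name string) bool {
	raw, ok := lookup(name)
	if !ok {
		return false
	}
	v, err := strconv.ParseBool(raw)
	return err == nil && v
}

// loadGenConfig takes the lookup function as a parameter so that
// tests can pass a fake environment instead of os.LookupEnv.
func loadGenConfig(lookup func(string) (string, bool)) genConfig {
	return genConfig{
		DoLocal:    envBool(lookup, "DO_LOCAL_GITOPS_GEN"),
		AllowLocal: envBool(lookup, "ALLOW_LOCAL_GITOPS_GEN"),
	}
}

func main() {
	cfg := loadGenConfig(os.LookupEnv)
	fmt.Printf("do local: %v, allow local: %v\n", cfg.DoLocal, cfg.AllowLocal)
}
```

With neither variable set, both fields stay false, which matches the described default of generating through a Job.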

Which issue(s)/story(ies) does this PR fix:

https://issues.redhat.com/projects/DEVHAS/issues/DEVHAS-53?filter=allopenissues

PR acceptance criteria:

  • Unit/Functional tests

  • Documentation

  • Client Impact

How to test changes / Special notes to the reviewer:

Signed-off-by: John Collier <jcollier@redhat.com>
openshift-ci bot commented Feb 2, 2023

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: johnmcollier

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Signed-off-by: John Collier <jcollier@redhat.com>
@@ -29,5 +29,5 @@ apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
images:
- name: controller
newName: quay.io/redhat-appstudio/application-service
newTag: next
newName: quay.io/jcollier/application-service
Member

revert

Member Author

Oof

Member Author

Fixed.

github.com/openshift-pipelines/pipelines-as-code v0.0.0-20220622161720-2a6007e17200
github.com/openshift/api v0.0.0-20210503193030-25175d9d392d
github.com/redhat-appstudio/application-api v0.0.0-20221205185405-03f73a06d978
github.com/redhat-developer/gitops-generator v0.0.0-20230113152345-19efcd5ec104
Member

probably pull the latest here

@@ -40,11 +39,11 @@ func GenerateTektonBuild(outputPath string, component appstudiov1alpha1.Componen
 	tektonResourcesDirName := ".tekton"

 	if err := GenerateBuild(appFs, filepath.Join(componentPath, tektonResourcesDirName), component, gitopsConfig); err != nil {
-		return util.SanitizeErrorMessage(fmt.Errorf("failed to generate tekton build in %q for component %q: %s", componentPath, componentName, err))
+		return fmt.Errorf("failed to generate tekton build in %q for component %q: %s", componentPath, componentName, err)
Member

how come this is removed

Member Author

That is... a good question. I think it's a holdover from earlier testing. This should be reverted

Member Author

Fixed.

return err
}
// Determine if we're using a Kubernetes job for gitops generation, or generating locally
localGitopsGen := (r.AllowLocalGitopsGen && component.Annotations["allowLocalGitopsGen"] == "true") || (r.DoLocalGitOpsGen)
Member

If AllowLocalGitopsGen allows certain resources to generate gitops locally then how come we need r.AllowLocalGitopsGen? By this if statement the controller needs r.AllowLocalGitopsGen to be true always irrespective of the annotation. Am I wrong or something? 🤔

Member Author

Because I didn't want to always allow local GitOps generation, only if we explicitly chose to enable it.

TBH, this is primarily for testing, since suite_test.go sets the fields for Reconcilers, so we can't toggle DoLocalGitopsGen between false/true for the controller tests
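
The question in this thread comes down to how the two reconciler flags combine with the annotation. A minimal sketch of the same boolean, lifted out of the reconciler so it can be exercised with plain values; `useLocalGen` is a hypothetical helper name, while the expression and the annotation key are copied from the diff above.

```go
package main

import "fmt"

// useLocalGen mirrors the expression from the diff:
//   (AllowLocalGitopsGen && annotation == "true") || DoLocalGitOpsGen
// The annotation only has an effect when allowLocal is true;
// doLocal forces local generation regardless of the annotation.
func useLocalGen(doLocal, allowLocal bool, annotations map[string]string) bool {
	return (allowLocal && annotations["allowLocalGitopsGen"] == "true") || doLocal
}

func main() {
	annotated := map[string]string{"allowLocalGitopsGen": "true"}

	fmt.Println(useLocalGen(false, false, annotated)) // annotation alone is not enough
	fmt.Println(useLocalGen(false, true, annotated))  // allowed and annotated
	fmt.Println(useLocalGen(false, true, nil))        // allowed but not annotated
	fmt.Println(useLocalGen(true, false, nil))        // forced for every resource
}
```

So the annotation is an opt-in per resource, and it is honored only when the controller was started with local generation allowed.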


deployAssociatedComponents, err := devfileParser.GetDeployComponents(compDevfileData)
if err != nil {
//log.Error(err, "unable to get deploy components")
Member

comments here in these files

Member Author

Fixed

Comment on lines +511 to +514
jobNamespace := r.GitOpsJobNamespace
if jobNamespace == "" {
	jobNamespace = component.Namespace
}
Member

Why do we create job in the application service namespace as default?

Member Author

The GitOps secret & service account are already present in the HAS namespace + avoids using up user quota for resources in the user namespace

ctrlclient "sigs.k8s.io/controller-runtime/pkg/client"
)

var gitopsJobImage = "quay.io/redhat-appstudio/gitops-generator:latest"
Member

nit: I'd declare this as a const

Member Author

Yeah, oops. Agreed.

Member Author

Fixed.

Contributor

Is it wise to pull in the latest? what if there are breaking changes from the repo? We should probably consider having releases on the gitops-generator repo and pulling in image versions that we know are compatible/tested with HAS eventually

maysunfaisal (Member), Feb 14, 2023

We need to start doing releases in the gitops-generator repo. But if we don't pull in the latest here, we are going to get back some of the errors we faced in infra-deploy PR 1291!

Contributor

yes, agreed for now but something to think about once we stabilize

}

func GetPodLogs(ctx context.Context, client ctrlclient.Client, clientset kubernetes.Interface, jobName string, jobNamespace string) (string, error) {
jobPodList, err := clientset.CoreV1().Pods(jobNamespace).List(ctx, v1.ListOptions{LabelSelector: "job-name=" + jobName})
Member

curious as to why a clientset is being used here, rather than a call with the client, something like:

err = client.List(ctx, jobPodList,
		ctrlclient.InNamespace(jobNamespace),
		ctrlclient.MatchingLabels{"job-name": jobName})

Member Author

No reason in particular. Figured since I had to use the clientset to get the logs, I might as well be consistent and use it to get the pods too.

pkg/gitopsjob/job.go (outdated)
select {
case <-timeout:
stay = false
default:
Member

do we need a default here if it's empty

Member Author

Yes, otherwise go linting complains about using it with a channel, but this was the most straightforward approach
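
For context on the empty `default`: a `select` with only the timeout case blocks until the timeout fires, so the loop could never poll. The `default` case is what turns it into a non-blocking check. Below is a minimal stdlib-only sketch of that polling shape. It is not the PR's `WaitForJob`; `waitFor`, `errTimedOut`, `demoInterval`, and `demoTimeout` are hypothetical names, and the status check is simulated.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Demo values only; the PR itself waits up to 5 minutes.
const (
	demoInterval = 5 * time.Millisecond
	demoTimeout  = time.Second
)

var errTimedOut = errors.New("timed out waiting for job")

// waitFor polls check every interval until it reports done,
// returns an error, or the timeout elapses.
func waitFor(check func() (bool, error), interval, timeout time.Duration) error {
	deadline := time.After(timeout)
	for {
		select {
		case <-deadline:
			return errTimedOut
		default:
			// Deadline not reached yet: fall through and poll.
		}
		done, err := check()
		if err != nil {
			return err
		}
		if done {
			return nil
		}
		time.Sleep(interval)
	}
}

func main() {
	polls := 0
	// Simulated status check: the "job" completes on the third poll.
	check := func() (bool, error) {
		polls++
		return polls >= 3, nil
	}
	if err := waitFor(check, demoInterval, demoTimeout); err != nil {
		fmt.Println("wait failed:", err)
		return
	}
	fmt.Printf("job finished after %d polls\n", polls)
}
```

Returning directly from the timeout case also removes the need for a separate loop flag such as `stay`.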

// Wait for the job to succeed, error out if the 5 min timeout is reached
err = gitopsjob.WaitForJob(log, context.Background(), r.Client, r.GitOpsJobClientSet, jobName, jobNamespace, 5*time.Minute)
if err != nil {
return r.CleanUpJobAndReturn(log, jobName, jobNamespace, err)
Member

if we return like return r.CleanUpJobAndReturn() then the gitopsjob.WaitForJob() err will be lost

Member Author

CleanUpJobAndReturn explicitly only logs the error from the deletion, and we return the original error from WaitForJob.

func (r *ComponentReconciler) CleanUpJobAndReturn(log logr.Logger, jobName, jobNamespace string, err error) error {
	delErr := gitopsjob.DeleteJob(context.Background(), r.Client, jobName, jobNamespace)
	if delErr != nil {
		// Log the deletion error, not the original one
		log.Error(delErr, "unable to delete gitops-generation job")
	}
	// Always return the original error passed in by the caller
	return err
}

It's a little clunky but should do the trick
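
The behavior described in this reply, where the cleanup failure is only logged and the caller still gets the original error, can be checked in isolation. A minimal sketch with the Kubernetes calls replaced by plain functions; `cleanUpAndReturn` and `simpleErr` are hypothetical stand-ins, not code from the PR.

```go
package main

import "fmt"

// simpleErr is a tiny error type used only for this demonstration.
type simpleErr string

func (e simpleErr) Error() string { return string(e) }

// cleanUpAndReturn runs cleanup, reports a cleanup failure through logf,
// and always returns the error it was given, unchanged.
func cleanUpAndReturn(cleanup func() error, logf func(format string, args ...interface{}), err error) error {
	if delErr := cleanup(); delErr != nil {
		logf("unable to delete gitops-generation job: %v", delErr)
	}
	return err
}

func main() {
	waitErr := simpleErr("job did not finish in time")
	failingDelete := func() error { return simpleErr("job not found") }
	logf := func(format string, args ...interface{}) {
		fmt.Printf(format+"\n", args...)
	}

	got := cleanUpAndReturn(failingDelete, logf, waitErr)
	fmt.Println("returned:", got)
}
```

Even when the delete fails, the value returned is the error from the wait step, so nothing from the original failure is lost.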

@@ -528,8 +543,7 @@ func (r *ComponentReconciler) generateGitops(ctx context.Context, req ctrl.Reque
}
component.Status.GitOps.CommitID = commitID

// Remove the temp folder that was created
return r.AppFS.RemoveAll(tempDir)
Member

you need to make this call in GenerateGitopsBase() otherwise we will have a host of temp dirs for every reconcile

Member Author

Fixed

@@ -154,5 +163,9 @@ var _ = AfterSuite(func() {
cancel()
By("tearing down the test environment")
err := testEnv.Stop()
if err != nil {
time.Sleep(4 * time.Second)
Member

🤔 not sure why

Member Author

Found on a GitHub issue I can no longer find. Without the sleep the tests will sometimes fail to stop at the end of execution.

r.SetConditionAndUpdateCR(ctx, req, &appSnapshotEnvBinding, err)
return ctrl.Result{}, err
}
// If the Job succeeds, delete it
Member

no delete happening

Member Author

That's awkward. Oops

Member Author

Fixed

// AllowLocalGitopsGen allows for certain resources to generate gitops resources locally, *if* an annotation is present on the resource. Defaults to false
AllowLocalGitopsGen bool

GitOpsJobClientSet *kubernetes.Clientset
}

//+kubebuilder:rbac:groups=appstudio.redhat.com,resources=snapshotenvironmentbindings,verbs=get;list;watch;create;update;patch;delete
Member

do we need to mention the job resource and its verbs here

Member Author

Not strictly necessary since the component controller specifies them, but I'll add them here just to be consistent.

Member Author

Fixed

johnmcollier and others added 4 commits February 7, 2023 14:18
Co-authored-by: Maysun Faisal <31771087+maysunfaisal@users.noreply.github.com>
Co-authored-by: Maysun Faisal <31771087+maysunfaisal@users.noreply.github.com>
Signed-off-by: John Collier <jcollier@redhat.com>
openshift-ci bot commented Feb 9, 2023

@johnmcollier: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
ci/prow/images | d24a233 | link | true | /test images
ci/prow/application-service-e2e | d24a233 | link | true | /test application-service-e2e

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

3 participants