Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(server): respect workflowDefaults when validating workflows. Fixes #10946. Fixes #11465 #13642

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

MasonM
Copy link
Contributor

@MasonM MasonM commented Sep 21, 2024

Fixes #10946 and Fixes #11465

Motivation

Submitting a workflow with a variable reference that points to a label or annotation that comes from workflowDefaults will always fail with an error like the following:

msg="Failed to submit workflow: rpc error: code = InvalidArgument desc = templates.whalesay-template: failed to resolve {{workflow.labels.arg-name}}

Modifications

This error is happening here:

return fmt.Errorf("failed to resolve {{%s}}", tag)

The scope argument comes from ValidateWorkflow(), which isn't aware of workflowDefaults, and therefore doesn't include any labels/annotations defined therein. This is purely a validation issue, since workflowDefaults is merged into the workflow at runtime by

func (wfc *WorkflowController) setWorkflowDefaults(wf *wfv1.Workflow) error {

This updates ValidateWorkflow() to take in the workflowDefaults (if any) and use that to populate the context with any labels/annotations, which is ultimately used to populate the scope argument. Most of the changes in this PR is just passing around the workflowDefaults so that it can ultimately get passed to ValidateWorkflow().

Verification

I opted to modify the existing TestSubmitWorkflowTemplateWorkflowMetadataSubstitution E2E test to verify this is working. I could've added a new test, but considering how slow the CI pipeline already is, that seems suboptimal.

$ make TestSubmitWorkflowTemplateWorkflowMetadataSubstitution
<SNIP>
Checking expectation workflow-template-submittable-t8f5r
workflow-template-submittable-t8f5r : Succeeded 
Found node id=workflow-template-submittable-t8f5r type=Pod
=== PASS: WorkflowTemplateSuite/TestSubmitWorkflowTemplateWorkflowMetadataSubstitution
--- PASS: TestWorkflowTemplateSuite (8.24s)
    --- PASS: TestWorkflowTemplateSuite/TestSubmitWorkflowTemplateWorkflowMetadataSubstitution (8.24s)
=== RUN   TestWorkflowSuite
--- PASS: TestWorkflowSuite (0.00s)
PASS
ok      github.com/argoproj/argo-workflows/v3/test/e2e  38.532s

…argoproj#10946

Submitting a workflow with a variable reference that points to a label
or annotation that comes from `workflowDefaults` will always fail with
an error like the following:
```
msg="Failed to submit workflow: rpc error: code = InvalidArgument desc = templates.whalesay-template: failed to resolve {{workflow.labels.arg-name}}
```

This is happening because the workflow is merged with `workflowDefaults`
at runtime by
https://github.com/argoproj/argo-workflows/blob/2dac1266b446d059ac6df78afe3d09a6ddb3af4b/workflow/controller/controller.go#L1375

which happens before the workflow is validated by `ValidateWorkflow()`.
Therefore, we need to explicitly pass in the `workflowDefaults` (if any)
to `ValidateWorkflow()` so it can merge in any labels/annotations before
validating templates.

Signed-off-by: Mason Malone <mmalone@adobe.com>
@MasonM
Copy link
Contributor Author

MasonM commented Sep 21, 2024

Test failures seem due to flakiness. CI / E2E Tests (test-executor, minimal, false) (pull_request) is failing with the same error on the main branch: https://github.com/argoproj/argo-workflows/actions/runs/10969440498/job/30462136331

For the failure in CI / E2E Tests (test-api, mysql, true) (pull_request) , I verified make TestCronCountersReplace passes locally:

        
=== PASS: MetricsSuite/TestCronCountersReplace
=== SLOW TEST:  MetricsSuite/TestCronCountersReplace took 2m1s
--- PASS: TestMetricsSuite (121.10s)
    --- PASS: TestMetricsSuite/TestCronCountersReplace (121.09s)
=== RUN   TestPodCleanupSuite
--- PASS: TestPodCleanupSuite (0.01s)
=== RUN   TestProgressSuite
--- PASS: TestProgressSuite (0.01s)
=== RUN   TestResourceTemplateSuite
--- PASS: TestResourceTemplateSuite (0.01s)
=== RUN   TestRetrySuite
--- PASS: TestRetrySuite (0.01s)
=== RUN   TestRunAsNonRootSuite
--- PASS: TestRunAsNonRootSuite (0.01s)
=== RUN   TestSemaphoreSuite
--- PASS: TestSemaphoreSuite (0.01s)
=== RUN   TestSignalsSuite
--- PASS: TestSignalsSuite (0.01s)
=== RUN   TestConfigMapKeySelectorSubstitutionSuite
--- PASS: TestConfigMapKeySelectorSubstitutionSuite (0.00s)
=== RUN   TestWorkflowInputsOverridableSuiteSuite
--- PASS: TestWorkflowInputsOverridableSuiteSuite (0.00s)
=== RUN   TestWorkflowTemplateSuite
--- PASS: TestWorkflowTemplateSuite (0.00s)
=== RUN   TestWorkflowSuite
--- PASS: TestWorkflowSuite (0.00s)
PASS
ok      github.com/argoproj/argo-workflows/v3/test/e2e  124.559s

@MasonM MasonM marked this pull request as ready for review September 21, 2024 23:45
Signed-off-by: Mason Malone <mmalone@adobe.com>
Signed-off-by: Mason Malone <mmalone@adobe.com>
@MasonM MasonM changed the title fix(server): respect workflowDefaults when validating workflows. Fixes #10946 fix(server): respect workflowDefaults when validating workflows. Fixes #10946. Fixes #11465 Sep 22, 2024
Copy link
Member

@terrytangyuan terrytangyuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@MasonM
Copy link
Contributor Author

MasonM commented Sep 25, 2024

Test failures are unrelated and should go away when #13660 is merged

@@ -3931,7 +3931,7 @@ func (woc *wfOperationCtx) setExecWorkflow(ctx context.Context) error {
// Validate the execution wfSpec
err := waitutil.Backoff(retry.DefaultRetry,
func() (bool, error) {
validationErr := validate.ValidateWorkflow(wftmplGetter, cwftmplGetter, woc.wf, validateOpts)
validationErr := validate.ValidateWorkflow(wftmplGetter, cwftmplGetter, woc.wf, woc.controller.Config.WorkflowDefaults, validateOpts)
Copy link
Member

@agilgur5 agilgur5 Sep 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the Controller, this is normally handled via setWorkflowDefaults above (as you mentioned in your description) and the difference between woc.wf and woc.execWf.

That's probably why this worked in direct submissions but not cron ones or server ones, if I had to guess. Also the cron part means this doesn't just affect the server, if I'm not mistaken -- we may want to change the title if so

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean by "direct submissions", but if you mean submitting workflows via a POST request to /api/v1/workflows, then that too is broken. The E2E tests I updated in this PR checks for that. I created a branch of main that cherry-picks just the tests so you can see for yourself: main...MasonM:argo-workflows:test-main-10946

Run make TestSubmitWorkflowTemplateWorkflowMetadataSubstitution and you should see the test fail with the following error:

workflow-template-submittable-v5xfw : Failed invalid spec: templates.whalesay-template: failed to resolve {{workflow.labels.default-label}}
Did not find node

Copy link
Member

@agilgur5 agilgur5 Oct 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean by "direct submissions"

Sorry I meant through the k8s API / kubectl "directly". Maybe the word "created" makes more sense in that case than "submitted", since the latter is generally used in the CLI & API?

Versus "server ones" I meant as through the API that are ofc failing. I meant them as "indirect" since they go through an intermediary, the Server.

"cron ones" I noted bc they do go through the same SubmitWorkflow method that runs validation, as this PR shows, meaning it would be affecting the (cron) Controller too, not just the Server

@agilgur5 agilgur5 added area/api Argo Server API area/server labels Sep 30, 2024
@agilgur5 agilgur5 changed the title fix(server): respect workflowDefaults when validating workflows. Fixes #10946. Fixes #11465 fix(server): respect workflowDefaults when validating workflows. Fixes #10946. Fixes #11465 Sep 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
4 participants