
CMP-2132: Implement suspend and resume scan schedule #396

Merged: 1 commit merged into ComplianceAsCode:master on Oct 9, 2023


@rhmdnd commented Aug 31, 2023

This commit implements the logic and tests necessary to suspend and
resume scan schedules using the ScanSetting custom resource.

You can find more details on the overall justification, use cases, and
implementation details in the enhancement:

#375
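At a high level, the feature maps the `suspend` flag on a `ScanSetting` onto the `suspend` field of each rerunner `CronJob`, which is what the `oc get cronjob` output in the QE transcripts in this thread shows flipping between `True` and `False`. A minimal Go sketch of that propagation, assuming hypothetical, trimmed-down `ScanSetting` and `CronJob` structs rather than the operator's real types (`syncRerunner` is also hypothetical):

```go
package main

import "fmt"

// Hypothetical, trimmed-down stand-ins for the real resources; the
// operator's actual types carry many more fields.
type ScanSetting struct {
	Name     string
	Schedule string
	Suspend  bool
}

type CronJob struct {
	Name     string
	Schedule string
	Suspend  *bool // mirrors batch/v1 CronJobSpec.Suspend, which is a *bool
}

// syncRerunner copies the schedule and suspend flag from a ScanSetting
// onto its rerunner CronJob and reports whether an update is needed.
func syncRerunner(ss ScanSetting, cj *CronJob) bool {
	changed := false
	if cj.Schedule != ss.Schedule {
		cj.Schedule = ss.Schedule
		changed = true
	}
	if cj.Suspend == nil || *cj.Suspend != ss.Suspend {
		suspend := ss.Suspend
		cj.Suspend = &suspend
		changed = true
	}
	return changed
}

func main() {
	ss := ScanSetting{Name: "test-suspend", Schedule: "*/3 * * * *", Suspend: true}
	cj := &CronJob{Name: "test-rerunner", Schedule: "*/3 * * * *"}

	changed := syncRerunner(ss, cj)
	fmt.Println(changed, *cj.Suspend) // true true

	// A second pass is a no-op, so reconciling repeatedly is safe.
	fmt.Println(syncRerunner(ss, cj)) // false
}
```

Returning a "changed" flag keeps the reconcile idempotent: the controller only needs to issue an update when the CronJob actually drifted from the ScanSetting.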

@openshift-ci-robot (Collaborator)

@rhmdnd: This pull request references CMP-2132 which is a valid jira issue.


@sheriff-rh (Collaborator) left a comment


LGTM!

@@ -9,7 +9,12 @@ Versioning](https://semver.org/spec/v2.0.0.html).

### Enhancements

-
- Users can now pause scan schedules by setting the `ScanSetting.suspend`
Collaborator

This seems like a great QOL feature to add! I'll be sure it's highlighted in the release notes downstream in the future.

@rhmdnd force-pushed the CMP-2132 branch 3 times, most recently from e8f84a3 to 25f04d3 (September 1, 2023 20:21)

}

func TestSuspendScanSettingDoesNotCreateScan(t *testing.T) {
Author

Maybe we should move these to serial tests since they will fail if they're run at the same time as another test that's using ocp4-cis or ocp4-cis-node.

@rhmdnd (Author) commented Sep 1, 2023

@xiaojiey I have one minor test fix to make, but otherwise these should be good to go for pre-verification.

// ScanSetting.suspend attribute is disabled, or set to False, the
// ComplianceSuite will get updated and schedule scans.
if suite.Spec.Suspend {
return false, nil
@yuumasato (Member) commented Sep 5, 2023

I wonder if the ComplianceSuite's phase (status) should be changed to something like SUSPENDED.
For example:

$ oc get  compliancesuites
NAME            PHASE       RESULT
nist-moderate   RUNNING     NOT-AVAILABLE
cis             SUSPENDED   NON-COMPLIANT

Author

Great idea - I'll see if I can incorporate that.

Author

I went down the path of adding the status to the ComplianceSuite and realized that a Suspended status there makes reconciling the suite much more complicated when the ScanSetting is reactivated.

I'll propose an alternative that puts the status on the ScanSettingBinding instead. It relays the same information but is simpler to reconcile, because we can short-circuit ComplianceScan creation whenever a binding references a suspended ScanSetting.

}
if *job.Spec.Suspend == false {
t.Fatalf("Expected CronJob %s to be suspended", jobName)
}
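In `batch/v1`, `CronJobSpec.Suspend` is a `*bool`. The API server defaults it to `false`, but objects built by hand or through fake clients can leave it nil, and dereferencing it directly would then panic. A small nil-safe helper (hypothetical, not part of this PR):

```go
package main

import "fmt"

// isSuspended reports whether a *bool suspend field (as in batch/v1
// CronJobSpec) is explicitly true. A nil pointer means "unset", which is
// treated as not suspended instead of panicking on dereference.
func isSuspended(suspend *bool) bool {
	return suspend != nil && *suspend
}

func main() {
	on, off := true, false
	fmt.Println(isSuspended(nil), isSuspended(&off), isSuspended(&on)) // false false true
}
```

In a test, the assertion could then read `if !isSuspended(job.Spec.Suspend) { t.Fatalf(...) }`.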
Member

If we add a SUSPENDED status/phase, then here we should also check that the status is SUSPENDED with `WaitForSuiteScansStatus()`.

err := f.AssertScanDoesNotExist(scanName, f.OperatorNamespace)
if err != nil {
t.Fatal(err)
}
Member

Here as well, check the status of the ComplianceSuite.

@xiaojiey (Collaborator)
@rhmdnd Overall this looks good.
Just two minor questions:

  1. Do we need to set a default value for suspend in the default and default-auto-apply ss? Currently the value is unset.
    $ oc get ss default -o=jsonpath={.suspend}
    $ oc get ss default-auto-apply -o=jsonpath={.suspend}
  2. When I create an ss with suspend set to true and then create an ssb that uses it, the suite stays in PENDING status. I didn't see any warning message explaining why it is PENDING. Is it possible to add one? Thanks.
    $ oc get suite
    NAME PHASE RESULT
    test PENDING
All scenarios below were verified with 4.14.0-0.nightly-2023-09-11-201102 plus the code in this PR.
Test scenario 1: Create an ss with suspend set to false, then suspend the ssb on demand:
1. Create an ss with suspend set to false, scheduled to scan every 3 minutes:
schedule: "*/3 * * * *"
suspend: false
2. Create an ssb:
$ oc compliance bind -N test -S test-suspend profile/ocp4-cis
Creating ScanSettingBinding test
3. Check the cronjob, jobs, and suite:
$ oc get cronjob
NAME            SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
test-rerunner   */3 * * * *   False     0        108s            2m51s
$ oc get job
NAME                     COMPLETIONS   DURATION   AGE
test-rerunner-28241790   1/1           5s         114s
test-rerunner-28241793   0/1                      0s
$ oc get suite
NAME   PHASE     RESULT
test   RUNNING   NOT-AVAILABLE
4. Patch suspend to true:
$ oc patch ss test-suspend -p '{"suspend":true}' --type='merge'
scansetting.compliance.openshift.io/test-suspend patched
5. Check that the cronjob is suspended:
$ oc get job
NAME                     COMPLETIONS   DURATION   AGE
test-rerunner-28241790   1/1           5s         10m
test-rerunner-28241793   1/1           6s         7m5s
6. Patch suspend back to false:
$ oc patch ss test-suspend -p '{"suspend":false}' --type='merge'
scansetting.compliance.openshift.io/test-suspend patched
7. Check the cronjob:
$ oc get cronjob -w
NAME            SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
test-rerunner   */3 * * * *   False     0        3m              13m
test-rerunner   */3 * * * *   False     1        0s              13m
^C
$ oc get suite -w
NAME   PHASE     RESULT
test   RUNNING   NOT-AVAILABLE
^C
$ oc get job
NAME                     COMPLETIONS   DURATION   AGE
test-rerunner-28241793   1/1           6s         9m31s
test-rerunner-28241799   1/1           7s         2m12s
test-rerunner-28241802   1/1           7s         31s

Test scenario 2: Create an ss with suspend set to true, then create an ssb and unsuspend it on demand:
1. Create an ss with suspend set to true:
schedule: "*/3 * * * *"
suspend: true
2. Create an ssb:
$ oc apply -f ss2.yaml 
scansetting.compliance.openshift.io/test-suspend-true created
3. Check that the suite is in PENDING status:
$ oc get ssb
NAME   AGE
test   3s
$ oc get suite
NAME   PHASE     RESULT
test   PENDING
4. Patch the ss to set suspend to false:
$ oc patch ss test-suspend-true -p '{"suspend":false}' --type='merge'
scansetting.compliance.openshift.io/test-suspend-true patched
5. Check that the suite starts running immediately:
$ oc get suite
NAME   PHASE   RESULT
test   DONE    NON-COMPLIANT
$ oc get cronjob
NAME            SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
test-rerunner   */3 * * * *   False     0        114s            3m45s
$ oc get job
NAME                     COMPLETIONS   DURATION   AGE
test-rerunner-28241838   1/1           6s         118s


Test scenario 3: Create an ss with suspend not set, then create an ssb and suspend it on demand:
1. Create an ss:
$ oc get ss test-default -o=jsonpath={.suspend}
$ oc get ss test-default -o=jsonpath={.schedule}
*/3 * * * *
2. Create an ssb:
$ oc compliance bind -N test -S test-default profile/ocp4-cis
Creating ScanSettingBinding test
$ oc get suite -w
NAME   PHASE     RESULT
test   RUNNING   NOT-AVAILABLE
test   AGGREGATING   NOT-AVAILABLE
test   DONE          NON-COMPLIANT
test   DONE          NON-COMPLIANT
^C
$ oc get cronjob
NAME            SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
test-rerunner   */3 * * * *   False     0        2m9s            4m15s
3. Patch suspend to true:
$ oc patch ss test-default -p '{"suspend":true}' --type='merge'
scansetting.compliance.openshift.io/test-default patched

$ oc get cronjob
NAME            SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
test-rerunner   */3 * * * *   True      0        8m54s           11m
$ oc get job
NAME                     COMPLETIONS   DURATION   AGE
test-rerunner-28241859   1/1           6s         8m57s
4. Check the cronjobs:
$ oc get cronjob 
NAME            SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
test-rerunner   */3 * * * *   False     0        67s             12m
$ oc get suite -w
NAME   PHASE     RESULT
test   RUNNING   NOT-AVAILABLE
^C
$ oc get job
NAME                     COMPLETIONS   DURATION   AGE
test-rerunner-28241859   1/1           6s         10m
test-rerunner-28241868   1/1           6s         36s

@xiaojiey (Collaborator)
@rhmdnd I retested the PR: the suite is now updated to the "SUSPENDED" phase when suspend is enabled.
I'm not sure whether you want to show the suspend field in default and default-auto-apply. Currently it is unset.
$ oc get ss default -o=jsonpath={.suspend}
$ oc get ss default-auto-apply -o=jsonpath={.suspend}

Create an ss with suspend set to true, then create an ssb and unsuspend it on demand:

  1. Create an ss with suspend set to true:
    schedule: "*/3 * * * *"
    suspend: true
  2. Create an ssb:
    $ oc apply -f ss2.yaml
    scansetting.compliance.openshift.io/test-suspend-true created
  3. Check that the suite is in SUSPENDED status:
    $ oc get ssb
    NAME AGE
    test 3s
    $ oc get suite
    NAME PHASE RESULT
    test SUSPENDED
  4. Patch the ss to set suspend to false:
    $ oc patch ss test-suspend-true -p '{"suspend":false}' --type='merge'
    scansetting.compliance.openshift.io/test-suspend-true patched
  5. Check that the suite starts running immediately:
    $ oc get suite
    NAME PHASE RESULT
    test DONE NON-COMPLIANT
    $ oc get cronjob
    NAME SCHEDULE SUSPEND ACTIVE LAST SCHEDULE AGE
    test-rerunner */3 * * * * False 1 5s 2m42s
    $ oc get job
    NAME COMPLETIONS DURATION AGE
    test-rerunner-28256610 1/1 6s 8s
  6. Patch it to enable suspend again:
    $ oc patch ss test-suspend-true -p '{"suspend":true}' --type='merge'
    scansetting.compliance.openshift.io/test-suspend-true patched
    $ oc get suite
    NAME PHASE RESULT
    test SUSPENDED NON-COMPLIANT
    $ oc get cronjob
    NAME SCHEDULE SUSPEND ACTIVE LAST SCHEDULE AGE
    test-rerunner */3 * * * * True 0 2m22s 4m59s

@@ -157,7 +164,10 @@ func (r *ReconcileComplianceSuite) Reconcile(ctx context.Context, request reconc
return reconcile.Result{}, nil
}

suiteCopy := suite.DeepCopy()
suiteCopy, err := r.refreshSuite(suite)
Author

We don't need this complexity if we have the status on the binding.

@rhmdnd (Author) commented Oct 5, 2023

@yuumasato @Vincent056 should be ready for another round of reviews.

cmd/manager/operator.go (outdated, resolved)
@rhmdnd force-pushed the CMP-2132 branch 3 times, most recently from 8170852 to 34bc4e6 (October 5, 2023 14:10)

@rhmdnd (Author) commented Oct 5, 2023

@xiaojiey @yuumasato @Vincent056 I cleaned up all remaining comments and this should be ready for another round of reviews.

@@ -293,6 +293,53 @@ or `ocp4-var-role-worker`:
`oc get ccr -n openshift-compliance -o yaml | jq '.items[] | select(.valuesUsed | contains("ocp4-var-role-master") or contains("ocp4-var-role-worker"))'`


## Suspending and resuming scan schedules
Collaborator

This is excellent. Thanks @rhmdnd !

@openshift-ci-robot (Collaborator)

@rhmdnd: This pull request references CMP-2132 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.15.0" version, but no target version was set.


This commit implements the logic and tests necessary to suspend and
resume scan schedules using the `ScanSetting` custom resource.

You can find more details on the overall justification, use cases, and
implementation details in the enhancement:

  ComplianceAsCode#375
@xiaojiey (Collaborator) commented Oct 8, 2023

Retested with 4.14.0-0.nightly-2023-10-06-234925 plus the code in this PR. Verification passed.

Test scenario 1: Check the suspend value in the default and default-auto-apply ss:
$ oc get ss default -o=jsonpath={.suspend}
false
$ oc get ss default-auto-apply -o=jsonpath={.suspend}
false
Test scenario 2: Create an ss with suspend set to false, then suspend the ssb on demand:
$ cat ss.yaml 
...
schedule: "*/3 * * * *"
strictNodeScan: true
timeout: 30m
$ oc apply -f ss.yaml 
scansetting.compliance.openshift.io/test-suspend created
$ oc compliance bind -N test-suspend -S test-suspend profile/ocp4-cis profile/ocp4-cis-node
Creating ScanSettingBinding test-suspend
$ oc get suite -w
NAME           PHASE       RESULT
test-suspend   LAUNCHING   NOT-AVAILABLE
test-suspend   LAUNCHING   NOT-AVAILABLE
test-suspend   LAUNCHING   NOT-AVAILABLE
test-suspend   RUNNING     NOT-AVAILABLE
test-suspend   RUNNING     NOT-AVAILABLE
test-suspend   RUNNING     NOT-AVAILABLE
test-suspend   AGGREGATING   NOT-AVAILABLE
^C
$ oc patch ss test-suspend -p '{"suspend":true}' --type='merge'
scansetting.compliance.openshift.io/test-suspend patched
$ oc get ssb
NAME           STATUS
test-suspend   SUSPENDED
$ oc get suite -w
NAME           PHASE         RESULT
test-suspend   AGGREGATING   NOT-AVAILABLE
test-suspend   DONE          NON-COMPLIANT
test-suspend   DONE          NON-COMPLIANT
$ oc get cronjob
NAME                    SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
test-suspend-rerunner   */3 * * * *   True      0        <none>          7m2s

$ oc get job
No resources found in openshift-compliance namespace.
Test scenario 3: Create an ss with suspend set to true, then unsuspend the ssb on demand:
$ cat ss2.yaml
...
schedule: "*/3 * * * *"
showNotApplicable: false
strictNodeScan: true
timeout: 30m
suspend: true
$ oc delete ssb --all
scansettingbinding.compliance.openshift.io "test-suspend" deleted
$ oc apply -f ss2.yaml 
scansetting.compliance.openshift.io/test-suspend-true created
$ oc get ssb test-suspend-true -o=jsonpath={.status} | jq -r
{
  "conditions": [
    {
      "lastTransitionTime": "2023-10-08T11:14:02Z",
      "message": "The scan setting binding uses a scan setting that is suspended",
      "reason": "Suspended",
      "status": "False",
      "type": "Ready"
    }
  ],
  "phase": "SUSPENDED"
}
$ oc patch ss test-suspend-true --patch '{"suspend":false}' --type='merge'
scansetting.compliance.openshift.io/test-suspend-true patched
$ oc get ssb
NAME                STATUS
test-suspend-true   READY
$ oc get suite -w
NAME                PHASE     RESULT
test-suspend-true   RUNNING   NOT-AVAILABLE
...
test-suspend-true   AGGREGATING   NOT-AVAILABLE
test-suspend-true   DONE          NON-COMPLIANT
$ oc get cronjob
NAME                         SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
test-suspend-true-rerunner   */3 * * * *   False     0        2m57s           3m28s
$ oc get job
NAME                                  COMPLETIONS   DURATION   AGE
test-suspend-true-rerunner-28279398   1/1           7s         3m1s
test-suspend-true-rerunner-28279401   0/1           1s         1s
$ oc get suite
NAME                PHASE       RESULT
test-suspend-true   LAUNCHING   NOT-AVAILABLE

@xiaojiey (Collaborator) commented Oct 8, 2023

/unhold
/label qe-approved

@Vincent056 left a comment

Thanks for addressing all the comments.
/lgtm

@openshift-ci bot commented Oct 9, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rhmdnd, Vincent056


@openshift-ci bot merged commit 3004a7f into ComplianceAsCode:master on Oct 9, 2023 (5 checks passed)