@ngopalak-redhat
Contributor

@ngopalak-redhat ngopalak-redhat commented Jan 5, 2026

This PR introduces a new test suite specifically for node component testing, dubbed "Long Running Tests."

Following discussions with the MCO team on Slack, we agreed to separate specific disruptive node tests from the main MCO disruptive test suite.

Why not the Serial Suite? These tests cannot be part of the existing serial suite because they require multiple node reboots (exceeding the 3-restart limit) and take a long time to run.

The goal is to establish a dedicated suite for tests that are disruptive and time-consuming but critical for release verification of the node component configuration (Kubelet).

Implementation Details

Adds the framework for long-running tests. Currently includes one test case: Changing the Kubelet Log Level.

  • The test applies a Kubelet config change.
  • It explicitly waits for the node reboot cycle to complete.

Configs are applied to a single node rather than all three. This reduces the blast radius and significantly speeds up execution. The suite includes logic to revert changes and clean up the node state post-execution.
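
As a rough illustration of the flow described above (not the Go code in this PR), the apply, wait, and revert steps can be sketched in Python. `FakeNode`, `wait_for`, and `run_log_level_test` are hypothetical names made up for this sketch, and the fake node simply reports itself updated after a few polls; a real test would query the cluster instead.

```python
import time
from typing import Callable


def wait_for(condition: Callable[[], bool], timeout_s: float, interval_s: float) -> bool:
    """Poll `condition` until it returns True or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


class FakeNode:
    """Hypothetical stand-in for the single node the config is applied to."""

    def __init__(self, log_level: int = 2, polls_per_reboot: int = 3) -> None:
        self.log_level = log_level
        self.reboots = 0
        self._pending = None
        self._polls_left = 0
        self._polls_per_reboot = polls_per_reboot

    def apply_kubelet_log_level(self, level: int) -> None:
        # Assumption: every config change triggers one reboot cycle.
        self._pending = level
        self._polls_left = self._polls_per_reboot

    def is_updated(self) -> bool:
        # Pretend the reboot finishes after a fixed number of polls.
        if self._pending is None:
            return True
        self._polls_left -= 1
        if self._polls_left <= 0:
            self.log_level = self._pending
            self._pending = None
            self.reboots += 1
            return True
        return False


def run_log_level_test(node: FakeNode, new_level: int,
                       timeout_s: float = 60.0, interval_s: float = 0.0) -> int:
    """Apply a log level, wait for the reboot, then always revert."""
    original = node.log_level
    try:
        node.apply_kubelet_log_level(new_level)
        if not wait_for(node.is_updated, timeout_s, interval_s):
            raise TimeoutError("node did not finish its reboot cycle")
        return node.log_level
    finally:
        # Cleanup runs even if the check above fails.
        node.apply_kubelet_log_level(original)
        if not wait_for(node.is_updated, timeout_s, interval_s):
            raise RuntimeError("node did not return to its original state")


if __name__ == "__main__":
    node = FakeNode(log_level=2)
    observed = run_log_level_test(node, new_level=4)
    print("observed:", observed, "restored:", node.log_level, "reboots:", node.reboots)
```

Putting the revert in a `finally` block is one way to make sure the node is cleaned up even when the verification step fails; note that the revert itself costs a second reboot in this model.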

Future Work / Roadmap

  • Post-merge, I will add tests for system-compressible and auto-node-sizing.

  • I will work with the MCO team to configure this as a Periodic Job (similar to the MCO disruptive tests). These tests will not run on every PR but will be a Release Blocking requirement.

Sample run: https://gist.github.com/ngopalak-redhat/0c63bddf63a0a49c46c9dd2a13fad465

@openshift-ci-robot

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after the lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will use /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To manually trigger all second-stage jobs, use the /pipeline required command.

This repository is configured in: automatic mode

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jan 5, 2026
@openshift-ci
Contributor

openshift-ci bot commented Jan 5, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci-robot

openshift-ci-robot commented Jan 5, 2026

@ngopalak-redhat: This pull request references OCPNODE-3203 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 5, 2026
@ngopalak-redhat
Contributor Author

/test all

@ngopalak-redhat ngopalak-redhat marked this pull request as ready for review January 5, 2026 09:11
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 5, 2026
@ngopalak-redhat
Contributor Author

/test all

@openshift-ci-robot

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

@ngopalak-redhat
Contributor Author

@cpmeadors Please review

@qiliRedHat

@ngopalak-redhat Some of your functions in the kubeletconfig_features.go file could be reused by other tests that also need to configure a KubeletConfig or wait for a MachineConfig update. Should we consider putting the common functions in a utils.go or node_utils.go? CC: @cpmeadors

@cpmeadors

/lgtm

@cpmeadors

/assigne @neisw

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 16, 2026
@cpmeadors

/assign @dgoodwin

@dgoodwin
Contributor

I was hoping we would end up with a general-purpose disruptive suite, as this comes up for multiple teams quite often. Is there a reason this could not be morphed into one? Long-running is still fine, I think, as our ability to run very long jobs has improved, last I heard.

Out of curiosity, how long are we talking for your current tests?

@cpmeadors

> I was hoping we would end up with a general-purpose disruptive suite, as this comes up for multiple teams quite often. Is there a reason this could not be morphed into one? Long-running is still fine, I think, as our ability to run very long jobs has improved, last I heard.

This is a step in that direction. We wanted to get it working first, then adapt for more general usage.

> Out of curiosity, how long are we talking for your current tests?

@ngopalak-redhat can you answer this?

@dgoodwin
Contributor

I'd prefer we name this what we want right from the get-go, rather than leaving yourselves a task to come back to later. Those follow-ups tend not to happen. We could keep it kinda quiet while you get it where you want, then announce more broadly. I can avoid pointing anyone to it until you give the go-ahead.

@dgoodwin
Contributor

To further the point, the test names would need renaming later because they contain the suite name. A rename technically bloats all the DBs and has to be reflected in the component-mapping repo, which is a pain. Job renames lose history as well.
I would encourage getting this right the first time in suite, test, and job naming.

@ngopalak-redhat ngopalak-redhat force-pushed the ngopalak/kubeletconfig_test branch from 7678b6a to 3c2d92c Compare January 22, 2026 05:53
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jan 22, 2026
@ngopalak-redhat
Contributor Author

@dgoodwin / @cpmeadors I have chosen a generalized name for the test suite: openshift/disruptive-longrunning.

Currently, the test I added runs serially and takes about 5 minutes. However, the Node team plans to add more comprehensive tests that may take up to 15 minutes and require multiple node restarts. I am also thinking of moving the user namespace test (https://github.com/openshift/origin/blob/main/test/extended/node/nested_container.go) into this suite.

For example, the AutoSizingReserved feature (which you are aware of) requires multiple restarts to verify that the enable/disable logic functions correctly and that values are applied properly.
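
To make the restart budget concrete, here is a small Python sketch of how such a test adds up. Only the 3-restart limit and the suite name `openshift/disruptive-longrunning` come from this PR. The step list, the assumption that each config change costs exactly one reboot, and the `"serial"` placeholder label are illustrative guesses, not a description of the real AutoSizingReserved tests.

```python
from dataclasses import dataclass
from typing import Sequence

SERIAL_SUITE_RESTART_LIMIT = 3  # limit mentioned in the PR description
LONG_RUNNING_SUITE = "openshift/disruptive-longrunning"


@dataclass(frozen=True)
class Step:
    name: str
    reboots_node: bool  # assumption: a config change costs one reboot


# Hypothetical enable/disable cycle for a feature like AutoSizingReserved.
AUTOSIZING_STEPS = [
    Step("enable auto sizing", True),
    Step("verify reserved values are applied", False),
    Step("disable auto sizing", True),
    Step("verify defaults are restored", False),
    Step("enable again to check the toggle is repeatable", True),
    Step("revert to the original config", True),
]


def reboots_needed(steps: Sequence[Step]) -> int:
    """Count the steps that restart the node."""
    return sum(1 for step in steps if step.reboots_node)


def pick_suite(steps: Sequence[Step], limit: int = SERIAL_SUITE_RESTART_LIMIT) -> str:
    """Return the long-running suite once the restart budget is exceeded."""
    if reboots_needed(steps) > limit:
        return LONG_RUNNING_SUITE
    return "serial"  # placeholder label for the existing serial suite


if __name__ == "__main__":
    print(reboots_needed(AUTOSIZING_STEPS), pick_suite(AUTOSIZING_STEPS))
```

Under these assumptions, a single enable/disable round trip already uses most of the serial suite's budget once the cleanup reboot is counted.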

@openshift-ci-robot

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

@dgoodwin
Contributor

/approve

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 23, 2026
@ngopalak-redhat
Contributor Author

/verified by @ngopalak-redhat
/retest-required

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Jan 23, 2026
@openshift-ci-robot

@ngopalak-redhat: This PR has been marked as verified by @ngopalak-redhat.

@dgoodwin
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 23, 2026
@openshift-ci
Contributor

openshift-ci bot commented Jan 23, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cpmeadors, dgoodwin, ngopalak-redhat

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci
Contributor

openshift-ci bot commented Jan 23, 2026

@ngopalak-redhat: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot bot merged commit f14bf4a into openshift:main Jan 23, 2026
20 checks passed

Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • lgtm: Indicates that a PR is ready to be merged.
  • verified: Signifies that the PR passed pre-merge verification criteria

5 participants