@mdbooth
Contributor

@mdbooth mdbooth commented May 14, 2025

This replaces use of the legacy /services SDK for compute only in the virtualmachines service, with a minimal set of supporting changes. This allows using Azure APIs beyond those available via the legacy SDK.

Backport includes #155, which fixes the ASH regression the original change introduced.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 14, 2025
@openshift-ci-robot

@mdbooth: This pull request references Jira Issue OCPBUGS-56163, which is invalid:

  • expected the bug to target either version "4.19." or "openshift-4.19.", but it targets "4.20.0" instead
  • release note text must be set and not match the template OR release note type must be set to "Release Note Not Required". For more information you can reference the OpenShift Bug Process.

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

This replaces use of the legacy /services SDK for compute only in the virtualmachines service, with a minimal set of supporting changes. This allows using Azure APIs beyond those available via the legacy SDK.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label May 14, 2025
@openshift-ci openshift-ci bot requested review from JoelSpeed and nrb May 14, 2025 08:52
@mdbooth
Contributor Author

mdbooth commented May 14, 2025

/jira refresh

@openshift-ci-robot

@mdbooth: This pull request references Jira Issue OCPBUGS-56163, which is invalid:

  • release note text must be set and not match the template OR release note type must be set to "Release Note Not Required". For more information you can reference the OpenShift Bug Process.

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/jira refresh


@mdbooth
Contributor Author

mdbooth commented May 14, 2025

/jira backport release-4.18,release-4.17,release-4.16

@openshift-ci-robot

@mdbooth: The following backport issues have been created:

Queuing cherrypicks to the requested branches to be created after this PR merges:
/cherrypick release-4.18
/cherrypick release-4.17
/cherrypick release-4.16

In response to this:

/jira backport release-4.18,release-4.17,release-4.16


@openshift-cherrypick-robot

@openshift-ci-robot: once the present PR merges, I will cherry-pick it on top of release-4.16, release-4.17, release-4.18 in new PRs and assign them to you.

In response to this:

@mdbooth: The following backport issues have been created:

Queuing cherrypicks to the requested branches to be created after this PR merges:
/cherrypick release-4.18
/cherrypick release-4.17
/cherrypick release-4.16

In response to this:

/jira backport release-4.18,release-4.17,release-4.16


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@mdbooth mdbooth changed the title from "OCPBUGS-56163: Update virtualmachines service to armcompute/v5 SDK" to "OCPBUGS-56163: [release-4.19] Update virtualmachines service to armcompute/v5 SDK" May 14, 2025
@mdbooth
Contributor Author

mdbooth commented May 14, 2025

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels May 14, 2025
@openshift-ci-robot

@mdbooth: This pull request references Jira Issue OCPBUGS-56163, which is valid. The bug has been moved to the POST state.

7 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.19.0) matches configured target version for branch (4.19.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)
  • release note text is set and does not match the template
  • dependent bug Jira Issue OCPBUGS-55372 is in the state ON_QA, which is one of the valid states (MODIFIED, ON_QA, VERIFIED)
  • dependent Jira Issue OCPBUGS-55372 targets the "4.20.0" version, which is one of the valid target versions: 4.20.0
  • bug has dependents

Requesting review from QA contact:
/cc @sunzhaohua2

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

/jira refresh


@openshift-ci openshift-ci bot requested a review from sunzhaohua2 May 14, 2025 13:44
@damdo
Member

damdo commented May 14, 2025

/assign @nrb @JoelSpeed

As you reviewed the previous one

@shellyyang1989

@sunzhaohua2 when you have a moment, please take a look at the QE regression test; 3 tests failed. It's a big change, so we need to ensure no regression is introduced.

@mdbooth
Contributor Author

mdbooth commented May 15, 2025

@sunzhaohua2 when you have a moment, please take a look at the QE regression test; 3 tests failed. It's a big change, so we need to ensure no regression is introduced.

There's the same deterministic failure in there (the encryption one), and 2 others. I wonder if the test is timing out 🤔 I wonder whether, if we sharded it into 3 (it's absurdly long-running!), we'd see the non-deterministic failures go away.

Trying that here: openshift/release#64959
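
For illustration only, here is a minimal Go sketch of what sharding a long serial test list into 3 groups could look like. The `shard` function and the case names are hypothetical; this is not how the linked openshift/release change or the Prow job is actually configured, just the general idea of splitting one long-running job into several shorter ones.

```go
package main

import "fmt"

// shard distributes test case names round-robin across n groups so that
// each group can run as a separate, shorter job. Hypothetical helper.
func shard(cases []string, n int) [][]string {
	if n <= 0 {
		return nil
	}
	out := make([][]string, n)
	for i, c := range cases {
		out[i%n] = append(out[i%n], c)
	}
	return out
}

func main() {
	// Placeholder case IDs taken from this thread, plus one made-up entry.
	cases := []string{"OCP-52602", "OCP-33058", "OCP-39639", "OCP-example"}
	for i, group := range shard(cases, 3) {
		fmt.Printf("shard %d: %v\n", i, group)
	}
}
```

Round-robin keeps the shards roughly equal in count; it does not account for how long each case takes, which a real split would probably want to consider.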

@sunzhaohua2
Contributor

Thanks @mdbooth for finding these issues for this job.

  • OCP-52602:zhsun:Cluster_Infrastructure:[sig-cluster-lifecycle] Cluster_Infrastructure MAPI Drain operation should be asynchronous from the other machine operations [Disruptive]

This failed because the node still had the uninitialized taint. It seems our script needs an update: we currently check for the uninitialized taint as soon as the machines are Running, but we should wait until the node is Ready and only then check for the taint, right? @JoelSpeed

  • OCP-33058:zhsun:Cluster_Infrastructure:[sig-cluster-lifecycle] Cluster_Infrastructure MAPI Implement defaulting machineset values for azure [Disruptive] [Serial]

This case timed out. It has failed only once in QE's jobs (https://ocpqe-webapp-aos-qe-ci--runtime-int.apps.int.gpc.ocp-hub.prod.psi.redhat.com/prow_test_cases/OCP-33058; ignore the cucushift one, which should be removed), so this case is safe. Maybe we need to give this job more time, or split the cases into two jobs based on whether the case is destructive or not. This job runs for a long time because it runs all MAPI-related cases.

  • OCP-39639:miyadav:Cluster_Infrastructure:[sig-cluster-lifecycle] Cluster_Infrastructure MAPI host-based disk encryption at VM on azure platform

For this case, I enabled the feature in the dev account in this PR: https://github.com/openshift/openshift-tests-private/pull/24870. When this job ran, the PR hadn't been merged yet.

@JoelSpeed
Contributor

This failed because the node still had the uninitialized taint. It seems our script needs an update: we currently check for the uninitialized taint as soon as the machines are Running, but we should wait until the node is Ready and only then check for the taint, right? @JoelSpeed

Sounds right to me yes, the nodes won't be Ready until the uninitialized taint has been removed
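
A minimal Go sketch of the ordering discussed above: only evaluate the taint assertion once the node reports Ready. The `Node` and `Taint` types are simplified stand-ins for the Kubernetes types, `checkNode` is a hypothetical helper, and the taint key is assumed to be the standard cloud-provider one; none of this is the QE suite's actual code.

```go
package main

import "fmt"

// Taint and Node are simplified, hypothetical stand-ins for the
// Kubernetes core/v1 types, kept local so the sketch is self-contained.
type Taint struct{ Key string }

type Node struct {
	Name   string
	Ready  bool
	Taints []Taint
}

// Assumption: the "uninitialized taint" in this thread is the standard
// cloud-provider taint applied to nodes that are not yet initialized.
const uninitializedTaintKey = "node.cloudprovider.kubernetes.io/uninitialized"

// hasTaint reports whether the node carries a taint with the given key.
func hasTaint(n Node, key string) bool {
	for _, t := range n.Taints {
		if t.Key == key {
			return true
		}
	}
	return false
}

// checkNode returns checked=false while the node is not Ready, meaning the
// caller should keep polling. Once the node is Ready, ok reports whether
// the uninitialized taint is absent.
func checkNode(n Node) (checked bool, ok bool) {
	if !n.Ready {
		return false, false
	}
	return true, !hasTaint(n, uninitializedTaintKey)
}

func main() {
	booting := Node{Name: "worker-a", Taints: []Taint{{Key: uninitializedTaintKey}}}
	ready := Node{Name: "worker-b", Ready: true}
	for _, n := range []Node{booting, ready} {
		checked, ok := checkNode(n)
		fmt.Printf("%s: checked=%v ok=%v\n", n.Name, checked, ok)
	}
}
```

A real test would wrap `checkNode` in a poll loop with a timeout rather than asserting immediately after the machine reaches Running.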

@JoelSpeed
Contributor

/approve
/lgtm
/label backport-risk-assessed

@openshift-ci openshift-ci bot added the backport-risk-assessed Indicates a PR to a release branch has been evaluated and considered safe to accept. label May 27, 2025
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 27, 2025
@openshift-ci
Contributor

openshift-ci bot commented May 27, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JoelSpeed

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 27, 2025
@mdbooth
Contributor Author

mdbooth commented May 27, 2025

/test regression-clusterinfra-azure-ipi-mapi

@mdbooth
Contributor Author

mdbooth commented May 27, 2025

/hold while we investigate a possible issue affecting ASH

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 27, 2025
@sunzhaohua2
Contributor

Case OCP-39639:miyadav:Cluster_Infrastructure:[sig-cluster-lifecycle] Cluster_Infrastructure MAPI host-based disk encryption at VM on azure platform still failed. I checked the job, and the new step g.By("Enable host-level encryption in Azure") was not there.
We need to update the test image tag. I created a PR to update the tag; once it is merged, this should be OK. I tested with the QE account: after unregistering EncryptionAtHost and then running this case, it passed.

$ oc image info quay-proxy.ci.openshift.org/openshift/ci:ci_tests-private_4.19
Name:          quay-proxy.ci.openshift.org/openshift/ci:ci_tests-private_4.19
Digest:        sha256:a0ef4c9fc777c8a2ad1e38e536063aefcb92d91b79f4be044f3c994a8465ad22
Manifest List: sha256:f551aedf7383dd27204e740146dc9b09bd2e48b9cf66f682fe93ebc6534f8de6
Media Type:    application/vnd.docker.distribution.manifest.v2+json
Created:       2h ago
$ oc image info quay-proxy.ci.openshift.org/openshift/ci:ci_tests-private_latest
Name:          quay-proxy.ci.openshift.org/openshift/ci:ci_tests-private_latest
Digest:        sha256:efdb152e86cff3f260353e1b3b9147896574ce945b6ed67d5872391ee68c5e0c
Manifest List: sha256:4b1c4f95a64dbfe489dc6237bf8349503ab5cf9385c0487af3b3dcfb9623ad53
Media Type:    application/vnd.docker.distribution.manifest.v2+json
Created:       42d ago

@damdo
Member

damdo commented Jun 5, 2025

@mdbooth This now needs integration with a backport of #155 (might be worth doing both backports in this PR).

mdbooth added 2 commits July 1, 2025 11:47
This replaces use of the legacy /services SDK for compute only in the
virtualmachines service, with a minimal set of supporting changes. This
allows using Azure APIs beyond those available via the legacy SDK.

(cherry picked from commit f76ad07)

OCPBUGS-55372: Fix regression on ASH with armcompute/v5

The change "Update virtualmachines service to armcompute/v5 SDK" did not
update the StackHub implementation of the virtualmachines service, which
caused a failure in the machine actuator due to the difference in
type returned by Get().

We fix this by moving to a common implementation for both StackHub and
public clouds.

Having a common implementation in virtualmachines also requires a change
to the networkinterfaces Service. The networkinterfaces service Get()
method returns different types for StackHub and public clouds, which
were previously cast to the anticipated types by different
implementations of the virtualmachines service. As we don't require the
full type, we add a new type safe method which returns only the ID,
meaning a common implementation is now safe.

(cherry picked from commit afd0f3f)
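
To illustrate the approach described in the second commit message, here is a minimal Go sketch of a type-safe ID accessor that lets one caller work with both clouds. All names (`nicService`, `GetID`, `publicNIC`, `stackHubNIC`, `nicIDForVM`) are hypothetical stand-ins, not the identifiers used in the actual change.

```go
package main

import (
	"errors"
	"fmt"
)

// publicNIC and stackHubNIC stand in for the two different SDK types that
// Get() returns on public clouds and on Azure Stack Hub. Hypothetical.
type publicNIC struct{ ID *string }
type stackHubNIC struct{ ID *string }

// nicService models a service whose Get() result needs a cloud-specific
// type assertion, plus a type-safe GetID() that returns only the ID.
type nicService interface {
	Get(name string) (interface{}, error)
	GetID(name string) (string, error)
}

type publicNICService struct{ nics map[string]publicNIC }
type stackHubNICService struct{ nics map[string]stackHubNIC }

func (s publicNICService) Get(name string) (interface{}, error) {
	nic, ok := s.nics[name]
	if !ok {
		return nil, errors.New("network interface not found")
	}
	return nic, nil
}

func (s publicNICService) GetID(name string) (string, error) {
	nic, ok := s.nics[name]
	if !ok || nic.ID == nil {
		return "", fmt.Errorf("network interface %q has no ID", name)
	}
	return *nic.ID, nil
}

func (s stackHubNICService) Get(name string) (interface{}, error) {
	nic, ok := s.nics[name]
	if !ok {
		return nil, errors.New("network interface not found")
	}
	return nic, nil
}

func (s stackHubNICService) GetID(name string) (string, error) {
	nic, ok := s.nics[name]
	if !ok || nic.ID == nil {
		return "", fmt.Errorf("network interface %q has no ID", name)
	}
	return *nic.ID, nil
}

// nicIDForVM is the common caller: it never sees the cloud-specific type,
// so a single implementation is safe for both services.
func nicIDForVM(svc nicService, name string) (string, error) {
	return svc.GetID(name)
}

func main() {
	id := "example-nic-id" // placeholder value
	services := []nicService{
		publicNICService{nics: map[string]publicNIC{"nic0": {ID: &id}}},
		stackHubNICService{nics: map[string]stackHubNIC{"nic0": {ID: &id}}},
	}
	for _, svc := range services {
		got, err := nicIDForVM(svc, "nic0")
		fmt.Println(got, err)
	}
}
```

Because the caller depends only on `GetID`, a mismatch in the concrete type returned by `Get()` can no longer surface as a failed type assertion at runtime.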
@mdbooth mdbooth force-pushed the OCPBUGS-55372-4.19 branch from 81a56ce to 1fb94f1 Compare July 1, 2025 10:48
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jul 1, 2025
@openshift-ci-robot

@mdbooth: This pull request references Jira Issue OCPBUGS-56163, which is valid. The bug has been moved to the POST state.

7 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.19.z) matches configured target version for branch (4.19.z)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)
  • release note text is set and does not match the template
  • dependent bug Jira Issue OCPBUGS-55372 is in the state Verified, which is one of the valid states (VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA))
  • dependent Jira Issue OCPBUGS-55372 targets the "4.20.0" version, which is one of the valid target versions: 4.20.0
  • bug has dependents

Requesting review from QA contact:
/cc @sunzhaohua2

In response to this:

This replaces use of the legacy /services SDK for compute only in the virtualmachines service, with a minimal set of supporting changes. This allows using Azure APIs beyond those available via the legacy SDK.

Backport includes #155, which fixes the ASH regression the original change introduced.


@mdbooth
Contributor Author

mdbooth commented Jul 1, 2025

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 1, 2025
@sdodson
Member

sdodson commented Jul 1, 2025

@shellyyang1989 @sunzhaohua2 I think this is ready for a fresh look now that Matt has fixed several issues you found on the main branch and backported them to release-4.19 here.

@openshift-ci
Contributor

openshift-ci bot commented Jul 1, 2025

@mdbooth: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/regression-clusterinfra-azure-ipi-mapi | 1fb94f1 | link | false | /test regression-clusterinfra-azure-ipi-mapi |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@sunzhaohua2
Contributor

/label cherry-pick-approved

@openshift-ci openshift-ci bot added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Jul 2, 2025
@shellyyang1989

@sunzhaohua2 qe-approved or no-qe is required for all z streams!

@sunzhaohua2
Contributor

/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved Signifies that QE has signed off on this PR label Jul 2, 2025
@openshift-ci-robot

@mdbooth: This pull request references Jira Issue OCPBUGS-56163, which is valid.

7 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.19.z) matches configured target version for branch (4.19.z)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)
  • release note text is set and does not match the template
  • dependent bug Jira Issue OCPBUGS-55372 is in the state Verified, which is one of the valid states (VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA))
  • dependent Jira Issue OCPBUGS-55372 targets the "4.20.0" version, which is one of the valid target versions: 4.20.0
  • bug has dependents

Requesting review from QA contact:
/cc @sunzhaohua2

In response to this:

This replaces use of the legacy /services SDK for compute only in the virtualmachines service, with a minimal set of supporting changes. This allows using Azure APIs beyond those available via the legacy SDK.

Backport includes #155, which fixes the ASH regression the original change introduced.


@JoelSpeed
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 2, 2025
@openshift-merge-bot openshift-merge-bot bot merged commit 04987a4 into openshift:release-4.19 Jul 2, 2025
11 of 12 checks passed
@openshift-ci-robot

@mdbooth: Jira Issue OCPBUGS-56163: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-56163 has been moved to the MODIFIED state.

In response to this:

This replaces use of the legacy /services SDK for compute only in the virtualmachines service, with a minimal set of supporting changes. This allows using Azure APIs beyond those available via the legacy SDK.

Backport includes #155, which fixes the ASH regression the original change introduced.


@openshift-cherrypick-robot

@openshift-ci-robot: Failed to get PR patch from GitHub. This PR will need to be manually cherrypicked.

Error message: status code 406 not one of [200], body: {"message":"Sorry, the diff exceeded the maximum number of lines (20000)","errors":[{"resource":"PullRequest","field":"diff","code":"too_large"}],"documentation_url":"https://docs.github.com/rest/pulls/pulls#get-a-pull-request","status":"406"}

In response to this:

@mdbooth: The following backport issues have been created:

Queuing cherrypicks to the requested branches to be created after this PR merges:
/cherrypick release-4.18
/cherrypick release-4.17
/cherrypick release-4.16

In response to this:

/jira backport release-4.18,release-4.17,release-4.16


2 similar comments

@openshift-bot

[ART PR BUILD NOTIFIER]

Distgit: ose-machine-api-provider-azure
This PR has been included in build ose-machine-api-provider-azure-container-v4.19.0-202507021237.p0.g04987a4.assembly.stream.el9.
All builds following this will include this PR.


Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. backport-risk-assessed Indicates a PR to a release branch has been evaluated and considered safe to accept. cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. qe-approved Signifies that QE has signed off on this PR
