Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate RP from Azure AD Graph to Microsoft Graph #1970

Merged
merged 12 commits into from
Jun 14, 2023

Conversation

mbarnes
Copy link
Contributor

@mbarnes mbarnes commented Feb 16, 2022

Which issue this PR addresses:

[ARO-1919] AAD: Migrate ARO-RP Azure AD Graph -> Microsoft Graph

What this PR does / why we need it:

Microsoft has announced the impending end-of-support for Azure Active Directory Graph API:

We will retire the Azure AD Graph API any time after June 30th, 2023.

We need to migrate the RP code to the Microsoft Graph SDK for Go.

Test plan for issue:

Pass all* GitHub checks, especially E2E tests.

* Except perhaps CodeQL, see additional notes below.

Is there any documentation that needs to be updated for this PR?

None that I'm aware of. This is an internal RP change and should not alter its behavior.

Additional notes:

CodeQL currently fails due to the vendoring of Microsoft Graph SDK for Go, which, due to its enormous API surface, causes memory consumption during CodeQL analysis to double, putting it well beyond the 8 GiB memory limit of standard GitHub Action runners. I'm working on moving CodeQL analysis to an Azure Pipeline, where we can use a virtual machine with adequate RAM (see the "pipelines" commit).

@ross-bryan
Copy link
Contributor

@mbarnes FYI changed this PR to "Draft" to help make navigating open PRs a little easier for me.

@github-actions github-actions bot added the needs-rebase branch needs a rebase label Mar 3, 2022
@github-actions
Copy link

github-actions bot commented Mar 3, 2022

Please rebase pull request.

@github-actions github-actions bot removed the needs-rebase branch needs a rebase label Mar 3, 2022
@github-actions
Copy link

Please rebase pull request.

@github-actions github-actions bot added the needs-rebase branch needs a rebase label Mar 17, 2022
@github-actions github-actions bot removed the needs-rebase branch needs a rebase label Apr 11, 2022
@mbarnes mbarnes force-pushed the aadgraph-to-msgraph branch 2 times, most recently from bcb23d3 to f8f344f Compare April 15, 2022 02:33
@github-actions github-actions bot added the needs-rebase branch needs a rebase label May 4, 2022
@github-actions
Copy link

github-actions bot commented May 4, 2022

Please rebase pull request.

@github-actions github-actions bot removed the needs-rebase branch needs a rebase label May 6, 2022
@github-actions
Copy link

Please rebase pull request.

@github-actions github-actions bot added the needs-rebase branch needs a rebase label May 26, 2022
SudoBrendan
SudoBrendan previously approved these changes Jun 9, 2023
Copy link
Collaborator

@SudoBrendan SudoBrendan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While there's still open discussion on *string vs string and returning a 404 error, I'm ultimately going to call this a stylistic choice - we can disagree and commit. I added one final nit that also wouldn't stop me from merging this. LGTM, great work Matt, thank you!

pkg/util/cluster/cluster.go Outdated Show resolved Hide resolved
matches := result.GetValue()
switch len(matches) {
case 0:
return nil, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this still feels like it should be an error, but could be addressed later on post deadline.

s-amann
s-amann previously approved these changes Jun 9, 2023
To aid debugging failed MS Graph requests.

MS Graph's top-level APIError message is hard-coded and only says
"error status code received from the API".  Further details have
to be extracted from the "ODataErrorable" interface type.
Vendoring the Microsoft Graph SDK for Go causes memory consumption
during CodeQL analysis to double due to its enormous API surface,
putting it well beyond the memory limit of standard GitHub Action
runners.

I inquired with the Azure organization admins about provisioning
larger GitHub runners, but was directed instead to use the 1ES
Hosted Pool which runs our other CI checks. Since ARO controls
the VM type for Hosted Pool agents, we can use a VM type with
adequate memory for CodeQL analysis with the Graph SDK.

Note: Implemented CodeQL commands in a template in case we
      ever decide to move Javascript or Python analysis to
      1ES Hosted Pool as well.
Copy link
Collaborator

@bennerv bennerv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly nits and some questions.

pkg/env/armhelper.go Show resolved Hide resolved
pkg/env/armhelper.go Show resolved Hide resolved
pkg/util/cluster/aad.go Show resolved Hide resolved
pkg/util/cluster/cluster.go Show resolved Hide resolved
Comment on lines +597 to +598
if spObjID == nil {
return nil
}

roleAssignments, err := c.roleassignments.ListForResourceGroup(ctx, vnetResourceGroup, fmt.Sprintf("principalId eq '%s'", spObjID))
roleAssignments, err := c.roleassignments.ListForResourceGroup(ctx, vnetResourceGroup, fmt.Sprintf("principalId eq '%s'", *spObjID))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but not expected

I wouldn't return an error, but log a message stating no role assignments found.

If the cluster installation fails or is requeued for any reason, this role assignment may not exist. And that's totally fine, we're just trying to clean up after ourselves on cluster deletion.

pkg/cluster/serviceprincipal.go Show resolved Hide resolved
pkg/cluster/serviceprincipal.go Show resolved Hide resolved
pkg/env/armhelper.go Show resolved Hide resolved
pkg/util/cluster/aad.go Show resolved Hide resolved
pkg/util/cluster/aad.go Show resolved Hide resolved

m.spApplications = graphrbac.NewApplicationsClient(m.env.Environment(), m.subscriptionDoc.Subscription.Properties.TenantID, spGraphAuthorizer)
return nil
return err
}

func (m *manager) clusterSPObjectID(ctx context.Context) error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anywhere this is called, we can remove the refreshing action from now and test the previous theory about Authorization refreshing in azcore.

Also on the fixClusterSPObjectID call too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What azcore refreshing theory? If I got it right, we reduce the amount of code here, which is always good direction.

Copy link
Contributor Author

@mbarnes mbarnes Jun 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@petrkotas Deep inside azcore there's a bit of logic to automatically force a token refresh upon an authorization error, which in theory obviates the need for our AuthorizationRefreshingAction. But the catch is it requires Azure clients explicitly define an authorization callback to handle this: specifically the "OnChallenge" callback shown in the azcore link.

We discovered the hard way (cluster install failures) that this logic isn't currently utilized for ARM calls since we're not fully migrated to the newer azure-sdk-for-go SDK. I'm not yet certain if the MS Graph client handles this situation automatically either.

Copy link
Collaborator

@bennerv bennerv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm - few nits, but looks great otherwise.

@SudoBrendan SudoBrendan added next-up next-release To be included in the next RP release rollout labels Jun 9, 2023
Copy link
Member

@petrkotas petrkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mbarnes thank you for outstanding work, especially for documenting graph api quirks.

Comment on lines +597 to +598
if spObjID == nil {
return nil
}

roleAssignments, err := c.roleassignments.ListForResourceGroup(ctx, vnetResourceGroup, fmt.Sprintf("principalId eq '%s'", spObjID))
roleAssignments, err := c.roleassignments.ListForResourceGroup(ctx, vnetResourceGroup, fmt.Sprintf("principalId eq '%s'", *spObjID))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not consider this as an error either. There simply are not roleAssignments to delete. +1 on logging


m.spApplications = graphrbac.NewApplicationsClient(m.env.Environment(), m.subscriptionDoc.Subscription.Properties.TenantID, spGraphAuthorizer)
return nil
return err
}

func (m *manager) clusterSPObjectID(ctx context.Context) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What azcore refreshing theory? If I got it right, we reduce the amount of code here, which is always good direction.

@mbarnes
Copy link
Contributor Author

mbarnes commented Jun 12, 2023

@mbarnes thank you for outstanding work, especially for documenting graph api quirks.

Thanks! If the quirk you're referring to is the ByApplicationId comments, I did file a complaint upstream about it. Previous graph SDK versions didn't have this confusing naming.

@mbarnes
Copy link
Contributor Author

mbarnes commented Jun 12, 2023

Noting for myself:

The MS Graph SDK uses different logic for handling 401 Unauthorized errors in the package kiota-http-go:
https://github.com/microsoft/kiota-http-go/blob/main/nethttp_request_adapter.go#L173-L198

From what I can make of the code, it is indeed requesting a new authorization token. We should still test it to verify.

@SudoBrendan SudoBrendan merged commit 156383c into Azure:master Jun 14, 2023
@mbarnes mbarnes deleted the aadgraph-to-msgraph branch June 14, 2023 17:11
ellis-johnson pushed a commit to ellis-johnson/ARO-RP that referenced this pull request Jun 21, 2023
* go.mod: Add github.com/microsoftgraph/msgraph-sdk-go

* azureclient: Add NewGraphServiceClient

Creates a GraphServiceClient with scope and graph endpoint set
appropriately for the cloud environment (public or US government).

* pkg/util/graph: Add GetServicePrincipalIDByAppID

* armhelper: Use MS Graph to obtain service principal ID

* armhelper: Remove unused authorizer parameter

* Use MS Graph endpoint to validate service principal

I don't think it matters for the purpose of validation, but the
AD Graph endpoint is nearing its end-of-life.

* pkg/cluster: Use MS Graph to obtain service principal ID

* pkg/util/cluster: Use MS Graph to create and delete clusters

* Pretty-print OData errors from MS Graph

To aid debugging failed MS Graph requests.

MS Graph's top-level APIError message is hard-coded and only says
"error status code received from the API".  Further details have
to be extracted from the "ODataErrorable" interface type.

* azureclient: Remove ActiveDirectoryGraphScope

No longer used.

* Remove pkg/util/azureclient/graphrbac

No longer used.

* pipelines: Run CodeQL analysis for Go on 1ES Hosted Pool

Vendoring the Microsoft Graph SDK for Go causes memory consumption
during CodeQL analysis to double due to its enormous API surface,
putting it well beyond the memory limit of standard GitHub Action
runners.

I inquired with the Azure organization admins about provisioning
larger GitHub runners, but was directed instead to use the 1ES
Hosted Pool which runs our other CI checks. Since ARO controls
the VM type for Hosted Pool agents, we can use a VM type with
adequate memory for CodeQL analysis with the Graph SDK.

Note: Implemented CodeQL commands in a template in case we
      ever decide to move Javascript or Python analysis to
      1ES Hosted Pool as well.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dependencies Pull requests that update a dependency file loki next-release To be included in the next RP release rollout next-up ready-for-review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants