-
Notifications
You must be signed in to change notification settings - Fork 170
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Migrate RP from Azure AD Graph to Microsoft Graph #1970
Conversation
@mbarnes FYI changed this PR to "Draft" to help make navigating open PRs a little easier for me. |
edcfb49
to
374f79c
Compare
Please rebase pull request. |
374f79c
to
c451a2b
Compare
Please rebase pull request. |
baabb54
to
33f545a
Compare
bcb23d3
to
f8f344f
Compare
f8f344f
to
f5fdc3a
Compare
Please rebase pull request. |
f5fdc3a
to
a0ed155
Compare
Please rebase pull request. |
4a2f6e9
to
6a7b18d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While there's still open discussion on *string
vs string
and returning a 404 error, I'm ultimately going to call this a stylistic choice - we can disagree and commit. I added one final nit that also wouldn't stop me from merging this. LGTM, great work Matt, thank you!
matches := result.GetValue() | ||
switch len(matches) { | ||
case 0: | ||
return nil, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this still feels like it should be an error, but could be addressed later on post deadline.
To aid debugging failed MS Graph requests. MS Graph's top-level APIError message is hard-coded and only says "error status code received from the API". Further details have to be extracted from the "ODataErrorable" interface type.
No longer used.
No longer used.
Vendoring the Microsoft Graph SDK for Go causes memory consumption during CodeQL analysis to double due to its enormous API surface, putting it well beyond the memory limit of standard GitHub Action runners. I inquired with the Azure organization admins about provisioning larger GitHub runners, but was directed instead to use the 1ES Hosted Pool which runs our other CI checks. Since ARO controls the VM type for Hosted Pool agents, we can use a VM type with adequate memory for CodeQL analysis with the Graph SDK. Note: Implemented CodeQL commands in a template in case we ever decide to move Javascript or Python analysis to 1ES Hosted Pool as well.
6a7b18d
to
643ff9e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly nits and some questions.
if spObjID == nil { | ||
return nil | ||
} | ||
|
||
roleAssignments, err := c.roleassignments.ListForResourceGroup(ctx, vnetResourceGroup, fmt.Sprintf("principalId eq '%s'", spObjID)) | ||
roleAssignments, err := c.roleassignments.ListForResourceGroup(ctx, vnetResourceGroup, fmt.Sprintf("principalId eq '%s'", *spObjID)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but not expected
I wouldn't return an error, but log a message stating no role assignments found.
If the cluster installation fails or is requeued for any reason, this role assignment may not exist. And that's totally fine, we're just trying to clean up after ourselves on cluster deletion.
|
||
m.spApplications = graphrbac.NewApplicationsClient(m.env.Environment(), m.subscriptionDoc.Subscription.Properties.TenantID, spGraphAuthorizer) | ||
return nil | ||
return err | ||
} | ||
|
||
func (m *manager) clusterSPObjectID(ctx context.Context) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anywhere this is called, we can remove the refreshing action from now and test the previous theory about Authorization refreshing in azcore.
Also on the fixClusterSPObjectID call too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What azcore refreshing theory? If I got it right, we reduce the amount of code here, which is always good direction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@petrkotas Deep inside azcore there's a bit of logic to automatically force a token refresh upon an authorization error, which in theory obviates the need for our AuthorizationRefreshingAction. But the catch is it requires Azure clients explicitly define an authorization callback to handle this: specifically the "OnChallenge" callback shown in the azcore link.
We discovered the hard way (cluster install failures) that this logic isn't currently utilized for ARM calls since we're not fully migrated to the newer azure-sdk-for-go SDK. I'm not yet certain if the MS Graph client handles this situation automatically either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm - few nits, but looks great otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mbarnes thank you for outstanding work, especially for documenting graph api quirks.
if spObjID == nil { | ||
return nil | ||
} | ||
|
||
roleAssignments, err := c.roleassignments.ListForResourceGroup(ctx, vnetResourceGroup, fmt.Sprintf("principalId eq '%s'", spObjID)) | ||
roleAssignments, err := c.roleassignments.ListForResourceGroup(ctx, vnetResourceGroup, fmt.Sprintf("principalId eq '%s'", *spObjID)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not consider this as an error either. There simply are not roleAssignments to delete. +1 on logging
|
||
m.spApplications = graphrbac.NewApplicationsClient(m.env.Environment(), m.subscriptionDoc.Subscription.Properties.TenantID, spGraphAuthorizer) | ||
return nil | ||
return err | ||
} | ||
|
||
func (m *manager) clusterSPObjectID(ctx context.Context) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What azcore refreshing theory? If I got it right, we reduce the amount of code here, which is always good direction.
Thanks! If the quirk you're referring to is the |
Noting for myself: The MS Graph SDK uses different logic for handling 401 Unauthorized errors in the package From what I can make of the code, it is indeed requesting a new authorization token. We should still test it to verify. |
* go.mod: Add github.com/microsoftgraph/msgraph-sdk-go * azureclient: Add NewGraphServiceClient Creates a GraphServiceClient with scope and graph endpoint set appropriately for the cloud environment (public or US government). * pkg/util/graph: Add GetServicePrincipalIDByAppID * armhelper: Use MS Graph to obtain service principal ID * armhelper: Remove unused authorizer parameter * Use MS Graph endpoint to validate service principal I don't think it matters for the purpose of validation, but the AD Graph endpoint is nearing its end-of-life. * pkg/cluster: Use MS Graph to obtain service principal ID * pkg/util/cluster: Use MS Graph to create and delete clusters * Pretty-print OData errors from MS Graph To aid debugging failed MS Graph requests. MS Graph's top-level APIError message is hard-coded and only says "error status code received from the API". Further details have to be extracted from the "ODataErrorable" interface type. * azureclient: Remove ActiveDirectoryGraphScope No longer used. * Remove pkg/util/azureclient/graphrbac No longer used. * pipelines: Run CodeQL analysis for Go on 1ES Hosted Pool Vendoring the Microsoft Graph SDK for Go causes memory consumption during CodeQL analysis to double due to its enormous API surface, putting it well beyond the memory limit of standard GitHub Action runners. I inquired with the Azure organization admins about provisioning larger GitHub runners, but was directed instead to use the 1ES Hosted Pool which runs our other CI checks. Since ARO controls the VM type for Hosted Pool agents, we can use a VM type with adequate memory for CodeQL analysis with the Graph SDK. Note: Implemented CodeQL commands in a template in case we ever decide to move Javascript or Python analysis to 1ES Hosted Pool as well.
Which issue this PR addresses:
[ARO-1919] AAD: Migrate ARO-RP Azure AD Graph -> Microsoft Graph
What this PR does / why we need it:
Microsoft has announced the impending end-of-support for Azure Active Directory Graph API:
We need to migrate the RP code to the Microsoft Graph SDK for Go.
Test plan for issue:
Pass all* GitHub checks, especially E2E tests.
* Except perhaps CodeQL, see additional notes below.
Is there any documentation that needs to be updated for this PR?
None that I'm aware of. This is an internal RP change and should not alter its behavior.
Additional notes:
CodeQL currently fails due to the vendoring of Microsoft Graph SDK for Go, which, due to its enormous API surface, causes memory consumption during CodeQL analysis to double, putting it well beyond the 8 GiB memory limit of standard GitHub Action runners. I'm working on moving CodeQL analysis to an Azure Pipeline, where we can use a virtual machine with adequate RAM (see the "pipelines" commit).