Timing issue between Graph Group and RBAC assignment. #15991

Closed
DeeNaxic opened this issue Jan 3, 2025 · 5 comments

Comments

@DeeNaxic

DeeNaxic commented Jan 3, 2025

Bicep version
Built on an Azure build agent:

  • Task AzureCLI Task Version 2.249.8
  • Bicep version 0.31.92
azure-cli                         2.67.0
core                              2.67.0
telemetry                          1.1.0
azure-devops                       1.0.1
msal                              1.31.0
azure-mgmt-resource               23.1.1
bicepconfig.json (Graph extension):
"extensions": {
  "microsoftGraphV1": "br:mcr.microsoft.com/bicep/extensions/microsoftgraph/v1.0:0.1.8-preview"
}

Describe the bug
In Bicep, when deploying an Entra group and then immediately assigning RBAC permissions to it, the group cannot be found. The returned error describes this exact issue and suggests a fix. The CLI error when running a deployment:

Principal <guid> does not exist in the directory <guid>. Check that you have the correct principal ID. If you are creating this principal and then immediately assigning a role, this error might be related to a replication delay. In this case, set the role assignment principalType property to a value, such as ServicePrincipal, User, or Group.  See https://aka.ms/docs-principaltype (Code:PrincipalNotFound)

This makes sense, since the group has just been created. However, the suggested fix (setting principalType) is already in place:

..
resource roleAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  scope       : target
  name        : guid('rbac-${name}-${groupId}-${role}')
  properties  : {
    principalType     : 'Group'
    principalId       : groupId
    roleDefinitionId  : '/providers/Microsoft.Authorization/roleDefinitions/${role}'
  }
}
..

From searching online, it appears that other people who ran into this issue fixed it by setting principalType, but all of those cases involved user-assigned managed identities (UAIs) and similar principals; it doesn't work for groups. When the deployment is run again, everything works as expected, presumably because the group is ready by then.

To Reproduce
Deploy an Entra group using the Microsoft Graph Bicep extension, then immediately assign an RBAC role to it in the same deployment.
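A minimal, self-contained reproduction might look like the sketch below. It is not the reporter's template: the parameter names, the default group name, the choice of the built-in Reader role (GUID acdd72a7-3385-48ef-bd42-f606fba81ae7) and the resource-group scope are hypothetical placeholders. It assumes the `microsoftGraphV1` alias from the bicepconfig.json shown above. The role assignment name is derived from `groupName` rather than the group's `id`, because a resource `name` has to be computable at the start of the deployment.

```bicep
// Assumes bicepconfig.json registers the alias "microsoftGraphV1" (see above).
extension microsoftGraphV1

@description('Hypothetical group name, also used as mailNickname and uniqueName.')
param groupName string = 'example-rbac-group'

@description('Hypothetical role to assign; defaults to the built-in Reader role.')
param roleId string = 'acdd72a7-3385-48ef-bd42-f606fba81ae7'

// Create a security group through the Microsoft Graph Bicep extension.
resource exampleGroup 'Microsoft.Graph/groups@v1.0' = {
  displayName: groupName
  mailEnabled: false
  mailNickname: groupName
  securityEnabled: true
  uniqueName: groupName
}

// Assign the role at resource-group scope in the same deployment.
// principalType is set, yet this can still fail with PrincipalNotFound
// while the new group replicates across the Entra directory.
resource roleAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  name: guid(resourceGroup().id, groupName, roleId)
  properties: {
    principalType: 'Group'
    principalId: exampleGroup.id
    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', roleId)
  }
}
```

The reference to `exampleGroup.id` creates an implicit dependency, so the role assignment is only submitted after the Graph extension reports the group as created. As discussed below, that does not guarantee the group is visible to the RBAC service yet.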

@jeskew
Member

jeskew commented Jan 7, 2025

@DeeNaxic I don't think you're doing anything wrong here; RBAC assignments are known to be "eventually consistent" in some cases (i.e., the resource may not be immediately available for use when provisioning is complete). This is one of our main use cases for #1013.

@majastrz
Member

We are going to look into improving this in the Graph extension so it will wait until the Group is fully provisioned before proceeding to the next resource in the deployment.

@shenglol
Contributor

@dkershaw10 @jason-dou Does group creation rely on eventual consistency? I'm curious if this is an area we could enhance using the extensibility v2 asynchronous API.

@dkershaw10

dkershaw10 commented Jan 16, 2025

Yes, it does. It's eventually consistent in all regions (read replicas) because the Entra directory is a massively distributed system with inherent replication delays. We have a few tricks; for example, there's an affinity between a Graph session and a directory replica, so that created objects are immediately visible within the same session. The problem here is that the Azure RBAC assignment comes through a different session and is most likely hitting a read replica that doesn't have the latest changes yet.
We are already tracking this issue here: microsoftgraph/msgraph-bicep-types#193

@shenglol, we're hoping to address this with some retry logic for key cross-service scenarios (see the last statement in the linked issue). However, if the extensibility v2 asynchronous API provides a better and more reliable mechanism, I'd be all for that too. That said, I'm not sure how that would work, as Entra APIs are not inherently async (and, AFAIK, we don't get any signal that replication has completed). Please chat more with @jason-dou (or @eketo-msft, who is much more familiar with the directory architecture and operation) about the options here.

@shenglol
Contributor

Good to know! Thanks for sharing the details. If Entra APIs are not inherently async, adding some retry logic might be a suitable approach.

Closing this issue as it's already being tracked in the msgraph-bicep-types repo.

@github-project-automation github-project-automation bot moved this from Todo to Done in Bicep Jan 16, 2025
Projects
Archived in project
Development

No branches or pull requests

5 participants