
AcrPull permission through Azure Lighthouse results in 401 unauthorized error #11825

Closed
evdbogaard opened this issue Sep 12, 2023 · 28 comments

@evdbogaard

Bicep version
Bicep CLI version 0.21.1 (d4acbd2)

Describe the bug
Getting a 401 Unauthorized error when trying to restore a file that references a Bicep registry in another tenant.
We use Azure Lighthouse to grant AcrPull permissions to all our developers. This setup worked perfectly for months, but when we recently needed to add a new file to the registry, we noticed that every request now returns a 401 error.

Locally I'm logged in with the Azure CLI, pointing to subscription A. In the tenant that contains that subscription, we created an AD group called Developers, of which I'm a member.
The Bicep registry is located in subscription B, which is in a different tenant. Azure Lighthouse grants AcrPull permissions to that AD group. This setup worked for many months after we introduced it. Now authentication fails every time, and I'm not sure why. We don't touch this part of the code often, so I'm unsure when it started breaking.

I saw in other issues (#5030) that Lighthouse is still mentioned as a solution for cross-tenant Bicep registries. Is this still the case, or did something break?

To Reproduce
Create a Bicep registry in Tenant A.
Create an AD group in Tenant B.
Add yourself to the AD group.
In Tenant B, create a Lighthouse offering (ARM template) that grants AcrPull permissions at subscription level to the created AD group.
Deploy the template to the subscription in Tenant A (using the "Deploy a custom template" option).
Make sure you are logged in with the Azure CLI to the subscription in Tenant B.
Create a main.bicep file with a Bicep registry reference to the created registry.
Run az bicep restore -f main.bicep --force
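
The reproduction can also be scripted. This is a minimal, hypothetical helper, assuming the Azure CLI is installed and the Lighthouse template from the Additional context section is saved as `lighthouse.json`. The tenant and subscription placeholders, the `westeurope` location, the module path, and the helper names `main_bicep`, `repro_commands`, and `run` are illustrative assumptions. It uses `az deployment sub create` as a CLI alternative to the portal's "Deploy a custom template" option and only prints the commands unless `dry_run=False` is passed.

```python
import shlex
import subprocess

# Hypothetical placeholders: replace with your own tenant and subscription IDs.
TENANT_A = "<customer-tenant-id>"        # tenant that owns the Bicep registry
SUBSCRIPTION_A = "<customer-subscription-id>"
TENANT_B = "<managing-tenant-id>"        # tenant that owns the Developers group
SUBSCRIPTION_B = "<managing-subscription-id>"


def main_bicep(registry: str, repository: str, tag: str) -> str:
    """Return a minimal main.bicep that references a module in the registry."""
    return (
        f"module example 'br:{registry}.azurecr.io/{repository}:{tag}' = {{\n"
        "  name: 'exampleDeployment'\n"
        "}\n"
    )


def repro_commands(template_file: str = "lighthouse.json",
                   location: str = "westeurope") -> list[list[str]]:
    """Return the az CLI calls for the reproduction steps, in order."""
    return [
        # Deploy the Lighthouse template into the customer (registry) subscription.
        ["az", "login", "--tenant", TENANT_A],
        ["az", "account", "set", "--subscription", SUBSCRIPTION_A],
        ["az", "deployment", "sub", "create",
         "--location", location, "--template-file", template_file],
        # Sign in as a Developers member in the managing tenant, then restore.
        ["az", "login", "--tenant", TENANT_B],
        ["az", "account", "set", "--subscription", SUBSCRIPTION_B],
        ["az", "bicep", "restore", "--file", "main.bicep", "--force"],
    ]


def run(dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in repro_commands():
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run(dry_run=True)
```

Running it without arguments prints the command sequence; the final `az bicep restore` call is the step that returned 401 in this issue.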

Additional context
Here is the ARM template we used for Lighthouse:

{
 "$schema": "https://schema.management.azure.com/schemas/2019-08-01/subscriptionDeploymentTemplate.json#",
 "contentVersion": "1.0.0.0",
 "parameters": {
  "mspOfferName": {
   "type": "string",
   "metadata": {
    "description": "Specify a unique name for your offer"
   },
   "defaultValue": "Bicep Test"
  },
  "mspOfferDescription": {
   "type": "string",
   "metadata": {
    "description": "Name of the Managed Service Provider offering"
   },
   "defaultValue": ""
  }
 },
 "variables": {
  "mspRegistrationName": "[guid(parameters('mspOfferName'))]",
  "mspAssignmentName": "[guid(parameters('mspOfferName'))]",
  "managedByTenantId": "GUID of Tenant B",
  "authorizations": [
   {
    "principalId": "principalId of AD Group",
    "roleDefinitionId": "7f951dda-4ed3-4680-a7ca-43fe172d538d",
    "principalIdDisplayName": "Developers"
   }
  ]
 },
 "resources": [
  {
   "type": "Microsoft.ManagedServices/registrationDefinitions",
   "apiVersion": "2020-02-01-preview",
   "name": "[variables('mspRegistrationName')]",
   "properties": {
    "registrationDefinitionName": "[parameters('mspOfferName')]",
    "description": "[parameters('mspOfferDescription')]",
    "managedByTenantId": "[variables('managedByTenantId')]",
    "authorizations": "[variables('authorizations')]"
   }
  },
  {
   "type": "Microsoft.ManagedServices/registrationAssignments",
   "apiVersion": "2020-02-01-preview",
   "name": "[variables('mspAssignmentName')]",
   "dependsOn": [
    "[resourceId('Microsoft.ManagedServices/registrationDefinitions/', variables('mspRegistrationName'))]"
   ],
   "properties": {
    "registrationDefinitionId": "[resourceId('Microsoft.ManagedServices/registrationDefinitions/', variables('mspRegistrationName'))]"
   }
  }
 ],
 "outputs": {
  "mspOfferName": {
   "type": "string",
   "value": "[concat('Managed by', ' ', parameters('mspOfferName'))]"
  },
  "authorizations": {
   "type": "array",
   "value": "[variables('authorizations')]"
  }
 }
}
@miqm
Collaborator

miqm commented Sep 12, 2023

I've had this same problem for a month now. I opened a support ticket, but they are still investigating. It seems like a Lighthouse problem, but I haven't been able to dig deeper. As a workaround we enabled anonymous access to ACR, but IMHO something has changed with Lighthouse.

@evdbogaard
Author

Enabling anonymous access is unfortunately not an option for us. It would be really great to hear from the Bicep team whether this Azure Lighthouse setup is a supported way to handle cross-tenant authentication, or whether we should look for other ways to solve our issue.

@miqm
Collaborator

miqm commented Sep 14, 2023

I suggest you raise a support ticket. Lighthouse should work no matter which permissions you grant. If it doesn't, there's a problem.

@evdbogaard
Author

We did create a support ticket for this issue, but that hasn't helped us so far. According to the responses in the ticket, Azure Container Registry performs a data plane operation somewhere during authentication, and Azure Lighthouse doesn't support data plane operations, only control plane operations.

Since it broke suddenly, my guess is that ACR changed something in how authentication works, which is why the Azure Lighthouse setup now fails. This is only a guess, though; the support ticket didn't confirm it, only that they are investigating further.
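
For context on the "data plane" authentication mentioned above: ACR documents an Azure AD OAuth flow in which a client exchanges an Azure AD access token for an ACR refresh token at `https://<login-server>/oauth2/exchange`, then trades that refresh token for a repository-scoped access token at `https://<login-server>/oauth2/token`. This sketch assumes Bicep's restore goes through the same flow and that the cross-tenant permission check happens during these calls; neither is confirmed in this thread. The helper names `exchange_form`, `token_form`, and `acr_pull_token` are hypothetical, and only the Python standard library is used.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional


def exchange_form(login_server: str, aad_access_token: str,
                  tenant: Optional[str] = None) -> dict[str, str]:
    """Form body for POST https://<login_server>/oauth2/exchange."""
    form = {
        "grant_type": "access_token",
        "service": login_server,
        "access_token": aad_access_token,
    }
    if tenant:  # optional tenant hint for the Azure AD token
        form["tenant"] = tenant
    return form


def token_form(login_server: str, refresh_token: str,
               repository: str) -> dict[str, str]:
    """Form body for POST https://<login_server>/oauth2/token with pull scope."""
    return {
        "grant_type": "refresh_token",
        "service": login_server,
        "scope": f"repository:{repository}:pull",
        "refresh_token": refresh_token,
    }


def _post_form(url: str, form: dict[str, str]) -> dict:
    data = urllib.parse.urlencode(form).encode()
    req = urllib.request.Request(
        url, data=data,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
    # A rejected identity raises urllib.error.HTTPError (e.g. 401) here.
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def acr_pull_token(login_server: str, aad_access_token: str, repository: str,
                   tenant: Optional[str] = None) -> str:
    """Exchange an Azure AD token for a repository-scoped ACR pull token."""
    base = f"https://{login_server}/oauth2"
    refresh = _post_form(f"{base}/exchange",
                         exchange_form(login_server, aad_access_token, tenant))
    access = _post_form(f"{base}/token",
                        token_form(login_server, refresh["refresh_token"], repository))
    return access["access_token"]
```

For example, with an Azure AD token obtained via `az account get-access-token --query accessToken --output tsv`, calling `acr_pull_token("myregistry.azurecr.io", token, "bicep/modules/storage")` (hypothetical registry and repository) should raise an HTTP 401 error if the registry rejects the cross-tenant identity, which separates the registry's data plane check from Bicep itself.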

@alex-frankel
Collaborator

@evdbogaard - is the support case still in progress? I agree with your assessment. It seems like the ACR team made a breaking change with this new/updated authentication in the data plane. They should be engaged by support to come up with a mitigation.

@evdbogaard
Author

The ticket is still open. I've linked this issue in it and asked if they could check with the ACR team. When I hear more I'll update it here as well.

@miqm
Collaborator

miqm commented Sep 25, 2023

My ticket is also still open but it seems no one knows how to approach it.

@alex-frankel
Collaborator

Have the support engineers confirmed that they are attempting to work with the ACR team?

@miqm
Collaborator

miqm commented Sep 26, 2023

They have been working with ACR but it seems no one knows how to approach it.

My ticket ID is #2308030050000330; feel free to join in.

@evdbogaard
Author

evdbogaard commented Sep 27, 2023

I got a reply back from the ACR team:

From my analysis everything is working fine with AKS side. By looking the error message, it seems the issue is related authentication from lighthouse RBAC permissions and we don't have any resource and document to investigate it further. Also, we don't have any reference for the ACR integration with the Lighthouse as well.

To me this sounds like Azure Lighthouse isn't supported in combination with ACR, which feels odd: why would AcrPull be assignable through Lighthouse if it can't do anything?

@miqm
Collaborator

miqm commented Sep 29, 2023

@evdbogaard - it was working before, so something must have changed recently. As a test, I assigned the Contributor role via Lighthouse. I was able to modify the ACR resource, but I couldn't log in to ACR as I used to.

So it's a problem with both ACR and Lighthouse.

@slavizh
Contributor

slavizh commented Oct 5, 2023

+1

@neusert

neusert commented Oct 5, 2023

+1

@northtyphoon
Member

I'm the dev lead for Azure Container Registry. Just a quick update: we now understand the issue, which is related to cross-tenant permission validation, and are actively investigating a solution with the Lighthouse team. Thanks for reporting the issue, and sorry for the late response.

@slavizh
Contributor

slavizh commented Oct 6, 2023

@northtyphoon thanks for the update! This is an especially important scenario for CSPs and their customers.

@neusert

neusert commented Oct 9, 2023

Hi @northtyphoon, do you have any update on this? We got an email from MSFT support saying this scenario is not possible with the current product design and suggesting that we contact the ACR team.

Thanks!

@northtyphoon
Member

northtyphoon commented Oct 11, 2023

@neusert we are still reviewing and testing the fix. The full solution will likely be rolled out in several stages. The current priority is to unblock existing customers who already use the feature. I will give another update by the end of this week.

@vagguharinath

When can we expect existing customers who already use the feature to be unblocked?

@northtyphoon
Member

Update: we plan to roll out the first-stage fix next week to unblock existing customers.

@slavizh
Contributor

slavizh commented Oct 16, 2023

@northtyphoon how do you identify existing customers?

@evdbogaard
Author

I was just checking our setup and noticed everything is working again. Thanks to everyone who helped fix this issue.

@northtyphoon
Member

Update: We have rolled out the first-stage fix in all public Azure regions. If you still see the issue with a Lighthouse (LH) setup, please open a support ticket and our engineering team can help you.
@slavizh if you have new registries created after 9/19/2023 and still run into the issue, please open a support ticket.

@alex-frankel
Collaborator

Closing the bicep issue since the rollout has started. Thanks for getting this fixed @northtyphoon!

@github-project-automation github-project-automation bot moved this from Todo to Done in Bicep Oct 20, 2023
@slavizh
Contributor

slavizh commented Oct 21, 2023

@northtyphoon thanks! I have opened a support ticket and am now waiting for it to reach the right people. Initial support is always difficult.

@slavizh
Contributor

slavizh commented Nov 9, 2023

@northtyphoon I have logged a ticket, but it seems it cannot reach the right team to apply the fix. Can you help? I have now been dealing with support for more than two weeks - TrackingID#2310200050003425

@northtyphoon
Member

@slavizh can you check your registry to confirm it is working? I apologize for the delayed response. The ticket was not routed to our service team for some reason, and we are reviewing the process to understand the bottleneck.

@slavizh
Contributor

slavizh commented Nov 12, 2023

@northtyphoon it works for one of the two registries I opened the ticket for. Yes, I tried to tell them multiple times that the ticket needed to reach the container registry team, but the engineer just didn't listen. I even deliberately opened the ticket against the container registry resource. There is another registry in the same subscription that needs the fix; I'm intentionally not mentioning its name here.

@slavizh
Contributor

slavizh commented Nov 14, 2023

@northtyphoon thank you very much for stepping in!

Projects
Archived in project
Development

No branches or pull requests

8 participants