Cannot create private AML compute #2780

Closed
t-young31 opened this issue Oct 25, 2022 · 3 comments · Fixed by #3052
Labels
bug Something isn't working

Comments

@t-young31
Contributor

Describe the bug
Trying to create AML compute with no public IP fails with:

Provisioning error
The subnet has a network security group (/subscriptions/xxx/resourceGroups/rg-xxx-ws-98b8/providers/Microsoft.Network/networkSecurityGroups/nsg-ws) that is missing the following rules below. Please add those rules or increase the priority to allow traffic if the rules already exist. For network security group requirements, please refer to https://docs.microsoft.com/azure/machine-learning/how-to-secure-training-vnet?tabs=azure-studio#azure-machine-learning-compute-clusterinstance-1.
{
  "name": "Outbound Storage 445",
  "properties": {
    "protocol": "TCP",
    "sourcePortRange": "*",
    "destinationPortRange": "445",
    "sourceAddressPrefix": "VirtualNetwork",
    "destinationAddressPrefix": "Storage OR Internet OR Storage.uksouth",
    "priority": "Higher than 4096",
    "direction": "Outbound"
  }
}

Looks like the compute is trying to access public storage(?), which is disabled because the service isn't exposed externally:

resource "azurerm_network_security_rule" "allow_outbound_storage_445" {

I've had a look at https://learn.microsoft.com/en-us/azure/machine-learning/how-to-secure-workspace-vnet?tabs=pe%2Ccli#azure-storage-account and there is some storage inside the workspace VNet that has a private endpoint, so I'm not really any the wiser.
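For reference, the rule the provisioning error asks for could be expressed with the azurerm provider's `azurerm_network_security_rule` resource roughly as below. This is a sketch, not the TRE's actual rule: the variable names and the priority value of 4000 are hypothetical, and it assumes "Higher than 4096" means a numerically lower priority value (NSG rules with lower numbers are evaluated first). The destination uses the narrowest of the three service tags the error accepts.

```hcl
# Hypothetical inputs: the workspace resource group and its NSG.
variable "resource_group_name" { type = string }
variable "network_security_group_name" { type = string }

# Sketch of the "Outbound Storage 445" rule requested by the AML provisioning check.
resource "azurerm_network_security_rule" "allow_outbound_storage_445" {
  name                        = "allow-outbound-storage-445"
  priority                    = 4000 # hypothetical; must be numerically below any conflicting deny rule
  direction                   = "Outbound"
  access                      = "Allow"
  protocol                    = "Tcp"
  source_port_range           = "*"
  destination_port_range      = "445"
  source_address_prefix       = "VirtualNetwork"
  destination_address_prefix  = "Storage.uksouth" # regional Storage service tag from the error message
  resource_group_name         = var.resource_group_name
  network_security_group_name = var.network_security_group_name
}
```

On its own this opens SMB to every storage account in the region, which is why restricting the destination further (for example with a service endpoint policy) is worth considering.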

Any help would be much appreciated!

Steps to reproduce

  1. Deploy TRE v0.6.0
  2. Create workspace with AML (not exposed publicly) and VMs (exposed publicly)
  3. Log into VM
  4. Go to the AML URL
  5. Try to create some compute, ticking the "No public IP (preview)" checkbox
@t-young31 t-young31 added the bug Something isn't working label Oct 25, 2022
@marrobi
Member

marrobi commented Oct 26, 2022

@t-young31 Agreed. The docs say:

(*) 445 is only required if you have a firewall between your virtual network for Azure ML and a private endpoint for your storage accounts.

So I'm not sure why this is being requested. I'll ask some questions internally and get back to you.

@marrobi
Member

marrobi commented Oct 31, 2022

@t-young31 I misread the docs: port 445 is still required even when you have private endpoints.

We can limit what that can connect to using a service endpoint policy on the subnet - https://github.com/jhirono/azureml-dlp

What I'm thinking at the moment is that we create a dedicated subnet for AML, have the firewall allow outbound traffic from only that subnet, and configure the endpoint policy on that subnet.
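A dedicated AML subnet with a storage service endpoint policy might look something like the following in Terraform. This is a minimal sketch under stated assumptions: all names, the address range and the `storage_account_id` input are hypothetical, only a single storage account is allow-listed (any other storage the compute needs would have to be added to `service_resources`), and the firewall rule restricting outbound traffic to this subnet is not shown.

```hcl
# Hypothetical inputs.
variable "resource_group_name" { type = string }
variable "location" { type = string }
variable "vnet_name" { type = string }
variable "storage_account_id" { type = string } # storage the AML compute is allowed to reach

# Policy: traffic over the Microsoft.Storage service endpoint may only reach listed resources.
resource "azurerm_subnet_service_endpoint_storage_policy" "aml" {
  name                = "sep-aml"
  resource_group_name = var.resource_group_name
  location            = var.location

  definition {
    name              = "allow-workspace-storage"
    description       = "Storage accounts reachable from the AML subnet"
    service_resources = [var.storage_account_id]
  }
}

# Dedicated subnet for AML compute with the storage endpoint and policy attached.
resource "azurerm_subnet" "aml" {
  name                        = "AmlSubnet"
  resource_group_name         = var.resource_group_name
  virtual_network_name        = var.vnet_name
  address_prefixes            = ["10.1.10.0/27"] # hypothetical range
  service_endpoints           = ["Microsoft.Storage"]
  service_endpoint_policy_ids = [azurerm_subnet_service_endpoint_storage_policy.aml.id]
}
```

Keeping AML in its own subnet means the looser port 445 rule and the endpoint policy apply only to the compute that needs them, not to the rest of the workspace.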

Hopefully this will get easier longer term.

@marrobi
Member

marrobi commented Oct 31, 2022

This requires #1846 to be completed first.

@marrobi marrobi self-assigned this Nov 29, 2022
@marrobi marrobi moved this from PR to In Progress in Azure TRE - Engineering Nov 29, 2022
marrobi added a commit to marrobi/AzureTRE that referenced this issue Jan 3, 2023
@github-project-automation github-project-automation bot moved this from In Progress to Done in Azure TRE - Engineering Jan 25, 2023