TRE cannot scale beyond about 32 projects #3920

Closed
TonyWildish-BH opened this issue May 3, 2024 · 11 comments · Fixed by #4098
Labels: bug (Something isn't working), has workaround (a workaround is available for this issue), question (Further information is requested)

Comments

@TonyWildish-BH

Description

In my Azure TRE deployment I am testing the limits of scalability, since we eventually want to run up to 150 projects at a time. Yesterday I created a large number of projects, and at about number 32 they started failing with the message: "Subscription ******* already contains 250 storage accounts with Standard Dns endpoints in location uksouth and the maximum allowed is 250."

From what I can understand, if the storage endpoints were to use AzureDnsZone endpoints instead of Standard, that will raise the limit to 5000 endpoints, which should be enough for us?

My question is, is it sufficient to update the storage.tf in various places to add dns_endpoint_type = AzureDnsZone, or is there some reason that won't work?

Steps

The steps I have tried are:

  1. create a workspace
  2. go to 1, until failure
  3. look at the error message

Code

n/a

TonyWildish-BH added the question label May 3, 2024
@marrobi
Member

marrobi commented May 3, 2024

@TonyWildish-BH I presume you have other things than the TRE in the subscription? I have run automated tests for 40 plus workspaces which do complete. It's not a quota I've seen others hit.

@SvenAelterman do you know anything about this? https://techcommunity.microsoft.com/t5/azure-storage-blog/public-preview-create-additional-5000-azure-storage-accounts/ba-p/3465466

@TonyWildish-BH
Author

Not much. There were 3 or 4 other workspaces, and they had very little in them: maybe a VM, Guacamole, an ADF... They may have added to the storage account count, but the error message about the limit is clear enough, and we're going to hit it well before we reach production scale.

@marrobi
Member

marrobi commented May 3, 2024

Let us investigate. I take it the airlock is enabled on each of these workspaces? That could be the difference, as it creates a number of storage accounts. I'm not sure to what scale that has been tested.

@TonyWildish-BH
Author

thanks. We do have the airlock enabled, we'll have that on all our workspaces.

@SvenAelterman
Collaborator

@marrobi: I am familiar with the (still in preview) DNS-zone based solution to exceed the 250-account limit per subscription. However, just turning this on for the account creation would not work in my estimation because several other Azure services aren't yet capable of dealing with it, including those that TRE leverages. Also, the TRE code should be inspected to determine that there are no hardcoded references to the "blob.core.windows.net" DNS namespace.
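
The inspection for hardcoded references could start with a simple text scan. The following is a minimal sketch, not part of the TRE tooling: the function name, the list of file extensions, and the idea of running it over a local checkout are all assumptions made for illustration. It only reports lines that contain the literal suffix "blob.core.windows.net"; it cannot detect endpoints that are assembled from separate pieces at runtime.

```python
from pathlib import Path

# The suffix named in the thread. Other Standard-endpoint suffixes could be
# added here, but which ones matter for TRE is not established in this issue.
STANDARD_SUFFIXES = ("blob.core.windows.net",)

# Hypothetical choice of file types worth scanning in a checkout.
SCANNED_EXTENSIONS = (".tf", ".py", ".sh", ".yaml", ".yml", ".json")


def find_hardcoded_endpoints(root, suffixes=STANDARD_SUFFIXES,
                             extensions=SCANNED_EXTENSIONS):
    """Return (path, line_number, line) for each line containing a suffix."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in extensions:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            # Skip binary or unreadable files rather than failing the scan.
            continue
        for lineno, line in enumerate(text.splitlines(), start=1):
            if any(suffix in line for suffix in suffixes):
                hits.append((str(path), lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # "." stands in for the root of a local checkout.
    for path, lineno, line in find_hardcoded_endpoints("."):
        print(f"{path}:{lineno}: {line}")
```

Each hit would still need a manual look to decide whether the endpoint should instead be read from the storage account resource that was created.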

@TonyWildish-BH: In the short term, I would recommend requesting an increase in the limit from 250 to 500 accounts per subscription per region using the process described here: https://learn.microsoft.com/azure/quotas/storage-account-quota-requests. This would then give you ~70 workspaces.

In the longer-term, once GA, TRE maintainers could evaluate using DNS-zone based storage accounts instead. However (for many other governance reasons), I would advocate for deploying workspaces in different subscriptions, which would also address this issue: #1073.
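
The ~70 figure can be sanity-checked with a little arithmetic. The sketch below assumes roughly 7 storage accounts per airlock-enabled workspace. That number is inferred from this thread (the 250-account limit was hit at around workspace 32, with 3 or 4 other workspaces already present) and is not a documented TRE figure. It also ignores accounts used by the TRE core or by anything else in the subscription, which the `reserved` parameter can stand in for. All names are illustrative.

```python
# Assumption: ~7 storage accounts per airlock-enabled workspace, inferred
# from the numbers reported in this issue, not from TRE documentation.
ACCOUNTS_PER_WORKSPACE = 7


def max_workspaces(account_limit, per_workspace=ACCOUNTS_PER_WORKSPACE,
                   reserved=0):
    """Workspaces that fit under a per-subscription, per-region limit."""
    return max(0, (account_limit - reserved) // per_workspace)


def accounts_needed(workspaces, per_workspace=ACCOUNTS_PER_WORKSPACE,
                    reserved=0):
    """Storage accounts required for a given number of workspaces."""
    return workspaces * per_workspace + reserved


if __name__ == "__main__":
    for limit in (250, 500, 5000):
        print(f"limit {limit}: about {max_workspaces(limit)} workspaces")
    print(f"150 workspaces: about {accounts_needed(150)} accounts")
```

Under that assumption, 150 workspaces would need on the order of 1,050 accounts, which is more than 500, so the target scale would depend on a higher limit, DNS-zone endpoints, or spreading workspaces across subscriptions.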

marrobi added the bug label May 9, 2024
@TonyWildish-BH
Author

thanks for the reply, @SvenAelterman, a couple of follow-on questions:

  • is 500 accounts per subscription per region a hard upper limit, or could I conceivably go above that if needed?
  • do you have any idea when DNS-zone based storage accounts will be GA?

I've already thrown my hat in the ring for deployment into different subscriptions. We'd like that so we can let people spend their own money on their own subscription, and we don't have to concern ourselves with their costs. I'm not aware of any timeline for that to happen, though.

Regarding the issue of inspecting the code base for hardcoded references, that's a generic issue, in that there are many places where object names are derived from parameters instead of looked up from the resource that created the object. It would be really nice to have that cleaned up, but that's also for the future.
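
As a rough illustration of that point, written in Python rather than the project's Terraform, compare deriving an endpoint from a name with reading it from the created resource. The account record, its field name, and the `z01` endpoint value are made-up examples for this sketch. They are not TRE code or real output.

```python
def derived_blob_endpoint(account_name):
    # Brittle: assumes every account uses the Standard DNS suffix.
    return f"https://{account_name}.blob.core.windows.net/"


def looked_up_blob_endpoint(account):
    # More robust: use whatever endpoint was reported for the account.
    return account["primary_blob_endpoint"]


# Hypothetical record for an account created with a DNS-zone endpoint.
account = {
    "name": "stgws1234",
    "primary_blob_endpoint": "https://stgws1234.z01.blob.storage.azure.net/",
}

if __name__ == "__main__":
    print("derived:  ", derived_blob_endpoint(account["name"]))
    print("looked up:", looked_up_blob_endpoint(account))
```

The two values differ for such an account, which is why derived names would need to be found and replaced before a change of endpoint type could work.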

tim-allen-ck added the has workaround label May 22, 2024
@marrobi
Member

marrobi commented Jun 18, 2024

@TonyWildish-BH did you manage to increase your subscription storage account limit?

@TonyWildish-BH
Author

TonyWildish-BH commented Jun 18, 2024 via email

@tim-allen-ck
Collaborator

Hi @TonyWildish-BH can you just confirm you managed to increase the workspace limit?

@TonyWildish-BH
Author

Hi @tim-allen-ck. I've not run a test yet, but unless there's another limit somewhere, it should be OK. We can close this ticket; if I hit another issue I can re-open it or create a new ticket, as appropriate.

@tim-allen-ck
Collaborator

Thanks, I'll update the docs to reference the limit, then close this ticket.
