SageMaker Studio long start-up time #352
Comments
Hi @atzev, I can confirm this behavior. I will pass this information on internally to see if we can pick up this task and improve the user experience when using ML Studio from data.all.
Hello Team, do we have any fix or workaround for this issue?
A colleague has done some investigation and has an explanation for the issue. Basically, in the CDK code, the SageMaker Domain is defined with “VPCOnly” together with the default VPC (which has only public subnets). A snippet of code that would work is to create the right VPC for SageMaker Studio with VPCOnly mode:
…for SageMaker domain (#420)

### Feature or Bugfix
- Feature
- Bugfix

### Detail
- Instead of creating the SageMaker Studio domain as a nested stack, we create it as part of the environment stack. To clearly show that the resources created for SageMaker are part of the ML Studio functionality, `check_existing_sagemaker_studio_domain` and `create_sagemaker_domain_resources` are class methods of `SageMakerDomain`, placed in `backend/dataall/cdkproxy/stacks/sagemakerstudio.py`.
- As reported in #352, data.all uses the default VPC of the account, which does not meet the requirements for SM Studio. This results in long start times. This PR also adds the creation of a dedicated VPC, which solves the issue of slow starts.
- It is not possible to modify the networking configuration of an existing SageMaker Studio domain. CloudFormation deletes and re-creates the domain (replacement = True), and if the domain has Studio users the CloudFormation stack fails. For this reason I kept the previous implementation using the default VPC. Customers who opt to use dedicated networking need to delete the default VPC. This is an interim solution, and we will look for better ways to migrate to a dedicated SM VPC once we get more info on how customers are using data.all ML Studio.

### Relates
- #409
- #352

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
### Feature or Bugfix
- Enhancement

### Detail
- Added a method to check whether the default VPC exists instead of relying on the CFN stack failing. Since it uses the cdk-look-up role, we do not need to add any EC2 permissions to the pivotRole.

### Relates
- #352
- #409

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
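For readers who want to see what such a default-VPC check could look like, here is a minimal sketch. It is not the code from the PR: the function name `find_default_vpc_id` is hypothetical, and it assumes a boto3-style EC2 client (for example one created with credentials from the cdk-look-up role) whose `describe_vpcs` call is filtered with the EC2 `is-default` filter.

```python
from typing import Any, Optional


def find_default_vpc_id(ec2_client: Any) -> Optional[str]:
    """Return the ID of the region's default VPC, or None if there is none.

    `ec2_client` is assumed to behave like boto3.client("ec2"); how the
    client obtains its credentials is outside the scope of this sketch.
    """
    # Ask EC2 only for VPCs flagged as the default VPC of the region.
    response = ec2_client.describe_vpcs(
        Filters=[{"Name": "is-default", "Values": ["true"]}]
    )
    vpcs = response.get("Vpcs", [])
    # A region has at most one default VPC, so the first match is enough.
    return vpcs[0]["VpcId"] if vpcs else None
```

A caller could then choose between the existing default-VPC setup and the dedicated SageMaker VPC depending on whether the function returns `None`, rather than waiting for the CloudFormation stack to fail.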
Fixed in #409
Describe the bug
As a user, when I open SageMaker Studio from the data.all portal I have to wait several minutes before I can start using Studio.
How to Reproduce
New user:
Returning user:
Expected behavior
For existing users, Studio should load within seconds after the user clicks the Jupyter Lab launch icon.
For new users, Studio should launch as soon as the app is created.
Your project
No response
Screenshots
No response
OS
Mac
Python version
3.8
AWS data.all version
1.4.1
Additional context
No response