Sagemaker test docs update for framework upgrade (#11206)
* increased train_runtime for model parallelism

* added documentation for framework upgrade
philschmid authored and Rocketknight1 committed Apr 21, 2021
1 parent 8d4cba6 commit 36cfbf2
Showing 2 changed files with 4 additions and 6 deletions.
6 changes: 2 additions & 4 deletions tests/sagemaker/README.md
@@ -66,8 +66,7 @@ images:
```
-2. In the PR comment describe what test, we ran and with which package versions. Here you can copy the table from [Current Tests](#current-tests).
-TODO: Add a screenshot of PR + Text template to make it easy to open.
+2. In the PR comment describe what test we ran and with which framework versions. Here you can copy the table from [Current Tests](#current-tests). You can take a look at this [PR](https://github.com/aws/deep-learning-containers/pull/1016), which information are needed.
## Test Case 2: Releasing a New AWS Framework DLC
@@ -92,7 +91,6 @@ AWS_PROFILE=<enter-your-profile> make test-sagemaker
```
These tests take around 10-15 minutes to finish. Preferably make a screenshot of the successfully ran tests.


### After successful Tests:

After we have successfully run tests for the new framework version we need to create a PR at the [Deep Learning Container Repository](https://github.com/aws/deep-learning-containers).
@@ -136,7 +134,7 @@ images:
docker_file: !join [ docker/, *SHORT_VERSION, /, *DOCKER_PYTHON_VERSION, /,
*CUDA_VERSION, /Dockerfile., *DEVICE_TYPE ]
```
-2. In the PR comment describe what test we ran and with which framework versions. Here you can copy the table from [Current Tests](#current-tests). You can take a look at this [PR](https://github.com/aws/deep-learning-containers/pull/1016), which information are needed.
+2. In the PR comment describe what test we ran and with which framework versions. Here you can copy the table from [Current Tests](#current-tests). You can take a look at this [PR](https://github.com/aws/deep-learning-containers/pull/1025), which information are needed.
## Current Tests
4 changes: 2 additions & 2 deletions tests/sagemaker/test_multi_node_model_parallel.py
@@ -28,14 +28,14 @@
"script": "run_glue_model_parallelism.py",
"model_name_or_path": "roberta-large",
"instance_type": "ml.p3dn.24xlarge",
-"results": {"train_runtime": 1500, "eval_accuracy": 0.3, "eval_loss": 1.2},
+"results": {"train_runtime": 1600, "eval_accuracy": 0.3, "eval_loss": 1.2},
},
{
"framework": "pytorch",
"script": "run_glue.py",
"model_name_or_path": "roberta-large",
"instance_type": "ml.p3dn.24xlarge",
-"results": {"train_runtime": 1500, "eval_accuracy": 0.3, "eval_loss": 1.2},
+"results": {"train_runtime": 1600, "eval_accuracy": 0.3, "eval_loss": 1.2},
},
]
)
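
For readers unfamiliar with these parameter sets, here is a minimal sketch of how a `results` dictionary like the ones above could gate a test run. It is not taken from the test file: `check_thresholds` is a hypothetical helper, the `measured` values are made up for illustration, and the sketch assumes `train_runtime` and `eval_loss` act as upper bounds while `eval_accuracy` acts as a lower bound.

```python
# Hypothetical helper, not part of the commit: compares measured metrics
# against a "results" threshold dictionary and reports every violation.
def check_thresholds(measured: dict, thresholds: dict) -> list:
    failures = []
    # Assumption: train_runtime is an upper bound on the training time.
    if measured["train_runtime"] > thresholds["train_runtime"]:
        failures.append("train_runtime too high")
    # Assumption: eval_accuracy is a lower bound.
    if measured["eval_accuracy"] < thresholds["eval_accuracy"]:
        failures.append("eval_accuracy too low")
    # Assumption: eval_loss is an upper bound.
    if measured["eval_loss"] > thresholds["eval_loss"]:
        failures.append("eval_loss too high")
    return failures


# Threshold values as they appear after this commit.
thresholds = {"train_runtime": 1600, "eval_accuracy": 0.3, "eval_loss": 1.2}

# Made-up measurements, for illustration only.
measured = {"train_runtime": 1540.0, "eval_accuracy": 0.85, "eval_loss": 0.4}

print(check_thresholds(measured, thresholds))
```

Under these assumptions, a run with the illustrative runtime of 1540 would satisfy the new limit of 1600 but would have exceeded the previous limit of 1500, which is the kind of failure a raised `train_runtime` avoids.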