Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docker Regression Tests Github action #2403

Merged
merged 37 commits into from
Jul 14, 2023
Merged

Conversation

agunapal
Copy link
Collaborator

@agunapal agunapal commented Jun 9, 2023

Description

This PR creates a github action to build TorchServe Docker Images and run regression tests inside a docker container

To make this happen,

  • Create a new target for building docker ci image
  • Some of the tests don't run as is inside the container. These need to be fixed. Skipping them for now
  • Run docker regression tests nightly as they take longer to run. Each run takes around 1.5 hours

Fix these Pytests

Fixes #(issue)

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Feature/Issue validation/testing

2_docker-regression (self-hosted, regression-test-gpu).txt
1_docker-regression (ubuntu-20.04).txt

Checklist:

  • Did you have fun?
  • Have you added tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?

@codecov
Copy link

codecov bot commented Jun 9, 2023

Codecov Report

Merging #2403 (14f6139) into master (2f5a784) will not change coverage.
The diff coverage is n/a.

❗ Current head 14f6139 differs from pull request most recent head b126b2e. Consider uploading reports for the commit b126b2e to get more accurate results

@@           Coverage Diff           @@
##           master    #2403   +/-   ##
=======================================
  Coverage   71.89%   71.89%           
=======================================
  Files          78       78           
  Lines        3654     3654           
  Branches       58       58           
=======================================
  Hits         2627     2627           
  Misses       1023     1023           
  Partials        4        4           

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@agunapal agunapal changed the title (WIP)Docker Regression Tests Github action Docker Regression Tests Github action Jun 9, 2023
@agunapal agunapal requested review from msaroufim and mreso June 9, 2023 17:04
test/pytest/test_example_intel_extension_for_pytorch.py Outdated Show resolved Hide resolved
test/pytest/test_torch_compile.py Show resolved Hide resolved
.github/workflows/regression_tests_docker.yml Show resolved Hide resolved
docker/Dockerfile.base Outdated Show resolved Hide resolved
docker/build_image.sh Outdated Show resolved Hide resolved
docker/Dockerfile.ci Outdated Show resolved Hide resolved
mreso
mreso previously requested changes Jun 9, 2023
docker/Dockerfile.base Outdated Show resolved Hide resolved
docker/Dockerfile.base Outdated Show resolved Hide resolved
docker/Dockerfile.base Outdated Show resolved Hide resolved
docker/Dockerfile.base Outdated Show resolved Hide resolved
docker/build_image.sh Outdated Show resolved Hide resolved
docker/build_image.sh Outdated Show resolved Hide resolved
test/pytest/test_torch_compile.py Show resolved Hide resolved
test/regression_tests.py Show resolved Hide resolved
@agunapal agunapal requested review from msaroufim and mreso July 3, 2023 21:11
.github/workflows/regression_tests_docker.yml Outdated Show resolved Hide resolved
.github/workflows/regression_tests_docker.yml Show resolved Hide resolved
.github/workflows/regression_tests_docker.yml Show resolved Hide resolved
.github/workflows/regression_tests_docker.yml Show resolved Hide resolved
docker/Dockerfile Show resolved Hide resolved
docker/Dockerfile Outdated Show resolved Hide resolved
docker/Dockerfile Show resolved Hide resolved
examples/dcgan_fashiongen/create_mar.sh Show resolved Hide resolved
test/regression_tests.py Show resolved Hide resolved
@agunapal agunapal requested a review from msaroufim July 12, 2023 19:09
Copy link
Member

@msaroufim msaroufim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved assuming CI is green on those images - for the failing test tracker please create a seperate github issue

@agunapal agunapal requested review from namannandan and lxning July 14, 2023 16:42
run: |
echo "Cleaning up previous run"
ls -la ./
sudo rm -rf ./* || true
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Why do we need the || true part? Does this command fail in certain conditions and we want to ignore it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there are weird permission issues because of pycache geenrated by docker run. This logic worked for me. Will have to monitor the runs for the next few days

cd docker
./build_image.sh -g -cv cu117 -bt ci -b $GITHUB_REF_NAME -t pytorch/torchserve:ci
- name: Torchserve GPU Regression Tests
if: false == contains(matrix.hardware, 'ubuntu')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Would it be better check for the specific GPU runner attributes/labels instead of absence of ubuntu?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried, the problem is that we need to check for the entire thing: [self.hosted, ...] . I couldnt get it to work , hence went for the negation logic.

@agunapal agunapal dismissed mreso’s stale review July 14, 2023 17:59

Re-requested review after new changes. GitHub doesn't seem to reflect it.

@agunapal agunapal merged commit 5390025 into master Jul 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants