attempt to also run e2e tests that need GPUs #1070

Merged
merged 9 commits into from
Jan 10, 2024
Merged


winglian (Collaborator) commented Jan 9, 2024

No description provided.

@winglian winglian requested a review from hamelsmu January 9, 2024 15:58
hamelsmu (Collaborator) commented Jan 9, 2024

Hopefully this won't block the CI pipeline too much.

hamelsmu (Collaborator) left a comment

Looks like something is wrong with this command; see the CI logs.

winglian (Collaborator, Author) commented Jan 9, 2024

> Looks like something is wrong with this command; see the CI logs.

'"all" 🤦

hamelsmu (Collaborator) commented Jan 9, 2024

@winglian I don't have access to the self-hosted runner, but I think we need to set the nvidia runtime as Docker's default (I'm just trying to interpret the error message).

On your self-hosted runner, /etc/docker/daemon.json should set "default-runtime" to "nvidia":

```json
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
```

If this is not the case, make the change and then restart Docker:

```sh
sudo systemctl restart docker
```
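A minimal sketch of applying that change without hand-editing the file, assuming the daemon config lives at /etc/docker/daemon.json and the script runs as root. The helper names `ensure_nvidia_default` and `main` are hypothetical, not part of this PR. It merges the nvidia runtime into any existing config instead of overwriting unrelated keys.

```python
import json
from pathlib import Path

# Assumed location of the Docker daemon config on the runner.
DAEMON_JSON = Path("/etc/docker/daemon.json")


def ensure_nvidia_default(config: dict) -> dict:
    """Return a copy of a daemon.json config with nvidia as the default runtime."""
    updated = dict(config)
    # Copy the runtimes map so the caller's dict is never mutated.
    runtimes = dict(updated.get("runtimes", {}))
    # Register the nvidia runtime only if missing; other runtimes are kept.
    runtimes.setdefault(
        "nvidia", {"path": "nvidia-container-runtime", "runtimeArgs": []}
    )
    updated["runtimes"] = runtimes
    updated["default-runtime"] = "nvidia"
    return updated


def main(path: Path = DAEMON_JSON) -> None:
    """Hypothetical entry point: rewrite daemon.json only when it changes."""
    config = json.loads(path.read_text()) if path.exists() else {}
    new_config = ensure_nvidia_default(config)
    if new_config != config:
        path.write_text(json.dumps(new_config, indent=2) + "\n")
        print(f"Updated {path}; now run: sudo systemctl restart docker")
    else:
        print(f"{path} already uses the nvidia default runtime")
```

Calling `main()` as root writes the file only when something actually changes, so re-running it on an already configured runner is harmless. Docker still has to be restarted afterwards for the new default runtime to take effect.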

@winglian winglian force-pushed the docker-e2e-enable branch 3 times, most recently from 639de26 to 3f8a9d1 Compare January 9, 2024 20:36
ensure wandb is disabled for docker pytests
clear wandb env after testing
clear wandb env after testing
make sure to provide a default val for pop
trying skipping wandb validation tests
explicitly disable wandb in the e2e tests
explicitly report_to None to see if that fixes the docker e2e tests
split gpu from non-gpu unit tests
skip bf16 check in test for now
build docker w/o cache since it uses branch name ref
revert some changes now that caching is fixed
skip bf16 check if on gpu w support
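The wandb-related commits above ("clear wandb env after testing", "make sure to provide a default val for pop") describe a common pattern: set an environment variable that turns wandb off for a test, then remove it afterwards. A minimal sketch, assuming the variable is `WANDB_DISABLED` (an assumption for illustration; wandb itself documents `WANDB_MODE=disabled`, so the name is a parameter). The `wandb_disabled` helper is hypothetical and not the code in this PR.

```python
import os
from contextlib import contextmanager


@contextmanager
def wandb_disabled(var: str = "WANDB_DISABLED"):
    """Temporarily set an env var that turns wandb off, then restore the env."""
    previous = os.environ.get(var)
    os.environ[var] = "true"
    try:
        yield
    finally:
        if previous is None:
            # pop() with a default never raises KeyError, even if the
            # test body already removed the variable itself.
            os.environ.pop(var, None)
        else:
            # Restore whatever value was set before the test ran.
            os.environ[var] = previous
```

Passing a default to `pop` is the point of the "default val" commit: a bare `os.environ.pop(var)` raises `KeyError` when the variable is already gone, which would turn cleanup into a test failure.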
winglian merged commit 788649f into main on Jan 10, 2024
6 checks passed
winglian deleted the docker-e2e-enable branch on January 10, 2024 02:23