Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI testing of CUDA #457

Closed
Infinoid opened this issue May 7, 2021 · 5 comments · Fixed by #458
Closed

CI testing of CUDA #457

Infinoid opened this issue May 7, 2021 · 5 comments · Fixed by #458

Comments

@Infinoid
Copy link
Contributor

Infinoid commented May 7, 2021

We have some CI testing, for a limited set of features. But we need better feedback for when a change breaks something on CUDA.

Github action runners run on constricted VMs with 2 cores and no GPUs. Thus, if we want to build and run tests automatically on CUDA, we need to provide the hardware ourselves.

@Infinoid
Copy link
Contributor Author

Infinoid commented May 7, 2021

I have done some testing locally, it's not very hard to set up a self-hosted runner. I temporarily ran one on my desktop PC in a docker container with access to the video card and nothing else, and I was able to run tests.

The problem is security. A CI system is effectively downloading code from random people on the internet, compiling it and running it. The code should be sanity checked first.

Github has some warnings about using self-hosted runners with public repos. Related discussions suggest that the actual problem is actions triggered on pull requests, because anyone can submit a pull request. Triggering them on pushes (or manually) is preferable, because only members of the TACO team can do that.

What is needed is a process where only members of the TACO team can trigger the CUDA test, and they have the opportunity to review pull request code before doing so. Github has a way to define an action that is run explicitly, by saying the action is triggered on workflow_dispatch. I suggest we use that, and follow a workflow like this:

  1. Someone submits a PR
  2. Taco dev reviews the PR
  3. Taco dev creates a temp branch within the taco repo for the purpose of running tests
  4. Taco dev triggers the test job on the temp branch
  5. Self-hosted runner runs the tests
  6. Temp branch is removed
  7. Once those pass, the PR can be merged

Steps 3, 4 and 6 can be scripted, but it is crucial that the review step is done first.

Our CUDA runner(s) should not be available to run jobs for forked repos, or other repos outside of the control of the TACO team. (I am not sure what the exact security model is for forks, we need to test this.)

I have set up a temporary test configuration in my personal fork of taco. It has two actions defined: normal tests, and cuda tests:

screenshot of my actions list

On normal pushes and pull requests, it tests with recent versions of gcc and clang, on ubuntu-latest and macos (because why not).

Selecting the other workflow, it has a button and a popup dialog to manually trigger it:

screenshot of triggering a cuda test

That action has some input fields defined, and those appear in the popup so you have some control over how taco is built.

@Infinoid
Copy link
Contributor Author

Infinoid commented May 7, 2021

Some problems with the above approach:

  • need to test whether forks can run actions on the self-hosted runner
  • if using docker or some other encapsulation/sandboxing, the guest cuda library versions must be carefully installed and match the host driver version
  • the github runner can try to update itself, and that update process failed for me
  • other users of the GPU can interfere with the test results

@stephenchouca
Copy link
Contributor

It definitely makes sense to make the CUDA tests be triggered manually for pull requests. Would there be any issues with having the tests be triggered automatically for pushes to master though, since only developers with write access to the repo can push to master?

@Infinoid
Copy link
Contributor Author

I think it makes sense to run it automatically for pushes to master too. (Though that would be giving you information after-the-fact, it won't prevent the push.)

@Infinoid
Copy link
Contributor Author

I got a friend to help test the details of forks and unauthorized access to self-hosted runners. Here's what we found.

  • The Settings tab in my fork was not visible to him
  • The CUDA build and test action in my fork was visible to him, but the Run workflow UI widget was not visible
  • When he forked it, actions were disabled in his fork by default
  • When he pushed the green button, he was able to run actions on his fork
  • The CUDA build and test action was visible to him, with a Run workflow UI widget
  • Attempts to launch the CUDA build and test action died immediately with a "No runners available" error message
  • The Settings tab in his fork did not list any self-hosted runners

So that all looks sane to me. If someone wants to run their own self-hosted runner for their own fork, they can. But only taco collaborators can run code on the official one, and they do so via pushes or manual triggers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants