CI testing of CUDA #457

Infinoid · 2021-05-07T12:59:16Z

We have some CI testing, for a limited set of features. But we need better feedback for when a change breaks something on CUDA.

Github action runners run on constricted VMs with 2 cores and no GPUs. Thus, if we want to build and run tests automatically on CUDA, we need to provide the hardware ourselves.

Infinoid · 2021-05-07T13:12:10Z

I have done some testing locally, it's not very hard to set up a self-hosted runner. I temporarily ran one on my desktop PC in a docker container with access to the video card and nothing else, and I was able to run tests.

The problem is security. A CI system is effectively downloading code from random people on the internet, compiling it and running it. The code should be sanity checked first.

Github has some warnings about using self-hosted runners with public repos. Related discussions suggest that the actual problem is actions triggered on pull requests, because anyone can submit a pull request. Triggering them on pushes (or manually) is preferable, because only members of the TACO team can do that.

What is needed is a process where only members of the TACO team can trigger the CUDA test, and they have the opportunity to review pull request code before doing so. Github has a way to define an action that is run explicitly, by saying the action is triggered on workflow_dispatch. I suggest we use that, and follow a workflow like this:

Someone submits a PR
Taco dev reviews the PR
Taco dev creates a temp branch within the taco repo for the purpose of running tests
Taco dev triggers the test job on the temp branch
Self-hosted runner runs the tests
Temp branch is removed
Once those pass, the PR can be merged

Steps 3, 4 and 6 can be scripted, but it is crucial that the review step is done first.

Our CUDA runner(s) should not be available to run jobs for forked repos, or other repos outside of the control of the TACO team. (I am not sure what the exact security model is for forks, we need to test this.)

I have set up a temporary test configuration in my personal fork of taco. It has two actions defined: normal tests, and cuda tests:

On normal pushes and pull requests, it tests with recent versions of gcc and clang, on ubuntu-latest and macos (because why not).

Selecting the other workflow, it has a button and a popup dialog to manually trigger it:

That action has some input fields defined, and those appear in the popup so you have some control over how taco is built.

Infinoid · 2021-05-07T13:18:06Z

Some problems with the above approach:

need to test whether forks can run actions on the self-hosted runner
if using docker or some other encapsulation/sandboxing, the guest cuda library versions must be carefully installed and match the host driver version
the github runner can try to update itself, and that update process failed for me
other users of the GPU can interfere with the test results

stephenchouca · 2021-05-12T04:18:13Z

It definitely makes sense to make the CUDA tests be triggered manually for pull requests. Would there be any issues with having the tests be triggered automatically for pushes to master though, since only developers with write access to the repo can push to master?

Infinoid · 2021-05-12T11:46:36Z

I think it makes sense to run it automatically for pushes to master too. (Though that would be giving you information after-the-fact, it won't prevent the push.)

Infinoid · 2021-05-13T12:52:12Z

I got a friend to help test the details of forks and unauthorized access to self-hosted runners. Here's what we found.

The Settings tab in my fork was not visible to him
The CUDA build and test action in my fork was visible to him, but the Run workflow UI widget was not visible
When he forked it, actions were disabled in his fork by default
When he pushed the green button, he was able to run actions on his fork
The CUDA build and test action was visible to him, with a Run workflow UI widget
Attempts to launch the CUDA build and test action died immediately with a "No runners available" error message
The Settings tab in his fork did not list any self-hosted runners

So that all looks sane to me. If someone wants to run their own self-hosted runner for their own fork, they can. But only taco collaborators can run code on the official one, and they do so via pushes or manual triggers.

This was referenced May 13, 2021

Define CUDA CI actions #458

Merged

Add a second CUDA CI workflow for automatic testing of pushes. #461

Draft

Test regression in scheduling_eval.spmvGPU, scheduling_eval.ttvGPU #462

Closed

stephenchouca closed this as completed in #458 May 19, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CI testing of CUDA #457

CI testing of CUDA #457

Infinoid commented May 7, 2021

Infinoid commented May 7, 2021

Infinoid commented May 7, 2021

stephenchouca commented May 12, 2021

Infinoid commented May 12, 2021

Infinoid commented May 13, 2021

CI testing of CUDA #457

CI testing of CUDA #457

Comments

Infinoid commented May 7, 2021

Infinoid commented May 7, 2021

Infinoid commented May 7, 2021

stephenchouca commented May 12, 2021

Infinoid commented May 12, 2021

Infinoid commented May 13, 2021