Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specify repo and branch when testing against upstream #367

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

sneakers-the-rat
Copy link
Contributor

@sneakers-the-rat sneakers-the-rat commented Feb 1, 2025

Related to the discussion re: python 3.13:

Builds off: #366
because it's just simpler to deal with the pull request metadata when the action is triggered by pull_request.

So linkml-runtime is tightly coupled to linkml, and there are enough tricky complexities in both that it's inevitable that we need to make coordinated changes to both packages at the same time - e.g. with the upgrade to python 3.13 support, neither linkml-runtime nor linkml could have the PR merged with passing tests because both sets of changes necessarily depended on the other.

This PR lets you specify a branch and/or repository to run the upstream tests in the body of the pull request by starting it with two lines like

upstream_repo: sneakers-the-rat/linkml
upstream_branch: python-3.13

blah blah blah the rest of the PR message

It also allows you to specify those when you run it via manual workflow_dispatch.

It can be observed as working here, noting that we get the correct repo and correct branch:
sneakers-the-rat#1
and this PR is an example of working with the default upstream when none is specified

This shouldn't be any more dangerous than running the action for external PRs already is, since the user input is cast to an environment variable (rather than being directly injected), and while it is true they could be testing against an upstream branch that has been modified to pass all tests, or try to snatch environment variables, the same is already true with the existing action.

hopefully this makes things a lil easier and makes co-maintaining these repos less of a pain

@dalito
Copy link
Member

dalito commented Feb 1, 2025

For me the action-changes are difficult to review regarding possible safety implications. I ran it through zizmor and it has several complains. Reviewing/attesting that it is nevertheless OK to use possibly unsafe ways is getting hard and requires a lot of knowledge or reading to be sure. For example, after reading https://securitylab.github.com/resources/github-actions-untrusted-input/ I am not sure if injecting via workflow_dispatch inputs is another risk.

So maybe making the action so powerful is not worth it?
#366 will already allow to detect problems earlier and the additions of "Upstream Testing" to CONTRIBUTING will help developers to test this locally.

Thanks for showing more clever ways to use gh-actions.

@sneakers-the-rat
Copy link
Contributor Author

i think it's always worth being careful about user input. in this case I think we do avoid the threats here which are injection and data exfil, but happy to be told i'm wrong about that.

injection
the user input for the pr comment method is born as a string via the env parameter so it is never interpolated into the run statement. it is not ever eval'd within the shell, and is immediately piped into head and grep et al. I followed github's advice on how to trust the actions interpolation methods, and they recommend the env parameter method of avoiding injection (linked above).

the input for the workflow dispatch i think we can be relatively loose with - the only people that can do a workflow_dispatch are those that have write access to the repo, so if they want to abuse the workflow dispatch input, then they could already modify the action and anything else to do whatever they wanted to do in a more effective way.

the injection vulns in that post (which i love, btw) and in the above gh security docs involve injection via abusing the templating system (out of band) or eval's that are done within the action (in band) and I think we avoid both.

exfil
These risks come assuming the user input does not escape the intended frame via e.g. injection into the run statement. these also define the scope of the threat - what can be done assuming there is an injection attack?

With an out of band attack in the templating step, they may have access to the outer execution environment of the action, which is sort of out of our hands. with an in band attack we assume they have access to the environment itself and can run arbitrary commands. We don't have any trusted secrets exposed in these workflows, as far as i know, and each leg that is exposed to user input is gated by an event type check. that post says that different events have different kinds of information, and the most dangerous one is pull_request here, which is always read-only and doesn't access secrets. the question is then what additional risks come by allowing the upstream to be any repository? once we get to the point of caring about what the contents of the upstream repo that's specified are, the risk here is identical to running an action for any PR: we assume the code in the PR, run by pytest, etc. can be anything.

we could have this be something gated by maintainer approval tho, while it's annoying to need (repeated) approval for the upstream action with the main upstream repo, we could wait for approval if someone wants to specify a custom repo. that would mean removing any parsing of the PR text, and it would have to be manually input by the maintainer (i.e. we would do an event triggered by an issue comment and then filter the author of the comment to only those in the maintainer group). that's a little in the category-2 type errors that are already inherent in allowing actions to be run against the contents of a PR, but might be reasonable as an extra-paranoid step. these situations where we need synchronized PRs are not all that rare tho, so it would also take it out of 'improving the flow of the work' and back down to 'making the work possible' too.

but ya plz roast me on security, i am sure i got something wrong as always, that's just my read of the threat situation

@dalito
Copy link
Member

dalito commented Feb 1, 2025

Thanks for pointing out that workflow_dispatch requires write access (documented here). So it is fine.
I saw that you follow the gh best-practises and as far as I can tell (based on my limited knowledge) all is correct.

@sneakers-the-rat
Copy link
Contributor Author

I would so much rather be double checked and corrected than getting this stuff wrong, thx for having eyes up

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants