Specify repo and branch when testing against upstream #367

sneakers-the-rat · 2025-02-01T06:17:25Z

Related to the discussion re: python 3.13:

Builds off: #366
because it's just simpler to deal with the pull request metadata when the action is triggered by pull_request.

So linkml-runtime is tightly coupled to linkml, and there are enough tricky complexities in both that it's inevitable that we need to make coordinated changes to both packages at the same time - e.g. with the upgrade to python 3.13 support, neither linkml-runtime nor linkml could have the PR merged with passing tests because both sets of changes necessarily depended on the other.

This PR lets you specify a branch and/or repository to run the upstream tests in the body of the pull request by starting it with two lines like

upstream_repo: sneakers-the-rat/linkml
upstream_branch: python-3.13

blah blah blah the rest of the PR message

It also allows you to specify those when you run it via manual workflow_dispatch.

It can be observed as working here, noting that we get the correct repo and correct branch:
sneakers-the-rat#1
and this PR is an example of working with the default upstream when none is specified

This shouldn't be any more dangerous than running the action for external PRs already is, since the user input is cast to an environment variable (rather than being directly injected), and while it is true they could be testing against an upstream branch that has been modified to pass all tests, or try to snatch environment variables, the same is already true with the existing action.

hopefully this makes things a lil easier and makes co-maintaining these repos less of a pain

dalito · 2025-02-01T10:42:20Z

For me the action-changes are difficult to review regarding possible safety implications. I ran it through zizmor and it has several complains. Reviewing/attesting that it is nevertheless OK to use possibly unsafe ways is getting hard and requires a lot of knowledge or reading to be sure. For example, after reading https://securitylab.github.com/resources/github-actions-untrusted-input/ I am not sure if injecting via workflow_dispatch inputs is another risk.

So maybe making the action so powerful is not worth it?
#366 will already allow to detect problems earlier and the additions of "Upstream Testing" to CONTRIBUTING will help developers to test this locally.

Thanks for showing more clever ways to use gh-actions.

sneakers-the-rat · 2025-02-01T11:35:48Z

i think it's always worth being careful about user input. in this case I think we do avoid the threats here which are injection and data exfil, but happy to be told i'm wrong about that.

injection
the user input for the pr comment method is born as a string via the env parameter so it is never interpolated into the run statement. it is not ever eval'd within the shell, and is immediately piped into head and grep et al. I followed github's advice on how to trust the actions interpolation methods, and they recommend the env parameter method of avoiding injection (linked above).

the input for the workflow dispatch i think we can be relatively loose with - the only people that can do a workflow_dispatch are those that have write access to the repo, so if they want to abuse the workflow dispatch input, then they could already modify the action and anything else to do whatever they wanted to do in a more effective way.

the injection vulns in that post (which i love, btw) and in the above gh security docs involve injection via abusing the templating system (out of band) or eval's that are done within the action (in band) and I think we avoid both.

exfil
These risks come assuming the user input does not escape the intended frame via e.g. injection into the run statement. these also define the scope of the threat - what can be done assuming there is an injection attack?

With an out of band attack in the templating step, they may have access to the outer execution environment of the action, which is sort of out of our hands. with an in band attack we assume they have access to the environment itself and can run arbitrary commands. We don't have any trusted secrets exposed in these workflows, as far as i know, and each leg that is exposed to user input is gated by an event type check. that post says that different events have different kinds of information, and the most dangerous one is pull_request here, which is always read-only and doesn't access secrets. the question is then what additional risks come by allowing the upstream to be any repository? once we get to the point of caring about what the contents of the upstream repo that's specified are, the risk here is identical to running an action for any PR: we assume the code in the PR, run by pytest, etc. can be anything.

we could have this be something gated by maintainer approval tho, while it's annoying to need (repeated) approval for the upstream action with the main upstream repo, we could wait for approval if someone wants to specify a custom repo. that would mean removing any parsing of the PR text, and it would have to be manually input by the maintainer (i.e. we would do an event triggered by an issue comment and then filter the author of the comment to only those in the maintainer group). that's a little in the category-2 type errors that are already inherent in allowing actions to be run against the contents of a PR, but might be reasonable as an extra-paranoid step. these situations where we need synchronized PRs are not all that rare tho, so it would also take it out of 'improving the flow of the work' and back down to 'making the work possible' too.

but ya plz roast me on security, i am sure i got something wrong as always, that's just my read of the threat situation

dalito · 2025-02-01T11:57:47Z

Thanks for pointing out that workflow_dispatch requires write access (documented here). So it is fine.
I saw that you follow the gh best-practises and as far as I can tell (based on my limited knowledge) all is correct.

sneakers-the-rat · 2025-02-01T12:05:59Z

I would so much rather be double checked and corrected than getting this stuff wrong, thx for having eyes up

sneakers-the-rat added 13 commits January 31, 2025 18:26

always run upstream tests

bde0ec7

forgot about one additional limit on run

a65b0de

ability to specify upstream branch in PR body or workflow dispatch

3e8bf37

specify upstream repo too

6553c13

specify upstream repo too

e10f60f

multiline string body

9ba792b

bash will be the death of me

1a0e7d3

docs

ab59dfb

not fail fast

a7a17ee

simplify getting pr body

767bffa

lil more docs on declaring upstream

167803b

don't ask for confirmation when uninstalling

bc2362e

no pipefail/ e mode

fab1d2f

sneakers-the-rat mentioned this pull request Feb 1, 2025

Add python 3.13, Remove python 3.8 #345

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Specify repo and branch when testing against upstream #367

Specify repo and branch when testing against upstream #367

sneakers-the-rat commented Feb 1, 2025 •

edited

Loading

dalito commented Feb 1, 2025

sneakers-the-rat commented Feb 1, 2025

dalito commented Feb 1, 2025 •

edited

Loading

sneakers-the-rat commented Feb 1, 2025

Specify repo and branch when testing against upstream #367

Are you sure you want to change the base?

Specify repo and branch when testing against upstream #367

Conversation

sneakers-the-rat commented Feb 1, 2025 • edited Loading

dalito commented Feb 1, 2025

sneakers-the-rat commented Feb 1, 2025

dalito commented Feb 1, 2025 • edited Loading

sneakers-the-rat commented Feb 1, 2025

sneakers-the-rat commented Feb 1, 2025 •

edited

Loading

dalito commented Feb 1, 2025 •

edited

Loading