-
-
Notifications
You must be signed in to change notification settings - Fork 31k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add zizmor to pre-commit and fix most findings #127749
Conversation
Nice! As a suggestion: to reduce the risk of the lower-severity findings being ignored indefinitely, the CPython CI could use zizmor's SARIF format here to integrate with GitHub's security scanning feature. Once integrated, zizmor's results get tracked by GitHub itself and can be ignored/suppressed/dismissed within GitHub's own UI. For example, this is what a finding looks like (this one is from Homebrew, and has since been resolved): The main downside to doing this is that it might need to be shoehorned a bit into the pre-commit approach, or broken out entirely (which might be unacceptable if everything else is driven by pre-commit). As such I leave it up to you to do what's best for this repo; just wanted to make sure people were aware of it 🙂 Edit: another thing security scanning enables is ruleset enforcement, e.g. this is what I have on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, thank you!
Nice! As a suggestion: to reduce the risk of the lower-severity findings being ignored indefinitely, the CPython CI could use zizmor's SARIF format here to integrate with GitHub's security scanning feature [...] The main downside to doing this is that it might need to be shoehorned a bit into the pre-commit approach, or broken out entirely (which might be unacceptable if everything else is driven by pre-commit).
No strong opinion on this! I'm used to pre-commit and I'd quite like to get notified of new security issues when running pre-commit run -a
locally. I also think we could track solving the low-severity warnings by opening an issue. But it sounds like this approach also has a lot to recommend it... I'm happy with whatever!
.github/workflows/reusable-docs.yml
Outdated
@@ -39,7 +41,7 @@ jobs: | |||
if: github.event_name == 'pull_request' | |||
run: | | |||
# Fetch enough history to find a common ancestor commit (aka merge-base): | |||
git fetch origin ${{ env.refspec_pr }} --depth=$(( ${{ github.event.pull_request.commits }} + 1 )) \ | |||
git fetch origin ${{ env.refspec_pr }} --depth=$(( ${{ env.commits }} + 1 )) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not an argument against doing this, just a howl of frustration at GitHub: it feels so silly that we have to do this to make our workflow secure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's definitely not ideal 🙃.
In this particular case however, this might be a false positive: contextually it looks like github.event.pull_request.commits
can only ever be an integer, so there's no risk of code injection. If that sounds right to you all, I can add this context to zizmor's allow-list (which is currently woefully incomplete, since there are a lot of default contexts).
(And as a separate note: ${{ env ... }}
is still an injection risk, although probably a lesser one in this context. I think the only reason zizmor
isn't warning about it here is because we emit a low-severity warning for it, since it's hard to analyze/generalize over 🙂.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, https://docs.github.com/en/webhooks/webhook-events-and-payloads#pull_request documents that pull_request.commits
will always be an integer -- so we're safe as long as they don't change that 😄
(And as a separate note:
${{ env ... }}
is still an injection risk, although probably a lesser one in this context. I think the only reasonzizmor
isn't warning about it here is because we emit a low-severity warning for it, since it's hard to analyze/generalize over 🙂.)
I'm very ignorant about all this, but I assumed from your docs at https://woodruffw.github.io/zizmor/audits/#template-injection that using an environment variable here would be safer:
The most common forms of template injection are in
run:
and similar code-execution blocks. In these cases, an inline template expansion can typically be replaced by an environment variable whose value comes from the expanded template.This avoids the vulnerability, since variable expansion is subject to normal shell quoting/expansion rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it's saf-er but not truly safe? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I think that phrasing is confusing! What I was trying to say in those docs is that lines like:
run: foo ${{ env.bar }}
should be ideally replaced with:
run: foo "${BAR}"
env:
BAR: ${{ env.bar }}
In other words: in the best case ${{ ... }}
never ever appears in a run:
or similar block, it should always be forwarded via an env:
setting.
I can improve those docs to make that more clear 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Technically I think run: foo ${bar}
might also work, since GHA should set bar
in the shell environment if env.bar
is set in the context namespace. But there are some weird edge cases around that, so I personally tend to explicitly forward it with an `env: block.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to make sure I understand correctly, if I have something like:
run: echo "Issue title: ${{ github.event.issue.title }}"
and the attacker sets the title to something like "; rm -rf /; echo "
, the run
command will become:
run: echo "Issue title: "; rm -rf /; echo ""
and execute the 3 commands separately (echo
+rm
+echo
).
Whereas if I use
run: echo "Issue title: ${TITLE}"
env:
TITLE: ${{ github.event.issue.title }}
when ${TITLE}
is used in run
it will be escaped and expanded to something like
run: echo "Issue title: \"; rm -rf /; echo \""
which will only execute a single echo
and print Issue title: "; rm -rf /; echo "
, right?
Also what is the difference -- once the env var is set -- between using:
run: echo "Issue title: $TITLE"
run: echo "Issue title: ${TITLE}"
run: echo "Issue title: ${{env.TITLE}}"
Are they all equivalent or do different escaping/expansions rules apply?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be ideally replaced with:
run: foo "${BAR}" env: BAR: ${{ env.bar }}
Aha, thanks! I've updated the PR. Although this isn't working in Windows' pwsh.exe shell, what's the right syntax there?
https://github.com/python/cpython/actions/runs/12238891899/job/34138074592?pr=127749
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although this isn't working in Windows' pwsh.exe shell, what's the right syntax there?
You could try using shell: bash
to force a bash shell even on Windows?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are they all equivalent or do different escaping/expansions rules apply?
The first two examples are shell interpolations, and have the same expansion rules applied to them. But note: this is only because of the surrounding quotes: if you had instead written echo $TITLE
, then $TITLE
would be expanded using "splatting" rules and its expansion would be interpreted as flags to echo
, rather than a single string input.
The third example is a template interpolation, which bypasses the shell interpolation rules. It's not safe to do in the general case.
TL;DR: you almost always want some variant of "${VAR}"
or "$VAR"
, quoting included 🙂
You could try using
shell: bash
to force a bash shell even on Windows?
Yep, this is the easiest fix -- the alternative is to use pwsh's $env:VARNAME
syntax, but that means having to split your steps.
@@ -61,6 +61,8 @@ jobs: | |||
- run: >- | |||
echo '${{ github.event_name }}' | |||
- uses: actions/checkout@v4 | |||
with: | |||
persist-credentials: false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a git fetch
in the next step. Wouldn't that still need the credentials in order to work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it needs to be an authenticated git fetch?
It ran okay here: https://github.com/python/cpython/actions/runs/12225521755/job/34099604804
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, persist-credentials
is only needed for git
ops that need authentication. In the context of public repos and their workflows, in practice that means only ops that mutate the repo, rather than pulling from it 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense, but if this is the case, does it also mean that as long as we are just reading from public repos, there are no credentials used so there's nothing to share with the other jobs?
IOW, persist-credentials: true
would only be a concern when a token is used, either for mutating operations (e.g. pushes) or for accessing private repos, and in this case there's actually no security risk, right?
(It might still be better to explicitly add persist-credentials: false
regardless though -- e.g. in case someone adds a token later.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense, but if this is the case, does it also mean that as long as we are just reading from public repos, there are no credentials used so there's nothing to share with the other jobs?
Nope, unfortunately the default means that a credential is persisted, even though it isn't necessary. So persist-credentials: false
actually does something usefully by explicitly removing the credential.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your reply! I'm still not sure I understand if there is an actual issue with reading (i.e. with no writing/mutating operations) public repos though. I tried digging a bit deeper to understand and this is what I found.
The actions/checkout
README says that it accepts both a token
and an ssh-key
(which we are not using):
# Personal access token (PAT) used to fetch the repository. The PAT is configured
# with the local git config, which enables your scripts to run authenticated git
# commands. The post-job step removes the PAT.
#
# We recommend using a service account with the least permissions necessary. Also
# when generating a new PAT, select the least scopes necessary.
#
# [Learn more about creating and using encrypted secrets](https://help.github.com/en/actions/automating-your-workflow-with-github-actions/creating-and-using-encrypted-secrets)
#
# Default: ${{ github.token }}
token: ''
# SSH key used to fetch the repository. The SSH key is configured with the local
# git config, which enables your scripts to run authenticated git commands. The
# post-job step removes the SSH key.
#
# We recommend using a service account with the least permissions necessary.
#
# [Learn more about creating and using encrypted secrets](https://help.github.com/en/actions/automating-your-workflow-with-github-actions/creating-and-using-encrypted-secrets)
ssh-key: ''
The comment for persist-credentials
says:
# Whether to configure the token or SSH key with the local git config
# Default: true
persist-credentials: ''
So my understanding is that if I explicitly set a token/key with token: '...'
or ssh-key: '...'
and use persist-credentials: true
(either explicitly or implicitly since it's the default), the token/key will be "persisted in the local git config" (which I assume is .git/config
, i.e. it will be written on a file). This will expose the token/key to the other jobs -- which is a security risk.
If I don't explicitly specify any token/key (i.e. what we are doing in our workflows), actions/checkout
will set github.token
as default token and persist that. Since github.token
is already available to all jobs, the fact that it is persisted shouldn't be an issue -- unless having its value written on a file poses a new threat that I'm not aware of.
Nope, unfortunately the default means that a credential is persisted, even though it isn't necessary.
If no credentials are passed explicitly by using token
/ssh-key
(which are not needed while reading public repos, like in our workflows) and assuming persist-credentials: true
, either:
- there is no security issue (unless either a
token
orssh-key
are explicitly set); - persisting
github.token
is still a security issue (maybe because it's written on a file?); 1 - there's some other credential that I'm missing that causes a security issue;
Footnotes
-
if this is a problem, maybe setting
token: ''
explicitly will prevent thegithub.token
to be used (and persisted) at all (depending on how the action is implemented) ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem! Thanks for your detailed response as well.
Since github.token is already available to all jobs, the fact that it is persisted shouldn't be an issue -- unless having its value written on a file poses a new threat that I'm not aware of.
Yep: the risk with this is that it's easy to inadvertently upload/log/otherwise persist that local filesystem credential. This can't happen with the in-memory token (i.e. github.token
) in the ordinary case.
For example, until recently, this would cause a token disclosure via a public artifact:
uses: actions/checkout
# other steps
uses: actions/upload-artifact
with:
path: . # uploads the entire repo, including the persisted token
This post has some interesting/elucidative examples of that: https://unit42.paloaltonetworks.com/github-repo-artifacts-leak-tokens/
- if this is a problem, maybe setting
token: ''
explicitly will prevent thegithub.token
to be used (and persisted) at all (depending on how the action is implemented) ↩
Yeah, I believe this will result in an empty string being saved on the filesystem for the token. I'm not 100% sure but I think this will cause GitHub API errors in some cases.
(I think persist-credentials: false
is the intended way to disable github.token
.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This post has some interesting/elucidative examples of that: https://unit42.paloaltonetworks.com/github-repo-artifacts-leak-tokens/
Thanks for sharing, this is really interesting!
Now that I have all the pieces of the puzzle it finally makes sense:
- GitHub automatically creates a
github.token
- If no
token: ...
is specified inactions/checkout
, thengithub.token
is used - If
persist-credentials
istrue
(the default), either thegithub.token
or the specifiedtoken
will be written in.git/config
- If the file containing the token becomes publicly accessible (e.g. uploaded via
actions/upload-artifact
) an attacker could access it and use it persist-credentials: false
avoids writing the tokens on files and prevents token leaks
There are a few more caveats here and there, but this is already more than enough to justify the use of persist-credentials: false
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem! Your summary is great and matches my understanding perfectly 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for this Hugo!
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
jobs: | ||
label-dnm: | ||
name: DO-NOT-MERGE | ||
if: github.repository_owner == 'python' | ||
runs-on: ubuntu-latest | ||
permissions: | ||
issues: write | ||
pull-requests: write |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These labels are only used on PRs, so I'm pretty sure the issues: write
can be removed. In addition, I think we might not even need the pull-requests: write
since AFAICT we are not writing anything.
I don't have enough energy today to also dig into this to make sure that the permissions are not needed, so feel free to merge this as is and we can open a separate issue to investigate and possibly remove these.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, we can check with this separately. This PR isn't adding any extra permissions here.
Thanks all for the reviews! Let's backport to 3.12 as well. Let's see how the bot goes, I'm expecting merge conflicts. Edit: And we could consider backports to 3.9 as well? |
Thanks @hugovk for the PR 🌮🎉.. I'm working now to backport this PR to: 3.12, 3.13. |
Sorry, @hugovk, I could not cleanly backport this to
|
Sorry, @hugovk, I could not cleanly backport this to
|
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com> (cherry picked from commit ae31df3)
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com> (cherry picked from commit ae31df3)
GH-127786 is a backport of this pull request to the 3.13 branch. |
(cherry picked from commit ae31df3) Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com> Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
GH-127788 is a backport of this pull request to the 3.12 branch. |
… (python#127788) Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com> Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com> Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
https://woodruffw.github.io/zizmor/
This PR adds zizmor to pre-commit, which runs it on the CI.
Here are all the findings currently on
main
:134 findings (89 suppressed): 0 unknown, 0 informational, 13 low, 21 medium, 11 high
This PR fixes all errors except this one:
Let's tackle that another time. For now it's added to the ignore list in
.github/zizmor.yml
.Also, these high-severity errors are fixed:
And turned into these low-severity findings:
There's also some other low-severity ones like this. So this PR runs zizmor with
--min-severity=medium
to ignore those low-severity ones for now.Here's all the findings on this branch, without ignoring anything:
109 findings (89 suppressed): 0 unknown, 0 informational, 19 low, 0 medium, 1 high
Summary:
134 findings (89 suppressed): 0 unknown, 0 informational, 13 low, 21 medium, 11 high
No findings to report. Good job! (20 ignored, 89 suppressed)
cc @sethmlarson and zizmor author @woodruffw.