Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Branch Protection Rules support for main branch #16

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

richm
Copy link
Contributor

@richm richm commented Feb 18, 2023

This adds a playbook update_branch_protection_rules.yml for
setting Branch Protection Rules for the main branch.

The default rules are quite strict - require approvals, require
status checks, clear approvals if a new commit is pushed, do not
allow admin override, and more.

The status checks are not currently configurable per group or
per role, but you can have additional status checks. For example,
the network role has additional status checks not used by the
other roles. There are additional python status checks, and
additional shell checks, for the roles that support and use those
checks.

More default status checks will be added - we're in the process
of revising the names of the baseos ci status checks, so more
will be coming soon.

Signed-off-by: Rich Megginson rmeggins@redhat.com

@richm
Copy link
Contributor Author

richm commented Feb 18, 2023

Branch protection rule.pdf
This is an example of what the new rules look like in the kernel_settings role.

@martinpitt
Copy link

Thanks @richm ! This is quite a nice way to sync configuration across many related GH projects. Big 👍 TBH, I consider most of these restrictions as a "bare minimum", and they should really be enable by default. The only disputable one is "Require branches to be up to date before merging", which isn't suitable for projects which get more than ~ one PR per day and have too expensive CI to constantly rebase (like Cockpit), but for LSR we can afford to enforce it.

Cheers!

@tyll
Copy link
Member

tyll commented Feb 18, 2023

The only disputable one is "Require branches to be up to date before merging", which isn't suitable for projects which get more than ~ one PR per day and have too expensive CI to constantly rebase (like Cockpit), but for LSR we can afford to enforce it.

GitHub now has beta support for merge queues which should solve this problem for Cockpit as well: https://github.blog/changelog/2021-10-27-pull-request-merge-queue-limited-beta/ - it seems that I might have applied the network role for this already, since I got a notification recently, that we can use it.

The challenge that I see, is that the system roles CI does not seem to be reliable enough for this to properly work.

Comment on lines +50 to +55
- DCO
- markdownlint
- integration (c8s)
- integration (c9s)
- integration (c7)
- python-26
- woke
Copy link
Member

@tyll tyll Feb 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- DCO
- markdownlint
- integration (c8s)
- integration (c9s)
- integration (c7)
- python-26
- woke
- DCO
- markdownlint
- integration (c8s)
- integration (c9s)
- integration (c7)
- python-26
- woke
- Analyze (python)

Currently, the UI also shows a source for each check (DCO, GitHub Actions, GitHub Code Scanning) depending on the status, the alternative seems to be "any source" which might be what this will be configuring, since it does not collect the source. It would be great to keep the sources as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ansible_lint etc. are the defaults applied to all roles - so the network.yml contains only those status checks unique (currently) to network

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, the UI also shows a source for each check (DCO, GitHub Actions, GitHub Code Scanning) depending on the status, the alternative seems to be "any source" which might be what this will be configuring, since it does not collect the source. It would be great to keep the sources as well.

Why keep the sources?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, the UI also shows a source for each check (DCO, GitHub Actions, GitHub Code Scanning) depending on the status, the alternative seems to be "any source" which might be what this will be configuring, since it does not collect the source. It would be great to keep the sources as well.

Why keep the sources?

My assumption is that there is that there might be a security impact (maybe not that big since it seems to require write access to the repo to set a state) and to avoid that status checks conflict. For example, if someone experiments with a new tool that would set a markdownlint status, it could approve the PR without the official markdownlint being run. I guess, this also would allow users to circumvent the branch protection rules but they need to have already write access.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the API that allows you to create a status check context with the id: https://docs.github.com/en/graphql/reference/input-objects#requiredstatuscheckinput - the appId field is used to specify a particular action that must already be in the database. Here is the output of searching the network role with the full requiredStatusCheck name + app information:

                createdAt: '2017-03-25T13:51:49Z'
                databaseId: 1861
                description: "This App enforces the [Developer Certificate of Origin](https://developercertificate.org/)\
                    \ (DCO) on Pull Requests. It requires all commit messages to contain\
                ...
                url: https://probot.github.io/apps/dco/
            context: DCO
...
                createdAt: '2020-03-17T21:25:09Z'
                databaseId: 57789
                description: ''
                name: GitHub Code Scanning
                slug: github-code-scanning
                updatedAt: '2021-12-14T16:45:01Z'
                url: https://github.com/features/security
            context: CodeQL
...
                createdAt: '2018-07-30T09:30:17Z'
                databaseId: 15368
                description: Automate your workflow from idea to production
                slug: github-actions
                updatedAt: '2019-12-10T19:04:12Z'
                url: https://help.github.com/en/actions
            context: markdownlint
....

Except for DCO and CodeQL, every "app" is the generic github-actions "app". I guess I could change the input to reflect that. I'm assuming the databaseId field in the output is the appId field I should use in the input - the documentation doesn't really say. It's odd that "real" github actions like ansible_lint do not have their own appId - perhaps because we have custom code in there - for the ansible-lint check we have some custom code to munge meta/main.yml role_name and namespace - for ansible-test we have code to convert to collection.

Note that the key is the context - this must exactly match the name of the job e.g. https://github.com/linux-system-roles/.github/blob/main/playbooks/templates/.github/workflows/ansible-lint.yml#L12

jobs:
  ansible_lint:

so as long as there is only 1 github action which defines jobs.ansible_lint, we can be guaranteed that if we refer to a status check context name as ansible_lint it will use the one defined in .github/workflows/ansible-lint.yml, which requires write access to the repo to change.

Unless you can foresee some attack vector based on status check context naming, I would prefer to use status check context name as is being used currently in this PR.

@richm
Copy link
Contributor Author

richm commented Feb 18, 2023

The only disputable one is "Require branches to be up to date before merging", which isn't suitable for projects which get more than ~ one PR per day and have too expensive CI to constantly rebase (like Cockpit), but for LSR we can afford to enforce it.

GitHub now has beta support for merge queues which should solve this problem for Cockpit as well: https://github.blog/changelog/2021-10-27-pull-request-merge-queue-limited-beta/ - it seems that I might have applied the network role for this already, since I got a notification recently, that we can use it.

Ok. That's another thing I need to figure out how to apply to all roles . . . seems like overkill though with as few PRs as we get across all of the system roles, plus the CI problems mentioned below.

The challenge that I see, is that the system roles CI does not seem to be reliable enough for this to properly work.

@tyll
Copy link
Member

tyll commented Feb 20, 2023

GitHub now has beta support for merge queues which should solve this problem for Cockpit as well: github.blog/changelog/2021-10-27-pull-request-merge-queue-limited-beta - it seems that I might have applied the network role for this already, since I got a notification recently, that we can use it.

Ok. That's another thing I need to figure out how to apply to all roles . . . seems like overkill though with as few PRs as we get across all of the system roles, plus the CI problems mentioned below.

The unreliable CI and possible high latency is actually an argument for the merge queues because it will reduce the amount of necessary CI runs on a specific day when there is a deadline and changes pile up.

@richm
Copy link
Contributor Author

richm commented Feb 20, 2023

GitHub now has beta support for merge queues which should solve this problem for Cockpit as well: github.blog/changelog/2021-10-27-pull-request-merge-queue-limited-beta - it seems that I might have applied the network role for this already, since I got a notification recently, that we can use it.

Ok. That's another thing I need to figure out how to apply to all roles . . . seems like overkill though with as few PRs as we get across all of the system roles, plus the CI problems mentioned below.

The unreliable CI and possible high latency is actually an argument for the merge queues because it will reduce the amount of necessary CI runs on a specific day when there is a deadline and changes pile up.

I still don't think it is necessary to use merge queues for system roles, despite the recently flurry of activity in the network role.

@tyll
Copy link
Member

tyll commented Feb 20, 2023

I still don't think it is necessary to use merge queues for system roles, despite the recently flurry of activity in the network role.

It seems it is also necessary to have merge queues for automated merging to properly work. Otherwise GitHub seems to wait for a human to click the "rebase" button. I enabled merge queues for the network role so we can take a look there.

Regarding disallowing admins to override the branch protection rules: I used this for the current network changelog PR since it was waiting on the integration tests and it is also the fallback in case the code owner is unavailable. What's the plan for that? I guess there should also be a more sophisticated codeowners file to add the system-roles core team as owners for LSR specific paths such as the changelog, maybe tox and .github?

@richm
Copy link
Contributor Author

richm commented Feb 20, 2023

I still don't think it is necessary to use merge queues for system roles, despite the recently flurry of activity in the network role.

It seems it is also necessary to have merge queues for automated merging to properly work. Otherwise GitHub seems to wait for a human to click the "rebase" button. I enabled merge queues for the network role so we can take a look there.

Ok. Then the network role will be the PoC for merge queues, and we'll see if they will be generally useful for all of the system roles.

Regarding disallowing admins to override the branch protection rules: I used this for the current network changelog PR since it was waiting on the integration tests and it is also the fallback in case the code owner is unavailable.

Right.

What's the plan for that?

Typically what I do is go into Settings -> Branch Protection Rules and turn off this switch, merge the change, and then turn back on the switch.

I guess there should also be a more sophisticated codeowners file to add the system-roles core team as owners for LSR specific paths such as the changelog, maybe tox and .github?

You can do that with codeowners?

@richm
Copy link
Contributor Author

richm commented Feb 20, 2023

I still don't think it is necessary to use merge queues for system roles, despite the recently flurry of activity in the network role.

It seems it is also necessary to have merge queues for automated merging to properly work. Otherwise GitHub seems to wait for a human to click the "rebase" button. I enabled merge queues for the network role so we can take a look there.

Ok. Then the network role will be the PoC for merge queues, and we'll see if they will be generally useful for all of the system roles.

Regarding disallowing admins to override the branch protection rules: I used this for the current network changelog PR since it was waiting on the integration tests and it is also the fallback in case the code owner is unavailable.

Right.

What's the plan for that?

Typically what I do is go into Settings -> Branch Protection Rules and turn off this switch, merge the change, and then turn back on the switch.

The only repo currently that has isAdminEnforced: true is the cockpit role - I guess @martinpitt prefers the extra layer of security and isn't too concerned with having to override status checks/approvals and force merge. Martin - are you ok if we set isAdminEnforced: false for now? @tyll are you willing to try isAdminEnforced: true?

It's too bad there isn't a switch to "allow admin to merge if CI/github action check failure" but "disallow admin to merge if there aren't enough reviews".

I guess there should also be a more sophisticated codeowners file to add the system-roles core team as owners for LSR specific paths such as the changelog, maybe tox and .github?

You can do that with codeowners?

Apparently yes, but I would rather do that in a subsequent PR.

@richm
Copy link
Contributor Author

richm commented Feb 21, 2023

@tyll @martinpitt 80123f9 allows isAdminEnforced per role, and makes cockpit the only role currently that uses true

richm added 6 commits March 10, 2023 11:44
This adds a playbook `update_branch_protection_rules.yml` for
setting Branch Protection Rules for the `main` branch.

The default rules are quite strict - require approvals, require
status checks, clear approvals if a new commit is pushed, do not
allow admin override, and more.

The status checks are not currently configurable per group or
per role, but you can have additional status checks.  For example,
the network role has additional status checks not used by the
other roles.  There are additional python status checks, and
additional shell checks, for the roles that support and use those
checks.

More default status checks will be added - we're in the process
of revising the names of the baseos ci status checks, so more
will be coming soon.

Signed-off-by: Rich Megginson <rmeggins@redhat.com>
@richm richm force-pushed the branch-protection-rules branch from fc2358f to 88c0572 Compare March 10, 2023 18:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants