Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(lambda): Prevent scale-up lambda from starting runner for user repo if org level runners is enabled #3909

Merged
merged 10 commits into from
Aug 16, 2024

Conversation

PerGon
Copy link
Contributor

@PerGon PerGon commented May 13, 2024

We run a setup where we only have "Org Level Runners" (enable_organization_runners: true).

When creating an action job in a repository that does not belong to an Organization (User level repository), that ScaleUp lambda will happily create a new VM for it.
Well, this new VM in turn, will never be able to register itself in GitHub because it cannot register itself as a "Org Level Runner" as such concept does not exist at the user level.
This VM will then become a "zombie" VM and will also cause issues with the ScaleDown lambda (described in ToBeCreatedIssue).

In this PR, we will pass the "Repo Owner Type" all the way from the Webhook lambda to the ScaleUp lambda (by adding a new field in the SQS payload). Then, the ScaleUp lambda can decide to drop the request if the "Repo Owner Type" is not of the type Organization and enable_organization_runners is set to true.

Copy link
Member

@npalm npalm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking the time crating the PR. What I don't understand why the user event is sent to the webook. The common way of using the module is installing repositories to an app.

For me it is not 100% clear what you mean with user space repo's. Are that forks of org repo's?

@@ -305,6 +316,11 @@ describe('scaleUp with GHES', () => {
expect(mockOctokit.paginate).toHaveBeenCalledTimes(1);
});

it('Throws error if it is a User repo and org level runners is enabled', () => {
process.env.ENABLE_ORGANIZATION_RUNNERS = 'true';
expect(() => scaleUpModule.scaleUp('aws:sqs', TEST_DATA_USER_REPO)).rejects.toBeInstanceOf(Error);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would prefer not to add a global event (module scope) for only a single test. Please can you update the event in the test to set the repoUserType

if (enableOrgLevel && payload.repoOwnerType !== 'Organization') {
logger.warn(`Repository ${payload.repositoryOwner}/${payload.repositoryName} does not belong to a GitHub` +
`organization and organization runners are enabled. This is not supported. Not scaling up for this event.`);
throw Error(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do not throw an error, this will send the message back to SQS. And trigger a retry.

@PerGon
Copy link
Contributor Author

PerGon commented May 22, 2024

Howdy! Thanks for having a look.

Sorry I was not very clear 😅 I'll try to elaborate!

Let's consider the following scenario with 2 different repos:

  1. github.my_enterprise.com/some_team/app1 - In this case we have a repo called app1 in one GitHub Org called some_team
  2. github.my_enterprise.com/john_doe/some_poc - In this case we have a repo called some_poc in a GitHub User called john_doe

In our setup we use a GitHub App. We also have plenty of GitHub Orgs in our enterprise - we tell the users who want to use our centrally maintained GHA Runners to install the specific GitHub App in their GitHub Orgs so they can use GHA (we have other CIs in our company 🙄 ) - in our internal documentation we clearly state "Do not install the GitHub App in your GitHub User account - ONLY install in GitHub Organizations".
But, even tho this is clearly stated in our internal docs, some engineers will eventually install the GitHub App for GHA runners in their GitHub User account - and once they try to create Action in their "personal" repos, it will cause the erratic behavior I described in OP.

As far as we can see, there's no way to prevent a GitHub App from being installed in GitHub User accounts 😢

The TL:DR; is that we have a self-service model where engineers can onboard their GitHub Orgs to use GHA Runners, but if they install the GitHub App in their own account (instead of their org) to try our GHA, it will cause issues.

Here's an example of two GitHub Apps installations, one that is OK and one that is not OK.

This is an OK use case

image

This is an NOT an OK use case

image

Sorry for the wall of text 😅 hopefully I could exemplify the case that causes issues. Let me know if it made things a bit more clear 😸 and also thanks for the comments in the code - will address them as soon as I have a few minutes!

@PerGon
Copy link
Contributor Author

PerGon commented May 30, 2024

I've addressed the comments on the code (I think 😅 ) - thanks for the review. Let me know how it looks and if you think more changes are needed 🙇

@npalm
Copy link
Member

npalm commented Jun 28, 2024

I was out of the office for some time, back. Will check the PR next week. Sorry for the delay.

@stuartp44 stuartp44 added enhancement New feature or request awaiting review labels Jul 31, 2024
@npalm
Copy link
Member

npalm commented Aug 6, 2024

Sorry the PR complete was out of my sight. Will catch up again. Sorry once again

@npalm npalm self-requested a review August 6, 2024 17:40
@npalm
Copy link
Member

npalm commented Aug 6, 2024

Please can you run in the meantime the command yarn run format from the lambdas dir to address the CI issue about formatting.

Copy link
Member

@npalm npalm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the great contribution. All looks good!

@npalm npalm merged commit 98b1560 into philips-labs:main Aug 16, 2024
3 of 4 checks passed
npalm pushed a commit that referenced this pull request Aug 16, 2024
🤖 I have created a release *beep* *boop*
---


##
[5.15.0](v5.14.1...v5.15.0)
(2024-08-16)


### Features

* add time zone support for pool schedules
([#4063](#4063))
([b8f9eb4](b8f9eb4))
@janslow
* scale up for long waiting jobs (job retry)
([#4064](#4064))
([6120571](6120571))


### Bug Fixes

* **lambda:** bump axios from 1.7.2 to 1.7.4 in /lambdas
([#4071](#4071))
([2f32195](2f32195))
* **lambda:** bump the aws group in /lambdas with 5 updates
([#4057](#4057))
([5ecdbad](5ecdbad))
* **lambda:** bump the aws-powertools group in /lambdas with 3 updates
([#4058](#4058))
([f9533f3](f9533f3))
* **lambda:** Prevent scale-up lambda from starting runner for user repo
if org level runners is enabled
([#3909](#3909))
([98b1560](98b1560))
@PerGon

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: forest-releaser[bot] <80285352+forest-releaser[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting review enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants