Unable to get AutoScaling work #726

Closed
1 of 2 tasks
vikas027 opened this issue Aug 14, 2021 · 6 comments
Labels
question Further information is requested

Comments

@vikas027

Describe the bug
I am new to the controller and have set up the hosted runners with custom images and IRSA on my EKS account. All works fine so far :). But I am not sure whether I understand how autoscaling works, or I probably have not set it up correctly.

I have created a HorizontalRunnerAutoscaler and pushed a few commits to a git repository to make sure a few jobs are queued. I was expecting the HRA to trigger autoscaling, but it didn't. Not sure what I am missing here :(

Checks

  • My actions-runner-controller version (v0.12.7) does support the feature
  • I'm using an unreleased version of the controller I built from HEAD of the default branch

To Reproduce
As per the documentation:

  • Created a GitHub Organization application
  • Enabled githubWebhookServer in the Helm chart
  • Created a RunnerDeployment (without replicas) and a HorizontalRunnerAutoscaler with a replica count
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: runner-deployment
spec:
  template:
    spec:
      serviceAccountName: github-actions-runner
      organization: my-org
      ephemeral: true
      image: 1111111111.dkr.ecr.us-east-1.amazonaws.com/github-actions-runner:latest
      dockerEnabled: false

---
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: runner-deployment-autoscaler
spec:
  scaleTargetRef:
    name: runner-deployment
  scaleDownDelaySecondsAfterScaleOut: 60
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: PercentageRunnersBusy
    scaleUpThreshold: '0.75'
    scaleDownThreshold: '0.3'
    scaleUpAdjustment: 2
    scaleDownAdjustment: 1
  scaleUpTriggers:
  - githubEvent:
      checkRun:
        types: ["created"]
        status: "queued"
    amount: 1
    duration: "5m"

Expected behaviour
I expected the controller to watch the job queue in the repository and scale up automatically.

Screenshots
If applicable, add screenshots to help explain your problem.

Environment (please complete the following information):

  • Controller Version: 0.19.0
  • Deployment Method: Helm
  • Helm Chart Version: 0.12.7

Additional context

  • All deliveries in the git repos are green.
@mumoshu
Collaborator

mumoshu commented Aug 15, 2021

@vikas027 Hey! I have a few points to understand your issue:

  • You shouldn't need /actions in https://github-runner-webhook.<my_domain>/actions/ as long as you don't have an ingress controller or the like in front of the webhook server
  • Are you sure you've checked Check Runs in your webhook settings page, so that the webhook server can receive check_run events from the webhook?
    • For verification, you can review the webhook history in the next tab of the webhook settings page
  • Are you sure you've configured --sync-period small enough? The controller recalculates the desired number of runner replicas only once per sync period.

The scale out performance is controlled via the manager containers startup --sync-period argument. The default value is set to 10 minutes to prevent default deployments rate limiting themselves from the GitHub API.
https://github.com/actions-runner-controller/actions-runner-controller#autoscaling

mumoshu added the question (Further information is requested) label Aug 15, 2021
@vikas027
Author

Hey @mumoshu ,

Thanks for the response. I have replied to each point below (and added a few questions as well) :)

  • You shouldn't need /actions in https://github-runner-webhook.<my_domain>/actions/ as long as you don't have an ingress controller or the like in front of the webhook server

I am using the Ambassador ingress controller and have mapped https://github-runner-webhook.<my_domain> to the Kubernetes service actions-runner-controller-github-webhook-server.actions-runner-system.svc.cluster.local. Do I need to add /actions?
Also, does the webhook need to be added to every repository, or at the organization level (https://github.com/organizations/<org_name>/settings/hooks)?

  • Are you sure you've checked Check Runs in your webhook settings page, so that the webhook server can receive check_run events from the webhook?

Oh, I was missing this. I had selected the default, "Just the push event", which I think does not include Check runs events.

  • For verification, you can review the webhook history in the next tab of the webhook settings page

I did review the history, it was all green with an HTTP 200 response.

  • Are you sure you've configured --sync-period small enough? The controller recalculates the desired number of runner replicas only once per sync period.

The scale out performance is controlled via the manager containers startup --sync-period argument. The default value is set to 10 minutes to prevent default deployments rate limiting themselves from the GitHub API.
https://github.com/actions-runner-controller/actions-runner-controller#autoscaling

Sorry, I did not get this completely. Does this mean that the actions runner controller will poll for the GitHub Actions queue every 10 minutes and start a runner if there are unallocated jobs? Suppose there are 5 outstanding jobs, will it run 5 more runners?

@mumoshu
Collaborator

mumoshu commented Aug 15, 2021

Do I need to add /actions?

Nope. The server doesn't have an /actions endpoint, so you don't need it.

Also, does the webhook need to be added to every repository, or at the organization level (https://github.com/organizations/<org_name>/settings/hooks)?

Either should work.

I did review the history, it was all green with an HTTP 200 response.

Okay. Now, try checking it once again after you have enabled Check runs in the webhook settings and pushed more commits. Also, make sure you look for historical webhook deliveries whose event header is check_run. You might have only seen 200 OKs for ping events.
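
To make the check above concrete, here is a minimal sketch of the filter that the checkRun trigger in the HorizontalRunnerAutoscaler describes (types: ["created"], status: "queued"). It is not the controller's code. The function name matches_scale_up_trigger is hypothetical, and it assumes the delivery's headers and JSON body are already parsed into dictionaries, with the header key spelled exactly X-GitHub-Event.

```python
from typing import Any, Mapping


def matches_scale_up_trigger(headers: Mapping[str, str],
                             payload: Mapping[str, Any]) -> bool:
    """Return True for a check_run delivery that is 'created' and 'queued'.

    Hypothetical helper for illustration only; not part of
    actions-runner-controller.
    """
    # GitHub names the event type in the X-GitHub-Event request header.
    # Assumption: the key is stored with exactly this spelling.
    if headers.get("X-GitHub-Event") != "check_run":
        return False
    # The check_run payload carries the action at the top level and the
    # status inside the nested check_run object.
    check_run = payload.get("check_run") or {}
    return (payload.get("action") == "created"
            and check_run.get("status") == "queued")
```

A ping or push delivery returns False here even though GitHub shows it as a green 200 OK, which is why an all-green history alone does not prove that check_run events are arriving.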

Sorry, I did not get this completely. Does this mean that the actions runner controller will poll for the GitHub Actions queue every 10 minutes and start a runner if there are unallocated jobs? Suppose there are 5 outstanding jobs, will it run 5 more runners?

Yes. The implementation details are more involved, but I believe you got it 99% correct.

So again, make it --sync-period 1m if you do need actions-runner-controller to autoscale in approx 1m after you've enqueued a lot of jobs, with PercentageRunnersBusy.

Webhook-based autoscaling should scale more quickly. But it is difficult to configure correctly if you have a lot of runner label combinations and runner groups to scale. It will be more complete and easier to use after #721 (and the corresponding GitHub feature releases).
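
As a rough illustration of what one sync period does with the PercentageRunnersBusy metric, here is a minimal sketch that uses the thresholds and adjustments from the HorizontalRunnerAutoscaler posted above. It is an assumption-laden simplification, not the controller's implementation: the function name desired_replicas is hypothetical, and the exact comparison operators at the thresholds are guesses.

```python
def desired_replicas(current: int, busy: int, *,
                     min_replicas: int = 1, max_replicas: int = 10,
                     scale_up_threshold: float = 0.75,
                     scale_down_threshold: float = 0.3,
                     scale_up_adjustment: int = 2,
                     scale_down_adjustment: int = 1) -> int:
    """Compute the replica count for one sync period (illustrative only)."""
    if current <= 0:
        # No runners yet: fall back to the configured minimum.
        return min_replicas
    busy_fraction = busy / current
    if busy_fraction >= scale_up_threshold:
        # Enough runners are busy: add a fixed number of replicas.
        target = current + scale_up_adjustment
    elif busy_fraction < scale_down_threshold:
        # Mostly idle: remove a fixed number of replicas.
        target = current - scale_down_adjustment
    else:
        target = current
    # Never leave the [min_replicas, max_replicas] range.
    return max(min_replicas, min(max_replicas, target))
```

Under these assumptions, a single busy runner gives desired_replicas(1, 1) == 3, so the deployment grows by scaleUpAdjustment per sync rather than by the number of queued jobs. That is one reason a shorter --sync-period makes scale-out feel faster.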

@vikas027
Author

vikas027 commented Aug 15, 2021

Thanks for all the help @mumoshu . Everything is working fine now :) . I had two problems: an extra /actions in the webhook URL and the missing check_run event in the webhook settings.

I have one last question regarding the GitHub App API rate limits. I am trying to figure out how these limits work so that I can fine-tune the sync-period appropriately.

According to the official documentation, the limit for server-to-server requests is 12,500, which is fine, but I can't make sense of the following.

Organization installations with more than 20 users receive another 50 requests per hour for each user. Installations that have more than 20 repositories receive another 50 requests per hour for each repository.

Why would GitHub make an API call per user and per repository? The API calls should come from the GitHub Webhook Server to my Kubernetes cluster, and maybe then to the repository only while running the jobs, shouldn't they? I don't understand why it needs to make an API call to every user and every repository.

@mumoshu
Collaborator

mumoshu commented Aug 15, 2021

@vikas027 Glad to hear it worked for you :)

I don't understand why it needs to make an API call to every user and every repository.

I believe you're misreading the GitHub doc here, if I'm correct. The doc seems to be saying that it will allow you 50 additional API calls per user above 20 users.
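
To illustrate that reading, here is a minimal arithmetic sketch. It takes the per-user and per-repository bonus as applying only to the count above 20, as described above, and treats 12,500 as a ceiling. The base of 5,000 requests per hour is an assumption that does not appear in this thread, and the function name estimated_hourly_limit is hypothetical. Check the current GitHub documentation before relying on any of these numbers.

```python
def estimated_hourly_limit(users: int, repositories: int, *,
                           base: int = 5000,     # assumed base limit, not from this thread
                           bonus: int = 50,      # extra requests per user or repository
                           threshold: int = 20,  # bonus applies above this count
                           cap: int = 12500) -> int:
    """Estimate an installation's hourly request allowance (illustrative only)."""
    # Only users and repositories beyond the threshold earn a bonus
    # under the reading used in this thread.
    extra_users = max(0, users - threshold)
    extra_repos = max(0, repositories - threshold)
    return min(cap, base + bonus * (extra_users + extra_repos))
```

For example, under these assumptions an organization with 30 users and 10 repositories would get 5000 + 50 * 10 = 5500 requests per hour. The extra users raise the allowance; they do not cause extra API calls.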

@vikas027
Author

I believe you're misreading the GitHub doc here, if I'm correct. The doc seems to be saying that it will allow you 50 additional API calls per user above 20 users

Yeah, could be :| . Anyways, I will close the issue. Thanks again for helping me out.


2 participants