From 57ea42d38c0211910ad71162423e2a0c562e4231 Mon Sep 17 00:00:00 2001 From: Natalie Somersall Date: Wed, 9 Aug 2023 13:08:22 -0600 Subject: [PATCH 1/4] remove admin-introduction in favor of website https://some-natalie.dev/blog/arch-guide-to-selfhosted-actions/ --- docs/admin-introduction.md | 127 +------------------------------------ 1 file changed, 2 insertions(+), 125 deletions(-) diff --git a/docs/admin-introduction.md b/docs/admin-introduction.md index a68fbbd..0f3b6f3 100644 --- a/docs/admin-introduction.md +++ b/docs/admin-introduction.md @@ -1,126 +1,3 @@ -# Admin's (draft) guide to implementing self-hosted GitHub Actions +# Architecture guide to implementing self-hosted GitHub Actions -## Introduction - -Audience - GitHub Enterprise administrators wanting to self-host compute for GitHub Actions, especially for [Enterprise Server](https://docs.github.com/en/enterprise-server@latest) (self-hosted) or [GitHub AE](https://docs.github.com/en/github-ae@latest) (dedicated, isolated SaaS). The guidance varies somewhat depending on which product, so any differences will be noted. If you're not one of those, you're still welcome! You might find helpful tips and tricks nonetheless. :tada: - -This piece takes a look at what this feature is, gives a quick overview of how it works, then goes through some key decisions you should think through as you set it up. A bunch of experience running this at scale went into this project, and opinions from that experience are noted in the last paragraph of each key decision on _why_ this problem is approached the way it is in this solution. - -We're _not_ covering the details of which Enterprise version you should be on or any future roadmap items. If that's of interest, reach out to the friendly [Sales](https://github.com/enterprise/contact) or [Support](https://enterprise.github.com/support) teams. - -### What's GitHub Actions? - -Glad you asked! 
You can learn all about it [here](https://docs.github.com/en/actions), but the tl;dr awesome video version is in [this YouTube video](https://www.youtube.com/watch?v=cP0I9w2coGU). It's a tool that can be used to automate all sorts of tasks otherwise done manually or locally, like: - -- Regression testing -- Deploying software -- Linting code -- Running security tools -- Git branch management and other chores -- Rewarding users with cat gifs (no, [really](https://github.com/ruairidhwm/action-cats)) -- Closing stale issues and pull requests ([link](https://github.com/actions/stale)) -- Integrating with pretty much any other thing that could ever possibly use GitHub -- ... and a lot more ... - -There's a whole [marketplace](https://github.com/marketplace?type=actions) full of building blocks of automation to use - over 12,000 of them as of March 2022. You can also [create your own](https://docs.github.com/en/actions/creating-actions) to further help robots do all the work. - -### Why self-hosted? - -GitHub provides hosted, managed runners that you can use out of the box - but only for users within GitHub.com. Information on features, hardware specs, and pricing for this compute can be found [here](https://docs.github.com/en/enterprise-cloud@latest/actions/using-github-hosted-runners/about-github-hosted-runners). They're super easy to use and offer a wide variety of software built-in, which can be customized as detailed [here](https://docs.github.com/en/enterprise-cloud@latest/actions/using-github-hosted-runners/customizing-github-hosted-runners). While great, the managed runners don't fit everyone's use case, so bring-your-own compute is also fully supported. It's a straightforward process to install the [runner agent](https://github.com/actions/runner) on the compute needed. 
Common reasons for choosing self-hosted runners include: - -- Custom hardware (like ARM processors or GPU-focused compute) -- Custom software beyond what's available or installable in the hosted runners -- You don't have the option to use the GitHub-managed runners because you are on [GitHub Enterprise Server](https://docs.github.com/en/enterprise-server@latest) or [GitHub AE](https://docs.github.com/en/github-ae@latest). -- Firewall rules won't allow access to/from the resources your jobs need -- Needing to run jobs in a specific environment such as "gold load" type imaged machines -- Because you _want_ to and I'm not here to judge that :) - -This means that you, intrepid Enterprise administrator, are responsible for setting up and maintaining the compute needed for this service. The [documentation](https://docs.github.com/en/actions/hosting-your-own-runners) to do this is fantastic. If you're used to running your own enterprise-wide CI system, GitHub Actions is probably easier than it seems. If you aren't, or are starting from scratch, it can be a bit daunting. That's where this guide comes in. The next section is all about some key decisions to make that will determine how to set up self-hosted compute for GitHub Actions. - ---- - -## Key decisions - -### Scaling - -How do you want or need to scale up? With the runners provided by GitHub, this is handled invisibly to users without any additional fiddling. Self-hosted runners don't have the same "magic hardware budgeting" out of the box. Some things to keep in mind: - -- **GitHub Actions are parallel by default.** This means that unless you specify "this job depends on that job", they'll run at the same time ([link](https://docs.github.com/en/actions/using-workflows/advanced-workflow-features#creating-dependent-jobs)). Jobs will wait in queue if there are no runners available. 
The balance to search for here is minimizing job wait time for users without having a ton of extra compute hanging out idle. Regardless of whether you're using a managed cloud provider or bare metal, efficient capacity management governs infrastructure costs. -- **Users can have multiple tasks kick off simultaneously.** GitHub Actions is event-driven, meaning that one event can start several processes. For example, by opening a pull request targeting the main branch, that user is proposing changes into what could be "production" code. This can and should start some reviews. Good examples of things that can start at this time include regression testing, security testing, code quality analysis, pinging reviewers in chat, updating the project management tool, etc. These can, but don't necessarily _need_ to, run in parallel. By encouraging small changes more often, these should run fairly quickly and frequently too, resulting in a faster feedback loop between development and deployment. However, it means that your usage can appear a bit "peaky" during work hours, with flexibility in job queuing. -- **GitHub Actions encourages use beyond your legacy CI system.** It can do more with less code defining your pipeline, users can provide all sorts of additional things for it to do, and it can even run scheduled shell scripts and other operations-centric tasks. These are all great things, but a project that used X minutes of runtime on Y platform may not linearly translate to the same usage in GitHub Actions. -- **Migrating to GitHub Actions can be a gradual transition.** The corollary to the above is that while the end state may be more compute than right now, it's a process to get a project to migrate from one system to another and then grow its usage over time as the project grows. Without external pressure like "we're turning off the old system on this date", it'll take a while for users to move themselves. 
Use this to your advantage to scale your infrastructure if you have long-lead tasks such as provisioning new servers or appropriating budget. - -:information_desk_person: **Opinion** - This is one of those cases where the balance between infrastructure costs and the time a user will spend waiting for a runner to pick up a job can really swing how they perceive the service. I went with Kubernetes to provide fast scaling of variable-spec compute on a wide variety of platforms. In the [example deployment](../deployments/README.md), each pod starts out pretty small, but can scale to a maximum size as needed. This means small tasks get small compute and bigger tasks (such as code security scans) will get bigger compute. The downside of the choice to use Kubernetes is that it's more complicated than other platform options, detailed in the next section. - -### Platform - -What platform do you want to run on? The runner agent for GitHub Actions works in modern versions of macOS, Windows, and most major distributions of Linux. This leaves a lot of flexibility in what the platform you offer your userbase looks like. The diagram below offers an overview of the options to consider. - -![Deployment options](https://d33wubrfki0l68.cloudfront.net/26a177ede4d7b032362289c6fccd448fc4a91174/eb693/images/docs/container_evolution.svg) - -**Bare metal** comes with the upside of simpler management for end-user software licenses or supporting specialized hardware. In a diverse enterprise user base, there is always a project or two that needs a GPU cluster or specialized Mac hardware attached to their organization or repository. Supporting this as an enterprise edge case is a good choice. However, it comes with the cost of owning and operating the hardware 24/7 even if it isn't in use that entire time. Since one runner agent corresponds to one job, an agent on a beefy machine will still only run one job to completion before picking up the next one. 
If the workloads are primarily targeted to the hardware provided, this isn't a problem, but it can be inefficient at an enterprise scale. - -**Virtual machines** are simple to manage using a wide variety of existing enterprise tooling at all stages of their lifecycle. They can be as isolated or shared across users as you'd like. Each runner is another VM to manage that isn't fundamentally different from existing CI build agents, web or database servers, etc. There are some community options to scale them up or down as needed, such as [Terraform](https://github.com/philips-labs/terraform-aws-github-runner) or [Ansible](https://github.com/MonolithProjects/ansible-github_actions_runner), if that's desired. The hypervisor that manages the VM infrastructure handles resource allocation in the datacenter, or it's magically handled by a public cloud provider such as Azure or AWS. - -**Kubernetes** provides a scalable and reproducible environment for containerized workloads. Declarative deployments and the ephemeral nature of pods used as runner agents create fewer "works on this agent and not that one" problems because configuration doesn't have time to drift. There are a lot of advantages to using Kubernetes (outlined [here](https://kubernetes.io/docs/concepts/overview/what-is-kubernetes/)), but it is more complicated and less widely-understood than the other options. A managed provider makes this much simpler to run at scale. - -:information_source: Some GitHub Actions ship as Dockerfiles, meaning the workload builds and runs in the container it defines. Whichever path is chosen here, a container runtime should be part of the solution if these jobs are required. For Kubernetes-based solutions, this could mean Docker-in-Docker, which this solution fully supports. - -:information_desk_person: **Opinion** - Whatever is currently in use is probably the best path forward. 
I hesitate to recommend a total infrastructure rebuild for a few more servers in racks, or VMs, or container deployments. Managed providers of VM infrastructure or Kubernetes clusters take away the hardware management aspect of this. This solution relies on Kubernetes and the [actions-runner-controller](https://github.com/actions-runner-controller/actions-runner-controller) community project. - -### Persistence - -How persistent or transient do you want the environment that is building the code to be? Should the state of one random job that runs on this machine/container affect any other random job? - -There's a lot to unpack here, so here's a helpful analogy: - -> A build environment is like a kitchen. You can make all sorts of food in a kitchen, not just the one dish that you want at any given time. If it's just you and some reasonable roommates, you can all agree to a shared standard of cleanliness. The moment one unreasonable houseguest cooks for the team and leaves a mess, it's a bunch of work to get things back in order (broken builds). There could also be food safety issues (code safety issues) when things are left to get fuzzy and gross. -> -> Imagine being able to snap your fingers and get a brand new identical kitchen at every meal - that's the power of ephemeral build environments. Now imagine being able to track changes to those tools in that kitchen to ensure the knives are sharp and produce is fresh - that's putting your build environment in some sort of infrastructure-as-code solution. - -The persistence here is somewhat independent of the platform chosen. Bare metal ephemeral runners are possible, but may require more effort than a solution based on virtual machines or containers. The _exact_ way this gets implemented depends a lot on the other parts and pieces of your unique platform. - -:information_desk_person: **Opinion** - The more ephemeral and version-controlled, the better! 
This solution uses containers that are used once, then redeployed from the specified container in a registry. In my experience, persistent environments tend to work alright for single projects and start to have problems when the project needs change. Persistence leads to configuration drift even with the best config management practices, meaning "it works on my machine" problems crop up and the work required to maintain everything doesn't always happen. - -### Compute design - -This decision depends a lot on how persistent or ephemeral the compute is and the particulars of the environment it lives in, but the goal here is to figure out how large or lean the environment is at runtime. - -- **Larger environments with lots of pre-loaded software decrease job execution time.** As the user base grows in size and diversity of needs (languages, tools, frameworks, etc.), having the more common things installed in the "base" image allows for faster job execution. If the compute is discarded and rebuilt after each job, this comes at the expense of bandwidth between the compute and the container registry, storage server, or wherever the "base" image comes from. -- **Persistent environments can have conflicting update needs.** When there's more software to manage, there's a bigger chance that updates conflict or configuration can drift. That doesn't mean this isn't the right choice for some projects, such as projects that need software that isn't able to be licensed in a non-persistent state. This can be mitigated somewhat by having persistent compute scoped to only the project(s) that need it. -- **Larger environments with lots of pre-loaded software increase the vulnerability surface.** If you're scanning the build environment, there are more things for it to alarm on in larger images. The validity of these alarms may vary based on tools used, software installed, etc. 
-- **Smaller ephemeral images that consistently pull in dependencies at setup increase bandwidth use.** A job that installs build dependencies every time it runs will download those every time. This isn't necessarily a bad thing, but keep in mind your upstream software sources (such as package registries) may rate-limit the entire source IP, which affects every project in use and not just the offending project. There are ways to mitigate this, including the use of a caching proxy or private registry. - -:information_desk_person: **Opinion** - This isn't a binary choice and can always change as the project/enterprise needs change. I wouldn't spend too much time on this, but have tended to prefer larger images with more things in them to minimize traffic out of the corporate network at the cost of bandwidth between the Kubernetes cluster and the private image registry that hosts the container images. - -### Compute scope - -GitHub Enterprise can have runners that are only available to an individual repository, all or select repositories within an organization, or enterprise-wide (detailed [here](https://docs.github.com/en/enterprise-server@latest/actions/hosting-your-own-runners/about-self-hosted-runners)). What is the ideal state for your company? - -:information_desk_person: **Opinion** - All of the above is likely going to happen with any sufficiently diverse user base, so let's make this as secure and easily governable as needed. Some teams will bring their own hardware and not want to share, which is reasonable, so they'll join their compute to only accept jobs from their code repositories. This also means that admins can do some networking shenanigans to allow only runners from X subnet to reach Y addresses to meet rules around isolation if needed. Likewise, as an enterprise-wide administrator, I wanted to make the most commonly-used Linux compute available and usable to most users for most jobs. 
This solution defaults to enterprise-wide availability, but will also demonstrate organization- or repository-specific compute. - -### Policy and compliance - -Is there any policy you need to consider while building this out? Examples could include scanning your containers/VMs/bare-metal machines with a security tool, having no critical vulnerabilities in production, project isolation, or standards from an industry body or government. - -:information_desk_person: **Opinion** - I don't know all the policies everywhere at all times, but I've always found it very helpful to gather these requirements up front and keep them in mind. When possible, I'll highlight security guidance. - ---- - -## Recommendations - -Here are a few general recommendations that don't fall neatly into the above, but were learned from experience: - -- Don't underestimate the power of enterprise-wide availability to drive adoption among users. Just like it's easy to use the GitHub-hosted compute, having a smooth and simple onboarding experience is great. Offering compute to users is a great "carrot" to keep shadow IT assets to a minimum. -- "Why not both?" is usually a decent answer. Once you get the hang of creating images and deployments for unique pools of containerized runners, it becomes low-effort to enable more distinct projects. -- Ephemeral compute is great and even better when you have diverse users/workloads. Each job gets a fresh environment, so no configuration drift or other software maintenance weirdness that develops over time. -- Docker-in-Docker for Kubernetes is hard, but valuable. It enables containerized workflows for GitHub Actions, so no one is "left out" or needs to check if this type of runner supports that type of Action. It's a better user experience. This solution includes it by default. -- Ship your logs somewhere. You can view job logs in GitHub and that's handy for developers to troubleshoot their stuff, but it's hard to see trends at scale there. 
We'll talk about this more in a later writeup. -- Everything is made better by a managed provider and Kubernetes doubly so. Kubernetes is super powerful and very extensible, but I wouldn't call it easy for anyone to pick up and DIY. -- Have a central-ish place for users to look for information. This could be wherever the rest of the documentation for your company lives. In this case, this repository has a [`README.md`](../README.md) file and uses documentation in the repository. - ---- - -## Next steps - -:boom: Ready for some scalable, Kubernetes-based ephemeral runners for GitHub Actions? Let's move to the [setup](admin-setup.md) guide! +Moved to From 88d205f22d67a669247cd5cde821c61021c60528 Mon Sep 17 00:00:00 2001 From: Natalie Somersall Date: Wed, 9 Aug 2023 13:09:57 -0600 Subject: [PATCH 2/4] moved docs --- docs/admin-customization.md | 20 +------------------- 1 file changed, 1 insertion(+), 19 deletions(-) diff --git a/docs/admin-customization.md b/docs/admin-customization.md index 38e2abe..1eacadb 100644 --- a/docs/admin-customization.md +++ b/docs/admin-customization.md @@ -1,21 +1,3 @@ # Quick guide to customizing your images -There are a couple images in the [images](../images/) folder that can be used as-is if you'd like. It's likely that some customization will need to happen over time and that's great! Here's how to do that: - -## Types of enterprise-y customizations - -There are a few types of customizations you can use within these runners. - -1. Adding software directly to the runner image. This adds to the size of the image, but should make it faster to execute the job. You can do this either by adding it to the appropriate `Dockerfile` line to install the default OS package (like [this](../images/ubuntu-focal.Dockerfile#L35)), or by creating a [script](../images/software/) and copying it to run within the `Dockerfile` at build time (like [this](../images/ubuntu-focal.Dockerfile#L67)). -1. 
Adding environment variables, such as custom proxy setups or package registries. By default, these images use an [`.env`](../images/.env) file that's loaded when the image starts. -1. Customizing the [deployments](../deployments/) changes the type of runners available to the users and if/how they scale. The standards and options available in these files are controlled by [actions-runner-controller](https://github.com/actions-runner-controller/actions-runner-controller). -1. Changing the labels on the runners allows your users to target specific deployments, so that their jobs only run on Ubuntu runners, for instance. Read more about this [here](https://docs.github.com/en/enterprise-cloud@latest/actions/hosting-your-own-runners/using-labels-with-self-hosted-runners). -1. Adding or editing the [enterprise allowlist](https://docs.github.com/en/enterprise-cloud@latest/admin/github-actions/getting-started-with-github-actions-for-your-enterprise/introducing-github-actions-to-your-enterprise) will change what's available from the [Actions marketplace](https://github.com/marketplace?type=actions) to your users. Read more about the considerations [here](https://docs.github.com/en/enterprise-cloud@latest/actions/security-guides/security-hardening-for-github-actions#using-third-party-actions). - -## Other things to think about - -These images are assumed to be tagged by release, so they should stay reasonably consistent. You can, if you choose, only use `latest` to tag your images and deploy them - but there are a lot of reasons this is a [bad idea](https://kubernetes.io/docs/concepts/configuration/overview/#using-labels). While it's fine for testing, it makes it difficult to track what's deployed into production and hard to roll back changes. - -As the images get customized to meet user needs, the version can/should change over time. 
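Since release-based tagging is easiest to see in a deployment file, here's a minimal sketch of what pinning a versioned runner image might look like with actions-runner-controller - the image name, tag, and enterprise slug below are hypothetical:

```yaml
# Hypothetical RunnerDeployment pinning a versioned runner image
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: ubuntu-focal
  namespace: runners
spec:
  replicas: 2
  template:
    spec:
      enterprise: your-enterprise-slug
      # a release tag, not `latest`, so what's deployed stays traceable
      image: ghcr.io/your-org/kubernoodles/ubuntu-focal:v0.9.6
```

Rolling back a bad image is then a one-line change back to the previous tag, followed by `kubectl apply`.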
- -Here's a great [hands-on lab](https://lab.github.com/githubtraining/create-a-release-based-workflow) to learn how to get started using release-based workflows. +Moved to [Creating custom images for actions-runner-controller](https://some-natalie.dev/blog/kubernoodles-pt-5/) From bd74a34e41ec32d242af552279930919ef3e188a Mon Sep 17 00:00:00 2001 From: Natalie Somersall Date: Wed, 9 Aug 2023 13:46:03 -0600 Subject: [PATCH 3/4] move more docs to where they're updated --- docs/README.md | 8 ++--- docs/admin-setup.md | 82 +-------------------------------------------- 2 files changed, 5 insertions(+), 85 deletions(-) diff --git a/docs/README.md b/docs/README.md index cc1c59d..d69e1e5 100644 --- a/docs/README.md +++ b/docs/README.md @@ -1,16 +1,16 @@ # Documentation -All of the documentation that isn't a README for a specific part of the project lives here. :) +All of the documentation that isn't a README for a specific part of the project is linked from here. ## Before you get started -The [admin introduction](admin-introduction.md) walks you through some key considerations on _how_ to think about implementing GitHub Actions at the enterprise scale, the implications of those decisions, and why this project is generally built out the way it is. +The [admin introduction](https://some-natalie.dev/blog/arch-guide-to-selfhosted-actions/) walks you through some key considerations on _how_ to think about implementing GitHub Actions at the enterprise scale, the implications of those decisions, and why this project is generally built out the way it is. ## Getting up and running -The [admin setup](admin-setup.md) is a mostly copy-and-paste exercise to get a basic deployment up and going. +The [admin setup](https://some-natalie.dev/blog/kubernoodles-pt-1/) is a mostly copy-and-paste exercise to get a basic deployment up and going. -The [customization](admin-customization.md) guide has a quick writeup and links to learn more about the ways you can customize things to your needs. 
+The [customization](https://some-natalie.dev/blog/kubernoodles-pt-5/) guide has a quick writeup and links to learn more about the ways you can customize things to your needs. ## Troubleshooting diff --git a/docs/admin-setup.md b/docs/admin-setup.md index 211909a..eedfb60 100644 --- a/docs/admin-setup.md +++ b/docs/admin-setup.md @@ -1,83 +1,3 @@ # Setup guide -Ready to set up some kubernoodles? - -## Pre-requisites - -You'll need a Kubernetes cluster already set up that GitHub can use. If you don't have one, or are just testing this out, try out [Docker Desktop](https://www.docker.com/products/docker-desktop). It can create a local Kubernetes cluster with a few clicks to get you on your way. - -Additionally, for **GitHub Enterprise Server**, you will need the following: - -- GitHub Enterprise Server 3.3 or later -- [Actions](https://docs.github.com/en/enterprise-server@latest/admin/github-actions/enabling-github-actions-for-github-enterprise-server) and [Packages](https://docs.github.com/en/enterprise-server@latest/admin/packages) are already set up and enabled - -:information_source: While Actions shipped in GHES 3.0, the later versions of actions-runner-controller specify that 3.3 is their minimum supported version. You may, if needed, want to move to an earlier version of actions-runner-controller. Upgrading to a later version of GHES is the better option though. :-) - -Here are the credentials we'll be generating for enterprise-wide runners: - -- A GitHub PAT with _only_ the `admin:enterprise` scope (for enterprise-wide runners) -- A GitHub PAT (or credentials for an alternative container registry) to pull the runner containers from the registry (in this case, we're using GitHub Packages) - -## Directions - -1. Install [Helm](https://helm.sh) to manage Kubernetes software. - - ```shell - curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 - sudo bash get_helm.sh - ``` - -1. 
Install [cert-manager](https://cert-manager.io) to generate and manage certificates. - - ```shell - kubectl create namespace cert-manager - helm repo add jetstack https://charts.jetstack.io - helm repo update - helm install cert-manager jetstack/cert-manager --namespace cert-manager --version v1.10.0 --set installCRDs=true - ``` - -1. Install [actions-runner-controller](https://github.com/actions-runner-controller/actions-runner-controller). - - ```shell - kubectl create namespace actions-runner-system - helm repo add actions-runner-controller https://actions-runner-controller.github.io/actions-runner-controller - helm repo update - helm install -n actions-runner-system actions-runner-controller actions-runner-controller/actions-runner-controller --version=0.21.1 - ``` - -1. Set the GitHub Enterprise URL, needed only for GitHub Enterprise Server or GitHub AE. - - ```shell - kubectl set env deploy actions-runner-controller -c manager GITHUB_ENTERPRISE_URL=https://YOUR-GHE-URL --namespace actions-runner-system - ``` - -1. Set a [personal access token](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token) for the runner controller to use. - - ```shell - kubectl create secret generic controller-manager -n actions-runner-system --from-literal=github_token=TOKEN-GOES-HERE - ``` - - :information_source: A personal access token with only the `admin:enterprise` scope is needed for enterprise-wide runner availability. There is a _ton_ of other ways to authenticate, detailed [here](https://github.com/actions-runner-controller/actions-runner-controller#setting-up-authentication-with-github-api). - -1. Create namespaces for the runners, one for production users and another (optionally) for testing the runners prior to making them available to users. - - ```shell - kubectl create namespace runners - kubectl create namespace test-runners - ``` - -1. 
Now give each namespace you created credentials for the private registry that hosts the runner image. - - ```shell - kubectl create secret docker-registry ghe -n runners --docker-server=https://docker.YOUR-GHE-URL --docker-username=SOME-USERNAME --docker-password=PAT-FOR-SOME-USERNAME --docker-email=EMAIL@GOES.HERE - ``` - - :information_source: In this case, we're using GitHub Packages on your instance of Enterprise Server or GitHub AE, but it doesn't _need_ to be that if your company has another registry already in place. This repository and the testing that's done all use the public images available in this repository. - -1. Now deploy one of the [deployments](../deployments), after editing it to attach to the correct enterprise/organization/repo, giving it the appropriate resources you'd like, and using the desired image. - - ```shell - kubectl apply -f ubuntu-focal.yml - ``` - -:tada: Now enjoy an awesome automation experience! :tada: +Moved to From 2ce9f63c3a2bc517ea23e75b7eac95cf25126c64 Mon Sep 17 00:00:00 2001 From: Natalie Somersall Date: Wed, 9 Aug 2023 14:04:33 -0600 Subject: [PATCH 4/4] update docs --- README.md | 25 ++++++++++--------------- 1 file changed, 10 insertions(+), 15 deletions(-) diff --git a/README.md b/README.md index b881be1..4d24f4a 100644 --- a/README.md +++ b/README.md @@ -1,16 +1,12 @@ # Kubernoodles -> **Warning** -> -> :warning: There's a lot of work going on upstream in actions-runner-controller right now that will change quite a bit of the recommendations and defaults in this repository! This is now using the private preview APIs and will not work as expected for users as-is right now. 
Read more about the upcoming changes [here](https://github.com/actions/actions-runner-controller/pull/2153) :warning: -> -> GHES and GHEC users, please navigate back to tag [v0.9.6](https://github.com/some-natalie/kubernoodles/tree/v0.9.6) ([release](https://github.com/some-natalie/kubernoodles/releases/tag/v0.9.6)) for the APIs that'll work for you. :heart: +> GHES users prior to 3.9, please navigate back to tag [v0.9.6](https://github.com/some-natalie/kubernoodles/tree/v0.9.6) ([release](https://github.com/some-natalie/kubernoodles/releases/tag/v0.9.6)) for the APIs that'll work for you. :heart: Kubernoodles is a framework for managing custom self-hosted runners for GitHub Actions in Kubernetes at the enterprise-wide scale. The design goal is to easily bootstrap a system where customized self-hosted runners update, build, test, deploy, and scale themselves with minimal interaction from enterprise admins and maximum input from the developers using it. This is an _opinionated_ reference implementation, designed to be taken and modified to your liking. I use this to test GitHub Actions on my personal account, [GitHub Enterprise Cloud](https://github.com) (SaaS) or [GitHub Enterprise Server](https://docs.github.com/en/enterprise-server@latest) (self-hosted) from Docker Desktop, a Raspberry Pi cluster for `arm64`, a managed Kubernetes provider, and other random platforms as needed. Your implementation may look wildly different, etc. -:question: Are you a GitHub Enterprise admin that's new to GitHub Actions? Don't know how to set up self-hosted runners at scale? Start [here](docs/admin-introduction.md)! +:question: Are you a GitHub Enterprise admin that's new to GitHub Actions? Don't know how to set up self-hosted runners at scale? Start [here](https://some-natalie.dev/blog/arch-guide-to-selfhosted-actions/)! Pull requests welcome! 
:heart: @@ -24,31 +20,30 @@ Moving data around locally is exponentially cheaper and easier than pulling data ## Setup -The [admin introduction](docs/admin-introduction.md) walks you through some key considerations on _how_ to think about implementing GitHub Actions at the enterprise scale, the implications of those decisions, and why this project is generally built out the way it is. +The [admin introduction](https://some-natalie.dev/blog/arch-guide-to-selfhosted-actions/) walks you through some key considerations on _how_ to think about implementing GitHub Actions at the enterprise scale, the implications of those decisions, and why this project is generally built out the way it is. -The [admin setup](docs/admin-setup.md) is a mostly copy-and-paste exercise to get a basic deployment up and going. +The [admin setup](https://some-natalie.dev/blog/kubernoodles-pt-1) is a mostly copy-and-paste exercise to get a basic deployment up and going. -The [customization](docs/admin-customization.md) guide has a quick writeup and links to learn more about the ways you can customize things to your needs. +The [customization](https://some-natalie.dev/blog/kubernoodles-pt-5) guide has a quick writeup and links to learn more about the ways you can customize things to your needs. [Tips and tricks](docs/tips-and-tricks.md) has a few more considerations if things aren't quite going according to plan. ## Choosing the image(s) -There are currently 4 images that are "prebuilt" by this project, although you can certainly use others or build your own! All images assume that they are deployed with `ephemeral: true` by actions-runner-controller. If you're copy/pasting out of the [deployments](deployments), you should be set ... provided you give it the right repository/organization/enterprise to use! +There are currently 3 images that are "prebuilt" by this project, although you can certainly use others or build your own! All images assume that they are ephemeral. 
If you're copy/pasting out of the [deployments](deployments), you should be set ... provided you give it the right repository/organization/enterprise to use! | image name | base image | virtualization? | sudo? | notes | | --- | --- | --- | --- | --- | -| ubuntu-focal | [ubuntu:focal](https://hub.docker.com/_/ubuntu) | rootful Docker-in-Docker | passwordless sudo | | -| podman | [podman/stable:v4](https://quay.io/repository/podman/stable?tab=tags) | rootless Podman-in-Podman | nope | based on Fedora ([Containerfile](https://github.com/containers/podman/tree/main/contrib/podmanimage)) | -| rootless-ubuntu-focal | [ubuntu:focal](https://hub.docker.com/_/ubuntu) | rootless Docker-in-Docker | nope | [common rootless problems](docs/tips-and-tricks.md#rootless-images) | -| ubuntu-jammy | [ubuntu:jammy](https://hub.docker.com/_/ubuntu) | rootful Docker-in-Docker | passwordless sudo | | +| ubi8 | [ubi8-init:8.7](https://catalog.redhat.com/software/containers/ubi8-init/5c6aea74dd19c77a158f0892) | :x: | :x: | n/a | +| ubi9 | [ubi9-init:9.2](https://catalog.redhat.com/software/containers/ubi9-init/6183297540a2d8e95c82e8bd) | :x: | :x: | n/a | +| rootless-ubuntu-jammy | [ubuntu:jammy](https://hub.docker.com/_/ubuntu) | rootless Docker-in-Docker | nope | [common rootless problems](docs/tips-and-tricks.md#rootless-images) | ## Sources These are all excellent reads and can provide more insight into the customization options and updates than are available in this repository. This entire repository is mostly gluing a bunch of these other bits together and explaining how/why to make this your own. - GitHub's official [documentation](https://docs.github.com/en/actions/hosting-your-own-runners) on hosting your own runners. -- Kubernetes controller for self-hosted runners, on [GitHub](https://github.com/actions-runner-controller/actions-runner-controller), is the glue that makes this entire solution possible. 
- Kubernetes controller for self-hosted runners, on [GitHub](https://github.com/actions/actions-runner-controller), is the glue that makes this entire solution possible.
- A Docker image for runners that join automatically, which solved a good bit of getting the runner agent started on each pod: [write-up](https://sanderknape.com/2020/03/self-hosted-github-actions-runner-kubernetes/) and [GitHub](https://github.com/SanderKnape/github-runner).
- GitHub's repository used to generate the hosted runners' images ([GitHub](https://github.com/actions/virtual-environments)), where I got the idea of using shell scripts to layer discrete dependency management on top of a base image. The [software](../images/software) scripts are (mostly) copy/pasted directly out of that repo.
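Although the step-by-step setup now lives on the blog, the docs removed above give the gist: create a pull secret so the runner namespace can fetch the image, then apply a deployment. A minimal sketch, assuming a namespace named `runners`, the `ubuntu-focal.yml` deployment from the older docs, and placeholder registry values (everything in UPPER-CASE is a stand-in you'd replace):

```shell
# Sketch of the two-step setup from the removed docs -- not tied to any
# particular registry. UPPER-CASE values are placeholders.

# 1. Let pods in the "runners" namespace pull the private runner image.
kubectl create secret docker-registry ghe -n runners \
  --docker-server=https://docker.YOUR-GHE-URL \
  --docker-username=SOME-USERNAME \
  --docker-password=PAT-FOR-SOME-USERNAME \
  --docker-email=EMAIL@GOES.HERE

# 2. Apply a runner deployment (edited first to point at the right
#    enterprise/organization/repository) and confirm the pods come up.
kubectl apply -f ubuntu-focal.yml
kubectl get pods -n runners
```

The same pattern applies to any of the files in [deployments](deployments); only the file name and namespace change.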