Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move off rust-lang-ci #188

Open
1 of 8 tasks
Kobzol opened this issue Dec 19, 2024 · 16 comments
Open
1 of 8 tasks

Move off rust-lang-ci #188

Kobzol opened this issue Dec 19, 2024 · 16 comments

Comments

@Kobzol
Copy link

Kobzol commented Dec 19, 2024

We would like to get rid of the rust-lang-ci organization, which should no longer be needed:

  • We can now configure secrets on GH per branch, so we can configure secrets only for the auto and try branches, not for all CI.
  • We get 500 concurrent jobs across all repos in the rust-lang organization, so that should be enough to run all CI in a single org.
  • We no longer use self-hosted runners.

The following (probably non-exhaustive) list below tracks what needs to be done to get rid of rust-lang-ci:

This idea was discussed on Zulip.

@jdno jdno added this to infra-team Dec 20, 2024
@github-project-automation github-project-automation bot moved this to Backlog in infra-team Dec 20, 2024
@jdno jdno moved this from Backlog to Ready in infra-team Dec 20, 2024
@pietroalbini
Copy link
Member

A missing thing is that rustup also refers to the docker images built by rust-lang/rust's CI.

@Kobzol
Copy link
Author

Kobzol commented Jan 12, 2025

That's the same thing as Switch CI to use Docker ghcr.io caches from the rust-lang organization; rust's CI writes the image tag into a file stored at S3, and rustup downloads it. Whatever is written by Rust's CI into that file will determine from where rustup downloads the Docker images (be it rust-lang or rust-lang-ci's ghcr.io hub).

@Mark-Simulacrum
Copy link
Member

Just to check, even if we did use self-hosted runners I think those are also fine in a single org now (GitHub upstream made changes for that), right?

@pietroalbini
Copy link
Member

Just to check, even if we did use self-hosted runners I think those are also fine in a single org now (GitHub upstream made changes for that), right?

Yes. If the runners are placed in a runner group, it's possible to restrict which workflows can access them. We could then restrict them to rust-lang/rust/.github/workflows/ci.yml@auto to only allow them to execute ci.yml on the auto branch.

@marcoieni
Copy link
Member

marcoieni commented Feb 4, 2025

Configure secrets for the auto and try branches on rust-lang/rust.

It looks like I need to create a new GitHub Environment to configure the secret. Should I call this environment bors?

Make sure that the homu bors GitHub account has access to the auto and try branches on rust-lang/rust.

Added these two branch protection rules to rust-lang/rust:

Image

both have the following settings (which are the same of master, too):

Image

I checked this box in the first comment of this issue 👍

@Kobzol
Copy link
Author

Kobzol commented Feb 4, 2025

Couldn't it be a repo secret, as opposed to environment secret? But an env. secret is also fine, bors sounds like a reasonable name.

@marcoieni
Copy link
Member

I thought we wanted to configure these secrets only for the try and auto branches.

A repository secret can be read from any branch, right?

@Kobzol
Copy link
Author

Kobzol commented Feb 5, 2025

Right. Makes sense, environment it is, then :) I suppose that the lack of environment/per-branch secrets was one of the reasons why rust-lang-ci was needed in the first place, originally.

@marcoieni
Copy link
Member

I created the environment and added the first secret

Image

@marcoieni
Copy link
Member

Secrets

  • DATADOG_API_KEY: I copied it from the datadog dashboard. There were many API keys but I copied the one that was created when the rust-lang-ci secret was set.
  • TOOLSTATE_REPO_ACCESS_TOKEN: How do I create this token? There are some instructions here but they are probably outdated and I wonder if "public_repo" permission is still enough. The token is used to create issues here and write PR comments here. I found in 1password a "rust-toolstate 2021-02-22" token in the rust-highfive user. I can try to copy it.
  • there are also various aws related secrets which I'm investigating if I can read them somehow from terraform. If not, I will investigate what permissions they should have and create new ones. Of course let me know if you remember more details about the aws secrets 👍

@marcoieni
Copy link
Member

I found the iam users for the aws keys 👍

Image

I will create a new access key for these two users.
I'll delete old unused keys because we can have at maximum 2 keys per user.
For example, one key in the artifacts account was never used:

Image

@marcoieni
Copy link
Member

marcoieni commented Feb 6, 2025

I didn't realize that the keys I deleted where created from terraform in the simpleinfra/terraform/rustc-ci module. The keys deletion resulted in a terraform plan diff:


Terraform detected the following changes made outside of Terraform since the last "terraform apply" which may have affected this plan:

  # module.public.module.artifacts_user.aws_iam_access_key.ci has been deleted
  - resource "aws_iam_access_key" "ci" {
      - id                   = "AKIA46X5W6CZDRNFRBJK" -> null
      - secret               = (sensitive value) -> null
        # (4 unchanged attributes hidden)
    }

  # module.public.module.caches_user.aws_iam_access_key.ci has been deleted
  - resource "aws_iam_access_key" "ci" {
      - id                   = "AKIA46X5W6CZGPEWL7UN" -> null
      - secret               = (sensitive value) -> null
        # (4 unchanged attributes hidden)
    }


Unless you have made equivalent changes to your configuration, or ignored the relevant attributes using ignore_changes, the following plan may include actions to undo or
respond to these changes.

─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create
  ~ update in-place
-/+ destroy and then create replacement

Terraform will perform the following actions:

  # module.public.module.artifacts_user.aws_iam_access_key.ci will be created
  + resource "aws_iam_access_key" "ci" {
      + create_date                    = (known after apply)
      + encrypted_secret               = (known after apply)
      + encrypted_ses_smtp_password_v4 = (known after apply)
      + id                             = (known after apply)
      + key_fingerprint                = (known after apply)
      + secret                         = (sensitive value)
      + ses_smtp_password_v4           = (sensitive value)
      + status                         = "Active"
      + user                           = "ci--rust-lang--rust--artifacts"
    }

  # module.public.module.artifacts_user.aws_iam_user.ci will be updated in-place
  ~ resource "aws_iam_user" "ci" {
        id            = "ci--rust-lang--rust--artifacts"
        name          = "ci--rust-lang--rust--artifacts"
      ~ tags          = {
          - "AKIA46X5W6CZAWZRBC7B" = "Created by Marco when migrating from rust-lang-ci org. Stored in GitHub Actions environments secrets." -> null
        }
      ~ tags_all      = {
          - "AKIA46X5W6CZAWZRBC7B" = "Created by Marco when migrating from rust-lang-ci org. Stored in GitHub Actions environments secrets." -> null
        }
        # (4 unchanged attributes hidden)
    }

  # module.public.module.artifacts_user.github_actions_secret.aws_access_key_id must be replaced
-/+ resource "github_actions_secret" "aws_access_key_id" {
      ~ created_at      = "2021-10-17 14:40:34 +0000 UTC" -> (known after apply)
      ~ id              = "rust:ARTIFACTS_AWS_ACCESS_KEY_ID" -> (known after apply)
      ~ plaintext_value = (sensitive value) # forces replacement
      ~ updated_at      = "2021-10-17 14:40:34 +0000 UTC" -> (known after apply)
        # (2 unchanged attributes hidden)
    }

  # module.public.module.artifacts_user.github_actions_secret.aws_secret_access_key must be replaced
-/+ resource "github_actions_secret" "aws_secret_access_key" {
      ~ created_at      = "2021-10-17 14:40:32 +0000 UTC" -> (known after apply)
      ~ id              = "rust:ARTIFACTS_AWS_SECRET_ACCESS_KEY" -> (known after apply)
      ~ plaintext_value = (sensitive value) # forces replacement
      ~ updated_at      = "2021-10-17 14:40:32 +0000 UTC" -> (known after apply)
        # (2 unchanged attributes hidden)
    }

  # module.public.module.caches_user.aws_iam_access_key.ci will be created
  + resource "aws_iam_access_key" "ci" {
      + create_date                    = (known after apply)
      + encrypted_secret               = (known after apply)
      + encrypted_ses_smtp_password_v4 = (known after apply)
      + id                             = (known after apply)
      + key_fingerprint                = (known after apply)
      + secret                         = (sensitive value)
      + ses_smtp_password_v4           = (sensitive value)
      + status                         = "Active"
      + user                           = "ci--rust-lang--rust--caches"
    }

  # module.public.module.caches_user.github_actions_secret.aws_access_key_id must be replaced
-/+ resource "github_actions_secret" "aws_access_key_id" {
      ~ created_at      = "2021-10-17 14:40:34 +0000 UTC" -> (known after apply)
      ~ id              = "rust:CACHES_AWS_ACCESS_KEY_ID" -> (known after apply)
      ~ plaintext_value = (sensitive value) # forces replacement
      ~ updated_at      = "2021-10-17 14:40:34 +0000 UTC" -> (known after apply)
        # (2 unchanged attributes hidden)
    }

  # module.public.module.caches_user.github_actions_secret.aws_secret_access_key must be replaced
-/+ resource "github_actions_secret" "aws_secret_access_key" {
      ~ created_at      = "2021-10-17 14:40:32 +0000 UTC" -> (known after apply)
      ~ id              = "rust:CACHES_AWS_SECRET_ACCESS_KEY" -> (known after apply)
      ~ plaintext_value = (sensitive value) # forces replacement
      ~ updated_at      = "2021-10-17 14:40:32 +0000 UTC" -> (known after apply)
        # (2 unchanged attributes hidden)
    }

Plan: 6 to add, 1 to change, 4 to destroy.

I discussed this with JD.

Learnings

  • The GitHub CI was using the legacy keys. The once I deleted are the ones created from the gha-iam-user module

Questions

  • why the CI didn't switch to the new keys?
  • In the rust-lang-ci/rust GitHub Actions secrets we have four AWS_SECRET_ACCESS_KEY_{ID} secrets but it seems we only use two of them. Am I missing something and maybe we need multiple keys because we have different permissions associated to these keys? Why are we using the {ID} technique instead of using the secret ARTIFACTS_AWS_SECRET_ACCESS_KEY (which seems unused): imo it's a simpler approach.

Plan

Here's how we want to move forward:

  1. Leave the terraform module simpleinfra/terraform/rustc-ci dirty. I.e. don't touch it for now.
  2. Create a new terragrunt module where I copy paste the parts I need for rust-lang/rust (e.g. I don't need the difference between repo and source anymore). I will use rustc-ci--rust-lang--rust as a iam_prefix to avoid conflicts with existing resources.
  3. import the buckets into the terragrunt state (with terraform import). Instructions here.
  4. Apply the module to the bors-kindergarden repo to test if the access keys are applied correctly. Jakub, does it make sense to treat bors-kindergarden as the "staging" environment of the rust CI?
    EDIT: I'm not sure this makes sense as bors-kindergarden should test bors, not the entire rust ci (with the caches and artifacts).
  5. Apply the same module to rust-lang/rust
  6. Migrate rust-lang-ci
  7. Delete everything from the simpleinfra/terraform/rustc-ci module, except the s3 bucket from this module.

Wdyt?

@marcoieni
Copy link
Member

  • Create try-perf and perf-tmp branches/branch protections in rust-lang/rust.

Regarding this, what rules should I set? rust-lang-ci/rust doesn't have any branch protection rules.

Also, to create the branches, should I just git checkout -b try-perf && git push?

@Kobzol
Copy link
Author

Kobzol commented Feb 8, 2025

If memory serves me right, rust-lang/rust has rules that prohibit the creation of new branches. Therefore we need to explicitly create protections for branches that can be created, and then perhaps allow force-pushing and disallow deletion for them. I'm surprised that rust-lang-ci/rust doesn't have any branch protections though.

@Mark-Simulacrum Do you recall which branch protections we should configure for auto/try/try-perf/perf-tmp branches?

@Mark-Simulacrum
Copy link
Member

I don't, no, but I'd expect to be able to copy from rust-lang/rust? That has these branch protections right now (not pulling up all the details but @marcoieni has access to poke I imagine):

Image

My sense is that we'll need force-push allowed for bors' account(s) at minimum.

@Kobzol
Copy link
Author

Kobzol commented Feb 8, 2025

Well, we are migrating from rust-lang-ci to rust-lang/rust, so I assumed that we'll copy in the opposite direction 😅 But if rust-lang-ci doesn't have any branch protections, then probably we can just copy whatever try/auto has and use that for the try-perf branches.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Ready
Development

No branches or pull requests

4 participants