ci(disk): use an official GCP image on CI VMs for disk auto-resizing, make CI & CD disks 300GB #5371

Merged
@gustavovalverde merged 10 commits into main from fix-disk-repartitioning on Oct 16, 2022


@gustavovalverde (Member) commented Oct 10, 2022

Previous behavior

We've had issues in the past with resizing because the device was busy, for example:

```
e2fsck: Cannot continue, aborting.
/dev/sdb is in use.
```
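One way to detect the condition behind this error before attempting a manual resize is to try an exclusive open of the device, similar to the check the e2fsprogs tools perform. The Python sketch below is illustrative and not part of this PR; the helper name `device_is_busy` is hypothetical, and it assumes a Linux VM, where opening a block device with `O_EXCL` (without `O_CREAT`) fails with `EBUSY` while the device is mounted or held open by another process.

```python
import errno
import os


def device_is_busy(device_path: str) -> bool:
    """Return True if the block device cannot be opened exclusively.

    Hypothetical helper. On Linux, O_EXCL without O_CREAT on a block
    device fails with EBUSY while the device is mounted or held open
    exclusively by another process. Other errors (missing device,
    permission denied) are re-raised rather than treated as "busy".
    """
    try:
        fd = os.open(device_path, os.O_RDONLY | os.O_EXCL)
    except OSError as err:
        if err.errno == errno.EBUSY:
            return True
        raise
    # The exclusive open succeeded, so nothing else holds the device.
    os.close(fd)
    return False
```

On a CI VM this would be called as `device_is_busy("/dev/sdb")`, typically as root. With an official image handling the resize automatically, a pre-check like this would ideally no longer be needed in the pipeline.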

Depends-On: #5370
Fixes #5085

Expected behavior

We've been resizing the disk manually because this was not being done automatically. Using an official GCP public image would make resizing automatic, and official images also integrate better with other GCP services.

Configuration differences: https://cloud.google.com/compute/docs/images/os-details#notable-difference-debian

Solution

Why is this the best solution to the problem?

  • We've been using a custom image for VMs, which requires us to build tooling that GCP's official public images already include
  • Disk resizing is one example of why we may want to switch from our custom image to an official GCP public image

How are we going to test this?

  • Resize the disks and validate that CI still works

What manual testing has been done?

  • The disks have been resized in this PR

Review

Anyone from DevOps, but @teor2345 has worked with these changes before

Reviewer Checklist

  • Will the PR name make sense to users?
    • Does it need extra CHANGELOG info? (new features, breaking changes, large changes)
  • Are the PR labels correct?
  • Does the code do what the ticket and PR says?
  • How do you know it works? Does it have tests?

Solution:
- Use `debian-11` from the official public images https://cloud.google.com/compute/docs/images/os-details#debian
- Remove the manual disk resizing from the pipeline
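To illustrate how a VM request can point at the official `debian-11` image family with a 300 GB disk, here is a minimal Python sketch that builds the disk entry of a Compute Engine `instances.insert` request body. It assumes the image is used for the boot disk; the field names (`boot`, `autoDelete`, `initializeParams.sourceImage`, `initializeParams.diskSizeGb`) follow the Compute Engine REST API, while the helper `boot_disk_spec` is hypothetical. This is not the workflow's actual configuration, which this PR does not show.

```python
import json

# Image family from the official "debian-cloud" public image project.
# Requesting the family (rather than a pinned image name) creates the
# disk from the latest non-deprecated image in that family.
DEBIAN_11_FAMILY = "projects/debian-cloud/global/images/family/debian-11"


def boot_disk_spec(size_gb: int, source_image: str = DEBIAN_11_FAMILY) -> dict:
    """Build an AttachedDisk entry for a Compute Engine instances.insert body.

    Hypothetical helper, for illustration only.
    """
    return {
        "boot": True,
        "autoDelete": True,
        "initializeParams": {
            "sourceImage": source_image,
            # int64 fields are serialized as strings in the JSON API.
            "diskSizeGb": str(size_gb),
        },
    }


print(json.dumps(boot_disk_spec(300), indent=2))
```

Using the image family instead of a fixed image name means new Debian 11 point releases are picked up without editing the pipeline.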
@gustavovalverde gustavovalverde self-assigned this Oct 10, 2022
@github-actions github-actions bot added C-bug Category: This is a bug C-trivial Category: A trivial change that is not worth mentioning in the CHANGELOG labels Oct 10, 2022
@gustavovalverde gustavovalverde added A-infrastructure Area: Infrastructure changes A-devops Area: Pipelines, CI/CD and Dockerfiles P-High 🔥 I-integration-fail Continuous integration fails, including build and test failures and removed C-trivial Category: A trivial change that is not worth mentioning in the CHANGELOG labels Oct 10, 2022
@gustavovalverde gustavovalverde changed the base branch from revert-ssh-fix to main October 10, 2022 22:12
@teor2345 (Contributor)

I'm going to ask the new CI change questions from the retro:

  • how are we sure that this change is the best solution to the problem?
  • how are we going to test this?
  • what manual testing has been done?

Let's see if they help us make these changes work.

@gustavovalverde (Member, Author)

Personal note: consider the changes made here while testing: #5367 (comment)

@github-actions github-actions bot added the C-trivial Category: A trivial change that is not worth mentioning in the CHANGELOG label Oct 11, 2022
Some GCP disk images are 160 GB, so they could soon reach the current
200 GB disk size.
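The headroom arithmetic behind moving from 200 GB to 300 GB disks can be checked quickly. A minimal Python sketch, assuming the 160 GB image size quoted above is what gets written to the disk; `headroom` is a hypothetical helper.

```python
def headroom(disk_gb: int, image_gb: int) -> tuple[int, float]:
    """Return (free GB, free percentage) left after writing the image to the disk."""
    free_gb = disk_gb - image_gb
    return free_gb, 100 * free_gb / disk_gb


IMAGE_GB = 160  # GCP disk image size mentioned in this PR

for disk_gb in (200, 300):
    free_gb, free_pct = headroom(disk_gb, IMAGE_GB)
    print(f"{disk_gb} GB disk: {free_gb} GB free ({free_pct:.1f}%)")
# 200 GB disk: 40 GB free (20.0%)
# 300 GB disk: 140 GB free (46.7%)
```

At 200 GB, only 40 GB (20%) remains for the cached state to grow; at 300 GB, the margin rises to 140 GB.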
@gustavovalverde gustavovalverde marked this pull request as ready for review October 11, 2022 13:34
@gustavovalverde gustavovalverde requested a review from a team as a code owner October 11, 2022 13:34
@gustavovalverde gustavovalverde requested review from teor2345 and removed request for a team October 11, 2022 13:34
@gustavovalverde (Member, Author)

  • how are we sure that this change is the best solution to the problem?
  • how are we going to test this?
  • what manual testing has been done?

Added to the PR description

@teor2345 (Contributor)

Configuration differences: https://cloud.google.com/compute/docs/images/os-details#notable-difference-debian

Some things to note, no blockers:

IPv6 is enabled.

This might change sync speeds; it could make them faster.

The Unattended-upgrades package is installed and configured to download and install Debian security updates daily. This can be configured or disabled by changing the values in /etc/apt/apt.conf.d/50unattended-upgrades and /etc/apt/apt.conf.d/02periodic.

This is probably unnecessary. If it causes problems, it will mainly affect the full sync.

The SSH server configuration is set up as follows:
Root login is disabled.

This change should be fine if CI passes.

@teor2345 teor2345 changed the title ci(disk): use an official GCP image on CI VMs for disk auto-resizing ci(disk): use an official GCP image on CI VMs for disk auto-resizing, make CI & CD disks 300GB Oct 12, 2022
@teor2345 (Contributor) left a comment


This PR has about the same CI speed as before this change, both around 6 hours.

This PR:
https://github.com/ZcashFoundation/zebra/actions/runs/3227078141

main branch:
https://github.com/ZcashFoundation/zebra/actions/runs/3222101502

@gustavovalverde (Member, Author)

@Mergifyio refresh

@mergify bot (Contributor) commented Oct 13, 2022

refresh

✅ Pull request refreshed

@gustavovalverde (Member, Author)

Merging manually, as this has been stalled for a while.

@gustavovalverde gustavovalverde merged commit 25b46ea into main Oct 16, 2022
@gustavovalverde gustavovalverde deleted the fix-disk-repartitioning branch October 16, 2022 12:02
Labels
A-devops Area: Pipelines, CI/CD and Dockerfiles A-infrastructure Area: Infrastructure changes C-bug Category: This is a bug C-trivial Category: A trivial change that is not worth mentioning in the CHANGELOG I-integration-fail Continuous integration fails, including build and test failures
Linked issue
Increase GCP disk size to 300 GB

2 participants