Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an autoscaling group for the docs-rs-builder #243

Merged
merged 2 commits into from
Mar 7, 2023

Conversation

rylev
Copy link
Member

@rylev rylev commented Feb 16, 2023

This adds an autoscaling group for the docs-rs-builder.

Currently this works by grabbing the latest docs-rs-builder AMI, creating a launch template with that AMI, and then using that template to make an autoscaling group.

I'm not sure we want to actually deploy this way in the long term, but I think it's a good start for testing how the autoscaling group behaves in practice.

device_name = "/dev/sda1"

ebs {
volume_size = 64
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing the current filesystem usage (~100 GB) I would prefer something at least double this size.

( of course the current usage also includes the database & some web cache)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if we didn't need so much storage. I believe a large part of that storage is only needed during a single crate's build and afterwards can be deleted, no? Would it be possible to add some clean up to the builder process so that the filesystem usage doesn't grow so large?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm.. we are cleaning up after the build.

Plus the cleanup tasks for docker images, which are in cron right now.
( btw, cc @Nemo157 @jyn514 , these cronjobs would need to be configured in our ansible images too, right? )

Only looking at the above I could totally imagine to just try with the current definition above, let the builder build, and watch how much space is used. ( assuming the big docker image is configured?)

But, we're also planning on adding some build artifact caching: rust-lang/docs.rs#1757
( of course we could increase storage only then, when that feature is finished)

Copy link
Member

@Nemo157 Nemo157 Feb 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, we just have a daily cronjob (systemd-timer) running docker container prune --force && docker image prune --force (and cargo-sweep which shouldn't be necessary if we rebuild the image for a new version?).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is the cronjob currently configured? I can add this to the Ansible configuration (though I wouldn't block merging this).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's in

/etc/systemd/system/prune-disk-space.service
/etc/systemd/system/prune-disk-space.timer

@syphar
Copy link
Member

syphar commented Feb 16, 2023

two questions:

  • I assume the actual autoscaling is handled differently? I mean fetching the metric & scaling based on it?
  • also, metrics fetching from prometheus will also be added later?

@rylev
Copy link
Member Author

rylev commented Feb 20, 2023

I assume the actual autoscaling is handled differently?

Correct, the autoscaling currently is whatever the default is for ec2 instance health checks which I believe is super basic (e.g., if the instance goes into serviced mode). This is definitely not what we want, but I'd like to do that as a follow-up PR.

metrics fetching from prometheus will also be added later?

Correct

@rylev rylev requested a review from jdno February 20, 2023 10:23
@rylev rylev force-pushed the builder-autoscale branch from 68709ae to f4a394b Compare February 20, 2023 13:34
@rylev rylev force-pushed the builder-autoscale branch from f4a394b to 743d0e8 Compare February 21, 2023 09:03
Comment on lines +28 to +29
min_num_builder_instances = 1
max_num_builder_instances = 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused what the autoscaling does when you've pinned it to always be at one instance?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It assures that there's one healthy instance. So if one instance stops or gets terminated a new one boots.

@rylev rylev merged commit e83cf54 into rust-lang:master Mar 7, 2023
@rylev rylev deleted the builder-autoscale branch March 7, 2023 16:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants