
Adding support for capacity buffer #39

Merged
merged 18 commits into drone:master on Jul 30, 2019

Conversation

jones2026
Contributor

This is to enable the drone autoscaler to have standby capacity ready so you can have warm instances before scaling is needed. This will help avoid builds waiting on nodes to be provisioned and should not affect the normal operation of the autoscaler if you do not want to use this feature.

@tboerger or @bradrydzewski let me know if you have any concerns or suggestions

Contributor

@tboerger left a comment

I don't get why you changed the drone config, but at first glance that LGTM

@bradrydzewski
Member

bradrydzewski commented Jul 4, 2019

I would be ok with changing the default behavior so that when the autoscaler starts, it immediately creates the min number of servers set via DRONE_POOL_MIN. This seems to be how most people expect it to work anyway and it slightly simplifies the implementation by not adding a new configuration parameter.

engine/planner.go (outdated review thread, resolved)
@jones2026
Contributor Author

I would be ok with changing the default behavior so that when the autoscaler starts, it immediately creates the min number of servers set via DRONE_POOL_MIN. This seems to be how most people expect it to work and it (slightly) simplifies the implementation.

While this change might accomplish that as a side effect (I wasn't actually thinking of that when I made it), it is really meant to prevent builds from queuing by proactively spinning up a new instance ahead of time, so it is ready before builds are actually waiting. It won't stop all queuing, but it adds a buffer that I'm hoping removes the majority of it.

As I think this over, though, I could accomplish a similar outcome by increasing DRONE_POOL_MIN_AGE above the default; most of the queuing would then be concentrated in the morning, when builds historically start to ramp up, and should be reduced for the remainder of the day. Really, this change just enables the choice between being purely reactive to demand and being slightly proactive, with the fallback of still reacting when demand exceeds DRONE_STANDBY_CAPACITY.

@jones2026
Contributor Author

Switched the variable from DRONE_STANDBY_CAPACITY to DRONE_POOL_STANDBY_CAPACITY.

Thinking that might make it clearer what this is used for?

@jones2026
Contributor Author

I don't get why you changed the drone config, but at first glance that LGTM

@tboerger, sorry, I have a habit of running go fmt and drone fmt whenever I work on things, and that added the extra unwanted changes to this PR. I have reverted the files I didn't actually want to change.

@bradrydzewski
Member

@jones2026 ok I think I get it. Is the goal to enable always having a little extra capacity instead of the exact capacity? For example:

  • warm count is 1, min server count is 2, current demand is for 0 servers. 2 servers are provisioned
  • warm count is 1, min server count is 2, current demand is for 3 servers. 4 servers are provisioned (3 servers to meet current demand + 1 warm instance)
  • warm count is 1, max server count is 5, current demand is for 5 servers, 5 servers are provisioned (warm instance is ignored to prevent exceeding max)

Am I understanding correctly? And does this cover all the permutations? Let's make sure we have a unit test for each permutation as well.
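
For illustration only, here is a minimal sketch of how a capacity buffer could produce the counts in the examples above. The function name, parameters, and the concurrency-of-1 assumption are hypothetical and are not the actual engine/planner.go implementation.

```go
package main

import (
	"fmt"
	"math"
)

// planServers is an illustrative sketch, not the actual engine/planner.go
// logic: it turns pending demand plus a warm buffer into a server count,
// clamped to the configured pool minimum and maximum.
func planServers(demand, buffer, concurrency, min, max int) int {
	// Round the desired capacity (pending builds + buffer) up to whole servers.
	want := int(math.Ceil(float64(demand+buffer) / float64(concurrency)))

	// Never fall below the pool minimum.
	if want < min {
		want = min
	}
	// Never exceed the pool maximum, even if that drops the warm buffer.
	if want > max {
		want = max
	}
	return want
}

func main() {
	// Concurrency of 1 keeps capacity and server counts aligned with the
	// server-count examples above.
	fmt.Println(planServers(0, 1, 1, 2, 5)) // demand 0 + buffer 1, min 2 -> 2
	fmt.Println(planServers(3, 1, 1, 2, 5)) // demand 3 + buffer 1       -> 4
	fmt.Println(planServers(5, 1, 1, 2, 5)) // demand 5 + buffer 1, max 5 -> 5
}
```

With a concurrency greater than 1, the same calculation would apply to build capacity rather than raw server counts, which is the distinction discussed below.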

Thanks for the pull request, and just to let you know, I'm going to be traveling today and tomorrow so my replies may be delayed.

@jones2026
Contributor Author

jones2026 commented Jul 4, 2019

@bradrydzewski no worries, I honestly didn't expect any replies today!

That is exactly what I was going for, except the current implementation in my PR is around server capacity (i.e. concurrency * number of servers) and not the number of actual servers.

Do you think it would be clearer to switch it to number of warm server instances instead of spare capacity? (Once we decide whether we think it's best for standby servers or standby capacity I will add all the permutations to the test)
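
As a rough illustration of that distinction (the numbers and names below are hypothetical, not taken from this PR), a capacity-based buffer only rounds up to whole servers at planning time:

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	// Hypothetical values: each server runs 2 concurrent builds and the
	// buffer is expressed as 3 units of spare build capacity.
	concurrency := 2
	capacityBuffer := 3

	// A capacity buffer is more granular than a warm-server count: the 3
	// spare build slots only round up to whole servers when planning.
	warmServers := int(math.Ceil(float64(capacityBuffer) / float64(concurrency)))
	fmt.Println(warmServers) // prints 2
}
```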

@bradrydzewski
Member

do you think it would be clearer to switch it to number of warm server instances instead of spare capacity

Sorry for the delayed reply. I went back and forth on this. I think both approaches could work just fine. I think capacity is more granular and therefore makes more sense.

In terms of variable names, I think using something like DRONE_CAPACITY_BUFFER could be a good option. The DRONE_POOL_ variables deal with instance counts as opposed to capacity, which could cause some confusion.

I think overall this looks good. Once we have the additional unit tests in place we should be all set :)
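
For reference, here is a minimal sketch of how the new variable might be read, assuming the feature should stay off when DRONE_CAPACITY_BUFFER is unset; the helper below is hypothetical and is not the config loading merged in this PR.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// capacityBuffer is an illustrative sketch of reading DRONE_CAPACITY_BUFFER
// from the environment, defaulting to 0 so the buffer has no effect unless
// it is explicitly configured. It is not the autoscaler's actual config code.
func capacityBuffer() int {
	raw := os.Getenv("DRONE_CAPACITY_BUFFER")
	if raw == "" {
		return 0
	}
	n, err := strconv.Atoi(raw)
	if err != nil || n < 0 {
		return 0
	}
	return n
}

func main() {
	os.Setenv("DRONE_CAPACITY_BUFFER", "2")
	fmt.Println(capacityBuffer()) // prints 2
}
```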

@jones2026 changed the title from "Adding support for standby capacity" to "Adding support for capacity buffer" on Jul 28, 2019
@jones2026
Contributor Author

@bradrydzewski I updated the variable name and added tests for the other permutations you mentioned above. Let me know if you see any other issues.

@bradrydzewski merged commit e5184ab into drone:master on Jul 30, 2019
@bradrydzewski
Member

Thanks for this. I also updated the documentation accordingly:
https://autoscale.drone.io/reference/drone-capacity-buffer/
