Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[📦 NEW] -m, --minimum-kill-interval #69

Closed
wants to merge 5 commits into from
Closed

[📦 NEW] -m, --minimum-kill-interval #69

wants to merge 5 commits into from

Conversation

andrei-pavel
Copy link
Contributor

Implements #67

It uses the time intervals introduced with the whitelisting feature. The way it works is once a kill time has been decided, an amount of time between the minimum kill interval before and after the decided kill time is blacklisted. This is required so that the two features can work together and from that point of view, the solution is quite elegant in my opinion. Note that these time intervals are used even when whitelists or blacklists are not used so whitelists don't have to be used in order for minimum kill interval to work.

However, it does still have one quirk. It will work as intended on the first sweep. But when a node is selected the second time for scheduling and from that time onwards, the respective blacklists will extraneously munch away from the active interval in which kill times could have been decided, eventually the interval reaching zero, moment at which it will panic saying that minimum kill interval is too high, even though that's not the case. I haven't tested this, I only know this theoretically.

The fix might be to schedule a removal of the blacklist interval when the node has died. But since the node might die from external factors (correct me if I'm wrong), this is not the best solution.

Otherwise, the best fix might be to look at all the nodes like @JorritSalverda said in the issue #67 (comment).

I'm unavailable for a few days, I'll sleep on it, I think this pull request is still viable for integration in master as is.

@andrei-pavel
Copy link
Contributor Author

Fixed accrual of blacklisting on consecutive processing of node groups. If minimum kill time interval proves too high for a node, reset and try again for the remaining nodes. if it happens again during the same node group, it means that miminum kill interval is indeed too high.

Please review.

Here's a stress test run.

$ ./scripts/all-in-one-test -w '12:00 - 17:00' -m '2h' -i 10

Annotations:        estafette.io/gke-preemptible-killer-state: 2020-03-03T14:31:54Z
CreationTimestamp:  Mon, 02 Mar 2020 21:34:37 +0200
Annotations:        estafette.io/gke-preemptible-killer-state: 2020-03-03T16:31:19Z
CreationTimestamp:  Mon, 02 Mar 2020 21:34:37 +0200
Annotations:        estafette.io/gke-preemptible-killer-state: 2020-03-03T12:27:27Z
CreationTimestamp:  Mon, 02 Mar 2020 21:34:37 +0200

@JorritSalverda JorritSalverda changed the base branch from master to main September 23, 2020 15:32
@mykhailosmolynets
Copy link

Hi, any updates when this change will be merged?

@andrei-pavel
Copy link
Contributor Author

This might have something to do with why no effort is put into MRs anymore:

#6 (comment)

moved away from preemptibles for the time being

sad face :(

we did abandon preemptibles for a while since the pressure on europe-west1 mounted and preemptions became more commonplace.

@andrei-pavel andrei-pavel closed this by deleting the head repository Oct 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants