Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[YUNIKORN-1793] Handle placement rule and queue changes during initialisation #601

Closed
wants to merge 5 commits into from

Conversation

craigcondit
Copy link
Contributor

What is this PR for?

Adds a tag to mark applications as forced create. This allows most validation to be suppressed and if necessary, will redirect the app to a recovery queue.

What type of PR is it?

  • - Bug Fix
  • - Improvement
  • - Feature
  • - Documentation
  • - Hot Fix
  • - Refactoring

Todos

  • - Task

What is the Jira issue?

https://issues.apache.org/jira/browse/YUNIKORN-1793

How should this be tested?

Screenshots (if appropriate)

Questions:

  • - The licenses files need update.
  • - There is breaking changes for older versions.
  • - It needs documentation.

@craigcondit craigcondit self-assigned this Jul 31, 2023
@codecov
Copy link

codecov bot commented Jul 31, 2023

Codecov Report

Merging #601 (63633a3) into master (a4b2ebb) will increase coverage by 0.10%.
Report is 2 commits behind head on master.
The diff coverage is 76.50%.

@@            Coverage Diff             @@
##           master     #601      +/-   ##
==========================================
+ Coverage   77.46%   77.56%   +0.10%     
==========================================
  Files          77       78       +1     
  Lines       12927    12977      +50     
==========================================
+ Hits        10014    10066      +52     
+ Misses       2593     2591       -2     
  Partials      320      320              
Files Changed Coverage Δ
pkg/scheduler/context.go 29.61% <0.00%> (ø)
pkg/scheduler/placement/rule.go 93.93% <ø> (ø)
pkg/scheduler/partition.go 78.03% <50.00%> (+0.19%) ⬆️
pkg/scheduler/placement/placement.go 81.66% <71.42%> (-1.55%) ⬇️
pkg/scheduler/placement/fixed_rule.go 77.94% <77.77%> (+0.32%) ⬆️
pkg/scheduler/placement/user_rule.go 78.57% <77.77%> (+0.38%) ⬆️
pkg/common/security/usergroup.go 84.44% <78.57%> (-1.27%) ⬇️
pkg/scheduler/placement/provided_rule.go 80.32% <80.00%> (+0.32%) ⬆️
pkg/scheduler/placement/tag_rule.go 80.59% <80.00%> (+0.29%) ⬆️
pkg/common/utils.go 89.41% <84.61%> (-0.36%) ⬇️
... and 4 more

... and 1 file with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

pkg/scheduler/partition.go Outdated Show resolved Hide resolved
@craigcondit craigcondit force-pushed the YUNIKORN-1793 branch 7 times, most recently from fdfde49 to 244d52c Compare August 3, 2023 17:19
@craigcondit
Copy link
Contributor Author

Updated approach to move the queue placement logic entirely within the placeholder manager and incorporated YUNIKORN-19 (always use placement manager). This simplifies the logic considerably as the placement manager is now always active and if it returns a queue, that queue is always to be created if it does not exist.

Copy link
Contributor

@wilfred-s wilfred-s left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A 'recovery' rule (not available in the configurable list) pointing to a fixed queue (cannot be created in the config) which has a fixed ACL matching to an invalid user that cannot be specified all based on a tag that the k8shim always sets based on the state of init will I think simplify this solution.

pkg/scheduler/objects/application.go Show resolved Hide resolved
pkg/scheduler/objects/queue.go Show resolved Hide resolved
if err != nil {
return fmt.Errorf("failed to create rule based queue %s for application %s", queueName, appID)
if common.IsRecoveryQueue(queueName) {
queue, err = pc.createRecoveryQueue()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think that the recovery queue create should ever fail. It needs to always be there or be created otherwise we are in the same situation with an unrecoverable application again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should not fail, however it's still possible (and is most definitely a bug if it does). Nothing configuration-related is checked, only sanity checks such as the root queue being provided properly.

pkg/common/security/usergroup.go Show resolved Hide resolved
pkg/scheduler/placement/provided_rule.go Outdated Show resolved Hide resolved
@craigcondit
Copy link
Contributor Author

Rebased on latest master.

…lisation

Adds a tag to mark applications as forced create. This allows most validation to be
suppressed and if necessary, will redirect the app to a recovery queue.
Refactored CreateRecoveryQueue / CreateDynamicQueue implementation
Recovery queue now always fails ACL checks so it can only be
submitted to if the recovery rule triggers it.
Copy link
Contributor

@pbacsko pbacsko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 LGTM (after Wilfred's comments are addressed)

@craigcondit
Copy link
Contributor Author

@wilfred-s I believe all the comments have been addressed. Can you do a final review?

@craigcondit craigcondit deleted the YUNIKORN-1793 branch August 30, 2023 15:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants