[YUNIKORN-2907] Queue config processing log spew #1009
base: master
Conversation
Codecov Report

Attention: Patch coverage is below the project coverage.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1009      +/-   ##
==========================================
- Coverage   82.34%   82.20%    -0.14%
==========================================
  Files          97       97
  Lines       15627    15674       +47
==========================================
+ Hits        12868    12885       +17
- Misses       2479     2510       +31
+ Partials      280      279        -1

☔ View full report in Codecov by Sentry.
@kousei47747 thanks for this patch. There is one small question remaining. Please take a look.
pkg/scheduler/objects/queue.go (outdated)

	return queue, err
}

// NewConfiguredShadowQueue creates a new queue from scratch based on the configuration and logs at debug level
Hi @kousei47747, thank you for your contribution!
Could you provide more details about the differences between NewConfiguredQueue and NewConfiguredShadowQueue? Currently, it appears the only difference is the log level, which may be confusing for those unfamiliar with shadow queues. It would be helpful to explain when developers should use NewConfiguredQueue versus NewConfiguredShadowQueue.
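For readers unfamiliar with the patch, a minimal sketch of the shape under discussion, assuming the two constructors share all construction logic through a hypothetical newQueueInternal helper and differ only in the final log call (names other than NewConfiguredQueue and NewConfiguredShadowQueue are assumptions, not the actual patch):

```go
package objects

import (
	"github.com/apache/yunikorn-core/pkg/common/configs"
	"github.com/apache/yunikorn-core/pkg/log"

	"go.uber.org/zap"
)

// NewConfiguredQueue creates a queue from the configuration and logs the
// result at info level, as happens for real scheduler queues.
func NewConfiguredQueue(conf configs.QueueConfig, parent *Queue) (*Queue, error) {
	queue, err := newQueueInternal(conf, parent) // hypothetical shared helper
	if err == nil {
		log.Log(log.SchedQueue).Info("configured queue added to scheduler",
			zap.String("queueName", queue.QueuePath))
	}
	return queue, err
}

// NewConfiguredShadowQueue is identical except that it logs at debug level,
// since shadow queues are throwaway objects built only to validate config.
func NewConfiguredShadowQueue(conf configs.QueueConfig, parent *Queue) (*Queue, error) {
	queue, err := newQueueInternal(conf, parent) // hypothetical shared helper
	if err == nil {
		log.Log(log.SchedQueue).Debug("shadow queue created for config validation",
			zap.String("queueName", queue.QueuePath))
	}
	return queue, err
}
```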
@blueBlue0102 thanks for reviewing! I will provide more detailed description in the new patch.
I dropped a large comment into the jira to provide some guidance and point out another issue.
@wilfred-s thanks for reviewing! I've replied on jira, please take a look.
I've updated the PR based on the guidance and noticed that …
See comments. I'm not sure about the current approach. But if you keep it, please make sure it's tested properly. PartitionContext does not have the best unit test coverage (at least direct coverage), but new code should definitely be covered.
@@ -79,7 +79,29 @@ type PartitionContext struct {
	locking.RWMutex
}

// newPartitionContextForValidation initializes a shadow partition based on the configuration.
// The shadow partition is used to validate the configuration, it is not used for scheduling.
func newPartitionContextForValidation(conf configs.PartitionConfig, rmID string, cc *ClusterContext) (*PartitionContext, error) {
The return value *PartitionContext is never used from this function, so you might as well remove it and just call it validateConfiguration(). Unless, of course, you create unit tests which actually use it for some kind of verification... See other comments.
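If the return value is dropped as suggested, the helper could collapse to a thin wrapper along these lines (a sketch only; validateConfiguration is the name proposed above):

```go
// validateConfiguration builds a throwaway shadow partition purely to
// surface configuration errors; the partition object itself is discarded.
func validateConfiguration(conf configs.PartitionConfig, rmID string, cc *ClusterContext) error {
	_, err := newPartitionContextForValidation(conf, rmID, cc)
	return err
}
```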
// We need to pass in the locked version of the GetQueue function.
// Placing an application will not have a lock on the partition context.
pc.placementManager = placement.NewPlacementManager(conf.PlacementRules, pc.GetQueue)
// get the user group cache for the partition
pc.userGroupCache = security.GetUserGroupCache("")
pc.updateNodeSortingPolicyForValidation(conf)
pc.updatePreemption(conf)
These lines are unnecessary here. No return values are checked, so whether they run or not is irrelevant.
	return err
}

func (pc *PartitionContext) addQueueInternal(conf []configs.QueueConfig, parent *objects.Queue, newQueueFn func(configs.QueueConfig, *objects.Queue) (*objects.Queue, error)) error {
Have you considered passing down a boolean flag validate? All these method duplications - it's a bit dubious to me. For example, passing a function pointer to decide whether code runs as a validation or not just doesn't seem right. NewConfiguredQueue() definitely has some callers, but it's not the end of the world. I don't know where others stand on this, but I'm in favor of a flag.
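A rough sketch of the flag-based alternative (the silence name, the createQueue helper, and the exact signature are assumptions, not the actual patch; same package and imports as the sketch earlier in this thread):

```go
// NewConfiguredQueue with a validation flag instead of a separate
// shadow-queue constructor: one code path, two log levels.
func NewConfiguredQueue(conf configs.QueueConfig, parent *Queue, silence bool) (*Queue, error) {
	queue, err := createQueue(conf, parent) // hypothetical shared helper
	if err != nil {
		return nil, err
	}
	if silence {
		// shadow queues built during config validation: debug only
		log.Log(log.SchedQueue).Debug("shadow queue created",
			zap.String("queueName", queue.QueuePath))
	} else {
		log.Log(log.SchedQueue).Info("configured queue added to scheduler",
			zap.String("queueName", queue.QueuePath))
	}
	return queue, nil
}
```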
@pbacsko Thanks for reviewing! I'm not sure about these method duplications either.
My first thought was, as you said, that NewConfiguredQueue() has some callers, so adding a flag would require all of them to pass an additional parameter. But after implementing the duplicated methods, the code feels redundant, and it turns out that more test cases need to be covered.
I'll try the flag approach. Thanks a lot, my confusion is now cleared up!
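With a flag, the function-pointer parameter shown in the diff above could disappear as well; a speculative sketch of how the flag might thread through the recursion (the recursive structure is assumed from the diff, not taken from the patch):

```go
func (pc *PartitionContext) addQueueInternal(conf []configs.QueueConfig, parent *objects.Queue, silence bool) error {
	for _, queueConf := range conf {
		// the same flag decides the log level for every queue in the tree
		thisQueue, err := objects.NewConfiguredQueue(queueConf, parent, silence)
		if err != nil {
			return err
		}
		// recurse into the child queues with the flag unchanged
		if err = pc.addQueueInternal(queueConf.Queues, thisQueue, silence); err != nil {
			return err
		}
	}
	return nil
}
```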
What is this PR for?
Logs the shadow queue structure created during a configuration update at debug level, with a distinct message, to prevent log spew and avoid misleading information.
What type of PR is it?
What is the Jira issue?
https://issues.apache.org/jira/browse/YUNIKORN-2907