Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[YUNIKORN-2907] Queue config processing log spew #1009

Draft
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

kousei47747
Copy link

What is this PR for?

Logs the shadow queue structure created during configuration update at debug level with a different message to prevent log spew and avoid misleading information.

What type of PR is it?

  • - Bug Fix
  • - Improvement
  • - Feature
  • - Documentation
  • - Hot Fix
  • - Refactoring

What is the Jira issue?

https://issues.apache.org/jira/browse/YUNIKORN-2907

Copy link

codecov bot commented Dec 28, 2024

Codecov Report

Attention: Patch coverage is 40.35088% with 34 lines in your changes missing coverage. Please review.

Project coverage is 82.20%. Comparing base (a351764) to head (7476680).
Report is 2 commits behind head on master.

Files with missing lines Patch % Lines
pkg/scheduler/partition.go 31.11% 31 Missing ⚠️
pkg/scheduler/objects/queue.go 81.81% 2 Missing ⚠️
pkg/scheduler/context.go 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1009      +/-   ##
==========================================
- Coverage   82.34%   82.20%   -0.14%     
==========================================
  Files          97       97              
  Lines       15627    15674      +47     
==========================================
+ Hits        12868    12885      +17     
- Misses       2479     2510      +31     
+ Partials      280      279       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@chia7712 chia7712 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kousei47747 thanks for this patch. There is one small question remaining. Please take a look.

pkg/scheduler/partition.go Outdated Show resolved Hide resolved
return queue, err
}

// NewConfiguredShadowQueue creates a new queue from scratch based on the configuration and logs at debug level
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @kousei47747, thank you for your contribution!

Could you provide more details about the differences between NewConfiguredQueue and NewConfiguredShadowQueue?
Currently, it appears the only difference is the log level, which may be confusing for those unfamiliar with shadow queues.

It would be helpful to explain when developers should use NewConfiguredQueue versus NewConfiguredShadowQueue.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@blueBlue0102 thanks for reviewing! I will provide more detailed description in the new patch.

@kousei47747 kousei47747 marked this pull request as draft December 29, 2024 08:27
@wilfred-s
Copy link
Contributor

I dropped a large comment into the jira to provide some guidance and point out another issue.

@kousei47747
Copy link
Author

@wilfred-s thanks for reviewing! I've replied on jira, please take a look.

@kousei47747
Copy link
Author

I’ve updated the PR based on the guidance and noticed that NewPlacementManager also logs. Should it be silenced too?

@kousei47747 kousei47747 marked this pull request as ready for review January 2, 2025 07:24
Copy link
Contributor

@pbacsko pbacsko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments. I'm not sure about the current approach. But if you keep it, please make sure it's tested properly. PartitionContext does not have the best unit test coverage (at least direct coverage), but new code should definitely be covered.

@@ -79,7 +79,29 @@ type PartitionContext struct {
locking.RWMutex
}

// newPartitionContextForValidation initializes a shadow partition based on the configuration.
// The shadow partition is used to validate the configuration, it is not used for scheduling.
func newPartitionContextForValidation(conf configs.PartitionConfig, rmID string, cc *ClusterContext) (*PartitionContext, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return value *PartitionContext is never used from this function, so you might as well just remove it and just call it validateConfiguration(). Unless of course, you create unit tests which actually use it for some kind of verification... See other comments.

Comment on lines +147 to +153
// We need to pass in the locked version of the GetQueue function.
// Placing an application will not have a lock on the partition context.
pc.placementManager = placement.NewPlacementManager(conf.PlacementRules, pc.GetQueue)
// get the user group cache for the partition
pc.userGroupCache = security.GetUserGroupCache("")
pc.updateNodeSortingPolicyForValidation(conf)
pc.updatePreemption(conf)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These lines are unnecessary here. No return values are checked, so whether they run or not is irrelevant.

return err
}

func (pc *PartitionContext) addQueueInternal(conf []configs.QueueConfig, parent *objects.Queue, newQueueFn func(configs.QueueConfig, *objects.Queue) (*objects.Queue, error)) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered passing down a boolean flag validate? All these method duplications - it's a bit dubious to me. For example, passing a function pointer to run code whether it's a validation or not just doesn't seem right.

NewConfiguredQueue() definitely has some callers, but it's not the end of the world. I don't know where others stand on this, but I'm in favor of a flag.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pbacsko Thanks for reviewing! I'm not sure about these method duplications either.

My first thought is like you said that NewConfiguredQueue() has some callers, adding a flag would require all of them to include an additional parameter. But after implementing these method duplications, the code feels redundant, and it turns out that more test cases need to be covered.

I'll try the flag approach. Thanks a lot, my confusion is now cleared!

@kousei47747 kousei47747 marked this pull request as draft January 24, 2025 02:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants