[YUNIKORN-2907] Queue config processing log spew #1009

kousei47747 · 2024-12-26T09:53:15Z

What is this PR for?

Logs the shadow queue structure, which is created while validating a configuration update, at debug level and with a distinct message, to prevent log spew and avoid misleading information.

What type of PR is it?

What is the Jira issue?

https://issues.apache.org/jira/browse/YUNIKORN-2907

codecov · 2024-12-28T13:12:33Z

Codecov Report

Attention: Patch coverage is 40.35088% with 34 lines in your changes missing coverage. Please review.

Project coverage is 82.20%. Comparing base (a351764) to head (7476680).
Report is 2 commits behind head on master.

Files with missing lines	Patch %	Lines
pkg/scheduler/partition.go	31.11%	31 Missing ⚠️
pkg/scheduler/objects/queue.go	81.81%	2 Missing ⚠️
pkg/scheduler/context.go	0.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1009      +/-   ##
==========================================
- Coverage   82.34%   82.20%   -0.14%     
==========================================
  Files          97       97              
  Lines       15627    15674      +47     
==========================================
+ Hits        12868    12885      +17     
- Misses       2479     2510      +31     
+ Partials      280      279       -1

chia7712

@kousei47747 thanks for this patch. There is one small question remaining. Please take a look.

pkg/scheduler/partition.go

blueBlue0102 · 2024-12-29T06:40:26Z

pkg/scheduler/objects/queue.go

+	return queue, err
+}
+
+// NewConfiguredShadowQueue creates a new queue from scratch based on the configuration and logs at debug level


Hi @kousei47747, thank you for your contribution!

Could you provide more details about the differences between NewConfiguredQueue and NewConfiguredShadowQueue?
Currently, it appears the only difference is the log level, which may be confusing for those unfamiliar with shadow queues.

It would be helpful to explain when developers should use NewConfiguredQueue versus NewConfiguredShadowQueue.

@blueBlue0102 thanks for reviewing! I will provide a more detailed description in the new patch.

wilfred-s · 2024-12-31T03:37:31Z

I dropped a large comment into Jira to provide some guidance and point out another issue.

kousei47747 · 2024-12-31T04:56:08Z

@wilfred-s thanks for reviewing! I've replied on Jira; please take a look.

kousei47747 · 2024-12-31T17:51:14Z

I’ve updated the PR based on the guidance and noticed that NewPlacementManager also logs. Should it be silenced too?

pbacsko

See comments. I'm not sure about the current approach. But if you keep it, please make sure it's tested properly. PartitionContext does not have the best unit test coverage (at least direct coverage), but new code should definitely be covered.

pbacsko · 2025-01-23T12:26:51Z

pkg/scheduler/partition.go

@@ -79,7 +79,29 @@ type PartitionContext struct {
 	locking.RWMutex
 }

+// newPartitionContextForValidation initializes a shadow partition based on the configuration.
+// The shadow partition is used to validate the configuration, it is not used for scheduling.
+func newPartitionContextForValidation(conf configs.PartitionConfig, rmID string, cc *ClusterContext) (*PartitionContext, error) {


The *PartitionContext returned by this function is never used, so you might as well remove the return value and call the function validateConfiguration(). Unless, of course, you write unit tests that actually use it for some kind of verification. See other comments.

pbacsko · 2025-01-23T12:51:52Z

pkg/scheduler/partition.go

+	// We need to pass in the locked version of the GetQueue function.
+	// Placing an application will not have a lock on the partition context.
+	pc.placementManager = placement.NewPlacementManager(conf.PlacementRules, pc.GetQueue)
+	// get the user group cache for the partition
+	pc.userGroupCache = security.GetUserGroupCache("")
+	pc.updateNodeSortingPolicyForValidation(conf)
+	pc.updatePreemption(conf)


These lines are unnecessary here. No return values are checked, so whether they run or not is irrelevant.
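
Taken together, the two suggestions above (drop the unused return value, and skip setup steps whose results are never checked) could look roughly like this sketch. Everything in it is hypothetical and heavily simplified. `queueConfig` and `queue` stand in for `configs.QueueConfig` and `objects.Queue`, and the empty-name and duplicate-name checks are only illustrative placeholders for whatever the real queue construction can reject.

```go
package main

import "fmt"

// queueConfig is a hypothetical, simplified stand-in for configs.QueueConfig.
type queueConfig struct {
	Name   string
	Queues []queueConfig
}

// queue is a hypothetical, simplified stand-in for objects.Queue.
type queue struct {
	name     string
	children map[string]*queue
}

// buildQueue creates a queue and its children recursively. The empty-name and
// duplicate-name checks are illustrative only.
func buildQueue(conf queueConfig, parentPath string) (*queue, error) {
	if conf.Name == "" {
		return nil, fmt.Errorf("queue under %q has an empty name", parentPath)
	}
	path := conf.Name
	if parentPath != "" {
		path = parentPath + "." + conf.Name
	}
	q := &queue{name: path, children: map[string]*queue{}}
	for _, c := range conf.Queues {
		child, err := buildQueue(c, path)
		if err != nil {
			return nil, err
		}
		if _, dup := q.children[c.Name]; dup {
			return nil, fmt.Errorf("duplicate queue %q under %q", c.Name, path)
		}
		q.children[c.Name] = child
	}
	return q, nil
}

// validateConfiguration builds the hierarchy only to find out whether it can
// be built. Only the error is returned, and setup whose outcome is never
// checked (placement rules, user group cache, node sorting policy,
// preemption) is simply not run.
func validateConfiguration(root queueConfig) error {
	_, err := buildQueue(root, "")
	return err
}

func main() {
	good := queueConfig{Name: "root", Queues: []queueConfig{{Name: "a"}, {Name: "b"}}}
	bad := queueConfig{Name: "root", Queues: []queueConfig{{Name: "a"}, {Name: "a"}}}
	fmt.Println(validateConfiguration(good)) // <nil>
	fmt.Println(validateConfiguration(bad))  // duplicate queue "a" under "root"
}
```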

pbacsko · 2025-01-23T12:57:04Z

pkg/scheduler/partition.go

+	return err
+}
+
+func (pc *PartitionContext) addQueueInternal(conf []configs.QueueConfig, parent *objects.Queue, newQueueFn func(configs.QueueConfig, *objects.Queue) (*objects.Queue, error)) error {


Have you considered passing down a boolean flag, validate? All these duplicated methods seem a bit dubious to me. For example, passing a function pointer just to decide whether the code runs as a validation doesn't seem right.

NewConfiguredQueue() definitely has some callers, but it's not the end of the world. I don't know where others stand on this, but I'm in favor of a flag.
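
A rough sketch of the flag approach follows. The types are simplified stand-ins, and the `newConfiguredQueue`, `addQueues`, and `build` helpers are hypothetical. The flag is named `validate` after the suggestion above. The same boolean is passed down the recursive walk, so there is one constructor and one traversal instead of a duplicated pair selected through a function pointer. As in the other sketches, `log/slog` stands in for the project's logger.

```go
package main

import (
	"bytes"
	"fmt"
	"log/slog"
)

// queueConfig and queue are hypothetical stand-ins for configs.QueueConfig
// and objects.Queue.
type queueConfig struct {
	Name   string
	Queues []queueConfig
}

type queue struct {
	Name     string
	Children []*queue
}

// newConfiguredQueue builds a single queue and attaches it to its parent.
// validate marks a shadow queue built only to check the configuration: it is
// logged at debug level, while a live queue is logged at info level.
func newConfiguredQueue(logger *slog.Logger, conf queueConfig, parent *queue, validate bool) *queue {
	q := &queue{Name: conf.Name}
	if parent != nil {
		q.Name = parent.Name + "." + conf.Name
		parent.Children = append(parent.Children, q)
	}
	if validate {
		logger.Debug("shadow queue created", "queue", q.Name)
	} else {
		logger.Info("configured new queue", "queue", q.Name)
	}
	return q
}

// addQueues walks the config depth-first and hands the same flag down to
// every level, instead of receiving a constructor as a function pointer.
func addQueues(logger *slog.Logger, confs []queueConfig, parent *queue, validate bool) {
	for _, c := range confs {
		q := newConfiguredQueue(logger, c, parent, validate)
		addQueues(logger, c.Queues, q, validate)
	}
}

// build creates the whole hierarchy with a logger at the default (info)
// level and returns the root plus whatever was logged.
func build(conf queueConfig, validate bool) (*queue, string) {
	var buf bytes.Buffer
	logger := slog.New(slog.NewTextHandler(&buf, nil))
	root := newConfiguredQueue(logger, conf, nil, validate)
	addQueues(logger, conf.Queues, root, validate)
	return root, buf.String()
}

func main() {
	conf := queueConfig{Name: "root", Queues: []queueConfig{
		{Name: "a", Queues: []queueConfig{{Name: "b"}}},
	}}
	_, out := build(conf, true)
	fmt.Printf("validation build logged %d bytes at info level\n", len(out))
	root, out := build(conf, false)
	fmt.Println(root.Children[0].Children[0].Name) // root.a.b
	fmt.Print(out)
}
```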

@pbacsko Thanks for reviewing! I'm not sure about these method duplications either.

My first thought was, as you said, that NewConfiguredQueue() has some callers, and adding a flag would require all of them to pass an additional parameter. But after implementing these duplicated methods, the code feels redundant, and it turns out that more test cases need to be covered.

I'll try the flag approach. Thanks a lot, my confusion is now cleared!
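
If updating every caller of NewConfiguredQueue() is a concern, one common Go option is to keep the existing signature as a thin wrapper around a single flag-taking implementation. This sketch is hypothetical. Its two-argument `NewConfiguredQueue` is a simplified stand-in and not the real `objects.NewConfiguredQueue` signature, and `newConfiguredQueue`, `queueConfig`, and `queue` are illustrative names.

```go
package main

import (
	"errors"
	"fmt"
)

// queueConfig and queue are hypothetical stand-ins, not YuniKorn's types.
type queueConfig struct{ Name string }

type queue struct {
	Name   string
	Shadow bool
}

// newConfiguredQueue holds the single implementation; validate marks a
// shadow queue that is built only to check the configuration.
func newConfiguredQueue(conf queueConfig, parent *queue, validate bool) (*queue, error) {
	if conf.Name == "" {
		return nil, errors.New("queue name must not be empty")
	}
	name := conf.Name
	if parent != nil {
		name = parent.Name + "." + conf.Name
	}
	return &queue{Name: name, Shadow: validate}, nil
}

// NewConfiguredQueue keeps the old two-argument shape so existing callers
// compile unchanged; it always creates a live (non-shadow) queue.
func NewConfiguredQueue(conf queueConfig, parent *queue) (*queue, error) {
	return newConfiguredQueue(conf, parent, false)
}

func main() {
	root, _ := NewConfiguredQueue(queueConfig{Name: "root"}, nil)
	shadow, _ := newConfiguredQueue(queueConfig{Name: "a"}, root, true)
	fmt.Println(root.Name, root.Shadow)     // root false
	fmt.Println(shadow.Name, shadow.Shadow) // root.a true
}
```

The trade-off is an extra entry point to maintain; updating the callers directly, which the review considered acceptable, avoids it.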

[YUNIKORN-2907] Queue config processing log spew

64eed25

chenyulin0719 assigned kousei47747 Dec 28, 2024

chenyulin0719 self-requested a review December 28, 2024 13:11

chenyulin0719 requested a review from wilfred-s December 28, 2024 13:36

chia7712 reviewed Dec 28, 2024

pkg/scheduler/partition.go (outdated, resolved)

blueBlue0102 reviewed Dec 29, 2024

kousei47747 marked this pull request as draft December 29, 2024 08:27

Michael added 2 commits January 1, 2025 01:30

[YUNIKORN-2907] fix based on reviews

3f85580

[YUNIKORN-2997] remove redundant comment

7476680

kousei47747 marked this pull request as ready for review January 2, 2025 07:24

pbacsko requested changes Jan 23, 2025

kousei47747 marked this pull request as draft January 24, 2025 02:42
