}

// Stop gracefully shuts down the WorkloadRule collector
func (c *WorkloadRuleCollector) Stop() error {
⚠️ Bug: Race: concurrent Stop() calls can double-close batchChan
The Stop() method can be called concurrently from the background goroutine (lines 127-136) and from an external caller. The batchChan nil-check-and-close at lines 187-191 is not protected by a mutex, so two goroutines can both see c.batchChan != nil and both attempt close(c.batchChan), causing a panic: close of closed channel.
Additionally, the informer's event handler (line 150) can still be executing c.batchChan <- ... after Stop() closes the channel, causing a panic: send on closed channel. There's no synchronization between the informer shutdown (async via stopCh) and the immediate close(c.batchChan).
Impact: Under graceful shutdown or context cancellation, the collector can panic and crash the process.
Suggested fix: Add a sync.Once to protect the channel close, or use a mutex around both the send and close paths. For example:
type WorkloadRuleCollector struct {
    // ... existing fields
    stopOnce sync.Once
}

func (c *WorkloadRuleCollector) Stop() error {
    c.stopOnce.Do(func() {
        close(c.stopCh)
        // wait briefly for informer handlers to drain
        close(c.batchChan)
    })
    if c.batcher != nil {
        c.batcher.stop()
    }
    return nil
}
Note: The WorkloadRecommendationCollector has the same issue, but that's pre-existing.
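The sync.Once suggestion guards the close but not the send path, so a late informer event could still hit a closed channel. A minimal sketch of the mutex alternative mentioned above follows. It assumes a hypothetical wrapper type, safeBatchChan, with a string payload and a non-blocking send; none of these names come from the PR, and the real batchChan element type and back-pressure policy may differ.

```go
package main

import (
	"fmt"
	"sync"
)

// safeBatchChan is a hypothetical wrapper (not part of the PR) that
// serializes sends against the close so neither operation can panic.
type safeBatchChan struct {
	mu     sync.RWMutex
	closed bool
	ch     chan string
}

func newSafeBatchChan(size int) *safeBatchChan {
	return &safeBatchChan{ch: make(chan string, size)}
}

// Send returns false instead of panicking once the channel is closed.
// It also returns false when the buffer is full, so it never blocks
// while holding the read lock.
func (s *safeBatchChan) Send(item string) bool {
	s.mu.RLock()
	defer s.mu.RUnlock()
	if s.closed {
		return false
	}
	select {
	case s.ch <- item:
		return true
	default:
		return false
	}
}

// Close is idempotent: only the first call closes the channel.
func (s *safeBatchChan) Close() {
	s.mu.Lock()
	defer s.mu.Unlock()
	if !s.closed {
		s.closed = true
		close(s.ch)
	}
}

func main() {
	s := newSafeBatchChan(4)
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			if i%2 == 0 {
				s.Close() // concurrent closes are safe
			} else {
				s.Send(fmt.Sprintf("event-%d", i)) // late sends are dropped
			}
		}(i)
	}
	wg.Wait()
	fmt.Println("send after close accepted:", s.Send("late"))
}
```

Dropping events when the buffer is full is a simplification chosen to keep the lock hold time bounded; a real collector might prefer to block with a timeout or select on stopCh instead.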
🔍 CI failure analysis for 3fddc6c: The build-installer CI check failed because dist/install.yaml was not regenerated after RBAC changes. Running make build-installer locally and committing the result will fix this.
Issue: The CI job failed because running …
Root Cause: The PR adds new RBAC permissions for the WorkloadRule CRD in …
Details: The CI detected this uncommitted line in …:
+ - workloadrules
This corresponds to the new RBAC rule added in …
c.logger.Info("Starting resources batcher for WorkloadRules")
c.batcher.start()

stopCh := c.stopCh
⚠️ Bug: Goroutine can re-enter Stop() causing double-close of batchChan
The background goroutine in Start() (lines 127-136) differs from the template WorkloadRecommendationCollector in a problematic way. When stopCh is closed by an external Stop() call, the goroutine checks ctx.Err() != nil and calls Stop() again (line 133). This creates a guaranteed re-entrant call path:
1. External caller invokes Stop() → closes stopCh, closes batchChan, sets it to nil
2. Goroutine receives on <-stopCh (line 131)
3. If context is also cancelled (ctx.Err() != nil), goroutine calls Stop() again (line 133)
4. Second Stop() reads c.batchChan as nil (set in step 1) and skips the close, so this specific path is safe
However, there's a race window: if Stop() is called externally and context cancels simultaneously, both the goroutine's ctx.Done() branch and the external caller run Stop() concurrently. Neither call is protected by a mutex, so both can see batchChan != nil, and both can attempt close(batchChan), causing a panic.
The WorkloadRecommendationCollector avoids this by simply returning when stopCh fires (line 154-155 of that file), without calling Stop() again. The fix is to either:
- Match the template pattern: remove the re-entrant Stop() call on the <-stopCh branch
- Add a sync.Once to guard the teardown logic in Stop()
Suggested fix:
stopCh := c.stopCh
go func() {
    select {
    case <-ctx.Done():
        _ = c.Stop()
    case <-stopCh:
        // Channel was closed by Stop() method
    }
}()
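The two options above can be combined. The sketch below is one possible shape, assuming a hypothetical, stripped-down collector type that models only the shutdown fields (the done channel and string payload are illustrative and not taken from the PR). The watcher calls Stop() only on the ctx.Done() branch, and sync.Once makes Stop() safe to call repeatedly and concurrently.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// collector is a hypothetical stand-in for WorkloadRuleCollector; only the
// shutdown-related fields are modeled here.
type collector struct {
	stopOnce  sync.Once
	stopCh    chan struct{}
	batchChan chan string
	done      chan struct{} // closed when the watcher goroutine exits
}

func newCollector() *collector {
	return &collector{
		stopCh:    make(chan struct{}),
		batchChan: make(chan string, 1),
		done:      make(chan struct{}),
	}
}

// Start launches the watcher. Only the ctx.Done() branch calls Stop();
// the stopCh branch simply returns, matching the template pattern.
func (c *collector) Start(ctx context.Context) {
	go func() {
		defer close(c.done)
		select {
		case <-ctx.Done():
			_ = c.Stop()
		case <-c.stopCh:
			// Stop() already ran; nothing left to do.
		}
	}()
}

// Stop may be called any number of times from any goroutine; the
// teardown inside Do runs exactly once.
func (c *collector) Stop() error {
	c.stopOnce.Do(func() {
		close(c.stopCh)
		close(c.batchChan)
	})
	return nil
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	c := newCollector()
	c.Start(ctx)

	// Cancel the context and call Stop() from several goroutines at once.
	cancel()
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = c.Stop()
		}()
	}
	wg.Wait()
	<-c.done

	_, open := <-c.batchChan
	fmt.Println("batchChan open:", open) // prints "batchChan open: false"
}
```

This sketch covers only the double-close; producers that may still send on batchChan after Stop() would need separate coordination, as noted in the first comment.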
[Title]
📚 Description of Changes
Provide an overview of your changes and why they’re needed. Link to any related issues (e.g., "Fixes #123"). If your PR fixes a bug, resolves a feature request, or updates documentation, please explain how.
What Changed:
(Describe the modifications, additions, or removals.)
Why This Change:
(Explain the problem this PR addresses or the improvement it provides.)
Affected Components:
(Which component does this change affect? - put x for all components)
Compose
K8s
Other (please specify)
❓ Motivation and Context
Why is this change required? What problem does it solve?
Context:
(Provide background information or link to related discussions/issues.)
Relevant Tasks/Issues:
(e.g., Fixes: #GitHub Issue)
🔍 Types of Changes
Indicate which type of changes your code introduces (check all that apply):
🔬 QA / Verification Steps
Describe the steps a reviewer should take to verify your changes:
- make test to verify all tests pass.
- make create-kind && make deploy.
✅ Global Checklist
Please check all boxes that apply: