Fixing bug in chunk cache that can cause panic during shutdown #4398
Signed-off-by: Roger Steneteg <rsteneteg@ea.com>
Force-pushed from f546446 to 2e6b9a5.
Good catch! Are there any similar issues in other places where we do bounded parallelism?
One question before I approve.
lgtm
Force-pushed from 1a722ab to 46a1213.
Couple of thoughts
pkg/chunk/chunk_store_utils.go (outdated)
    select {
    case <-c.quit:
        return
    default:
Can this happen: `quit` is not closed, so we go to `default` and block on send. Then `quit` is closed and all workers exit. Now we are hung.
Should the chan send be brought up to the `select`, so either can proceed each time round the loop?
I was a bit unsure whether we could use both send and receive cases in the same `select`, but it seems to be OK since the channels are read/closed from other goroutines, so I moved the send up to a `case`.
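For readers following the thread, here is a minimal sketch of the pattern being discussed: the channel send and the quit receive sit in the same `select`, so a sender blocked on the work channel is released as soon as `quit` is closed. The `fetcher` type and its `enqueue` method are hypothetical stand-ins for illustration, not the code in this PR.

```go
package main

import "fmt"

// fetcher is a hypothetical stand-in for a component with background workers.
type fetcher struct {
	inputCh chan int      // work items consumed by workers
	quit    chan struct{} // closed on shutdown
}

// enqueue hands items to workers one at a time. The send and the quit
// receive are cases of the same select, so whichever is ready proceeds.
// A send that would otherwise block forever (no workers left) is abandoned
// once quit is closed. It returns the number of items actually sent.
func (f *fetcher) enqueue(items []int) int {
	sent := 0
	for _, it := range items {
		select {
		case <-f.quit:
			return sent
		case f.inputCh <- it:
			sent++
		}
	}
	return sent
}

func main() {
	f := &fetcher{inputCh: make(chan int), quit: make(chan struct{})}
	done := make(chan int)
	go func() { done <- f.enqueue([]int{1, 2, 3}) }()

	// Act as a worker for exactly one item.
	fmt.Println("received:", <-f.inputCh)

	// Shut down: nobody receives from inputCh any more, but the sender
	// still returns because the quit case becomes ready.
	close(f.quit)
	fmt.Println("sent before quit:", <-done)
}
```

With the earlier `default:` variant, the sender could pass the quit check, block on the send, and never look at `quit` again. Note also that the sender never closes `inputCh` here; shutdown is signalled only through `quit`, which avoids a send on a closed channel.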
Force-pushed from 46a1213 to 8b96709.
…on closed channel during stop Signed-off-by: Roger Steneteg <rsteneteg@ea.com>
Force-pushed from 8b96709 to 5a86c1a.
Thanks!
LGTM, thanks!
@@ -239,11 +257,15 @@ func (c *Memcached) Store(ctx context.Context, keys []string, bufs [][]byte) {

 // Stop does nothing.
 func (c *Memcached) Stop() {
-	if c.inputCh == nil {
+	if c.quit == nil {
[nit] `c.quit` is set in the new function and I can't see where we ever set it to `nil`, so I'm not sure we need this check.
Not a blocker, we can merge anyway.
If batch size or parallelism is off (set to 0), we return the cache before we set the quit/inputCh channels:
    if cfg.BatchSize == 0 || cfg.Parallelism == 0 { return c }
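To illustrate why the nil check matters, here is a simplified sketch. It assumes a constructor that returns early when batching is disabled, leaving `quit` as a nil channel; calling `close` on a nil channel panics in Go, so `Stop` has to guard against it. The `config`, `cache`, and `newCache` names are hypothetical and only loosely modelled on the snippet above.

```go
package main

import (
	"fmt"
	"sync"
)

// config and cache are hypothetical, simplified stand-ins.
type config struct {
	BatchSize   int
	Parallelism int
}

type cache struct {
	inputCh chan int
	quit    chan struct{}
	wg      sync.WaitGroup
}

func newCache(cfg config) *cache {
	c := &cache{}
	if cfg.BatchSize == 0 || cfg.Parallelism == 0 {
		// Batching disabled: no workers, quit and inputCh stay nil.
		return c
	}
	c.inputCh = make(chan int)
	c.quit = make(chan struct{})
	for i := 0; i < cfg.Parallelism; i++ {
		c.wg.Add(1)
		go func() {
			defer c.wg.Done()
			for {
				select {
				case <-c.quit:
					return
				case <-c.inputCh:
					// A real worker would process the item here.
				}
			}
		}()
	}
	return c
}

// Stop signals the workers to exit and waits for them to finish.
func (c *cache) Stop() {
	if c.quit == nil {
		// No workers were started; close(nil) would panic.
		return
	}
	close(c.quit)
	c.wg.Wait()
}

func main() {
	newCache(config{}).Stop()
	newCache(config{BatchSize: 10, Parallelism: 2}).Stop()
	fmt.Println("both caches stopped cleanly")
}
```

Without the guard, the first `Stop` call in `main` would panic, because the early return in the constructor leaves `quit` unset.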
Oh thanks! Didn't notice it.
I encountered a race warning at #4508, which seems to be a similar case; could you take a look at #4511, please, @rsteneteg?
…xproject#4398)
* fixing bug in chuck cache that can cause panic during shutdown
  Signed-off-by: Roger Steneteg <rsteneteg@ea.com>
* adding separate quit channel for stopping chunkfetcher to avoid send on closed channel during stop
  Signed-off-by: Roger Steneteg <rsteneteg@ea.com>
Signed-off-by: Alvin Lin <alvinlin@amazon.com>
Signed-off-by: Roger Steneteg <rsteneteg@ea.com>
What this PR does:
Fixes a bug in the chunk storage cache that can cause a panic in ingesters during shutdown
Which issue(s) this PR fixes:
Fixes #4397
Checklist
- `CHANGELOG.md` updated - the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`