store/tikv: fix goroutine leak in gcworker #10622

Merged 2 commits on May 29, 2019

Conversation

tiancaiamao
Contributor

What problem does this PR solve?

Fix goroutine leak:

goroutine 28938475 [chan send, 485 minutes]:
github.com/pingcap/tidb/store/tikv.(*RangeTaskRunner).RunOnRange(0xc014fa6f40, 0x21eed60, 0xc00b3592c0, 0x3483a58, 0x0, 0x0, 0x3483a58, 0x0, 0x0, 0x0, ...)
/home/jenkins/workspace/build_tidb_release-3.0/go/src/github.com/pingcap/tidb/store/tikv/range_task.go:171 +0xd18
github.com/pingcap/tidb/store/tikv/gcworker.(*GCWorker).resolveLocks(0xc00b356e00, 0x21eed60, 0xc00b3592c0, 0x5abe9147da00000, 0x3, 0xc003fb53c0, 0x90)
/home/jenkins/workspace/build_tidb_release-3.0/go/src/github.com/pingcap/tidb/store/tikv/gcworker/gc_worker.go:691 +0x58d
github.com/pingcap/tidb/store/tikv/gcworker.(*GCWorker).runGCJob(0xc00b356e00, 0x21eed60, 0xc00b3592c0, 0x5abe9147da00000, 0x3)
/home/jenkins/workspace/build_tidb_release-3.0/go/src/github.com/pingcap/tidb/store/tikv/gcworker/gc_worker.go:425 +0xd8
created by github.com/pingcap/tidb/store/tikv/gcworker.(*GCWorker).leaderTick
/home/jenkins/workspace/build_tidb_release-3.0/go/src/github.com/pingcap/tidb/store/tikv/gcworker/gc_worker.go:288 +0xc6b

What is changed and how it works?

In the range task runner's RunOnRange function, many workers are created, and tasks are dispatched to them through taskCh.

When a worker hits its first error, it calls cancel, which makes all workers exit. The dispatch loop keeps writing to the channel, though, so RunOnRange blocks forever on taskCh <- task.

In this commit, the sender also checks <-ctx.Done(), so it stops sending in the error case.

Check List

Tests

  • Integration test
    Our schrodinger test platform found this leak, and it is easy to reproduce there.

Related changes

  • Need to cherry-pick to the release branch

@tiancaiamao tiancaiamao requested a review from MyonKeminta May 28, 2019 09:56
@tiancaiamao
Contributor Author

PTAL @MyonKeminta @disksing

@codecov

codecov bot commented May 28, 2019

Codecov Report

Merging #10622 into master will increase coverage by 0.0094%.
The diff coverage is 60%.

@@               Coverage Diff               @@
##             master    #10622        +/-   ##
===============================================
+ Coverage   77.6835%   77.693%   +0.0094%     
===============================================
  Files           413       413                
  Lines         87505     87560        +55     
===============================================
+ Hits          67977     68028        +51     
- Misses        14376     14378         +2     
- Partials       5152      5154         +2

@tiancaiamao
Contributor Author

/run-all-tests

@tiancaiamao
Contributor Author

PTAL @MyonKeminta @disksing

Contributor

@MyonKeminta MyonKeminta left a comment


LGTM. I should add some failpoint tests soon.

@MyonKeminta
Contributor

@disksing PTAL

select {
case taskCh <- task:
case <-ctx.Done():
	break
Contributor


I think we need to return err here.

Contributor Author

Contributor


got it :) LGTM!

@disksing disksing added the status/LGT2 Indicates that a PR has LGTM 2. label May 29, 2019
@disksing disksing merged commit 41f8612 into pingcap:master May 29, 2019
@tiancaiamao tiancaiamao deleted the gcworker-leak branch May 30, 2019 01:26
tiancaiamao added a commit to tiancaiamao/tidb that referenced this pull request Jun 3, 2019
Labels
status/LGT2 Indicates that a PR has LGTM 2. type/bugfix This PR fixes a bug.

3 participants