Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

executor: fix the issue that pb memory cannot be released quickly #10815

Merged
merged 8 commits into from
Jun 18, 2019

Conversation

qw4990
Copy link
Contributor

@qw4990 qw4990 commented Jun 14, 2019

What problem does this PR solve?

Fix the issue that pb memory cannot be released quickly.

What is changed and how it works?

In this PR(stmtctx.go:296), we let sc hold detail until the query finish to get more coptask's execution information.

Because detail is nested in copResponse, if we hold its pointer, the Golang GC scanner will regard copResponse as active, and will not release until the query finish.

Then another field pbResp held by copResonse will not be released quickly.

Since pbResp is used to buffer pb data, it results in more memory consumption.

Check List

Tests

@qw4990 qw4990 added type/bugfix This PR fixes a bug. sig/execution SIG execution labels Jun 14, 2019
@qw4990
Copy link
Contributor Author

qw4990 commented Jun 14, 2019

We need more tests for this case.
Do you have any suggestions about how to test this case?
@zz-jason @XuHuaiyu @lysu

@codecov
Copy link

codecov bot commented Jun 14, 2019

Codecov Report

Merging #10815 into master will increase coverage by 0.0074%.
The diff coverage is 53.8461%.

@@               Coverage Diff                @@
##             master     #10815        +/-   ##
================================================
+ Coverage   80.2889%   80.2964%   +0.0074%     
================================================
  Files           417        417                
  Lines         88453      88456         +3     
================================================
+ Hits          71018      71027         +9     
+ Misses        12231      12221        -10     
- Partials       5204       5208         +4

@codecov
Copy link

codecov bot commented Jun 14, 2019

Codecov Report

Merging #10815 into master will increase coverage by 0.0211%.
The diff coverage is 53.8461%.

@@               Coverage Diff               @@
##            master     #10815        +/-   ##
===============================================
+ Coverage   80.829%   80.8502%   +0.0211%     
===============================================
  Files          419        417         -2     
  Lines        88608      88262       -346     
===============================================
- Hits         71621      71360       -261     
+ Misses       11759      11693        -66     
+ Partials      5228       5209        -19

@zz-jason
Copy link
Member

We need more tests for this case.
Do you have any suggestions about how to test this case?
@zz-jason @XuHuaiyu @lysu

Can we check the two numbers of runtime.MemStats.HeapObjects before and after calling runtime.GC()?

@qw4990
Copy link
Contributor Author

qw4990 commented Jun 14, 2019

We need more tests for this case.
Do you have any suggestions about how to test this case?
@zz-jason @XuHuaiyu @lysu

Can we check the two numbers of runtime.MemStats.HeapObjects before and after calling runtime.GC()?

There are many unit tests running parallelly, so they may influence each other to let this unit test unstable.

@crazycs520
Copy link
Contributor

/run-all-tests

@lysu
Copy link
Contributor

lysu commented Jun 14, 2019

emm..parallel running is a problem...

maybe we can make an inuse assert util...

the idea is that after running just fetch a heap profile, using runtime.MemProfile, just stealing code from

https://github.com/golang/go/blob/cfbe3cfbeb6b2de561fc709b8644d4d7a4e182bb/src/runtime/pprof/pprof.go#L533

then just loop []runtime.MemProfileRecord

https://github.com/golang/go/blob/f9a4ae018d99c5afb0e4f128545ff26e01d7b498/src/runtime/pprof/protomem.go#L38

then if we can find any MemProfileRecord that:

  • stack contains current running test method name e.g. TestDistSqlLeak
  • locate line meet some condition?
  • and inuse > 0?

@lysu
Copy link
Contributor

lysu commented Jun 14, 2019

....but this case held memory is alloc in another goroutine...stack isn't start with TestDistSqlLeak 🤣

@qw4990
Copy link
Contributor Author

qw4990 commented Jun 17, 2019

After discussing with @XuHuaiyu and reviewing check.RunAll, we can use SerialSuites to let our UT be run serially.
Using SerialSuites and runtime.GC() can meet the test requirement.

@qw4990 qw4990 force-pushed the fix_mem branch 2 times, most recently from 2837656 to 4b356bf Compare June 17, 2019 13:15
@qw4990
Copy link
Contributor Author

qw4990 commented Jun 17, 2019

/run-all-tests

@qw4990
Copy link
Contributor Author

qw4990 commented Jun 17, 2019

A unit test has been added, PTAL~ @XuHuaiyu @lysu @zz-jason

// prepare data
totalSize := uint64(256 << 20) // 256MB
blockSize := uint64(8 << 10) // 8KB
delta := totalSize / 5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just an empirical value, used to judge if memory leak happens.
The difference between InUseHeap before and after a query must be less than it.

}
d, err := session.BootstrapSession(s.store)
c.Assert(err, IsNil)
d.SetStatsUpdating(true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to perform?

Copy link
Contributor Author

@qw4990 qw4990 Jun 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unnecessary code in testMemoryLeak has been removed now.

se.Close()
runtime.GC()
allocatedFinal, inUseFinal := s.readMem()
c.Assert(allocatedFinal-allocatedAfter, Less, delta)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we check allocatedFinal-allocatedAfter here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To ensure that these executors don't allocate much memory when closing.

runtime.GC()
allocatedAfter, inUseAfter := s.readMem()
c.Assert(allocatedAfter-allocatedBegin, GreaterEqual, totalSize)
c.Assert(s.memDiff(inUseAfter, inUseBegin), Less, delta)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to check s.memDiff(inUseAfter, inUseBegin) before line 86?

Copy link
Contributor Author

@qw4990 qw4990 Jun 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check ensures that the memory used in this query can be recycled by the GC scanner.
If we put this check before line 86, since we don't know if GC happens when processing this query, the value of s.memDiff(inUseAfter, inUseBegin) would be indeterminate.

@zz-jason zz-jason added the priority/release-blocker This issue blocks a release. Please solve it ASAP. label Jun 18, 2019
Copy link
Member

@zz-jason zz-jason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zz-jason zz-jason added the status/LGT1 Indicates that a PR has LGTM 1. label Jun 18, 2019
Copy link
Contributor

@XuHuaiyu XuHuaiyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority/release-blocker This issue blocks a release. Please solve it ASAP. sig/execution SIG execution status/LGT1 Indicates that a PR has LGTM 1. type/bugfix This PR fixes a bug.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants