Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GODRIVER-3340 Add a test for goroutine leaks. #1874

Merged
merged 2 commits into from
Nov 12, 2024

Conversation

matthewdale
Copy link
Collaborator

@matthewdale matthewdale commented Oct 23, 2024

GODRIVER-3340

Summary

  • Add a new submodule internal/test/goleak that tests for goroutine leaks.
  • Add a new build variant "Goroutine Leaks Test" that runs the new test.

Background & Motivation

Our usage of the zstd.Decoder pooled with a sync.Pool created the possibility of a gorutine leak when using the "zstd" compressor. We didn't catch it because we currently don't have any way to detect goroutine leaks. Add a test that uses github.com/uber-go/goleak to check for goroutine leaks.

@mongodb-drivers-pr-bot mongodb-drivers-pr-bot bot added dependencies Pull requests that update a dependency file priority-3-low Low Priority PR for Review labels Oct 23, 2024
Copy link
Contributor

mongodb-drivers-pr-bot bot commented Oct 23, 2024

API Change Report

No changes found!

@matthewdale matthewdale force-pushed the experiment-detect-goroutines branch 5 times, most recently from c8d11ec to e81225e Compare October 25, 2024 18:52
@matthewdale matthewdale marked this pull request as ready for review October 25, 2024 18:56
@matthewdale matthewdale force-pushed the experiment-detect-goroutines branch from e81225e to e339231 Compare October 25, 2024 19:07
@blink1073 blink1073 requested review from prestonvasquez and removed request for blink1073 October 28, 2024 09:55
}{
{
desc: "base",
opts: options.Client().ApplyURI(uri),
Copy link
Collaborator

@prestonvasquez prestonvasquez Nov 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ApplyURI is used in every test and if it's empty will default to localhost:27017. Suggest applying this value in the test runner:

opts := tc.opts
if u := os.Getenv("MONGODB_URI"); u != "" {
	opts.ApplyURI(u)
}

client, err := mongo.Connect(context.Background(), opts)
require.NoError(t, err)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern with that approach is that ApplyURI will overwrite the explicitly configured values if it's called after. However, I believe I can get the same effect using something like

base := options.Client()
if u := os.Getenv("MONGODB_URI"); u != "" {
	base.ApplyURI(u)
}
client, err := mongo.Connect(context.Background(), base, t.opts)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

// These can't be run in parallel because goleak currently can't filter
// out goroutines from other parallel subtests.
t.Run(tc.desc, func(t *testing.T) {
defer goleak.VerifyNone(t)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The uber-go/goleak documentation implies that we don't have to make this check both in the subtest and at the TestMain() level: https://github.com/uber-go/goleak#quick-start . If that's true, suggest removing TestMain().

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. I added it to catch goroutine leaks that I didn't know I was looking for, but it's never caught a goroutine leak that goleak.VerifyNone didn't catch, so I'm fine removing it.

We should definitely prioritize using goleak.VerifyNone in specific test cases because it both indicates which scenario(s) caused a goroutine leak and the output shows up clearly in the Evergreen test logs. goleak.VerifyTestMain does neither.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed TestMain.

_, err = coll.InsertOne(context.Background(), bson.M{"x": 123})
require.NoError(t, err)

for i := 0; i < 20; i++ {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the significance of 20? Should we constantize this value to give it context?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

20 is "enough to cause a noticeable problem", so it doesn't have any special significance. However, note that we've only detected one type of goroutine leak with this test so far, so it's possible that we'll need to change the test scenario, duration, etc to catch future goroutine leaks. I'll leave a comment about the significance and limitations of the overall test loop related to detecting goroutine leaks. Ideally, we should actually be testing for goroutine leaks in our existing test suites, but that has proved somewhat difficult to implement.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up removing the sleep, increased the iterations to 50, moved the InsertOne into the loop, added a ChangeStream simulated workload, and added some operations that intentionally time out. I added a comment describing the intent of the different operations.

var res bson.D
err = coll.FindOne(context.Background(), bson.D{}).Decode(&res)
require.NoError(t, err)
time.Sleep(50 * time.Millisecond)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add a comment to describe what we are awaiting?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some reason, adding the sleep increases the number of goroutines that leak when using "zstd" compression. I will add a comment describing the current purpose for sleeping there.

@matthewdale matthewdale merged commit c3448e6 into mongodb:v1 Nov 12, 2024
34 checks passed
@matthewdale
Copy link
Collaborator Author

drivers-pr-bot please backport to master

Copy link
Contributor

Sorry, unable to cherry-pick to master, please backport manually. Here are approximate instructions:

  1. Checkout backport branch and update it.
git checkout -b cherry-pick-master-c3448e685952c728a60ac002bd8c3df36c8b899b master

git fetch origin c3448e685952c728a60ac002bd8c3df36c8b899b
  1. Cherry pick the first parent branch of the this PR on top of the older branch:
git cherry-pick -x -m1 c3448e685952c728a60ac002bd8c3df36c8b899b
  1. You will likely have some merge/cherry-pick conflicts here, fix them and commit:
git commit -am {message}
  1. Push to a named branch:
git push origin cherry-pick-master-c3448e685952c728a60ac002bd8c3df36c8b899b
  1. Create a PR against branch master. I would have named this PR:

"GODRIVER-3340 Add a test for goroutine leaks. (#1874) [master]"

matthewdale added a commit to matthewdale/mongo-go-driver that referenced this pull request Nov 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dependencies Pull requests that update a dependency file priority-3-low Low Priority PR for Review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants