Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix race when starting a service while the agent serviceManager is … #12302

Merged
merged 2 commits into from
Feb 10, 2022

Conversation

dhiaayachi
Copy link
Contributor

This fix a data race when we add a service while the service manager is stopping. A waitGroup Add is not safe to be called while waiting on the waitGroup.

WARNING: DATA RACE
Write at 0x00c000b4dd78 by goroutine 22:
  runtime.racewrite()
      <autogenerated>:1 +0x24
  github.com/hashicorp/consul/agent.(*ServiceManager).Stop()
      /home/circleci/project/agent/service_manager.go:54 +0x50
  github.com/hashicorp/consul/agent.(*Agent).ShutdownAgent()
      /home/circleci/project/agent/agent.go:1373 +0x1b8
  github.com/hashicorp/consul/agent.(*TestAgent).Shutdown()
      /home/circleci/project/agent/testagent.go:307 +0xe7
  github.com/hashicorp/consul/connect/proxy.TestAgentConfigWatcherSidecarProxy·dwrap·13()
      /home/circleci/project/connect/proxy/config_test.go:91 +0x39
  github.com/hashicorp/consul/connect/proxy.TestAgentConfigWatcherSidecarProxy()
      /home/circleci/project/connect/proxy/config_test.go:177 +0xed8
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1259 +0x22f
  testing.(*T).Run·dwrap·21()
      /usr/local/go/src/testing/testing.go:1306 +0x47

Previous read at 0x00c000b4dd78 by goroutine 213:
  runtime.raceread()
      <autogenerated>:1 +0x24
  github.com/hashicorp/consul/agent.(*serviceConfigWatch).start()
      /home/circleci/project/agent/service_manager.go:222 +0x3c4
  github.com/hashicorp/consul/agent.(*ServiceManager).AddService()
      /home/circleci/project/agent/service_manager.go:98 +0x379
  github.com/hashicorp/consul/agent.(*Agent).addServiceLocked()
      /home/circleci/project/agent/agent.go:2010 +0xca4
  github.com/hashicorp/consul/agent.(*Agent).AddService()
      /home/circleci/project/agent/agent.go:1997 +0x2e4
  github.com/hashicorp/consul/agent.(*HTTPHandlers).AgentRegisterService()
      /home/circleci/project/agent/agent_endpoint.go:1244 +0x1404
  github.com/hashicorp/consul/agent.(*HTTPHandlers).handler.func3()
      /home/circleci/project/agent/http.go:292 +0x65
  github.com/hashicorp/consul/agent.(*HTTPHandlers).wrap.func1()
      /home/circleci/project/agent/http.go:548 +0x1029
  github.com/hashicorp/consul/agent.(*HTTPHandlers).handler.func1.1()
      /home/circleci/project/agent/http.go:226 +0xcb
  net/http.HandlerFunc.ServeHTTP()
      /usr/local/go/src/net/http/server.go:2047 +0x4d
  github.com/NYTimes/gziphandler.GzipHandlerWithOpts.func1.1()
      /home/circleci/go/pkg/mod/github.com/!n!y!times/gziphandler@v1.0.1/gzip.go:287 +0x433
  net/http.HandlerFunc.ServeHTTP()
      /usr/local/go/src/net/http/server.go:2047 +0x4d
  net/http.(*ServeMux).ServeHTTP()
      /usr/local/go/src/net/http/server.go:2425 +0xc5
  github.com/hashicorp/go-cleanhttp.PrintablePathCheckHandler.func1()
      /home/circleci/go/pkg/mod/github.com/hashicorp/go-cleanhttp@v0.5.1/handlers.go:42 +0xc1
  net/http.HandlerFunc.ServeHTTP()
      /usr/local/go/src/net/http/server.go:2047 +0x4d
  github.com/hashicorp/consul/agent.(*wrappedMux).ServeHTTP()
      /home/circleci/project/agent/http.go:155 +0x69
  net/http.serverHandler.ServeHTTP()
      /usr/local/go/src/net/http/server.go:2879 +0x89a
  net/http.(*conn).serve()
      /usr/local/go/src/net/http/server.go:1930 +0x12e4
  net/http.(*Server).Serve·dwrap·87()
      /usr/local/go/src/net/http/server.go:3034 +0x58

Goroutine 22 (running) created at:
  testing.(*T).Run()
      /usr/local/go/src/testing/testing.go:1306 +0x726
  testing.runTests.func1()
      /usr/local/go/src/testing/testing.go:1598 +0x99
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1259 +0x22f
  testing.runTests()
      /usr/local/go/src/testing/testing.go:1596 +0x7ca
  testing.(*M).Run()
      /usr/local/go/src/testing/testing.go:1504 +0x9d1
  main.main()
      _testmain.go:57 +0x22b

Goroutine 213 (running) created at:
  net/http.(*Server).Serve()
      /usr/local/go/src/net/http/server.go:3034 +0x847
  github.com/hashicorp/consul/agent.newAPIServerHTTP.func1()
      /home/circleci/project/agent/apiserver.go:104 +0x78
  golang.org/x/sync/errgroup.(*Group).Go.func1()
      /home/circleci/go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57 +0x96
==================

@vercel vercel bot temporarily deployed to Preview – consul-ui-staging February 10, 2022 14:23 Inactive
@vercel vercel bot temporarily deployed to Preview – consul February 10, 2022 14:23 Inactive
Copy link
Contributor

@dnephin dnephin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! For anyone else wondering why this fixes the problem, I believe it's because the calls to the WaitGroup.Add all come from Agent.AddService which acquires this stateLock. So by moving the lock above serviceManager.Stop (where WaitGroup.Wait is called) we know that calls to the two WaitGroup methods will not race with each other.

https://pkg.go.dev/sync#WaitGroup.Add

@dhiaayachi dhiaayachi merged commit 4f0a71d into main Feb 10, 2022
@dhiaayachi dhiaayachi deleted the service-manager-race branch February 10, 2022 18:30
@hc-github-team-consul-core
Copy link
Contributor

🍒 If backport labels were added before merging, cherry-picking will start automatically.

To retroactively trigger a backport after merging, add backport labels and re-run https://circleci.com/gh/hashicorp/consul/579623.

@hc-github-team-consul-core
Copy link
Contributor

🍒 If backport labels were added before merging, cherry-picking will start automatically.

To retroactively trigger a backport after merging, add backport labels and re-run https://circleci.com/gh/hashicorp/consul/579667.

@hc-github-team-consul-core
Copy link
Contributor

🍒✅ Cherry pick of commit 4f0a71d onto release/1.11.x succeeded!

hc-github-team-consul-core pushed a commit that referenced this pull request Feb 10, 2022
#12302)

* fix race when starting a service while the agent `serviceManager` is stopping

* add changelog
@hc-github-team-consul-core
Copy link
Contributor

🍒✅ Cherry pick of commit 4f0a71d onto release/1.10.x succeeded!

hc-github-team-consul-core pushed a commit that referenced this pull request Feb 10, 2022
#12302)

* fix race when starting a service while the agent `serviceManager` is stopping

* add changelog
@hc-github-team-consul-core
Copy link
Contributor

🍒✅ Cherry pick of commit 4f0a71d onto release/1.9.x succeeded!

hc-github-team-consul-core pushed a commit that referenced this pull request Feb 10, 2022
#12302)

* fix race when starting a service while the agent `serviceManager` is stopping

* add changelog
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants