This repository has been archived by the owner on Jan 3, 2023. It is now read-only.

Data race in subnet manager's exchange protocol #235

Closed
dnkolegov opened this issue Aug 4, 2022 · 4 comments · Fixed by #234

dnkolegov commented Aug 4, 2022

Describe the Bug

Data race: read at eudico/chain/consensus/hierarchical/subnet/submgr/exchange.go:62 and write at eudico/chain/consensus/hierarchical/subnet/submgr/exchange.go:69

Logging Information

==================
WARNING: DATA RACE
Read at 0x00c0051b1350 by goroutine 1318:
  runtime.mapaccess1_faststr()
      /opt/homebrew/Cellar/go/1.18.2/libexec/src/runtime/map_faststr.go:13 +0x41c
  github.com/filecoin-project/lotus/chain/consensus/hierarchical/subnet/submgr.(*Subnet).helloBack()
      /Users/alpha/Projects/eudico/chain/consensus/hierarchical/subnet/submgr/exchange.go:62 +0x1c8
  github.com/filecoin-project/lotus/chain/consensus/hierarchical/subnet/submgr.(*Subnet).handleHelloStream()
      /Users/alpha/Projects/eudico/chain/consensus/hierarchical/subnet/submgr/exchange.go:193 +0x634
  github.com/filecoin-project/lotus/chain/consensus/hierarchical/subnet/submgr.(*Subnet).handleHelloStream-fm()
      <autogenerated>:1 +0x48
  github.com/libp2p/go-libp2p/p2p/host/basic.(*BasicHost).SetStreamHandler.func1()
      /Users/alpha/go/pkg/mod/github.com/libp2p/go-libp2p@v0.19.3/p2p/host/basic/basic_host.go:573 +0x80
  github.com/libp2p/go-libp2p/p2p/host/basic.(*BasicHost).newStreamHandler.func1()
      /Users/alpha/go/pkg/mod/github.com/libp2p/go-libp2p@v0.19.3/p2p/host/basic/basic_host.go:416 +0x70

Previous write at 0x00c0051b1350 by goroutine 1335:
  runtime.mapaccess2_faststr()
      /opt/homebrew/Cellar/go/1.18.2/libexec/src/runtime/map_faststr.go:108 +0x43c
  github.com/filecoin-project/lotus/chain/consensus/hierarchical/subnet/submgr.(*Subnet).helloBack()
      /Users/alpha/Projects/eudico/chain/consensus/hierarchical/subnet/submgr/exchange.go:69 +0x230
  github.com/filecoin-project/lotus/chain/consensus/hierarchical/subnet/submgr.(*Subnet).handleHelloStream()
      /Users/alpha/Projects/eudico/chain/consensus/hierarchical/subnet/submgr/exchange.go:193 +0x634
  github.com/filecoin-project/lotus/chain/consensus/hierarchical/subnet/submgr.(*Subnet).handleHelloStream-fm()
      <autogenerated>:1 +0x48
  github.com/libp2p/go-libp2p/p2p/host/basic.(*BasicHost).SetStreamHandler.func1()
      /Users/alpha/go/pkg/mod/github.com/libp2p/go-libp2p@v0.19.3/p2p/host/basic/basic_host.go:573 +0x80
  github.com/libp2p/go-libp2p/p2p/host/basic.(*BasicHost).newStreamHandler.func1()
      /Users/alpha/go/pkg/mod/github.com/libp2p/go-libp2p@v0.19.3/p2p/host/basic/basic_host.go:416 +0x70

Goroutine 1318 (running) created at:
  github.com/libp2p/go-libp2p/p2p/host/basic.(*BasicHost).newStreamHandler()
      /Users/alpha/go/pkg/mod/github.com/libp2p/go-libp2p@v0.19.3/p2p/host/basic/basic_host.go:416 +0x6f4
  github.com/libp2p/go-libp2p/p2p/host/basic.(*BasicHost).newStreamHandler-fm()
      <autogenerated>:1 +0x48
  github.com/libp2p/go-libp2p/p2p/net/mock.(*peernet).handleNewStream.func1()
      /Users/alpha/go/pkg/mod/github.com/libp2p/go-libp2p@v0.19.3/p2p/net/mock/mock_peernet.go:90 +0x58

Goroutine 1335 (running) created at:
  github.com/libp2p/go-libp2p/p2p/host/basic.(*BasicHost).newStreamHandler()
      /Users/alpha/go/pkg/mod/github.com/libp2p/go-libp2p@v0.19.3/p2p/host/basic/basic_host.go:416 +0x6f4
  github.com/libp2p/go-libp2p/p2p/host/basic.(*BasicHost).newStreamHandler-fm()
      <autogenerated>:1 +0x48
  github.com/libp2p/go-libp2p/p2p/net/mock.(*peernet).handleNewStream.func1()
      /Users/alpha/go/pkg/mod/github.com/libp2p/go-libp2p@v0.19.3/p2p/net/mock/mock_peernet.go:90 +0x58
==================

Repro Steps

No response


dnkolegov commented Aug 5, 2022

The bug reproduces when Mir runs both in the rootnet and in a subnet on the same machine with 4 eudico full nodes.
After investigating, I found two things.

First, a user can run the HC in the following setting:
/root -> EC
/root/A -> Mir
/root/A/B -> Mir
In this case, the peer ID is the same in both subnets, and a data race can happen.

Second, when I replaced the key mutex with a traditional mutex, the bug disappeared. If I replace Keymutex with github.com/im7mortal/kmutex, the bug is still reproduced. So even though a key mutex may be semantically suitable here, the race detector doesn't know about those semantics and will still report a race.

I would suggest using sync.Map or sync.RWMutex.

P.S. In the Mir reconfiguration draft I have implemented sync.Map.

@adlrocha what do you think?

@dnkolegov

I am suggesting this fix - bdd2590#diff-100f0122618ed5b5f4e40838698391b594f842011d057c947a39e5cdf1ec92b2

@adlrocha what do you think?


adlrocha commented Aug 25, 2022

Yes, I think this definitely fixes it. I completely missed that race. Good catch, thanks! Feel free to close this issue once the PR is merged.

@adlrocha

Fixed in #234
