chainstore: Don't take heaviestLk with backlogged reorgCh #6526

Merged
magik6k merged 1 commit into release/v1.10.0 from fix/reorgch-deadlock
Jun 18, 2021

@magik6k magik6k commented Jun 18, 2021

This was likely the cause of many tests hanging on CI for no apparent reason. I'm not sure whether it can happen on real nodes; it might with a very large number of deals (which use event handlers, which in turn rely on reorg notifications).

There are probably better (but much more involved) ways to do this with more channels / goroutines; this is the simple fix.

@magik6k magik6k requested a review from arajasek June 18, 2021 17:21

magik6k commented Jun 18, 2021

Example deadlock this fixes:

goroutine 1382 [chan send]:
github.com/filecoin-project/lotus/chain/store.(*ChainStore).takeHeaviestTipSet(0xc000314c60, 0x37f9c80, 0xc00541ecc0, 0xc000be65c0, 0x0, 0x0)
	/home/magik6k/github.com/filecoin-project/go-lotus/chain/store/store.go:620 +0x1f1
github.com/filecoin-project/lotus/chain/store.(*ChainStore).MaybeTakeHeavierTipSet(0xc000314c60, 0x37f9c80, 0xc00541ecc0, 0xc000be65c0, 0x0, 0x0)
	/home/magik6k/github.com/filecoin-project/go-lotus/chain/store/store.go:431 +0x205
github.com/filecoin-project/lotus/chain/store.(*ChainStore).PutTipSet(0xc000314c60, 0x37f9c80, 0xc00541ecc0, 0xc0d1764a00, 0xc001a8d7c0, 0x0)
	/home/magik6k/github.com/filecoin-project/go-lotus/chain/store/store.go:397 +0x31d
github.com/filecoin-project/lotus/chain.(*Syncer).Sync(0xc0004cca50, 0x37f9c80, 0xc00170e000, 0xc0d1764a00, 0x0, 0x0)
	/home/magik6k/github.com/filecoin-project/go-lotus/chain/sync.go:565 +0x425
github.com/filecoin-project/lotus/node/impl/full.(*SyncAPI).SyncSubmitBlock(0xc00030f688, 0x37f9c80, 0xc00170e000, 0xc00501c980, 0x0, 0x0)
	/home/magik6k/github.com/filecoin-project/go-lotus/node/impl/full/sync.go:89 +0x6ca
github.com/filecoin-project/lotus/miner.(*Miner).mine(0xc0008c0aa0, 0x37f9c10, 0xc000058178)
	/home/magik6k/github.com/filecoin-project/go-lotus/miner/miner.go:339 +0xcc2
created by github.com/filecoin-project/lotus/miner.(*Miner).Start
	/home/magik6k/github.com/filecoin-project/go-lotus/miner/miner.go:149 +0xe5

^ holds heaviestLk, blocked on send to cs.reorgCh


goroutine 91 [chan send, 1 minutes]:
github.com/whyrusleeping/pubsub.(*PubSub).Pub(...)
/home/magik6k/.opt/go/pkg/mod/github.com/whyrusleeping/pubsub@v0.0.0-20190708150250-92bcb0691325/pubsub.go:72
github.com/filecoin-project/lotus/chain/store.NewChainStore.func1(0x0, 0x0, 0x0, 0xc002caef98, 0x1, 0x1, 0x0, 0x0)
/home/magik6k/github.com/filecoin-project/go-lotus/chain/store/store.go:200 +0x31c
github.com/filecoin-project/lotus/chain/store.(*ChainStore).reorgWorker.func1(0xc000314c60, 0xc00088b3c8, 0xc000894a80, 0x37f9bd8, 0xc00096e780)
/home/magik6k/github.com/filecoin-project/go-lotus/chain/store/store.go:573 +0x688
created by github.com/filecoin-project/lotus/chain/store.(*ChainStore).reorgWorker
/home/magik6k/github.com/filecoin-project/go-lotus/chain/store/store.go:538 +0x118

^ blocks cs.reorgCh, blocked on bestTips.Pub

goroutine 1323 [semacquire, 1 minutes]:
sync.runtime_SemacquireMutex(0xc00030b47c, 0xc001315d00, 0x1)
/usr/lib/go/src/runtime/sema.go:71 +0x47
sync.(*Mutex).lockSlow(0xc00030b478)
/usr/lib/go/src/sync/mutex.go:138 +0x105
sync.(*Mutex).Lock(...)
/usr/lib/go/src/sync/mutex.go:81
github.com/filecoin-project/lotus/chain/events.(*hcEvents).processHeadChangeEvent(0xc00030b440, 0x0, 0x0, 0x0, 0xc002caef00, 0x1, 0x1, 0x0, 0x0)
/home/magik6k/github.com/filecoin-project/go-lotus/chain/events/events_called.go:120 +0x45d
github.com/filecoin-project/lotus/chain/events.(*Events).headChange(0xc000e8b7c0, 0x37f9bd8, 0xc001a8c480, 0x0, 0x0, 0x0, 0xc002caef00, 0x1, 0x1, 0x0, ...)
/home/magik6k/github.com/filecoin-project/go-lotus/chain/events/events.go:201 +0x22d
github.com/filecoin-project/lotus/chain/events.(*Events).listenHeadChangesOnce(0xc000e8b7c0, 0x37f9bd8, 0xc001a8c480, 0x0, 0x0)
/home/magik6k/github.com/filecoin-project/go-lotus/chain/events/events.go:173 +0x73b
github.com/filecoin-project/lotus/chain/events.(*Events).listenHeadChanges(0xc000e8b7c0, 0x37f9bd8, 0xc001a8c440)
/home/magik6k/github.com/filecoin-project/go-lotus/chain/events/events.go:103 +0xc5
created by github.com/filecoin-project/lotus/chain/events.NewEventsWithConfidence
/home/magik6k/github.com/filecoin-project/go-lotus/chain/events/events.go:85 +0x5a5

^ blocked on events.lk.Lock(), blocks bestTips.Pub

goroutine 29756 [semacquire]:
sync.runtime_SemacquireMutex(0xc000314cac, 0x0, 0x0)
/usr/lib/go/src/runtime/sema.go:71 +0x47
sync.(*RWMutex).RLock(...)
/usr/lib/go/src/sync/rwmutex.go:63
github.com/filecoin-project/lotus/chain/store.(*ChainStore).GetHeaviestTipSet(0xc000314c60, 0x0)
/home/magik6k/github.com/filecoin-project/go-lotus/chain/store/store.go:902 +0x92
github.com/filecoin-project/lotus/chain/store.(*ChainStore).GetTipSetFromKey(0xc000314c60, 0x0, 0x0, 0x15632bd, 0xc000314c60, 0xc0041ac8d0)
/home/magik6k/github.com/filecoin-project/go-lotus/chain/store/store.go:1787 +0x39
github.com/filecoin-project/lotus/node/impl/full.(*StateModule).StateSearchMsg(0xc000d184a0, 0x37f9c10, 0xc000058178, 0x0, 0x0, 0xc019b512f0, 0x26, 0xffffffffffffffff, 0x1, 0xc000e244f0, ...)
/home/magik6k/github.com/filecoin-project/go-lotus/node/impl/full/state.go:561 +0x5e
github.com/filecoin-project/lotus/extern/storage-sealing.(*CurrentDealInfoAPIAdapter).StateSearchMsg(0xc000a3d790, 0x37f9c10, 0xc000058178, 0xc019b512f0, 0x26, 0x400, 0x7fa485af0c00, 0x20300000000000)
/home/magik6k/github.com/filecoin-project/go-lotus/extern/storage-sealing/currentdealinfo.go:189 +0xa2
github.com/filecoin-project/lotus/extern/storage-sealing.(*CurrentDealInfoManager).dealIDFromPublishDealsMsg(0xc000a3d7a0, 0x37f9c10, 0xc000058178, 0xc04d1900c0, 0x26, 0x30, 0xc154204480, 0xc019b512f0, 0x26, 0xd, ...)
/home/magik6k/github.com/filecoin-project/go-lotus/extern/storage-sealing/currentdealinfo.go:67 +0x7f
github.com/filecoin-project/lotus/extern/storage-sealing.(*CurrentDealInfoManager).GetCurrentDealInfo(0xc000a3d7a0, 0x37f9c10, 0xc000058178, 0xc04d1900c0, 0x26, 0x30, 0xc154204480, 0xc019b512f0, 0x26, 0x0, ...)
/home/magik6k/github.com/filecoin-project/go-lotus/extern/storage-sealing/currentdealinfo.go:41 +0xca
github.com/filecoin-project/lotus/markets/storageadapter.(*SectorCommittedManager).checkIfDealAlreadyActive(0xc001143dd0, 0x37f9c10, 0xc000058178, 0xc0a12fa280, 0xc154204480, 0xc019b512f0, 0x26, 0x0, 0x0, 0x0, ...)
/home/magik6k/github.com/filecoin-project/go-lotus/markets/storageadapter/ondealsectorcommitted.go:335 +0x126
github.com/filecoin-project/lotus/markets/storageadapter.(*SectorCommittedManager).OnDealSectorCommitted.func2(0xc0a12fa280, 0xc0a12fa280, 0x0, 0x0)
/home/magik6k/github.com/filecoin-project/go-lotus/markets/storageadapter/ondealsectorcommitted.go:186 +0x7f
github.com/filecoin-project/lotus/chain/events.(*hcEvents).onHeadChanged(0xc00030b440, 0xc19b0799c0, 0xc06dd0b1d0, 0x35b9918, 0x6, 0x2001, 0x0, 0x0, 0x0)
/home/magik6k/github.com/filecoin-project/go-lotus/chain/events/events_called.go:318 +0x18b
github.com/filecoin-project/lotus/chain/events.(*messageEvents).Called(0xc00030b4a8, 0xc19b0799c0, 0xc19b079a00, 0x35b9918, 0x6, 0x2001, 0xc016fe11c0, 0x0, 0x0)
/home/magik6k/github.com/filecoin-project/go-lotus/chain/events/events_called.go:602 +0xc9
github.com/filecoin-project/lotus/markets/storageadapter.(*SectorCommittedManager).OnDealSectorCommitted(0xc001143dd0, 0x37f9c10, 0xc000058178, 0xc19a9dd236, 0x3, 0xa, 0xc019b511d0, 0x27, 0x200, 0x0, ...)
/home/magik6k/github.com/filecoin-project/go-lotus/markets/storageadapter/ondealsectorcommitted.go:260 +0x2de
github.com/filecoin-project/lotus/markets/storageadapter.(*ProviderNodeAdapter).OnDealSectorCommitted(0xc001ebf380, 0x37f9c10, 0xc000058178, 0xc19a9dd236, 0x3, 0x1e, 0xa, 0xc019b511d0, 0x27, 0x200, ...)
/home/magik6k/github.com/filecoin-project/go-lotus/markets/storageadapter/provider.go:281 +0xe5
github.com/filecoin-project/go-fil-markets/storagemarket/impl/providerstates.VerifyDealActivated(0x7fa4340b7008, 0xc16a4cef30, 0x7fa47c3e8678, 0xc000c9c4f8, 0xc019b511d0, 0x27, 0x200, 0x0, 0xc199d7eec0, 0x31, ...)
/home/magik6k/.opt/go/pkg/mod/github.com/filecoin-project/go-fil-markets@v1.4.0/storagemarket/impl/providerstates/provider_states.go:475 +0x16d
reflect.Value.call(0x2f6b0e0, 0x35b8738, 0x13, 0x33041ea, 0x4, 0xc000e25728, 0x3, 0x3, 0x560a3f, 0x3150c00, ...)
/usr/lib/go/src/reflect/value.go:476 +0x8e7
reflect.Value.Call(0x2f6b0e0, 0x35b8738, 0x13, 0xc019e3e728, 0x3, 0x3, 0x2fa3f00, 0xc089136000, 0xc000058178)
/usr/lib/go/src/reflect/value.go:337 +0xb9
github.com/filecoin-project/go-statemachine/fsm.fsmHandler.handler.func1(0xc05da0c300, 0x2, 0x2, 0xc05da0c300, 0x22bbbbab593584fe, 0x1)
/home/magik6k/.opt/go/pkg/mod/github.com/filecoin-project/go-statemachine@v0.0.0-20200925024713-05bd7c71fbfe/fsm/fsm.go:170 +0x437
reflect.Value.call(0xc004fa9140, 0xc05da0c270, 0x13, 0x33041ea, 0x4, 0xc019e3ef70, 0x2, 0x2, 0x560a3f, 0x3061560, ...)
/usr/lib/go/src/reflect/value.go:476 +0x8e7
reflect.Value.Call(0xc004fa9140, 0xc05da0c270, 0x13, 0xc019d18770, 0x2, 0x2, 0xc19b079900, 0x4d8c8c0, 0x308f440)
/usr/lib/go/src/reflect/value.go:337 +0xb9
github.com/filecoin-project/go-statemachine.(*StateMachine).run.func3(0xc02c205a60, 0x37f9c10, 0xc000058178, 0xc02c205ae0, 0xc02c205a70, 0xc005d06300)
/home/magik6k/.opt/go/pkg/mod/github.com/filecoin-project/go-statemachine@v0.0.0-20200925024713-05bd7c71fbfe/machine.go:102 +0x2c9
created by github.com/filecoin-project/go-statemachine.(*StateMachine).run
/home/magik6k/.opt/go/pkg/mod/github.com/filecoin-project/go-statemachine@v0.0.0-20200925024713-05bd7c71fbfe/machine.go:100 +0x372

^ holds events.lk.Lock(), blocked on heaviestLk
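
The four "^" annotations above form a cycle. As a rough aid, the sketch below encodes them as a wait-for graph and walks it. The edges are one reading of the annotations, not output from any tool, and `waitsFor` and `findCycle` are names made up for this illustration.

```go
package main

import "fmt"

// waitsFor maps each goroutine ID from the traces to the goroutine it is
// effectively waiting on, as read from the "^" annotations:
//   1382  holds heaviestLk, needs cs.reorgCh drained (done by 91)
//   91    drains cs.reorgCh, needs bestTips.Pub to be received (by 1323)
//   1323  receives from bestTips, needs events.lk (held by 29756)
//   29756 holds events.lk, needs heaviestLk (held by 1382)
var waitsFor = map[string]string{
	"1382":  "91",
	"91":    "1323",
	"1323":  "29756",
	"29756": "1382",
}

// findCycle follows edges from start and returns the visited path if the
// walk returns to start, or nil if it ends or loops elsewhere.
func findCycle(edges map[string]string, start string) []string {
	var path []string
	seen := map[string]bool{}
	cur := start
	for !seen[cur] {
		seen[cur] = true
		path = append(path, cur)
		next, ok := edges[cur]
		if !ok {
			return nil // cur is not waiting on anything: no deadlock here
		}
		cur = next
	}
	if cur != start {
		return nil // a cycle exists, but it does not pass through start
	}
	return path
}

func main() {
	fmt.Println(findCycle(waitsFor, "1382"))
}
```

Removing any single edge breaks the cycle; the change in this PR targets the first one, where heaviestLk is held while sending on cs.reorgCh.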

@BigLep BigLep linked an issue Jun 18, 2021 that may be closed by this pull request
@magik6k magik6k merged commit e7fa858 into release/v1.10.0 Jun 18, 2021
@magik6k magik6k deleted the fix/reorgch-deadlock branch June 18, 2021 18:18

Successfully merging this pull request may close these issues.

E2E Testing for FIP8 & FIP13