Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lotus sync stucked with error "reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets" #7803

Open
8 of 18 tasks
ysh0566 opened this issue Dec 16, 2021 · 26 comments · Fixed by #8635
Labels
area/chain Area: Chain kind/bug Kind: Bug need/team-input Hint: Needs Team Input

Comments

@ysh0566
Copy link

ysh0566 commented Dec 16, 2021

Checklist

  • This is not a security-related bug/issue. If it is, please follow please follow the security policy.
  • This is not a question or a support request. If you have any lotus related questions, please ask in the lotus forum.
  • This is not a new feature request. If it is, please file a feature request instead.
  • This is not an enhancement request. If it is, please file a improvement suggestion instead.
  • I have searched on the issue tracker and the lotus forum, and there is no existing related issue or discussion.
  • I am running the Latest release, or the most recent RC(release canadiate) for the upcoming release or the dev branch(master), or have an issue updating to any of these.
  • I did not make any code changes to lotus.

Lotus component

  • lotus daemon - chain sync
  • lotus miner - mining and block production
  • lotus miner/worker - sealing
  • lotus miner - proving(WindowPoSt)
  • lotus miner/market - storage deal
  • lotus miner/market - retrieval deal
  • lotus miner/market - data transfer
  • lotus client
  • lotus JSON-RPC API
  • lotus message management (mpool)
  • Other

Lotus Version

Daemon:  1.13.2+mainnet+git.e05fdf892.dirty+api1.4.0
Local: lotus version 1.13.2+mainnet+git.e05fdf892.dirty

Describe the Bug

Client subscribe mpool with MpoolSub and client's network stucked or changed, lotus may cannot publish message to client and chain sync stucked.

Logging Information

2021-12-16T01:55:02.655+0800    ERROR   chainstore      store/store.go:409      reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets
2021-12-16T01:55:03.135+0800    ERROR   chainstore      store/store.go:409      reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets
2021-12-16T01:55:03.139+0800    ERROR   chainstore      store/store.go:409      reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets
2021-12-16T01:55:03.140+0800    ERROR   chainstore      store/store.go:409      reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets
2021-12-16T01:55:03.150+0800    ERROR   chainstore      store/store.go:409      reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets
2021-12-16T01:55:03.155+0800    ERROR   chainstore      store/store.go:409      reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets

Repo Steps

  1. Run code on debug mode.
func main() {
	ctx := context.Background()
	//token := ""
	url := "ws://127.0.0.1:1234/rpc/v0"
	headers := http.Header{}

	var nodeStruct v0api.FullNodeStruct
	closer, err := jsonrpc.NewMergeClient(context.Background(), url, "Filecoin", []interface{}{&nodeStruct.Internal, &nodeStruct.CommonStruct.Internal}, headers)
	if err != nil {
		panic(err)
	}
	defer closer()
	msg, err := nodeStruct.MpoolSub(ctx)
	if err != nil {
		panic(err)
	}
	for ss := range msg {
		fmt.Printf("%+v\n", ss) // DEBUG ON THIS LINE
	}
}
  1. wait for a while, lotus chain sync stucked.
  2. See error 'reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets'
@magik6k
Copy link
Contributor

magik6k commented Dec 16, 2021

Can you get a goroutine dump when that happens - lotus pprof goroutines > goroutines.txt?

@ysh0566
Copy link
Author

ysh0566 commented Dec 16, 2021

Can you get a goroutine dump when that happens - lotus pprof goroutines > goroutines.txt?
goroutines.txt

@Anonyma
Copy link

Anonyma commented Mar 11, 2022

We have come across this issue too on v1.14.1

@TippyFlitsUK TippyFlitsUK added need/team-input Hint: Needs Team Input area/chain Area: Chain and removed need/triage labels Mar 24, 2022
@kevinlv1024
Copy link

I have same error
goroutines.txt
.

@mur-me
Copy link

mur-me commented Apr 7, 2022

Hey, I've gotten this in 1.15.0 version.
Unfortunately, I've restarted lotus and haven't saved goroutines dump.
I haven't found anything special in logs, just a bunch of:
2022-04-07T09:38:45.634Z ERROR chainstore store/store.go:409 reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets

@Anonyma
Copy link

Anonyma commented May 4, 2022

Coming back to report that it is still happening on v1.15.1, and actually much more frequently than before.

@raynaudoe
Copy link

raynaudoe commented May 7, 2022

Also reporting that this issue is still present on v1.15.2. Nodes are getting out of sync very frequently.
Another detail is that if in that state, if I ask for the sync status
# lotus sync status

(extract)

sync status:
worker 3113:
        Base:   [bafy2bzacecvejocogarpwlgf22q3snegtgb3sbqq7jh6lmevghbikcbt6ltk6 bafy2bzacecsgvdclkzeamkztb7npimwooklyontm6z5u4xlkmz2w4pllqbuya bafy2bzacediiwh2bh34im56g2dhztd6leoqwo62o3t4xveqybaebzndrm3oco]
        Target: [bafy2bzaceb6urpbv6dcvtnapuqgob2ifvu6sc3btozuotk6wtkxkazkkoq3h2] (1785463)
        Height diff:    1
        Stage: complete
        Height: 1785463
        Elapsed: 4.239672621s

Which all workers are shown in the complete stage, making it hard to identify, through the rpc, if the node is fully synced.

@RobQuistNL
Copy link
Contributor

I just ha this happen after a restart of my daemon

@jleni
Copy link

jleni commented May 10, 2022

we have noticed this much more often in the latest versions

@RobQuistNL
Copy link
Contributor

After restarting it again the daemon continued. I don't think I've seen it happen while running yet.

@kakaly
Copy link

kakaly commented May 10, 2022

+1

Stebalien added a commit that referenced this issue May 10, 2022
When unsubscribing all topics, pubsub will drain the subscription
channel to avoid deadlocks. See
whyrusleeping/pubsub#1.

fixes #7803
Stebalien added a commit that referenced this issue May 10, 2022
When unsubscribing _all_ topics, pubsub will drain the subscription
channel to avoid deadlocks. See
whyrusleeping/pubsub#1.

fixes #7803
arajasek pushed a commit that referenced this issue May 10, 2022
When unsubscribing _all_ topics, pubsub will drain the subscription
channel to avoid deadlocks. See
whyrusleeping/pubsub#1.

fixes #7803
@arajasek
Copy link
Contributor

arajasek commented May 11, 2022

While we have high confidence that this issue was fixed by #8635, I'm going to leave it open for any more reports. We can close it when a few people who were experiencing the issue confirm the fix worked.

In the meanwhile, if anyone else experiences this, please share your daemon logs here.

@rjan90
Copy link
Contributor

rjan90 commented Jun 17, 2022

Closing this now as there has been no reports of reorg channel is heavily backlogged by node operators on v1.15.3. Node-operators has also seen way less long block duration times after #8635 landed, ref: #8565.

If someone still experiences this on Lotus v1.15.3 or higher, do not hesitate to reopen this, or open a new issue!

@rjan90 rjan90 closed this as completed Jun 17, 2022
@beck-8
Copy link
Contributor

beck-8 commented Jun 30, 2022

Closing this now as there has been no reports of reorg channel is heavily backlogged by node operators on v1.15.3. Node-operators has also seen way less long block duration times after #8635 landed, ref: #8565.

If someone still experiences this on Lotus v1.15.3 or higher, do not hesitate to reopen this, or open a new issue!

v1.16.0
reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets

@sunjiebin
Copy link

Closing this now as there has been no reports of reorg channel is heavily backlogged by node operators on v1.15.3. Node-operators has also seen way less long block duration times after #8635 landed, ref: #8565.
If someone still experiences this on Lotus v1.15.3 or higher, do not hesitate to reopen this, or open a new issue!

v1.16.0 reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets

+1 , v1.16.0 also has this error, But it disappeared after a few minutes

@marshyonline
Copy link

+1 on v1.16.0 - I have not seen this issue on any other version until now
goroutines.txt

@rjan90 rjan90 reopened this Jul 5, 2022
@dongjiahong
Copy link

+1 v1.16.0

@kevinlv1024
Copy link

2022-07-06T14:04:03.764+0800 ERROR chainstore store/store.go:409 reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets

+1 v1.16.0

@newwebgroup
Copy link

image

+1 v1.16.0

goroutines (1).txt

@ArseniiPetrovich
Copy link
Contributor

+1 v1.16.0
Is there any workaround? What should we do to fix this issue?

@magik6k
Copy link
Contributor

magik6k commented Oct 13, 2022

People seeing this since v1.16.0 - are you running PoSt workers?

@magik6k
Copy link
Contributor

magik6k commented Oct 14, 2022

This is relevant because it would appear that post-worker alive check will just spam heartbeats in a loop with sleep:

https://github.com/filecoin-project/lotus/blob/master/storage/sealer/sched_post.go#L171-L197

Which in some cases, with multiple PoSt workers, because worker communication is over simple http RPCs, may exhaust all default http client connections..

If this is in fact the bug I'll have a fix for it tomorrow

@RobQuistNL
Copy link
Contributor

Yes. I haven't seen it in a while, but I've mainly stopped the post worker because of other issues (#8362)

@mostcute
Copy link

V1.23.0

023-04-26T23:12:12.159+0800    ERROR   chainstore      store/store.go:419      reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets
2023-04-26T23:12:12.165+0800    ERROR   chainstore      store/store.go:419      reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets
2023-04-26T23:12:12.219+0800    ERROR   chainstore      store/store.go:419      reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets
2023-04-26T23:12:12.640+0800    ERROR   chainstore      store/store.go:419      reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets
2023-04-26T23:12:12.650+0800    ERROR   chainstore      store/store.go:419      reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets
2023-04-26T23:12:12.659+0800    ERROR   chainstore      store/store.go:419      reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets
2023-04-26T23:12:12.666+0800    ERROR   chainstore      store/store.go:419      reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets
2023-04-26T23:12:12.719+0800    ERROR   chainstore      store/store.go:419      reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets
2023-04-26T23:12:13.140+0800    ERROR   chainstore      store/store.go:419      reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets
2023-04-26T23:12:13.151+0800    ERROR   chainstore      store/store.go:419      reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets                            
2023-04-26T23:12:13.160+0800    ERROR   chainstore      store/store.go:419      reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets
2023-04-26T23:12:13.166+0800    ERROR   chainstore      store/store.go:419      reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets
2023-04-26T23:12:13.220+0800    ERROR   chainstore      store/store.go:419      reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets
2023-04-26T23:12:13.641+0800    ERROR   chainstore      store/store.go:419      reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets
2023-04-26T23:12:13.652+0800    ERROR   chainstore      store/store.go:419      reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets
2023-04-26T23:12:13.660+0800    ERROR   chainstore      store/store.go:419      reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets
2023-04-26T23:12:13.666+0800    ERROR   chainstore      store/store.go:419      reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets
2023-04-26T23:12:13.720+0800    ERROR   chainstore      store/store.go:419      reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets
2023-04-26T23:12:14.141+0800    ERROR   chainstore      store/store.go:419      reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets
2023-04-26T23:12:14.152+0800    ERROR   chainstore      store/store.go:419      reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets
2023-04-26T23:12:14.160+0800    ERROR   chainstore      store/store.go:419      reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets
2023-04-26T23:12:14.167+0800    ERROR   chainstore      store/store.go:419      reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets
2023-04-26T23:12:14.221+0800    ERROR   chainstore      store/store.go:419      reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets
2023-04-26T23:12:14.642+0800    ERROR   chainstore      store/store.go:419      reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets                                                                                            l2cghz3le76lt2bwl6tdftedf2 bafy2bzacecth63frrwmp4oef5jhr52vlzgfyl7j6hglhkvf5km4de4tzdt
2023-04-26T23:12:14.652+0800    ERROR   chainstore      store/store.go:419      reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets
2023-04-26T23:12:14.661+0800    ERROR   chainstore      store/store.go:419      reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets
2023-04-26T23:12:14.667+0800    ERROR   chainstore      store/store.go:419      reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets
2023-04-26T23:12:14.721+0800    ERROR   chainstore      store/store.go:419      reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets
2023-04-26T23:12:15.142+0800    ERROR   chainstore      store/store.go:419      reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets
2023-04-26T23:12:15.153+0800    ERROR   chainstore      store/store.go:419      reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets
2023-04-26T23:12:15.161+0800    ERROR   chainstore      store/store.go:419      reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets
2023-04-26T23:12:15.167+0800    ERROR   chainstore      store/store.go:419      reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets                                                                                            c2evshvckv27rzsiuh7rkr6nzy bafy2bzaced7nwp77h6o3pdkmicdb4d2idb3l2cghz3le76lt2bwl6tdfte
2023-04-26T23:12:15.222+0800    ERROR   chainstore      store/store.go:419      reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets
2023-04-26T23:12:15.643+0800    ERROR   chainstore      store/store.go:419      reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets
2023-04-26T23:12:15.653+0800    ERROR   chainstore      store/store.go:419      reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets
2023-04-26T23:12:15.662+0800    ERROR   chainstore      store/store.go:419      reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets
2023-04-26T23:12:15.668+0800    ERROR   chainstore      store/store.go:419      reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets                                                                                            xc2evshvckv27rzsiuh7rkr6nzy bafy2bzaced7nwp77h6o3pdkmicdb4d2idb3l2cghz3le76lt2bwl6tdft
2023-04-26T23:12:15.722+0800    ERROR   chainstore      store/store.go:419      reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets
2023-04-26T23:12:15.901+0800    WARN    sub     sub/incoming.go:588     ignoring indexer message        {"sender": "12D3KooWNTSFywHjGbmGN1aEqJNp54pDKaiwqpthRrvFZETo37pW", "err": "rate limit exceeded"}                                                                c2evshvckv27rzsiuh7rkr6nzy bafy2bzaced7nwp77h6o3pdkmicdb4d2idb3l2cghz3le76lt2bwl6tdfte
2023-04-26T23:12:16.143+0800    ERROR   chainstore      store/store.go:419      reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets
2023-04-26T23:12:16.154+0800    ERROR   chainstore      store/store.go:419      reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets
2023-04-26T23:12:16.163+0800    ERROR   chainstore      store/store.go:419      reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets
2023-04-26T23:12:16.168+0800    ERROR   chainstore      store/store.go:419      reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets
2023-04-26T23:12:16.222+0800    ERROR   chainstore      store/store.go:419      reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets
2023-04-26T23:12:16.644+0800    ERROR   chainstore      store/store.go:419      reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets
2023-04-26T23:12:16.654+0800    ERROR   chainstore      store/store.go:419      reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets
2023-04-26T23:12:16.663+0800    ERROR   chainstore      store/store.go:419      reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets

@jleni
Copy link

jleni commented Aug 1, 2023

Happening again in 1.23.2

@BBQFIL
Copy link

BBQFIL commented Dec 5, 2023

1.24 again

2023-12-05T14:39:10.366+0800 ERROR chainstore store/store.go:419 reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets 2023-12-05T14:39:10.366+0800 ERROR chainstore store/store.go:419 reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets 2023-12-05T14:39:10.434+0800 ERROR chainstore store/store.go:419 reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets 2023-12-05T14:39:10.867+0800 ERROR chainstore store/store.go:419 reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets 2023-12-05T14:39:10.867+0800 ERROR chainstore store/store.go:419 reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets 2023-12-05T14:39:10.867+0800 ERROR chainstore store/store.go:419 reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets 2023-12-05T14:39:10.867+0800 ERROR chainstore store/store.go:419 reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets 2023-12-05T14:39:10.935+0800 ERROR chainstore store/store.go:419 reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets 2023-12-05T14:39:11.368+0800 ERROR chainstore store/store.go:419 reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets 2023-12-05T14:39:11.368+0800 ERROR chainstore store/store.go:419 reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets 2023-12-05T14:39:11.368+0800 ERROR chainstore store/store.go:419 reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets 2023-12-05T14:39:11.368+0800 ERROR chainstore store/store.go:419 reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets 2023-12-05T14:39:11.435+0800 ERROR chainstore store/store.go:419 reorg channel is heavily backlogged, waiting a bit before trying to take process new tipsets

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/chain Area: Chain kind/bug Kind: Bug need/team-input Hint: Needs Team Input
Projects
None yet
Development

Successfully merging a pull request may close this issue.