Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop syncing after enabling splitstore in version 1.23.0 #10760

Closed
5 of 11 tasks
he426100 opened this issue Apr 26, 2023 · 16 comments
Closed
5 of 11 tasks

Stop syncing after enabling splitstore in version 1.23.0 #10760

he426100 opened this issue Apr 26, 2023 · 16 comments
Labels
area/chain Area: Chain kind/enhancement Kind: Enhancement kind/stale need/analysis Hint: Needs Analysis P3 P3: Might get resolved

Comments

@he426100
Copy link

Checklist

  • This is not a security-related bug/issue. If it is, please follow please follow the security policy.
  • I have searched on the issue tracker and the lotus forum, and there is no existing related issue or discussion.
  • I am running the Latest release, the most recent RC(release canadiate) for the upcoming release or the dev branch(master), or have an issue updating to any of these.
  • I did not make any code changes to lotus.

Lotus component

  • lotus daemon - chain sync
  • lotus fvm/fevm - Lotus FVM and FEVM interactions
  • lotus miner/worker - sealing
  • lotus miner - proving(WindowPoSt/WinningPoSt)
  • lotus JSON-RPC API
  • lotus message management (mpool)
  • Other

Lotus Version

Daemon:  1.23.0+mainnet+git.96c0587+api1.5.0
Local: lotus version 1.23.0+mainnet+git.96c0587

Daemon:  1.23.0+mainnet+git.0c0fecc+api1.5.0
Local: lotus-miner version 1.23.0+mainnet+git.0c0fecc

Repro Steps

  1. Run '...'
  2. Do '...'
  3. See error '...'
    ...

Describe the Bug

I enabled splitstore in version 1.20.1 but didn't observe any effects. The chain data exceeded 2T but didn't trigger pruning (default configuration), which is completely different from what I expected - automatic maintenance within a certain size range。

After upgrading to version 1.23.0, I ran into problems with Lotus synchronization. The sync diff has been getting larger and larger, and the log is filled with "sub/incoming.go:588 ignoring indexer message {"sender": "xxx", "err": "rate limit exceeded"}". When I execute 'lotus sync wait', the synchronization stops and the mining and proving functions of lotus-miner freeze.

Logging Information

2023-04-26T17:26:37.959+0800	WARN	sub	sub/incoming.go:588	ignoring indexer message	{"sender": "12D3KooWFYza3DCiYss3eUpGPLKq2ANDRrpHXgVRxju1aQicXBRN", "err": "rate limit exceeded"}
2023-04-26T17:26:39.527+0800	WARN	sub	sub/incoming.go:588	ignoring indexer message	{"sender": "12D3KooWFYza3DCiYss3eUpGPLKq2ANDRrpHXgVRxju1aQicXBRN", "err": "rate limit exceeded"}
2023-04-26T17:26:40.201+0800	WARN	sub	sub/incoming.go:588	ignoring indexer message	{"sender": "12D3KooWFYza3DCiYss3eUpGPLKq2ANDRrpHXgVRxju1aQicXBRN", "err": "rate limit exceeded"}
2023-04-26T17:26:40.634+0800	WARN	sub	sub/incoming.go:588	ignoring indexer message	{"sender": "12D3KooWFYza3DCiYss3eUpGPLKq2ANDRrpHXgVRxju1aQicXBRN", "err": "rate limit exceeded"}
2023-04-26T17:26:40.874+0800	WARN	sub	sub/incoming.go:588	ignoring indexer message	{"sender": "12D3KooWHuK15dwCZ8LDTZdFMR4AmQFRaHynYZb648dGotPCmQQ7", "err": "rate limit exceeded"}
2023-04-26T17:26:42.723+0800	WARN	sub	sub/incoming.go:588	ignoring indexer message	{"sender": "12D3KooWHuK15dwCZ8LDTZdFMR4AmQFRaHynYZb648dGotPCmQQ7", "err": "rate limit exceeded"}
2023-04-26T17:26:43.587+0800	WARN	sub	sub/incoming.go:588	ignoring indexer message	{"sender": "12D3KooWHuK15dwCZ8LDTZdFMR4AmQFRaHynYZb648dGotPCmQQ7", "err": "rate limit exceeded"}
@TippyFlitsUK
Copy link
Contributor

Hey @he426100 👋
This may be a symptom of the current chain state. Can you please try this again after 14:00 UTC on Friday the 28th of April 2023 and let us know if you are still seeing the same issues?
Many thanks!!

@TippyFlitsUK TippyFlitsUK added kind/enhancement Kind: Enhancement area/chain Area: Chain need/author-input Hint: Needs Author Input and removed need/triage kind/bug Kind: Bug labels Apr 27, 2023
@he426100
Copy link
Author

thank your response, i'll follow up

@TippyFlitsUK TippyFlitsUK added the P3 P3: Might get resolved label Apr 27, 2023
@he426100
Copy link
Author

@TippyFlitsUK The problem still exists, Lotus may stop sync at any time, can be restored by restart, I've seen others submit similar issue.
I have one node where spitstore is not enabled, and it is always syncing properly

@TippyFlitsUK
Copy link
Contributor

Thank you for the update @he426100!!

Can you please provide your system sec?

@he426100
Copy link
Author

I'm not sure what system sec stands for. My machine is protected by a hardware firewall, otherwise it is ubuntu's default configuration.
If my machine wants to provide external services, port forwarding needs to be configured on the hardware firewall

@TippyFlitsUK
Copy link
Contributor

Sorry @he426100!! Typo!! I meant SPEC. 👍

@he426100
Copy link
Author

cpu: 7542x2
ram: 512G
disk: m2 (intel/inspur)
gpu: 3080/3090 x2

my miner and daemon are on the same machine

@he426100
Copy link
Author

daemon stop sync all night!
image

@sokach-dev
Copy link

I also had this phenomenon and found that the program was out of sync after running for a while. had to restart to get it to work, then four or five hours later it started again😭😭

截屏2023-04-30 22 35 56

me too

@gnowakpoloniex
Copy link

I am suffering from this issue as well. It's repeating daily now and no amount of restarting the node seems to help

@git-ljm
Copy link

git-ljm commented May 4, 2023

me too

@mithmith
Copy link

mithmith commented May 6, 2023

I have the same on lotus version 1.23.0+mainnet+git.d1d4b35ad
After restarting, the lotus daemon is successfully synchronized and runs for 2-3 hours, then freezes and synchronization disappears.

lotus log is:

2023-05-06T14:44:13.526Z        WARN    hello   hello/hello.go:108      other peer hasnt completed libp2p identify, waiting a bit
2023-05-06T14:44:20.695Z        INFO    canonical-log   swarm/swarm_dial.go:500 CANONICAL_PEER_STATUS: peer=12D3KooWFgpeYMKHBRgC4Qc3HRVRWFfbJX7moqEKKqT1vMFCySU4 addr=/ip4/1.83.125.173/tcp/26826 sample_rate=100 connection_status="established" dir="outbound"
2023-05-06T14:45:26.287Z        INFO    canonical-log   swarm/swarm_dial.go:500 CANONICAL_PEER_STATUS: peer=12D3KooWSyE7CspqGinUWdDqvZdVRJS3dnnJ4SXFnbu9MnW7DTNd addr=/ip4/39.109.125.191/tcp/13003 sample_rate=100 connection_status="established" dir="outbound"
2023-05-06T14:53:55.440Z        INFO    canonical-log   swarm/swarm_dial.go:500 CANONICAL_PEER_STATUS: peer=12D3KooWL4fa7WQTn51A31hf3muMofHZnSKixTTYnXkDjdcZYnFW addr=/ip4/139.215.226.71/tcp/33459 sample_rate=100 connection_status="established" dir="outbound"
2023-05-06T15:04:47.965Z        INFO    canonical-log   swarm/swarm_dial.go:500 CANONICAL_PEER_STATUS: peer=12D3KooWECcYQ4Mwvq5eSD1VvLtBU5HpunmHR9wHGz9AAKid1mWh addr=/ip4/122.190.56.248/tcp/55888 sample_rate=100 connection_status="established" dir="outbound"
2023-05-06T15:08:20.540Z        WARN    sub     sub/incoming.go:588     ignoring indexer message        {"sender": "12D3KooWFPzNZHn6nq6C9eePghgxuBti8NpAgPrfbhW38vycxtpH", "err": "rate limit exceeded"}
2023-05-06T15:08:31.019Z        WARN    sub     sub/incoming.go:588     ignoring indexer message        {"sender": "12D3KooWFPzNZHn6nq6C9eePghgxuBti8NpAgPrfbhW38vycxtpH", "err": "rate limit exceeded"}
2023-05-06T15:08:43.098Z        WARN    sub     sub/incoming.go:588     ignoring indexer message        {"sender": "12D3KooWFPzNZHn6nq6C9eePghgxuBti8NpAgPrfbhW38vycxtpH", "err": "rate limit exceeded"}
2023-05-06T15:13:56.387Z        INFO    canonical-log   swarm/swarm_dial.go:500 CANONICAL_PEER_STATUS: peer=12D3KooWBvnurf9L8yESf4W3GaZfAzQ6n9c5w5PCzHFwGJeDU2qG addr=/ip4/154.210.12.64/tcp/10101 sample_rate=100 connection_status="established" dir="outbound"
2023-05-06T15:14:04.489Z        INFO    canonical-log   swarm/swarm_dial.go:500 CANONICAL_PEER_STATUS: peer=12D3KooWB5k1a8E6TaJ8Ru5A3VhixHe2Njdm7toHx7MehPiSXAx9 addr=/ip4/154.42.3.136/tcp/24001 sample_rate=100 connection_status="established" dir="outbound"
2023-05-06T15:24:39.974Z        INFO    canonical-log   swarm/swarm_dial.go:500 CANONICAL_PEER_STATUS: peer=12D3KooWAv3kxvUAmJXmQns4GEVqeXhRSRpEmWHANWxbM9p9Uj2d addr=/ip4/116.169.54.254/tcp/43864 sample_rate=100 connection_status="established" dir="outbound"
2023-05-06T15:26:51.660Z        WARN    sub     sub/incoming.go:588     ignoring indexer message        {"sender": "12D3KooWNxvbu1bijMHQfaXsGLRUwLToegzbgGVGD8hooECjMb6F", "err": "rate limit exceeded"}
2023-05-06T15:27:32.933Z        WARN    sub     sub/incoming.go:588     ignoring indexer message        {"sender": "12D3KooWNxvbu1bijMHQfaXsGLRUwLToegzbgGVGD8hooECjMb6F", "err": "rate limit exceeded"}
2023-05-06T15:28:11.266Z        WARN    sub     sub/incoming.go:588     ignoring indexer message        {"sender": "12D3KooWNxvbu1bijMHQfaXsGLRUwLToegzbgGVGD8hooECjMb6F", "err": "rate limit exceeded"}
2023-05-06T15:28:49.399Z        WARN    sub     sub/incoming.go:588     ignoring indexer message        {"sender": "12D3KooWNxvbu1bijMHQfaXsGLRUwLToegzbgGVGD8hooECjMb6F", "err": "rate limit exceeded"}
2023-05-06T15:29:44.119Z        WARN    sub     sub/incoming.go:588     ignoring indexer message        {"sender": "12D3KooWNxvbu1bijMHQfaXsGLRUwLToegzbgGVGD8hooECjMb6F", "err": "rate limit exceeded"}
2023-05-06T15:29:54.246Z        WARN    sub     sub/incoming.go:588     ignoring indexer message        {"sender": "12D3KooWNxvbu1bijMHQfaXsGLRUwLToegzbgGVGD8hooECjMb6F", "err": "rate limit exceeded"}
2023-05-06T15:35:36.629Z        WARN    sub     sub/incoming.go:588     ignoring indexer message        {"sender": "12D3KooWBwajPfNpGr3hTbiMTP1KoLA7BBWefkpXk69zhPB1VcZC", "err": "rate limit exceeded"}
2023-05-06T15:35:46.650Z        WARN    sub     sub/incoming.go:588     ignoring indexer message        {"sender": "12D3KooWBwajPfNpGr3hTbiMTP1KoLA7BBWefkpXk69zhPB1VcZC", "err": "rate limit exceeded"}
2023-05-06T15:35:46.795Z        WARN    sub     sub/incoming.go:588     ignoring indexer message        {"sender": "12D3KooWBwajPfNpGr3hTbiMTP1KoLA7BBWefkpXk69zhPB1VcZC", "err": "rate limit exceeded"}
2023-05-06T15:36:06.569Z        WARN    sub     sub/incoming.go:588     ignoring indexer message        {"sender": "12D3KooWBwajPfNpGr3hTbiMTP1KoLA7BBWefkpXk69zhPB1VcZC", "err": "rate limit exceeded"}
2023-05-06T15:36:18.672Z        WARN    sub     sub/incoming.go:588     ignoring indexer message        {"sender": "12D3KooWBwajPfNpGr3hTbiMTP1KoLA7BBWefkpXk69zhPB1VcZC", "err": "rate limit exceeded"}
2023-05-06T15:44:00.718Z        INFO    canonical-log   swarm/swarm_dial.go:500 CANONICAL_PEER_STATUS: peer=12D3KooWQyX7XAX6ngjgdZTDNfbstvC58nzQFfbEaxBFHPWVQfdP addr=/ip4/63.217.225.70/tcp/43319 sample_rate=100 connection_status="established" dir="outbound"
2023-05-06T15:44:57.273Z        INFO    net/identify    identify/id.go:396      failed negotiate identify protocol with peer    {"peer": "12D3KooWHC3ZVATqH8ryUz7WnjdcmtuyTQ1SrFKAu69aUuujyD6q", "error": "stream reset"}
2023-05-06T15:44:57.273Z        WARN    net/identify    identify/id.go:371      failed to identify 12D3KooWHC3ZVATqH8ryUz7WnjdcmtuyTQ1SrFKAu69aUuujyD6q: stream reset
2023-05-06T15:45:39.445Z        INFO    canonical-log   swarm/swarm_dial.go:500 CANONICAL_PEER_STATUS: peer=12D3KooWNf7AN5JGcvtfiVtEbw84UXKRZmbLKQvQWH7HxGJtXnU3 addr=/ip4/221.206.40.14/tcp/61214 sample_rate=100 connection_status="established" dir="outbound"
2023-05-06T15:53:55.560Z        INFO    canonical-log   swarm/swarm_dial.go:500 CANONICAL_PEER_STATUS: peer=12D3KooWSqaeFgtgx1idtZfqzMebKLQyrxVADz1z4miD4wYcourM addr=/ip4/65.181.92.94/tcp/24001 sample_rate=100 connection_status="established" dir="outbound"
2023-05-06T16:02:39.172Z        WARN    sub     sub/incoming.go:588     ignoring indexer message        {"sender": "12D3KooWANtvNpKWnzATuruX8AS4BSCwj52KLcqgQdaw9bZ1oTv5", "err": "rate limit exceeded"}
2023-05-06T16:05:13.045Z        INFO    canonical-log   swarm/swarm_dial.go:500 CANONICAL_PEER_STATUS: peer=12D3KooWQyX7XAX6ngjgdZTDNfbstvC58nzQFfbEaxBFHPWVQfdP addr=/ip4/63.217.225.70/tcp/43319 sample_rate=100 connection_status="established" dir="outbound"
2023-05-06T16:24:06.703Z        INFO    canonical-log   swarm/swarm_dial.go:500 CANONICAL_PEER_STATUS: peer=12D3KooWBxsG4ChM3ioufdp4sZ4oKh3EDKjcWUf863GvhfgZFwgc addr=/ip4/121.66.119.169/tcp/30170 sample_rate=100 connection_status="established" dir="outbound"
2023-05-06T16:24:11.581Z        INFO    canonical-log   swarm/swarm_dial.go:500 CANONICAL_PEER_STATUS: peer=12D3KooWFgpeYMKHBRgC4Qc3HRVRWFfbJX7moqEKKqT1vMFCySU4 addr=/ip4/1.83.125.173/tcp/26826 sample_rate=100 connection_status="established" dir="outbound"
2023-05-06T16:24:13.852Z        INFO    canonical-log   swarm/swarm_dial.go:500 CANONICAL_PEER_STATUS: peer=12D3KooWGziXA8iNKdoaVyYAGVuPbsCD2HT3Snv3UuExvbFS9Xbu addr=/ip4/203.208.141.111/tcp/16168 sample_rate=100 connection_status="established" dir="outbound"
2023-05-06T16:24:40.517Z        INFO    canonical-log   swarm/swarm_dial.go:500 CANONICAL_PEER_STATUS: peer=12D3KooWAdmv6VV141JD9HwXCaPhzKDquGskQuEwMN7Aa2HmAUeT addr=/ip4/144.48.65.140/tcp/31149 sample_rate=100 connection_status="established" dir="outbound"
2023-05-06T16:27:15.291Z        WARN    sub     sub/incoming.go:588     ignoring indexer message        {"sender": "12D3KooWNxvbu1bijMHQfaXsGLRUwLToegzbgGVGD8hooECjMb6F", "err": "rate limit exceeded"}
2023-05-06T16:27:40.914Z        WARN    sub     sub/incoming.go:588     ignoring indexer message        {"sender": "12D3KooWNxvbu1bijMHQfaXsGLRUwLToegzbgGVGD8hooECjMb6F", "err": "rate limit exceeded"}
2023-05-06T16:29:37.823Z        WARN    sub     sub/incoming.go:588     ignoring indexer message        {"sender": "12D3KooWNxvbu1bijMHQfaXsGLRUwLToegzbgGVGD8hooECjMb6F", "err": "rate limit exceeded"}
2023-05-06T16:34:06.402Z        WARN    hello   hello/hello.go:108      other peer hasnt completed libp2p identify, waiting a bit
2023-05-06T16:37:44.736Z        WARN    sub     sub/incoming.go:588     ignoring indexer message        {"sender": "12D3KooWNmbBtLWZDrWJgg8FpvDvMYJ51WDhkxWunqeSfJ3da9LU", "err": "rate limit exceeded"}
2023-05-06T16:38:30.287Z        WARN    sub     sub/incoming.go:588     ignoring indexer message        {"sender": "12D3KooWNmbBtLWZDrWJgg8FpvDvMYJ51WDhkxWunqeSfJ3da9LU", "err": "rate limit exceeded"}

@filecoin-project filecoin-project deleted a comment from github-actions bot May 10, 2023
@TippyFlitsUK TippyFlitsUK added need/analysis Hint: Needs Analysis and removed need/author-input Hint: Needs Author Input labels May 10, 2023
@rjan90
Copy link
Contributor

rjan90 commented May 11, 2023

We have opened a PR that should fix this issue, see the comment here: #10788 (comment)

@arajasek
Copy link
Contributor

Is anyone experiencing this issue with latest Lotus master (or any release with this fix cherry-picked)? I am optimistic we have a fix.

Also, anyone experiencing this, please share the output of lotus pprof goroutines here.

@rjan90
Copy link
Contributor

rjan90 commented Aug 21, 2023

Closing this issue, as we are comfortable saying this has been fixed by #10973. Full breakdown of the bug and fix can be read here: #10906 (comment)

@rjan90 rjan90 closed this as completed Aug 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/chain Area: Chain kind/enhancement Kind: Enhancement kind/stale need/analysis Hint: Needs Analysis P3 P3: Might get resolved
Projects
None yet
Development

No branches or pull requests

9 participants