Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

splitstore purge Suspend chain synchronization #10791

Closed
5 of 11 tasks
marco-storswift opened this issue Apr 30, 2023 · 5 comments
Closed
5 of 11 tasks

splitstore purge Suspend chain synchronization #10791

marco-storswift opened this issue Apr 30, 2023 · 5 comments
Labels

Comments

@marco-storswift
Copy link
Contributor

marco-storswift commented Apr 30, 2023

Checklist

  • This is not a security-related bug/issue. If it is, please follow please follow the security policy.
  • I have searched on the issue tracker and the lotus forum, and there is no existing related issue or discussion.
  • I am running the Latest release, the most recent RC(release canadiate) for the upcoming release or the dev branch(master), or have an issue updating to any of these.
  • I did not make any code changes to lotus.

Lotus component

  • lotus daemon - chain sync
  • lotus fvm/fevm - Lotus FVM and FEVM interactions
  • lotus miner/worker - sealing
  • lotus miner - proving(WindowPoSt/WinningPoSt)
  • lotus JSON-RPC API
  • lotus message management (mpool)
  • Other

Lotus Version

V 1.23.0

Repro Steps

lotus deamon run

Describe the Bug

blockstore\splitstore\splitstore_compact.go #line 808
// 5. purge cold objects from the hotstore, taking protected references into account
log.Info("purging cold objects from the hotstore")
startPurge := time.Now()
err = s.purge(purger, checkpoint, markSet)
if err != nil {
	return xerrors.Errorf("error purging cold objects: %w", err)
}
log.Infow("purging cold objects from hotstore done", "took", time.Since(startPurge))
/log$ grep purging lotus.log
2023-04-26T10:11:31.946Z        INFO    splitstore      splitstore/splitstore_compact.go:808    purging cold objects from the hotstore
2023-04-26T10:11:44.255Z        INFO    splitstore      splitstore/splitstore_compact.go:814    purging cold objects from hotstore done {"took": 12.309352428}
2023-04-27T07:24:14.285Z        INFO    splitstore      splitstore/splitstore_compact.go:808    purging cold objects from the hotstore
2023-04-27T07:29:36.812Z        INFO    splitstore      splitstore/splitstore_compact.go:814    purging cold objects from hotstore done {"took": 322.526538782}
2023-04-29T15:19:29.726+0800    INFO    splitstore      splitstore/splitstore_compact.go:808    purging cold objects from the hotstore
2023-04-29T20:14:30.086+0800    INFO    splitstore      splitstore/splitstore_compact.go:1491   purging cold objects from the hotstore
2023-04-29T20:14:30.088+0800    INFO    splitstore      splitstore/splitstore_compact.go:1497   purging cold objects from hotstore done {"took": 0.001268274}
2023-04-30T02:42:01.135+0800    INFO    splitstore      splitstore/splitstore_compact.go:808    purging cold objects from the hotstore
2023-04-30T07:32:23.456+0800    INFO    splitstore      splitstore/splitstore_compact.go:1491   purging cold objects from the hotstore
2023-04-30T07:32:23.457+0800    INFO    splitstore      splitstore/splitstore_compact.go:1497   purging cold objects from hotstore done {"took": 0.000368761}
log$ grep purged lotus.log
2023-04-26T10:11:44.255Z        INFO    splitstore      splitstore/splitstore_compact.go:1358   purged cold objects     {"purged": 1546028, "live": 0}
2023-04-27T07:29:36.812Z        INFO    splitstore      splitstore/splitstore_compact.go:1358   purged cold objects     {"purged": 29479159, "live": 12}
2023-04-29T20:14:30.087+0800    INFO    splitstore      splitstore/splitstore_compact.go:1534   purged cold objects     {"purged": 0, "live": 0}
2023-04-30T07:32:23.457+0800    INFO    splitstore      splitstore/splitstore_compact.go:1534   purged cold objects     {"purged": 0, "live": 0}
it looks like  2023-04-29T15:19:29.726+0800 and 2023-04-30T02:42:01.135+0800 be hold in   s.purge functions

Logging Information

the chain sync be stoped
@marco-storswift
Copy link
Contributor Author

marco-storswift commented Apr 30, 2023

2023-04-30T02:43:50.829+0800    INFO    consensus-common        consensus/compute_state.go:300  ApplyBlocks stats       {"early": 0.00001383, "earlyCronGas": 0, "vmMsg": 45.448407322, "msgGas": 21249558239, "vmCron": 53.190443896, "cronGas": 29603212923, "vmFlush": 3.129416941, "epoch": "2816244", "tsk": "{bafy2bzacebupojrjnyz75vnputnah2olmqvedqhl7dvo6guu3sgy6xcp2atso,bafy2bzacebuhwz5czc7nhhekrf6jyre3z3pxoczipsejtwguduuqjucsuazyk,bafy2bzacedj2jlswadk64cjgnnwv4qboivly2cjjzdyukltzrqxtqcoz7mx4a,bafy2bzacea6zrpuzmvoifgc3c4csn4m3ck7kd736on7cqoogcpdkeijwma6w4,bafy2bzaceczhns2wsaaw3kdnc2nbzjdr5u4ytollv6zzcsm23yerdgxi2i4lc,bafy2bzacebxcryt4cfzquxihyxh5adtlooqu3ebshxxhyb56bnryby7l7tt3i}"}
vmCron": 53.190443896 is very Crazy number

@marco-storswift
Copy link
Contributor Author

marco-storswift commented Apr 30, 2023

2023-04-29T15:21:25.741+0800    INFO    consensus-common        consensus/compute_state.go:300  ApplyBlocks stats       {"early": 0.00000635, "earlyCronGas": 0, "vmMsg": 34.998562862, "msgGas": 6910722868, "vmCron": 74.645851745, "cronGas": 27360249873, "vmFlush": 2.966820771, "epoch": "2814879", "tsk": "{bafy2bzacedp2q3ft2p4loexohwhih54uxl2saz3wbgkoptbpsayxllukhuyl4}"}
LOTUS_FVM_CONCURRENCY set default value 4

@marco-storswift
Copy link
Contributor Author

marco-storswift commented Apr 30, 2023

tail -n 1000 lotus.log  | grep splitstore | grep -v missing
2023-04-30T19:51:07.990+0800    INFO    splitstore      splitstore/splitstore_compact.go:870    beginning critical section
2023-04-30T19:51:07.991+0800    INFO    splitstore      splitstore/splitstore_compact.go:398    protecting transactional references     {"refs": 466426}
2023-04-30T19:52:31.098+0800    INFO    splitstore      splitstore/splitstore_compact.go:451    protecting transactional refs done      {"took": 83.106278329, "protected": 282430, "protected size": 421057635}
2023-04-30T19:52:31.098+0800    INFO    splitstore      splitstore/splitstore_compact.go:398    protecting transactional references     {"refs": 27237}
2023-04-30T19:52:33.375+0800    INFO    splitstore      splitstore/splitstore_compact.go:451    protecting transactional refs done      {"took": 2.276505963, "protected": 7003, "protected size": 10692694}
2023-04-30T19:52:33.375+0800    INFO    splitstore      splitstore/splitstore_compact.go:398    protecting transactional references     {"refs": 186}
2023-04-30T19:52:33.378+0800    INFO    splitstore      splitstore/splitstore_compact.go:451    protecting transactional refs done      {"took": 0.002603639, "protected": 15, "protected size": 10338}
2023-04-30T19:52:33.456+0800    INFO    splitstore      splitstore/splitstore_compact.go:398    protecting transactional references     {"refs": 344}
2023-04-30T19:52:33.460+0800    INFO    splitstore      splitstore/splitstore_compact.go:897    waiting for sync
2023-04-30T19:52:59.661+0800    INFO    splitstore      splitstore/splitstore_compact.go:904    waiting for sync done   {"took": 26.20077273}
2023-04-30T19:52:59.661+0800    INFO    splitstore      splitstore/splitstore_compact.go:808    purging cold objects from the hotstore
Suspend  one hour now

@marco-storswift
Copy link
Contributor Author

marco-storswift commented Apr 30, 2023

lotus pprof goroutines > pprof.txt 
got callstack
sync.runtime_SemacquireRWMutex(0x7c6b4b?, 0xe5?, 0x7d2f67?)
        /usr/local/go/src/runtime/sema.go:87 +0x26
sync.(*RWMutex).Lock(0x7bd788?)
        /usr/local/go/src/sync/rwmutex.go:152 +0x71
github.com/filecoin-project/lotus/blockstore/splitstore.(*SplitStore).purgeBatch(0xc0006ee580, {0xc038430000, 0x4000, 0x0?}, {0xc0389aa000, 0x0, 0x4000}, 0xc0cc971740?, {0x7433ee0, 0xc0be0b0be0})
        /mnt/lotus/lotusbuild/20230426013433/marco/blockstore/splitstore/splitstore_compact.go:1396 +0x10d
github.com/filecoin-project/lotus/blockstore/splitstore.(*SplitStore).purge.func2()
        /mnt/lotus/lotusbuild/20230426013433/marco/blockstore/splitstore/splitstore_compact.go:1362 +0x6b
github.com/filecoin-project/lotus/blockstore/splitstore.(*SplitStore).purge.func3({{0xc0cc971740?, 0xc0f88cfe00?}})
        /mnt/lotus/lotusbuild/20230426013433/marco/blockstore/splitstore/splitstore_compact.go:1374 +0xd9
github.com/filecoin-project/lotus/blockstore/splitstore.(*ColdSetReader).ForEach(0xc084ddca60, 0xc0365355b8)
        /mnt/lotus/lotusbuild/20230426013433/marco/blockstore/splitstore/coldset.go:80 +0x6f
github.com/filecoin-project/lotus/blockstore/splitstore.(*SplitStore).purge(0xc0006ee580, 0x0?, 0xc0a3366e50, {0x7433ee0, 0xc0be0b0be0})
        /mnt/lotus/lotusbuild/20230426013433/marco/blockstore/splitstore/splitstore_compact.go:1371 +0x205
github.com/filecoin-project/lotus/blockstore/splitstore.(*SplitStore).doCompact(0xc0006ee580, 0xc085d59a40)
        /mnt/lotus/lotusbuild/20230426013433/marco/blockstore/splitstore/splitstore_compact.go:810 +0x1b94
github.com/filecoin-project/lotus/blockstore/splitstore.(*SplitStore).compact(0xc0006ee580, 0x0?)
        /mnt/lotus/lotusbuild/20230426013433/marco/blockstore/splitstore/splitstore_compact.go:535 +0x198
github.com/filecoin-project/lotus/blockstore/splitstore.(*SplitStore).HeadChange.func1()
        /mnt/lotus/lotusbuild/20230426013433/marco/blockstore/splitstore/splitstore_compact.go:158 +0x133
created by github.com/filecoin-project/lotus/blockstore/splitstore.(*SplitStore).HeadChange
        /mnt/lotus/lotusbuild/20230426013433/marco/blockstore/splitstore/splitstore_compact.go:151 +0x5f8

@Reiers Reiers added area/chain Area: Chain splitstore and removed need/triage labels May 1, 2023
@rjan90
Copy link
Contributor

rjan90 commented Jul 10, 2023

Closing this issue, as we are comfortable saying the underlying bug here causing the synchronization issue has been fixed by #10973. Full breakdown of the bug and fix can be read here: #10906 (comment)

@rjan90 rjan90 closed this as completed Jul 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants