[venus-miner] When the same miner wins blocks at two consecutive heights, an orphaned parent block causes the child block to be orphaned as well #6006

Closed
1 of 11 tasks
cloudxin opened this issue Jun 6, 2023 · 6 comments · May be fixed by ipfs-force-community/sophon-miner#205
Labels
C-Blocked Category: temporarily stuck without a good solution C-bug Category: This is a bug CU-force-community Category: from force community P1 High - we should be working on this now or in the immediate future

Comments

@cloudxin

cloudxin commented Jun 6, 2023

Chain Service Components

  • venus
  • venus-auth
  • venus-gateway
  • venus-messager
  • venus-miner
  • docs

Deal Service Components

  • venus-market
  • docs

Storage Power Service Components

  • venus-sector-manager
  • venus-worker
  • docs

Version

venus-miner -v
venus-miner version 1.11.0-rc1'+gitd870383'

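Describe the Bug

When the same miner produces blocks at two consecutive heights, the parent block picks up one extra block in its base and becomes an orphan, which then prevents the child block from being produced normally.

Logging Information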

2023-06-06T09:29:51.586+0800    INFO    miner   miner/minerwpp.go:55    Computing WinningPoSt ;[{SealProof:8 SectorNumber:29705 SectorKey:<nil> SealedCID:bagboea4b5abcafzlnhyvuhs24xythtsxonxg7qjb3jyp7uhezkyabbpbvuuwkrlk}]; [19 118 8 110 190 189 202 164 195 161 21 28 104 155 24 41 134 142 104 142 121 158 89 139 250 188 238 236 171 108 100 229]
2023-06-06T09:29:52.195+0800    INFO    miner   miner/multiminer.go:790 mine one        {"miner": "t0xxxx", "compute ticket": 0.699802776}
2023-06-06T09:29:52.879+0800    INFO    miner   miner/multiminer.go:809 not to be winner        {"miner": "t04040"}
2023-06-06T09:29:55.975+0800    INFO    miner   miner/minerwpp.go:70    GenerateWinningPoSt took 4.389147768s
2023-06-06T09:29:55.975+0800    INFO    miner   miner/multiminer.go:868 mine one        {"miner": "t0xxxx", "compute proof": 4.40181878}
2023-06-06T09:29:55.975+0800    INFO    miner   miner/multiminer.go:875 mined new block ( -> Proof)     {"took": 4.49549033, "miner": "t0xxxx"}
2023-06-06T09:29:55.975+0800    INFO    miner   miner/multiminer.go:316 mining compute  {"number of wins": 1, "total miner": 17}
2023-06-06T09:29:55.987+0800    INFO    miner   miner/multiminer.go:363 select message  {"tickets": 1}
2023-06-06T09:29:56.049+0800    INFO    miner   miner/multiminer.go:976 create block time consuming     {"miner": "t01037", "tMinerCreateBlockAPI": 0.012979475, "tBlockSIgn": 0.035008963}
2023-06-06T09:29:56.049+0800    INFO    miner   miner/multiminer.go:396 mined new block {"cid": "bafy2bzacebt4blewgrquemy4akmwghpqzlvmklyv6nwolb2k2cr6czx2t5u6w", "height": "622954", "miner": "t0xxx", "parents": ["t011757","t01024","t01491","t01037","t01001"], "wincount": 1, "weight": "11497750263", "took": 4.56883635}
2023-06-06T09:29:56.049+0800    INFO    miner   miner/multiminer.go:397 mining time consuming   {"miner": "t0xxx", "tMinerBaseInfo": 0.016385147, "tTicket": 0.035226046, "tIsWinner": 0.037237644, "tSeed": 0.00478519, "tProof": 4.40181878, "tSelMsg": 0.025329584, "tCreateBlock": 0.048053959}
2023-06-06T09:30:15.764+0800    WARN    auth-miners     miner-manager/auth_manager.go:132       user: defaultLocalToken state is disabled, it's miners won't be updated
2023-06-06T09:30:15.987+0800    DEBUG   miner   miner/miningmgr.go:251  polling miners success

Repro Steps

No response

@cloudxin cloudxin added the C-bug Category: This is a bug label Jun 6, 2023
@diwufeiwen
Contributor

diwufeiwen commented Jun 6, 2023

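The root cause of this problem is a chain fork. It arises as follows:

  1. In the previous round, the base used by the miner (say t01200) included a block from a forked chain. The sync node currently cannot prevent this either, because the block on the fork has no consensus problem; yet in the current round the chain converged again and that block was excluded from the main chain. Why exactly this happens is still to be investigated; it feels a bit like bad money driving out good (see https://github.com/filecoin-project/venus/issues/5666);
  2. In the previous round, t01200 recorded its own block, now an orphan, in the slashfilter database. So if it wins the right to produce a block again in the current round, the consensus check requires the new block to include the block it produced in the previous round, and this results in a consensus error.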
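
To make the false positive concrete, here is a minimal sketch of the parent-grinding check under simplified assumptions. The `Block` and `SlashFilter` types, the in-memory map, and the CIDs are hypothetical stand-ins, not the real venus-miner types; only the rule itself (the new block must list the miner's own block from the previous round among its parents) is taken from the description above.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrParentGrinding mirrors the "parent-grinding fault" discussed in this issue.
var ErrParentGrinding = errors.New("parent-grinding fault")

// Block is a hypothetical, heavily simplified block header.
type Block struct {
	CID     string
	Miner   string
	Height  int64
	Parents []string // CIDs of the blocks in the base tipset
}

// SlashFilter is a hypothetical in-memory stand-in for the slashfilter
// database: it remembers the block each miner produced at each height.
type SlashFilter struct {
	mined map[string]string // "miner/height" -> block CID
}

func NewSlashFilter() *SlashFilter { return &SlashFilter{mined: map[string]string{}} }

func key(miner string, height int64) string { return fmt.Sprintf("%s/%d", miner, height) }

// Record stores a block the miner has produced.
func (f *SlashFilter) Record(b Block) { f.mined[key(b.Miner, b.Height)] = b.CID }

// Check returns ErrParentGrinding when the miner produced a block at
// parentHeight but the new block does not list it among its parents.
func (f *SlashFilter) Check(b Block, parentHeight int64) error {
	prev, ok := f.mined[key(b.Miner, parentHeight)]
	if !ok {
		return nil // nothing mined in the previous round, nothing to compare
	}
	for _, p := range b.Parents {
		if p == prev {
			return nil
		}
	}
	return fmt.Errorf("%w: miner %s, expected parent %s", ErrParentGrinding, b.Miner, prev)
}

func main() {
	sf := NewSlashFilter()
	// Previous round: the miner's block is recorded, then orphaned by a chain revert.
	sf.Record(Block{CID: "blk-orphan", Miner: "t01200", Height: 100})
	// Current round: the base tipset no longer contains the orphaned block,
	// so the filter reports a fault even though the miner did nothing wrong.
	next := Block{CID: "blk-next", Miner: "t01200", Height: 101, Parents: []string{"blk-a", "blk-b"}}
	fmt.Println(sf.Check(next, 100))
}
```

The filter only knows what the miner produced, not what the chain ended up accepting, which is why the orphaned block still counts as an expected parent.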

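A possible fix for the consensus check: when checking for a parent-grinding fault, confirm again from the chain whether the block the miner produced in the previous round was actually accepted; if it was not, do not report a parent-grinding fault. lotus has the same problem.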
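
The chain-confirmation idea could be sketched roughly as follows. This is an illustration under assumptions, not the venus or lotus implementation: `ChainView`, `BlockCIDsAt`, `mapChain`, and `IsParentGrinding` are hypothetical names, and the sketch only helps if the node's view of the chain already reflects the revert that orphaned the previous block.

```go
package main

import "fmt"

// ChainView is a hypothetical read-only view of the node's current heaviest chain.
type ChainView interface {
	// BlockCIDsAt returns the CIDs of the blocks in the tipset at the given height.
	BlockCIDsAt(height int64) []string
}

// mapChain is a toy ChainView backed by a map, for illustration only.
type mapChain map[int64][]string

func (c mapChain) BlockCIDsAt(height int64) []string { return c[height] }

func contains(cids []string, cid string) bool {
	for _, c := range cids {
		if c == cid {
			return true
		}
	}
	return false
}

// IsParentGrinding reports whether leaving prevCID out of parents should be
// treated as a parent-grinding fault. It is a fault only when the chain still
// recognizes the miner's previous block; an orphaned block is not expected
// to appear among the parents.
func IsParentGrinding(chain ChainView, prevCID string, prevHeight int64, parents []string) bool {
	if contains(parents, prevCID) {
		return false // the new block builds on the previous block
	}
	return contains(chain.BlockCIDsAt(prevHeight), prevCID)
}

func main() {
	parents := []string{"blk-a", "blk-b"}
	reverted := mapChain{100: {"blk-a", "blk-b"}}           // previous block was orphaned
	intact := mapChain{100: {"blk-a", "blk-b", "blk-mine"}} // previous block is on chain
	fmt.Println(IsParentGrinding(reverted, "blk-mine", 100, parents)) // false
	fmt.Println(IsParentGrinding(intact, "blk-mine", 100, parents))   // true
}
```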

@diwufeiwen diwufeiwen added the P3 Low - not important right now label Jun 6, 2023
@diwufeiwen
Contributor

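Although the consequences are serious, the probability of this occurring on mainnet is extremely low, so the urgency is not high.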

@Fatman13 Fatman13 added the CU-force-community Category: from force community label Jun 6, 2023
@elvin-du elvin-du added P1 High - we should be working on this now or in the immediate future and removed P3 Low - not important right now labels Jun 16, 2023
@diwufeiwen
Contributor

diwufeiwen commented Jun 29, 2023

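The cause of this problem is really the chain fork; the miner's slashfilter check logic itself is fine. I suggest a remedial measure:
*** Add an environment variable for slashfilter errors that may be skipped ***

  • time-offset mining faults: two blocks were produced at the same height. This error must not be skipped, because it leads to a penalty;
  • parent-grinding fault: the current block does not include the block the miner produced in the previous round. When that previous block is an orphan, this is a false positive. In some cases the previous block becomes an orphan even though nothing was wrong with producing it, so neither the miner nor the sync node can detect this within the block production period; the block is only excluded from the main chain after a chain revert event. This error may therefore be skipped.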
// Sentinel errors, so that callers can tell the two slashfilter faults apart with errors.Is.
var TimeOffsetMiningFaults = errors.New("time-offset mining faults")
var ParentGrindingFaults = errors.New("parent-grinding fault")

func (f *mysqlSlashFilter) MinedBlock(ctx context.Context, bh *types.BlockHeader, parentEpoch abi.ChainEpoch) error {
	...
	if !found {
		// Wrap the sentinel so that errors.Is(err, ParentGrindingFaults) matches in the caller.
		return errors.Wrapf(ParentGrindingFaults, "produced block would trigger consensus fault; miner: %s; bh: %s, expected parent: %s", bh.Miner, bh.Cid(), parent)
	}
	...
}

func (m *Miner) broadCastBlock(ctx context.Context, base MiningBase, bm *sharedTypes.BlockMsg) {
	...
	// err presumably holds the result of the slashfilter check (elided in this snippet).
	// Record the failure and abort, unless the error is a parent-grinding fault
	// and the operator has explicitly opted in via the environment variable.
	if !(errors.Is(err, slashfilter.ParentGrindingFaults) &&
		os.Getenv("SOPHON_MINER_NO_SLASHFILTER") == "_yes_i_know_and_i_accept_that_may_loss_my_fil") {
		if err = m.sf.PutBlock(ctx, bm.Header, base.TipSet.Height()+base.NullRounds, time.Time{}, slashfilter.Error); err != nil {
			log.Errorf("failed to put block: %s", err)
		}

		mtsMineBlockFailCtx, _ := tag.New(
			ctx,
			tag.Upsert(metrics.MinerID, bm.Header.Miner.String()),
		)
		stats.Record(mtsMineBlockFailCtx, metrics.NumberOfMiningError.M(1))
		return
	}
	...
}

Setting the environment variable SOPHON_MINER_NO_SLASHFILTER="_yes_i_know_and_i_accept_that_may_loss_my_fil" at startup avoids the problem described in this issue.
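
For readers who want to try the skip rule in isolation, here is a self-contained sketch of the decision. It is not the sophon-miner code: `canSkip`, `ErrTimeOffsetMining`, `ErrParentGrinding`, `envKey`, and `envAck` are illustrative names, and it uses the standard library `errors` and `fmt` packages with `%w` wrapping. The variable name and acknowledgement string are the ones quoted above.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// Sentinel errors standing in for the two slashfilter faults in this thread.
var (
	ErrTimeOffsetMining = errors.New("time-offset mining faults")
	ErrParentGrinding   = errors.New("parent-grinding fault")
)

const (
	envKey = "SOPHON_MINER_NO_SLASHFILTER"
	envAck = "_yes_i_know_and_i_accept_that_may_loss_my_fil"
)

// canSkip reports whether a slashfilter error may be ignored so that the block
// is still broadcast. Only a parent-grinding fault qualifies, and only when
// the operator supplied the exact acknowledgement string.
func canSkip(err error, ack string) bool {
	if !errors.Is(err, ErrParentGrinding) {
		return false // time-offset faults (and anything else) always abort
	}
	return ack == envAck
}

func main() {
	ack := os.Getenv(envKey) // empty unless the operator opted in
	wrapped := fmt.Errorf("produced block would trigger consensus fault: %w", ErrParentGrinding)
	fmt.Println("skip parent-grinding fault:", canSkip(wrapped, ack))
	fmt.Println("skip time-offset fault:", canSkip(ErrTimeOffsetMining, ack))
}
```

Passing the acknowledgement value as a parameter, rather than reading the environment inside `canSkip`, keeps the rule easy to test without mutating process state.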

@diwufeiwen
Contributor

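In addition, venus's slashfilter logic needs a corresponding change, because venus also runs a local consensus check before a new block is broadcast, and it likewise cannot identify such an orphan within the block production period. ---------- Details to be investigated.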

@elvin-du
Collaborator

The probability of this occurring on the testnet is very high, but on mainnet it is low.

@elvin-du elvin-du added C-Blocked Category: temporarily stuck without a good solution and removed C-Blocked Category: temporarily stuck without a good solution labels Jul 11, 2023
@elvin-du
Collaborator

We cannot find a perfect solution for now, so this cannot move forward at the moment.

Projects
Status: Done

4 participants