
In PC1, all three producer threads were bound to core group 0 #5076

Closed
longmaosen opened this issue Dec 1, 2020 · 7 comments
@longmaosen
I run 3 PC1 workers on the same machine with FIL_PROOFS_USE_MULTICORE_SDR=1, started as `taskset -c 0,1,2,3 lotus-worker run`, `taskset -c 4,5,6,7 lotus-worker run`, and `taskset -c 8,9,10,11 lotus-worker run`. I expected each worker to stay on its own cpuset (worker1: 0,1,2,3; worker2: 4,5,6,7; worker3: 8,9,10,11) and to seal one layer in 20 minutes. Instead, all three workers bind to core group 0 (cpuset 0,1,2,3), and a layer takes 40 minutes on average.
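For anyone reproducing this, a quick way to confirm which affinity mask the kernel actually applied to a process is `os.sched_getaffinity` (an illustrative sketch, not part of lotus or rust-fil-proofs):

```python
import os

# Print the CPU set the scheduler currently allows for this process.
# If launched under `taskset -c 4,5,6,7 python3 check.py`, this prints
# [4, 5, 6, 7]. The report above is that the SDR producer threads bind
# to core group 0 (cores 0-3) regardless of this inherited mask.
allowed = os.sched_getaffinity(0)  # 0 = the calling process
print(sorted(allowed))
```

If the printed set matches what taskset was given, the mask itself is fine and the re-binding happens later inside the worker.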

2020-11-29T23:28:10.477 INFO storage_proofs_porep::stacked::vanilla::proof > replicate_phase1
2020-11-29T23:28:10.477 INFO storage_proofs_porep::stacked::vanilla::graph > using parent_cache[2048 / 1073741824]
2020-11-29T23:28:10.477 INFO storage_proofs_porep::stacked::vanilla::cache > parent cache: opening /data/cpfs/PROOFS_PARENT/v28-sdr-parent-21981246c370f9d76c7a77ab273d94bde0ceb4e938292334960bce05585dc117.cache, verify enabled: false
2020-11-29T23:28:10.477 INFO storage_proofs_porep::stacked::vanilla::proof > multi core replication
2020-11-29T23:28:10.477 INFO storage_proofs_porep::stacked::vanilla::create_label::multi > create labels
2020-11-29T23:28:10.542 DEBUG storage_proofs_porep::stacked::vanilla::cores > Cores: 128, Shared Caches: 32, cores per cache (group_size): 4
2020-11-29T23:28:10.542 DEBUG storage_proofs_porep::stacked::vanilla::cores > checked out core group 0
2020-11-29T23:28:10.542 DEBUG storage_proofs_porep::stacked::vanilla::create_label::multi > binding core in main thread
2020-11-29T23:28:10.542 DEBUG storage_proofs_porep::stacked::vanilla::cores > allowed cpuset: 0
2020-11-29T23:28:10.542 DEBUG storage_proofs_porep::stacked::vanilla::cores > binding to 0
2020-11-29T23:28:10.559 INFO storage_proofs_porep::stacked::vanilla::memory_handling > initializing cache

2020-11-29T23:28:57.189 INFO storage_proofs_porep::stacked::vanilla::create_label::multi > Layer 1
2020-11-29T23:28:57.190 INFO storage_proofs_porep::stacked::vanilla::create_label::multi > Creating labels for layer 1
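For reference, the arithmetic in the DEBUG line above ("Cores: 128, Shared Caches: 32, cores per cache (group_size): 4") can be sketched as follows; the function and its names are my own illustration, not the storage_proofs_porep code:

```python
# Partition cores into contiguous groups, one group per shared cache,
# mirroring the numbers in the DEBUG log: 128 cores / 32 caches = 4 per group.
def core_groups(num_cores, num_shared_caches):
    group_size = num_cores // num_shared_caches
    return [list(range(i * group_size, (i + 1) * group_size))
            for i in range(num_shared_caches)]

groups = core_groups(128, 32)
print(groups[0])  # → [0, 1, 2, 3]  (core group 0, the one every worker grabbed)
```

The bug reported here is that the "checked out core group" is always group 0 for every worker, rather than a group inside each worker's taskset mask.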


@longmaosen changed the title from "in PC1, all three producer thread was binede to core group 0" to "in PC1, all three producer thread was binded to core group 0" on Dec 1, 2020
@jennijuju (Member)

What's the hardware you are using?

@jennijuju jennijuju added the need/author-input Hint: Needs Author Input label Dec 1, 2020
@maxmalong

PC1 worker: AMD EPYC 7H12 CPU; 2048 GiB memory
lotus version 1.2.1

@qiusugang

I have the same problem. It seems the PC1 worker does not respect the CPUs specified with taskset.

@kimimhong

I have the same problem.
On an EPYC 7F32, when only one process runs it lands on cores 0-3 and proceeds multi-core; but even with six processes running, all of them go to cores 0-3, and the same happens when taskset affinity is assigned.

@cwhiggins commented Dec 28, 2020

EPYC 7272, 512 GiB RAM. I just started experimenting with multicore yesterday. I see a 20% drop in time, from a little over 5 hours to 4 hours flat, for up to two PC1 tasks on the same worker. If I add another worker and give it a PC1 task, the system slows down.
All workers were started with taskset to set CPU affinity, yet PID 8080 uses three cores (0-2) it was never given, causing the slowdown.
Looking at `hwloc-ps` output showed this:

7059 PU:12 PU:13 lotus-worker //this is my add piece worker
7100 PU:0 PU:1 PU:2 PU:3 PU:4 PU:5 lotus-worker //PC1 worker #1
7144 PU:14 PU:15 lotus-worker //PC2 worker
8080 PU:0 PU:1 PU:2 PU:6 PU:7 PU:8 PU:9 PU:10 PU:11 lotus-worker //PC1 worker #2

So running two PC1 tasks on the first worker goes smoothly; when a task is added to the second worker it slows down, because that worker then uses cores 0-2, which are already busy and should not even be assigned to that PID.
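To see where the extra threads actually end up, one can also read each worker's per-thread affinity straight from procfs (a hypothetical diagnostic helper, not a lotus or hwloc tool):

```python
import os

def thread_affinities(pid):
    """Return {tid: cpu_list_string} for every thread of `pid`,
    read from /proc/<pid>/task/<tid>/status (Linux only).
    A thread whose list falls outside the mask given to taskset has
    re-bound itself, which is what the hwloc-ps output above shows."""
    result = {}
    for tid in os.listdir(f"/proc/{pid}/task"):
        with open(f"/proc/{pid}/task/{tid}/status") as f:
            for line in f:
                if line.startswith("Cpus_allowed_list:"):
                    result[int(tid)] = line.split(":", 1)[1].strip()
    return result

# Inspect our own process as a demonstration; pass a worker PID instead.
print(thread_affinities(os.getpid()))
```

Run against PID 8080 above, this would show which individual threads grabbed cores 0-2 despite the process-level mask.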

lotus version
Daemon:  1.4.0+git.e9989d0e4+api1.0.0
Local: lotus version 1.4.0+git.e9989d0e4

@github-actions

Oops, seems like we needed more information for this issue, please comment with more details or this issue will be closed in 24 hours.

@github-actions

This issue was closed because it is missing author input.

@TippyFlitsUK TippyFlitsUK removed the need/author-input Hint: Needs Author Input label Mar 30, 2022
7 participants