
In PC1, all three producer threads were bound to core group 0 #5076

Closed
longmaosen opened this issue Dec 1, 2020 · 7 comments
@longmaosen
I run 3 PC1 workers on the same machine with FIL_PROOFS_USE_MULTICORE_SDR=1, started as `taskset -c 0,1,2,3 lotus-worker run`, `taskset -c 4,5,6,7 lotus-worker run`, and `taskset -c 8,9,10,11 lotus-worker run`. I expected each worker to stay on its own cpuset (worker1: 0,1,2,3; worker2: 4,5,6,7; worker3: 8,9,10,11) and to seal one layer in 20 minutes. Instead, all three workers bind to core group 0 (cpuset 0,1,2,3), and a layer takes 40 minutes on average.
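For anyone reproducing this, a quick way to confirm which affinity mask the kernel actually applied to a process is `os.sched_getaffinity` (an illustrative sketch, not part of lotus or rust-fil-proofs):

```python
import os

# Print the CPU set the scheduler currently allows for this process.
# If launched under `taskset -c 4,5,6,7 python3 check.py`, this prints
# [4, 5, 6, 7]. The report above is that the SDR producer threads bind
# to core group 0 (cores 0-3) regardless of this inherited mask.
allowed = os.sched_getaffinity(0)  # 0 = the calling process
print(sorted(allowed))
```

If the printed set matches what taskset was given, the mask itself is fine and the re-binding happens later inside the worker.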

2020-11-29T23:28:10.477 INFO storage_proofs_porep::stacked::vanilla::proof > replicate_phase1
2020-11-29T23:28:10.477 INFO storage_proofs_porep::stacked::vanilla::graph > using parent_cache[2048 / 1073741824]
2020-11-29T23:28:10.477 INFO storage_proofs_porep::stacked::vanilla::cache > parent cache: opening /data/cpfs/PROOFS_PARENT/v28-sdr-parent-21981246c370f9d76c7a77ab273d94bde0ceb4e938292334960bce05585dc117.cache, verify enabled: false
2020-11-29T23:28:10.477 INFO storage_proofs_porep::stacked::vanilla::proof > multi core replication
2020-11-29T23:28:10.477 INFO storage_proofs_porep::stacked::vanilla::create_label::multi > create labels
2020-11-29T23:28:10.542 DEBUG storage_proofs_porep::stacked::vanilla::cores > Cores: 128, Shared Caches: 32, cores per cache (group_size): 4
2020-11-29T23:28:10.542 DEBUG storage_proofs_porep::stacked::vanilla::cores > checked out core group 0
2020-11-29T23:28:10.542 DEBUG storage_proofs_porep::stacked::vanilla::create_label::multi > binding core in main thread
2020-11-29T23:28:10.542 DEBUG storage_proofs_porep::stacked::vanilla::cores > allowed cpuset: 0
2020-11-29T23:28:10.542 DEBUG storage_proofs_porep::stacked::vanilla::cores > binding to 0
2020-11-29T23:28:10.559 INFO storage_proofs_porep::stacked::vanilla::memory_handling > initializing cache

2020-11-29T23:28:57.189 INFO storage_proofs_porep::stacked::vanilla::create_label::multi > Layer 1
2020-11-29T23:28:57.190 INFO storage_proofs_porep::stacked::vanilla::create_label::multi > Creating labels for layer 1
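For reference, the arithmetic in the DEBUG line above ("Cores: 128, Shared Caches: 32, cores per cache (group_size): 4") can be sketched as follows; the function and its names are my own illustration, not the storage_proofs_porep code:

```python
# Partition cores into contiguous groups, one group per shared cache,
# mirroring the numbers in the DEBUG log: 128 cores / 32 caches = 4 per group.
def core_groups(num_cores, num_shared_caches):
    group_size = num_cores // num_shared_caches
    return [list(range(i * group_size, (i + 1) * group_size))
            for i in range(num_shared_caches)]

groups = core_groups(128, 32)
print(groups[0])  # → [0, 1, 2, 3]  (core group 0, the one every worker grabbed)
```

The bug reported here is that the "checked out core group" is always group 0 for every worker, rather than a group inside each worker's taskset mask.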


@longmaosen changed the title from "in PC1, all three producer thread was binede to core group 0" to "in PC1, all three producer thread was binded to core group 0" on Dec 1, 2020
@jennijuju (Member)

What's the hardware you are using?

@jennijuju jennijuju added the need/author-input Hint: Needs Author Input label Dec 1, 2020
@maxmalong

PC1 worker: AMD EPYC 7H12 CPU; 2048 GiB memory
lotus version 1.2.1

@qiusugang

I have the same problem. It seems the PC1 worker does not respect the CPUs specified with taskset.

@kimimhong

I have the same problem.
On an EPYC 7F32, when only one process runs it lands on cores 0-3 and proceeds multi-core; but even with six processes running, all of them go to cores 0-3, and the same happens when taskset affinity is assigned.

@cwhiggins commented Dec 28, 2020

EPYC 7272, 512 GiB RAM. I just started experimenting with multicore yesterday. I see a 20% drop in time, from a little over 5 hours to 4 hours flat, for up to two PC1 tasks on the same worker. If I add another worker and give it a PC1 task, the system slows down.
All workers were started with taskset to set CPU affinity, yet PID 8080 uses three cores (0-2) it was never given, causing the slowdown.
Looking at `hwloc-ps` output showed this:

7059 PU:12 PU:13 lotus-worker //this is my add piece worker
7100 PU:0 PU:1 PU:2 PU:3 PU:4 PU:5 lotus-worker //PC1 worker #1
7144 PU:14 PU:15 lotus-worker //PC2 worker
8080 PU:0 PU:1 PU:2 PU:6 PU:7 PU:8 PU:9 PU:10 PU:11 lotus-worker //PC1 worker #2

So running two PC1 tasks on the first worker goes smoothly; when a task is added to the second worker it slows down, because that worker then uses cores 0-2, which are already busy and should not even be assigned to that PID.
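To see where the extra threads actually end up, one can also read each worker's per-thread affinity straight from procfs (a hypothetical diagnostic helper, not a lotus or hwloc tool):

```python
import os

def thread_affinities(pid):
    """Return {tid: cpu_list_string} for every thread of `pid`,
    read from /proc/<pid>/task/<tid>/status (Linux only).
    A thread whose list falls outside the mask given to taskset has
    re-bound itself, which is what the hwloc-ps output above shows."""
    result = {}
    for tid in os.listdir(f"/proc/{pid}/task"):
        with open(f"/proc/{pid}/task/{tid}/status") as f:
            for line in f:
                if line.startswith("Cpus_allowed_list:"):
                    result[int(tid)] = line.split(":", 1)[1].strip()
    return result

# Inspect our own process as a demonstration; pass a worker PID instead.
print(thread_affinities(os.getpid()))
```

Run against PID 8080 above, this would show which individual threads grabbed cores 0-2 despite the process-level mask.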

lotus version
Daemon:  1.4.0+git.e9989d0e4+api1.0.0
Local: lotus version 1.4.0+git.e9989d0e4

@github-actions

Oops, seems like we needed more information for this issue, please comment with more details or this issue will be closed in 24 hours.

@github-actions

This issue was closed because it is missing author input.

@TippyFlitsUK TippyFlitsUK removed the need/author-input Hint: Needs Author Input label Mar 30, 2022
7 participants