The AVX version often underutilizes multiple cores #23

Open
kentavv opened this issue Dec 28, 2020 · 0 comments

kentavv commented Dec 28, 2020

Top-level iteration 6/8 of new_problem.in demonstrates this. The output from top below is an extreme example, but the imbalance is consistent enough that we should investigate. We need to profile the threads to understand how work is being divided among them. Open questions: Is the sort in the top-level thread creating a bottleneck? (The truncated-dense method may be less sensitive to infrequent sorting.) Is the OpenMP dynamic scheduler creating a bottleneck? To better balance the load, do we need to build a vector of the rows that have non-zero entries in the column currently being reduced? It may be possible to build this vector while reducing the preceding column.
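The last question could be prototyped roughly as follows. This is only a sketch under stated assumptions, not albert's actual code: `matrix_t`, `collect_active_rows`, and `reduce_column` are hypothetical names, the dense one-byte-per-entry layout is invented for illustration, and arithmetic is assumed to be modulo a small prime `p` with the pivot entry already normalized to 1. The idea is to gather the rows that actually need work into a compact index vector first, so the parallel loop runs only over rows with roughly equal cost and a static schedule becomes a reasonable candidate to compare against the dynamic one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical dense row-major matrix; the real albert layout is not
   shown in this issue. Entries are assumed to lie in [0, p). */
typedef struct {
    size_t nrows, ncols;
    uint8_t *data; /* nrows * ncols entries */
} matrix_t;

/* Collect the indices of all rows, other than the pivot row, that have a
   non-zero entry in column `col`. `active` must have room for nrows
   indices. Returns the number of indices written. */
size_t collect_active_rows(const matrix_t *m, size_t col, size_t pivot_row,
                           size_t *active)
{
    size_t n = 0;
    for (size_t r = 0; r < m->nrows; r++) {
        if (r != pivot_row && m->data[r * m->ncols + col] != 0)
            active[n++] = r;
    }
    return n;
}

/* Eliminate column `col` from every active row, modulo the prime p.
   Assumes the pivot entry is 1. Each iteration touches a distinct row,
   so the iterations are independent of one another. */
void reduce_column(matrix_t *m, size_t col, size_t pivot_row,
                   const size_t *active, size_t n_active, unsigned p)
{
    const uint8_t *prow = m->data + pivot_row * m->ncols;
    assert(prow[col] == 1);

    /* Every iteration does real work, so a static split is plausible. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)n_active; i++) {
        uint8_t *row = m->data + active[i] * m->ncols;
        unsigned f = row[col]; /* read before the row is overwritten */
        for (size_t c = 0; c < m->ncols; c++) {
            /* row[c] - f * prow[c], kept non-negative by adding p - f */
            row[c] = (uint8_t)((row[c] + (p - f) * prow[c]) % p);
        }
    }
}
```

A caller would run `collect_active_rows` for the current column and pass the result to `reduce_column`. Whether the index vector for the next column can be filled in during the current reduction, as suggested above, depends on details of the real data structure that this sketch does not model.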

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
137309 kent      20   0 4256260   3.1g   4408 R  99.9   5.0  28:33.65 albert
137314 kent      20   0 4256260   3.1g   4408 S  54.5   5.0  17:35.25 albert
137317 kent      20   0 4256260   3.1g   4408 S  27.3   5.0  17:30.03 albert
137322 kent      20   0 4256260   3.1g   4408 S  27.3   5.0  17:31.70 albert
137323 kent      20   0 4256260   3.1g   4408 S  27.3   5.0  17:35.79 albert
137310 kent      20   0 4256260   3.1g   4408 S  18.2   5.0  17:35.87 albert
137311 kent      20   0 4256260   3.1g   4408 S  18.2   5.0  17:33.10 albert
137312 kent      20   0 4256260   3.1g   4408 S  18.2   5.0  17:32.59 albert
137313 kent      20   0 4256260   3.1g   4408 S  18.2   5.0  17:32.85 albert
137315 kent      20   0 4256260   3.1g   4408 S  18.2   5.0  17:33.32 albert
137316 kent      20   0 4256260   3.1g   4408 S  18.2   5.0  17:33.05 albert
137318 kent      20   0 4256260   3.1g   4408 S  18.2   5.0  17:35.82 albert
137319 kent      20   0 4256260   3.1g   4408 S  18.2   5.0  17:33.11 albert
137320 kent      20   0 4256260   3.1g   4408 S  18.2   5.0  17:34.18 albert
137321 kent      20   0 4256260   3.1g   4408 S  18.2   5.0  17:32.93 albert
137324 kent      20   0 4256260   3.1g   4408 S  18.2   5.0  17:34.70 albert
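To put a number on the imbalance in this snapshot, the %CPU column can be reduced to two simple figures: the mean per-thread utilization and the ratio of the busiest thread to that mean. The helper names below are hypothetical, and the calculation assumes each of the 16 listed threads could occupy a full core, which the issue does not state. For these samples the mean works out to roughly 27%, with the busiest thread at about 3.7 times the mean. This is a single instantaneous sample, so it indicates the shape of the problem rather than measuring it.

```c
#include <assert.h>
#include <stddef.h>

/* Mean of per-thread %CPU samples; 100.0 would mean every thread is busy. */
double mean_cpu_percent(const double *cpu, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += cpu[i];
    return sum / (double)n;
}

/* Busiest thread relative to the mean; 1.0 would be a perfectly even load. */
double imbalance_factor(const double *cpu, size_t n)
{
    assert(n > 0);
    double max = cpu[0];
    for (size_t i = 1; i < n; i++) {
        if (cpu[i] > max)
            max = cpu[i];
    }
    return max / mean_cpu_percent(cpu, n);
}

/* %CPU column copied from the top snapshot above, in the same order. */
static const double snapshot[16] = {
    99.9, 54.5, 27.3, 27.3, 27.3, 18.2, 18.2, 18.2,
    18.2, 18.2, 18.2, 18.2, 18.2, 18.2, 18.2, 18.2
};
```

Applying the same two functions to per-thread busy times collected inside the parallel region would give a steadier picture than a single top sample.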
