The AVX version often underutilizes multiple cores #23

kentavv · 2020-12-28T17:38:29Z

Top-level iteration 6/8 of new_problem.in demonstrates this. The top output below is an extreme case (one thread at 99.9% CPU while most of the others sit near 18%), but the imbalance is consistent enough that we should investigate. We need to profile the threads to understand how the work is being divided. Is the sort in the top-level thread creating a bottleneck? The truncated-dense method may be less sensitive to infrequent sorting. Is the OpenMP dynamic scheduler creating a bottleneck? Would building a vector of the rows with non-zero entries in the column currently being reduced give better load balance? It may be possible to build this vector while the preceding column is being reduced.

   PID USER  PR  NI    VIRT   RES   SHR S  %CPU  %MEM     TIME+ COMMAND
137309 kent  20   0 4256260  3.1g  4408 R  99.9   5.0  28:33.65 albert
137314 kent  20   0 4256260  3.1g  4408 S  54.5   5.0  17:35.25 albert
137317 kent  20   0 4256260  3.1g  4408 S  27.3   5.0  17:30.03 albert
137322 kent  20   0 4256260  3.1g  4408 S  27.3   5.0  17:31.70 albert
137323 kent  20   0 4256260  3.1g  4408 S  27.3   5.0  17:35.79 albert
137310 kent  20   0 4256260  3.1g  4408 S  18.2   5.0  17:35.87 albert
137311 kent  20   0 4256260  3.1g  4408 S  18.2   5.0  17:33.10 albert
137312 kent  20   0 4256260  3.1g  4408 S  18.2   5.0  17:32.59 albert
137313 kent  20   0 4256260  3.1g  4408 S  18.2   5.0  17:32.85 albert
137315 kent  20   0 4256260  3.1g  4408 S  18.2   5.0  17:33.32 albert
137316 kent  20   0 4256260  3.1g  4408 S  18.2   5.0  17:33.05 albert
137318 kent  20   0 4256260  3.1g  4408 S  18.2   5.0  17:35.82 albert
137319 kent  20   0 4256260  3.1g  4408 S  18.2   5.0  17:33.11 albert
137320 kent  20   0 4256260  3.1g  4408 S  18.2   5.0  17:34.18 albert
137321 kent  20   0 4256260  3.1g  4408 S  18.2   5.0  17:32.93 albert
137324 kent  20   0 4256260  3.1g  4408 S  18.2   5.0  17:34.70 albert
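
To make the last question above concrete, here is a rough sketch in C of what a vector of active rows could look like. Everything in it is an assumption for illustration: the dense row-major layout, the names matrix_t, collect_active_rows and reduce_active_rows, a pivot entry already normalized to 1, and a prime modulus p small enough that entries fit in a uint8_t. None of it is taken from albert's source, and it leaves out the AVX kernels entirely. The idea is that once only the rows that need work are in the list, every loop iteration costs roughly the same, so a static schedule might be enough and the dynamic scheduler's overhead could be measured against it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical dense row-major matrix over GF(p); not albert's real layout. */
typedef struct {
    size_t nrows, ncols;
    uint8_t *data; /* nrows * ncols entries, each in [0, p) */
} matrix_t;

/* Gather the indices of rows below pivot_row whose entry in column col is
 * non-zero. The caller provides `active` with room for nrows indices.
 * Returns the number of indices written. */
static size_t collect_active_rows(const matrix_t *m, size_t pivot_row,
                                  size_t col, size_t *active)
{
    size_t n = 0;
    for (size_t r = pivot_row + 1; r < m->nrows; r++) {
        if (m->data[r * m->ncols + col] != 0)
            active[n++] = r;
    }
    return n;
}

/* Eliminate column col from each active row:
 *   row <- row - row[col] * pivot   (mod p)
 * Assumes the pivot entry is already 1. Each iteration touches a distinct
 * row, so iterations are independent and similar in cost. Without OpenMP
 * enabled the pragma is ignored and the loop runs serially. */
static void reduce_active_rows(matrix_t *m, size_t pivot_row, size_t col,
                               const size_t *active, size_t n_active,
                               unsigned p)
{
    const uint8_t *pivot = m->data + pivot_row * m->ncols;
    assert(pivot[col] == 1);

    #pragma omp parallel for schedule(static)
    for (size_t i = 0; i < n_active; i++) {
        uint8_t *row = m->data + active[i] * m->ncols;
        unsigned factor = row[col];
        for (size_t c = col; c < m->ncols; c++) {
            /* Subtract factor * pivot[c] by adding its additive inverse. */
            row[c] = (uint8_t)((row[c] + (p - factor) * pivot[c]) % p);
        }
    }
}
```

Building the list for the next column during the current reduction, as suggested above, would need extra bookkeeping that this sketch does not attempt. Swapping schedule(static) for schedule(dynamic) in the same loop would be one way to compare the two schedulers on identical work.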
