fix hang in tc microbenchmark #127

elgarten · 2024-07-11T19:00:24Z

Added a shared WaitGroup between the vertex doAll and the nested, per-vertex edge doAll.
The original code would hang because the two doAlls used separate wait groups: after enqueueing a doAll in edge_tc_couting, a hart blocks until that doAll completes (tc_algos.cpp:42). Because of the outer doAll in tc_no_chunk, this happens on every hart, so every hart is waiting and none is available to run the work being waited on. With one combined wait group, an outer doAll task can finish after enqueueing the inner doAll tasks but before those tasks complete. The harts are therefore freed to run the inner doAll, and the benchmark makes forward progress.
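For reviewers who want to see the pattern in isolation, here is a minimal sketch of the shared-wait-group idea using plain C++ threads. It is not the code in this PR: `WaitGroup`, `Pool`, and `runNestedWithSharedWaitGroup` are hypothetical stand-ins written for illustration, and each pool worker only plays the role of a hart. The point it shows is that the outer task registers its inner tasks on the same wait group and then returns without blocking, so the single wait happens outside the workers.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a wait group: a pending-task counter plus a condition variable.
class WaitGroup {
public:
  void add(std::size_t n) {
    std::lock_guard<std::mutex> lock(m_);
    pending_ += n;
  }
  void done() {
    std::lock_guard<std::mutex> lock(m_);
    assert(pending_ > 0);
    if (--pending_ == 0) cv_.notify_all();
  }
  void wait() {
    std::unique_lock<std::mutex> lock(m_);
    cv_.wait(lock, [this] { return pending_ == 0; });
  }
private:
  std::mutex m_;
  std::condition_variable cv_;
  std::size_t pending_ = 0;
};

// Hypothetical fixed-size worker pool; each worker plays the role of a hart.
class Pool {
public:
  explicit Pool(std::size_t workers) {
    for (std::size_t i = 0; i < workers; ++i)
      threads_.emplace_back([this] { run(); });
  }
  ~Pool() {
    {
      std::lock_guard<std::mutex> lock(m_);
      stop_ = true;
    }
    cv_.notify_all();
    for (auto& t : threads_) t.join();
  }
  void enqueue(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(m_);
      tasks_.push_back(std::move(task));
    }
    cv_.notify_one();
  }
private:
  void run() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // only reached once stop_ is set
        task = std::move(tasks_.front());
        tasks_.pop_front();
      }
      task();
    }
  }
  std::mutex m_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> tasks_;
  std::vector<std::thread> threads_;
  bool stop_ = false;
};

// Outer "vertex" tasks enqueue inner "edge" tasks on the SAME wait group and
// return immediately. If each outer task instead blocked on its own wait group,
// all workers could end up waiting with nobody left to run the inner tasks.
std::size_t runNestedWithSharedWaitGroup(std::size_t harts, std::size_t vertices,
                                         std::size_t edgesPerVertex) {
  std::atomic<std::size_t> edgesVisited{0};
  WaitGroup wg;
  Pool pool(harts);  // declared last so it is destroyed (and joined) first
  wg.add(vertices);
  for (std::size_t v = 0; v < vertices; ++v) {
    pool.enqueue([&] {
      wg.add(edgesPerVertex);  // register inner work before this task finishes
      for (std::size_t e = 0; e < edgesPerVertex; ++e)
        pool.enqueue([&] { edgesVisited.fetch_add(1); wg.done(); });
      wg.done();  // outer task completes without waiting on the inner tasks
    });
  }
  wg.wait();  // the only wait, performed by the caller rather than by a worker
  return edgesVisited.load();
}
```

Calling `runNestedWithSharedWaitGroup(2, 8, 4)` runs 8 outer tasks on 2 workers and returns once all 32 inner tasks have finished. Because each outer task adds its inner tasks to the counter before calling `done()` for itself, the counter cannot reach zero early.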

tewaro · 2024-07-15T16:14:38Z

Please explain why this change solves the problem in your PR description.

tewaro · 2024-07-29T15:29:14Z

CI fails because the containers need to be rebuilt.

elgarten · 2024-07-30T15:42:03Z

Superseded by #137.

…#127 + #129) (#137)

* Fix hang in TC microbenchmark

  Added a shared WaitGroup between vertex doAll and the nested, per vertex, edge doAll. Original code would hang because of the separate wait groups: after enqueueing a doAll in edge_tc_couting, harts wait for it to complete (tc_algos.cpp:42). However, this occurs on every hart because of the outer doAll in tc_no_chunk, therefore every hart is waiting and none is available to complete the work being waited on. When using one combined wait group, the outer doAll tasks are able to complete after enqueuing, but before completion of, the inner doAll tasks. Thus, harts are freed to complete the inner doAll and therefore forward progress.

* fix overlapping prep timers
* add synchronization to drv timers

fix hang in tc microbenchmark

02abcb1

tewaro requested a review from pingle14 July 12, 2024 22:06

elgarten and others added 3 commits July 16, 2024 01:46

add barriers to prep counters logging to prevent overlap

30ba218

Merge remote-tracking branch 'remotes/origin/main' into brenden/timing

f574e27

Merge branch 'main' into brenden/timing

6851b20

elgarten mentioned this pull request Jul 30, 2024

Fix hang in tc microbenchmark + fix overlapping prep timers (supersede #127 + #129) #137

Merged

elgarten closed this Jul 30, 2024
