fix hang in tc microbenchmark #127

Closed
wants to merge 4 commits into from

elgarten
Contributor

@elgarten elgarten commented Jul 11, 2024

  • Added a shared WaitGroup between the vertex doAll and the nested, per-vertex edge doAll.
    The original code would hang because the two levels used separate wait groups: after enqueuing a doAll in edge_tc_couting, a hart waits for it to complete (tc_algos.cpp:42). Because of the outer doAll in tc_no_chunk, this happens on every hart, so every hart is waiting and none is available to run the work being waited on. With one combined wait group, an outer doAll task can complete after enqueuing the inner doAll tasks but before they finish. This frees harts to run the inner doAll tasks, so the benchmark makes forward progress.
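To illustrate the reasoning above, here is a minimal, self-contained sketch of the shared-wait-group pattern in standard C++. It is not the code from this PR: `WaitGroup`, `HartPool`, and `countEdgesVisited` are hypothetical stand-ins written for this example, with ordinary threads playing the role of harts, and the real library's types and signatures may differ.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a wait group: counts outstanding tasks.
class WaitGroup {
 public:
  void add(std::uint64_t n) {
    std::lock_guard<std::mutex> lock(m_);
    pending_ += n;
  }
  void done() {
    std::lock_guard<std::mutex> lock(m_);
    if (--pending_ == 0) cv_.notify_all();
  }
  void wait() {
    std::unique_lock<std::mutex> lock(m_);
    cv_.wait(lock, [this] { return pending_ == 0; });
  }

 private:
  std::mutex m_;
  std::condition_variable cv_;
  std::uint64_t pending_ = 0;
};

// Hypothetical fixed pool of worker threads standing in for harts.
class HartPool {
 public:
  explicit HartPool(std::size_t harts) {
    for (std::size_t i = 0; i < harts; ++i)
      workers_.emplace_back([this] { run(); });
  }
  ~HartPool() {
    {
      std::lock_guard<std::mutex> lock(m_);
      stop_ = true;
    }
    cv_.notify_all();
    for (auto& w : workers_) w.join();
  }
  void enqueue(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(m_);
      tasks_.push_back(std::move(task));
    }
    cv_.notify_one();
  }

 private:
  void run() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // stop requested and queue drained
        task = std::move(tasks_.front());
        tasks_.pop_front();
      }
      task();
    }
  }
  std::mutex m_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> tasks_;
  std::vector<std::thread> workers_;
  bool stop_ = false;
};

// One outer task per vertex; each enqueues one inner task per edge.
// Both levels register with the same WaitGroup, so an outer task returns
// right after enqueuing and never blocks a hart on its own inner tasks.
std::uint64_t countEdgesVisited(std::size_t harts,
                                const std::vector<std::size_t>& degrees) {
  std::atomic<std::uint64_t> visited{0};
  WaitGroup wg;
  HartPool pool(harts);
  wg.add(degrees.size());
  for (std::size_t degree : degrees) {
    pool.enqueue([&, degree] {
      wg.add(degree);  // register inner tasks before this outer task is done
      for (std::size_t e = 0; e < degree; ++e) {
        pool.enqueue([&] {
          visited.fetch_add(1, std::memory_order_relaxed);
          wg.done();
        });
      }
      wg.done();  // outer task finishes without waiting on the inner tasks
    });
  }
  wg.wait();  // only the caller, which is not a pool worker, blocks here
  return visited.load();
}
```

In this sketch, `countEdgesVisited(1, {2, 3, 0, 5})` finishes even with a single worker. If each outer task instead created its own `WaitGroup` and called `wait()` on it, that one worker would block inside the first outer task and the inner tasks would never run, which is the shape of the hang described above.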

@tewaro tewaro requested a review from pingle14 July 12, 2024 22:06
@tewaro
Contributor

tewaro commented Jul 15, 2024

Please explain why this change solves the problem in your PR description.

@tewaro
Contributor

tewaro commented Jul 29, 2024

CI fails because the containers need to be rebuilt.

@elgarten
Contributor Author

Superseded by #137.

@elgarten elgarten closed this Jul 30, 2024
elgarten added a commit that referenced this pull request Aug 12, 2024
…#127 + #129) (#137)

* Fix hang in TC microbenchmark

Added a shared WaitGroup between the vertex doAll and the nested, per-vertex
edge doAll. The original code would hang because the two levels used separate
wait groups: after enqueuing a doAll in edge_tc_couting, a hart waits for it
to complete (tc_algos.cpp:42). Because of the outer doAll in tc_no_chunk,
this happens on every hart, so every hart is waiting and none is available
to run the work being waited on. With one combined wait group, an outer
doAll task can complete after enqueuing the inner doAll tasks but before
they finish. This frees harts to run the inner doAll tasks, so the benchmark
makes forward progress.

* fix overlapping prep timers

* add synchronization to drv timers

2 participants