[Question]How to implement the barrier-rTask in generated code #373

zqj2333 · 2022-01-12T13:57:43Z

I have generated code for the six models mentioned in the paper (RAMMER, Figure 11) with nnfusion on the "osdi20_artifact" branch. I took a look at the generated code, and there seems to be no code implementing the barrier-rTask described in the paper,
such as:
"step array",
"each rTask use its first thread to increase step array",
"barrier-rTask use its first N thread to poll on the corresponding elements in the step array".

So I want to know how the barrier-rTask is reflected in the generated code.

Thanks for your response!

xysmlx · 2022-01-13T03:02:28Z

Hi, the block-level barrier-rTask can be enabled by setting -fblockfusion_level=2. It is implemented for CUDA and ROCm in here and here. Because there is still a TODO to automatically detect the active thread blocks to avoid deadlock, the block-level barrier-rTask is not enabled by default. You may therefore need to manage the active thread blocks manually to avoid deadlock. The current -fblockfusion_level=1 implementation relies on global kernel launches for barriers.
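
For readers who want a feel for the step-array scheme quoted in the question, here is a minimal CPU-side sketch. It is not nnfusion's generated code: it uses std::thread and std::atomic as stand-ins for thread blocks and device memory, and the names run_step_barrier_demo, step, and partial are hypothetical. Each producer thread plays the role of an rTask's first thread and increments its slot in the step array. A single waiter thread plays the barrier-rTask and polls every slot, whereas on the GPU the first N threads would each poll one slot.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical CPU analogue of a step-array barrier (not nnfusion's code).
// Producer i writes partial[i], then bumps step[i]. The waiter polls every
// step[i] before it reads the partial results.
int run_step_barrier_demo(int num_producers, int value_per_producer) {
    std::vector<std::atomic<int>> step(num_producers);
    for (auto& s : step) s.store(0);  // explicit init, needed before C++20
    std::vector<int> partial(num_producers, 0);
    int total = 0;

    // The waiter starts first to show that it really blocks on the step array.
    std::thread waiter([&] {
        for (int i = 0; i < num_producers; ++i) {
            // Poll until producer i has signalled completion of step 1.
            while (step[i].load(std::memory_order_acquire) < 1) {
                std::this_thread::yield();
            }
        }
        // Every producer's write is visible here (release/acquire pairing).
        for (int i = 0; i < num_producers; ++i) total += partial[i];
    });

    std::vector<std::thread> producers;
    for (int i = 0; i < num_producers; ++i) {
        producers.emplace_back([&, i] {
            partial[i] = value_per_producer * (i + 1);        // the rTask's work
            step[i].fetch_add(1, std::memory_order_release);  // signal completion
        });
    }

    for (auto& p : producers) p.join();
    waiter.join();
    return total;
}
```

Calling run_step_barrier_demo(4, 1) returns 10, the sum 1 + 2 + 3 + 4. The CPU version cannot deadlock because the OS preempts the polling thread. On a GPU, a polling thread block keeps its slot on the device, so if the blocks it waits for are not resident at the same time, the poll never finishes. That is the active-thread-block concern mentioned above.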

zqj2333 · 2022-01-14T05:11:30Z

Hi,
Thank you for your response! I understand now.
