Stops at "Started a local Ray instance." when using the Nvidia Nsight Compute tool to capture a kernel in the multi-GPU case. #116

Open
david-beckham-315 opened this issue Dec 16, 2024 · 1 comment
Labels: question (Further information is requested)

Comments

@david-beckham-315

david-beckham-315 commented Dec 16, 2024

A100 8x GPU machine, using the Nsight Compute tool to capture a kernel during Mochi inference.

Test environment:
CUDA 12.4
Python 3.10.12
torch 2.5.1
ncu 2024.1.1.0 / 2024.3.0.0 (both show the same issue)
ray 2.40.0

Steps:

  1. export CUDA_VISIBLE_DEVICES=0,1
  2. ncu --set full -o A100_mochi_2gpu_240P_step2_fps24_frame163_flash_fwd_kernel.ncu-rep -f -s 1 -c 5 --kernel-name "flash_fwd_kernel" --target-processes all python3 ./demos/cli.py --model_dir /data/NousResearch/mochi-1-preview --num_steps 2 --width 422 --height 240

The result is shown below:
==PROF== Connected to process 640812 (/usr/bin/nvidia-smi)
==PROF== Disconnected from process 640812
Launching with 2 GPUs. If you want to force single GPU mode use CUDA_VISIBLE_DEVICES=0.
==PROF== Connected to process 640742 (/usr/bin/python3.10)
Attention mode: flash
==PROF== Connected to process 641010 (/usr/bin/nvidia-smi)
==PROF== Disconnected from process 641010
==PROF== Connected to process 641005 (/usr/bin/nvidia-smi)
==PROF== Disconnected from process 641005
==PROF== Connected to process 641131 (/usr/bin/nvidia-smi)
==PROF== Disconnected from process 641131
2024-12-13 10:12:58,746 INFO worker.py:1819 -- Started a local Ray instance.

From pipeline.py, it stops at ray.init():
class MochiMultiGPUPipeline:
    def __init__(
        self,
        *,
        text_encoder_factory: ModelFactory,
        dit_factory: ModelFactory,
        decoder_factory: ModelFactory,
        world_size: int,
    ):
        ray.init()
        RemoteClass = ray.remote(MultiGPUContext)
        self.ctxs = [
            RemoteClass.options(num_gpus=1).remote(
                text_encoder_factory=text_encoder_factory,
                dit_factory=dit_factory,
                decoder_factory=decoder_factory,
                world_size=world_size,
                device_id=0,
                local_rank=i,
            )
            for i in range(world_size)
        ]
        for ctx in self.ctxs:
            ray.get(ctx.ray_ready.remote())
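
For what it's worth, a minimal standalone repro along these lines (hypothetical code, not from the Mochi repo) could show whether ncu can follow Ray GPU worker processes at all, independent of Mochi:

# Hypothetical minimal repro: run a trivial CUDA kernel inside Ray actors.
# Launching it under ncu with --target-processes all should show whether the
# hang at ray.init() reproduces with Ray alone.
import ray
import torch

ray.init()

@ray.remote(num_gpus=1)
class Worker:
    def run(self):
        # Launch a simple CUDA kernel inside the Ray worker process.
        x = torch.randn(1024, 1024, device="cuda")
        return (x @ x).sum().item()

workers = [Worker.remote() for _ in range(2)]
print(ray.get([w.run.remote() for w in workers]))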

Do you know whether Mochi supports Nsight Compute in the multi-GPU case?

Note: Mochi supports Nsight Compute in the single-GPU case.

@ved-genmo
Contributor

Interesting. Is the NousResearch team working on Mochi? :)

I haven't tested Mochi with Nsight so it's not officially supported. The compatibility issue likely stems from Ray. If you need Nsight integration, you might want to try the diffusers version of Mochi instead - it doesn't use Ray and might work better.
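
For reference, a minimal sketch of the diffusers route (model id and arguments assumed from the diffusers MochiPipeline docs; untested with Nsight Compute):

# Minimal sketch, assuming the diffusers MochiPipeline API; single process, no Ray.
import torch
from diffusers import MochiPipeline
from diffusers.utils import export_to_video

pipe = MochiPipeline.from_pretrained("genmo/mochi-1-preview", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # CPU offload instead of multi-GPU sharding
pipe.enable_vae_tiling()

frames = pipe("a short test prompt", num_inference_steps=2).frames[0]
export_to_video(frames, "mochi_test.mp4", fps=24)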

@ajayjain added the question label Dec 20, 2024