Started a local Ray instance - MochiMultiGPUPipeline never loads the pipeline #84

fbarretto · 2024-11-20T04:58:27Z

I've been trying to run this repo in a 4x H100 environment, but the pipeline freezes while loading:

self.pipeline = MochiMultiGPUPipeline(
    text_encoder_factory=T5ModelFactory(),
    dit_factory=DitModelFactory(
        model_path=f"{model_path}/dit.safetensors",
        model_dtype="bf16",
    ),
    decoder_factory=DecoderModelFactory(
        model_path=f"{model_path}/decoder.safetensors",
    ),
    world_size=torch.cuda.device_count(),
)

Attention mode: flash
INFO worker.py:1816 -- Started a local Ray instance. <-- it freezes here

I'm using

cuda 12.4.0
python 3.11
torch 2.4.1

I'm installing the repo with flash attention:
pip install -e .[flash] --no-build-isolation

Any hints?


Lifedecoder · 2024-11-21T09:54:00Z

Try running on a single GPU. That way a debugger can step through the pipeline code.
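One way to try the single-GPU suggestion without touching the pipeline code might be to hide all but one GPU from the process, so that `torch.cuda.device_count()` (used for `world_size` in the snippet above) reports 1. This is only a minimal sketch: `restrict_to_single_gpu` is a hypothetical helper, not part of this repo, and the thread does not confirm that `MochiMultiGPUPipeline` stops freezing with `world_size=1`.

```python
import os


def restrict_to_single_gpu(device_index: int = 0) -> str:
    """Hypothetical helper: expose only one GPU to this process.

    Call it before `import torch`, because CUDA_VISIBLE_DEVICES is read
    when CUDA is initialized and later changes may be ignored.
    """
    value = str(device_index)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Assumption: GPU 0 is the device to keep visible.
visible = restrict_to_single_gpu(0)
print(f"CUDA_VISIBLE_DEVICES={visible}")

# After this point, `import torch` and build the pipeline as usual;
# torch.cuda.device_count() should then see a single device.
```

The same effect can be had from the shell by prefixing the launch command with `CUDA_VISIBLE_DEVICES=0`.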

fbarretto changed the title ~~Started a local Ray instance - never runs~~ Started a local Ray instance - MochiMultiGPUPipeline never loads the pipeline Nov 20, 2024
