How can I debug your code inference_magicdrive.py with pycharm? #15

Open
zhujiagang opened this issue Dec 19, 2024 · 2 comments

Comments

@zhujiagang

zhujiagang commented Dec 19, 2024

Thanks for sharing your excellent work.
I usually debug code with PyCharm. After copying your inference_magicdrive.py into MagicDriveDiT/, I want to run the code on a single GPU, but I encounter the following error:

codes/MagicDriveDiT_code/MagicDriveDiT/magicdrivedit/acceleration/parallel_states.py", line 13, in get_data_parallel_group
    raise RuntimeError("data_parallel_group is None")
RuntimeError: data_parallel_group is None

It seems the code never enters this branch:
if is_distributed():
    dist.init_process_group(backend="nccl", timeout=timedelta(hours=1))
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())
    coordinator = DistCoordinator()
    cfg.sp_size = dist.get_world_size()
    if cfg.sp_size > 1:
        DP_AXIS, SP_AXIS = 0, 1
        dp_size = dist.get_world_size() // cfg.sp_size
        pg_mesh = ProcessGroupMesh(dp_size, cfg.sp_size)
        dp_group = pg_mesh.get_group_along_axis(DP_AXIS)
        sp_group = pg_mesh.get_group_along_axis(SP_AXIS)
        set_sequence_parallel_group(sp_group)
        print(f"Using sp_size={cfg.sp_size}")
    else:
        # TODO: sequence_parallel_group unset!
        dp_group = dist.group.WORLD
    set_data_parallel_group(dp_group)
    enable_sequence_parallelism = cfg.sp_size > 1
else:
    # dist.init_process_group(backend="nccl", timeout=timedelta(hours=1))
    # torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())
    # coordinator = DistCoordinator()
    cfg.sp_size = 1
    coordinator = FakeCoordinator()
    enable_sequence_parallelism = False
set_random_seed(seed=cfg.get("seed", 1024))

Looking forward to your reply. Thanks a lot.

@flymin
Owner

flymin commented Dec 20, 2024

You should launch the program with torchrun, which sets the env params used by is_distributed. I don't know whether it also works with manually set env params, but I think it should be OK.
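
A minimal sketch of setting the env params that torchrun would normally export, assuming is_distributed keys off the standard torch.distributed variables (RANK, WORLD_SIZE, etc.); these can be set in a PyCharm run configuration or at the very top of the script:

# Sketch: mimic torchrun's environment for a single-process debug run.
# Assumes is_distributed() reads the standard torch.distributed variables.
import os

os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "12355")
os.environ.setdefault("RANK", "0")
os.environ.setdefault("WORLD_SIZE", "1")
os.environ.setdefault("LOCAL_RANK", "0")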

Another workaround is to delete the check that raises the error. Sorry, this may not work, since our dataloader relies on it:

codes/MagicDriveDiT_code/MagicDriveDiT/magicdrivedit/acceleration/parallel_states.py", line 13, in get_data_parallel_group
    raise RuntimeError("data_parallel_group is None")
RuntimeError: data_parallel_group is None

This is a fail-safe design for different parallel groups. However, if you only have one process, it should always be safe to use the default process group.
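
As an assumed illustration of "use the default process group" for a single-process debug run (not the repository's actual code): initialize a one-process group and register it as the data-parallel group before the error is hit.

# Assumed single-process workaround, not the actual MagicDriveDiT code:
# initialize a 1-process group and register it as the data-parallel group,
# so get_data_parallel_group() no longer raises.
import torch.distributed as dist
from magicdrivedit.acceleration.parallel_states import set_data_parallel_group

if not dist.is_initialized():
    dist.init_process_group(
        backend="nccl",
        init_method="tcp://localhost:12355",
        rank=0,
        world_size=1,
    )
set_data_parallel_group(dist.group.WORLD)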

flymin added a commit that referenced this issue Dec 20, 2024
hard-coded launch from localhost:12355 when not provided

Ref #15
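
The commit title suggests a fallback along these lines; an illustrative guess, not the actual diff:

# Illustrative guess at the commit's idea (not the actual patch):
# fall back to a hard-coded rendezvous address when the launcher
# (e.g. torchrun) has not provided one.
import os

if "MASTER_ADDR" not in os.environ:
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "12355"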
@flymin
Owner

flymin commented Dec 20, 2024

Please try the above PR, which should support launching with a plain python command.
