FusedRMSNormAffineMixedDtypesFunction is not importable in the PyTorch build without distributed support #1853

Open
IvanYashchuk opened this issue Oct 28, 2024 · 0 comments
Labels: bug (Something isn't working)

IvanYashchuk (Contributor) commented Oct 28, 2024

Describe the Bug
The import from apex.transformer import pipeline_parallel is not guarded by a torch.distributed.is_available() check, so importing even the non-distributed parts of Apex fails on a PyTorch build without distributed support.
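
One possible shape for such a guard in apex/transformer/__init__.py is sketched below (a hypothetical illustration, not the actual Apex code): only import the pipeline-parallel package when the PyTorch build provides distributed support.

import torch

# Hypothetical sketch: skip distributed-only submodules when torch.distributed
# is not built into this PyTorch installation.
if torch.distributed.is_available():
    from apex.transformer import pipeline_parallel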

Minimal Steps/Code to Reproduce the Bug

  1. Modify the is_available() function to return False in torch/distributed/__init__.py
  2. Modify the is_available() function to return False in torch/distributed/rpc/__init__.py
  3. Run the import below (a quick sanity check is sketched after this list):
from apex.normalization.fused_layer_norm import FusedRMSNormAffineMixedDtypesFunction
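
As a quick sanity check (a sketch added here, not part of the original steps), one can confirm after steps 1 and 2 that PyTorch now reports distributed support as unavailable before running the failing import from step 3:

import torch

# With the modified __init__.py files, distributed support should be reported
# as unavailable.
assert not torch.distributed.is_available()

# This is the import from step 3; it currently raises the traceback shown below.
from apex.normalization.fused_layer_norm import FusedRMSNormAffineMixedDtypesFunction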

Traceback:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.12/dist-packages/apex/__init__.py", line 27, in <module>
    from . import transformer
  File "/usr/local/lib/python3.12/dist-packages/apex/transformer/__init__.py", line 4, in <module>
    from apex.transformer import pipeline_parallel
  File "/usr/local/lib/python3.12/dist-packages/apex/transformer/pipeline_parallel/__init__.py", line 1, in <module>
    from apex.transformer.pipeline_parallel.schedules import get_forward_backward_func
  File "/usr/local/lib/python3.12/dist-packages/apex/transformer/pipeline_parallel/schedules/__init__.py", line 3, in <module>
    from apex.transformer.pipeline_parallel.schedules.fwd_bwd_no_pipelining import (
  File "/usr/local/lib/python3.12/dist-packages/apex/transformer/pipeline_parallel/schedules/fwd_bwd_no_pipelining.py", line 10, in <module>
    from apex.transformer.pipeline_parallel.schedules.common import Batch
  File "/usr/local/lib/python3.12/dist-packages/apex/transformer/pipeline_parallel/schedules/common.py", line 9, in <module>
    from apex.transformer.pipeline_parallel.p2p_communication import FutureTensor
  File "/usr/local/lib/python3.12/dist-packages/apex/transformer/pipeline_parallel/p2p_communication.py", line 25, in <module>
    from apex.transformer.utils import split_tensor_into_1d_equal_chunks
  File "/usr/local/lib/python3.12/dist-packages/apex/transformer/utils.py", line 11, in <module>
    torch.distributed.all_gather_into_tensor = torch.distributed._all_gather_base
                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: module 'torch.distributed' has no attribute '_all_gather_base'
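
One way the failing assignment in apex/transformer/utils.py could be hardened is sketched below (an illustrative guard, assuming the alias is only needed on distributed builds; not a patch taken from this report):

import torch

# Only alias the legacy collective when the distributed package is built in
# and actually exposes _all_gather_base; otherwise leave torch.distributed untouched.
if torch.distributed.is_available() and hasattr(torch.distributed, "_all_gather_base"):
    torch.distributed.all_gather_into_tensor = torch.distributed._all_gather_base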

Expected Behavior

No import errors.
