FusedRMSNormAffineMixedDtypesFunction is not importable in the PyTorch build without distributed support #1853

Open
IvanYashchuk opened this issue Oct 28, 2024 · 0 comments
Labels: bug (Something isn't working)

IvanYashchuk (Contributor) commented Oct 28, 2024

Describe the Bug
The import from apex.transformer import pipeline_parallel is not guarded by a torch.distributed.is_available() check, so importing even the non-distributed parts of Apex fails on a PyTorch build without distributed support.
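
One possible shape for such a guard in apex/transformer/__init__.py is sketched below (a hypothetical illustration, not the actual Apex code): only import the pipeline-parallel package when the PyTorch build provides distributed support.

import torch

# Hypothetical sketch: skip distributed-only submodules when torch.distributed
# is not built into this PyTorch installation.
if torch.distributed.is_available():
    from apex.transformer import pipeline_parallel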

Minimal Steps/Code to Reproduce the Bug

  1. Modify the is_available() function to return False in torch/distributed/__init__.py
  2. Modify the is_available() function to return False in torch/distributed/rpc/__init__.py
  3. Run the import below (a quick sanity check is sketched after this list):
from apex.normalization.fused_layer_norm import FusedRMSNormAffineMixedDtypesFunction
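
As a quick sanity check (a sketch added here, not part of the original steps), one can confirm after steps 1 and 2 that PyTorch now reports distributed support as unavailable before running the failing import from step 3:

import torch

# With the modified __init__.py files, distributed support should be reported
# as unavailable.
assert not torch.distributed.is_available()

# This is the import from step 3; it currently raises the traceback shown below.
from apex.normalization.fused_layer_norm import FusedRMSNormAffineMixedDtypesFunction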

Traceback:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.12/dist-packages/apex/__init__.py", line 27, in <module>
    from . import transformer
  File "/usr/local/lib/python3.12/dist-packages/apex/transformer/__init__.py", line 4, in <module>
    from apex.transformer import pipeline_parallel
  File "/usr/local/lib/python3.12/dist-packages/apex/transformer/pipeline_parallel/__init__.py", line 1, in <module>
    from apex.transformer.pipeline_parallel.schedules import get_forward_backward_func
  File "/usr/local/lib/python3.12/dist-packages/apex/transformer/pipeline_parallel/schedules/__init__.py", line 3, in <module>
    from apex.transformer.pipeline_parallel.schedules.fwd_bwd_no_pipelining import (
  File "/usr/local/lib/python3.12/dist-packages/apex/transformer/pipeline_parallel/schedules/fwd_bwd_no_pipelining.py", line 10, in <module>
    from apex.transformer.pipeline_parallel.schedules.common import Batch
  File "/usr/local/lib/python3.12/dist-packages/apex/transformer/pipeline_parallel/schedules/common.py", line 9, in <module>
    from apex.transformer.pipeline_parallel.p2p_communication import FutureTensor
  File "/usr/local/lib/python3.12/dist-packages/apex/transformer/pipeline_parallel/p2p_communication.py", line 25, in <module>
    from apex.transformer.utils import split_tensor_into_1d_equal_chunks
  File "/usr/local/lib/python3.12/dist-packages/apex/transformer/utils.py", line 11, in <module>
    torch.distributed.all_gather_into_tensor = torch.distributed._all_gather_base
                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: module 'torch.distributed' has no attribute '_all_gather_base'
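
One way the failing assignment in apex/transformer/utils.py could be hardened is sketched below (an illustrative guard, assuming the alias is only needed on distributed builds; not a patch taken from this report):

import torch

# Only alias the legacy collective when the distributed package is built in
# and actually exposes _all_gather_base; otherwise leave torch.distributed untouched.
if torch.distributed.is_available() and hasattr(torch.distributed, "_all_gather_base"):
    torch.distributed.all_gather_into_tensor = torch.distributed._all_gather_base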

Expected Behavior

No import errors.
