FSDP full state dict mangles fsspec path #20406

Open
oceanusxiv opened this issue Nov 8, 2024 · 1 comment
Labels: bug (Something isn't working), ver: 2.4.x, ver: 2.5.x

Comments

@oceanusxiv

Bug description

In FSDPStrategy.save_checkpoint, the filepath argument is converted via

path = Path(self.broadcast(filepath))

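This conversion only makes sense for sharded checkpointing. Wrapping the string in pathlib.Path collapses the double slash after the URL scheme (for example, s3://example/path becomes s3:/example/path on POSIX), so any legitimate fsspec path that is passed in gets mangled.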
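The mangling can be reproduced without Lightning or FSDP at all. The snippet below is a minimal sketch that uses only the standard library; it substitutes PurePosixPath for Path so the result is the same on any host, which is an assumption about the POSIX machines where this is typically hit.

```python
from pathlib import PurePosixPath

remote = "s3://example/path"  # the URL used in the repro

# PurePosixPath stands in for pathlib.Path on a POSIX host, so the
# result does not depend on the machine running this snippet.
mangled = str(PurePosixPath(remote))

print(mangled)  # s3:/example/path
```

The scheme separator loses one slash, so fsspec no longer sees a valid s3:// URL.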

When self._state_dict_type == "full",

super().save_checkpoint(checkpoint=checkpoint, filepath=path)

is called. This follows the normal CheckpointIO workflow, but with the mangled path.

Expected behavior: when the user chooses the full state dict type, CheckpointIO and remote paths should work as usual. Currently, full state dict checkpoints cannot be saved to remote paths.
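One possible shape for the expected behavior is sketched below. This is not Lightning's code or an agreed fix: resolve_checkpoint_path is a hypothetical helper, and the assumption is that the Path conversion can simply be skipped when the state dict type is "full".

```python
from pathlib import Path
from typing import Union


def resolve_checkpoint_path(
    filepath: Union[str, Path], state_dict_type: str
) -> Union[str, Path]:
    """Hypothetical helper: only coerce to Path for sharded checkpoints."""
    if state_dict_type == "full":
        # Leave the value untouched so an fsspec URL such as
        # "s3://..." reaches CheckpointIO with its scheme intact.
        return filepath
    # Sharded checkpoints are written as a local directory, so a
    # pathlib.Path is appropriate here.
    return Path(filepath)


full_path = resolve_checkpoint_path("s3://example/path", "full")
sharded_path = resolve_checkpoint_path("/tmp/ckpt", "sharded")
```

With this branching, the full state dict case passes the original string through unchanged, while the sharded case keeps the existing Path behavior.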

What version are you seeing the problem on?

v2.4

How to reproduce the bug

import lightning as L

trainer = L.Trainer(
    strategy="fsdp",
    default_root_dir="s3://example/path",
)

trainer.fit(model=...)

Error messages and logs

# Error messages and logs here please

Environment

Current environment
#- PyTorch Lightning Version (e.g., 2.4.0):
#- PyTorch Version (e.g., 2.4):
#- Python version (e.g., 3.12):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed Lightning(`conda`, `pip`, source):

More info

No response

oceanusxiv added the "bug" and "needs triage" labels on Nov 8, 2024
lantiga added the "ver: 2.5.x" label and removed the "needs triage" label on Nov 18, 2024
@lantiga
Collaborator

lantiga commented Nov 18, 2024

Thank you, @oceanusxiv. If you could send a complete repro, it would speed up the fix.
