Don't support both options fsdp and gradient_checkpointing. Why? #52

yanmj0601 · 2024-11-15T07:56:46Z

assert not (train_args.fsdp and train_args.gradient_checkpointing), "currently, we don't support both options. open an issue for details."

Why can't both options be enabled together?
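For readers who have not hit this check, here is a minimal sketch of how the quoted guard behaves. `TrainArgs`, `check_train_args`, and `is_supported` are hypothetical names used only for illustration; the repo's real argument class has more fields and may be structured differently.

```python
from dataclasses import dataclass


@dataclass
class TrainArgs:
    # Hypothetical stand-in for the repo's training arguments; only the two
    # flags named in the quoted assertion are modeled here.
    fsdp: bool = False
    gradient_checkpointing: bool = False


def check_train_args(train_args: TrainArgs) -> None:
    # Same condition and message as the quoted assertion, written as an
    # explicit raise so it still fires when Python runs with -O.
    if train_args.fsdp and train_args.gradient_checkpointing:
        raise AssertionError(
            "currently, we don't support both options. open an issue for details."
        )


def is_supported(train_args: TrainArgs) -> bool:
    # Convenience wrapper (an assumption, not part of the repo): report the
    # outcome as a bool instead of raising.
    try:
        check_train_args(train_args)
    except AssertionError:
        return False
    return True
```

Under these assumptions, each flag is accepted on its own and only the combination is rejected, so `is_supported(TrainArgs(fsdp=True))` passes while `is_supported(TrainArgs(fsdp=True, gradient_checkpointing=True))` does not.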

tianweiy · 2024-11-25T05:28:35Z

Using PyTorch's native FSDP directly, without Accelerate, would offer better compatibility with gradient checkpointing and potentially much faster training, especially with newer PyTorch versions. Unfortunately, we're currently stuck on PyTorch 2.0.1 due to Accelerate.

I plan to implement an Accelerate-free version in the future. However, I'm currently occupied with interviews, so I don't have a clear timeline for this yet :(

yanmj0601 · 2024-11-25T07:34:12Z

ok, thanks
