huggingface / accelerate Public



Issues: huggingface/accelerate


105 Open 1,522 Closed


Issues list

Error while fine tuning with peft, lora, accelerate, SFTConfig and SFTTrainer

#3230 opened Nov 8, 2024 by Isdriai

2 of 4 tasks

torch.cuda.is_available() false when running multi-gpu inference with accelerate launch

#3225 opened Nov 6, 2024 by paulgekeler

2 of 4 tasks

"mat2 must be a matrix" error when finetuning Dreambooth flux with FSDP

#3224 opened Nov 5, 2024 by weixiong-ur

2 of 4 tasks

Add case-insensitive parsing of bool environment variables

#3222 opened Nov 5, 2024 by wizeng23

Incorrect type in output of utils.pad_across_processes when input is torch.bool

#3218 opened Nov 4, 2024 by mariusarvinte

2 of 4 tasks

PyPI published Accelerate==1.1.0 is missing Source Distributions

#3216 opened Nov 4, 2024 by helloworld1

4 tasks

ConnectionError: Tried to launch distributed communication on port 29401, but another process is utilizing it. Please specify a different port (such as using the --main_process_port flag or specifying a different main_process_port in your config file) and rerun your script. To automatically use the next open port (on a single node), you can set this to 0.

#3214 opened Nov 4, 2024 by qinchangchang

1 of 4 tasks

How could I convert ZeRO-0 deepspeed weights into fp32 model checkpoint?

#3210 opened Nov 1, 2024 by liming-ai

The optimizer is not receiving the FSDP model parameters.

#3209 opened Nov 1, 2024 by eljandoubi

2 of 4 tasks

Multiple node inference

#3208 opened Nov 1, 2024 by DLCM-wrz

Multinode, multigpu example fails

#3206 opened Oct 31, 2024 by ffrancesco94

2 of 4 tasks

Command line arguments related to deepspeed for accelerate launch do not override those of default_config.yaml

#3203 opened Oct 29, 2024 by JdbermeoUZH

2 of 4 tasks

Problem with metrics calculation and dataloader

#3202 opened Oct 28, 2024 by gssriram

2 of 4 tasks

Cuda OOM when accelerator.prepare

#3200 opened Oct 25, 2024 by antoinedelplace

2 of 4 tasks

using deepspeed original json config, when using bf16, get the error RuntimeError: expected mat1 and mat2 to have the same dtype, but got: c10::BFloat16 != c10::Half.

#3197 opened Oct 25, 2024 by PMPBinZhang

2 of 4 tasks

Possible issue in Accelerate FSDP Documentation

#3195 opened Oct 24, 2024 by Quicksilver466

Unable to access model gradients with DeepSpeed and Accelerate

#3184 opened Oct 22, 2024 by shouyezhe

2 of 4 tasks

accelerator.prepare() get OOM,but available in single GPU

#3182 opened Oct 21, 2024 by lqf0624

2 of 4 tasks

[Bug] The clip_grad_norm of xla fsdp is not right

#3180 opened Oct 21, 2024 by hanwen-sun

4 tasks

Why don't ML engineers use shampoo ?🧴

#3178 opened Oct 18, 2024 by G-structure

Split_batches argument in Accelerator.__init__ is available, but not used

#3177 opened Oct 17, 2024 by yaraksen

MPI on CPU-only: "no support for _allgather_base"

#3176 opened Oct 17, 2024 by tikhu

2 of 4 tasks

Support --standalone for concurrent single node multi-GPU jobs

#3175 opened Oct 16, 2024 by Olive-Z

Can I load model once or dataset once and copy to subprocess?

#3172 opened Oct 16, 2024 by Hans-digit

[Bug] The Transformer Engine plugin seems to be incompatible with LayerNorm that has no weights.

#3171 opened Oct 16, 2024 by IDKiro

2 of 4 tasks

