deepspeed 0.15.1
torch 2.4.1
I cannot run the original code successfully with the following command; the tensor dimensions seem to mismatch in the RoPE step.
accelerate launch \
  --config_file accelerate_configs/single_node.yaml \
  train.py \
  --batch-size 1 \
  --gradient-accumulate-every 1 \
  --output-dir ./output/7B_0.256M_bs_1M_rope_10M_step_500_lr_2e-5 \
  --seed 2023 \
  --max-train-steps 500 \
  --learning-rate 2e-5 \
  --model /cfs/models/llama-7b-hf \
  --dataset /cfs/datasets/slimpajama-per-source-length-upsample \
  --seq-length 256000 \
  --rope-theta 10000000 \
  --parallel_mode usp_attn
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
[rank5]: Traceback (most recent call last):
[rank5]: File "/cfs/fjr2/EasyContextv2/train.py", line 212, in
[rank5]: main(args.parse_args())
[rank5]: File "/cfs/fjr2/EasyContextv2/train.py", line 132, in main
[rank5]: logits = model(
[rank5]: File "/home/pjz/miniconda3/envs/easycontext/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank5]: return self._call_impl(*args, **kwargs)
[rank5]: File "/home/pjz/miniconda3/envs/easycontext/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank5]: return forward_call(*args, **kwargs)
[rank5]: File "/home/pjz/miniconda3/envs/easycontext/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 18, in wrapped_fn
[rank5]: ret_val = func(*args, **kwargs)
[rank5]: File "/home/pjz/miniconda3/envs/easycontext/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1899, in forward
[rank5]: loss = self.module(*inputs, **kwargs)
[rank5]: File "/home/pjz/miniconda3/envs/easycontext/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank5]: return self._call_impl(*args, **kwargs)
[rank5]: File "/home/pjz/miniconda3/envs/easycontext/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1603, in _call_impl
[rank5]: result = forward_call(*args, **kwargs)
[rank5]: File "/home/pjz/miniconda3/envs/easycontext/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1196, in forward
[rank5]: outputs = self.model(
[rank5]: File "/home/pjz/miniconda3/envs/easycontext/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank5]: return self._call_impl(*args, **kwargs)
[rank5]: File "/home/pjz/miniconda3/envs/easycontext/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1603, in _call_impl
[rank5]: result = forward_call(*args, **kwargs)
[rank5]: File "/home/pjz/miniconda3/envs/easycontext/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1005, in forward
[rank5]: layer_outputs = self._gradient_checkpointing_func(
[rank5]: File "/home/pjz/miniconda3/envs/easycontext/lib/python3.10/site-packages/torch/_compile.py", line 31, in inner
[rank5]: return disable_fn(*args, **kwargs)
[rank5]: File "/home/pjz/miniconda3/envs/easycontext/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 600, in _fn
[rank5]: return fn(*args, **kwargs)
[rank5]: File "/home/pjz/miniconda3/envs/easycontext/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 481, in checkpoint
[rank5]: return CheckpointFunction.apply(function, preserve, *args)
[rank5]: File "/home/pjz/miniconda3/envs/easycontext/lib/python3.10/site-packages/torch/autograd/function.py", line 574, in apply
[rank5]: return super().apply(*args, **kwargs) # type: ignore[misc]
[rank5]: File "/home/pjz/miniconda3/envs/easycontext/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 255, in forward
[rank5]: outputs = run_function(*args)
[rank5]: File "/home/pjz/miniconda3/envs/easycontext/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank5]: return self._call_impl(*args, **kwargs)
[rank5]: File "/home/pjz/miniconda3/envs/easycontext/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1603, in _call_impl
[rank5]: result = forward_call(*args, **kwargs)
[rank5]: File "/cfs/fjr2/EasyContextv2/easy_context/zigzag_ring_attn/monkey_patch.py", line 69, in new_decoder_forward
[rank5]: hidden_states, self_attn_weights, present_key_value = self.self_attn(
[rank5]: File "/home/pjz/miniconda3/envs/easycontext/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank5]: return self._call_impl(*args, **kwargs)
[rank5]: File "/home/pjz/miniconda3/envs/easycontext/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1603, in _call_impl
[rank5]: result = forward_call(*args, **kwargs)
[rank5]: File "/home/pjz/miniconda3/envs/easycontext/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 447, in forward
[rank5]: query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin)
[rank5]: File "/home/pjz/miniconda3/envs/easycontext/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 206, in apply_rotary_pos_emb
[rank5]: q_embed = (q * cos) + (rotate_half(q) * sin)
[rank5]: RuntimeError: The size of tensor a (16384) must match the size of tensor b (32000) at non-singleton dimension 2
[rank7]: Traceback (most recent call last):
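For context, the failure can be reproduced outside the training loop. Below is a minimal standalone sketch (not the project's or transformers' actual code, just the same broadcast pattern, assuming the usual LLaMA layout of (batch, heads, seq, head_dim) for q and (batch, seq, head_dim) for cos/sin, with the sizes taken from the error message): the query tensor holds only a 16384-token shard while cos/sin cover 32000 positions, so the elementwise multiply in apply_rotary_pos_emb cannot broadcast.

```python
# Standalone illustration of the shape mismatch (hypothetical shapes inferred
# from the error message, not taken from the project's code).
import torch

def rotate_half(x):
    # Split the head dimension in half and swap-and-negate, as LLaMA-style RoPE does.
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

batch, heads, head_dim = 1, 32, 128
local_seq = 16384   # sequence shard held by this rank (tensor "a" in the error)
rope_seq = 32000    # positions the cos/sin tables cover (tensor "b" in the error)

q = torch.randn(batch, heads, local_seq, head_dim)
cos = torch.randn(batch, rope_seq, head_dim).unsqueeze(1)  # -> (1, 1, 32000, 128)
sin = torch.randn(batch, rope_seq, head_dim).unsqueeze(1)

# Raises: RuntimeError: The size of tensor a (16384) must match the size of
# tensor b (32000) at non-singleton dimension 2
q_embed = (q * cos) + (rotate_half(q) * sin)
```

So the sharded query/key states and the cos/sin position embeddings passed into self-attention appear to disagree on the local sequence length.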