Cannot run the example script successfully. #51

Open
feifeibear opened this issue Sep 14, 2024 · 0 comments
deepspeed 0.15.1
torch 2.4.1

I cannot run the original code successfully with the following command. It seems the tensor dimensions mismatch for RoPE.

```
accelerate launch \
  --config_file accelerate_configs/single_node.yaml \
  train.py \
  --batch-size 1 \
  --gradient-accumulate-every 1 \
  --output-dir ./output/7B_0.256M_bs_1M_rope_10M_step_500_lr_2e-5 \
  --seed 2023 \
  --max-train-steps 500 \
  --learning-rate 2e-5 \
  --model /cfs/models/llama-7b-hf \
  --dataset /cfs/datasets/slimpajama-per-source-length-upsample \
  --seq-length 256000 \
  --rope-theta 10000000 \
  --parallel_mode usp_attn
```

```
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
[rank5]: Traceback (most recent call last):
[rank5]: File "/cfs/fjr2/EasyContextv2/train.py", line 212, in
[rank5]: main(args.parse_args())
[rank5]: File "/cfs/fjr2/EasyContextv2/train.py", line 132, in main
[rank5]: logits = model(
[rank5]: File "/home/pjz/miniconda3/envs/easycontext/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank5]: return self._call_impl(*args, **kwargs)
[rank5]: File "/home/pjz/miniconda3/envs/easycontext/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank5]: return forward_call(*args, **kwargs)
[rank5]: File "/home/pjz/miniconda3/envs/easycontext/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 18, in wrapped_fn
[rank5]: ret_val = func(*args, **kwargs)
[rank5]: File "/home/pjz/miniconda3/envs/easycontext/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1899, in forward
[rank5]: loss = self.module(*inputs, **kwargs)
[rank5]: File "/home/pjz/miniconda3/envs/easycontext/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank5]: return self._call_impl(*args, **kwargs)
[rank5]: File "/home/pjz/miniconda3/envs/easycontext/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1603, in _call_impl
[rank5]: result = forward_call(*args, **kwargs)
[rank5]: File "/home/pjz/miniconda3/envs/easycontext/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1196, in forward
[rank5]: outputs = self.model(
[rank5]: File "/home/pjz/miniconda3/envs/easycontext/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank5]: return self._call_impl(*args, **kwargs)
[rank5]: File "/home/pjz/miniconda3/envs/easycontext/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1603, in _call_impl
[rank5]: result = forward_call(*args, **kwargs)
[rank5]: File "/home/pjz/miniconda3/envs/easycontext/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1005, in forward
[rank5]: layer_outputs = self._gradient_checkpointing_func(
[rank5]: File "/home/pjz/miniconda3/envs/easycontext/lib/python3.10/site-packages/torch/_compile.py", line 31, in inner
[rank5]: return disable_fn(*args, **kwargs)
[rank5]: File "/home/pjz/miniconda3/envs/easycontext/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 600, in _fn
[rank5]: return fn(*args, **kwargs)
[rank5]: File "/home/pjz/miniconda3/envs/easycontext/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 481, in checkpoint
[rank5]: return CheckpointFunction.apply(function, preserve, *args)
[rank5]: File "/home/pjz/miniconda3/envs/easycontext/lib/python3.10/site-packages/torch/autograd/function.py", line 574, in apply
[rank5]: return super().apply(*args, **kwargs) # type: ignore[misc]
[rank5]: File "/home/pjz/miniconda3/envs/easycontext/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 255, in forward
[rank5]: outputs = run_function(*args)
[rank5]: File "/home/pjz/miniconda3/envs/easycontext/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank5]: return self._call_impl(*args, **kwargs)
[rank5]: File "/home/pjz/miniconda3/envs/easycontext/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1603, in _call_impl
[rank5]: result = forward_call(*args, **kwargs)
[rank5]: File "/cfs/fjr2/EasyContextv2/easy_context/zigzag_ring_attn/monkey_patch.py", line 69, in new_decoder_forward
[rank5]: hidden_states, self_attn_weights, present_key_value = self.self_attn(
[rank5]: File "/home/pjz/miniconda3/envs/easycontext/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank5]: return self._call_impl(*args, **kwargs)
[rank5]: File "/home/pjz/miniconda3/envs/easycontext/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1603, in _call_impl
[rank5]: result = forward_call(*args, **kwargs)
[rank5]: File "/home/pjz/miniconda3/envs/easycontext/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 447, in forward
[rank5]: query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin)
[rank5]: File "/home/pjz/miniconda3/envs/easycontext/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 206, in apply_rotary_pos_emb
[rank5]: q_embed = (q * cos) + (rotate_half(q) * sin)
[rank5]: RuntimeError: The size of tensor a (16384) must match the size of tensor b (32000) at non-singleton dimension 2
[rank7]: Traceback (most recent call last):
```
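For context, the `RuntimeError` is a broadcasting failure inside `apply_rotary_pos_emb`: the query tensor's sequence dimension (16384) does not match the sequence dimension of the `cos`/`sin` tables (32000). A minimal sketch of that failure mode, with made-up small shapes standing in for the real ones (the shard/full split here is an assumption about the sequence-parallel setup, not taken from the repo):

```python
import torch

def rotate_half(x):
    # Same helper shape as in modeling_llama.py: split the head dim in two
    # and rotate the halves.
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

# Hypothetical shapes: the query holds only a per-rank shard of the sequence,
# while the cos/sin tables were built for the full sequence length
# (16 and 32 stand in for the 16384 and 32000 seen in the traceback).
local_seq_len = 16
full_seq_len = 32
head_dim = 8

q = torch.randn(1, 2, local_seq_len, head_dim)    # sharded query states
cos = torch.randn(1, 1, full_seq_len, head_dim)   # full-length table
sin = torch.randn(1, 1, full_seq_len, head_dim)

try:
    q_embed = (q * cos) + (rotate_half(q) * sin)
except RuntimeError as e:
    # Fails at the sequence dimension, like the error in the traceback
    print(e)
```

If this is what is happening, the cos/sin tables (or the position_ids used to build them) would need to be sliced to each rank's local sequence shard before `apply_rotary_pos_emb` is called.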
