
Do LoRA and QLoRA support multi-GPU training? #2

Open
acadaiaca opened this issue Jul 18, 2023 · 9 comments

Comments

@acadaiaca

When fine-tuning BLOOM with LoRA or QLoRA, is multi-GPU training supported?

@zejunwang1
Owner

You can enable multi-GPU training with torchrun.
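For example, something along these lines (the GPU count and the argument list here are placeholders, not the exact options of train_lora.py):

# single node with 2 GPUs; pass the options you normally give train_lora.py
torchrun --nproc_per_node=2 train_lora.py <your usual train_lora.py arguments>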

@acadaiaca
Author

acadaiaca commented Jul 19, 2023

> You can enable multi-GPU training with torchrun.

When LoRA fine-tuning bloom without int8_training enabled, I get the error below; if int8_training is set to True, it runs fine.
Traceback (most recent call last):
  File "/opt/projects/LLMTuner/train_lora.py", line 125, in <module>
    train()
  File "/opt/projects/LLMTuner/train_lora.py", line 120, in train
    trainer.train()
  File "/home/inspur/anaconda3/envs/lib/python3.9/site-packages/transformers/trainer.py", line 1526, in train
    return inner_training_loop(
  File "/home/inspur/anaconda3/envs/lib/python3.9/site-packages/transformers/trainer.py", line 1796, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/inspur/anaconda3/envs/lib/python3.9/site-packages/transformers/trainer.py", line 2652, in training_step
    self.accelerator.backward(loss)
  File "/home/inspur/anaconda3/envs/lib/python3.9/site-packages/accelerate/accelerator.py", line 1902, in backward
    loss.backward(**kwargs)
  File "/home/inspur/anaconda3/envs/lib/python3.9/site-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/home/inspur/anaconda3/envs/lib/python3.9/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

@zejunwang1
Owner

Without int8_training enabled, try adding model.enable_input_require_grads() before training.
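Roughly like this (a minimal sketch; the model name and LoRA settings are illustrative, not the exact values in train_lora.py):

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-7b1")

# Without int8 training, the frozen input embeddings produce outputs that do
# not require grad, so the loss ends up with no grad_fn; this re-enables them.
model.enable_input_require_grads()

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["query_key_value"],  # BLOOM attention projection
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
# then build the Trainer and call trainer.train() as usual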

@zejunwang1
Owner

If you get it running, please leave a comment to let me know.

@acadaiaca
Author

> If you get it running, please leave a comment to let me know.

After adding model.enable_input_require_grads(), it runs without int8 training, thanks!
One more question: both models are 7B and the hyperparameters are identical, but LoRA fine-tuning of bloom is much slower than of llama. Is that normal? bloom's train_samples_per_second is 0.598, while llama's reaches nearly 3.

@acadaiaca
Author

Also, the current train_lora.py feeds the dataset straight through the DataCollator. Isn't that unsuitable for single-turn instruction dialogue? My understanding is that the loss should only be computed on the response, not on the human question or instruction.

@zejunwang1
Owner

Single-turn instruction dialogue needs to be processed into the following format:

{
    "conversation_id": 4, 
    "category": "Brainstorming", 
    "num_turns": 1, 
    "conversation": [
        {
            "human": "如何制定一个有效的网页设计?", 
            "assistant": "\n\n要制定一个有效的网页设计,需要考虑以下几个方面:\n\n1. 目标受众:确定您的目标受众是谁,这将有助于您决定网站的设计和内容。\n\n2. 清晰的布局:网站的布局应该简单,易于使用和导航。所有元素应该有明确的标识和位置,以便用户快速找到他们需要的信息。\n\n3. 吸引人的视觉效果:网站的设计应该吸引人,使用清晰的图像、颜色和字体。但同时也要确保视觉效果不会影响网站的性能和导航。\n\n4. 有效的内容:网站的内容应该有价值,易于理解和使用。使用简单的语言和段落,避免使用过于复杂的术语和语言。\n\n5. 可访问性:网站应该易于访问,即使是对于有视觉或听力障碍的用户。确保网站符合标准的可访问性要求。\n\n6. 响应式设计:网站应该是响应式的,即适合各种不同尺寸的设备。这将确保用户在不同设备上都能够访问和使用您的网站。\n\n通过考虑这些方面,您可以制定一个有效的网页设计,吸引并留住您的受众,并提高您的网站的成功率。"
        }
    ]
}
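For computing the loss only on the assistant response, the usual approach is to mask the prompt tokens' labels with -100 so the cross-entropy loss ignores them. A minimal sketch (illustrative, not necessarily the exact logic in train_lora.py):

def build_features(tokenizer, human_text, assistant_text, max_len=1024):
    # tokenize prompt and response separately so we know where the response starts
    prompt_ids = tokenizer(human_text, add_special_tokens=False)["input_ids"]
    response_ids = tokenizer(assistant_text, add_special_tokens=False)["input_ids"]
    input_ids = prompt_ids + response_ids + [tokenizer.eos_token_id]
    # -100 masks the prompt out of the loss; only the response contributes
    labels = [-100] * len(prompt_ids) + response_ids + [tokenizer.eos_token_id]
    return {"input_ids": input_ids[:max_len], "labels": labels[:max_len]}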

@zejunwang1
Owner

> If you get it running, please leave a comment to let me know.

> After adding model.enable_input_require_grads(), it runs without int8 training, thanks! One more question: both models are 7B and the hyperparameters are identical, but LoRA fine-tuning of bloom is much slower than of llama. Is that normal? bloom's train_samples_per_second is 0.598, while llama's reaches nearly 3.

Is the number of trainable parameters very different between the two runs?
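You can check it quickly with peft's built-in helper (assuming the model has already been wrapped with get_peft_model):

# prints the trainable vs. total parameter counts for the wrapped model
model.print_trainable_parameters()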

@acadaiaca
Author

> If you get it running, please leave a comment to let me know.

> After adding model.enable_input_require_grads(), it runs without int8 training, thanks! One more question: both models are 7B and the hyperparameters are identical, but LoRA fine-tuning of bloom is much slower than of llama. Is that normal? bloom's train_samples_per_second is 0.598, while llama's reaches nearly 3.

> Is the number of trainable parameters very different between the two runs?

Thanks, it's resolved. My earlier measurement was wrong; there is no order-of-magnitude speed difference between bloom and llama fine-tuning.
