[BUG FIX] place multi-processing init to main method #443

Merged
merged 1 commit into from
Feb 10, 2023

lanking520
Contributor

If you spin up a process and run a HuggingFace download at the same time, the following error appears:

[WARN ] PyProcess - Downloading (…)"pytorch_model.bin";: 100%|██████████| 548M/548M [00:01<00:00, 358MB/s]
[WARN ] PyProcess - [1,0]<stderr>:Traceback (most recent call last):
[WARN ] PyProcess - [1,0]<stderr>:  File "/usr/local/lib/python3.9/dist-packages/fastertransformer/examples/gpt/huggingface_gpt_convert.py", line 203, in <module>
[WARN ] PyProcess - [1,0]<stderr>:    split_and_convert(args)
[WARN ] PyProcess - [1,0]<stderr>:  File "/usr/local/lib/python3.9/dist-packages/fastertransformer/examples/gpt/huggingface_gpt_convert.py", line 161, in split_and_convert
[WARN ] PyProcess - [1,0]<stderr>:    torch.multiprocessing.set_start_method("spawn")
[WARN ] PyProcess - [1,0]<stderr>:  File "/usr/lib/python3.9/multiprocessing/context.py", line 243, in set_start_method
[WARN ] PyProcess - [1,0]<stderr>:    raise RuntimeError('context has already been set')
[WARN ] PyProcess - [1,0]<stderr>:RuntimeError: context has already been set
[INFO ] PyProcess - [1,0]<stdout>:Failed invoke service.invoke_handler()
[INFO ] PyProcess - [1,0]<stdout>:  File "/usr/local/lib/python3.9/dist-packages/fastertransformer/utils/common_utils.py", line 20, in execute_command
[INFO ] PyProcess - [1,0]<stdout>:    subprocess.check_call(command, shell=True)
[INFO ] PyProcess - [1,0]<stdout>:  File "/usr/lib/python3.9/subprocess.py", line 373, in check_call
[INFO ] PyProcess - [1,0]<stdout>:    raise CalledProcessError(retcode, cmd)
[INFO ] PyProcess - [1,0]<stdout>:subprocess.CalledProcessError: Command 'python /usr/local/lib/python3.9/dist-packages/fastertransformer/examples/gpt/huggingface_gpt_convert.py -i gpt2 -o /opt/ml/model/test/ft_gpt_model/ -i_g 1 -weight_data_type fp32' returned non-zero exit status 1.

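This fix resolves the conflict between the HuggingFace download and spinning up the multiprocessing pool by moving the multiprocessing initialization into the main method.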
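For illustration, here is a minimal sketch of the pattern described above. It is not the actual diff from this PR. It uses the standard-library `multiprocessing` module, which is where the `set_start_method` call in the traceback ends up. The helper `init_start_method`, the simplified signature of `split_and_convert`, and the `abs` stand-in workload are all assumptions made for the example.

```python
import multiprocessing as mp


def init_start_method(method="spawn"):
    """Set the global start method only if nothing has set it yet (hypothetical helper)."""
    # set_start_method() raises RuntimeError("context has already been set")
    # when the context is already fixed, so check before calling it.
    if mp.get_start_method(allow_none=True) is None:
        mp.set_start_method(method)
    return mp.get_start_method()


def split_and_convert(values, workers=2):
    """Simplified stand-in for the conversion routine; it no longer sets the start method."""
    with mp.Pool(workers) as pool:
        # abs() stands in for the real per-shard conversion work. A built-in
        # is used so that it can be pickled under the "spawn" start method.
        return pool.map(abs, values)


if __name__ == "__main__":
    # Multiprocessing init lives in the main entry point, so it runs once
    # per process instead of on every call to split_and_convert().
    init_start_method("spawn")
    print(split_and_convert([-1, 2, -3]))
```

Guarding the call with `get_start_method(allow_none=True)` is one possible design choice. Calling `mp.set_start_method("spawn")` directly under the `if __name__ == "__main__":` guard is simpler, but it still fails if another component has already set the context.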

@byshiue
Collaborator

byshiue commented Feb 9, 2023

Can you explain how to reproduce this error?

@lanking520
Contributor Author

Run the command shown in the last line of the error message.

@lanking520
Contributor Author

huggingface_gpt_convert.py -i gpt2 -o /opt/ml/model/test/ft_gpt_model/ -i_g 1 -weight_data_type fp32

@byshiue
Collaborator

byshiue commented Feb 10, 2023

Thank you. We have verified this issue and your solution.
