Fix Japanese Hugging Face GPT conversion #460

noppayut · 2023-02-20T05:55:00Z

Fixes #459

Define missing variables, correct variable names.
Use GPT2LMHeadModel and state_dict() instead of GPT2Model and named_parameters() to collect model weights.
Dynamically select the torch device instead of using the hardcoded GPU.
Add tensor_para_size to config.
Add docstring to recommend usage of huggingface_gpt_model.py instead of this file.

Signed-off-by: noppayut <noppayut@hotmail.com>
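
For readers unfamiliar with the two PyTorch calls named above, here is a minimal sketch of dynamic device selection and of one general difference between `named_parameters()` and `state_dict()`. It is not the code from this PR: `pick_device` and `TinyTiedLM` are hypothetical names, and the tiny module only stands in for a language model whose output projection is tied to its input embedding. Whether this difference motivated the change is not stated in the PR.

```python
import torch
from torch import nn


def pick_device() -> torch.device:
    """Use the GPU when one is visible, otherwise fall back to the CPU."""
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")


class TinyTiedLM(nn.Module):
    """Hypothetical stand-in for a language model with tied embeddings."""

    def __init__(self, vocab_size: int = 8, hidden: int = 4):
        super().__init__()
        self.wte = nn.Embedding(vocab_size, hidden)
        self.lm_head = nn.Linear(hidden, vocab_size, bias=False)
        # Tie the output projection to the input embedding (same tensor).
        self.lm_head.weight = self.wte.weight


device = pick_device()
model = TinyTiedLM().to(device).eval()

# named_parameters() skips duplicate (shared) parameters by default,
# so the tied lm_head weight is not listed a second time.
param_names = [name for name, _ in model.named_parameters()]

# state_dict() keeps one entry per registered name, tied or not.
state_names = list(model.state_dict().keys())

print(device.type, param_names, state_names)
```

A converter that iterates over `state_dict()` therefore sees every weight name the checkpoint exposes, including tied ones, and the same script runs on machines without a GPU.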

noppayut added 4 commits February 20, 2023 12:30

Fix: define missing variables

631302f

Signed-off-by: noppayut <noppayut@hotmail.com>

Use GPT2LMHeadModel and state_dict() for retrieving named params

dc28542

Signed-off-by: noppayut <noppayut@hotmail.com>

Add tensor param size to gpt config

6dbc7d6

Dynamically select device instead of using GPU

9d1c62a

Signed-off-by: noppayut <noppayut@hotmail.com>

byshiue merged commit 43ea4f3 into NVIDIA:main Feb 21, 2023

noppayut deleted the fix_hf_gpt_conversion branch February 21, 2023 01:17
