
Can't seem to run GPT-J in CPU mode: "LayerNormKernelImpl" not implemented for 'Half' #16378

Closed
monsieurpooh opened this issue Mar 23, 2022 · 4 comments

@monsieurpooh

Environment info

  • transformers version: 4.15.0
  • Platform: Windows-10-10.0.19041-SP0
  • Python version: 3.8.5
  • PyTorch version (GPU?): 1.10.2+cu113 (True)
  • Tensorflow version (GPU?): 2.5.1 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: yes
  • Using distributed or parallel set-up in script?: no

Who can help

@patrickvonplaten @Narsil

Information

Model I am using: KoboldAI/GPT-J-6B-Adventure

The problem arises when using:

  • A simple script that loads the model via GPTJForCausalLM.from_pretrained, builds input_ids = tokenizer(prompt, return_tensors="pt").input_ids, and calls model.generate(), without using anything CUDA-related.

The task I am working on is:

  • Running GPT-J in CPU mode for calibration purposes in my game AI Roguelite (I am willing to wait a long time, since this is a calibration preprocessing task rather than a real-time task).

To reproduce

Steps to reproduce the behavior:

  1. Run generate.py for GPT-J in CPU-only mode
  2. Observe the error: "LayerNormKernelImpl" not implemented for 'Half'

Expected behavior

Generation runs on CPU without that error.

@patil-suraj
Contributor

Hey @monsieurpooh, this is because the model was saved in fp16, as you can see here: https://huggingface.co/KoboldAI/GPT-J-6B-Adventure/blob/main/config.json#L34

You can pass the torch_dtype argument to from_pretrained to convert the weights to fp32 for CPU.

model = GPTJForCausalLM.from_pretrained("KoboldAI/GPT-J-6B-Adventure", torch_dtype=torch.float32)
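
A minimal end-to-end sketch of that suggestion for CPU-only generation (the prompt and max_length below are illustrative):

import torch
from transformers import GPTJForCausalLM, GPT2Tokenizer

# torch_dtype=torch.float32 upcasts the fp16 checkpoint at load time, so CPU
# kernels such as LayerNorm see a dtype they actually implement.
model = GPTJForCausalLM.from_pretrained("KoboldAI/GPT-J-6B-Adventure", torch_dtype=torch.float32)
tokenizer = GPT2Tokenizer.from_pretrained("KoboldAI/GPT-J-6B-Adventure")

input_ids = tokenizer("test prompt", return_tensors="pt").input_ids
output = model.generate(input_ids, max_length=40)
print(tokenizer.decode(output[0]))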

@monsieurpooh
Author

Thanks for the quick response; however, I tried your suggestion and got the same error.

Here's the minimal repro code:


from transformers import GPTJForCausalLM, GPT2Tokenizer
import torch

# Load a local copy of KoboldAI/GPT-J-6B-Adventure for CPU-only generation,
# requesting an fp32 upcast of the fp16 checkpoint.
model = GPTJForCausalLM.from_pretrained(
    "..\\gpt-neo-master\\saved_models_dir\\KoboldAI_GPT-J-6B-Adventure",
    low_cpu_mem_usage=True,
    torch_dtype=torch.float32,
)
tokenizer = GPT2Tokenizer.from_pretrained(
    "..\\gpt-neo-master\\saved_models_dir\\KoboldAI_GPT-J-6B-Adventure"
)

input_ids = tokenizer("test prompt", return_tensors="pt").input_ids
generated_outputs = model.generate(input_ids)

The output was:


C:\Max\gpt_calibration>python gpt-j-bug.py
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Traceback (most recent call last):
  File "gpt-j-bug.py", line 16, in <module>
    generated_outputs = model.generate(input_ids)
  File "C:\Users\jerkm\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\autograd\grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\jerkm\AppData\Local\Programs\Python\Python38\lib\site-packages\transformers\generation_utils.py", line 1109, in generate
    return self.greedy_search(
  File "C:\Users\jerkm\AppData\Local\Programs\Python\Python38\lib\site-packages\transformers\generation_utils.py", line 1406, in greedy_search
    outputs = self(
  File "C:\Users\jerkm\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\jerkm\AppData\Local\Programs\Python\Python38\lib\site-packages\transformers\models\gptj\modeling_gptj.py", line 786, in forward
    transformer_outputs = self.transformer(
  File "C:\Users\jerkm\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\jerkm\AppData\Local\Programs\Python\Python38\lib\site-packages\transformers\models\gptj\modeling_gptj.py", line 640, in forward
    outputs = block(
  File "C:\Users\jerkm\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\jerkm\AppData\Local\Programs\Python\Python38\lib\site-packages\transformers\models\gptj\modeling_gptj.py", line 279, in forward
    hidden_states = self.ln_1(hidden_states)
  File "C:\Users\jerkm\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\jerkm\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\nn\modules\normalization.py", line 189, in forward
    return F.layer_norm(
  File "C:\Users\jerkm\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\nn\functional.py", line 2347, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'

@monsieurpooh
Author

Never mind. I removed the low_cpu_mem_usage arg, and it seems to be working now. Thanks again.
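
For anyone landing here later: the behavior in this thread suggests that, in this transformers version, torch_dtype is not applied when low_cpu_mem_usage=True is also passed, so the weights stay in fp16. A quick sanity check, assuming the model variable from the repro above with low_cpu_mem_usage removed:

print(model.dtype)  # expect torch.float32 once low_cpu_mem_usage is dropped

# Fallback (plain PyTorch): upcast an already-loaded fp16 model in place.
model = model.float()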

@tahercoolguy

@patil-suraj I have the same problem with the GPT-NeoX model. Is there a quick fix? (See the sketch below.)
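
The same dtype fix should carry over to any fp16 checkpoint, GPT-NeoX included. A minimal sketch using the Auto classes (the model id below is illustrative; substitute your own checkpoint):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-neox-20b"  # illustrative; use your fp16 checkpoint
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# If the weights still report fp16 (e.g. low_cpu_mem_usage got in the way), upcast in place.
if model.dtype == torch.float16:
    model = model.float()

input_ids = tokenizer("test prompt", return_tensors="pt").input_ids
print(tokenizer.decode(model.generate(input_ids, max_length=20)[0]))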
