
GPT-2 example is broken? #11034

Closed
1 task
ba305 opened this issue Apr 2, 2021 · 3 comments · Fixed by #11060

Comments

@ba305

ba305 commented Apr 2, 2021

Environment info

  • transformers version: the issue occurs with both 4.3.0 and 4.4.2 (and probably other versions as well)
  • Python version: 3.7.6
  • PyTorch version (GPU?): 1.7.0
  • Using GPU in script?: No, tested on CPU only, but the issue would likely also occur on GPU
  • Using distributed or parallel set-up in script?: No

Who can help

Information

Model I am using (Bert, XLNet ...): gpt2

The problem arises when using:

  • [x] the official example scripts: (give details below)
  • [ ] my own modified scripts: (give details below)

To reproduce

Hello, I am trying to run the example here: https://huggingface.co/transformers/task_summary.html#causal-language-modeling. When I run that code exactly as it appears on the page, I get very poor results. Even when I change the input text, the output is still strange (e.g., it predicts empty spaces or odd characters). A coworker tried it on her machine as well and saw the same behavior.

I am planning to fine-tune GPT-2 for a different purpose later, but was concerned that I couldn't get even this simple example demo to work. Thanks for your help!

Steps to reproduce the behavior:

  1. Run the exact example code linked above (a rough reconstruction of that snippet is included below for reference)
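
For context, the linked snippet generated only a single token and ended the prompt with a trailing space. A rough reconstruction (inferred from the modified version in the reply below, not copied verbatim from the docs) looks like this:

from transformers import AutoModelWithLMHead, AutoTokenizer, top_k_top_p_filtering
import torch
from torch.nn import functional as F

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelWithLMHead.from_pretrained("gpt2")

# note the trailing space at the end of the prompt
sequence = "Hugging Face is based in DUMBO, New York City, and "
input_ids = tokenizer.encode(sequence, return_tensors="pt")

# logits for the last position only
next_token_logits = model(input_ids).logits[:, -1, :]
filtered_next_token_logits = top_k_top_p_filtering(next_token_logits, top_k=50, top_p=1.0)

# sample a single next token and decode the result
probs = F.softmax(filtered_next_token_logits, dim=-1)
next_token = torch.multinomial(probs, num_samples=1)
print(tokenizer.decode(torch.cat([input_ids, next_token], dim=-1).tolist()[0]))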
@LysandreJik
Member

Hi! Sorry to hear the example doesn't work well for you. To be honest, it doesn't make much sense to generate only a single token, as that example does. I have modified the example slightly so that it generates the following 20 tokens.

Also, I've removed the space at the end of the sequence because I believe it is there by mistake:

from transformers import AutoModelWithLMHead, AutoTokenizer, top_k_top_p_filtering
import torch
from torch.nn import functional as F
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelWithLMHead.from_pretrained("gpt2")
sequence = f"Hugging Face is based in DUMBO, New York City, and"
input_ids = tokenizer.encode(sequence, return_tensors="pt")
generated = input_ids
for i in range(20):
    # get logits for the last position of the current sequence
    next_token_logits = model(generated).logits[:, -1, :]
    # filter
    filtered_next_token_logits = top_k_top_p_filtering(next_token_logits, top_k=50, top_p=1.0)
    # sample
    probs = F.softmax(filtered_next_token_logits, dim=-1)
    next_token = torch.multinomial(probs, num_samples=1)
    generated = torch.cat([generated, next_token], dim=-1)

resulting_string = tokenizer.decode(generated.tolist()[0])

print(resulting_string)

Running this gives me the following examples (not cherry-picked):

Hugging Face is based in DUMBO, New York City, and is produced by Eltas & Co., Inc. (a wholly owned subsidiary of Eltas
Hugging Face is based in DUMBO, New York City, and focuses primarily on the music and entertainment industry, and is funded by the Hudson River Chamber of Commerce.
Hugging Face is based in DUMBO, New York City, and has aired in dozens of local, national and foreign programs, including The Brady Bunch, The Colbert
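
For reference, a roughly equivalent result can be obtained with the built-in model.generate helper instead of the manual loop; the sketch below uses generation arguments I believe are available in recent releases, so double-check against your installed version:

from transformers import AutoModelWithLMHead, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelWithLMHead.from_pretrained("gpt2")

sequence = "Hugging Face is based in DUMBO, New York City, and"
input_ids = tokenizer.encode(sequence, return_tensors="pt")

# sample 20 new tokens with top-k filtering, mirroring the manual loop above
output = model.generate(
    input_ids,
    do_sample=True,
    top_k=50,
    top_p=1.0,
    max_length=input_ids.shape[-1] + 20,
)
print(tokenizer.decode(output[0]))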

@ba305
Author

ba305 commented Apr 2, 2021

Thanks a lot for your help Lysandre!

Removing the space at the end of the example sequence solves the issue. Now I am getting normal results. It would be great if you could update the website since I imagine other people will run into the same issue at some point!

Also, thanks for adding the code to generate 20 tokens. That is helpful as well, although I believe the main problem was the space at the end of the input sequence.
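
For anyone who hits the same thing: the trailing space matters because GPT-2's byte-level BPE attaches a leading space to the following word, so a prompt that ends in a bare space leaves a standalone space token at the end of the context, which the model rarely sees during training. A quick way to inspect the difference (assuming the standard gpt2 tokenizer; exact tokens may vary across versions):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

# without the trailing space the prompt ends on the token " and";
# with it, a lone space token is appended at the end of the context
print(tokenizer.tokenize("New York City, and"))
print(tokenizer.tokenize("New York City, and "))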

Thanks again for your prompt reply. Feel free to close the issue whenever you want

@LysandreJik
Member

Great, nice to hear this fixes the issue! I've updated the docs on the master branch.
