
Fix GPT2 sentiment notebook reward #1738

Merged · 7 commits · Aug 6, 2024
Conversation

@cemiu (Contributor) commented Jun 14, 2024

I tried reproducing the notebook, but the model's performance barely improved, and after a bit of digging I found the issue.

The sentiment pipeline used to produce output in the fixed order [NEGATIVE, POSITIVE], whereas now the higher-confidence class always comes first:

# before
[[{'label': 'NEGATIVE', 'score': -2.2947897911071777},
  {'label': 'POSITIVE', 'score': 2.557039737701416}]]

# now
[[{'label': 'POSITIVE', 'score': 2.557039737701416},
  {'label': 'NEGATIVE', 'score': -2.2947897911071777}]]

The notebook selected the positive sentiment by index, which now picks an effectively random class. I've adjusted the training loop and eval functions so they select the positive sentiment again.
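
A minimal sketch of the label-based selection (the model and kwargs are the ones from the notebook; positive_score is a hypothetical helper name, not the notebook's actual code):

from transformers import pipeline

sentiment_pipe = pipeline("sentiment-analysis", model="lvwerra/distilbert-imdb")
sent_kwargs = {"top_k": None, "function_to_apply": "none", "batch_size": 16}

def positive_score(output):
    # Each output is a list of {label, score} dicts sorted by confidence,
    # so match on the label instead of trusting the position.
    return next(item["score"] for item in output if item["label"] == "POSITIVE")

rewards = [positive_score(out) for out in sentiment_pipe(["This movie was great!"], **sent_kwargs)]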

@vwxyzjn (Contributor) commented Jun 18, 2024

At some point I dug into this as well and realized that lower scores do not necessarily mean negative, and higher scores do not necessarily mean positive. Could you prepare a snippet and run it on a few more examples?

@cemiu (Contributor, Author) commented Jun 19, 2024

Here's a snippet for running the output data in the notebook through the pipeline:

import io

import pandas as pd
from transformers import pipeline

# Sample of the notebook's saved outputs and their pre-computed rewards.
df = pd.read_csv(io.StringIO('''
query,response (after),rewards_src
"Oh dear,",I must say that I are hanging my head on this,-1.007609
I've seen,"three million dialogue throughout, and",2.240883
Hi:<br /><br,/>I also like that movie. It's so funny,2.415630
I'm a writer,", not a screenwriter. I've written",-0.724324
If you,"are looking at the cinematography, the acting,",0.148751
OMG this,"movie was totally wonderful, I it was the ide...",2.590190
'''), sep=",")
texts = [q + r for q, r in zip(df["query"], df["response (after)"])]

sent_kwargs = {"top_k": None, "function_to_apply": "none", "batch_size": 16}
sentiment_pipe = pipeline("sentiment-analysis", model="lvwerra/distilbert-imdb")

# Old method: take index 1, which used to be POSITIVE but is now simply
# the lower-confidence class.
old_method = [output[1]["score"] for output in sentiment_pipe(texts, **sent_kwargs)]

# New method: pick the score whose label is POSITIVE, regardless of order.
pipe_outputs = sentiment_pipe(texts, **sent_kwargs)
new_method = [item["score"] for output in pipe_outputs for item in output if item["label"] == "POSITIVE"]

pd.DataFrame({"source_reward": df["rewards_src"], "old_reward": old_method, "new_reward": new_method})
source_reward  old_reward  new_reward
    -1.007609   -1.007610   -1.007610
     2.240883   -2.244383    2.473364
     2.415630   -2.177052    2.415630
    -0.724324   -0.724324   -0.724324
     0.148751   -0.362388    0.245908
     2.590190   -2.230330    2.476046

source_reward (the one pre-computed in the notebook) matches the method I committed, but not the one currently in the notebook. From my experimentation, the sentiment pipeline seems to give the expected results.

You can see that the existing method sometimes selects the wrong class: remove the ["score"] indexing and inspect which label ends up at index 1.
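
For example (a quick check reusing sentiment_pipe and texts from the snippet above, not part of the PR itself):

# Print the label at index 1 for each text: it is POSITIVE for some
# inputs and NEGATIVE for others, since results are sorted by confidence.
for output in sentiment_pipe(texts, **sent_kwargs):
    print(output[1]["label"], output[1]["score"])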

@cemiu (Contributor, Author) commented Jun 19, 2024

I still have the results from full training runs I did on both versions of the notebook:

Using the existing notebook (both the training and eval metrics were wrong):

[screenshot: "before" reward statistics]

And the new one:

mean:
rewards (before)    0.239116
rewards (after)     2.475334
dtype: float64
median:
rewards (before)    0.422371
rewards (after)     2.701585
dtype: float64
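
(These summaries look like pandas output; a sketch of how they would be computed, where df_results and the exact column names are assumptions based on the printed output:)

# Hypothetical reproduction of the summary above: df_results is assumed
# to hold the per-sample rewards from the two runs.
print("mean:")
print(df_results[["rewards (before)", "rewards (after)"]].mean())
print("median:")
print(df_results[["rewards (before)", "rewards (after)"]].median())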

I don't have the checkpoints anymore, but the jump in positive sentiment is in line with what would be expected (and roughly matches the results saved in the notebook).

@cemiu (Contributor, Author) commented Jun 24, 2024

@vwxyzjn I refreshed the output table using the new results I got; I think they show the delta between the models pretty well. Also, here are the logs for the training run: https://wandb.ai/cemiu-team/trl-gpt2-sentiment/runs/01ch6t6i

FYI, the wandb.ai link in the notebook is not publicly accessible. Someone with admin access should change the permissions.

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

@cemiu (Contributor, Author) commented Jul 19, 2024

@vwxyzjn have you had a chance to look at this since?

@qgallouedec (Member) commented

Thanks for the PR! The wandb link has been updated.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@qgallouedec merged commit fb0b9ed into huggingface:main on Aug 6, 2024. 1 check passed.