
Generalization experiments #96

Merged
merged 58 commits into from
Feb 6, 2024

@dtch1997 (Owner) commented Feb 6, 2024

Adds code to run large-scale steering vector experiments on the Anthropic persona dataset.

@dtch1997 dtch1997 requested a review from chanind February 6, 2024 09:46
answer_b = data["answer_not_matching_behavior"]

# Construct A/B formatted question / answer pair
new_question = f"{question}\n\nChoices:\n(A): {answer_a}\n(B): {answer_b}"
Collaborator

I think this line needs to move below the _maybe_swap line. Right now new_question is built before the swap, so it doesn't get the swapped A/B answers.

@dtch1997 (Owner, Author) commented Feb 6, 2024

@chanind thanks for spotting! Preprocessing is hard 😓 I've added tests describing how I want each preprocessing function to behave. Could you have another look?

@chanind (Collaborator) commented Feb 6, 2024

Nice! I also fixed it in the translation PR, which is now in main, and I merged main into this PR earlier.

@chanind chanind merged commit 6680de7 into main Feb 6, 2024
2 checks passed
@chanind chanind deleted the generalization_experiments branch February 6, 2024 19:35
@dtch1997 dtch1997 mentioned this pull request Feb 8, 2024