Generalization experiments #96

dtch1997 · 2024-02-06T09:45:09Z

Adds code to run large-scale steering vector experiments on the Anthropic persona dataset.

Very ugly implementation, but it works.

chanind · 2024-02-06T14:25:34Z

repepo/experiments_2/download_datasets.py

+        answer_b = data["answer_not_matching_behavior"]
+
+        # Construct A/B formatted question / answer pair
+        new_question = f"{question}\n\nChoices:\n(A): {answer_a}\n(B): {answer_b}"


I think this line needs to be moved below the _maybe_swap line, since currently the new_question doesn't get the swapped a/b answers.
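For illustration, a minimal sketch of the ordering being suggested: swap first, then build the A/B question from the already-swapped answers. The helper `maybe_swap`, its signature, the `build_example` wrapper, and the `question` / `answer_matching_behavior` keys are assumptions made for this sketch; only `answer_not_matching_behavior` and the f-string come from the diff above, and the real `_maybe_swap` may look different.

```python
import random


def maybe_swap(answer_a, answer_b, matching_label, rng):
    """Hypothetical stand-in for _maybe_swap: swap the answers half the time
    and flip the label of the behavior-matching answer accordingly."""
    if rng.random() < 0.5:
        flipped = "(B)" if matching_label == "(A)" else "(A)"
        return answer_b, answer_a, flipped
    return answer_a, answer_b, matching_label


def build_example(data, rng):
    """Assumed wrapper: turn one raw record into an A/B formatted example."""
    question = data["question"]
    answer_a = data["answer_matching_behavior"]
    answer_b = data["answer_not_matching_behavior"]

    # Swap BEFORE formatting, so the prompt text and the label agree.
    answer_a, answer_b, matching_label = maybe_swap(answer_a, answer_b, "(A)", rng)

    # Construct A/B formatted question / answer pair
    new_question = f"{question}\n\nChoices:\n(A): {answer_a}\n(B): {answer_b}"
    return {"question": new_question, "matching_label": matching_label}
```

If the f-string is built before the swap, the prompt always lists the matching answer under (A) while the label may say (B), which is the mismatch described in the comment.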

dtch1997 · 2024-02-06T17:49:01Z

@chanind thanks for spotting! Preprocessing is hard 😓 I've added tests describing how I want each preprocessing function to behave. Could you have another look?
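A rough sketch of what such a behavior-describing test might look like, written pytest-style. The function `format_ab_question` and its explicit `swap` flag are hypothetical and exist only to make the test deterministic; they are not taken from the repository.

```python
def format_ab_question(question, matching, not_matching, swap):
    """Hypothetical preprocessing function.

    Returns the A/B formatted prompt and the label of the matching answer.
    """
    if swap:
        answer_a, answer_b, label = not_matching, matching, "(B)"
    else:
        answer_a, answer_b, label = matching, not_matching, "(A)"
    prompt = f"{question}\n\nChoices:\n(A): {answer_a}\n(B): {answer_b}"
    return prompt, label


def test_format_ab_question_without_swap():
    prompt, label = format_ab_question("Is the sky blue?", "Yes", "No", swap=False)
    assert prompt == "Is the sky blue?\n\nChoices:\n(A): Yes\n(B): No"
    assert label == "(A)"


def test_format_ab_question_with_swap():
    # The swapped answers must show up in the prompt, not just in the label.
    prompt, label = format_ab_question("Is the sky blue?", "Yes", "No", swap=True)
    assert prompt == "Is the sky blue?\n\nChoices:\n(A): No\n(B): Yes"
    assert label == "(B)"
```

Pinning the exact prompt string for both orderings is what would catch the earlier bug, where only the label reflected the swap.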

chanind · 2024-02-06T18:50:57Z

Nice! I also fixed it in the translation PR, which is now in main, and I merged main into this PR earlier.

dtch1997 added 30 commits January 31, 2024 10:38

Add functions to do translation

d69077f

Add TQA translate

37c5e39

Fix key name bug

6f694d5

WIP

774d712

Merge branch 'main' into tqa_translate

79d660c

Add script to generate TQA translated datasets

8d1475e

update expt name and dataset splits

9e19eb6

Add Llama chat formatter

36f2914

Minor fixes in caa_repro

9548a51

Add options to print output, save steering vectors

4891c1e

Set default experiment path by train / test datasets

01594ca

Add functionality to print examples

56b5201

Add script to plot results

2181328

Add title to plotting code

dce925a

Fix pdm lock

0a70e84

Add (very ugly) function to plot multiple results

11306b7

Very ugly implementation but it works

Ignore png files

88104e0

Enable translated system prompt

123407d

Add new experiments dir

f59b090

Add notebook to analyze TQA vectors

f198ab0

Add script to download datasets

e67c49f

Add script to download datasets

d54bdfd

WIP translate

8472eca

Add code to extract and save steering vectors

860c01b

Update experiments

f1b0f9e

Add more dataset names

ab47ac6

Improve dataset inspection

55e2ff7

Modify script to extract all SVs

434924f

Changes to notebooks

7e61c82

Update readme

f1deec4

dtch1997 added 12 commits February 6, 2024 00:00

Inspect dataset as dataframe

64a6b02

Clustering results

0caa0eb

Fix lint errors

d082071

Add script to extract concept vectors

ef1327e

WIP

a7f4b12

Refactoring

2cc603b

Refactoring

fd26ff7

Add script to run all experiments

72ac338

Fix bug with results suffix

69657f1

Uncomment some lines

dc4f791

Update README, bash script

9f0c645

Restore original experiments dir

24050dc

dtch1997 requested a review from chanind February 6, 2024 09:46

dtch1997 added 3 commits February 6, 2024 09:49

Merge branch 'main' into generalization_experiments

9555251

Fix lint

a9204b7

Fix lint

75844a9

chanind reviewed Feb 6, 2024

chanind and others added 6 commits February 6, 2024 15:36

Merge branch 'main' into generalization_experiments

373f501

Add more aggregations

dfc1fed

Fix bug in download

6c2b08b

Ignore html files

9e1320f

Add test for data preprocessing

5310167

Add tests for preprocessing

a27667a

chanind added 2 commits February 6, 2024 19:05

fixing black formatting issues

5c25afe

fixing typing

4fde15c

chanind merged commit 6680de7 into main Feb 6, 2024
2 checks passed

chanind deleted the generalization_experiments branch February 6, 2024 19:35

dtch1997 mentioned this pull request Feb 8, 2024

Tqa translate #92

Closed
