v0.0.0 (2024-03-04)

Chore

chore: add semantic release to ci (56be43f)

Unknown

Delete dead code (a1da6de)
translate the main persona datasets from @dtch1997's work (#108) (b0b13a2)
Translation experiments (#99)
adding helpers for generating and parsing language from dataset filename
adding compare_dataset_translations experiment
adding experiment helpers
tweaking pirate translation strings
adding translations for non-persona mwes
fixing up make mwe helper
adding a 'ctx' pseudo-style
Revert "adding a 'ctx' pseudo-style"

This reverts commit a0058c4.

refactoring to allow using arbitrary dataset variations, instead of the hacky pseudo-language stuff
fixing using existing results in cross-steering
adding helpers to calculate Jensen-Shannon and KL for Bernoulli distributions
using js dist for steering deltas
adding more tests (46927b0)
adding translated persona MWE variants (#103)
adding translated persona MWE variants by pre-pending the generation ctx to each example
formatting translated_strings (1a42e96)
adding google translate and re-translating persona datasets (#102)
adding google translate and re-translating persona datasets
fixing linting
removing unused test (312b4ab)
standardizing dataset naming around language (#100) (8fda2f9)
Generalization experiments (#96)
Add functions to do translation
Add TQA translate
Fix key name bug
WIP
Add script to generate TQA translated datasets
update expt name and dataset splits
Add Llama chat formatter
Minor fixes in caa_repro
Add options to print output, save steering vectors
Set default experiment path by train / test datasets
Add functionality to print examples
Add script to plot results
Add title to plotting code
Fix pdm lock
Add (very ugly) function to plot multiple results

Very ugly implementation but it works

Ignore png files
Enable translated system prompt
Add new experiments dir
Add notebook to analyze TQA vectors
Add script to download datasets
Add script to download datasets
WIP translate
Add code to extract and save steering vectors
Update experiments
Add more dataset names
Improve dataset inspection
Modify script to extract all SVs
Changes to notebooks
Update readme
WIP
Fix download datasets
Enable 4-bit loading
WIP
Visualize pairwise cos similarities
Inspect dataset's dataframe
Clustering results
Fix lint errors
Add script to extract concept vectors
WIP
Refactoring
Refactoring
Add script to run all experiments
Fix bug with results suffix
Uncomment some lines
Update README, bash script
Restore original experiments dir
Fix lint
Fix lint
Add more aggregations
Fix bug in download
Ignore html files
Add test for data preprocessing
Add tests for preprocessing
fixing black formatting issues
fixing typing

Co-authored-by: dtch1997 <dtch1997@users.noreply.github.com>
Co-authored-by: David Chanin <chanindav@gmail.com> (6680de7)

Translate mwe and sycophancy (#97)
importing raw persona MWE datasets from anthropic
adding translation for mwe persona datasets and translating the first 5
translating sycophancy datasets
make_sycophancy_caa parses translations, and adding translations for misc strings
adding a convenience wrapper to load_translation
adding a script to make MWE personas datasets
fix lint formatting
alternating every 2 samples for MWE, not every 1 (ce83a8d)
translating TQA into styles and languages with gpt4 (#94)
translating TQA into styles and languages with gpt4
don't force ASCII, it's not 1998
fixing test mocking
refactoring translations to make supporting more datasets easier (d6d241a)
Caa tqa (#91)
refactoring formatting and benchmarking to support CAA
adding a basic test for get_normalized_correct_probs()
fixing tests
increasing sft loss threshold to make test less flaky
adding a TQA CAA dataset / experiment

Co-authored-by: David Chanin <chanindav@gmail.com> (97b1236)

Refactoring formatting and benchmarking to support CAA (#87)
refactoring formatting and benchmarking to support CAA
adding a basic test for get_normalized_correct_probs()
fixing tests
increasing sft loss threshold to make test less flaky (c98f067)
Merge pull request #88 from dtch1997/openai-translators

Add openai as dependency, and translators notebook (c1fb281)

Add openai as dependency, and translators notebook

This notebook has a few simple functions for translating inputs and
dataframes using gpt-4. You will need an openai API key to run the code
(obviously). (dadf9f2)

fixing tests after CAA merge (#85) (6efbfaf)
Caa experiments 2 (#80)
Add script to generate CAA datasets
Add correct CAA datasets
Add gitignore for experiments
Modify default template
Add get_normalized_correct_probs function
Add script to generate vectors
Add scripts to prompt w/ SV, plot results
Add notebook to compare our vs their CAA vectors
Add instructions to reproduce results
Add plots
Add evaluator for normalized correct probs
Skip failing tests

Co-authored-by: dtch1997 <dtch1997@users.noreply.github.com> (cca4b29)

Fix failing test (b7975b1)
refactoring prompting/formatting (#77)
refactoring prompting/formatting
fixing conflict in tests (dca53ac)
Merge pull request #79 from dtch1997/swap-in-steering-vecs

swapping in steering-vectors lib (f86d0d2)

Merge pull request #78 from dtch1997/verify-caa-steering

adding a test to assert our steering is identical to CAA steering (a5ea301)

swapping in steering-vectors lib (8e86019)
adding a test to assert our steering is identical to CAA steering (f26b2df)
Verify our code matches CAA (#76)
adding a llama chat formatter and prompter based on CAA
testing that our reading vectors match CAA reading vectors
fixing linting
fixing test (aa1dd24)
cleaning up oddities with steering vecs and repe algo (#72) (773db50)
CAA tweaks / improvements (#70)
Add bitsandbytes, accelerate
Hardcode second-last token activation position for steering vectors
Add notebook diffmerge package for pretty git diffs
Add note on how to change RepE directions
Add note on how hooks work
Add options to decouple reading and control
fixing tests

Co-authored-by: dtch1997 <dtch1997@users.noreply.github.com> (1487639)

CAA base (#69)
adding a record_activations() function to make it easy to collect model activations
replacing repe with our own CAA-esque implementation
only patch generated tokens
fix generating start index selection
fixing pyright error (55980ea)
Add CAA datasets (#68)
Add CAA datasets
Update makefile
Add test for make_ab_prompt

Co-authored-by: dtch1997 <dtch1997@users.noreply.github.com> (c1ae7f1)

Sft hf trainer (#50)
Working HF trainer script
customize wandb logging
Remove unused keys from SFTDataset
Add unit test for SFT
Fix import
Fix lint
Fix lint (again)
Fix test
Fix benchmark, pipeline logic

Update the train_and_evaluate fn to be consistent
with algorithm.run now returning a dict

Fix pyright errors related to GenerationConfig

Modify ICL to match new semantics
Fix icl test
Fix tests on GPU
Fix device handling in tests
Fix pyright bugbears
Fix mutable default error in nested dataclass
Fix mutable dataclass fields in python 3.11
Fix nitpicks
Fix default dataset
Fix pyright
Fix icl test to use new Algorithm.run signature
Improve test cases

Co-authored-by: dtch1997 <dtch1997@users.noreply.github.com> (ec9e914)

Logprobs eval (#62)
porting dataset handling code from tqa branch
adding logprob calculation and adding an evaluator for multiple choice questions (8d07688)
allow setting a direction coefficient for repe (#61) (2a2305e)
multiply direction by sign for reading vectors (#60) (308686b)
interleaving positive and negative prompts to match what the original repe code expects (#59) (8eb3b75)
limiting ICL max examples to avoid prepending entire dataset (#57) (856145e)
Fix device handling in tests (#52)
Fix device handling in tests
Fix pyright bugbears
Set device automatically for RepE pipeline
Specify device in test

Co-authored-by: dtch1997 <dtch1997@users.noreply.github.com> (f46940a)

fixing bug where repe algo is patching in ndarray instead of tensors (#55) (7bdc280)
Implementing Repe reading control algorithm (#48)

setting up repe reading algorithm (f85b3d7)

Update lockfile (#49)
Bump python to 3.10
Update lockfile

Co-authored-by: dtch1997 <dtch1997@users.noreply.github.com> (4ed1698)

updating make_truthfulqa to use mc1 targets (#47) (d04ea9f)
Polishing ModelPatcher layer guessing and fully replacing WrappedReadingVecModel (#40)
polishing layer guessing and fully replacing WrappedReadingVecModel with ModelPatcher
adding a test for pipeline skipping patching (fb6ccf5)
Configurable model patching (#33)
adding configurable model patching
updating original RepE rep_control_reading_vec.py to add operators
renaming ModelPatcher.py to model_patcher.py
adding different patch operations to match paper
fixing comment doc for model patcher (2533e84)
Implement supervised fine-tuning (#31)
WIP
SFT working
remove duplicate AverageMeter implementation
Add ability to run on custom splits of dataset
Remove broken code
Fix Pyright issues
Fix Pyright issues, again
Update example, completion to dataclasses
fix data generation
Fix SFT inheritance
Remove unused string methods
Seeded random dataset shuffle
Rename BaseAlgorithm to Algorithm
fix pyright
Add test for SFT
minor
Fix nit
Fix tests
Update default huggingface cache dir
Modify tokenizer config in conftest
Abstract away logger

Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com> (1d68251)

reload model for every test (#38) (3e9438e)
updating README based on new github flow (#30)
updating README based on new github flow
Update makefile, readme
Update README.md

Co-authored-by: David Chanin <chanindav@gmail.com>
Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com> (62658d2)

adding Benchmark and Eval classes (#25)
adding Benchmark and Eval classes
simplifying benchmarking
updating snapshot (e8dd92b)
Relax Python version requirements (#26)
Relax python requirements
Update dependencies
add more python version to CI
Remove Py3.9 support
Remove PDM caching

May be causing issues with CI workflow re-using same cache for different python versions
Likely doesn't result in much speedup

Change PDM to be installed by pip
Add libopenblas
Sudo add libopenblas
Update lock file
Only Python3.10
updating CI for 3.11, and adding pyright

Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com>
Co-authored-by: David Chanin <chanindav@gmail.com> (bd01efb)

Switch to PDM managed; add pre-commit; run linters (#23)
Switch to PDM managed; add pre-commit; run linters
Update CI workflow
Use PDM in ci workflow
Fix tests
Add device-aware testing
adding snapshot test for test_pipeline
remove trailing whitespace hook, black does this already
try removing tensorboard to see if that makes things work?
try explicitly adding ml-dtypes to dev deps, to see if that helps?
...trying to add wrapt explicitly now...
undoing dep changes
try adding explicit tensorflow dev dep
try removing bleurt dep and tensorflow
fixing CI caching
use string for python version in CI

Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com>
Co-authored-by: David Chanin <chanindav@gmail.com> (62ad2ed)

silencing repe linting errors (#21)
silencing repe linting errors
adding HF token to CI (a49c9b6)
setting up basic pipeline arc, and adding icl algo (#15) (97bcbaf)
repe is here (4e58c97)
Merge branch 'dev' of github.com:dtch1997/repepo into dev (ec7743e)
test (783577c)
simplifying core classes (21522aa)
Adding pyright and fixing type errors (#14)
adding pyright and fixing type errors
fixing linting
adding scikit-learn to deps
fixing types pylance (but not pyright) complains about
ignore reportPrivateImportUsage
replacing namedtuple with NamedTuple for better type inference (4915268)
working on it (ee1c2e7)
minor (22bcc4f)
Merge pull request #13 from dtch1997/ci-linting-tests

CI, linting, and tests (7e08718)

adding CI workflow for linting and tests (8ce39e5)
installing ruff and black and fixing formatting (59b4150)
Register datasets; other minor changes (413df15)
Add accelerate (4265238)
Improve formatting, printing (20d84f9)
ICL eval pipeline (d01bbb5)
Make ICL task vectors in standard format (e015a00)
Add scripts to make datasets from "In-context Learning Creates Task Vectors" (e0adc69)
Tests working (4038c16)
Add algorithms and metrics (10bb2f0)
Add basic abstractions (b919229)
Merge branch 'dev' into main (82feb1b)
Add major changes (#11)
SFT baseline (#5)
Update requirements
Add SFT algorithm
Add datasets, log dirs to gitignore
Demonstrate how to configure dataset
Update README

Co-authored-by: aengusl <aenguslynch@gmail.com>
Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com>

Daniel dev1 (#6)
WIP simple train script
Add full training pipeline
Modify SFT dataset to return reference completions
Add train_simple baseline
Add BLEURT, ROUGE scores
Add WandB logging
Update README; make lr configurable
Update requirements
Enable SFT to be used with HF dataset
Fix bug in lr scheduling

Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com>

Refactor out prompt formatting
Fix bug
Aengus dev2 (#10)
WIP simple train script
Add full training pipeline
Modify SFT dataset to return reference completions
Add train_simple baseline
Add BLEURT, ROUGE scores
Add WandB logging
Update README; make lr configurable
Update requirements
Enable SFT to be used with HF dataset
Fix bug in lr scheduling
adding repe
organising
damn things not being easy
works
working but weirdly

Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com>
Co-authored-by: Daniel Tan <25474937+dtch1997@users.noreply.github.com>

Integrate repe pipeline (#12)
Add AmbigPrompt datasets
Inject project variables into script
put prompts in another file

Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com>
Co-authored-by: aengusl <aenguslynch@gmail.com>

Co-authored-by: aengusl <aenguslynch@gmail.com>
Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com>
Co-authored-by: Aengus Lynch <37474130+aengusl@users.noreply.github.com> (cb451dd)

Integrate repe pipeline (#12)
Add AmbigPrompt datasets
Inject project variables into script
put prompts in another file

Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com>
Co-authored-by: aengusl <aenguslynch@gmail.com> (f8fef0d)

Aengus dev2 (#10)
WIP simple train script
Add full training pipeline
Modify SFT dataset to return reference completions
Add train_simple baseline
Add BLEURT, ROUGE scores
Add WandB logging
Update README; make lr configurable
Update requirements
Enable SFT to be used with HF dataset
Fix bug in lr scheduling
adding repe
organising
damn things not being easy
works
working but weirdly

Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com>
Co-authored-by: Daniel Tan <25474937+dtch1997@users.noreply.github.com> (dac8ece)

Fix bug (13e868f)
Refactor out prompt formatting (d66f9d8)
Daniel dev1 (#6)
WIP simple train script
Add full training pipeline
Modify SFT dataset to return reference completions
Add train_simple baseline
Add BLEURT, ROUGE scores
Add WandB logging
Update README; make lr configurable
Update requirements
Enable SFT to be used with HF dataset
Fix bug in lr scheduling

Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com> (0fdfec6)

SFT baseline (#5)
Update requirements
Add SFT algorithm
Add datasets, log dirs to gitignore
Demonstrate how to configure dataset
Update README

Co-authored-by: aengusl <aenguslynch@gmail.com>
Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com> (3abba55)

Add examples (#2)
Add HF example for fine-tuning on QA
Add examples from RepEng repo
Add AlpacaFarm, datasets reqs

Co-authored-by: Daniel Tan <dtch1997@users.noreply.github.com> (573df55)

Update README.md (a093981)
Initial commit (9d16317)
