v0.0.0

@github-actions github-actions released this 04 Mar 17:32
· 107 commits to main since this release

v0.0.0 (2024-03-04)

Chore

  • chore: add semantic release to ci (56be43f)

Unknown

  • Delete dead code (a1da6de)

  • translate the main persona datasets from @dtch1997's work (#108) (b0b13a2)

  • Translation experiments (#99)

  • adding helpers for generating and parsing language from dataset filename

  • adding compare_dataset_translations experiment

  • adding experiment helpers

  • tweaking pirate translation strings

  • adding translations for non-persona mwes

  • fixing up make mwe helper

  • adding a 'ctx' pseudo-style

  • Revert "adding a 'ctx' pseudo-style"

This reverts commit a0058c4.

  • refactoring to allow using arbitrary dataset variations, instead of the hacky pseudo-language stuff

  • fixing using existing results in cross-steering

  • adding helpers to calculate Jensen-Shannon and KL for Bernoulli distributions

  • using js dist for steering deltas

  • adding more tests (46927b0)
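
The KL and Jensen-Shannon helpers mentioned in this commit are not shown in the release notes. As a rough illustration only, the closed forms for two Bernoulli distributions can be sketched as below. The function names, the clamping constant, and the use of natural logarithms are assumptions made for this sketch, not the repository's actual code.

```python
import math


def bernoulli_kl(p: float, q: float, eps: float = 1e-12) -> float:
    """KL(Bernoulli(p) || Bernoulli(q)) in nats (hypothetical helper)."""
    # Clamp away from 0 and 1 so the logarithms stay finite.
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def bernoulli_js(p: float, q: float) -> float:
    """Jensen-Shannon divergence between Bernoulli(p) and Bernoulli(q) in nats."""
    m = 0.5 * (p + q)  # the mixture of two Bernoullis is itself a Bernoulli
    return 0.5 * bernoulli_kl(p, m) + 0.5 * bernoulli_kl(q, m)


def bernoulli_js_distance(p: float, q: float) -> float:
    """Square root of the JS divergence, which is a proper metric."""
    return math.sqrt(max(bernoulli_js(p, q), 0.0))
```

Unlike KL, the JS divergence is symmetric and bounded above by ln 2, which makes it a convenient measure of how far a probability moved in either direction.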

  • adding translated persona MWE variants (#103)

  • adding translated persona MWE variants by pre-pending the generation ctx to each example

  • formatting translated_strings (1a42e96)

  • adding google translate and re-translating persona datasets (#102)

  • adding google translate and re-translating persona datasets

  • fixing linting

  • removing unused test (312b4ab)

  • standardizing dataset naming around language (#100) (8fda2f9)

  • Generalization experiments (#96)

  • Add functions to do translation

  • Add TQA translate

  • Fix key name bug

  • WIP

  • Add script to generate TQA translated datasets

  • update expt name and dataset splits

  • Add Llama chat formatter

  • Minor fixes in caa_repro

  • Add options to print output, save steering vectors

  • Set default experiment path by train / test datasets

  • Add functionality to print examples

  • Add script to plot results

  • Add title to plotting code

  • Fix pdm lock

  • Add (very ugly) function to plot multiple results

Very ugly implementation but it works

  • Ignore png files

  • Enable translated system prompt

  • Add new experiments dir

  • Add notebook to analyze TQA vectors

  • Add script to download datasets

  • Add script to download datasets

  • WIP translate

  • Add code to extract and save steering vectors

  • Update experiments

  • Add more dataset names

  • Improve dataset inspection

  • Modify script to extract all SVs

  • Changes to notebooks

  • Update readme

  • WIP

  • Fix download datasets

  • Enable 4-bit loading

  • WIP

  • Visualize pairwise cos similarities

  • Inspect dataset as dataframe

  • Clustering results

  • Fix lint errors

  • Add script to extract concept vectors

  • WIP

  • Refactoring

  • Refactoring

  • Add script to run all experiments

  • Fix bug with results suffix

  • Uncomment some lines

  • Update README, bash script

  • Restore original experiments dir

  • Fix lint

  • Fix lint

  • Add more aggregations

  • Fix bug in download

  • Ignore html files

  • Add test for data preprocessing

  • Add tests for preprocessing

  • fixing black formatting issues

  • fixing typing


Co-authored-by: dtch1997 <dtch1997@users.noreply.github.com>
Co-authored-by: David Chanin <chanindav@gmail.com> (6680de7)

  • Translate mwe and sycophancy (#97)

  • importing raw persona MWE datasets from anthropic

  • adding translation for mwe persona datasets and translating the first 5

  • translating sycophancy datasets

  • make_sycophancy_caa parses translations, and adding translations for misc strings

  • adding a convenience wrapper to load_translation

  • adding a script to make MWE personas datasets

  • fix lint formatting

  • alternating every 2 samples for MWE, not every 1 (ce83a8d)

  • translating TQA into styles and languages with gpt4 (#94)

  • translating TQA into styles and languages with gpt4

  • don't force ASCII, it's not 1998

  • fixing test mocking

  • refactoring translations to make supporting more datasets easier (d6d241a)

  • Caa tqa (#91)

  • refactoring formatting and benchmarking to support CAA

  • adding a basic test for get_normalized_correct_probs()

  • fixing tests

  • increasing sft loss threshold to make test less flaky

  • adding a TQA CAA dataset / experiment


Co-authored-by: David Chanin <chanindav@gmail.com> (97b1236)

  • Refactoring formatting and benchmarking to support CAA (#87)

  • refactoring formatting and benchmarking to support CAA

  • adding a basic test for get_normalized_correct_probs()

  • fixing tests

  • increasing sft loss threshold to make test less flaky (c98f067)

  • Merge pull request #88 from dtch1997/openai-translators

Add openai as dependency, and translators notebook (c1fb281)

  • Add openai as dependency, and translators notebook

This notebook has a few simple functions for translating inputs and
dataframes using gpt-4. You will need an openai API key to run the code
(obviously). (dadf9f2)

  • fixing tests after CAA merge (#85) (6efbfaf)

  • Caa experiments 2 (#80)

  • Add script to generate CAA datasets

  • Add correct CAA datasets

  • Add gitignore for experiments

  • Modify default template

  • Add get_normalized_correct_probs function

  • Add script to generate vectors

  • Add scripts to prompt w/ SV, plot results

  • Add notebook to compare our vs their CAA vectors

  • Add instructions to reproduce results

  • Add plots

  • Add evaluator for normalized correct probs

  • Skip failing tests


Co-authored-by: dtch1997 <dtch1997@users.noreply.github.com> (cca4b29)

  • Fix failing test (b7975b1)

  • refactoring prompting/formatting (#77)

  • refactoring prompting/formatting

  • fixing conflict in tests (dca53ac)

  • Merge pull request #79 from dtch1997/swap-in-steering-vecs

swapping in steering-vectors lib (f86d0d2)

  • Merge pull request #78 from dtch1997/verify-caa-steering

adding a test to assert our steering is identical to CAA steering (a5ea301)

  • swapping in steering-vectors lib (8e86019)

  • adding a test to assert our steering is identical to CAA steering (f26b2df)

  • Verify our code matches CAA (#76)

  • adding a llama chat formatter and prompter based on CAA

  • testing that our reading vectors match CAA reading vectors

  • fixing linting

  • fixing test (aa1dd24)

  • cleaning up oddities with steering vecs and repe algo (#72) (773db50)

  • CAA tweaks / improvements (#70)

  • Add bitsandbytes, accelerate

  • Hardcode second-last token activation position for steering vectors

  • Add notebook diffmerge package for pretty git diffs

  • Add note on how to change RepE directions

  • Add note on how hooks work

  • Add options to decouple reading and control

  • fixing tests


Co-authored-by: dtch1997 <dtch1997@users.noreply.github.com> (1487639)

  • CAA base (#69)

  • adding a record_activations() function to make it easy to collect model activations

  • replacing repe with our own CAA-esque implementation

  • only patch generated tokens

  • fix generating start index selection

  • fixing pyright error (55980ea)

  • Add CAA datasets (#68)

  • Add CAA datasets

  • Update makefile

  • Add test for make_ab_prompt


Co-authored-by: dtch1997 <dtch1997@users.noreply.github.com> (c1ae7f1)

  • Sft hf trainer (#50)

  • Working HF trainer script

  • customize wandb logging

  • Remove unused keys from SFTDataset

  • Add unit test for SFT

  • Fix import

  • Fix lint

  • Fix lint (again)

  • Fix test

  • Fix benchmark, pipeline logic

Update the train_and_evaluate fn to be consistent
with algorithm.run now returning a dict

Fix pyright errors related to GenerationConfig

  • Modify ICL to match new semantics

  • Fix icl test

  • Fix tests on GPU

  • Fix device handling in tests

  • Fix pyright bugbears

  • Fix mutable default error in nested dataclass

  • Fix mutable dataclass fields in python 3.11

  • Fix nitpicks

  • Fix default dataset

  • Fix pyright

  • Fix icl test to use new Algorithm.run signature

  • Improve test cases


Co-authored-by: dtch1997 <dtch1997@users.noreply.github.com> (ec9e914)
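
The two "mutable default" fixes in this commit refer to a general Python behaviour. Since Python 3.11, dataclasses reject any unhashable field default, and that includes an instance of another ordinary (non-frozen) dataclass. A minimal sketch of the usual fix follows; the class and field names are hypothetical and do not come from the repository.

```python
from dataclasses import dataclass, field


@dataclass
class OptimizerConfig:  # hypothetical nested config
    lr: float = 1e-4


@dataclass
class TrainConfig:  # hypothetical outer config
    # Writing `optimizer: OptimizerConfig = OptimizerConfig()` raises
    # ValueError on Python 3.11+, because the default instance is unhashable.
    # default_factory builds a fresh object for every TrainConfig instead.
    optimizer: OptimizerConfig = field(default_factory=OptimizerConfig)
    tags: list[str] = field(default_factory=list)


a = TrainConfig()
b = TrainConfig()
a.optimizer.lr = 1.0  # mutating one instance leaves the other untouched
a.tags.append("debug")
```

Using default_factory also avoids the older pitfall where a single default object is silently shared by every instance.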

  • Logprobs eval (#62)

  • porting dataset handling code from tqa branch

  • adding logprob calculation and adding an evaluator for multiple choice questions (8d07688)

  • allow setting a direction coefficient for repe (#61) (2a2305e)

  • multiply direction by sign for reading vectors (#60) (308686b)

  • interleaving positive and negative prompts to match what the original repe code expects (#59) (8eb3b75)

  • limiting ICL max examples to avoid prepending entire dataset (#57) (856145e)

  • Fix device handling in tests (#52)

  • Fix device handling in tests

  • Fix pyright bugbears

  • Set device automatically for RepE pipeline

  • Specify device in test


Co-authored-by: dtch1997 <dtch1997@users.noreply.github.com> (f46940a)

  • fixing bug where repe algo is patching in ndarray instead of tensors (#55) (7bdc280)

  • Implementing Repe reading control algorithm (#48)

setting up repe reading algorithm (f85b3d7)

  • Update lockfile (#49)

  • Bump python to 3.10

  • Update lockfile


Co-authored-by: dtch1997 <dtch1997@users.noreply.github.com> (4ed1698)

  • updating make_truthfulqa to use mc1 targets (#47) (d04ea9f)

  • Polishing ModelPatcher layer guessing and fully replacing WrappedReadingVecModel (#40)

  • polishing layer guessing and fully replacing WrappedReadingVecModel with ModelPatcher

  • adding a test for pipeline skipping patching (fb6ccf5)

  • Configurable model patching (#33)

  • adding configurable model patching

  • updating original RepE rep_control_reading_vec.py to add operators

  • renaming ModelPatcher.py to model_patcher.py

  • adding different patch operations to match paper

  • fixing comment doc for model patcher (2533e84)

  • Implement supervised fine-tuning (#31)

  • WIP

  • SFT working

  • remove duplicate AverageMeter implementation

  • Add ability to run on custom splits of dataset

  • Remove broken code

  • Fix Pyright issues

  • Fix Pyright issues, again

  • Update example, completion to dataclasses

  • fix data generation

  • Fix SFT inheritance

  • Remove unused string methods

  • Seeded random dataset shuffle

  • Rename BaseAlgorithm to Algorithm

  • fix pyright

  • Add test for SFT

  • minor

  • Fix nit

  • Fix tests

  • Update default huggingface cache dir

  • Modify tokenizer config in conftest

  • Abstract away logger


Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com> (1d68251)

  • reload model for every test (#38) (3e9438e)

  • updating README based on new github flow (#30)

  • updating README based on new github flow

  • Update makefile, readme

  • Update README.md


Co-authored-by: David Chanin <chanindav@gmail.com>
Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com> (62658d2)

  • adding Benchmark and Eval classes (#25)

  • adding Benchmark and Eval classes

  • simplifying benchmarking

  • updating snapshot (e8dd92b)

  • Relax Python version requirements (#26)

  • Relax python requirements

  • Update dependencies

  • add more python version to CI

  • Remove Py3.9 support

  • Remove PDM caching

May be causing issues with CI workflow re-using same cache for different python versions
Likely doesn't result in much speedup

  • Change PDM to be installed by pip

  • Add libopenblas

  • Sudo add libopenblas

  • Update lock file

  • Only Python3.10

  • updating CI for 3.11, and adding pyright


Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com>
Co-authored-by: David Chanin <chanindav@gmail.com> (bd01efb)

  • Switch to PDM managed; add pre-commit; run linters (#23)

  • Switch to PDM managed; add pre-commit; run linters

  • Update CI workflow

  • Use PDM in ci workflow

  • Fix tests

  • Add device-aware testing

  • adding snapshot test for test_pipeline

  • remove trailing whitespace hook, black does this already

  • try removing tensorboard to see if that makes things work?

  • try explicitly adding ml-dtypes to dev deps, to see if that helps?

  • ...trying to add wrapt explicitly now...

  • undoing dep changes

  • try adding explicit tensorflow dev dep

  • try removing bleurt dep and tensorflow

  • fixing CI caching

  • use string for python version in CI


Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com>
Co-authored-by: David Chanin <chanindav@gmail.com> (62ad2ed)

  • silencing repe linting errors (#21)

  • silencing repe linting errors

  • adding HF token to CI (a49c9b6)

  • setting up basic pipeline arc, and adding icl algo (#15) (97bcbaf)

  • repe is here (4e58c97)

  • Merge branch 'dev' of github.com:dtch1997/repepo into dev (ec7743e)

  • test (783577c)

  • simplifying core classes (21522aa)

  • Adding pyright and fixing type errors (#14)

  • adding pyright and fixing type errors

  • fixing linting

  • adding scikit-learn to deps

  • fixing types pylance (but not pyright) complains about

  • ignore reportPrivateImportUsage

  • replacing namedtuple with NamedTuple for better type inference (4915268)
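
For context on the last item in this commit: collections.namedtuple carries no field types, so type checkers such as pyright treat every field as Any, whereas typing.NamedTuple declares a type per field. The sketch below shows the difference; the Completion name and its fields are illustrative assumptions, not the repository's definitions.

```python
from collections import namedtuple
from typing import NamedTuple

# Untyped form: type checkers infer every field as Any.
LegacyCompletion = namedtuple("LegacyCompletion", ["prompt", "response"])


class Completion(NamedTuple):
    """Typed form: each field has a declared type that checkers can verify."""

    prompt: str
    response: str


c = Completion(prompt="2+2=", response="4")
prompt, response = c  # still unpacks like a plain tuple
```

Both forms behave the same at runtime, so the switch changes only what static analysis can see.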

  • working on it (ee1c2e7)

  • minor (22bcc4f)

  • Merge pull request #13 from dtch1997/ci-linting-tests

CI, linting, and tests (7e08718)

  • adding CI workflow for linting and tests (8ce39e5)

  • installing ruff and black and fixing formatting (59b4150)

  • Register datasets; other minor changes (413df15)

  • Add accelerate (4265238)

  • Improve formatting, printing (20d84f9)

  • ICL eval pipeline (d01bbb5)

  • Make ICL task vectors in standard format (e015a00)

  • Add scripts to make datasets from "In-context Learning Creates Task Vectors" (e0adc69)

  • Tests working (4038c16)

  • Add algorithms and metrics (10bb2f0)

  • Add basic abstractions (b919229)

  • Merge branch 'dev' into main (82feb1b)

  • Add major changes (#11)

  • SFT baseline (#5)

  • Update requirements

  • Add SFT algorithm

  • Add datasets, log dirs to gitignore

  • Demonstrate how to configure dataset

  • Update README


Co-authored-by: aengusl <aenguslynch@gmail.com>
Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com>

  • Daniel dev1 (#6)

  • WIP simple train script

  • Add full training pipeline

  • Modify SFT dataset to return reference completions

  • Add train_simple baseline

  • Add BLEURT, ROUGE scores

  • Add WandB logging

  • Update README; make lr configurable

  • Update requirements

  • Enable SFT to be used with HF dataset

  • Fix bug in lr scheduling


Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com>

  • Refactor out prompt formatting

  • Fix bug

  • Aengus dev2 (#10)

  • WIP simple train script

  • Add full training pipeline

  • Modify SFT dataset to return reference completions

  • Add train_simple baseline

  • Add BLEURT, ROUGE scores

  • Add WandB logging

  • Update README; make lr configurable

  • Update requirements

  • Enable SFT to be used with HF dataset

  • Fix bug in lr scheduling

  • adding repe

  • organising

  • damn things not being easy

  • works

  • working but weirdly


Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com>
Co-authored-by: Daniel Tan <25474937+dtch1997@users.noreply.github.com>

  • Integrate repe pipeline (#12)

  • Add AmbigPrompt datasets

  • Inject project variables into script

  • put prompts in another file


Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com>
Co-authored-by: aengusl <aenguslynch@gmail.com>


Co-authored-by: aengusl <aenguslynch@gmail.com>
Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com>
Co-authored-by: Aengus Lynch <37474130+aengusl@users.noreply.github.com> (cb451dd)

  • Integrate repe pipeline (#12)

  • Add AmbigPrompt datasets

  • Inject project variables into script

  • put prompts in another file


Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com>
Co-authored-by: aengusl <aenguslynch@gmail.com> (f8fef0d)

  • Aengus dev2 (#10)

  • WIP simple train script

  • Add full training pipeline

  • Modify SFT dataset to return reference completions

  • Add train_simple baseline

  • Add BLEURT, ROUGE scores

  • Add WandB logging

  • Update README; make lr configurable

  • Update requirements

  • Enable SFT to be used with HF dataset

  • Fix bug in lr scheduling

  • adding repe

  • organising

  • damn things not being easy

  • works

  • working but weirdly


Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com>
Co-authored-by: Daniel Tan <25474937+dtch1997@users.noreply.github.com> (dac8ece)

  • Fix bug (13e868f)

  • Refactor out prompt formatting (d66f9d8)

  • Daniel dev1 (#6)

  • WIP simple train script

  • Add full training pipeline

  • Modify SFT dataset to return reference completions

  • Add train_simple baseline

  • Add BLEURT, ROUGE scores

  • Add WandB logging

  • Update README; make lr configurable

  • Update requirements

  • Enable SFT to be used with HF dataset

  • Fix bug in lr scheduling


Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com> (0fdfec6)

  • SFT baseline (#5)

  • Update requirements

  • Add SFT algorithm

  • Add datasets, log dirs to gitignore

  • Demonstrate how to configure dataset

  • Update README


Co-authored-by: aengusl <aenguslynch@gmail.com>
Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com> (3abba55)

  • Add examples (#2)

  • Add HF example for fine-tuning on QA

  • Add examples from RepEng repo

  • Add AlpacaFarm, datasets reqs


Co-authored-by: Daniel Tan <dtch1997@users.noreply.github.com> (573df55)