v0.0.0
v0.0.0 (2024-03-04)
Chore
- chore: add semantic release to ci (56be43f)
Unknown
- Delete dead code (a1da6de)
- translate the main persona datasets from @dtch1997's work (#108) (b0b13a2)
- Translation experiments (#99) (46927b0)
  - adding helpers for generating and parsing language from dataset filename
  - adding compare_dataset_translations experiment
  - adding experiment helpers
  - tweaking pirate translation strings
  - adding translations for non-persona MWEs
  - fixing up make MWE helper
  - adding a 'ctx' pseudo-style
  - Revert "adding a 'ctx' pseudo-style"
    This reverts commit a0058c4.
  - refactoring to allow using arbitrary dataset variations, instead of the hacky pseudo-language stuff
  - fixing using existing results in cross-steering
  - adding helpers to calculate Jensen-Shannon and KL divergence for Bernoulli distributions
  - using JS distance for steering deltas
  - adding more tests
- adding translated persona MWE variants (#103) (1a42e96)
  - adding translated persona MWE variants by prepending the generation ctx to each example
  - formatting translated_strings
- adding Google Translate and re-translating persona datasets (#102) (312b4ab)
  - adding Google Translate and re-translating persona datasets
  - fixing linting
  - removing unused test
- standardizing dataset naming around language (#100) (8fda2f9)
- Generalization experiments (#96) (6680de7)
  - Add functions to do translation
  - Add TQA translate
  - Fix key name bug
  - WIP
  - Add script to generate TQA translated datasets
  - update expt name and dataset splits
  - Add Llama chat formatter
  - Minor fixes in caa_repro
  - Add options to print output, save steering vectors
  - Set default experiment path by train / test datasets
  - Add functionality to print examples
  - Add script to plot results
  - Add title to plotting code
  - Fix pdm lock
  - Add (very ugly) function to plot multiple results
    Very ugly implementation but it works
  - Ignore png files
  - Enable translated system prompt
  - Add new experiments dir
  - Add notebook to analyze TQA vectors
  - Add script to download datasets
  - Add script to download datasets
  - WIP translate
  - Add code to extract and save steering vectors
  - Update experiments
  - Add more dataset names
  - Improve dataset inspection
  - Modify script to extract all SVs
  - Changes to notebooks
  - Update readme
  - WIP
  - Fix download datasets
  - Enable 4-bit loading
  - WIP
  - Visualize pairwise cos similarities
  - Inspect dataset's dataframe
  - Clustering results
  - Fix lint errors
  - Add script to extract concept vectors
  - WIP
  - Refactoring
  - Refactoring
  - Add script to run all experiments
  - Fix bug with results suffix
  - Uncomment some lines
  - Update README, bash script
  - Restore original experiments dir
  - Fix lint
  - Fix lint
  - Add more aggregations
  - Fix bug in download
  - Ignore html files
  - Add test for data preprocessing
  - Add tests for preprocessing
  - fixing black formatting issues
  - fixing typing
  Co-authored-by: dtch1997 <dtch1997@users.noreply.github.com>
  Co-authored-by: David Chanin <chanindav@gmail.com>
- Translate MWE and sycophancy (#97) (ce83a8d)
  - importing raw persona MWE datasets from Anthropic
  - adding translation for MWE persona datasets and translating the first 5
  - translating sycophancy datasets
  - make_sycophancy_caa parses translations, and adding translations for misc strings
  - adding a convenience wrapper to load_translation
  - adding a script to make MWE personas datasets
  - fix lint formatting
  - alternating every 2 samples for MWE, not every 1
- translating TQA into styles and languages with gpt4 (#94) (d6d241a)
  - translating TQA into styles and languages with gpt4
  - don't force ASCII, it's not 1998
  - fixing test mocking
  - refactoring translations to make supporting more datasets easier
- CAA TQA (#91) (97b1236)
  - refactoring formatting and benchmarking to support CAA
  - adding a basic test for get_normalized_correct_probs()
  - fixing tests
  - increasing sft loss threshold to make test less flaky
  - adding a TQA CAA dataset / experiment
  Co-authored-by: David Chanin <chanindav@gmail.com>
- Refactoring formatting and benchmarking to support CAA (#87) (c98f067)
  - refactoring formatting and benchmarking to support CAA
  - adding a basic test for get_normalized_correct_probs()
  - fixing tests
  - increasing sft loss threshold to make test less flaky
- Merge pull request #88 from dtch1997/openai-translators (c1fb281)
  Add openai as dependency, and translators notebook
- Add openai as dependency, and translators notebook (dadf9f2)
  This notebook has a few simple functions for translating inputs and
  dataframes using gpt-4. You will need an OpenAI API key to run the code
  (obviously).
- CAA experiments 2 (#80) (cca4b29)
  - Add script to generate CAA datasets
  - Add correct CAA datasets
  - Add gitignore for experiments
  - Modify default template
  - Add get_normalized_correct_probs function
  - Add script to generate vectors
  - Add scripts to prompt w/ SV, plot results
  - Add notebook to compare our vs their CAA vectors
  - Add instructions to reproduce results
  - Add plots
  - Add evaluator for normalized correct probs
  - Skip failing tests
  Co-authored-by: dtch1997 <dtch1997@users.noreply.github.com>
- Fix failing test (b7975b1)
- refactoring prompting/formatting (#77) (dca53ac)
  - refactoring prompting/formatting
  - fixing conflict in tests
- Merge pull request #79 from dtch1997/swap-in-steering-vecs (f86d0d2)
  swapping in steering-vectors lib
- Merge pull request #78 from dtch1997/verify-caa-steering (a5ea301)
  adding a test to assert our steering is identical to CAA steering
- swapping in steering-vectors lib (8e86019)
- adding a test to assert our steering is identical to CAA steering (f26b2df)
- Verify our code matches CAA (#76) (aa1dd24)
  - adding a Llama chat formatter and prompter based on CAA
  - testing that our reading vectors match CAA reading vectors
  - fixing linting
  - fixing test
- cleaning up oddities with steering vecs and repe algo (#72) (773db50)
- CAA tweaks / improvements (#70) (1487639)
  - Add bitsandbytes, accelerate
  - Hardcode second-last token activation position for steering vectors
  - Add notebook diffmerge package for pretty git diffs
  - Add note on how to change RepE directions
  - Add note on how hooks work
  - Add options to decouple reading and control
  - fixing tests
  Co-authored-by: dtch1997 <dtch1997@users.noreply.github.com>
- CAA base (#69) (55980ea)
  - adding a record_activations() function to make it easy to collect model activations
  - replacing repe with our own CAA-esque implementation
  - only patch generated tokens
  - fix generating start index selection
  - fixing pyright error
- Add CAA datasets (#68) (c1ae7f1)
  - Add CAA datasets
  - Update makefile
  - Add test for make_ab_prompt
  Co-authored-by: dtch1997 <dtch1997@users.noreply.github.com>
- SFT HF trainer (#50) (ec9e914)
  - Working HF trainer script
  - customize wandb logging
  - Remove unused keys from SFTDataset
  - Add unit test for SFT
  - Fix import
  - Fix lint
  - Fix lint (again)
  - Fix test
  - Fix benchmark, pipeline logic
    Update the train_and_evaluate fn to be consistent
    with algorithm.run now returning a dict
    Fix pyright errors related to GenerationConfig
  - Modify ICL to match new semantics
  - Fix ICL test
  - Fix tests on GPU
  - Fix device handling in tests
  - Fix pyright bugbears
  - Fix mutable default error in nested dataclass
  - Fix mutable dataclass fields in Python 3.11
  - Fix nitpicks
  - Fix default dataset
  - Fix pyright
  - Fix ICL test to use new Algorithm.run signature
  - Improve test cases
  Co-authored-by: dtch1997 <dtch1997@users.noreply.github.com>
- Logprobs eval (#62) (8d07688)
  - porting dataset handling code from tqa branch
  - adding logprob calculation and adding an evaluator for multiple choice questions
- allow setting a direction coefficient for repe (#61) (2a2305e)
- multiply direction by sign for reading vectors (#60) (308686b)
- interleaving positive and negative prompts to match what the original repe code expects (#59) (8eb3b75)
- limiting ICL max examples to avoid prepending entire dataset (#57) (856145e)
- Fix device handling in tests (#52) (f46940a)
  - Fix device handling in tests
  - Fix pyright bugbears
  - Set device automatically for RepE pipeline
  - Specify device in test
  Co-authored-by: dtch1997 <dtch1997@users.noreply.github.com>
- fixing bug where repe algo is patching in ndarray instead of tensors (#55) (7bdc280)
- Implementing Repe reading control algorithm (#48) (f85b3d7)
  setting up repe reading algorithm
- Update lockfile (#49) (4ed1698)
  - Bump python to 3.10
  - Update lockfile
  Co-authored-by: dtch1997 <dtch1997@users.noreply.github.com>
- Polishing ModelPatcher layer guessing and fully replacing WrappedReadingVecModel (#40) (fb6ccf5)
  - polishing layer guessing and fully replacing WrappedReadingVecModel with ModelPatcher
  - adding a test for pipeline skipping patching
- Configurable model patching (#33) (2533e84)
  - adding configurable model patching
  - updating original RepE rep_control_reading_vec.py to add operators
  - renaming ModelPatcher.py to model_patcher.py
  - adding different patch operations to match paper
  - fixing comment doc for model patcher
- Implement supervised fine-tuning (#31) (1d68251)
  - WIP
  - SFT working
  - remove duplicate AverageMeter implementation
  - Add ability to run on custom splits of dataset
  - Remove broken code
  - Fix Pyright issues
  - Fix Pyright issues, again
  - Update example, completion to dataclasses
  - fix data generation
  - Fix SFT inheritance
  - Remove unused string methods
  - Seeded random dataset shuffle
  - Rename BaseAlgorithm to Algorithm
  - fix pyright
  - Add test for SFT
  - minor
  - Fix nit
  - Fix tests
  - Update default huggingface cache dir
  - Modify tokenizer config in conftest
  - Abstract away logger
  Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com>
- updating README based on new github flow (#30) (62658d2)
  - updating README based on new github flow
  - Update makefile, readme
  - Update README.md
  Co-authored-by: David Chanin <chanindav@gmail.com>
  Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com>
- adding Benchmark and Eval classes (#25) (e8dd92b)
  - adding Benchmark and Eval classes
  - simplifying benchmarking
  - updating snapshot
- Relax Python version requirements (#26) (bd01efb)
  - Relax python requirements
  - Update dependencies
  - add more Python versions to CI
  - Remove Py3.9 support
  - Remove PDM caching
    May be causing issues with CI workflow re-using same cache for different python versions
    Likely doesn't result in much speedup
  - Change PDM to be installed by pip
  - Add libopenblas
  - Sudo add libopenblas
  - Update lock file
  - Only Python 3.10
  - updating CI for 3.11, and adding pyright
  Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com>
  Co-authored-by: David Chanin <chanindav@gmail.com>
- Switch to PDM managed; add pre-commit; run linters (#23) (62ad2ed)
  - Switch to PDM managed; add pre-commit; run linters
  - Update CI workflow
  - Use PDM in ci workflow
  - Fix tests
  - Add device-aware testing
  - adding snapshot test for test_pipeline
  - remove trailing whitespace hook, black does this already
  - try removing tensorboard to see if that makes things work?
  - try explicitly adding ml-dtypes to dev deps, to see if that helps?
  - ...trying to add wrapt explicitly now...
  - undoing dep changes
  - try adding explicit tensorflow dev dep
  - try removing bleurt dep and tensorflow
  - fixing CI caching
  - use string for python version in CI
  Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com>
  Co-authored-by: David Chanin <chanindav@gmail.com>
- silencing repe linting errors (#21) (a49c9b6)
  - silencing repe linting errors
  - adding HF token to CI
- setting up basic pipeline arc, and adding icl algo (#15) (97bcbaf)
- repe is here (4e58c97)
- Merge branch 'dev' of github.com:dtch1997/repepo into dev (ec7743e)
- test (783577c)
- simplifying core classes (21522aa)
- Adding pyright and fixing type errors (#14) (4915268)
  - adding pyright and fixing type errors
  - fixing linting
  - adding scikit-learn to deps
  - fixing types that pylance (but not pyright) complains about
  - ignore reportPrivateImportUsage
  - replacing namedtuple with NamedTuple for better type inference
- working on it (ee1c2e7)
- minor (22bcc4f)
- Merge pull request #13 from dtch1997/ci-linting-tests (7e08718)
  CI, linting, and tests
- adding CI workflow for linting and tests (8ce39e5)
- installing ruff and black and fixing formatting (59b4150)
- Register datasets; other minor changes (413df15)
- Add accelerate (4265238)
- Improve formatting, printing (20d84f9)
- ICL eval pipeline (d01bbb5)
- Make ICL task vectors in standard format (e015a00)
- Add scripts to make datasets from "In-context Learning Creates Task Vectors" (e0adc69)
- Tests working (4038c16)
- Add algorithms and metrics (10bb2f0)
- Add basic abstractions (b919229)
- Merge branch 'dev' into main (82feb1b)
- Add major changes (#11) (cb451dd)
  - SFT baseline (#5)
    - Update requirements
    - Add SFT algorithm
    - Add datasets, log dirs to gitignore
    - Demonstrate how to configure dataset
    - Update README
    Co-authored-by: aengusl <aenguslynch@gmail.com>
    Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com>
  - Daniel dev1 (#6)
    - WIP simple train script
    - Add full training pipeline
    - Modify SFT dataset to return reference completions
    - Add train_simple baseline
    - Add BLEURT, ROUGE scores
    - Add WandB logging
    - Update README; make lr configurable
    - Update requirements
    - Enable SFT to be used with HF dataset
    - Fix bug in lr scheduling
    Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com>
  - Refactor out prompt formatting
  - Fix bug
  - Aengus dev2 (#10)
    - WIP simple train script
    - Add full training pipeline
    - Modify SFT dataset to return reference completions
    - Add train_simple baseline
    - Add BLEURT, ROUGE scores
    - Add WandB logging
    - Update README; make lr configurable
    - Update requirements
    - Enable SFT to be used with HF dataset
    - Fix bug in lr scheduling
    - adding repe
    - organising
    - damn things not being easy
    - works
    - working but weirdly
    Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com>
    Co-authored-by: Daniel Tan <25474937+dtch1997@users.noreply.github.com>
  - Integrate repe pipeline (#12)
    - Add AmbigPrompt datasets
    - Inject project variables into script
    - put prompts in another file
    Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com>
    Co-authored-by: aengusl <aenguslynch@gmail.com>
  Co-authored-by: aengusl <aenguslynch@gmail.com>
  Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com>
  Co-authored-by: Aengus Lynch <37474130+aengusl@users.noreply.github.com>
- Integrate repe pipeline (#12) (f8fef0d)
  - Add AmbigPrompt datasets
  - Inject project variables into script
  - put prompts in another file
  Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com>
  Co-authored-by: aengusl <aenguslynch@gmail.com>
- Aengus dev2 (#10) (dac8ece)
  - WIP simple train script
  - Add full training pipeline
  - Modify SFT dataset to return reference completions
  - Add train_simple baseline
  - Add BLEURT, ROUGE scores
  - Add WandB logging
  - Update README; make lr configurable
  - Update requirements
  - Enable SFT to be used with HF dataset
  - Fix bug in lr scheduling
  - adding repe
  - organising
  - damn things not being easy
  - works
  - working but weirdly
  Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com>
  Co-authored-by: Daniel Tan <25474937+dtch1997@users.noreply.github.com>
- Fix bug (13e868f)
- Refactor out prompt formatting (d66f9d8)
- Daniel dev1 (#6) (0fdfec6)
  - WIP simple train script
  - Add full training pipeline
  - Modify SFT dataset to return reference completions
  - Add train_simple baseline
  - Add BLEURT, ROUGE scores
  - Add WandB logging
  - Update README; make lr configurable
  - Update requirements
  - Enable SFT to be used with HF dataset
  - Fix bug in lr scheduling
  Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com>
- SFT baseline (#5) (3abba55)
  - Update requirements
  - Add SFT algorithm
  - Add datasets, log dirs to gitignore
  - Demonstrate how to configure dataset
  - Update README
  Co-authored-by: aengusl <aenguslynch@gmail.com>
  Co-authored-by: Daniel CH Tan <dtch1997@users.noreply.github.com>
- Add examples (#2) (573df55)
  - Add HF example for fine-tuning on QA
  - Add examples from RepEng repo
  - Add AlpacaFarm, datasets reqs
  Co-authored-by: Daniel Tan <dtch1997@users.noreply.github.com>