
CAA #66

Closed
wants to merge 18 commits into from

Conversation

chanind
Collaborator

@chanind chanind commented Jan 16, 2024

This PR removes all REPE code and replaces it with CAA-style steering vectors, where each steering vector is found by simply subtracting paired activations (pos - neg) and then taking the mean across pairs.
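
The computation described above is just a mean of paired differences. A minimal sketch in plain Python (illustrative only; `compute_caa_vector` is a hypothetical name, not part of this PR's API), with activations represented as lists of floats:

```python
def compute_caa_vector(pos_acts, neg_acts):
    """Mean of (pos - neg) over paired activation vectors.

    pos_acts, neg_acts: equal-length lists of activation vectors
    (one vector per prompt pair), each a list of floats.
    """
    n = len(pos_acts)
    dim = len(pos_acts[0])
    return [
        sum(p[i] - q[i] for p, q in zip(pos_acts, neg_acts)) / n
        for i in range(dim)
    ]

# toy example: 2 prompt pairs, hidden size 3
pos = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
neg = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
vec = compute_caa_vector(pos, neg)  # -> [1.5, 1.5, 1.5]
```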

This PR is large because it removes the old REPE code and also moves some of the existing code into a steering_vectors module. This PR introduces the following ideas:

Steering Vectors

The steering_vectors module is separated out from the rest of the code, since it could be published as its own library. It consists of two main components in the public API: train_steering_vector() and SteeringVector. The train_steering_vector() function takes a list of paired pos and neg prompts and returns a steering vector instance. The steering vector can then be used to steer generation in an LLM.

Basic usage:

```python
from steering_vectors import train_steering_vector

steering_vector = train_steering_vector(model, tokenizer, paired_prompts)

with steering_vector.apply(model):
    output = model.generate(inputs)
```
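
The exact structure of `paired_prompts` isn't shown above. One plausible shape, given that train_steering_vector() takes paired pos and neg prompts, is a list of (positive, negative) string pairs; this is an assumption for illustration, not the confirmed API:

```python
# hypothetical shape for paired_prompts: list of (positive, negative) string pairs
paired_prompts = [
    ("I love this movie. It was great!", "I hated this movie. It was terrible!"),
    ("The service was excellent.", "The service was awful."),
]

# the trainer would iterate over the pairs, collecting activations for each side
pos_prompts = [pos for pos, _ in paired_prompts]
neg_prompts = [neg for _, neg in paired_prompts]
```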

There are a number of improvements we can make to this in the future, such as:

  • supporting batching during training
  • setting a magnitude multiplier per layer rather than a single value for all layers
  • allowing custom masking options instead of only masking all indices before a given token

That being said, it's probably already publishable as a standalone Python library.

Pipeline hooks

Since CAA requires that we only patch activations after the prompt, we need a way to tell the steering vector which tokens in a given prompt should be patched. The current Pipeline implementation has no way to feed this information to the steering vector, so to get around this, this PR adds the concept of a hook to the Pipeline class. Each hook receives a context object describing what the pipeline is doing (which example is being processed, the base prompt text, the full prompt text, etc.) and wraps the generation/logprobs calculation. This gives the steering code enough information about what the pipeline is currently running to patch activations correctly.
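
A rough sketch of how such a hook mechanism could look (illustrative; the class and field names here are assumptions, not necessarily this PR's exact implementation). Hooks are context managers that receive a PipelineContext and wrap each generation call:

```python
from contextlib import ExitStack, contextmanager
from dataclasses import dataclass, field

@dataclass
class PipelineContext:
    method: str        # e.g. "generate" or "calculate_output_logprobs"
    base_prompt: str   # prompt text before the completion
    full_prompt: str   # base prompt plus completion

@dataclass
class Pipeline:
    hooks: list = field(default_factory=list)

    def generate(self, base_prompt: str, completion: str) -> str:
        ctx = PipelineContext("generate", base_prompt, base_prompt + completion)
        with ExitStack() as stack:
            # each hook is entered only for the duration of this call
            for hook in self.hooks:
                stack.enter_context(hook(ctx))
            return f"<model output for {ctx.full_prompt!r}>"

seen = []

@contextmanager
def steering_hook(ctx):
    # a real hook would use ctx to register activation patches on the model here...
    seen.append(ctx.method)
    yield
    # ...and remove the patches here, after generation finishes

pipeline = Pipeline(hooks=[steering_hook])
pipeline.generate("Q: 2+2?", " A: 4")
```

The key point is that the hook stays registered on the pipeline across calls, but its model-patching side effects are scoped to a single generation via the context-manager protocol.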

@chanind chanind added the WIP Temporarily not yet ready for review, more work required label Jan 16, 2024
@chanind chanind requested a review from dtch1997 January 17, 2024 12:24
@chanind chanind removed the WIP Temporarily not yet ready for review, more work required label Jan 17, 2024
@chanind chanind changed the title WIP: CAA CAA Jan 17, 2024
Comment on lines +124 to +132
layer_config=self.layer_config,
# NOTE: if the direction multiplier is changed,
# subsequent generations will use the new value
# because this is a reference to the outer scope.
# This is probably counterintuitive
# NOTE: Same goes for layer_config above,
# but this is less critical because layer config is likely static
# TODO: change at some point.
multiplier=self.direction_multiplier,
Owner

This behaviour is highly unintuitive: the hooks are stored in the pipeline, but they still read state from the RepeReadingControl algorithm after .run terminates.

We should refactor this before merging.

@dtch1997
Owner

dtch1997 commented Jan 17, 2024

Generally, we should try to ensure that all relevant state the hooks will reference is encapsulated within the Pipeline class.
This could entail adding a separate HookState field. Or it could involve making each hook an object with its own state.

The focus should be on making it easy to modify:

  • which layers we apply the vectors at
  • the coefficients of the vectors
  • the vectors themselves (e.g. to test transferring vectors derived from another model)
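
One way to encapsulate that state, as a sketch (HookState and its fields are hypothetical, following the suggestion above, not existing code):

```python
from dataclasses import dataclass, field

@dataclass
class HookState:
    """All mutable state the steering hooks read, owned by the Pipeline."""
    layers: list = field(default_factory=list)            # which layers to steer
    multiplier: float = 1.0                               # coefficient on the vectors
    steering_vectors: dict = field(default_factory=dict)  # layer index -> vector

class Pipeline:
    def __init__(self):
        self.hook_state = HookState()

pipeline = Pipeline()
# callers configure the state on the pipeline; hooks read only pipeline.hook_state,
# so mutating the algorithm object later cannot silently change hook behaviour
pipeline.hook_state.layers = [10, 11, 12]
pipeline.hook_state.multiplier = 2.0
```

Swapping in vectors from another model would then just mean assigning a different `steering_vectors` dict, without touching the hooks themselves.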

Comment on lines 107 to 116
# Steering vector reading
# NOTE: The hooks read from this steering vector.
steering_vector = self._get_steering_vector(pipeline, dataset)

# Creating the hooks that will do steering vector control
# NOTE: How this works is that we create a context manager that creates a hook
# whenever we are in a `PipelineContext`'s scope.
# After exiting the context, the hook is deleted.

# The PipelineContext is created in both `pipeline.generate` or `pipeline.calculate_output_logprobs`
Owner

@chanind could you comment on whether I've described the logic here accurately?

Collaborator Author

It's not correct that the hook is deleted after exiting the context; this may be a confusion between the Pipeline hook and the PyTorch hook. The pipeline hook just lives in an array on the pipeline and stays there until it's removed. It only gets applied to the model during pipeline.generate or pipeline.calculate_output_logprobs.

This was referenced Jan 18, 2024
@chanind
Collaborator Author

chanind commented Jan 18, 2024

Closing, as this is now superseded by #69, #70, and #71.
