Added max sample length filter to unlearning and retaining datasets #156

TheRootOf3 · 2024-09-23T23:06:19Z

Added a way to set the maximum sample length (in tokens) for both the unlearning and retaining datasets. This limits GPU memory usage on longer samples.

This change introduces a new argument to the unlearn_harm.py script, max_sample_length: the maximum number of tokens in a sample, counted after the prompt is added. In effect, it bounds the sequence-length dimension of the tensors that go through the model.
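For reviewers, here is a minimal sketch of how such a filter could work. It is not the code in this PR. The `--max_sample_length` flag spelling, the helper names `build_parser`, `filter_by_token_length` and `whitespace_tokenize`, and the choice to drop over-long samples rather than truncate them are all assumptions made for illustration. A whitespace tokenizer stands in for the model tokenizer so the sketch stays self-contained.

```python
import argparse
from typing import Callable, Iterable, List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser; the real script's flags may differ."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--max_sample_length",
        type=int,
        default=None,
        help="Maximum number of tokens per sample, counted after the prompt is added.",
    )
    return parser


def whitespace_tokenize(text: str) -> List[str]:
    """Stand-in tokenizer (assumption): splits on whitespace."""
    return text.split()


def filter_by_token_length(
    samples: Iterable[str],
    tokenize: Callable[[str], List[str]],
    max_sample_length: Optional[int],
    prompt: str = "",
) -> List[str]:
    """Keep only samples whose prompt + text fits within max_sample_length tokens.

    If max_sample_length is None, no filtering is applied.
    """
    if max_sample_length is None:
        return list(samples)
    return [s for s in samples if len(tokenize(prompt + s)) <= max_sample_length]
```

With a real tokenizer, `tokenize` would be replaced by a call that returns the token IDs for the prompted text. The length check itself stays the same.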

Signed-off-by: TheRootOf3 <aceszablewski@gmail.com>

Signed-off-by: Szymon Duchniewicz <szymon.duchniewicz.20@ucl.ac.uk>

TheRootOf3 and others added 3 commits September 23, 2024 17:30

Added max sample length functionality.

8815322

Signed-off-by: TheRootOf3 <aceszablewski@gmail.com>

Remove code for printing sample length stats in unlearn dataset.

f133847

Signed-off-by: Szymon Duchniewicz <szymon.duchniewicz.20@ucl.ac.uk>

Add parameters for loading normal dataset with updated DataloaderCons…

58a9352

…tructor. Signed-off-by: Szymon Duchniewicz <szymon.duchniewicz.20@ucl.ac.uk>

TheRootOf3 requested a review from Willmish September 23, 2024 23:06
