Implement supervised fine-tuning #31
Conversation
SFT overfits a small dataset. To reproduce:

python repepo/algorithms/sft.py --sft.batch_size 4 --wandb.track True

Reference: https://wandb.ai/ucl-dark/Rep-Eng/runs/zuuj7mvq?workspace=user-dtch1997

Notes:
Rebased onto
This could use some basic test coverage as well to ensure it works as expected
class EvalCallback:
    def __init__(self, val_datasets: Dict[str, BaseDataset]):
        self.metric_fns = Metrics()
Can we use the same Evaluator objects that we use for benchmarking? The val_dataset should respond to the same metrics as the test_dataset, I would think. Unless the idea is that we can set a specific validator that should be used by SFT to pick the best-performing result? Regardless, the Evaluator type already returns a float, so it would be suited to this.
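A rough sketch of what that could look like, reusing the benchmarking evaluators inside the callback (the Evaluator call signature, pipeline.generate, and the BaseDataset iteration here are assumptions for illustration, not the actual repepo APIs):

```python
from typing import Dict, List


class EvalCallback:
    """Hypothetical sketch: run the benchmark Evaluators on each validation set."""

    def __init__(self, val_datasets: Dict[str, "BaseDataset"], evaluators: List["Evaluator"]):
        self.val_datasets = val_datasets
        self.evaluators = evaluators

    def __call__(self, pipeline: "Pipeline") -> Dict[str, float]:
        # One scalar per (dataset, evaluator) pair, suitable for wandb logging
        # or for picking the best checkpoint.
        results: Dict[str, float] = {}
        for name, dataset in self.val_datasets.items():
            predictions = [pipeline.generate(example) for example in dataset]  # assumed API
            for evaluator in self.evaluators:
                key = f"{name}/{type(evaluator).__name__}"
                results[key] = evaluator(predictions)  # Evaluator assumed to return a float
        return results
```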
Sounds good, I haven't looked closely at the Evaluator class yet but will do so.
torch.nn.utils.clip_grad.clip_grad_norm_(model.parameters(), 1.0)
optimizer.step()
scheduler.step()
print(f"epoch : {epoch} | step: {step} | loss: {loss}")
nit: it would be better to use tqdm to get an updating progress bar rather than printing directly. It would also be good to add a way to disable this output to the screen, maybe with a param to run() called verbose: bool? We can figure that out later though, as it's more polish than core functionality.
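Something along these lines, for instance (a sketch only: the train function signature and the compute_loss helper are hypothetical, not the actual sft.py code):

```python
import torch
from tqdm import tqdm


def train(model, optimizer, scheduler, dataloader, num_epochs: int, verbose: bool = True):
    # disable=not verbose silences the progress bar (and all per-step output)
    progress = tqdm(total=num_epochs * len(dataloader), disable=not verbose)
    for epoch in range(num_epochs):
        for step, batch in enumerate(dataloader):
            loss = compute_loss(model, batch)  # hypothetical loss helper
            loss.backward()
            torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
            optimizer.step()
            scheduler.step()
            optimizer.zero_grad()
            progress.update(1)
            progress.set_postfix(epoch=epoch, step=step, loss=float(loss))
    progress.close()
```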
yeah let's do this later. I also want to figure out how to separate the display from the core functionality.
@chanind do you have any idea why …
Will think about how to integrate …
new_pipeline = algorithm.run(pipeline, dataset=dataset)

# Skip testing outputs as they will be gibberish
You can try testing with larger_model instead of model - that's pythia 360m and seems to generate real answers, e.g. https://github.com/dtch1997/repepo/blob/main/tests/core/test_benchmark.py#L10. Alternatively, we could try overfitting on a few examples and just asserting that it does output the stuff we overfit on. For instance, we could overfit on a single wrong example, like "Paris is in" completed with "Germany", just to verify that it is training.
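A rough sketch of that kind of test (the Example fields, fixture names, and the Pipeline / SupervisedFineTuning interfaces below are assumptions for illustration, not the repo's actual API):

```python
from repepo.core.types import Example          # field names below are assumed
from repepo.core.pipeline import Pipeline      # assumed module path
from repepo.algorithms.sft import SupervisedFineTuning  # assumed class name


def test_sft_overfits_single_example(model, tokenizer) -> None:
    # Repeat one deliberately wrong example so even a tiny model can memorize it.
    dataset = [Example(instruction="", input="Paris is in", output="Germany")] * 32
    pipeline = Pipeline(model, tokenizer)
    algorithm = SupervisedFineTuning(num_epochs=20)
    new_pipeline = algorithm.run(pipeline, dataset=dataset)
    # If training is working at all, the wrong fact should now be reproduced.
    assert "Germany" in new_pipeline.generate("Paris is in")
```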
Okay, but not gonna do it immediately. Will create an issue in the backlog.
- return AutoTokenizer.from_pretrained("EleutherAI/pythia-70m")
+ return AutoTokenizer.from_pretrained(
+     "EleutherAI/pythia-70m",
+     model_max_length=128,  # Required to avoid overflow error in SFT
👍 this is a good call, I feel like this has come up in other places too
The test could be improved, but otherwise LGTM
repepo/algorithms/sft.py:
- WandbLogger instance
- run loop, though I think maybe this should be derived from the Benchmark and Evaluator classes
- make_dataset from a DatasetSpec, which allows using a custom split of the dataset.

Notes
- Modified the algorithm.run API to allow for logging and eval callbacks. Maybe we can just allow algorithm.run to accept arbitrary kwargs.
- Added to_dict and from_dict methods to Example and Completion in repepo.core.types.
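For reference, a minimal sketch of what those serialization helpers could look like, assuming Example and Completion are plain dataclasses (the field names are illustrative, not the actual repepo.core.types definitions):

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass
class Completion:
    prompt: str
    response: str

    def to_dict(self) -> Dict[str, Any]:
        return asdict(self)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "Completion":
        return cls(**data)


@dataclass
class Example:
    instruction: str
    input: str
    output: str

    def to_dict(self) -> Dict[str, Any]:
        return asdict(self)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "Example":
        return cls(**data)
```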