Validation #1033

XiaohanZhangCMU · 2024-03-14T20:19:39Z

No description provided.

… finetuning (mosaicml#985)

The main purpose of this PR is to support training on non-terminal responses in multi-round chats. This is achieved by tokenizing at the level of conversation "turns" and exposing some options for what turns are used as training targets (i.e. generate loss). This also adds support for treating prompt tokens as loss-generating. The script for converting a finetuning dataset to streaming has also been updated (with some bug fixes).
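The turn-level idea described above can be illustrated with a minimal sketch. It is not the code from mosaicml#985: the function names, the option names `target_responses` and `target_prompts`, their allowed values, and the toy tokenizer are all assumptions made for illustration. The only outside convention relied on is that a label of -100 is the default ignored index for PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # label value skipped by cross-entropy losses that ignore -100


def toy_tokenize(text):
    """Hypothetical stand-in tokenizer: one fake token id (the word length) per word."""
    return [len(word) for word in text.split()]


def build_example(turns, target_responses="last", target_prompts="none"):
    """Tokenize a chat turn by turn and mark which tokens generate loss.

    turns: list of {"prompt": str, "response": str} dicts, in conversation order.
    target_responses: "last" trains only on the final response, "all" on every response.
    target_prompts: "none" masks prompt tokens, "all" treats them as loss-generating.
    (Both option names and their values are illustrative assumptions.)
    """
    if target_responses not in ("last", "all"):
        raise ValueError(f"unknown target_responses: {target_responses!r}")
    if target_prompts not in ("none", "all"):
        raise ValueError(f"unknown target_prompts: {target_prompts!r}")

    input_ids, labels = [], []
    last_turn = len(turns) - 1
    for i, turn in enumerate(turns):
        prompt_ids = toy_tokenize(turn["prompt"])
        response_ids = toy_tokenize(turn["response"])

        train_on_prompt = target_prompts == "all"
        train_on_response = target_responses == "all" or i == last_turn

        # Every token is fed to the model; only unmasked labels contribute to the loss.
        input_ids += prompt_ids + response_ids
        labels += prompt_ids if train_on_prompt else [IGNORE_INDEX] * len(prompt_ids)
        labels += response_ids if train_on_response else [IGNORE_INDEX] * len(response_ids)
    return input_ids, labels
```

With `target_responses="last"` only the final response carries labels, which corresponds to training on terminal responses only; `"all"` additionally unmasks the earlier, non-terminal responses.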

xiaohanzhan-db and others added 30 commits December 22, 2023 22:18

add validation script

8cb6522

update

c59c11f

change token count function

66f34eb

reorganize cells

2cd387b

Add unit tests

3eac3bf

Add a printout for CPT

d2d9767

update question

be25591

Add questions

4651be7

Fix lints

5cd6a94

Merge branch 'main' into validation

8e2c1f4

update format

e6e4a81

Merge branch 'validation' of github.com:XiaohanZhangCMU/llm-foundryX into validation

34c5690

update

1668b9a

nb source

2219135

add validation script

86c6e87

update

678b376

change token count function

297e057

reorganize cells

09d0ebb

Add unit tests

460df65

Add a printout for CPT

3ffd200

update question

9362886

Add questions

898e5ac

Fix lints

a4bef71

update format

4ca9cc6

update

d636a0f

nb source

827d155

Remove license insert for validation notebook

6bbf3fc

Merge branch 'validation' of github.com:XiaohanZhangCMU/llm-foundryX into validation

4f6a4fb

Add validation utils

5966b68

Merge branch 'main' into validation

da17813

dakinggg and others added 28 commits March 5, 2024 11:06

Build torch 2.2.1 images (mosaicml#1010)

fd8cbaf

add 2.2.1 tests (mosaicml#1011)

5728969

Bump min torch pin (mosaicml#1013)

f4f6414

Red button because CI running jobs it doesn't need. Tests passed on main.

Fix extra BOS token in front of response for some tokenizers (mosaicml#1003)

cf0f5e5

Bump min composer pin (mosaicml#1015)

86c8746

add default for eval interval (mosaicml#987)

5261a55

Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com>

Add support for olmo (mosaicml#1016)

93d7a05

Fix profiling packing ratio to explicitly say 1 (mosaicml#1019)

c2aec30

Bump transformers to 4.38.2 (mosaicml#1018)

2b17497

that kwargs (mosaicml#1020)

36ab1ba

Update readme with pytorch 2.2.1 (mosaicml#1021)

2fc5d33

Add code import to train/eval scripts (mosaicml#1002)

d61c53d

finish (mosaicml#1022)

4e43792

Co-authored-by: Max Marion <mmarion538@gmail.com>

Bump version to 0.6.0 (mosaicml#1023)

257c25d

Fix typo in monolithic chkpt callback docs (mosaicml#1024)

4e8a875

* Fix typo in monolithic chkpt callback docs
* reorder to match function signature

update pip install link

1a510ff

Change done file location

530a55a

Create the dest folder

81c3757

Allow code-quality workflow to be callable (mosaicml#1026)

14e2dec

Reverts part of the change made in https://github.com/mosaicml/llm-foundry/pull/1000/files#diff-4a2765c2cfcbd3804a66aab805cb92ddda74de1730923cc5bf53671d0beccf06L11

update notebook

f88917d

update

4c86f74

Merge branch 'byod/data_validation' into validation

962974b

Merge pull request #1 from mosaicml/byod/data_validation

67f7b4c

Validation (mosaicml#1027)

update notebook

28cd2e6

fix conflict

9a19d8a

update token_counts

de90934

update pip install list

61adb43

XiaohanZhangCMU requested a review from a team as a code owner March 14, 2024 20:19

XiaohanZhangCMU merged commit c404dc7 into mosaicml:byod/data_validation Mar 14, 2024
0 of 2 checks passed
