
[CICL] cache flash_attn #223 #215

Closed
wants to merge 37 commits

Conversation

reymondzzzz
Member

@reymondzzzz commented Nov 8, 2023

mitya52 and others added 30 commits November 1, 2023 10:02
* Print statements for debugging and initial support for Code Llama

* Added multiple print statements for debugging fine tuning
* Added support for Code Llama 7b
* Depending on the training parameters I set, I get either a GPU out-of-memory error or ValueError("optimizer got an empty parameter list")

* Code Llama fine-tuning but fails on checkpoint

* commenting print statements

* updating default config behavior

* Begin adding encoding for Code Llama

* adding BOS and EOS tokens for Code Llama, model running properly

* getting rid of #?

saving in safe_tensors format
TOKENIZERS_PARALLELISM=false while finetuning
add inference fixes for codellama
* add deepseek inference and finetuning

* no extra kwargs

* add deepseek-ai/deepseek-coder-5.7bmqa-base
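The "TOKENIZERS_PARALLELISM=false while finetuning" commit above refers to the environment variable the Hugging Face tokenizers library reads before forking worker processes. A minimal sketch of what that change amounts to; the helper function is hypothetical, and the exact set of "falsy" spellings the library accepts is an assumption:

```python
import os

# Set before the tokenizer is first used: Hugging Face tokenizers warns about
# (and disables) its Rust-level parallelism when the process later forks,
# e.g. in DataLoader workers during fine-tuning. Pinning it to "false"
# up front avoids the warning and potential deadlocks.
os.environ["TOKENIZERS_PARALLELISM"] = "false"

def tokenizer_parallelism_enabled() -> bool:
    # Hypothetical helper for illustration; the falsy spellings listed
    # here are an assumption, not the library's documented set.
    value = os.environ.get("TOKENIZERS_PARALLELISM", "true").lower()
    return value not in ("false", "0", "off", "no")

print(tokenizer_parallelism_enabled())  # -> False
```

The key detail is ordering: the variable must be set before the tokenizer is constructed, since the library reads it at initialization.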
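The "adding BOS and EOS tokens for Code Llama" commit suggests wrapping each encoded sequence in the model's special tokens so training examples have proper boundaries. A minimal sketch, assuming the standard Llama-family token ids (BOS = 1, EOS = 2); the function name and the sample token ids are hypothetical:

```python
# Llama-family tokenizers conventionally use <s> (id 1) as BOS and </s>
# (id 2) as EOS; treat these ids as an assumption for illustration.
BOS_ID = 1
EOS_ID = 2

def add_special_tokens(token_ids: list[int]) -> list[int]:
    """Wrap an encoded sequence with BOS/EOS markers."""
    return [BOS_ID, *token_ids, EOS_ID]

print(add_special_tokens([319, 263, 1243]))  # -> [1, 319, 263, 1243, 2]
```

Getting this wrapping wrong (e.g. a missing EOS between concatenated training samples) is a common source of the kind of fine-tuning failures described in the commit messages above.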
mitya52 and others added 6 commits November 6, 2023 15:48
@reymondzzzz force-pushed the v1.2.0-cicl branch 10 times, most recently from 5586414 to 354f962, on November 8, 2023 18:59
Revert "add base"

This reverts commit af0d05a.
@reymondzzzz changed the title from "add base" to "[CICL] cache flash_attn #223" Nov 9, 2023
@mitya52 changed the base branch from v1.2.0 to dev November 17, 2023 15:18
@mitya52 changed the base branch from dev to main January 3, 2024 16:13
@mitya52 changed the base branch from main to dev January 3, 2024 16:17
@mitya52 closed this Jan 16, 2024
@mitya52 deleted the v1.2.0-cicl branch January 16, 2024 09:15
6 participants