
[CICL] cache flash_attn #223 #215

Closed
wants to merge 37 commits

Conversation

reymondzzzz
Member

@reymondzzzz commented Nov 8, 2023

mitya52 and others added 30 commits November 1, 2023 10:02
* Print statements for debugging and initial support for Code Llama

* Added multiple print statements for debugging fine tuning
* Added support for Code Llama 7b
* Depending on the training parameters I set, I get either a GPU out-of-memory error or ValueError("optimizer got an empty parameter list")

* Code Llama fine-tuning but fails on checkpoint

* commenting print statements

* updating default config behavior

* Begin adding encoding for Code Llama

* adding BOS and EOS tokens for Code Llama, model running properly

* getting rid of #?

saving in safe_tensors format
TOKENIZERS_PARALLELISM=false while finetuning
add inference fixes for codellama
* add deepseek inference and finetuning

* no extra kwargs

* add deepseek-ai/deepseek-coder-5.7bmqa-base
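The "TOKENIZERS_PARALLELISM=false while finetuning" commit above refers to the environment variable the Hugging Face tokenizers library reads before forking worker processes. A minimal sketch of what that change amounts to; the helper function is hypothetical, and the exact set of "falsy" spellings the library accepts is an assumption:

```python
import os

# Set before the tokenizer is first used: Hugging Face tokenizers warns about
# (and disables) its Rust-level parallelism when the process later forks,
# e.g. in DataLoader workers during fine-tuning. Pinning it to "false"
# up front avoids the warning and potential deadlocks.
os.environ["TOKENIZERS_PARALLELISM"] = "false"

def tokenizer_parallelism_enabled() -> bool:
    # Hypothetical helper for illustration; the falsy spellings listed
    # here are an assumption, not the library's documented set.
    value = os.environ.get("TOKENIZERS_PARALLELISM", "true").lower()
    return value not in ("false", "0", "off", "no")

print(tokenizer_parallelism_enabled())  # -> False
```

The key detail is ordering: the variable must be set before the tokenizer is constructed, since the library reads it at initialization.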
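The "adding BOS and EOS tokens for Code Llama" commit suggests wrapping each encoded sequence in the model's special tokens so training examples have proper boundaries. A minimal sketch, assuming the standard Llama-family token ids (BOS = 1, EOS = 2); the function name and the sample token ids are hypothetical:

```python
# Llama-family tokenizers conventionally use <s> (id 1) as BOS and </s>
# (id 2) as EOS; treat these ids as an assumption for illustration.
BOS_ID = 1
EOS_ID = 2

def add_special_tokens(token_ids: list[int]) -> list[int]:
    """Wrap an encoded sequence with BOS/EOS markers."""
    return [BOS_ID, *token_ids, EOS_ID]

print(add_special_tokens([319, 263, 1243]))  # -> [1, 319, 263, 1243, 2]
```

Getting this wrapping wrong (e.g. a missing EOS between concatenated training samples) is a common source of the kind of fine-tuning failures described in the commit messages above.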
mitya52 and others added 6 commits November 6, 2023 15:48
@reymondzzzz force-pushed the v1.2.0-cicl branch 10 times, most recently from 5586414 to 354f962, on November 8, 2023 18:59
Revert "add base"

This reverts commit af0d05a.
@reymondzzzz changed the title from "add base" to "[CICL] cache flash_attn #223" Nov 9, 2023
@mitya52 changed the base branch from v1.2.0 to dev November 17, 2023 15:18
@mitya52 changed the base branch from dev to main January 3, 2024 16:13
@mitya52 changed the base branch from main to dev January 3, 2024 16:17
@mitya52 closed this Jan 16, 2024
@mitya52 deleted the v1.2.0-cicl branch January 16, 2024 09:15
6 participants