
Bump exllama to 0.1.17 #3847

Merged · 1 commit · Sep 11, 2023

Conversation

jllllll (Contributor) commented Sep 8, 2023

Checklist:

turboderp/exllama@f8e9d7e...8a1d330

Requires #3852 for proper functionality of CodeLlama models.

Ph0rk0z (Contributor) commented Sep 8, 2023

IMO it's needed because the behavior is now different. The 1e6 setting read from the YAML may cause the base to be applied twice.

My idea is to have alpha operate on the rope base read from the model, and to have the explicit base setting override it. Some of the CodeLlama tunes produce higher perplexity with a different base, depending on how they were trained. It's a difference of at least 2 points on PTB_NEW; you can check yourself.

Something like


        # Derive the rope base from alpha only when no explicit base is given
        if shared.args.alpha_value > 1 and shared.args.rope_freq_base == 0:
            config.alpha_value = shared.args.alpha_value
            config.calculate_rotary_embedding_base()
        # An explicit base setting overrides whatever was read from the model
        elif shared.args.rope_freq_base > 0:
            config.rotary_embedding_base = shared.args.rope_freq_base

Plus, the YAML has to be changed so it no longer sets the base.

Also, the next update will bring new kernels, so it might be good to wait. There is a PR open right now that will be merged soon.

jllllll marked this pull request as draft September 9, 2023 01:11
jllllll (Contributor, Author) commented Sep 9, 2023

@Ph0rk0z
This is tricky since the other loaders also use shared.args.rope_freq_base.
Do you know if the other back-ends use the rope_theta value from config.json?
If they don't, then removing the rope_freq_base value from config.yaml will disrupt the functionality of CodeLlama models with the other loaders. At minimum, it will cause issues with GGML versions of the model. I think that GGUF has the relevant values hard-coded into the model file.
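For reference, picking it up would just mean reading the key from config.json, something like this (a minimal sketch assuming a standard Hugging Face config layout; the helper name is illustrative):

    import json
    from pathlib import Path

    # Read rope_theta from a model's config.json, falling back to the
    # original Llama default of 10000.0; CodeLlama ships rope_theta = 1e6.
    def read_rope_theta(model_dir: str) -> float:
        with open(Path(model_dir) / "config.json") as f:
            return float(json.load(f).get("rope_theta", 10000.0))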

Ph0rk0z (Contributor) commented Sep 9, 2023

Hmm.. true, it is tricky, which is why I wasn't a fan of him reading the theta from the config. Now that I've slept on it, I think

        elif shared.args.rope_freq_base > 0:
            config.rotary_embedding_base = shared.args.rope_freq_base

Will solve it, since the 1e6 will just get applied without an alpha and match what was read from the config. The way the code is now, it reads 1e6 from theta, then also converts the 1e6 from the YAML into an alpha and applies it on top, for a total of 200k context. So it definitely needs to be fixed. People already dislike the 34b and its tunes due to poor performance caused by incorrect rope scaling.
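To illustrate the compounding (a rough sketch using exllama's NTK-style formula base' = base * alpha^(d / (d - 2)) with d = 128; the base-to-alpha helper is hypothetical):

    HEAD_DIM = 128  # Llama attention head dimension

    # exllama scales the base roughly like this in calculate_rotary_embedding_base()
    def scale_base(base: float, alpha: float) -> float:
        return base * alpha ** (HEAD_DIM / (HEAD_DIM - 2))

    # Hypothetical inverse: the alpha that would turn the 10000.0 default into `base`
    def base_to_alpha(base: float, reference: float = 10000.0) -> float:
        return (base / reference) ** ((HEAD_DIM - 2) / HEAD_DIM)

    theta_from_config = 1e6               # CodeLlama's rope_theta
    alpha_from_yaml = base_to_alpha(1e6)  # the YAML's 1e6 base as an alpha (~93)

    # Bug: that alpha is applied on top of a theta which already encodes the
    # same scaling, roughly squaring the intended effect.
    print(scale_base(theta_from_config, alpha_from_yaml))  # ~1e8 instead of 1e6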

It's merged though: turboderp/exllama#275

I'll have to try my code and do some perplexity tests to make sure it works.
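(A perplexity check here is just exp of the mean per-token negative log-likelihood over a test set like PTB_NEW; a rough sketch with a hypothetical model interface:)

    import math

    # ppl = exp(mean negative log-likelihood per token);
    # model.token_logprobs() is a hypothetical stand-in for the loader under test.
    def perplexity(model, tokens: list[int]) -> float:
        logprobs = model.token_logprobs(tokens)  # log p(token_i | tokens_<i)
        return math.exp(-sum(logprobs) / len(logprobs))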

Fixed all the typos: #3852

jllllll changed the title from Bump exllama to 0.1.16 to Bump exllama to 0.1.17 on Sep 9, 2023
jllllll marked this pull request as ready for review September 9, 2023 17:40
oobabooga merged commit 859b4fd into oobabooga:main Sep 11, 2023
jllllll deleted the patch-7 branch September 11, 2023 04:13