
Bump exllama to 0.1.17 #3847

Merged · 1 commit · Sep 11, 2023

Conversation

jllllll (Contributor) commented Sep 8, 2023

Checklist:

turboderp/exllama@f8e9d7e...8a1d330

Requires #3852 for proper functionality of CodeLlama models.

Ph0rk0z (Contributor) commented Sep 8, 2023

IMO it's needed because the behavior is now different. The 1e6 setting read from the YAML may cause the base to be applied twice.

My idea is to have alpha operate on the rope base read from the model, and to have the explicit base setting override it. Some of the CodeLlama tunes produce higher perplexity with a different base, depending on how they were trained. It's a difference of at least 2 points on PTB_NEW; you can check yourself.

Something like


        # Derive the rope base from alpha only when no explicit base is given
        if shared.args.alpha_value > 1 and shared.args.rope_freq_base == 0:
            config.alpha_value = shared.args.alpha_value
            config.calculate_rotary_embedding_base()
        # An explicit base setting overrides whatever was read from the model
        elif shared.args.rope_freq_base > 0:
            config.rotary_embedding_base = shared.args.rope_freq_base

Plus, the YAML has to be changed so it no longer sets the base.

Also, the next update will bring new kernels, so it might be good to wait. There is a PR open right now that will be merged soon.

jllllll marked this pull request as draft September 9, 2023 01:11
jllllll (Contributor, Author) commented Sep 9, 2023

@Ph0rk0z
This is tricky since the other loaders also use shared.args.rope_freq_base.
Do you know if the other back-ends use the rope_theta value from config.json?
If they don't, then removing the rope_freq_base value from config.yaml will disrupt the functionality of CodeLlama models with the other loaders. At minimum, it will cause issues with GGML versions of the model. I think that GGUF has the relevant values hard-coded into the model file.
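For reference, picking it up would just mean reading the key from config.json, something like this (a minimal sketch assuming a standard Hugging Face config layout; the helper name is illustrative):

    import json
    from pathlib import Path

    # Read rope_theta from a model's config.json, falling back to the
    # original Llama default of 10000.0; CodeLlama ships rope_theta = 1e6.
    def read_rope_theta(model_dir: str) -> float:
        with open(Path(model_dir) / "config.json") as f:
            return float(json.load(f).get("rope_theta", 10000.0))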

Ph0rk0z (Contributor) commented Sep 9, 2023

Hmm.. true, it is tricky, which is why I wasn't a fan of him reading the theta from the config. Now that I've slept on it, I think

        elif shared.args.rope_freq_base > 0:
            config.rotary_embedding_base = shared.args.rope_freq_base

Will solve it, since the 1e6 will just get applied without an alpha and match what was read from the config. The way the code is now, it reads 1e6 from theta, then also converts the 1e6 from the YAML into an alpha and applies it on top, for a total of 200k context. So it definitely needs to be fixed. People already dislike the 34b and its tunes due to poor performance caused by incorrect rope scaling.
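To illustrate the compounding (a rough sketch using exllama's NTK-style formula base' = base * alpha^(d / (d - 2)) with d = 128; the base-to-alpha helper is hypothetical):

    HEAD_DIM = 128  # Llama attention head dimension

    # exllama scales the base roughly like this in calculate_rotary_embedding_base()
    def scale_base(base: float, alpha: float) -> float:
        return base * alpha ** (HEAD_DIM / (HEAD_DIM - 2))

    # Hypothetical inverse: the alpha that would turn the 10000.0 default into `base`
    def base_to_alpha(base: float, reference: float = 10000.0) -> float:
        return (base / reference) ** ((HEAD_DIM - 2) / HEAD_DIM)

    theta_from_config = 1e6               # CodeLlama's rope_theta
    alpha_from_yaml = base_to_alpha(1e6)  # the YAML's 1e6 base as an alpha (~93)

    # Bug: that alpha is applied on top of a theta which already encodes the
    # same scaling, roughly squaring the intended effect.
    print(scale_base(theta_from_config, alpha_from_yaml))  # ~1e8 instead of 1e6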

It's merged though: turboderp/exllama#275

I'll have to try my code and do some perplexity tests to make sure it works.
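(A perplexity check here is just exp of the mean per-token negative log-likelihood over a test set like PTB_NEW; a rough sketch with a hypothetical model interface:)

    import math

    # ppl = exp(mean negative log-likelihood per token);
    # model.token_logprobs() is a hypothetical stand-in for the loader under test.
    def perplexity(model, tokens: list[int]) -> float:
        logprobs = model.token_logprobs(tokens)  # log p(token_i | tokens_<i)
        return math.exp(-sum(logprobs) / len(logprobs))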

Fixed all the typos: #3852

jllllll changed the title from Bump exllama to 0.1.16 to Bump exllama to 0.1.17 on Sep 9, 2023
jllllll marked this pull request as ready for review September 9, 2023 17:40
oobabooga merged commit 859b4fd into oobabooga:main Sep 11, 2023
jllllll deleted the patch-7 branch September 11, 2023 04:13