Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Min P as a sampler option in HF loaders #4449

Merged
merged 4 commits into from
Nov 2, 2023

Conversation

kalomaze
Copy link
Contributor

@kalomaze kalomaze commented Nov 2, 2023

Checklist:

A backport of the Min P sampler option that was merged into llama.cpp recently. I think this is a suitable alternative to accomodate for the flaws of both Top P (Top P doesn't work well if there's not good concentration of high probability for top tokens) and Top K (can limit diversity unnecessarily), and it has gained a very positive reception.

image

@kalomaze
Copy link
Contributor Author

kalomaze commented Nov 2, 2023

Technically the min_tokens_to_keep is not being properly followed here, even if that option usually goes unused, which probably needs to be changed before it gets merged.

@kalomaze
Copy link
Contributor Author

kalomaze commented Nov 2, 2023

Should be good to merge now.

EDIT: Somehow, Temperature is affecting how this chooses from the lists of considered tokens, but in llama.cpp, this would typically be measured and ran before Temperature which comes last, so the fact it's impacting the measurement means the HF samplers are coming after temp randomization, I believe.
That is potentially worrying since I assumed the standard was to apply temperature last? Though this might just be a design decision that hasn't been discussed. I know in koboldcpp you can customize a sampler order. Do we have that at all?

@kalomaze kalomaze changed the title Implement Min P as a sampler option in HF sampler hijack Implement Min P as a sampler option in HF loaders Nov 2, 2023
@oobabooga
Copy link
Owner

I was about to add min p manually since it looks cool. Thanks for the PR.

About the order, temperature does come before top p/top k in the transformers library:

https://github.com/huggingface/transformers/blob/acc394c4f5e1283c19783581790b3dc3105a3697/src/transformers/generation/utils.py#L825

I couldn't find any information as to how OpenAI does it.

Personally, I think that it makes sense to use temperature first.

@kalomaze
Copy link
Contributor Author

kalomaze commented Nov 2, 2023

I was about to add min p manually since it looks cool. Thanks for the PR.

About the order, temperature does come before top p/top k in the transformers library:

https://github.com/huggingface/transformers/blob/acc394c4f5e1283c19783581790b3dc3105a3697/src/transformers/generation/utils.py#L825

I couldn't find any information as to how OpenAI does it.

Personally, I think that it makes sense to use temperature first.

I can see arguments for both applying it last like how it is done in llama.cpp, and applying it like how it's done 'officially', so I think having this configurable / customizable to the end user would be helpful. Is that possible without changing how the transformers library functions?

Btw this is how it looks in GPT2, temp comes first there:

image

@kalomaze
Copy link
Contributor Author

kalomaze commented Nov 2, 2023

Changing how the sampler order logic works is out of scope for this PR though, so I'm ready for this to be merged in the meantime

@oobabooga oobabooga merged commit 367e5e6 into oobabooga:dev Nov 2, 2023
@BadisG
Copy link
Contributor

BadisG commented Nov 3, 2023

I think you should put the default min_p at 0 instead of 1, having it at 1 makes everything else useless because it's the equivalent to top_k = 1

@kalomaze
Copy link
Contributor Author

kalomaze commented Nov 3, 2023

I think you should put the default min_p at 0 instead of 1, having it at 1 makes everything else useless because it's the equivalent to top_k = 1

Is it not 0 by default? I thought I set it to that

@oobabooga
Copy link
Owner

Should be fixed now 4537853

@BadisG
Copy link
Contributor

BadisG commented Nov 3, 2023

It's still at min_p = 1 by default for me

@oobabooga
Copy link
Owner

I messed up again, should be good after 7f9c1cb

@BadisG
Copy link
Contributor

BadisG commented Nov 3, 2023

All right it's working now, thanks!

@BadisG
Copy link
Contributor

BadisG commented Nov 3, 2023

I've also noticed that min_p and presence_penalty don't mix well, from time to time my generation stops abruptly because min_p removes some tokens and presence_penalty then removes the remaining tokens, leaving nothing to go on. I think this can be fixed by reversing the order of the samplers, presence_penalty should be applied before min_p. Maybe we should have the choice of sampler order, that would be a good feature to add, a bit like KoboldAI actually.

Vechtomov pushed a commit to Vechtomov/text-generation-webui that referenced this pull request Nov 9, 2023
commit e18a046
Author: kabachuha <artemkhrapov2001@yandex.ru>
Date:   Sat Nov 4 22:12:51 2023 +0300

    fix openai extension not working because of absent new defaults (oobabooga#4477)

commit b7a409e
Merge: b5c5304 fb3bd02
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Sat Nov 4 15:04:43 2023 -0300

    Merge pull request oobabooga#4476 from oobabooga/dev

    Merge dev branch

commit fb3bd02
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Sat Nov 4 11:02:24 2023 -0700

    Update docs

commit 1d8c7c1
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Sat Nov 4 11:01:15 2023 -0700

    Update docs

commit b5c5304
Merge: 262f8ae 40f7f37
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Sat Nov 4 14:19:55 2023 -0300

    Merge pull request oobabooga#4475 from oobabooga/dev

    Merge dev branch

commit 40f7f37
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Sat Nov 4 10:12:06 2023 -0700

    Update requirements

commit 2081f43
Author: Orang <51061118+Soefati@users.noreply.github.com>
Date:   Sun Nov 5 00:00:24 2023 +0700

    Bump transformers to 4.35.* (oobabooga#4474)

commit 4766a57
Author: feng lui <3090641@qq.com>
Date:   Sun Nov 5 00:59:33 2023 +0800

    transformers: add use_flash_attention_2 option (oobabooga#4373)

commit add3593
Author: wouter van der plas <2423856+wvanderp@users.noreply.github.com>
Date:   Sat Nov 4 17:41:42 2023 +0100

    fixed two links in the ui (oobabooga#4452)

commit cfbd108
Author: Casper <casperbh.96@gmail.com>
Date:   Sat Nov 4 17:09:41 2023 +0100

    Bump AWQ to 0.1.6 (oobabooga#4470)

commit aa5d671
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Sat Nov 4 13:09:07 2023 -0300

    Add temperature_last parameter (oobabooga#4472)

commit 1ab8700
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Fri Nov 3 17:38:19 2023 -0700

    Change frequency/presence penalty ranges

commit 45fcb60
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Fri Nov 3 11:29:31 2023 -0700

    Make truncation_length_max apply to max_seq_len/n_ctx

commit 7f9c1cb
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Fri Nov 3 08:25:22 2023 -0700

    Change min_p default to 0.0

commit 4537853
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Fri Nov 3 08:13:50 2023 -0700

    Change min_p default to 1.0

commit 367e5e6
Author: kalomaze <66376113+kalomaze@users.noreply.github.com>
Date:   Thu Nov 2 14:32:51 2023 -0500

    Implement Min P as a sampler option in HF loaders (oobabooga#4449)

commit fcb7017
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Thu Nov 2 12:24:09 2023 -0700

    Remove a checkbox

commit fdcaa95
Author: Julien Chaumond <julien@huggingface.co>
Date:   Thu Nov 2 20:20:54 2023 +0100

    transformers: Add a flag to force load from safetensors (oobabooga#4450)

commit c065547
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Thu Nov 2 11:23:04 2023 -0700

    Add cache_8bit option

commit 42f8163
Merge: 77abd9b a56ef2a
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Thu Nov 2 11:09:26 2023 -0700

    Merge remote-tracking branch 'refs/remotes/origin/dev' into dev

commit 77abd9b
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Thu Nov 2 08:19:42 2023 -0700

    Add no_flash_attn option

commit a56ef2a
Author: Julien Chaumond <julien@huggingface.co>
Date:   Thu Nov 2 18:07:08 2023 +0100

    make torch.load a bit safer (oobabooga#4448)

commit deba039
Author: deevis <darren.hicks@gmail.com>
Date:   Tue Oct 31 22:51:00 2023 -0600

    (fix): OpenOrca-Platypus2 models should use correct instruction_template and custom_stopping_strings (oobabooga#4435)

commit aaf726d
Author: Mehran Ziadloo <mehranziadloo@gmail.com>
Date:   Tue Oct 31 21:29:57 2023 -0700

    Updating the shared settings object when loading a model (oobabooga#4425)

commit 9bd0724
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Tue Oct 31 20:57:56 2023 -0700

    Change frequency/presence penalty ranges

commit 6b7fa45
Author: Orang <51061118+Soefati@users.noreply.github.com>
Date:   Wed Nov 1 05:12:14 2023 +0700

    Update exllamav2 version (oobabooga#4417)

commit 41e159e
Author: Casper <casperbh.96@gmail.com>
Date:   Tue Oct 31 23:11:22 2023 +0100

    Bump AutoAWQ to v0.1.5 (oobabooga#4410)

commit 0707ed7
Author: Meheret <101792782+senadev42@users.noreply.github.com>
Date:   Wed Nov 1 01:09:05 2023 +0300

    updated wiki link (oobabooga#4415)

commit 262f8ae
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Fri Oct 27 06:49:14 2023 -0700

    Use default gr.Dataframe for evaluation table

commit f481ce3
Author: James Braza <jamesbraza@gmail.com>
Date:   Thu Oct 26 21:02:28 2023 -0700

    Adding `platform_system` to `autoawq` (oobabooga#4390)

commit af98587
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Fri Oct 27 00:46:16 2023 -0300

    Update accelerate requirement from ==0.23.* to ==0.24.* (oobabooga#4400)

commit 839a87b
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Thu Oct 26 20:26:25 2023 -0700

    Fix is_ccl_available & is_xpu_available imports

commit 778a010
Author: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
Date:   Fri Oct 27 08:09:51 2023 +0530

    Intel Gpu support initialization (oobabooga#4340)

commit 317e2c8
Author: GuizzyQC <86683381+GuizzyQC@users.noreply.github.com>
Date:   Thu Oct 26 22:03:21 2023 -0400

    sd_api_pictures: fix Gradio warning message regarding custom value (oobabooga#4391)

commit 92b2f57
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Thu Oct 26 18:57:32 2023 -0700

    Minor metadata bug fix (second attempt)

commit 2d97897
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Wed Oct 25 11:21:18 2023 -0700

    Don't install flash-attention on windows + cuda 11

commit 0ced78f
Author: LightningDragon <lightningdragon96@gmail.com>
Date:   Wed Oct 25 09:15:34 2023 -0600

    Replace hashlib.sha256 with hashlib.file_digest so we don't need to load entire files into ram before hashing them. (oobabooga#4383)

commit 72f6fc6
Author: tdrussell <6509934+tdrussell@users.noreply.github.com>
Date:   Wed Oct 25 10:10:28 2023 -0500

    Rename additive_repetition_penalty to presence_penalty, add frequency_penalty (oobabooga#4376)

commit ef1489c
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Mon Oct 23 20:45:43 2023 -0700

    Remove unused parameter in AutoAWQ

commit 1edf321
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Mon Oct 23 13:09:03 2023 -0700

    Lint

commit 280ae72
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Mon Oct 23 13:07:17 2023 -0700

    Organize

commit 49e5eec
Merge: 82c11be 4bc4113
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Mon Oct 23 12:54:05 2023 -0700

    Merge remote-tracking branch 'refs/remotes/origin/main'

commit 82c11be
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Mon Oct 23 12:49:07 2023 -0700

    Update 04 - Model Tab.md

commit 306d764
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Mon Oct 23 12:46:24 2023 -0700

    Minor metadata bug fix

commit 4bc4113
Author: adrianfiedler <adrian_fiedler@msn.com>
Date:   Mon Oct 23 19:09:57 2023 +0200

    Fix broken links (oobabooga#4367)

    ---------

    Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>

commit 92691ee
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Mon Oct 23 09:57:44 2023 -0700

    Disable trust_remote_code by default
@kalomaze
Copy link
Contributor Author

kalomaze commented Nov 11, 2023

I've also noticed that min_p and presence_penalty don't mix well, from time to time my generation stops abruptly because min_p removes some tokens and presence_penalty then removes the remaining tokens, leaving nothing to go on. I think this can be fixed by reversing the order of the samplers, presence_penalty should be applied before min_p. Maybe we should have the choice of sampler order, that would be a good feature to add, a bit like KoboldAI actually.

Presence penalty... removes the tokens? I thought the rep pen (and associated variations of it) are biasing the logits and not actually truncating them, that's weird. And in koboldcpp rep pen comes first, so it being applied after the other truncation samplers is also foreign to me. I think the default kobold sampler order should be adapted + customizable

@ariez-xyz
Copy link

As far as I can tell, minP is missing from the generation parameters panel when llama.cpp is selected as a loader, though it works with llamacpp_HF. Could it be added to the llama.cpp settings as well?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants