Squashed commit of the following:
commit e18a046
Author: kabachuha <artemkhrapov2001@yandex.ru>
Date:   Sat Nov 4 22:12:51 2023 +0300

    fix openai extension not working because of absent new defaults (oobabooga#4477)

commit b7a409e
Merge: b5c5304 fb3bd02
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Sat Nov 4 15:04:43 2023 -0300

    Merge pull request oobabooga#4476 from oobabooga/dev

    Merge dev branch

commit fb3bd02
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Sat Nov 4 11:02:24 2023 -0700

    Update docs

commit 1d8c7c1
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Sat Nov 4 11:01:15 2023 -0700

    Update docs

commit b5c5304
Merge: 262f8ae 40f7f37
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Sat Nov 4 14:19:55 2023 -0300

    Merge pull request oobabooga#4475 from oobabooga/dev

    Merge dev branch

commit 40f7f37
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Sat Nov 4 10:12:06 2023 -0700

    Update requirements

commit 2081f43
Author: Orang <51061118+Soefati@users.noreply.github.com>
Date:   Sun Nov 5 00:00:24 2023 +0700

    Bump transformers to 4.35.* (oobabooga#4474)

commit 4766a57
Author: feng lui <3090641@qq.com>
Date:   Sun Nov 5 00:59:33 2023 +0800

    transformers: add use_flash_attention_2 option (oobabooga#4373)

commit add3593
Author: wouter van der plas <2423856+wvanderp@users.noreply.github.com>
Date:   Sat Nov 4 17:41:42 2023 +0100

    fixed two links in the ui (oobabooga#4452)

commit cfbd108
Author: Casper <casperbh.96@gmail.com>
Date:   Sat Nov 4 17:09:41 2023 +0100

    Bump AWQ to 0.1.6 (oobabooga#4470)

commit aa5d671
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Sat Nov 4 13:09:07 2023 -0300

    Add temperature_last parameter (oobabooga#4472)

commit 1ab8700
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Fri Nov 3 17:38:19 2023 -0700

    Change frequency/presence penalty ranges

commit 45fcb60
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Fri Nov 3 11:29:31 2023 -0700

    Make truncation_length_max apply to max_seq_len/n_ctx

commit 7f9c1cb
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Fri Nov 3 08:25:22 2023 -0700

    Change min_p default to 0.0

commit 4537853
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Fri Nov 3 08:13:50 2023 -0700

    Change min_p default to 1.0

commit 367e5e6
Author: kalomaze <66376113+kalomaze@users.noreply.github.com>
Date:   Thu Nov 2 14:32:51 2023 -0500

    Implement Min P as a sampler option in HF loaders (oobabooga#4449)

commit fcb7017
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Thu Nov 2 12:24:09 2023 -0700

    Remove a checkbox

commit fdcaa95
Author: Julien Chaumond <julien@huggingface.co>
Date:   Thu Nov 2 20:20:54 2023 +0100

    transformers: Add a flag to force load from safetensors (oobabooga#4450)

commit c065547
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Thu Nov 2 11:23:04 2023 -0700

    Add cache_8bit option

commit 42f8163
Merge: 77abd9b a56ef2a
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Thu Nov 2 11:09:26 2023 -0700

    Merge remote-tracking branch 'refs/remotes/origin/dev' into dev

commit 77abd9b
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Thu Nov 2 08:19:42 2023 -0700

    Add no_flash_attn option

commit a56ef2a
Author: Julien Chaumond <julien@huggingface.co>
Date:   Thu Nov 2 18:07:08 2023 +0100

    make torch.load a bit safer (oobabooga#4448)

commit deba039
Author: deevis <darren.hicks@gmail.com>
Date:   Tue Oct 31 22:51:00 2023 -0600

    (fix): OpenOrca-Platypus2 models should use correct instruction_template and custom_stopping_strings (oobabooga#4435)

commit aaf726d
Author: Mehran Ziadloo <mehranziadloo@gmail.com>
Date:   Tue Oct 31 21:29:57 2023 -0700

    Updating the shared settings object when loading a model (oobabooga#4425)

commit 9bd0724
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Tue Oct 31 20:57:56 2023 -0700

    Change frequency/presence penalty ranges

commit 6b7fa45
Author: Orang <51061118+Soefati@users.noreply.github.com>
Date:   Wed Nov 1 05:12:14 2023 +0700

    Update exllamav2 version (oobabooga#4417)

commit 41e159e
Author: Casper <casperbh.96@gmail.com>
Date:   Tue Oct 31 23:11:22 2023 +0100

    Bump AutoAWQ to v0.1.5 (oobabooga#4410)

commit 0707ed7
Author: Meheret <101792782+senadev42@users.noreply.github.com>
Date:   Wed Nov 1 01:09:05 2023 +0300

    updated wiki link (oobabooga#4415)

commit 262f8ae
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Fri Oct 27 06:49:14 2023 -0700

    Use default gr.Dataframe for evaluation table

commit f481ce3
Author: James Braza <jamesbraza@gmail.com>
Date:   Thu Oct 26 21:02:28 2023 -0700

    Adding `platform_system` to `autoawq` (oobabooga#4390)

commit af98587
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Fri Oct 27 00:46:16 2023 -0300

    Update accelerate requirement from ==0.23.* to ==0.24.* (oobabooga#4400)

commit 839a87b
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Thu Oct 26 20:26:25 2023 -0700

    Fix is_ccl_available & is_xpu_available imports

commit 778a010
Author: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
Date:   Fri Oct 27 08:09:51 2023 +0530

    Intel Gpu support initialization (oobabooga#4340)

commit 317e2c8
Author: GuizzyQC <86683381+GuizzyQC@users.noreply.github.com>
Date:   Thu Oct 26 22:03:21 2023 -0400

    sd_api_pictures: fix Gradio warning message regarding custom value (oobabooga#4391)

commit 92b2f57
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Thu Oct 26 18:57:32 2023 -0700

    Minor metadata bug fix (second attempt)

commit 2d97897
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Wed Oct 25 11:21:18 2023 -0700

    Don't install flash-attention on windows + cuda 11

commit 0ced78f
Author: LightningDragon <lightningdragon96@gmail.com>
Date:   Wed Oct 25 09:15:34 2023 -0600

    Replace hashlib.sha256 with hashlib.file_digest so we don't need to load entire files into ram before hashing them. (oobabooga#4383)

commit 72f6fc6
Author: tdrussell <6509934+tdrussell@users.noreply.github.com>
Date:   Wed Oct 25 10:10:28 2023 -0500

    Rename additive_repetition_penalty to presence_penalty, add frequency_penalty (oobabooga#4376)

commit ef1489c
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Mon Oct 23 20:45:43 2023 -0700

    Remove unused parameter in AutoAWQ

commit 1edf321
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Mon Oct 23 13:09:03 2023 -0700

    Lint

commit 280ae72
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Mon Oct 23 13:07:17 2023 -0700

    Organize

commit 49e5eec
Merge: 82c11be 4bc4113
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Mon Oct 23 12:54:05 2023 -0700

    Merge remote-tracking branch 'refs/remotes/origin/main'

commit 82c11be
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Mon Oct 23 12:49:07 2023 -0700

    Update 04 - Model Tab.md

commit 306d764
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Mon Oct 23 12:46:24 2023 -0700

    Minor metadata bug fix

commit 4bc4113
Author: adrianfiedler <adrian_fiedler@msn.com>
Date:   Mon Oct 23 19:09:57 2023 +0200

    Fix broken links (oobabooga#4367)

    ---------

    Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>

commit 92691ee
Author: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date:   Mon Oct 23 09:57:44 2023 -0700

    Disable trust_remote_code by default
Begelit committed Nov 6, 2023
1 parent bb59dc3 commit 2273473
Showing 45 changed files with 384 additions and 174 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -26,6 +26,7 @@
.DS_Store
.eslintrc.js
.idea
.env
.venv
venv
*.bak
11 changes: 7 additions & 4 deletions README.md
@@ -18,8 +18,8 @@
* 4-bit, 8-bit, and CPU inference through the transformers library
* Use llama.cpp models with transformers samplers (`llamacpp_HF` loader)
* [Multimodal pipelines, including LLaVA and MiniGPT-4](https://github.com/oobabooga/text-generation-webui/tree/main/extensions/multimodal)
* [Extensions framework](docs/Extensions.md)
* [Custom chat characters](docs/Chat-mode.md)
* [Extensions framework](https://github.com/oobabooga/text-generation-webui/wiki/07-%E2%80%90-Extensions)
* [Custom chat characters](https://github.com/oobabooga/text-generation-webui/wiki/03-%E2%80%90-Parameters-Tab#character)
* Very efficient text streaming
* Markdown output with LaTeX rendering, to use for instance with [GALACTICA](https://github.com/paperswithcode/galai)
* API, including endpoints for websocket streaming ([see the examples](https://github.com/oobabooga/text-generation-webui/blob/main/api-examples))
@@ -60,7 +60,7 @@
#### Other info

* There is no need to run any of those scripts as admin/root.
* For additional instructions about AMD setup, WSL setup, and nvcc installation, consult [this page](https://github.com/oobabooga/text-generation-webui/blob/main/docs/One-Click-Installers.md).
* For additional instructions about AMD setup, WSL setup, and nvcc installation, consult [the documentation](https://github.com/oobabooga/text-generation-webui/wiki).
* The installer has been tested mostly on NVIDIA GPUs. If you can find a way to improve it for your AMD/Intel Arc/Mac Metal GPU, you are highly encouraged to submit a PR to this repository. The main file to be edited is `one_click.py`.
* For automated installation, you can use the `GPU_CHOICE`, `USE_CUDA118`, `LAUNCH_AFTER_INSTALL`, and `INSTALL_EXTENSIONS` environment variables. For instance: `GPU_CHOICE=A USE_CUDA118=FALSE LAUNCH_AFTER_INSTALL=FALSE INSTALL_EXTENSIONS=FALSE ./start_linux.sh`.

@@ -170,7 +170,7 @@ cp docker/.env.example .env
docker compose up --build
```

* You need to have docker compose v2.17 or higher installed. See [this guide](https://github.com/oobabooga/text-generation-webui/blob/main/docs/Docker.md) for instructions.
* You need to have docker compose v2.17 or higher installed. See [this guide](https://github.com/oobabooga/text-generation-webui/wiki/09-%E2%80%90-Docker) for instructions.
* For additional docker files, check out [this repository](https://github.com/Atinoda/text-generation-webui-docker).

### Updating the requirements
@@ -300,6 +300,7 @@ Optionally, you can use the following command-line flags:
| `--sdp-attention` | Use PyTorch 2.0's SDP attention. Same as above. |
| `--trust-remote-code` | Set `trust_remote_code=True` while loading the model. Necessary for some models. |
| `--use_fast` | Set `use_fast=True` while loading the tokenizer. |
| `--use_flash_attention_2` | Set `use_flash_attention_2=True` while loading the model. |

#### Accelerate 4-bit

@@ -336,6 +337,8 @@ Optionally, you can use the following command-line flags:
|`--gpu-split` | Comma-separated list of VRAM (in GB) to use per GPU device for model layers. Example: 20,7,7. |
|`--max_seq_len MAX_SEQ_LEN` | Maximum sequence length. |
|`--cfg-cache` | ExLlama_HF: Create an additional cache for CFG negative prompts. Necessary to use CFG with that loader, but not necessary for CFG with base ExLlama. |
|`--no_flash_attn` | Force flash-attention to not be used. |
|`--cache_8bit` | Use 8-bit cache to save VRAM. |

#### AutoGPTQ

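For context on the flags added above (`--use_flash_attention_2` for the transformers loader, `--no_flash_attn` and `--cache_8bit` for ExLlamav2), here is a minimal sketch of what the first one roughly corresponds to at the transformers level. The model name and dtype are placeholders, not values from this commit, and newer transformers releases expose the same switch as `attn_implementation="flash_attention_2"`:

```python
# Rough equivalent of launching with --use_flash_attention_2 (transformers 4.34/4.35 era).
# Model name and dtype are placeholders for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    use_flash_attention_2=True,  # requires the flash-attn package and a supported GPU
)
```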
3 changes: 2 additions & 1 deletion api-examples/api-example-chat-stream.py
@@ -52,7 +52,8 @@ async def run(user_input, history):
'tfs': 1,
'top_a': 0,
'repetition_penalty': 1.18,
'additive_repetition_penalty': 0,
'presence_penalty': 0,
'frequency_penalty': 0,
'repetition_penalty_range': 0,
'top_k': 40,
'min_length': 0,
3 changes: 2 additions & 1 deletion api-examples/api-example-chat.py
@@ -46,7 +46,8 @@ def run(user_input, history):
'tfs': 1,
'top_a': 0,
'repetition_penalty': 1.18,
'additive_repetition_penalty': 0,
'presence_penalty': 0,
'frequency_penalty': 0,
'repetition_penalty_range': 0,
'top_k': 40,
'min_length': 0,
3 changes: 2 additions & 1 deletion api-examples/api-example-stream.py
@@ -35,7 +35,8 @@ async def run(context):
'tfs': 1,
'top_a': 0,
'repetition_penalty': 1.18,
'additive_repetition_penalty': 0,
'presence_penalty': 0,
'frequency_penalty': 0,
'repetition_penalty_range': 0,
'top_k': 40,
'min_length': 0,
3 changes: 2 additions & 1 deletion api-examples/api-example.py
@@ -27,7 +27,8 @@ def run(prompt):
'tfs': 1,
'top_a': 0,
'repetition_penalty': 1.18,
'additive_repetition_penalty': 0,
'presence_penalty': 0,
'frequency_penalty': 0,
'repetition_penalty_range': 0,
'top_k': 40,
'min_length': 0,
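All four API examples above change the same way: `additive_repetition_penalty` becomes `presence_penalty`, and a separate `frequency_penalty` is added. A minimal request sketch against the legacy blocking API; the host, port, and endpoint are the usual defaults of that API, assumed here rather than taken from this commit:

```python
# Minimal request body using the renamed sampling keys.
import requests

URI = "http://127.0.0.1:5000/api/v1/generate"  # assumed default for the blocking API

payload = {
    "prompt": "Write a haiku about autumn.",
    "max_new_tokens": 80,
    "temperature": 0.7,
    "repetition_penalty": 1.18,
    "presence_penalty": 0,    # was 'additive_repetition_penalty'
    "frequency_penalty": 0,   # new: scales with how often a token already appeared
}

response = requests.post(URI, json=payload)
print(response.json()["results"][0]["text"])
```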
8 changes: 0 additions & 8 deletions css/main.css
@@ -648,11 +648,3 @@ div.svelte-362y77>*, div.svelte-362y77>.form>* {
.options {
z-index: 100 !important;
}

/* ----------------------------------------------
Increase the height of the evaluation table
---------------------------------------------- */
#evaluation-table table {
max-height: none !important;
overflow-y: auto !important;
}
7 changes: 5 additions & 2 deletions docs/03 ‐ Parameters Tab.md
@@ -33,9 +33,11 @@
* **max_new_tokens**: Maximum number of tokens to generate. Don't set it higher than necessary: it is used in the truncation calculation through the formula `(prompt_length) = min(truncation_length - max_new_tokens, prompt_length)`, so your prompt will get truncated if you set it too high.
* **temperature**: Primary factor to control the randomness of outputs. 0 = deterministic (only the most likely token is used). Higher value = more randomness.
* **top_p**: If not set to 1, select tokens with probabilities adding up to less than this number. Higher value = higher range of possible random results.
* **min_p**: Tokens with probability smaller than `(min_p) * (probability of the most likely token)` are discarded. This is the same as top_a but without squaring the probability.
* **top_k**: Similar to top_p, but select instead only the top_k most likely tokens. Higher value = higher range of possible random results.
* **repetition_penalty**: Penalty factor for repeating prior tokens. 1 means no penalty, higher value = less repetition, lower value = more repetition.
* **additive_repetition_penalty**: Similar to repetition_penalty, but with an additive offset on the raw token scores instead of a multiplicative factor. It may generate better results. 0 means no penalty, higher value = less repetition, lower value = more repetition.
* **presence_penalty**: Similar to repetition_penalty, but with an additive offset on the raw token scores instead of a multiplicative factor. It may generate better results. 0 means no penalty, higher value = less repetition, lower value = more repetition. Previously called "additive_repetition_penalty".
* **frequency_penalty**: Repetition penalty that scales based on how many times the token has appeared in the context. Be careful with this; there's no limit to how much a token can be penalized.
* **repetition_penalty_range**: The number of most recent tokens to consider for repetition penalty. 0 makes all tokens be used.
* **typical_p**: If not set to 1, select only tokens that are at least this much more likely to appear than random tokens, given the prior text.
* **tfs**: Tries to detect a tail of low-probability tokens in the distribution and removes those tokens. See [this blog post](https://www.trentonbricken.com/Tail-Free-Sampling/) for details. The closer to 0, the more discarded tokens.
@@ -47,7 +49,8 @@
* **penalty_alpha**: Contrastive Search is enabled by setting this to greater than zero and unchecking "do_sample". It should be used with a low value of top_k, for instance, top_k = 4.
* **mirostat_mode**: Activates the Mirostat sampling technique. It aims to control perplexity during sampling. See the [paper](https://arxiv.org/abs/2007.14966).
* **mirostat_tau**: No idea, see the paper for details. According to the Preset Arena, 8 is a good value.
* **mirostat_tau**: No idea, see the paper for details. According to the Preset Arena, 0.1 is a good value.
* **mirostat_eta**: No idea, see the paper for details. According to the Preset Arena, 0.1 is a good value.
* **temperature_last**: Makes temperature the last sampler instead of the first. With this, you can remove low probability tokens with a sampler like min_p and then use a high temperature to make the model creative without losing coherency.
* **do_sample**: When unchecked, sampling is entirely disabled, and greedy decoding is used instead (the most likely token is always picked).
* **Seed**: Set the Pytorch seed to this number. Note that some loaders do not use Pytorch (notably llama.cpp), and others are not deterministic (notably ExLlama v1 and v2). For these loaders, the seed has no effect.
* **encoder_repetition_penalty**: Also known as the "Hallucinations filter". Used to penalize tokens that are *not* in the prior text. Higher value = more likely to stay in context, lower value = more likely to diverge.
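The min_p and temperature_last entries above are easiest to see in code. This is an illustrative numpy sketch of the two ideas, not the webui's actual sampler implementation:

```python
# Illustrative sketch: min_p filtering followed by temperature applied as the *last* step.
import numpy as np

def sample(logits: np.ndarray, min_p: float = 0.05, temperature: float = 1.5) -> int:
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    # min_p: drop tokens whose probability is below min_p * p(most likely token)
    keep = probs >= min_p * probs.max()
    filtered = np.where(keep, probs, 0.0)

    # temperature_last: rescale only the surviving tokens, so a high temperature
    # adds variety without re-admitting the low-probability tail
    scaled = filtered ** (1.0 / temperature)
    scaled /= scaled.sum()
    return int(np.random.choice(len(scaled), p=scaled))

logits = np.array([4.0, 3.2, 1.0, -2.0, -5.0])
print(sample(logits))
```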
5 changes: 4 additions & 1 deletion docs/04 ‐ Model Tab.md
@@ -29,6 +29,7 @@ Options:
* **load-in-4bit**: Load the model in 4-bit precision using bitsandbytes.
* **trust-remote-code**: Some models use custom Python code to load the model or the tokenizer. For such models, this option needs to be set. It doesn't download any remote content: all it does is execute the .py files that get downloaded with the model. Those files can potentially include malicious code; I have never seen it happen, but it is in principle possible.
* **use_fast**: Use the "fast" version of the tokenizer. Especially useful for Llama models, which originally had a "slow" tokenizer that received an update. If your local files are in the old "slow" format, checking this option may trigger a conversion that takes several minutes. The fast tokenizer is mostly useful if you are generating 50+ tokens/second using ExLlama_HF or if you are tokenizing a huge dataset for training.
* **use_flash_attention_2**: Set use_flash_attention_2=True while loading the model. Possibly useful for training.
* **disable_exllama**: Only applies when you are loading a GPTQ model through the transformers loader. It needs to be checked if you intend to train LoRAs with the model.

### ExLlama_HF
@@ -42,6 +43,8 @@
* **gpu-split**: If you have multiple GPUs, the amount of memory to allocate per GPU should be set in this field. Make sure to set a lower value for the first GPU, as that's where the cache is allocated.
* **max_seq_len**: The maximum sequence length for the model. In ExLlama, the cache is preallocated, so the higher this value, the higher the VRAM. It is automatically set to the maximum sequence length for the model based on its metadata, but you may need to lower this value be able to fit the model into your GPU. After loading the model, the "Truncate the prompt up to this length" parameter under "Parameters" > "Generation" is automatically set to your chosen "max_seq_len" so that you don't have to set the same thing twice.
* **cfg-cache**: Creates a second cache to hold the CFG negative prompts. You need to set this if and only if you intend to use CFG in the "Parameters" > "Generation" tab. Checking this parameter doubles the cache VRAM usage.
* **no_flash_attn**: Disables flash attention. Otherwise, it is automatically used as long as the library is installed.
* **cache_8bit**: Create an 8-bit precision cache instead of a 16-bit one. This saves VRAM but increases perplexity (I don't know by how much).

### ExLlamav2_HF

@@ -86,7 +89,7 @@
Example: https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF

* **n-gpu-layers**: The number of layers to allocate to the GPU. If set to 0, only the CPU will be used. If you want to offload all layers, you can simply set this to the maximum value.
* **n-ctx**: Context length of the model. In llama.cpp, the context is preallocated, so the higher this value, the higher the RAM/VRAM usage will be. It gets automatically updated with the value in the GGUF metadata for the model when you select it in the Model dropdown.
* **n_ctx**: Context length of the model. In llama.cpp, the cache is preallocated, so the higher this value, the higher the VRAM. It is automatically set to the maximum sequence length for the model based on the metadata inside the GGUF file, but you may need to lower this value to be able to fit the model into your GPU. After loading the model, the "Truncate the prompt up to this length" parameter under "Parameters" > "Generation" is automatically set to your chosen "n_ctx" so that you don't have to set the same thing twice.
* **threads**: Number of threads. Recommended value: your number of physical cores.
* **threads_batch**: Number of threads for batch processing. Recommended value: your total number of cores (physical + virtual).
* **n_batch**: Batch size for prompt processing. Higher values are supposed to make generation faster, but I have never obtained any benefit from changing this value.
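For the llama.cpp options described above (n-gpu-layers, n_ctx, threads, n_batch), here is a hedged sketch using llama-cpp-python directly; the path and numbers are placeholders, and the webui wraps this loader with its own defaults:

```python
# Standalone llama.cpp loading sketch; path and values are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-2-7b-chat.Q4_K_M.gguf",  # placeholder path
    n_ctx=4096,       # context length; the cache is preallocated, so higher = more RAM/VRAM
    n_gpu_layers=35,  # 0 = CPU only; set high enough to offload all layers if VRAM allows
    n_threads=8,      # roughly your number of physical cores
    n_batch=512,      # prompt-processing batch size
)

out = llm("Q: What is the capital of France? A:", max_tokens=16)
print(out["choices"][0]["text"])
```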
3 changes: 1 addition & 2 deletions download-model.py
@@ -236,8 +236,7 @@ def check_model_files(self, model, branch, links, sha256, output_folder):
continue

with open(output_folder / sha256[i][0], "rb") as f:
bytes = f.read()
file_hash = hashlib.sha256(bytes).hexdigest()
file_hash = hashlib.file_digest(f, "sha256").hexdigest()
if file_hash != sha256[i][1]:
print(f'Checksum failed: {sha256[i][0]} {sha256[i][1]}')
validated = False
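The download-model.py change above swaps a whole-file read for a streaming digest, which is what the "don't load entire files into RAM" commit message refers to. A small sketch of the difference, assuming Python 3.11+ for `hashlib.file_digest`; the chunked fallback is illustrative:

```python
# Streaming SHA-256: file_digest reads the file in chunks instead of all at once.
import hashlib
import sys

def sha256_of(path: str) -> str:
    with open(path, "rb") as f:
        if sys.version_info >= (3, 11):
            return hashlib.file_digest(f, "sha256").hexdigest()
        h = hashlib.sha256()
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB at a time
            h.update(chunk)
        return h.hexdigest()

print(sha256_of("model-00001-of-00002.safetensors"))  # placeholder filename
```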
5 changes: 4 additions & 1 deletion extensions/api/util.py
@@ -25,14 +25,17 @@ def build_parameters(body, chat=False):
'max_tokens_second': int(body.get('max_tokens_second', 0)),
'do_sample': bool(body.get('do_sample', True)),
'temperature': float(body.get('temperature', 0.5)),
'temperature_last': bool(body.get('temperature_last', False)),
'top_p': float(body.get('top_p', 1)),
'min_p': float(body.get('min_p', 0)),
'typical_p': float(body.get('typical_p', body.get('typical', 1))),
'epsilon_cutoff': float(body.get('epsilon_cutoff', 0)),
'eta_cutoff': float(body.get('eta_cutoff', 0)),
'tfs': float(body.get('tfs', 1)),
'top_a': float(body.get('top_a', 0)),
'repetition_penalty': float(body.get('repetition_penalty', body.get('rep_pen', 1.1))),
'additive_repetition_penalty': float(body.get('additive_repetition_penalty', body.get('additive_rep_pen', 0))),
'presence_penalty': float(body.get('presence_penalty', body.get('presence_pen', 0))),
'frequency_penalty': float(body.get('frequency_penalty', body.get('frequency_pen', 0))),
'repetition_penalty_range': int(body.get('repetition_penalty_range', 0)),
'encoder_repetition_penalty': float(body.get('encoder_repetition_penalty', 1.0)),
'top_k': int(body.get('top_k', 0)),
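The nested `body.get(...)` calls above let clients keep sending the short legacy key names. A tiny standalone sketch of that lookup pattern; the helper name is hypothetical:

```python
# Accept either the new key or a legacy alias, defaulting to 0 when neither is present.
def get_float(body: dict, *keys: str, default: float = 0.0) -> float:
    for key in keys:
        if key in body:
            return float(body[key])
    return default

body = {"presence_pen": 0.4}  # legacy-style client payload
print(get_float(body, "presence_penalty", "presence_pen"))  # -> 0.4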
3 changes: 2 additions & 1 deletion extensions/multimodal/abstract_pipeline.py
@@ -3,6 +3,7 @@

import torch
from PIL import Image
from transformers import is_torch_xpu_available


class AbstractMultimodalPipeline(ABC):
@@ -55,7 +56,7 @@ def placeholder_embeddings() -> torch.Tensor:

def _get_device(self, setting_name: str, params: dict):
if params[setting_name] is None:
return torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
return torch.device("cuda:0" if torch.cuda.is_available() else "xpu:0" if is_torch_xpu_available() else "cpu")
return torch.device(params[setting_name])

def _get_dtype(self, setting_name: str, params: dict):
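The multimodal pipeline change above adds Intel XPU to the device fallback chain. A standalone sketch of the same resolution order; the function name is illustrative:

```python
# Explicit setting first, then CUDA, then Intel XPU, then CPU.
from typing import Optional

import torch
from transformers import is_torch_xpu_available

def resolve_device(setting: Optional[str]) -> torch.device:
    if setting is not None:
        return torch.device(setting)  # honor an explicit override
    if torch.cuda.is_available():
        return torch.device("cuda:0")
    if is_torch_xpu_available():
        return torch.device("xpu:0")
    return torch.device("cpu")

print(resolve_device(None))
```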
5 changes: 4 additions & 1 deletion extensions/openai/defaults.py
@@ -7,10 +7,13 @@
'auto_max_new_tokens': False,
'max_tokens_second': 0,
'temperature': 1.0,
'temperature_last': False,
'top_p': 1.0,
'min_p': 0,
'top_k': 1, # choose 20 for chat in absence of another default
'repetition_penalty': 1.18,
'additive_repetition_penalty': 0,
'presence_penalty': 0,
'frequency_penalty': 0,
'repetition_penalty_range': 0,
'encoder_repetition_penalty': 1.0,
'suffix': None,
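With the defaults above in place, the OpenAI-compatible extension accepts requests that use the new sampling keys again, which is the point of commit e18a046. A hedged request sketch; the port and path assume the extension's usual defaults of this era rather than anything stated in this commit:

```python
# Hedged example request to the OpenAI-compatible completions endpoint.
import requests

resp = requests.post(
    "http://127.0.0.1:5001/v1/completions",  # assumed default base URL for the extension
    json={
        "prompt": "Once upon a time",
        "max_tokens": 60,
        "temperature": 1.0,
        "presence_penalty": 0.5,   # now has a server-side default, so it no longer breaks
        "frequency_penalty": 0.2,
    },
)
print(resp.json()["choices"][0]["text"])
```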
2 changes: 1 addition & 1 deletion extensions/sd_api_pictures/script.py
@@ -339,7 +339,7 @@ def ui():
height = gr.Slider(64, 2048, value=params['height'], step=64, label='Height')
with gr.Column(variant="compact", elem_id="sampler_col"):
with gr.Row(elem_id="sampler_row"):
sampler_name = gr.Dropdown(value=params['sampler_name'], label='Sampling method', elem_id="sampler_box")
sampler_name = gr.Dropdown(value=params['sampler_name'], allow_custom_value=True, label='Sampling method', elem_id="sampler_box")
create_refresh_button(sampler_name, lambda: None, lambda: {'choices': get_samplers()}, 'refresh-button')
steps = gr.Slider(1, 150, value=params['steps'], step=1, label="Sampling steps", elem_id="steps_box")
with gr.Row():
8 changes: 3 additions & 5 deletions models/config.yaml
@@ -20,8 +20,6 @@
model_type: 'dollyv2'
.*replit:
model_type: 'replit'
.*AWQ:
n_batch: 1
.*(oasst|openassistant-|stablelm-7b-sft-v7-epoch-3):
instruction_template: 'Open Assistant'
skip_special_tokens: false
@@ -47,9 +45,6 @@
.*starchat-beta:
instruction_template: 'Starchat-Beta'
custom_stopping_strings: '"<|end|>"'
.*(openorca-platypus2):
instruction_template: 'OpenOrca-Platypus2'
custom_stopping_strings: '"### Instruction:", "### Response:"'
(?!.*v0)(?!.*1.1)(?!.*1_1)(?!.*stable)(?!.*chinese).*vicuna:
instruction_template: 'Vicuna-v0'
.*vicuna.*v0:
@@ -154,6 +149,9 @@
instruction_template: 'Orca Mini'
.*(platypus|gplatty|superplatty):
instruction_template: 'Alpaca'
.*(openorca-platypus2):
instruction_template: 'OpenOrca-Platypus2'
custom_stopping_strings: '"### Instruction:", "### Response:"'
.*longchat:
instruction_template: 'Vicuna-v1.1'
.*vicuna-33b:
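The config.yaml move above works because entries are matched against the model name in file order and later matches override earlier ones, so the OpenOrca-Platypus2 rule must come after the generic platypus rule to win. A simplified sketch of that behavior, not the webui's exact matching code:

```python
# Later matching patterns overwrite earlier ones (simplified model of config.yaml handling).
import re

config = {
    r".*(platypus|gplatty|superplatty)": {"instruction_template": "Alpaca"},
    r".*(openorca-platypus2)": {"instruction_template": "OpenOrca-Platypus2"},
}

def settings_for(model_name: str) -> dict:
    merged = {}
    for pattern, values in config.items():
        if re.match(pattern.lower(), model_name.lower()):
            merged.update(values)  # later entries win
    return merged

print(settings_for("OpenOrca-Platypus2-13B"))  # -> OpenOrca-Platypus2 template
```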
3 changes: 2 additions & 1 deletion modules/AutoGPTQ_loader.py
@@ -1,5 +1,6 @@
from pathlib import Path

from accelerate.utils import is_xpu_available
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

import modules.shared as shared
@@ -41,7 +42,7 @@ def load_quantized(model_name):
# Define the params for AutoGPTQForCausalLM.from_quantized
params = {
'model_basename': pt_path.stem,
'device': "cuda:0" if not shared.args.cpu else "cpu",
'device': "xpu:0" if is_xpu_available() else "cuda:0" if not shared.args.cpu else "cpu",
'use_triton': shared.args.triton,
'inject_fused_attention': not shared.args.no_inject_fused_attention,
'inject_fused_mlp': not shared.args.no_inject_fused_mlp,