Langchain OpenAI HTTP response code 422 #187

Open · 4 tasks done
rodrigo-pedro opened this issue May 11, 2023 · 6 comments
Labels
bug Something isn't working

Comments

@rodrigo-pedro

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

I am trying to run the following code, as demonstrated in https://github.com/abetlen/llama-cpp-python/blob/main/examples/notebooks/Clients.ipynb:

from langchain.llms import OpenAI

api_key = "your_api_key"
api_base = "http://localhost:8000/v1"

llms = OpenAI(openai_api_base=api_base, openai_api_key=api_key)
llms(
    prompt="The quick brown fox jumps",
    stop=[".", "\n"],
)

I expect to receive a successful response.

Current Behavior

The response I receive is the following:

Retrying langchain.llms.openai.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised APIError: Invalid response object from API: '{"detail":[{"loc":["body","prompt"],"msg":"str type expected","type":"type_error.str"}]}' (HTTP response code was 422).

In the server, this is the corresponding message:

INFO:     172.17.0.1:44014 - "POST /v1/completions HTTP/1.1" 422 Unprocessable Entity

Environment and Context

Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except for under certain specific conditions.

  • Physical (or virtual) hardware you are using, e.g. for Linux:

$ lscpu

Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         39 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  16
  On-line CPU(s) list:   0-15
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
    CPU family:          6
    Model:               158
    Thread(s) per core:  2
    Core(s) per socket:  8
    Socket(s):           1
    Stepping:            12
    CPU max MHz:         5000.0000
    CPU min MHz:         800.0000
    BogoMIPS:            7200.00
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm
                         constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3
                         sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault
                         invpcid_single ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust sgx bmi1 avx2 smep bmi2 erms invpcid mpx rdseed
                         adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp sgx_lc md_clear flush_l1d
                         arch_capabilities
Virtualization features: 
  Virtualization:        VT-x
Caches (sum of all):     
  L1d:                   256 KiB (8 instances)
  L1i:                   256 KiB (8 instances)
  L2:                    2 MiB (8 instances)
  L3:                    16 MiB (1 instance)
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-15
Vulnerabilities:         
  Itlb multihit:         KVM: Mitigation: VMX disabled
  L1tf:                  Not affected
  Mds:                   Mitigation; Clear CPU buffers; SMT vulnerable
  Meltdown:              Not affected
  Mmio stale data:       Mitigation; Clear CPU buffers; SMT vulnerable
  Retbleed:              Mitigation; IBRS
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS Not affected
  Srbds:                 Mitigation; Microcode
  Tsx async abort:       Mitigation; TSX disabled
  • Operating System, e.g. for Linux:

$ uname -a

Linux i99900k 6.2.1-060201-generic #202302251141 SMP PREEMPT_DYNAMIC Sat Feb 25 11:49:50 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
  • SDK version, e.g. for Linux:
$ python3 --version
Python 3.11.3
$ make --version
GNU Make 4.3
Built for x86_64-pc-linux-gnu
$ g++ --version
g++ (Debian 10.2.1-6) 10.2.1 20210110

Failure Information (for bugs)

See above

Steps to Reproduce

Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.

  1. Start a llama-cpp-python server (an example invocation is sketched below)
  2. Run the code above
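
For step 1, assuming the server extras are installed and a local GGML model file is available (the model path below is a placeholder):

$ pip install "llama-cpp-python[server]"
$ python3 -m llama_cpp.server --model ./models/7B/ggml-model.bin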

Failure Logs

Please include any relevant log snippets or files. If it works under one configuration but not under another, please provide logs for both configurations and their corresponding outputs so it is easy to see where behavior changes.

Also, please try to avoid using screenshots if at all possible. Instead, copy/paste the console output and use GitHub's markdown to cleanly format your logs for easy readability.

Environment info:

llama-cpp-python$ git log | head -1
commit c3ed1330d7b7cf5193df7a638b33a1bc2d717577

llama-cpp-python$ pip list | egrep "uvicorn|fastapi|sse-starlette"
fastapi           0.95.1
sse-starlette     1.5.0
uvicorn           0.22.0
@abetlen (Owner) commented May 12, 2023

@rodrigo-pedro sorry about that, this was a bug that got re-introduced recently. The OpenAI API accepts arrays of prompts in addition to plain strings, and langchain sends a list, which is why the server's string-only validation returned a 422. I'll push a new PyPI release tomorrow, but it should already be fixed in the GitHub version.
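
For context, the accepted request shape after the fix would look roughly like this (a sketch only, assuming a Pydantic model similar to the one in llama_cpp/server/app.py; the class name and defaults here are illustrative):

from typing import List, Optional, Union
from pydantic import BaseModel

class CompletionRequestSketch(BaseModel):
    # The OpenAI completions API allows either a single string or a list
    # of strings; langchain's OpenAI wrapper sends a list.
    prompt: Union[str, List[str]] = ""
    max_tokens: int = 16          # illustrative default
    stop: Optional[Union[str, List[str]]] = None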

carmonajca added a commit to carmonajca/llama-cpp-python that referenced this issue May 17, 2023
* Bugfix: Ensure logs are printed when streaming

* Update llama.cpp

* Update llama.cpp

* Add missing tfs_z parameter

* Bump version

* Fix docker command

* Revert "llama_cpp server: prompt is a string". Closes abetlen#187

This reverts commit b9098b0.

* Only support generating one prompt at a time.

* Allow model to tokenize strings longer than context length and set add_bos. Closes abetlen#92

* Update llama.cpp

* Bump version

* Update llama.cpp

* Fix obscure Windows DLL issue. Closes abetlen#208

* chore: add note for Mac m1 installation

* Add winmode arg only on windows if python version supports it

* Bump mkdocs-material from 9.1.11 to 9.1.12

* Update README.md

Fix typo.

* Fix CMakeLists.txt

* Add sampling defaults for generate

* Update llama.cpp

* Add model_alias option to override model_path in completions. Closes abetlen#39

* Update variable name

* Update llama.cpp

* Fix top_k value. Closes abetlen#220

* Fix last_n_tokens_size

* Implement penalize_nl

* Format

* Update token checks

* Move docs link up

* Fixed CUBLAS dll load issue in Windows

* Check for CUDA_PATH before adding

@deetungsten

@abetlen This error still persists for me. I have the latest version from PyPI, and I also double-checked that commit 8895b90 is reflected in my conda environment in app.py. Thanks!

@deetungsten

Ahh interesting, OpenAI works but ChatOpenAI is broken.
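
For anyone reproducing this, a minimal ChatOpenAI call against the local server looks roughly like the sketch below (untested; ChatOpenAI posts to /v1/chat/completions rather than /v1/completions, and the key/base values mirror the snippet at the top of this issue):

from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage

# Point langchain's chat wrapper at the local llama-cpp-python server.
chat = ChatOpenAI(
    openai_api_base="http://localhost:8000/v1",
    openai_api_key="your_api_key",
)

# This POSTs to /v1/chat/completions on the local server.
chat([HumanMessage(content="The quick brown fox jumps")])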

abetlen reopened this May 21, 2023
gjmulder added the bug label May 22, 2023
@alienatorZ

Adding the max_tokens parameter when calling ChatOpenAI worked for me.
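
Something along these lines (a sketch; 256 is arbitrary, any explicit integer keeps max_tokens: null out of the request body):

from langchain.chat_models import ChatOpenAI

# Workaround sketch: pass max_tokens explicitly so the request body
# carries an integer instead of null.
chat = ChatOpenAI(
    openai_api_base="http://localhost:8000/v1",
    openai_api_key="your_api_key",
    max_tokens=256,  # arbitrary; any explicit integer avoids the 422
)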

xaptronic pushed a commit to xaptronic/llama-cpp-python that referenced this issue Jun 13, 2023
* add ggml_rms_norm

* update op num
@gjmulder (Contributor)

Can we close this?

@dflatline commented Jun 20, 2023

> Can we close this?

I just tried running commit 92b0013 from git, and it still has this same issue using langchain with GPTeam (101dotxyz/GPTeam#63). max_tokens is set to null, and I get:

HTTP/1.1 422 Unprocessable Entity
date: Mon, 19 Jun 2023 23:58:36 GMT
server: uvicorn
content-length: 116
content-type: application/json

{"detail":[{"loc":["body","max_tokens"],"msg":"none is not an allowed value","type":"type_error.none.not_allowed"}]}
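
If the server is meant to tolerate max_tokens: null from clients like langchain, one possible server-side shape is to make the field optional and substitute a default (a sketch only, not the actual llama-cpp-python request model; the class name, validator, and fallback value of 256 are illustrative, and the validator syntax assumes Pydantic v1 as pinned by fastapi 0.95):

from typing import Optional
from pydantic import BaseModel, validator

class ChatCompletionRequestSketch(BaseModel):
    # Accept an explicit null as well as an omitted field.
    max_tokens: Optional[int] = None

    @validator("max_tokens", pre=True, always=True)
    def _default_max_tokens(cls, v):
        # Treat max_tokens: null the same as leaving it out entirely.
        return 256 if v is None else v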
