
[Badcase]: Qwen2.5-72B-Instruct performs better on the Bailian platform and other hosted services than a locally deployed Ollama model; with long inputs, the local Ollama deployment gives off-topic answers. How can this be solved? #1137

thhbbx opened this issue Dec 19, 2024 · 10 comments


thhbbx commented Dec 19, 2024

Model Series

Qwen2.5

What are the models used?

qwen2.5-72b-instruct

What is the scenario where the problem happened?

When the character count approaches or exceeds roughly 1500, the locally deployed Ollama model starts giving off-topic answers. However, with the same prompts and messages, calling the online model via an API key does not show this problem.

Is this badcase known and can it be solved using available techniques?

  • I have followed the GitHub README.
  • I have checked the Qwen documentation and cannot find a solution there.
  • I have checked the documentation of the related framework and cannot find useful information.
  • I have searched the issues and there is not a similar one.

Information about environment

OS: Ubuntu 22.04
Python: Python 3.12.4
GPUs: 2 x NVIDIA A100
NVIDIA driver: 550.127.08
CUDA compiler: 12.4.131
PyTorch: 2.5.1

Package Version


aiobotocore 2.12.3
aiohttp 3.9.5
aioitertools 0.7.1
aiosignal 1.2.0
alabaster 0.7.16
altair 5.0.1
anaconda-anon-usage 0.4.4
anaconda-catalogs 0.2.0
anaconda-client 1.12.3
anaconda-cloud-auth 0.5.1
anaconda-navigator 2.6.0
anaconda-project 0.11.1
annotated-types 0.6.0
anyio 4.2.0
appdirs 1.4.4
archspec 0.2.3
argon2-cffi 21.3.0
argon2-cffi-bindings 21.2.0
arrow 1.2.3
astroid 2.14.2
astropy 6.1.0
astropy-iers-data 0.2024.6.3.0.31.14
asttokens 2.0.5
async-lru 2.0.4
atomicwrites 1.4.0
attrs 23.1.0
Automat 20.2.0
autopep8 2.0.4
Babel 2.11.0
bcrypt 3.2.0
beautifulsoup4 4.12.3
binaryornot 0.4.4
black 24.4.2
bleach 4.1.0
blinker 1.6.2
bokeh 3.4.1
boltons 23.0.0
botocore 1.34.69
Bottleneck 1.3.7
Brotli 1.0.9
cachetools 5.3.3
certifi 2024.8.30
cffi 1.16.0
chardet 4.0.0
charset-normalizer 2.0.4
click 8.1.7
cloudpickle 2.2.1
colorama 0.4.6
colorcet 3.1.0
comm 0.2.1
conda 24.11.0
conda-build 24.5.1
conda-content-trust 0.2.0
conda_index 0.5.0
conda-libmamba-solver 24.1.0
conda-pack 0.7.1
conda-package-handling 2.3.0
conda_package_streaming 0.10.0
conda-repo-cli 1.0.88
conda-token 0.5.0+1.g2209e04
constantly 23.10.4
contourpy 1.2.0
cookiecutter 2.6.0
cryptography 42.0.5
cssselect 1.2.0
cycler 0.11.0
cytoolz 0.12.2
dask 2024.5.0
dask-expr 1.1.0
datashader 0.16.2
debugpy 1.6.7
decorator 5.1.1
defusedxml 0.7.1
diff-match-patch 20200713
dill 0.3.8
distributed 2024.5.0
distro 1.9.0
docstring-to-markdown 0.11
docutils 0.18.1
entrypoints 0.4
et-xmlfile 1.1.0
executing 0.8.3
fastjsonschema 2.16.2
filelock 3.13.1
flake8 7.0.0
Flask 3.0.3
fonttools 4.51.0
frozendict 2.4.2
frozenlist 1.4.0
fsspec 2024.3.1
gensim 4.3.2
gitdb 4.0.7
GitPython 3.1.37
greenlet 3.0.1
h11 0.14.0
h5py 3.11.0
HeapDict 1.0.1
holoviews 1.19.0
httpcore 1.0.2
httpx 0.27.0
hvplot 0.10.0
hyperlink 21.0.0
idna 3.7
imagecodecs 2023.1.23
imageio 2.33.1
imagesize 1.4.1
imbalanced-learn 0.12.3
importlib-metadata 7.0.1
incremental 22.10.0
inflection 0.5.1
iniconfig 1.1.1
intake 0.7.0
intervaltree 3.1.0
ipykernel 6.28.0
ipython 8.25.0
ipython-genutils 0.2.0
ipywidgets 7.8.1
isort 5.13.2
itemadapter 0.3.0
itemloaders 1.1.0
itsdangerous 2.2.0
jaraco.classes 3.2.1
jedi 0.18.1
jeepney 0.7.1
jellyfish 1.0.1
Jinja2 3.1.4
jmespath 1.0.1
joblib 1.4.2
json5 0.9.6
jsonpatch 1.33
jsonpointer 2.1
jsonschema 4.19.2
jsonschema-specifications 2023.7.1
jupyter 1.0.0
jupyter_client 8.6.0
jupyter-console 6.6.3
jupyter_core 5.7.2
jupyter-events 0.10.0
jupyter-lsp 2.2.0
jupyter_server 2.14.1
jupyter_server_terminals 0.4.4
jupyterlab 4.2.5
jupyterlab-pygments 0.1.2
jupyterlab_server 2.27.3
jupyterlab-widgets 1.0.0
keyring 24.3.1
kiwisolver 1.4.4
lazy_loader 0.4
lazy-object-proxy 1.10.0
lckr_jupyterlab_variableinspector 3.1.0
libarchive-c 2.9
libmambapy 1.5.8
linkify-it-py 2.0.0
llvmlite 0.42.0
lmdb 1.4.1
locket 1.0.0
lxml 5.2.1
lz4 4.3.2
Markdown 3.4.1
markdown-it-py 2.2.0
MarkupSafe 2.1.3
matplotlib 3.8.4
matplotlib-inline 0.1.6
mccabe 0.7.0
mdit-py-plugins 0.3.0
mdurl 0.1.0
menuinst 2.1.1
mistune 2.0.4
mkl-fft 1.3.8
mkl-random 1.2.4
mkl-service 2.4.0
more-itertools 10.1.0
mpmath 1.3.0
msgpack 1.0.3
multidict 6.0.4
multipledispatch 0.6.0
mypy 1.10.0
mypy-extensions 1.0.0
navigator-updater 0.5.1
nbclient 0.8.0
nbconvert 7.10.0
nbformat 5.9.2
nest-asyncio 1.6.0
networkx 3.2.1
nltk 3.8.1
notebook 7.2.2
notebook_shim 0.2.3
numba 0.59.1
numexpr 2.8.7
numpy 1.26.4
numpydoc 1.7.0
openpyxl 3.1.2
overrides 7.4.0
packaging 23.2
pandas 2.2.2
pandocfilters 1.5.0
panel 1.4.4
param 2.1.0
parsel 1.8.1
parso 0.8.3
partd 1.4.1
pathspec 0.10.3
patsy 0.5.6
pexpect 4.8.0
pickleshare 0.7.5
pillow 10.3.0
pip 24.0
pkce 1.0.3
pkginfo 1.10.0
platformdirs 3.10.0
plotly 5.22.0
pluggy 1.0.0
ply 3.11
prometheus-client 0.14.1
prompt-toolkit 3.0.43
Protego 0.1.16
protobuf 3.20.3
psutil 5.9.0
ptyprocess 0.7.0
pure-eval 0.2.2
py-cpuinfo 9.0.0
pyarrow 14.0.2
pyasn1 0.4.8
pyasn1-modules 0.2.8
pycodestyle 2.11.1
pycosat 0.6.6
pycparser 2.21
pyct 0.5.0
pycurl 7.45.2
pydantic 2.5.3
pydantic_core 2.14.6
pydeck 0.8.0
PyDispatcher 2.0.5
pydocstyle 6.3.0
pyerfa 2.0.1.4
pyflakes 3.2.0
Pygments 2.15.1
PyJWT 2.8.0
pylint 2.16.2
pylint-venv 3.0.3
pyls-spyder 0.4.0
pyodbc 5.0.1
pyOpenSSL 24.0.0
pyparsing 3.0.9
PyQt5 5.15.10
PyQt5-sip 12.13.0
PyQtWebEngine 5.15.6
PySocks 1.7.1
pytest 7.4.4
python-dateutil 2.9.0.post0
python-dotenv 0.21.0
python-json-logger 2.0.7
python-lsp-black 2.0.0
python-lsp-jsonrpc 1.1.2
python-lsp-server 1.10.0
python-slugify 5.0.2
python-snappy 0.6.1
pytoolconfig 1.2.6
pytz 2024.1
pyviz_comms 3.0.2
pywavelets 1.5.0
pyxdg 0.27
PyYAML 6.0.1
pyzmq 25.1.2
QDarkStyle 3.2.3
qstylizer 0.2.2
QtAwesome 1.2.2
qtconsole 5.5.1
QtPy 2.4.1
queuelib 1.6.2
referencing 0.30.2
regex 2023.10.3
requests 2.32.2
requests-file 1.5.1
requests-toolbelt 1.0.0
rfc3339-validator 0.1.4
rfc3986-validator 0.1.1
rich 13.3.5
rope 1.12.0
rpds-py 0.10.6
Rtree 1.0.1
ruamel.yaml 0.17.21
ruamel-yaml-conda 0.17.21
s3fs 2024.3.1
scikit-image 0.23.2
scikit-learn 1.4.2
scipy 1.13.1
Scrapy 2.11.1
seaborn 0.13.2
SecretStorage 3.3.1
semver 3.0.2
Send2Trash 1.8.2
service-identity 18.1.0
setuptools 69.5.1
sip 6.7.12
six 1.16.0
smart-open 5.2.1
smmap 4.0.0
sniffio 1.3.0
snowballstemmer 2.2.0
sortedcontainers 2.4.0
soupsieve 2.5
Sphinx 7.3.7
sphinxcontrib-applehelp 1.0.2
sphinxcontrib-devhelp 1.0.2
sphinxcontrib-htmlhelp 2.0.0
sphinxcontrib-jsmath 1.0.1
sphinxcontrib-qthelp 1.0.3
sphinxcontrib-serializinghtml 1.1.10
spyder 5.5.1
spyder-kernels 2.5.0
SQLAlchemy 2.0.30
stack-data 0.2.0
statsmodels 0.14.2
streamlit 1.32.0
sympy 1.12
tables 3.9.2
tabulate 0.9.0
tblib 1.7.0
tenacity 8.2.2
terminado 0.17.1
text-unidecode 1.3
textdistance 4.2.1
threadpoolctl 2.2.0
three-merge 0.1.1
tifffile 2023.4.12
tinycss2 1.2.1
tldextract 3.2.0
toml 0.10.2
tomli 2.0.1
tomlkit 0.11.1
toolz 0.12.0
torch 2.5.1
torchaudio 2.5.1
torchvision 0.20.1
tornado 6.4.1
tqdm 4.66.4
traitlets 5.14.3
triton 3.1.0
truststore 0.8.0
Twisted 23.10.0
typing_extensions 4.11.0
tzdata 2023.3
uc-micro-py 1.0.1
ujson 5.10.0
unicodedata2 15.1.0
Unidecode 1.2.0
urllib3 2.2.2
w3lib 1.21.0
watchdog 4.0.1
wcwidth 0.2.5
webencodings 0.5.1
websocket-client 1.8.0
Werkzeug 3.0.3
whatthepatch 1.0.2
wheel 0.43.0
widgetsnbextension 3.6.6
wrapt 1.14.1
wurlitzer 3.0.2
xarray 2023.6.0
xyzservices 2022.9.0
yapf 0.40.2
yarl 1.9.3
zict 3.0.0
zipp 3.17.0
zope.interface 5.4.0
zstandard 0.22.0

Description

Steps to reproduce

When the character count approaches or exceeds roughly 1500, the locally deployed Ollama model starts giving off-topic answers (it no longer follows the system prompt). However, with the same prompts and messages, calling the online model via an API key does not show this problem.
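
For reference, a minimal sketch of the failing local call, using the requests package listed in the environment above; the host/port (taken from the ollama show output later in the thread) and the prompt placeholder are illustrative, not part of the original report:

import requests

# Sketch of the failing setup: a long (~1500-character) prompt sent to the
# locally deployed Ollama server. Host/port follow the reporter's
# OLLAMA_HOST; the prompt text is a placeholder.
resp = requests.post(
    "http://127.0.0.1:8801/api/chat",
    json={
        "model": "qwen2.5:72b",
        "messages": [
            {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
            {"role": "user", "content": "<the ~1500+ character prompt>"},
        ],
        "stream": False,
    },
    timeout=600,
)
print(resp.json()["message"]["content"])  # goes off-topic once the prompt is long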


jklj077 (Collaborator) commented Dec 19, 2024

It could be related to the quantization method or the Ollama implementation.

  • Please try using higher precision for Ollama, e.g., Q5_K_M, fp16, etc. Please be aware that the models on Ollama are provided by Ollama.
  • Please try vllm or transformers with a quantized model locally to see if the same issue occurs (a sketch follows below).
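
For the second suggestion, a minimal sketch of loading a quantized Qwen2.5 checkpoint with vLLM's Python API (recent vLLM versions) follows; the checkpoint name Qwen/Qwen2.5-72B-Instruct-GPTQ-Int4 and tensor_parallel_size=2 are assumptions based on the reporter's 2 x A100 setup, not part of the original reply:

from vllm import LLM, SamplingParams

# Sketch: serve a quantized Qwen2.5 checkpoint with vLLM instead of Ollama.
# Checkpoint name and parallelism are assumptions for a 2 x A100 machine.
llm = LLM(
    model="Qwen/Qwen2.5-72B-Instruct-GPTQ-Int4",  # assumed quantized checkpoint
    tensor_parallel_size=2,                       # one shard per A100
    max_model_len=32768,                          # full Qwen2.5 context window
)

messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": "<the long prompt that fails under Ollama>"},
]
outputs = llm.chat(messages, SamplingParams(temperature=0.7, max_tokens=512))
print(outputs[0].outputs[0].text)

If the off-topic behaviour disappears here, the Ollama-side configuration rather than the weights is the likely culprit.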

ironMan4pan commented:

Check what Ollama's num_ctx configuration option is set to.


thhbbx commented Dec 20, 2024

I used Ollama to load qwen2.5-32b-instruct-GGUF:q8_0, but with long inputs the answers were still off-topic.


thhbbx commented Dec 20, 2024

Could you give me some pointers? The model info is below:
(base) root@app-540e333a:~# OLLAMA_HOST=127.0.0.1:8801 ollama show qwen2.5:72b
Model
architecture qwen2
parameters 72.7B
context length 32768
embedding length 8192
quantization Q4_K_M

System
You are Qwen, created by Alibaba Cloud. You are a helpful assistant.

License
Qwen RESEARCH LICENSE AGREEMENT
Qwen RESEARCH LICENSE AGREEMENT Release Date: September 19, 2024

ironMan4pan commented:


I'm not sure whether your problem is caused by Ollama's default parameters: "By default, Ollama uses a context window size of 2048 tokens."
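
To check whether the failing prompts actually exceed that window, the tokens can be counted with the Qwen tokenizer. A rough sketch follows; it assumes the transformers package and the Qwen/Qwen2.5-72B-Instruct tokenizer are available, neither of which appears in the reported environment:

from transformers import AutoTokenizer

# Rough sketch: measure how many tokens the failing prompt occupies once the
# system prompt and chat template are added. ~1500 Chinese characters can
# easily pass Ollama's 2048-token default.
tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-72B-Instruct")
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": "<the ~1500-character prompt that fails>"},
]
ids = tok.apply_chat_template(messages, add_generation_prompt=True)
print(len(ids), "tokens")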

Fix: see the Ollama FAQ at https://github.com/ollama/ollama/blob/main/docs/faq.md, or follow the steps below (a per-request alternative is sketched after them).

1. Export the model file:
ollama show --modelfile qwen2.5:7b > Qwen2_5_7BModelfile

2. Edit the model file and add the following parameters:
PARAMETER num_ctx 32768
PARAMETER num_predict -1

3. Create a new model from the model file:
ollama create qwen2.5:7b-max-context -f Qwen2_5_7BModelfile


thhbbx commented Dec 20, 2024

Thank you so much! It no longer gives off-topic answers. Where is this method of editing the model file documented? Leave your WeChat and I'll buy you a coffee.


thhbbx commented Dec 20, 2024

One more question: are the models downloaded from Ollama as capable as the official Qwen models? I see the 72B model on Ollama is only quantized to Q4_K_M.

ironMan4pan commented:

I'm still learning and figuring things out myself; there's a lot I don't understand either. The quantization benchmark may help: https://qwen.readthedocs.io/zh-cn/latest/benchmark/quantization_benchmark.html


thhbbx commented Dec 20, 2024

If it's convenient, leave your WeChat and I'll buy you a coffee; I'm learning this area too.
