Frequently Asked Questions

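Q1: Which file formats does this project support?

A1: The txt, docx, md, and pdf formats have been tested so far. For other formats, refer to the langchain documentation. One known issue: a document that contains special characters may fail to load.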
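As a quick pre-check before loading documents, the tested formats can be compared against file extensions. The sketch below is illustrative only: `TESTED_EXTENSIONS`, `is_tested_format`, and `split_by_support` are hypothetical names, not part of this project.

```python
from pathlib import Path

# Extensions the FAQ lists as tested. Other formats may still load via langchain.
TESTED_EXTENSIONS = {".txt", ".docx", ".md", ".pdf"}


def is_tested_format(filename: str) -> bool:
    """Return True if the file extension is one of the tested formats."""
    return Path(filename).suffix.lower() in TESTED_EXTENSIONS


def split_by_support(filenames):
    """Partition filenames into (tested, untested) lists, preserving order."""
    tested = [f for f in filenames if is_tested_format(f)]
    untested = [f for f in filenames if not is_tested_format(f)]
    return tested, untested
```

Files in the untested list are not necessarily unsupported; they simply have not been verified with this project.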

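Q2: What should I do if installing detectron2 fails while running pip install -r requirements.txt?

A2: If you do not need to read pdf files, you can skip detectron2. If you need high-accuracy text extraction from pdf files, install it as follows: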

$ git clone https://github.com/facebookresearch/detectron2.git
$ cd detectron2
$ pip install -e .

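Q3: The Python package nltk raises the error "Resource punkt not found." How do I fix it?

A3: Method 1: Download punkt.zip from https://github.com/nltk/nltk_data/raw/gh-pages/packages/tokenizers/punkt.zip, unzip it, and place the result in the tokenizers directory under your nltk_data storage path (nltk_data/tokenizers).

The nltk_data storage paths can be looked up with nltk.data.path.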
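The manual unpacking step in Method 1 can be sketched with the standard library. `install_nltk_zip` is a hypothetical helper, and the sketch assumes the archive has already been downloaded and contains a top-level folder such as `punkt/`.

```python
import zipfile
from pathlib import Path


def install_nltk_zip(zip_path, nltk_data_dir, category):
    """Extract a manually downloaded NLTK archive into <nltk_data_dir>/<category>/."""
    target = Path(nltk_data_dir) / category
    target.mkdir(parents=True, exist_ok=True)  # create nltk_data/<category> if missing
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target
```

For example, `install_nltk_zip("punkt.zip", "/your_path/nltk_data", "tokenizers")`, where `/your_path/nltk_data` is assumed to be one of the directories listed in `nltk.data.path`.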

Method 2: Run the following Python code:

import nltk
nltk.download('punkt')

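Q4: The Python package nltk raises the error "Resource averaged_perceptron_tagger not found." How do I fix it?

A4: Method 1: Download https://github.com/nltk/nltk_data/blob/gh-pages/packages/taggers/averaged_perceptron_tagger.zip, unzip it, and place the result in the taggers directory under your nltk_data storage path (nltk_data/taggers).

The nltk_data storage paths can be looked up with nltk.data.path.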

Method 2: Run the following Python code:

import nltk
nltk.download('averaged_perceptron_tagger')
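To confirm that either method worked, you can check whether the resource directory exists under one of the nltk_data search paths. This is a minimal sketch: `find_nltk_resource` and `default_search_paths` are hypothetical helpers, and the check only looks for unpacked directories, not zipped resources.

```python
from pathlib import Path


def find_nltk_resource(search_paths, relative_path):
    """Return the first existing <search_path>/<relative_path>, or None."""
    for base in search_paths:
        candidate = Path(base) / relative_path
        if candidate.exists():
            return candidate
    return None


def default_search_paths():
    """Return nltk.data.path if nltk is installed, else an empty list."""
    try:
        import nltk
    except ImportError:
        return []
    return list(nltk.data.path)


if __name__ == "__main__":
    for resource in ("tokenizers/punkt", "taggers/averaged_perceptron_tagger"):
        found = find_nltk_resource(default_search_paths(), resource)
        print(resource, "->", found if found else "not found")
```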

Q5: Can this project run in Colab?

A5: You can try running the chatglm-6b-int4 model in Colab. Note that to run the Web UI in Colab, you must set the share parameter to True in the call demo.queue(concurrency_count=3).launch(server_name='0.0.0.0', share=False, inbrowser=False) in webui.py.
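One way to avoid editing the flag by hand is to derive it from the runtime. The sketch below assumes the `google.colab` module is already loaded in a Colab runtime, which is a common heuristic rather than a guarantee. `in_colab` and `launch_kwargs` are hypothetical helpers, not part of webui.py.

```python
import sys


def in_colab() -> bool:
    """Heuristic: the google.colab module is usually preloaded in Colab runtimes."""
    return "google.colab" in sys.modules


def launch_kwargs(share=None):
    """Build keyword arguments for demo.queue(...).launch(...).

    If share is None, enable sharing only when running in Colab.
    """
    if share is None:
        share = in_colab()
    return {"server_name": "0.0.0.0", "share": share, "inbrowser": False}
```

In webui.py this could be used as `demo.queue(concurrency_count=3).launch(**launch_kwargs())`, where `demo` is the object already defined there.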

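Q6: Installing packages with pip inside Anaconda has no effect. How do I fix it?

A6: This is a system environment issue. For details, see the document "pip-installed packages have no effect in Anaconda" (在Anaconda中使用pip安装包无效问题).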

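Q7: How do I download the models this project needs to my local machine?

A7: All models used in this project are open-source models available from huggingface.co. Taking the default chatglm-6b and text2vec-large-chinese models as an example, you can download them with the following commands:

# Install git lfs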
$ git lfs install

# Download the LLM model
$ git clone https://huggingface.co/THUDM/chatglm-6b /your_path/chatglm-6b

# Download the Embedding model
$ git clone https://huggingface.co/GanymedeNil/text2vec-large-chinese /your_path/text2vec

# When a model needs updating, change into the model's directory and pull the latest model files/code
$ git pull
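The same steps can be scripted when several models need to be fetched. This is a sketch under the assumption that git and git lfs are installed. `clone_command` and `download_models` are hypothetical helpers, and the destination paths are placeholders.

```python
import subprocess

HF_BASE = "https://huggingface.co"


def clone_command(repo_id, dest):
    """Build the git clone command for a Hugging Face model repository."""
    return ["git", "clone", f"{HF_BASE}/{repo_id}", dest]


def download_models(models, run=subprocess.run):
    """Run `git lfs install`, then clone each repo_id -> destination pair.

    `run` is injectable so the command sequence can be inspected without executing it.
    """
    run(["git", "lfs", "install"], check=True)
    for repo_id, dest in models.items():
        run(clone_command(repo_id, dest), check=True)
```

For example: `download_models({"THUDM/chatglm-6b": "/your_path/chatglm-6b", "GanymedeNil/text2vec-large-chinese": "/your_path/text2vec"})`.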

Q8: What if downloading models from huggingface.co is slow?

A8: You can use the Baidu Netdisk links for the model weight files used in this project:

ernie-3.0-base-zh.zip link: https://pan.baidu.com/s/1CIvKnD3qzE-orFouA8qvNQ?pwd=4wih
ernie-3.0-nano-zh.zip link: https://pan.baidu.com/s/1Fh8fgzVdavf5P1omAJJ-Zw?pwd=q6s5
text2vec-large-chinese.zip link: https://pan.baidu.com/s/1sMyPzBIXdEzHygftEoyBuA?pwd=4xs7
chatglm-6b-int4-qe.zip link: https://pan.baidu.com/s/1DDKMOMHtNZccOOBGWIOYww?pwd=22ji
chatglm-6b-int4.zip link: https://pan.baidu.com/s/1pvZ6pMzovjhkA6uPcRLuJA?pwd=3gjd
chatglm-6b.zip link: https://pan.baidu.com/s/1B-MpsVVs1GHhteVBetaquw?pwd=djay

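Q9: After downloading the models, how do I change the code to use the local copies?

A9: Once the models are downloaded, edit the embedding_model_dict and llm_model_dict parameters in configs/model_config.py so that they point to the local paths. For example, change embedding_model_dict from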

embedding_model_dict = {
    "ernie-tiny": "nghuyong/ernie-3.0-nano-zh",
    "ernie-base": "nghuyong/ernie-3.0-base-zh",
    "text2vec": "GanymedeNil/text2vec-large-chinese"
}

to

embedding_model_dict = {
    "ernie-tiny": "nghuyong/ernie-3.0-nano-zh",
    "ernie-base": "nghuyong/ernie-3.0-base-zh",
    "text2vec": "/Users/liuqian/Downloads/ChatGLM-6B/text2vec-large-chinese"
}
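If the same config has to work on machines with and without a local copy, the path can be chosen at load time. This is a sketch, not the project's actual configuration logic: `resolve_model_path` is a hypothetical helper and `/your_path/text2vec` is a placeholder.

```python
from pathlib import Path


def resolve_model_path(hub_id, local_dir=None):
    """Prefer an existing local directory; otherwise fall back to the hub ID."""
    if local_dir and Path(local_dir).is_dir():
        return str(local_dir)
    return hub_id


embedding_model_dict = {
    "ernie-tiny": resolve_model_path("nghuyong/ernie-3.0-nano-zh"),
    "ernie-base": resolve_model_path("nghuyong/ernie-3.0-base-zh"),
    # Uses the local copy only if the placeholder directory actually exists.
    "text2vec": resolve_model_path("GanymedeNil/text2vec-large-chinese", "/your_path/text2vec"),
}
```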

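Q10: Running python cli_demo.py exhausts GPU memory and reports "OutOfMemoryError: CUDA out of memory". What can I do?

A10: Lower the values of VECTOR_SEARCH_TOP_K and LLM_HISTORY_LEN, for example VECTOR_SEARCH_TOP_K = 5 and LLM_HISTORY_LEN = 2. The prompt built by concatenating the query and the context then becomes shorter, which reduces GPU memory usage.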
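To see why the two settings affect prompt length, consider the toy prompt builder below. It is a hypothetical illustration, not the project's actual prompt template: `build_prompt` keeps only the last `history_len` turns and the first `top_k` retrieved chunks.

```python
def build_prompt(query, docs, history, top_k, history_len):
    """Hypothetical prompt builder: recent history + top-k context chunks + query."""
    recent = history[-history_len:] if history_len > 0 else []
    context = "\n".join(docs[:top_k])
    turns = "\n".join(f"Q: {q}\nA: {a}" for q, a in recent)
    return f"{turns}\nContext:\n{context}\nQuestion: {query}"


# Synthetic data: 10 retrieved chunks and 6 earlier dialogue turns.
docs = [f"chunk {i} " + "x" * 200 for i in range(10)]
history = [(f"question {i}", "y" * 100) for i in range(6)]

long_prompt = build_prompt("What is X?", docs, history, top_k=10, history_len=6)
short_prompt = build_prompt("What is X?", docs, history, top_k=5, history_len=2)
print(len(long_prompt), len(short_prompt))
```

The shorter prompt contains fewer tokens for the model to process, which is what lowers the memory footprint.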

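Q11: While running pip install -r requirements.txt, pip cannot find a matching version of a Python package such as langchain. What should I do?

A11: Switch to a different PyPI index, such as the Aliyun or Tsinghua mirror, and reinstall. If your network allows, using the pypi.org index directly is recommended. The commands are as follows:

# Use the PyPI index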
$ pip install -r requirements.txt -i https://pypi.python.org/simple

or

# Use the Aliyun mirror
$ pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/

or

# Use the Tsinghua mirror
$ pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple/
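Trying the indexes one after another can also be scripted. In the sketch below, `INDEXES`, `pip_install_command`, and `install_with_fallback` are hypothetical names. Invoking pip as `python -m pip` through `sys.executable` ensures the packages go into the interpreter that runs the script.

```python
import subprocess
import sys

# Index URLs for the sources mentioned above (the names are arbitrary labels).
INDEXES = {
    "pypi": "https://pypi.org/simple",
    "aliyun": "https://mirrors.aliyun.com/pypi/simple/",
    "tsinghua": "https://pypi.tuna.tsinghua.edu.cn/simple/",
}


def pip_install_command(requirements, index_name):
    """Build a pip install command that targets the chosen index."""
    return [sys.executable, "-m", "pip", "install",
            "-r", requirements, "-i", INDEXES[index_name]]


def install_with_fallback(requirements, order, run=subprocess.run):
    """Try each index in order until one succeeds; return its name or None."""
    for name in order:
        result = run(pip_install_command(requirements, name))
        if result.returncode == 0:
            return name
    return None
```

For example, `install_with_fallback("requirements.txt", ["pypi", "tsinghua", "aliyun"])` tries pypi.org first and only falls back to a mirror if that installation fails.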
