[BUG] Error when running python init_database.py --recreate-vs #3121

Closed
skywolf123 opened this issue Mar 1, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@skywolf123

Problem Description
Running python init_database.py --recreate-vs fails with the three errors below.

Problem 1:
(MaxRetryError('HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /None/resolve/main/tokenizer_config.json (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'huggingface.co'. (_ssl.c:1006)")))'), '(Request ID: 4826469f-6904-4873-a7c7-291a84614162)')

Problem 2:
None is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo either by logging in with huggingface-cli login or by passing token=<your_token>

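Problem 3:
('samples', 'test_files/langchain-ChatGLM_closed.jsonl', '从文件 samples/test_files/langchain-ChatGLM_closed.jsonl 加载文档时出错:jq package not found, please install it with pip install jq')
(The Chinese part of this message means "error while loading documents from file samples/test_files/langchain-ChatGLM_closed.jsonl".)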

Environment Information

  • langchain-ChatGLM version/commit: v0.2.10
  • Deployed with Docker (yes/no): no
  • Model used (ChatGLM2-6B / Qwen-7B, etc.): OpenAI
  • Embedding model used (moka-ai/m3e-base, etc.): bge-large-zh-v1.5
  • Vector store type (faiss / milvus / pg_vector, etc.): faiss
  • Operating system and version: Windows 11
  • Python version: v3.11.8
  • Other relevant environment information:
    • langchain version: 0.0.354; fastchat version: 0.2.35
    • Text splitter currently in use: ChineseRecursiveTextSplitter

Additional Information
The errors occur whether or not I use a VPN to get around the firewall.

skywolf123 added the bug label Mar 1, 2024

dosubot bot commented Mar 1, 2024

🤖

Hi @skywolf123! Nice to see you again; I hope all is well.

Based on your description, here are some possible solutions:

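Problem 1: MaxRetryError and SSLCertVerificationError
This is probably caused by a network problem or an SSL certificate problem. Make sure your system can reach huggingface.co and that the SSL/TLS certificates are correctly installed and trusted on the client. If you are behind a proxy server or firewall, check its settings as well and confirm that it is not blocking or interfering with SSL/TLS connections. For detailed steps, see this issue.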
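One way to separate a certificate problem from a plain connectivity problem is to attempt a verified TLS handshake with only the standard library. The following is a minimal diagnostic sketch, not part of the project; the function name check_tls is hypothetical. A "Hostname mismatch" result like the one in the report often means that something between the client and huggingface.co answered with its own certificate.

```python
import socket
import ssl


def check_tls(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Attempt a verified TLS handshake and describe what happened."""
    context = ssl.create_default_context()  # verifies the chain and the hostname
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return f"ok: negotiated {tls.version()}"
    except ssl.SSLCertVerificationError as exc:
        # Raised for an untrusted chain or a hostname mismatch.
        return f"certificate error: {exc.verify_message}"
    except OSError as exc:
        # DNS failure, refused connection, timeout, or another TLS error.
        return f"connection error: {exc}"


if __name__ == "__main__":
    print(check_tls("huggingface.co"))
```

If the handshake succeeds here but the script still fails, the cause is more likely in how the model identifier is built than in the network.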

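Problem 2: invalid model identifier
This is probably caused by a mistyped model identifier or by missing or corrupted model files. Try downloading the bge-large-zh model manually from Hugging Face, then update EMBEDDING_MODEL and MODEL_PATH in model_config.py so that they point to the directory where you placed the downloaded model. For detailed steps, see this issue.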
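As a rough illustration of that suggestion, the sketch below shows what such a configuration might look like, along with a small check that each configured path exists on disk. The nested layout of MODEL_PATH, the "embed_model" key, the directory D:\models\bge-large-zh-v1.5, and the helper missing_local_models are all assumptions made for this example; check them against your own model_config.py.

```python
import os

# Hypothetical excerpt in the style of model_config.py; the layout is assumed.
EMBEDDING_MODEL = "bge-large-zh-v1.5"
MODEL_PATH = {
    "embed_model": {
        # Assumed local directory holding the manually downloaded model.
        EMBEDDING_MODEL: r"D:\models\bge-large-zh-v1.5",
    },
}


def missing_local_models(model_path: dict) -> list:
    """Return the model names whose configured path is not a local directory.

    A Hugging Face hub identifier is also reported, because it is not a
    directory on disk and would trigger a download attempt.
    """
    missing = []
    for group in model_path.values():
        for name, path in group.items():
            if not os.path.isdir(path):
                missing.append(name)
    return missing


if __name__ == "__main__":
    print("Not found locally:", missing_local_models(MODEL_PATH))
```

Running the check before init_database.py makes it clear whether the script will read from disk or fall back to contacting huggingface.co.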

Problem 3: jq package not found
The jq package is missing. You can install it by running pip install jq.

If these suggestions do not solve the problem, or if you have other questions, feel free to ask.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

@skywolf123
Author

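I solved the third problem by modifying the JSONLoader class so that it no longer needs the jq library, because the jq library cannot be installed on Windows.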
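The modified class was not posted in this thread. As a hedged illustration of the general idea only, a .jsonl file can be read with the standard json module, one record per line, without jq. The function name iter_jsonl_documents, the content_key parameter, and the output shape (a dict with page_content and metadata) are assumptions for this sketch and are not the author's actual change.

```python
import json
from typing import Iterable, Iterator, Optional


def iter_jsonl_documents(
    lines: Iterable[str],
    source: str,
    content_key: Optional[str] = None,
) -> Iterator[dict]:
    """Yield one document-like dict per non-empty JSON line, without jq."""
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Use a single field if requested, otherwise the whole record.
        content = record[content_key] if content_key is not None else record
        if not isinstance(content, str):
            content = json.dumps(content, ensure_ascii=False)
        yield {
            "page_content": content,
            "metadata": {"source": source, "seq_num": line_no},
        }


if __name__ == "__main__":
    path = "samples/test_files/langchain-ChatGLM_closed.jsonl"
    with open(path, encoding="utf-8") as handle:
        for doc in iter_jsonl_documents(handle, source=path):
            print(doc["metadata"]["seq_num"], doc["page_content"][:80])
```

This covers only the simple case of selecting one top-level key or the whole record; jq expressions that reach into nested structures would need their own handling.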

@skywolf123
Author

I traced the second problem to the make_text_splitter function in server\knowledge_base\utils.py. The exception is raised when it executes tokenizer = AutoTokenizer.from_pretrained(text_splitter_dict[splitter_name]["tokenizer_name_or_path"], trust_remote_code=True), because tokenizer_name_or_path for ChineseRecursiveTextSplitter is None. What should I do about this?
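The thread does not record a fix for this. One possible approach, sketched below under stated assumptions, is to validate the configured value before calling AutoTokenizer.from_pretrained, so that None is never sent as a model identifier. The helpers resolve_tokenizer_path and load_tokenizer_or_none are hypothetical and are not part of the project. Note that the URL in the first error also begins with /None/, which is consistent with the same None value being used as the repository name, although the thread does not confirm this.

```python
from typing import Optional


def resolve_tokenizer_path(
    splitter_config: dict, fallback: Optional[str] = None
) -> Optional[str]:
    """Return a usable tokenizer path, or the fallback if none is configured."""
    path = splitter_config.get("tokenizer_name_or_path")
    # Treat None, "" and the literal string "None" as "not configured".
    if path and str(path) != "None":
        return str(path)
    return fallback


def load_tokenizer_or_none(splitter_config: dict, fallback: Optional[str] = None):
    """Load a tokenizer only when a real path or identifier is available."""
    path = resolve_tokenizer_path(splitter_config, fallback)
    if path is None:
        # Nothing to load; the caller has to decide how to split instead,
        # for example by character length.
        return None
    from transformers import AutoTokenizer  # imported lazily, only when needed

    return AutoTokenizer.from_pretrained(path, trust_remote_code=True)
```

With a guard like this, a missing value surfaces as an explicit None in your own code instead of an HTTP request to huggingface.co for a repository named None. Passing a local tokenizer directory as the fallback is another option.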

@limoou

limoou commented Mar 12, 2024

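I solved the third problem by modifying the JSONLoader class so that it no longer needs the jq library, because the jq library cannot be installed on Windows.

Could you upload the code? I don't know how to make the change. Thanks.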

@lovepoy

lovepoy commented Mar 30, 2024

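I solved the third problem by modifying the JSONLoader class so that it no longer needs the jq library, because the jq library cannot be installed on Windows.

Could you explain how you changed it? I'm on Windows as well.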

@CchenDdong

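I solved the third problem by modifying the JSONLoader class so that it no longer needs the jq library, because the jq library cannot be installed on Windows.

Could you explain how you changed it? I'm on Windows as well.

Some workarounds are described in langchain-ai/langchain#4396.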
