How to upload QA pairs directly, without having an LLM generate them #4664

Closed
3 tasks done
983183947 opened this issue May 26, 2024 · 3 comments · May be fixed by #12844
Labels
💪 enhancement New feature or request

Comments

@983183947

Self Checks

  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • Please do not modify this template :) and fill in all the required fields.

Provide a description of requested docs changes

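I have not found any place to import QA pairs directly. The only option is to upload a file and then choose the QA segmentation mode, but in that case I do not want an LLM to generate a large number of QA pairs.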
image

@dosubot dosubot bot added the 💪 enhancement New feature or request label May 26, 2024

dosubot bot commented May 26, 2024

To upload QA pairs directly without having an LLM generate them, you can create a document from text through Dify's knowledge API. Here is an example:

curl --location --request POST 'https://api.dify.ai/v1/datasets/<uuid:dataset_id>/document/create_by_text' \
--header 'Authorization: Bearer {api_key}' \
--header 'Content-Type: application/json' \
--data-raw '{
    "name": "QA Pairs",
    "text": "Q: What is Dify?\nA: Dify means Do it for you.\nQ: How to use Dify?\nA: You can use Dify by accessing its API.",
    "indexing_technique": "high_quality",
    "process_rule": {
        "rules": {
            "pre_processing_rules": [{
                "id": "remove_extra_spaces",
                "enabled": true
            }, {
                "id": "remove_urls_emails",
                "enabled": true
            }],
            "segmentation": {
                "separator": "###",
                "max_tokens": 500
            }
        },
        "mode": "custom"
    }
}'

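This command creates a document containing the QA pairs directly from the supplied text. indexing_technique is set to "high_quality" for the best processing, and process_rule carries the pre-processing rules and segmentation settings used to handle the text.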
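
Note that the sample text above contains no "###", so the custom separator would presumably have nothing to split on. A minimal Python sketch, assuming you want each QA pair to land in its own chunk, joins the pairs with that same separator before sending them. The helper name build_create_by_text_payload is hypothetical; the field names are copied from the curl example above, and the request itself is not sent here.

```python
import json

SEPARATOR = "###"  # same separator as in the process_rule above


def build_create_by_text_payload(pairs, name="QA Pairs"):
    """Build the JSON body for create_by_text from (question, answer) tuples.

    Hypothetical helper: it only assembles the payload shown in the curl
    example and joins the pairs with SEPARATOR so that each pair can be
    split into its own chunk.
    """
    text = SEPARATOR.join(f"Q: {q}\nA: {a}" for q, a in pairs)
    return {
        "name": name,
        "text": text,
        "indexing_technique": "high_quality",
        "process_rule": {
            "mode": "custom",
            "rules": {
                "pre_processing_rules": [
                    {"id": "remove_extra_spaces", "enabled": True},
                    {"id": "remove_urls_emails", "enabled": True},
                ],
                "segmentation": {"separator": SEPARATOR, "max_tokens": 500},
            },
        },
    }


pairs = [
    ("What is Dify?", "Dify means Do it for you."),
    ("How to use Dify?", "You can use Dify by accessing its API."),
]
payload = build_create_by_text_payload(pairs)
body = json.dumps(payload, ensure_ascii=False)  # use this as the request body
```

Whether the pairs end up as separate chunks still depends on how your Dify version applies the separator, so it is worth checking the resulting segments after the first upload.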

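If you have a CSV file containing QA pairs, you can use the format_by_template method of the QAIndexProcessor class to process the CSV file and create documents from it. Here is a Python example: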

from werkzeug.datastructures import FileStorage
from core.rag.index_processor.processor.qa_index_processor import QAIndexProcessor
# Assumed import path: the original snippet used Dataset without importing it
from models.dataset import Dataset

# Assume 'file' is a FileStorage object wrapping your CSV file
file = FileStorage(stream=open('qa_pairs.csv', 'rb'), filename='qa_pairs.csv')

qa_processor = QAIndexProcessor()
documents = qa_processor.format_by_template(file=file)

# Now you can load these documents into your dataset
dataset = Dataset(id='<uuid:dataset_id>', indexing_technique='high_quality')
qa_processor.load(dataset=dataset, documents=documents)

This approach lets you upload QA pairs directly from a CSV file, without using an LLM to generate them [1][2][3].
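
If running Dify's internal classes is not an option, a CSV of QA pairs can also be converted outside Dify and then sent through the API. Below is a minimal sketch using only the Python standard library, assuming a two-column CSV (question, answer) with no header row. The function name csv_to_segments is hypothetical, and the content/answer keys follow the segment body used in the segments API example later in this thread.

```python
import csv
import io


def csv_to_segments(csv_text):
    """Turn two-column CSV text (question, answer) into segment dicts.

    Assumption: no header row. Rows with fewer than two columns, or with
    an empty question or answer, are skipped.
    """
    segments = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue
        question, answer = row[0].strip(), row[1].strip()
        if question and answer:
            segments.append({"content": question, "answer": answer})
    return segments


sample = (
    "What is Dify?,Dify means Do it for you.\n"
    '"How to use Dify?","You can use Dify by accessing its API."\n'
)
segments = csv_to_segments(sample)
```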


@WalterZou

Support for this feature is needed. Many businesses already have a content library of QA pairs, so there is no need for an LLM to generate them; they could be uploaded to the knowledge base directly. The question (Q) would be used to match the user's query, and the corresponding answer (A) would then be handed to the LLM as the content retrieved from the knowledge base.

@datoujiejie

1. Use the create_by_text API to create an empty QA document

curl --location --request POST 'https://api.dify.ai/v1/datasets/<uuid:dataset_id>/document/create_by_text' \
--header 'Authorization: Bearer {api_key}' \
--header 'Content-Type: application/json' \
--data-raw '{
    "doc_form": "qa_model",
    "name": "QA Pairs",
    "text": "",
    "indexing_technique": "high_quality",
    "process_rule": {
        "rules": {
            "pre_processing_rules": [{
                "id": "remove_extra_spaces",
                "enabled": true
            }, {
                "id": "remove_urls_emails",
                "enabled": true
            }],
            "segmentation": {
                "separator": "###",
                "max_tokens": 500
            }
        },
        "mode": "custom"
    }
}'

2. Use the segments API to upload the QA pairs

curl --location --request POST 'https://api.dify.ai/v1/datasets/{dataset_id}/documents/{document_id}/segments/{segment_id}' \
--header 'Authorization: Bearer {api_key}' \
--header 'Content-Type: application/json' \
--data-raw '{"segment": {"content": "1","answer": "1", "keywords": ["a"]}}'
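
Step 2 can also be scripted. The following is a minimal Python sketch using only the standard library. It assumes that new segments are added by posting to the document's segments URL without a segment_id, with a "segments" list in the body (the curl command above addresses one existing segment by segment_id instead), so check this against the API reference of your Dify version. build_add_segments_request and the placeholder IDs are hypothetical, and the request is only constructed, not sent.

```python
import json
import urllib.request

API_BASE = "https://api.dify.ai/v1"  # base URL taken from the curl examples


def build_add_segments_request(api_key, dataset_id, document_id, qa_pairs):
    """Build (but do not send) a POST request that adds QA pairs as segments.

    Assumption: the endpoint accepts {"segments": [{"content", "answer"}]}.
    """
    url = f"{API_BASE}/datasets/{dataset_id}/documents/{document_id}/segments"
    body = {
        "segments": [
            {"content": question, "answer": answer}
            for question, answer in qa_pairs
        ]
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_add_segments_request(
    api_key="YOUR_API_KEY",      # placeholder
    dataset_id="DATASET_ID",     # placeholder
    document_id="DOCUMENT_ID",   # placeholder for the document created in step 1
    qa_pairs=[("What is Dify?", "Dify means Do it for you.")],
)
# To actually send it: urllib.request.urlopen(request)
```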
