Add unsupervised-qa pipelines #3605

Merged
merged 12 commits into from
Nov 15, 2022


westfish
Contributor

@westfish westfish commented Oct 28, 2022

PR types

New features

PR changes

Models

Description

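Adds unsupervised question-answering pipelines, mainly including the following:
- Documentation for the unsupervised QA pipelines
- A pipeline run example: examples/unsupervised-question-answering/unsupervised_question_answering_example.py
- A script for generating QA pairs offline
- New nodes: QAFilter, AnswerExtractor, QuestionGenerator, AnswerExtractorPreprocessor (to support document operations), and QAFilterPostprocessor (to support document operations)
- New pipeline: QAGenerationPipeline
- FastAPI backend code that connects the ElasticSearch ANN retrieval store, QAGenerationPipeline, and SemanticSearchPipeline
- A Streamlit-based web UI front end for unsupervised QA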

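Web visualization system features:
QA retrieval
Online QA pair generation
Online index updates
File upload with automatic generation and loading of QA pairs
Optional filtering during QA pair generation
Configurable number of returned answers and maximum number of retrieved results for QA retrieval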

Demo:
uqa-quick

@westfish westfish closed this Oct 28, 2022
@westfish westfish force-pushed the unsupervised_qa_pipelines branch from d2e2b24 to b6c3589 Compare October 28, 2022 11:39
@westfish westfish reopened this Oct 28, 2022
@westfish westfish requested a review from wawltor October 28, 2022 11:57
@westfish westfish changed the title Unsupervised qa pipelines Add unsupervised qa pipelines Oct 28, 2022
@westfish westfish changed the title Add unsupervised qa pipelines Add unsupervised-qa pipelines Oct 31, 2022
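+ Good controllability: the synthesized corpus is decoupled from semantic retrieval, so synthesized QA pairs can be manually reviewed and deleted, and human-annotated QA pairs can be added
+ End-to-end
+ Provides a complete end-to-end intelligent QA system, covering QA corpus generation, model service deployment, and WebUI visualization
+ Multi-source data support: parses and recognizes Txt, Word, PDF, and Image data and writes it to the ANN database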
Collaborator

Can we currently support multi-source input such as Txt and Word?

Contributor Author

This will be hooked up once 高升 has upgraded the relevant pipeline.

Contributor Author

The latest commit already supports uploading documents and parsing them automatically.
image

pipelines/pipelines/nodes/answer_extractor/base.py (outdated, resolved)
pipelines/pipelines/nodes/answer_extractor/base.py (outdated, resolved)
pipelines/pipelines/nodes/answer_extractor/task.py (outdated, resolved)
@westfish westfish self-assigned this Nov 3, 2022

# Set environment variables
export PYTHONPATH=/root/project/paddle/paddlenlp/unsupervised_qa_pipelines/PaddleNLP/pipelines:$PYTHONPATH
export CUDA_VISIBLE_DEVICES=1
Collaborator

This can simply be set to 0; many users have only one GPU.

Collaborator

Please consider the Windows deployment scenario in a follow-up.

Contributor Author

OK.

Contributor Author

Let's add the Windows part at a later date.


# Set environment variables
unset http_proxy && unset https_proxy
export CUDA_VISIBLE_DEVICES=1
Collaborator

Same as above.

Contributor Author

OK.

Collaborator

@wawltor wawltor left a comment

LGTM

2 participants