Add unsupervised-qa pipelines #3605
d2e2b24 to b6c3589
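+ Good controllability: synthetic corpus generation is decoupled from semantic retrieval, so synthesized QA pairs can be manually reviewed and deleted, and human-annotated QA pairs can be added | ||
+ End-to-end | ||
+ Provides a complete end-to-end intelligent QA system, covering QA corpus generation, model service deployment, and WebUI visualization | ||
+ Multi-source data support: parses and recognizes Txt, Word, PDF, and Image data and writes it to the ANN database |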
Can we actually support multi-source input such as Txt and Word right now?
We'll integrate it once 高升 has upgraded the relevant pipeline.
pipelines/examples/unsupervised-question-answering/unsupervised_question_answering_example.py
# Set environment variables | ||
export PYTHONPATH=/root/project/paddle/paddlenlp/unsupervised_qa_pipelines/PaddleNLP/pipelines:$PYTHONPATH | ||
export CUDA_VISIBLE_DEVICES=1 |
This can simply be set to 0; many users have only one GPU.
Going forward, please also consider the Windows deployment scenario.
OK.
Let's add the Windows part at a later date.
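The two suggestions in this thread (default to GPU 0, and keep Windows in mind) could be handled together by setting the variable from Python rather than with the shell-specific `export`. The following is only a minimal sketch under that assumption; `configure_gpu` is a hypothetical helper, not something in this PR.

```python
import os


def configure_gpu(default_device: str = "0") -> str:
    """Return the visible GPU id(s), defaulting to device 0.

    Hypothetical helper for illustration only. It must run before the
    deep-learning framework initializes CUDA, otherwise the setting has
    no effect on device selection.
    """
    # setdefault keeps any value the user already set in the environment,
    # and falls back to the single-GPU default otherwise.
    return os.environ.setdefault("CUDA_VISIBLE_DEVICES", default_device)


if __name__ == "__main__":
    print("CUDA_VISIBLE_DEVICES =", configure_gpu())
```

Because it relies only on `os.environ`, the same call behaves identically on Linux and Windows, and a value the user set beforehand still takes precedence.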
# Set environment variables | ||
unset http_proxy && unset https_proxy | ||
export CUDA_VISIBLE_DEVICES=1 |
Same as above.
OK.
LGTM
PR types
New features
PR changes
Models
Description
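Adds unsupervised question-answering pipelines, mainly comprising the following:
- Documentation for the unsupervised QA pipelines
- Pipeline run example: examples/unsupervised-question-answering/unsupervised_question_answering_example.py
- Script for generating QA pairs offline
- New nodes: QAFilter, AnswerExtractor, QuestionGenerator, AnswerExtractorPreprocessor (to support document operations), and QAFilterPostprocessor (to support document operations)
- New pipeline: QAGenerationPipeline
- FastAPI backend code that connects the ElasticSearch ANN retrieval store, QAGenerationPipeline, and SemanticSearchPipeline
- streamlit front-end UI for unsupervised QA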
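To make the role of the new nodes concrete, here is a rough, framework-free sketch of how answer extraction, question generation, and filtering might chain together. It assumes the stages run in that order and uses toy stand-in functions; none of the names below (`QAPair`, `generate_qa_pairs`, the `toy_*` functions) are the PaddleNLP API or the actual QAGenerationPipeline implementation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class QAPair:
    context: str
    question: str
    answer: str


def generate_qa_pairs(
    context: str,
    extract_answers: Callable[[str], List[str]],
    generate_question: Callable[[str, str], str],
    keep_pair: Callable[[QAPair], bool],
) -> List[QAPair]:
    """Chain the three stages: extract answers, ask a question, filter."""
    pairs = []
    for answer in extract_answers(context):
        question = generate_question(context, answer)
        pair = QAPair(context=context, question=question, answer=answer)
        if keep_pair(pair):  # filtering is the optional last stage
            pairs.append(pair)
    return pairs


# Toy stand-ins (hypothetical): a real system would call trained models here.
def toy_extract(context: str) -> List[str]:
    # Treat capitalized words as candidate answers.
    return [word for word in context.split() if word[0].isupper()]


def toy_question(context: str, answer: str) -> str:
    return f"What does the passage say about {answer}?"


def toy_filter(pair: QAPair) -> bool:
    # Keep only answers of at least six characters.
    return len(pair.answer) >= 6


if __name__ == "__main__":
    for qa in generate_qa_pairs("Paris is in France", toy_extract, toy_question, toy_filter):
        print(qa.question, "->", qa.answer)
```

Passing the stages in as callables keeps the sketch independent of any model, which mirrors the decoupling the PR describes: generated pairs can be inspected, dropped, or supplemented by hand before they are indexed for retrieval.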
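Web visualization system features:
- QA retrieval
- Online QA pair generation
- Online index updates
- File upload with automatic generation and loading of QA pairs
- Optional filtering when generating QA pairs
- Configurable number of returned answers and maximum number of retrieved results for QA retrieval
Demo: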
![uqa-quick](https://user-images.githubusercontent.com/20476674/200995008-2144e245-6ade-46a7-b397-8edca5da3d39.gif)