Appflow (PaddlePaddle#351)
appflow add qwen_vl sdxl
LokeZhou authored Dec 14, 2023
1 parent 85ac514 commit 9fd8c06
Showing 8 changed files with 252 additions and 33 deletions.
11 changes: 7 additions & 4 deletions applications/README.md
Original file line number Diff line number Diff line change
@@ -37,7 +37,7 @@ PaddleMIX provides one-click prediction with no training required; here, open-set detection
>>> from paddlemix.appflow import Appflow
>>> from ppdiffusers.utils import load_image

>>> task = Appflow(task="openset_det_sam",
>>> task = Appflow(app="openset_det_sam",
models=["GroundingDino/groundingdino-swint-ogc","Sam/SamVitH-1024"],
static_mode=False) # Set to True to enable static-graph inference; dynamic graph is the default
>>> url = "https://paddlenlp.bj.bcebos.com/models/community/CompVis/stable-diffusion-v1-4/overture-creations.png"
@@ -73,10 +73,11 @@ Appflow provides a rich set of out-of-the-box tools covering cross-modal, multi-scenario applications,
### Cross-Modal Multi-Scenario Applications
| Application | Models | Static-graph inference |
| :--------------------------------- | -------------------------------- | ----------|
| [Vision-Language-Chat](./VLChat/README.md) | `qwen-vl-chat-7b` | 🚧 |
| [Openset-Det-Sam](./CVinW/README.md/#开放世界检测分割grounded-sam-detect-and-segment-everything-with-text-prompt) | `grounded sam` ||
| [AutoLabel](./Automatic_label/README.md/#自动标注autolabel) | `blip2 grounded sam` ||
| [Det-Guided-Inpainting](./Inpainting/README.md/#检测框引导的图像编辑det-guided-inpainting) | `chatglm-6b stable-diffusion-2-inpainting grounded sam` ||
| [Text-to-Image Generation](./text2image/README.md/#文图生成text-to-image-generation) | `runwayml/stable-diffusion-v1-5` | [fastdeploy](../ppdiffusers/deploy/README.md/#文图生成text-to-image-generation) |
| [Text-to-Image Generation](./text2image/README.md/#文图生成text-to-image-generation) | `runwayml/stable-diffusion-v1-5 stabilityai/stable-diffusion-xl-base-1.0` | [fastdeploy](../ppdiffusers/deploy/README.md/#文图生成text-to-image-generation) |
| [Text-Guided Image Upscaling](./image2image/README.md/#文本引导的图像放大text-guided-image-upscaling) | `ldm-super-resolution-4x-openimages`||
| [Text-Guided Image Inpainting](./Inpainting/README.md/#文本引导的图像编辑text-guided-image-inpainting) | `stable-diffusion-2-inpainting` | [fastdeploy](../ppdiffusers/deploy/README.md/#文本引导的图像编辑text-guided-image-inpainting) |
| [Image-to-Image Text-Guided Generation](./image2image/README.md/#文本引导的图像变换image-to-image-text-guided-generation) | `stable-diffusion-v1-5` | [fastdeploy](../ppdiffusers/deploy/README.md/#文本引导的图像变换image-to-image-text-guided-generation) |
@@ -86,6 +87,8 @@ Appflow provides a rich set of out-of-the-box tools covering cross-modal, multi-scenario applications,
| [Audio-to-Chat Generation](./AudioChat/README.md/#音频对话audio-to-chat-generation) | `chatglm-6b whisper fastspeech2` | |
| [Music Generation](./MusicGeneration/README.md/#音乐生成music-generation) | `chatglm-6b minigpt4 audioldm` | |



More applications under continuous development...

* ✅: Supported
* 🚧: In Progress
* ❌: Not Supported
8 changes: 7 additions & 1 deletion applications/README_en.md
@@ -35,7 +35,7 @@ PaddleMIX provides Appflow without training, and can directly input data to outp
>>> from paddlemix.appflow import Appflow
>>> from ppdiffusers.utils import load_image
>>> task = Appflow(task="openset_det_sam",
>>> task = Appflow(app="openset_det_sam",
models=["GroundingDino/groundingdino-swint-ogc","Sam/SamVitH-1024"],
static_mode=False) # Set to True to enable static-graph inference; dynamic graph is the default
>>> url = "https://paddlenlp.bj.bcebos.com/models/community/CompVis/stable-diffusion-v1-4/overture-creations.png"
@@ -67,6 +67,7 @@ Appflow provides a rich set of out of the box tools that cover cross modal and m
### Multi-Modal and Multi-Scenario Applications
| name | models | static mode |
| :--------------------------------- | -------------------------------- | ----------|
| [Vision-Language-Chat](./VLChat/README.md) | `qwen-vl-chat-7b` | 🚧 |
| [Openset-Det-Sam](./CVinW/README.md/#开放世界检测分割grounded-sam-detect-and-segment-everything-with-text-prompt) | `grounded sam` ||
| [AutoLabel](./Automatic_label/README.md/#自动标注autolabel) | `blip2 grounded sam` ||
| [Det-Guided-Inpainting](./Inpainting/README.md/#检测框引导的图像编辑det-guided-inpainting) | `chatglm-6b stable-diffusion-2-inpainting grounded sam` ||
@@ -76,4 +77,9 @@ Appflow provides a rich set of out of the box tools that cover cross modal and m
| [Image-to-Image Text-Guided Generation](./image2image/README.md/#文本引导的图像变换image-to-image-text-guided-generation) | `stable-diffusion-v1-5` | [fastdeploy](../ppdiffusers/deploy/README.md/#文本引导的图像变换image-to-image-text-guided-generation) |
| [Text-to-Video Generation](./text2video/README.md/#文本条件的视频生成text-to-video-generation) | `text-to-video-ms-1.7b` ||


More applications under continuous development...

* ✅: Supported
* 🚧: In Progress
* ❌: Not Supported
44 changes: 44 additions & 0 deletions applications/VLChat/README.md
@@ -0,0 +1,44 @@
### Vision-Language-Chat(视觉语言对话)

#### 1. Application overview
Multi-turn dialogue over image and text inputs, with captioning, grounding, and visual-localization capabilities.


#### 2. Demo

Example:

```python
import paddle
from paddlemix.appflow import Appflow

paddle.seed(1234)
task = Appflow(app="image2text_generation",
               models=["qwen-vl/qwen-vl-chat-7b"])
image = "https://bj.bcebos.com/v1/paddlenlp/models/community/GroundingDino/000000004505.jpg"
prompt = "这是什么?"
result = task(image=image, prompt=prompt)
print(result["result"])

# A follow-up call without `image` continues the same multi-turn conversation.
prompt2 = "框出图中公交车的位置"
result = task(prompt=prompt2)
print(result["result"])
```

Input image:<center><img src="https://github.com/LokeZhou/PaddleMIX/assets/13300429/95f73037-097e-4712-95be-17d5ca489f11" /></center>

prompt: "这是什么?" ("What is this?")

Output:
```
这是一张红色城市公交车的图片,它正在道路上行驶,穿越城市。该区域似乎是一个住宅区,因为可以在背景中看到一些房屋。除了公交车之外,还有其他车辆,包括一辆汽车和一辆卡车,共同构成了交通场景。此外,图片中还显示了一个人,他站在路边,可能是在等待公交车或进行其他活动。
```
prompt2: "框出图中公交车的位置" ("Draw a box around the bus in the image")

Output:
```
<ref>公交车</ref><box>(178,280),(803,894)</box>
```
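The grounding reply above follows Qwen-VL's `<ref>…</ref><box>(x1,y1),(x2,y2)</box>` convention, with box coordinates on a 0–1000 normalized grid. A minimal sketch for turning such a reply into pixel boxes — the helper name and regex here are illustrative, not part of the Appflow API:

```python
import re

def parse_boxes(text, width, height):
    """Parse '<ref>label</ref><box>(x1,y1),(x2,y2)</box>' spans into
    (label, (x1, y1, x2, y2)) tuples in pixel coordinates, assuming the
    coordinates are normalized to a 0-1000 grid as in Qwen-VL's format."""
    pattern = r"<ref>(.*?)</ref><box>\((\d+),(\d+)\),\((\d+),(\d+)\)</box>"
    results = []
    for label, x1, y1, x2, y2 in re.findall(pattern, text):
        # Rescale each normalized coordinate to the actual image size.
        box = (int(x1) * width // 1000, int(y1) * height // 1000,
               int(x2) * width // 1000, int(y2) * height // 1000)
        results.append((label, box))
    return results

print(parse_boxes("<ref>公交车</ref><box>(178,280),(803,894)</box>", 1000, 1000))
```

For a 1000×1000 image the normalized and pixel coordinates coincide; for any other size the boxes are scaled accordingly.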
12 changes: 7 additions & 5 deletions applications/text2image/README.md
@@ -7,9 +7,9 @@
import paddle
from paddlemix.appflow import Appflow

paddle.seed(1024)
paddle.seed(42)
task = Appflow(app="text2image_generation",
models=["stabilityai/stable-diffusion-v1-5"]
models=["stabilityai/stable-diffusion-xl-base-1.0"]
)
prompt = "a photo of an astronaut riding a horse on mars."
result = task(prompt=prompt)['result']
@@ -19,7 +19,9 @@ result = task(prompt=prompt)['result']

<div align="center">

| prompt | Generated Image |
|:----:|:----:|
| a photo of an astronaut riding a horse on mars | ![astronaut_rides_horse_sd](https://github.com/LokeZhou/PaddleMIX/assets/13300429/1622fb1e-c841-4531-ad39-9c5092a2456c)|
| model| prompt | Generated Image |
|:----:|:----:|:----:|
|stabilityai/stable-diffusion-v1-5| a photo of an astronaut riding a horse on mars | ![astronaut_rides_horse_sd](https://github.com/LokeZhou/PaddleMIX/assets/13300429/1622fb1e-c841-4531-ad39-9c5092a2456c)|
|stabilityai/stable-diffusion-xl-base-1.0| a photo of an astronaut riding a horse on mars |![sdxl_text2image](https://github.com/LokeZhou/PaddleMIX/assets/13300429/9e339d97-18cd-4cfc-89a6-c545e2872f7e) |
</div>

34 changes: 21 additions & 13 deletions paddlemix/appflow/configuration.py
@@ -12,19 +12,20 @@
# See the License for the specific language governing permissions and
# limitations under the License.

from .audio_asr import AudioASRTask
from .image2image_text_guided_generation import (
    StableDiffusionImg2ImgTask,
    StableDiffusionUpscaleTask,
    StableDiffusionXLImg2ImgTask,
)
from .image2text_generation import Blip2CaptionTask, MiniGPT4Task
from .image2text_generation import Blip2CaptionTask, MiniGPT4Task, QwenVLChatTask
from .openset_det_sam import OpenSetDetTask, OpenSetSegTask
from .text2audio_generation import AudioLDMPipelineTask
from .text2image_generation import StableDiffusionTask, VersatileDiffusionDualGuidedTask
from .text2image_inpaiting import StableDiffusionInpaintTask
from .text2speech_synthesize import AudioTTSTask
from .text2text_generation import ChatGlmTask
from .text2video_generation import TextToVideoSDTask
from .audio_asr import AudioASRTask
from .text2audio_generation import AudioLDMPipelineTask
from .text2speech_synthesize import AudioTTSTask

APPLICATIONS = {
"openset_det_sam": {
@@ -91,6 +92,10 @@
"task_class": StableDiffusionTask,
"task_flag": "text2image_generation-stable-diffusion-v1-5",
},
"stabilityai/stable-diffusion-xl-base-1.0": {
"task_class": StableDiffusionTask,
"task_flag": "text2image_generation-stable-diffusion-xl-base-1.0",
},
},
"default": {
"model": "stabilityai/stable-diffusion-2",
@@ -101,7 +106,11 @@
"Linaqruf/anything-v3.0": {
"task_class": StableDiffusionImg2ImgTask,
"task_flag": "image2image_text_guided_generation-Linaqruf/anything-v3.0",
}
},
"stabilityai/stable-diffusion-xl-refiner-1.0": {
"task_class": StableDiffusionXLImg2ImgTask,
"task_flag": "image2image_text_guided_generation-stabilityai/stable-diffusion-xl-refiner-1.0",
},
},
"default": {
"model": "Linaqruf/anything-v3.0",
@@ -150,6 +159,10 @@
"task_class": MiniGPT4Task,
"task_flag": "image2text_generation-MiniGPT4-7B",
},
"qwen-vl/qwen-vl-chat-7b": {
"task_class": QwenVLChatTask,
"task_flag": "image2text_generation-QwenVLChat-7B",
},
},
"default": {
"model": "paddlemix/blip2-caption-opt2.7b",
@@ -159,23 +172,21 @@
"models": {
"conformer_u2pp_online_wenetspeech": {
"task_class": AudioASRTask,
"task_flag": "audio2caption-asr-conformer_u2pp_online_wenetspeech"
"task_flag": "audio2caption-asr-conformer_u2pp_online_wenetspeech",
},
"THUDM/chatglm-6b": {
"task_class": ChatGlmTask,
"task_flag": "audio2caption-chatglm-6b",
},

}
},

"music_generation": {
"models": {
"miniGPT4/MiniGPT4-7B": {
"task_class": MiniGPT4Task,
"task_flag": "music_generation-MiniGPT4-7B",
},
"THUDM/chatglm-6b": {
"THUDM/chatglm-6b": {
"task_class": ChatGlmTask,
"task_flag": "music_generation-chatglm-6b",
},
@@ -185,12 +196,11 @@
},
}
},

"audio_chat": {
"models": {
"conformer_u2pp_online_wenetspeech": {
"task_class": AudioASRTask,
"task_flag": "audio_chat-asr-conformer_u2pp_online_wenetspeech"
"task_flag": "audio_chat-asr-conformer_u2pp_online_wenetspeech",
},
"speech": {
"task_class": AudioTTSTask,
@@ -200,8 +210,6 @@
"task_class": ChatGlmTask,
"task_flag": "audio_chat-chatglm-6b",
},

}
},

}
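The `APPLICATIONS` registry above drives task dispatch: `Appflow(app=..., models=[...])` looks up the app entry, falls back to its `default` model when none is given, and instantiates the mapped `task_class`. A simplified, self-contained sketch of that lookup — class names are replaced by strings, and `resolve_task` is a hypothetical helper, not the actual Appflow code:

```python
# Trimmed copy of the text2image_generation entry, with task classes as strings.
APPLICATIONS = {
    "text2image_generation": {
        "models": {
            "stabilityai/stable-diffusion-2": {
                "task_class": "StableDiffusionTask",
                "task_flag": "text2image_generation-stable-diffusion-2",
            },
            "stabilityai/stable-diffusion-xl-base-1.0": {
                "task_class": "StableDiffusionTask",
                "task_flag": "text2image_generation-stable-diffusion-xl-base-1.0",
            },
        },
        "default": {"model": "stabilityai/stable-diffusion-2"},
    },
}

def resolve_task(app, model=None):
    """Look up (task_class, task_flag) for an app, falling back to the default model."""
    entry = APPLICATIONS[app]
    model = model or entry["default"]["model"]
    config = entry["models"][model]
    return config["task_class"], config["task_flag"]

print(resolve_task("text2image_generation",
                   "stabilityai/stable-diffusion-xl-base-1.0"))
```

This is why the commit only needs to add a registry entry to expose SDXL: the existing `StableDiffusionTask` handles the new model id.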
61 changes: 60 additions & 1 deletion paddlemix/appflow/image2image_text_guided_generation.py
@@ -12,7 +12,13 @@
# See the License for the specific language governing permissions and
# limitations under the License.

from ppdiffusers import StableDiffusionImg2ImgPipeline, StableDiffusionUpscalePipeline
import paddle

from ppdiffusers import (
    StableDiffusionImg2ImgPipeline,
    StableDiffusionUpscalePipeline,
    StableDiffusionXLImg2ImgPipeline,
)

from .apptask import AppTask

@@ -126,3 +132,56 @@ def _postprocess(self, inputs):
"""

return inputs


class StableDiffusionXLImg2ImgTask(AppTask):
    def __init__(self, task, model, **kwargs):
        super().__init__(task=task, model=model, **kwargs)

        # Default to dynamic mode (static-graph inference is not enabled for this task)
        self._static_mode = False
        self._construct_model(model)

    def _construct_model(self, model):
        """
        Construct the inference model for the predictor.
        """

        # build model
        model_instance = StableDiffusionXLImg2ImgPipeline.from_pretrained(
            model, paddle_dtype=paddle.float16, variant="fp16"
        )

        self._model = model_instance

    def _preprocess(self, inputs):
        """
        Validate the required inputs before running the model.
        """
        image = inputs.get("image", None)
        assert image is not None, "The image is None"
        prompt = inputs.get("prompt", None)
        assert prompt is not None, "The prompt is None"

        return inputs

    def _run_model(self, inputs):
        """
        Run the task model from the outputs of the `_preprocess` function.
        """

        result = self._model(
            prompt=inputs["prompt"],
            image=inputs["image"],
        ).images[0]

        inputs.pop("prompt", None)
        inputs.pop("image", None)
        inputs["result"] = result

        return inputs

    def _postprocess(self, inputs):
        """
        Return the inputs, which already contain the generated image under `result`.
        """

        return inputs
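Like the other tasks in this module, `StableDiffusionXLImg2ImgTask` follows the `AppTask` template: `_preprocess` validates inputs, `_run_model` produces the result, and `_postprocess` returns it. A dependency-free sketch of that flow — the `__call__` wiring shown here is an assumption about the base class, and `EchoTask` is purely illustrative:

```python
class AppTask:
    """Sketch of the task base class: run the three pipeline stages in order."""

    def __call__(self, **inputs):
        inputs = self._preprocess(inputs)
        inputs = self._run_model(inputs)
        return self._postprocess(inputs)


class EchoTask(AppTask):
    """Toy task illustrating the same contract the diffusion tasks implement."""

    def _preprocess(self, inputs):
        # Validate required inputs, mirroring the assert style used above.
        assert inputs.get("prompt") is not None, "The prompt is None"
        return inputs

    def _run_model(self, inputs):
        # Consume the raw input and store the output under "result".
        inputs["result"] = inputs.pop("prompt").upper()
        return inputs

    def _postprocess(self, inputs):
        return inputs


print(EchoTask()(prompt="hello"))  # → {'result': 'HELLO'}
```

Because every task returns a dict with a `result` key, Appflow can treat heterogeneous tasks (detection, generation, chat) uniformly.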
