dheepakshakthi/Ai-Chatbot-with-image-processing-features

Hercules-GPT

To generate an image, type: generate image 'your prompt'
To generate a caption, type: generate caption
To enhance an image, type: enhance image
To try the object detection feature, type: detect objects
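
The four chat commands above can be told apart with a simple prefix match. The following is a minimal sketch of that idea, not the routing used in app.py: the function name `parse_command` and the `"chat"` fallback are assumptions made for illustration.

```python
from typing import Optional, Tuple

# Command phrases as listed in this README; the routing logic is a hypothetical sketch.
COMMANDS = ("generate image", "generate caption", "enhance image", "detect objects")


def parse_command(message: str) -> Tuple[str, Optional[str]]:
    """Return (action, argument); anything else is treated as a plain chat message."""
    text = message.strip()
    lowered = text.lower()
    for command in COMMANDS:
        if lowered.startswith(command):
            # Whatever follows the command phrase is its argument,
            # for example the prompt for image generation.
            argument = text[len(command):].strip().strip("'\"").strip()
            return command, argument or None
    return "chat", text
```

For example, `parse_command("generate image 'a red car'")` yields the action `generate image` with the prompt `a red car`, while an ordinary message falls through to the chatbot.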

About the project:

This is an AI chatbot application powered by Google Gemini. It has image processing features such as object detection, caption generation, image generation, and image enhancement. Our primary goal for this project was to create a chatbot that can analyze its environment with its image processing features and generate images.

This project is just a prototype.

The backend of this application is written in Python. We used Flask to build the application, and it runs on localhost.
As mentioned above, Google's Gemini API is used for the chatbot. You can replace the API key with your own.
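
One way to swap in your own key without editing the source is to read it from an environment variable. This is a sketch under assumptions: the variable name `GEMINI_API_KEY` and the helper `load_gemini_api_key` are hypothetical and do not come from app.py.

```python
import os


def load_gemini_api_key(env_var: str = "GEMINI_API_KEY") -> str:
    """Read the API key from the environment (variable name is an assumption)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Gemini API key."
        )
    return key
```

The returned string would then be passed to wherever app.py currently hard-codes the key.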

Mask R-CNN with the COCO dataset is used for object detection, the BLIP image captioning model is used for caption generation, Stable Diffusion is used for image generation, and ESRGAN is used for image enhancement.
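
A detector trained on COCO reports numeric category ids, and the COCO annotations file carries the id-to-name table in its `categories` list. The sketch below shows one common way to read that table; it assumes the standard COCO instances JSON layout, and the helper name `load_coco_categories` is hypothetical (app.py may use the file differently).

```python
import json
from pathlib import Path
from typing import Dict


def load_coco_categories(annotations_path: str) -> Dict[int, str]:
    """Map COCO category ids to names, e.g. for labelling detections."""
    path = Path(annotations_path)
    if not path.is_file():
        raise FileNotFoundError(f"COCO annotations file not found: {path}")
    with path.open("r", encoding="utf-8") as handle:
        data = json.load(handle)
    # Each entry in "categories" looks like {"id": 1, "name": "person", ...}.
    return {category["id"]: category["name"] for category in data["categories"]}
```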

You can switch between "float32" and "float16" according to your needs (float16 is faster).
You can also switch between "DPMSolverMultistepScheduler" and "EulerDiscreteScheduler".
Use float16 for faster computation. If you get a black image as output, use float32.
The full path of the COCO annotations file "instances_val2017.json" should be given.
The full path of the RRDB_ESRGAN_x4.pth model should be given (the model is present in the models folder).
The full path of the instances_val2017.json file should be given.
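
The settings described above (precision, scheduler, and the two full paths) can be checked before the models are loaded. The following is a minimal sketch, assuming the values are gathered in one place; the `ModelSettings` class and its field names are hypothetical and are not part of app.py.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List

# Option names as quoted in this README.
VALID_DTYPES = ("float16", "float32")
VALID_SCHEDULERS = ("DPMSolverMultistepScheduler", "EulerDiscreteScheduler")


@dataclass
class ModelSettings:
    """Hypothetical container for the values this README asks you to edit."""
    dtype: str
    scheduler: str
    coco_annotations: Path  # full path to instances_val2017.json
    esrgan_weights: Path    # full path to RRDB_ESRGAN_x4.pth

    def problems(self) -> List[str]:
        """Return readable issues; an empty list means the settings look usable."""
        issues = []
        if self.dtype not in VALID_DTYPES:
            issues.append(f"dtype must be one of {VALID_DTYPES}, got {self.dtype!r}")
        if self.scheduler not in VALID_SCHEDULERS:
            issues.append(
                f"scheduler must be one of {VALID_SCHEDULERS}, got {self.scheduler!r}"
            )
        for label, path in (("COCO annotations", self.coco_annotations),
                            ("ESRGAN weights", self.esrgan_weights)):
            if not path.is_absolute():
                issues.append(f"{label}: give the full path, not {str(path)!r}")
            elif not path.is_file():
                issues.append(f"{label}: file not found at {path}")
        return issues
```

Calling `problems()` at startup and printing the result gives a clearer message than a failure deep inside model loading.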

Limitations:

- The image enhancement feature doesn't work on all PNG images, but works on all JPG and JPEG images.
- The image enhancement feature is very slow due to its implementation.
- Laptops with GTX series graphics processors, or any other graphics card with less than 6 GB of VRAM, may need to use float32 in line 35 of app.py. Check whether the generated image is black when using float16: if it is not black, you can keep float16; otherwise use float32.
- The image generation and image enhancement features are not recommended on laptops/PCs without dedicated graphics.
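
The float16 check described in the list above amounts to "try the fast precision first, fall back if the result is black". Here is a minimal sketch of that logic, not the code in app.py: `generate` is a hypothetical callable that runs image generation at the given precision and returns a flat sequence of pixel values, however the app extracts them.

```python
from typing import Callable, Iterable, Sequence, Tuple


def is_black_image(pixels: Iterable[int], threshold: int = 0) -> bool:
    """True when every pixel value is at or below the threshold."""
    return all(value <= threshold for value in pixels)


def generate_with_fallback(
    generate: Callable[[str], Sequence[int]],
    dtypes: Sequence[str] = ("float16", "float32"),
) -> Tuple[str, Sequence[int]]:
    """Try each precision in order and return the first non-black result."""
    for dtype in dtypes:
        pixels = generate(dtype)
        if not is_black_image(pixels):
            return dtype, pixels
    raise RuntimeError("Every precision produced a black image.")
```

Once a precision is known to work on a given machine, it can be fixed in the configuration so the check does not run on every request.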

Recommended System Requirements:

A PC or laptop with a dedicated graphics card (with CUDA cores)

Problems and fixes

You might get an error like the one below:

Traceback (most recent call last):
  File "d:\testing\Web_Ai_Chatbot-master\app.py", line 4, in <module>
    from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler
  File "D:\testing\Web_Ai_Chatbot-master\.venv\lib\site-packages\diffusers\__init__.py", line 5, in <module>
    from .utils import (
  File "D:\testing\Web_Ai_Chatbot-master\.venv\lib\site-packages\diffusers\utils\__init__.py", line 100, in <module>
    from .peft_utils import (
  File "D:\testing\Web_Ai_Chatbot-master\.venv\lib\site-packages\diffusers\utils\peft_utils.py", line 28, in <module>
    import torch
  File "D:\testing\Web_Ai_Chatbot-master\.venv\lib\site-packages\torch\__init__.py", line 148, in <module>
    raise err
OSError: [WinError 126] The specified module could not be found. Error loading "D:\testing\Web_Ai_Chatbot-master\.venv\lib\site-packages\torch\lib\fbgemm.dll" or one of its dependencies.

To solve this, place the file "libomp140.x86_64.dll" in "C:\Windows\System32", or install Visual Studio with C/C++ support. The program should work after this fix.

Link to download the libomp140.x86_64.dll file : libomp140.x86_64.dll
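
To confirm whether the DLL named in the fix is already in place, a quick check like the one below can help. It is only a sketch: the function name `openmp_dll_present` is hypothetical, and the default directory is simply the path quoted above.

```python
from pathlib import Path


def openmp_dll_present(
    system_dir: str = r"C:\Windows\System32",
    dll_name: str = "libomp140.x86_64.dll",
) -> bool:
    """Report whether the DLL mentioned in this README exists in the given folder."""
    return (Path(system_dir) / dll_name).is_file()
```

If this returns `False` on a machine showing the error above, applying one of the two fixes is the next step.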