For caption generation, type: generate caption
For image enhancement, type: enhance image
To try the object detection feature, type: detect objects

This is an AI chatbot application powered by Google Gemini. It has image processing features: object detection, caption generation, image generation, and image enhancement. Our primary goal for this project was to create a chatbot that can analyze its environment with its image processing features and generate images.
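The three chat commands listed above could be routed to their features with a small lookup table. This is only a sketch: the feature labels and the `route_message` function are hypothetical names chosen for illustration, and the actual app.py may handle commands differently.

```python
from typing import Dict

# Chat commands from this README mapped to hypothetical feature labels.
COMMANDS: Dict[str, str] = {
    "generate caption": "caption",
    "enhance image": "enhance",
    "detect objects": "detect",
}


def route_message(message: str) -> str:
    """Return the feature label for a command, or "chat" for any other text."""
    # Lowercase and collapse whitespace so "Generate  Caption" still matches.
    normalized = " ".join(message.lower().split())
    return COMMANDS.get(normalized, "chat")
```

Anything that is not one of the three commands falls through to "chat", which here stands for forwarding the message to the Gemini chatbot.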
This project is just a prototype.
The backend of this application is written in Python. We used Flask to build it, and the application runs on localhost.
As mentioned, Google's Gemini API powers the chatbot. Replace the API key with your own.
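As an alternative to editing the key directly in the source, the key could be read from an environment variable. This is a sketch of that idea, not how the project currently does it: the variable name `GEMINI_API_KEY` and the function `load_api_key` are assumptions made for illustration.

```python
import os


def load_api_key(env_var: str = "GEMINI_API_KEY") -> str:
    """Read the Gemini API key from an environment variable (name is assumed)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a clear message instead of a confusing API error later.
        raise RuntimeError(f"Set the {env_var} environment variable to your Gemini API key.")
    return key
```

Keeping the key out of app.py also avoids committing it to the repository by accident.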
A Mask R-CNN model with the COCO dataset is used for object detection, the BLIP image captioning model for caption generation, Stable Diffusion for image generation, and ESRGAN for image enhancement.
You can switch between "float32" and "float16" according to your needs.
You can also switch between "DPMSolverMultistepScheduler" and "EulerDiscreteScheduler".
float16 is faster. If you get a black image as output, use float32 instead.
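The precision and scheduler choices described above could be gathered into one loader function. This is a minimal sketch under stated assumptions: `validate_options` and `load_pipeline` are hypothetical helpers, the model ID is left to the caller because this README does not name one, and a CUDA-capable GPU is assumed. The heavy libraries are imported inside `load_pipeline` so the validation helper works without them.

```python
ALLOWED_DTYPES = ("float16", "float32")
ALLOWED_SCHEDULERS = ("DPMSolverMultistepScheduler", "EulerDiscreteScheduler")


def validate_options(dtype_name: str, scheduler_name: str):
    """Reject anything other than the two dtypes and two schedulers named here."""
    if dtype_name not in ALLOWED_DTYPES:
        raise ValueError(f"dtype must be one of {ALLOWED_DTYPES}, got {dtype_name!r}")
    if scheduler_name not in ALLOWED_SCHEDULERS:
        raise ValueError(f"scheduler must be one of {ALLOWED_SCHEDULERS}, got {scheduler_name!r}")
    return dtype_name, scheduler_name


def load_pipeline(model_id: str,
                  dtype_name: str = "float16",
                  scheduler_name: str = "DPMSolverMultistepScheduler"):
    """Load Stable Diffusion with the chosen precision and scheduler (sketch)."""
    dtype_name, scheduler_name = validate_options(dtype_name, scheduler_name)

    # Imported lazily: both packages are large and only needed when loading.
    import torch
    import diffusers

    pipe = diffusers.StableDiffusionPipeline.from_pretrained(
        model_id, torch_dtype=getattr(torch, dtype_name)
    )
    # Swap in the requested scheduler, reusing the existing scheduler config.
    scheduler_cls = getattr(diffusers, scheduler_name)
    pipe.scheduler = scheduler_cls.from_config(pipe.scheduler.config)
    # Assumes a CUDA-capable GPU is available.
    return pipe.to("cuda")
```

If float16 produces a black image, calling `load_pipeline` again with `dtype_name="float32"` is the fallback this README recommends.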
Provide the full path to the COCO annotations file "instances_val2017.json".
Provide the full path to the RRDB_ESRGAN_x4.pth model (the model is in the models folder).
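Because both paths must be set by hand, a quick check before starting the app can catch a wrong path early. The sketch below uses a hypothetical helper, `find_missing_files`, and the example paths in the usage note are placeholders, not the project's real locations.

```python
from pathlib import Path
from typing import Dict, List


def find_missing_files(paths: Dict[str, str]) -> List[str]:
    """Return the labels of configured paths that do not point to a file."""
    return [label for label, path in paths.items() if not Path(path).is_file()]
```

For example, calling `find_missing_files({"COCO annotations": r"D:\data\instances_val2017.json", "ESRGAN model": r"D:\project\models\RRDB_ESRGAN_x4.pth"})` returns an empty list when both files exist, and otherwise returns the labels of the files that could not be found.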
- The image enhancement feature is very slow because of how it is implemented.
- Laptops with GTX-series graphics processors, or any graphics card with less than 6 GB of VRAM, may need to use float32 on line 35 of app.py. Check whether the generated image is black when using float16: if it is not black, keep float16; otherwise use float32.
- Image generation and image enhancement are not recommended on laptops or PCs without dedicated graphics.
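The black-image check from the list above can be automated instead of done by eye. This is a sketch that works on raw pixel bytes so it has no dependencies; `is_black_image` and `next_dtype` are hypothetical helper names, not functions from app.py.

```python
def is_black_image(pixel_bytes: bytes, threshold: int = 0) -> bool:
    """Return True if every channel value is at or below the threshold."""
    if not pixel_bytes:
        raise ValueError("pixel_bytes is empty")
    return max(pixel_bytes) <= threshold


def next_dtype(current: str, pixel_bytes: bytes) -> str:
    """Suggest float32 when a float16 run produced an all-black image."""
    if current == "float16" and is_black_image(pixel_bytes):
        return "float32"
    return current
```

With Pillow, the bytes for a generated image could be obtained with `image.convert("RGB").tobytes()`.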
A PC or laptop with a dedicated graphics card (with CUDA cores).

Some of you might get an error like the one below:
Traceback (most recent call last):
  File "d:\testing\Web_Ai_Chatbot-master\app.py", line 4, in <module>
    from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler
  File "D:\testing\Web_Ai_Chatbot-master\.venv\lib\site-packages\diffusers\__init__.py", line 5, in <module>
    from .utils import (
  File "D:\testing\Web_Ai_Chatbot-master\.venv\lib\site-packages\diffusers\utils\__init__.py", line 100, in <module>
    from .peft_utils import (
  File "D:\testing\Web_Ai_Chatbot-master\.venv\lib\site-packages\diffusers\utils\peft_utils.py", line 28, in <module>
    import torch
  File "D:\testing\Web_Ai_Chatbot-master\.venv\lib\site-packages\torch\__init__.py", line 148, in <module>
    raise err
OSError: [WinError 126] The specified module could not be found. Error loading "D:\testing\Web_Ai_Chatbot-master\.venv\lib\site-packages\torch\lib\fbgemm.dll" or one of its dependencies.
To fix this, place the file "libomp140.x86_64.dll" in "C:\Windows\System32", or install Visual Studio with C/C++ support. The program should work after this fix.
Link to download the libomp140.x86_64.dll file : libomp140.x86_64.dll
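To confirm that the DLL is where the fix above expects it, a short check can be run before starting the app. This is a sketch; `dll_present` is a hypothetical helper, and its defaults simply repeat the file name and folder mentioned in this README.

```python
from pathlib import Path


def dll_present(dll_name: str = "libomp140.x86_64.dll",
                directory: str = r"C:\Windows\System32") -> bool:
    """Return True if the named DLL exists as a file in the given directory."""
    return (Path(directory) / dll_name).is_file()
```

If `dll_present()` returns False on Windows, the file has not been copied to that folder yet.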