Is there a way to train novel concepts into your BLIP model, similar to how textual inversion works for Stable Diffusion image generation? If so, is there a training script provided, or would one need to be created?
Also, there have been some recent innovations in computer vision that might prove useful, though I don't know how much your model would need to change to use them. Kosmos-2 by Microsoft, for instance, has proved very promising at generating image captions, much better than the BLIP model I had used previously. Perhaps a more powerful language model would overcome some of BLIP's shortcomings in identifying novel concepts. Further, there are new techniques for scanning an image, such as SAHI (Slicing Aided Hyper Inference), that allow a detector to find smaller objects in larger images. I have provided both links below for you to look at.
Thank you so much for the discussion and for sharing! Regarding the first question, training new concepts into the model: we think new scripts would be needed for further training. As for the new research directions you propose, such as MLLMs, we think they are very worthwhile to try.
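Since no such script exists yet, here is a minimal sketch of the textual-inversion idea in generic PyTorch: extend the text embedding table by one new token and optimize only that row, keeping every pretrained weight frozen. This is a toy illustration, not BLIP's API; the function name and the gradient-mask approach are my own assumptions.

```python
import torch
import torch.nn as nn

def add_concept_token(embedding: nn.Embedding):
    """Return a copy of the embedding table with one extra trainable row.

    Pretrained rows are frozen by zeroing their gradients with a hook,
    so any optimizer step only ever updates the new concept token.
    (Illustrative sketch; BLIP ships no such helper.)
    """
    old_vocab, dim = embedding.weight.shape
    new = nn.Embedding(old_vocab + 1, dim)
    with torch.no_grad():
        new.weight[:old_vocab] = embedding.weight  # copy pretrained rows
    # Mask gradients: only the last (new) row receives updates.
    mask = torch.zeros(old_vocab + 1, 1)
    mask[old_vocab] = 1.0
    new.weight.register_hook(lambda g: g * mask)
    return new, old_vocab  # new token id is old_vocab
```

In a real script the training loss would come from the captioning objective on images of the new concept; here a toy squared-error loss against a target vector is enough to show that only the new row moves while the pretrained rows stay bit-identical.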
https://huggingface.co/docs/transformers/main/en/model_doc/kosmos-2
https://github.com/obss/sahi
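For reference, the core idea behind SAHI can be sketched in a few lines: slice the large image into overlapping tiles, run the detector on each tile, and shift the tile-local boxes back into full-image coordinates. The tile size and overlap below are illustrative defaults, not values taken from the SAHI library itself.

```python
def _starts(length: int, tile: int, step: int) -> list:
    """Tile start offsets along one axis, with a final tile flush to the edge."""
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile + 1, step))
    if starts[-1] + tile < length:
        starts.append(length - tile)  # make sure the border is covered
    return starts

def slice_boxes(img_w, img_h, tile=512, overlap=0.2):
    """Return (x1, y1, x2, y2) tile boxes covering the whole image."""
    step = max(1, int(tile * (1 - overlap)))
    return [(x, y, min(x + tile, img_w), min(y + tile, img_h))
            for y in _starts(img_h, tile, step)
            for x in _starts(img_w, tile, step)]

def shift_detection(tile_box, det_box):
    """Map a detection from tile-local to full-image coordinates."""
    tx, ty = tile_box[0], tile_box[1]
    x1, y1, x2, y2 = det_box
    return (x1 + tx, y1 + ty, x2 + tx, y2 + ty)
```

The actual library also merges overlapping detections across tiles (e.g. via NMS), which is omitted here for brevity.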