An image annotation tool that automatically generates captions for images using the BLIP (Bootstrapping Language-Image Pre-training) model from Salesforce. This application processes images in a specified folder, generates descriptive captions, formats them into comma-separated keywords using an OpenAI assistant, and saves them as text files.
- Automatic Image Captioning: Generates captions for images using a pre-trained BLIP model.
- Folder Context in Captions: Includes the parent folder name in the generated captions to provide additional context.
- Caption Formatting with OpenAI Assistant: Formats generated captions into concise, comma-separated keywords using an OpenAI assistant.
- Batch Processing with Concurrency Control: Processes all images in a folder with controlled concurrency for optimized performance.
- Encoding Conversion: Converts text files to UTF-8 encoding to ensure compatibility.
- Error Handling: Robust error handling keeps the batch running even if individual files fail.
- Python 3.7+
- PyTorch
- Transformers
- Pillow
- aiofiles
- chardet
- OpenAI Python Library
- python-dotenv
Clone the repository from GitHub and navigate into the project directory.
```bash
git clone https://github.com/birdhouses/ImageAnnotator.git
cd ImageAnnotator
```
It's recommended to use a virtual environment to manage dependencies.
```bash
python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`
```
Install the required Python packages from the `requirements.txt` file:

```bash
pip install -r requirements.txt
```
Ensure that you have all dependencies installed, including PyTorch configured with CUDA if you have a GPU available.
Create a `.env` file in the root directory to store your environment variables. This file should not be committed to version control. Add your OpenAI API key to the `.env` file.
To use the OpenAI assistant feature, you need an OpenAI API key. Sign up for an account at OpenAI, navigate to the API Keys section, and generate a new API key.
```
├── modules
│   ├── annotator.py      # Contains the annotation logic
│   └── assistant.py      # Contains the Assistant class integrating with the OpenAI API
├── app.py                # Main script to run the annotator
├── images                # Folder containing images to process
├── .env                  # Environment variables file (not included in version control)
├── requirements.txt      # Python dependencies
└── README.md             # Project documentation
```
Place all the images you want to process into a folder, for example, `images` in the project root directory. Supported image formats are `.jpg`, `.jpeg`, and `.png`.
Execute the main script to process all images in the specified folder, generate captions, format them using the OpenAI assistant, and save them as `.txt` files alongside the images.
```bash
python app.py images
```
Replace `images` with the path to your images folder if different.
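For example, after a run the folder might look like this (file names here are illustrative):

```
images/
├── sunset.jpg
├── sunset.txt   # comma-separated keywords for sunset.jpg
├── harbor.png
└── harbor.txt
```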
By default, the script will not overwrite existing `.txt` files. If you wish to overwrite existing captions, modify the `overwrite` variable in `app.py`:
```python
overwrite = True
```
The application uses asynchronous processing with controlled concurrency to optimize performance. The concurrency level can be adjusted in `annotator.py` by modifying the `semaphore` value:
```python
semaphore = asyncio.Semaphore(2)  # Increase the number for higher concurrency
```
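To see how the semaphore gates concurrency, here is a minimal self-contained sketch; the `annotate` coroutine is a stand-in for the real per-image work in `annotator.py`:

```python
import asyncio

async def annotate(path: str) -> None:
    # Stand-in for the real captioning work on one image.
    await asyncio.sleep(1)
    print(f"annotated {path}")

async def process_image(semaphore: asyncio.Semaphore, path: str) -> None:
    async with semaphore:  # acquire a slot; other tasks wait here
        await annotate(path)

async def main() -> None:
    semaphore = asyncio.Semaphore(2)  # at most two images in flight at once
    paths = [f"images/img_{i}.jpg" for i in range(5)]
    await asyncio.gather(*(process_image(semaphore, p) for p in paths))

asyncio.run(main())
```

With a value of 2, the five tasks above complete in roughly three seconds instead of five; raise the value only as far as your GPU memory and API rate limits allow.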
The application includes the parent folder name in the generated captions to provide additional context. For example, if your images are in a folder named `beach`, the captions will begin with "photo of beach - ...".
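A minimal sketch of how such a prefix can be derived from the image path (the function name is illustrative, not the repo's):

```python
from pathlib import Path

def caption_with_context(image_path: str, caption: str) -> str:
    # Use the image's parent folder name as extra context for the caption.
    folder = Path(image_path).parent.name
    return f"photo of {folder} - {caption}"

print(caption_with_context("images/beach/img1.jpg", "a sunny shoreline"))
# -> photo of beach - a sunny shoreline
```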
If you have existing text files with unknown encoding, you can convert all text files in the folder to UTF-8 encoding using the `convert_to_utf8` function in `annotator.py`.
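For illustration, a standalone conversion along these lines can be written with `chardet`; this is a sketch, not the exact implementation in `annotator.py`:

```python
import chardet

def convert_file_to_utf8(path: str) -> None:
    # Detect the file's current encoding, then rewrite it as UTF-8.
    with open(path, "rb") as f:
        raw = f.read()
    encoding = chardet.detect(raw)["encoding"] or "utf-8"
    text = raw.decode(encoding, errors="replace")
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
```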
To format existing captions in `.txt` files, you can call the `process_folder_filewords` function in `annotator.py`.
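A hypothetical invocation, assuming the function takes the folder path as its argument (check the actual signature in `annotator.py`):

```python
from modules.annotator import process_folder_filewords

# Hypothetical call; the real signature in annotator.py may differ.
process_folder_filewords("images")
```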
Ensure you have the required Python packages installed, including PyTorch, Transformers, Pillow, `aiofiles`, `chardet`, OpenAI, and `python-dotenv`. For optimal performance, especially when processing many images, a CUDA-compatible GPU is recommended.
- BLIP Model: The application uses the `Salesforce/blip-image-captioning-base` model. Ensure you have a stable internet connection to download the model the first time you run the script (a standalone captioning sketch follows this list).
- OpenAI Assistant: The assistant is used to format the captions. Ensure that your OpenAI API key is set up correctly and valid.
- Performance: Processing a large number of images can be time-consuming. Using a GPU and adjusting the concurrency level can significantly reduce processing time.
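As a reference for how BLIP captioning works in isolation, here is a minimal standalone sketch using the Hugging Face Transformers API; it is not necessarily how `annotator.py` wires the model up, and the image path is illustrative:

```python
import torch
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"
model_id = "Salesforce/blip-image-captioning-base"
processor = BlipProcessor.from_pretrained(model_id)
model = BlipForConditionalGeneration.from_pretrained(model_id).to(device)

image = Image.open("images/example.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt").to(device)
output_ids = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```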
- CUDA Errors: If you encounter errors related to CUDA, ensure that your GPU drivers are up to date and that PyTorch is installed with CUDA support.
- OpenAI API Errors: Ensure your OpenAI API key is correct and that you have sufficient access rights.
- File Encoding Issues: If you experience issues with text file encodings, use the `convert_to_utf8` function to standardize all text files to UTF-8 encoding.
Contributions are welcome! Please open an issue or submit a pull request on GitHub.
This project is licensed under the MIT License.
- Salesforce Research for the BLIP model.
- Hugging Face Transformers for providing easy access to state-of-the-art models.
- OpenAI for the assistant capabilities.
- Pillow for image processing capabilities.
- aiofiles for asynchronous file operations.
Note: This application is intended for educational purposes. Please ensure you comply with all relevant laws and terms of service when using pretrained models and processing data.
Create a `.env` file in the root directory with the following content:
```
OPENAI_API_KEY=your_openai_api_key
```
Replace `your_openai_api_key` with your actual OpenAI API key.
- OPENAI_API_KEY: Your OpenAI API key for accessing the assistant functionality.
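For reference, the key is typically loaded with `python-dotenv` along these lines (the exact client setup in `assistant.py` may differ):

```python
import os
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()  # reads variables from the .env file in the working directory
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
```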
The application uses Python's `logging` module to provide informative messages during execution. By default, the logging level is set to `INFO`. You can adjust the logging level in `app.py` as needed:
```python
logging.basicConfig(level=logging.INFO)
```
- Assistant Response: The formatting assistant relies on the OpenAI API response. In cases where the assistant response is not as expected, the original caption may be used.
- Image Formats: Only images with `.jpg`, `.jpeg`, and `.png` extensions are processed.
- Asynchronous Processing: The application uses asynchronous I/O for reading and writing files, which may not provide performance benefits on systems without I/O bottlenecks.
- Command-Line Arguments: Implement command-line arguments to allow users to specify options like overwrite, concurrency level, and folder paths.
- Progress Indicators: Add progress bars or status updates to inform users of processing status.
- Error Reporting: Improve error messages and provide more detailed troubleshooting steps.