LongPrompt-LLamaGen is an improved LlamaGen model, fine-tuned on long text prompts to give creatives and developers stronger text-to-image generation.
- High-Quality Training Data: Fine-tuned on 500,000 high-quality images
- Long Text Understanding: Each training image is paired with a prompt of 300+ tokens
- Intelligent Prompt Optimization: Built-in prompt refining with Complex Human Instruction for enhanced output quality
- Continuous Updates: Our team constantly optimizes the model to stay ahead of the curve
- Install the required packages following the instructions in the original LlamaGen repository.
- Download our pre-trained model from the HuggingFace link (the model size is about 3.11 GB), then install the packages and download the language models needed for text-conditional image generation:
pip install ftfy
pip install transformers
pip install accelerate
pip install sentencepiece
pip install pandas
pip install bs4
Download the flan-t5-xl model from flan-t5-xl and put it in the folder ./pretrained_models/t5-ckpt/
Download the vq-ds16-t2i model from vq-ds16-t2i and put it in the folder ./pretrained_models/vq-ckpt/
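Before editing the sampling script, it can help to confirm that both checkpoint folders are in place. The following is a minimal sketch, not part of the repository: the helper name `missing_checkpoint_dirs` is hypothetical, and it only checks that the two folders named above exist and are not empty, without validating the files inside them.

```python
from pathlib import Path

# Folder names taken from the download steps above.
EXPECTED_DIRS = ("t5-ckpt", "vq-ckpt")


def missing_checkpoint_dirs(root="./pretrained_models"):
    """Return the expected checkpoint folders that are absent or empty.

    Hypothetical helper for illustration; it does not inspect file contents.
    """
    missing = []
    for name in EXPECTED_DIRS:
        folder = Path(root) / name
        # A folder counts as missing if it does not exist or holds no entries.
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(str(folder))
    return missing


if __name__ == "__main__":
    problems = missing_checkpoint_dirs()
    if problems:
        print("Missing or empty:", ", ".join(problems))
    else:
        print("All checkpoint folders are present.")
```

Run it from the repository root so that the relative path ./pretrained_models resolves correctly.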
- Modify sample_t2i.py to specify the paths of the pre-trained model, t5-ckpt, and vq-ckpt.
- Use the model to generate images by following the example code provided in the repository.
- For Complex Human Instruction, please install Ollama. Complex Human Instruction uses Ollama as its backend and automatically loads and uses the model.
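To illustrate what prompt refinement through an Ollama backend can look like, here is a minimal sketch that sends a short prompt to a locally running Ollama server over its REST endpoint `/api/generate`. It is not this project's implementation: the instruction text, the function names `build_payload` and `refine_prompt`, and the model tag `llama3` are all assumptions chosen for illustration, and the model the project actually loads may differ.

```python
import json
import urllib.request

# Default address of a local Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"

# Hypothetical instruction text; the project's actual Complex Human
# Instruction is not reproduced here.
INSTRUCTION = (
    "Rewrite the following short image prompt as one detailed, "
    "descriptive paragraph:"
)


def build_payload(prompt, model="llama3"):
    """Build the JSON body for a non-streaming /api/generate request.

    The model tag is a placeholder assumption, not the project's choice.
    """
    return {
        "model": model,
        "prompt": f"{INSTRUCTION}\n{prompt}",
        "stream": False,  # ask for a single JSON object instead of a stream
    }


def refine_prompt(prompt, model="llama3", url=OLLAMA_URL, timeout=120):
    """Send the prompt to Ollama and return the generated text."""
    data = json.dumps(build_payload(prompt, model)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"].strip()
```

With the Ollama server running and the chosen model pulled, `refine_prompt("a cat on a windowsill")` would return an expanded prompt that can then be passed to the sampling script.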
Please be aware that LongPrompt-LLamaGen is an ongoing development project. The model is continuously being trained and improved. We kindly ask for your patience as we work on refining the model.
A concise technical report detailing our methodology and findings will be released soon. Stay tuned for updates!
Thank you for your interest and support in this project.
We would like to express our gratitude to the LlamaGen team for their groundbreaking work: Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation. Their research has been instrumental in advancing the field of image generation using autoregressive models.