- Generative Adversarial Networks
- Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
- Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks
- Conditional Generative Adversarial Nets
- pix2pix
- StarGAN
- Contrastive Learning for Unpaired Image-to-Image Translation
- ViT
- MLP-Mixer
- Towards General Purpose Vision Systems (GPV)
- SelfPatch
- Towards Total Recall in Industrial Anomaly Detection (PatchCore)
- Emerging Properties in Self-Supervised Vision Transformers (DINO)
- SimCLR
- MoCo
- MoCoV3
- BYOL
- SwAV
- What Do Self-Supervised Vision Transformers Learn?
- MAE
- iBOT
- UniAD
- MetaFormer
- Unified-IO
- BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
- Large Language Models Are Human-Level Prompt Engineers
- Learning Transferable Visual Models From Natural Language Supervision
- WinCLIP: Zero-/Few-Shot Anomaly Classification and Segmentation
- Instruction Induction: From Few Examples to Natural Language Task Descriptions
- OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
- Optimizing Prompts for Text-to-Image Generation
- Grounded Language-Image Pre-training
- Universal Few-shot Learning of Dense Prediction Tasks with Visual Token Matching
- Anomaly Detection Requires Better Representations
- Towards Open World Object Detection
- Flamingo: a Visual Language Model for Few-Shot Learning
- Open-vocabulary Object Detection via Vision and Language Knowledge Distillation
- Learning to Prompt for Vision-Language Models
- Conditional Prompt Learning for Vision-Language Models
- POUF: Prompt-oriented Unsupervised Fine-tuning for Large Pre-trained Models
- SimpleNet: A Simple Network for Image Anomaly Detection and Localization