
🍌 Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing

Pico-Banana-400K is a large-scale dataset of ~400K text–image–edit triplets designed to advance research in text-guided image editing.
Each example contains:

  • an original image (from Open Images),
  • a human-like edit instruction, and
  • the edited result generated and verified by the Nano-Banana model.

The dataset spans 35 edit operations across 8 semantic categories, covering diverse transformations—from low-level color adjustments to high-level object, scene, and stylistic edits.


🧩 Key Features

| Feature | Description |
|---|---|
| Total Samples | ~257K single-turn text–image–edit triplets for SFT, ~56K single-turn text–image(positive)–image(negative)–edit examples for preference learning, and ~72K multi-turn text–image–edit sequences for multi-turn applications |
| Source | Open Images |
| Edit Operations | 35 across 8 semantic categories |
| Categories | Pixel & Photometric, Object-Level, Scene Composition, Stylistic, Text & Symbol, Human-Centric, Scale & Perspective, Spatial/Layout |
| Image Resolution | 512–1024 px |
| Prompt Generator | Gemini-2.5-Flash |
| Editing Model | Nano-Banana |
| Self-Evaluation | Automated judging pipeline using Gemini-2.5-Pro for edit quality |

🏗️ Dataset Construction

Pico-Banana-400K is built using a two-stage multimodal generation pipeline:

  1. Instruction Generation
    Each Open Images sample is passed to Gemini-2.5-Flash, which writes concise, natural-language editing instructions grounded in visible content. We also provide short instructions summarized by Qwen-2.5-Instruct-7B. Example:
    {
      "instruction": "Change the red car to blue."
    }
    
    
  2. Editing + Self-Evaluation
    The Nano-Banana model performs the edit, and the result is then automatically judged with a structured quality prompt that scores:
      • Instruction Compliance (40%)
      • Editing Realism (25%)
      • Preservation Balance (20%)
      • Technical Quality (15%)
    Only edits scoring above a strict threshold (~0.7) are labeled as successful and form the main dataset; the remaining ~56K are retained as failure cases for robustness and preference learning (see the scoring sketch below).
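
The weighted quality gate can be illustrated with a minimal sketch. The criterion weights and the ~0.7 threshold come from the description above; the `QualityScores` helper, its field names, and the example values are hypothetical, since the actual judging prompt and output format are not reproduced here.

```python
# Minimal sketch of the quality-gating step described above.
# Weights and the ~0.7 cutoff are from the text; field names and
# example scores are hypothetical illustrations.
from dataclasses import dataclass

WEIGHTS = {
    "instruction_compliance": 0.40,
    "editing_realism": 0.25,
    "preservation_balance": 0.20,
    "technical_quality": 0.15,
}
THRESHOLD = 0.7  # approximate acceptance threshold


@dataclass
class QualityScores:
    instruction_compliance: float
    editing_realism: float
    preservation_balance: float
    technical_quality: float

    def overall(self) -> float:
        # Weighted sum of the four judged criteria, each assumed to lie in [0, 1].
        return sum(WEIGHTS[name] * getattr(self, name) for name in WEIGHTS)


def is_successful(scores: QualityScores) -> bool:
    return scores.overall() >= THRESHOLD


# Example: 0.7*0.40 + 0.6*0.25 + 0.6*0.20 + 0.7*0.15 = 0.655, below the
# threshold, so this edit would be retained as a failure/preference sample.
example = QualityScores(0.7, 0.6, 0.6, 0.7)
print(example.overall(), is_successful(example))
```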

📊 Dataset Statistics

Pico-Banana-400K contains ~400K image-editing examples, covering a wide visual and semantic range drawn from real-world imagery.


🧭 Category Distribution

| Category | Description | Percentage |
|---|---|---|
| Object-Level Semantic | Add, remove, replace, or relocate objects | 35% |
| Scene Composition & Multi-Subject | Contextual and environmental transformations | 20% |
| Human-Centric | Edits involving clothing, expression, or appearance | 18% |
| Stylistic | Domain and artistic style transfer | 10% |
| Text & Symbol | Edits involving visible text, signs, or symbols | 8% |
| Pixel & Photometric | Brightness, contrast, and tonal adjustments | 5% |
| Scale & Perspective | Zoom, viewpoint, or framing changes | 2% |
| Spatial / Layout | Outpainting, composition, or canvas extension | 2% |

📂 Data Composition

  • Single-Turn SFT samples (successful edits): ~257K
  • Single-Turn Preference samples (failure cases): ~56K
  • Multi-Turn SFT samples (sequential edit sessions): ~72K
  • Gemini-generated instructions: concise, natural, and image-aware
  • Edit coverage: 35 edit types across 8 semantic categories
  • Image diversity: humans, objects, text-rich scenes, and more, drawn from Open Images

🖼️ Visualization

Below are representative examples from different categories:

| Category | Example |
|---|---|
| Object-Level | “Replace the red apple with a green one.” |
| Scene Composition | “Add sunlight streaming through the window.” |
| Human-Centric | “Change the person’s expression to smiling.” |
| Text & Symbol | “Uppercase the text on the billboard.” |
| Stylistic | “Convert the image to a Van Gogh painting style.” |

Pico-Banana-400K provides both breadth (diverse edit operations) and depth (quality-controlled multimodal supervision), making it a strong foundation for training and evaluating text-guided image editing models.

🧠 Applications

Pico-Banana-400K serves as a versatile resource for advancing controllable and instruction-aware image editing.
Beyond single-step editing, the dataset enables multi-turn, conversational editing and reward-based training paradigms.
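
As a rough illustration of the reward-based use case, the sketch below turns preference samples into chosen/rejected pairs suitable for DPO-style preference training. The record layout and field names (`instruction`, `source_image`, `chosen_edit`, `rejected_edit`) are hypothetical placeholders, not the released schema.

```python
# Sketch of building chosen/rejected pairs from Pico-Banana-400K preference
# samples for preference-based fine-tuning. Field names are hypothetical.
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class PreferenceSample:
    instruction: str     # the edit instruction
    source_image: str    # path or URL of the original Open Images photo
    chosen_edit: str     # edit that passed the quality gate (positive)
    rejected_edit: str   # edit that failed the quality gate (negative)


def to_preference_pairs(samples: Iterable[PreferenceSample]) -> Iterator[dict]:
    """Yield prompt/chosen/rejected dictionaries for a preference trainer."""
    for s in samples:
        yield {
            "prompt": {"instruction": s.instruction, "image": s.source_image},
            "chosen": s.chosen_edit,
            "rejected": s.rejected_edit,
        }
```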

📦 Dataset Download Guide

The Pico-Banana-400K dataset is hosted on Apple’s public CDN.
You can download each component (single-turn, multi-turn, and preference data) using the provided manifest files.


🖼️ 1. Single-Turn Edited Images

Manifest files: sft link and preference link

🖼️ 2. Multi-Turn Edited Images

Manifest file: multi-turn link

🖼️ 3. Source Images

URLs for downloading the source images are provided, alongside the edit instructions, in the sft link, preference link, and multi-turn link manifests above.
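
As a rough illustration of how a manifest might be consumed, the sketch below streams a JSONL manifest and downloads the referenced files with the standard library. The manifest format, field names (`source_image_url`, `edited_image_url`), and output layout are assumptions; adjust them to match the actual manifest contents.

```python
# Rough sketch of downloading images referenced by a Pico-Banana-400K manifest.
# Assumes a JSONL manifest (one record per line) with hypothetical field names;
# inspect the real manifest and adjust accordingly.
import json
import urllib.request
from pathlib import Path


def download(url: str, dest: Path) -> None:
    """Fetch a single file if it is not already present."""
    if dest.exists():
        return
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as f:
        f.write(resp.read())


def download_from_manifest(manifest_path: str, out_dir: str = "pico_banana") -> None:
    out = Path(out_dir)
    with open(manifest_path, "r", encoding="utf-8") as f:
        for i, line in enumerate(f):
            record = json.loads(line)
            # Hypothetical fields: the original Open Images photo and the edited result.
            for key in ("source_image_url", "edited_image_url"):
                url = record.get(key)
                if url:
                    download(url, out / key / f"{i:07d}.jpg")


# Example usage (the manifest filename is a placeholder):
# download_from_manifest("sft_manifest.jsonl")
```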

🧩 License

Pico-Banana-400K is released under the Creative Commons Attribution–NonCommercial–NoDerivatives (CC BY-NC-ND 4.0) license.

  ✅ Free for research and non-commercial use
  ❌ Commercial use and derivative redistribution are not permitted
  🖼️ Source images follow the Open Images (CC BY 2.0) license

By using this dataset, you agree to comply with the terms of both licenses.

📘 Citation

If you use 🍌 Pico-Banana-400K in your research, please cite it as follows:

@misc{qian2025picobanana,
  title        = {Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing},
  author       = {Yusu Qian and Eli Bocek-Rivele and Liangchen Song and Jiasen Lu and Jialing Tong and Yinfei Yang and Wenze Hu and Zhe Gan},
  year         = {2025},
  note         = {Dataset release (preprint / placeholder citation). Paper forthcoming.},
  url          = {https://github.com/apple/ml-pico-banana-400K},
}
