
This repository contains the Birds-to-Words dataset, a collection of paragraph-length descriptions of the differences between pairs of iNaturalist bird photographs.

The Birds-to-Words dataset was introduced in the paper:

Neural Naturalist: Generating Fine-Grained Image Comparisons
Maxwell Forbes, Christine Kaeser-Chen, Piyush Sharma, Serge Belongie
EMNLP 2019

Please see the Neural Naturalist project page for an overview of the research project and publication.

The data is provided in the file birds-to-words-v1.0.tsv in this repository.


Animal 1: A photograph of a brown and white bird (photo: John Ratzlaff, CC BY-NC-ND 4.0)
Animal 2: A photograph of a multicolored bird (photo: Jessica, CC BY-NC 4.0)

Comparative Descriptions (four different writers):

  1. Animal 1 is brown and white with a squatty body with a light brown head. Animal 2 is multi-colored with a light blue and black head.

  2. Animal 1 has a brown head and wings, with a pale breast. The breast also has darker brown speckles on it. Animal 2 has a bright blue area around its eye, with a black patch right along the eye. Animal 2 also has a darker brown breast and greenish wings and back of its head.

  3. Animal 1 has a brown and white face, animal 2 has a black and bright blue face. Animal 1 has a white breast with black spots, animal 2 has a brown breast. Animal 1 has brown wings, animal 2 has green wings.

  4. Animal 1 is much smaller and shorter. Animal 2 has a larger head and longer tail feathers. Animal 1 has extensive spotting on the neck, chest, and belly. Animal 2 has turquoise head patches and brown coloring on the chest and belly.

Data Format

The TSV file contains the following eleven tab-separated columns:

| Column | Name | Type | Description |
|---|---|---|---|
| 1 | img1ObservationURL | string | URL of the iNaturalist photo record (including metadata) corresponding to the left image in the pair |
| 2 | img1ImgURL | string | URL of the left image itself |
| 3 | img1Species | string | Scientific species name for the animal in the left image |
| 4 | img1Selection | string | How the left image was selected in the "pivot-branch" stratified sampling procedure described in the paper. Value is one of: {base, visual, sameSpecies, sameGenus, sameFamily, sameOrder, sameClass} |
| 5 | img2ObservationURL | string | URL of the iNaturalist photo record (including metadata) corresponding to the right image in the pair |
| 6 | img2ImgURL | string | URL of the right image itself |
| 7 | img2Species | string | Scientific species name for the animal in the right image |
| 8 | img2Selection | string | How the right image was selected in the "pivot-branch" stratified sampling procedure described in the paper. Value is one of: {base, visual, sameSpecies, sameGenus, sameFamily, sameOrder, sameClass} |
| 9 | split | string | Split for training models and reporting results. One of: {train, val, test} |
| 10 | annN | int | We collect up to five annotations of each image pair. This is the annotation number of this instance. Value is one of: {1,2,3,4,5} |
| 11 | description | string | A natural language paragraph describing the differences between the animals in the two photographs |
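
As a rough illustration of the format, the following Python sketch reads the eleven columns and collects the descriptions written for each image pair. It is not part of the dataset release. The helper names `read_rows` and `group_by_pair` are hypothetical, and the sketch rests on several assumptions: the column order matches the listing above, quote characters in descriptions are literal text (hence `csv.QUOTE_NONE`), and a header row may or may not be present, so one is skipped only if it matches the column names exactly.

```python
import csv
from collections import defaultdict

# Column names in the order given in the Data Format listing.
COLUMNS = [
    "img1ObservationURL", "img1ImgURL", "img1Species", "img1Selection",
    "img2ObservationURL", "img2ImgURL", "img2Species", "img2Selection",
    "split", "annN", "description",
]


def read_rows(lines):
    """Yield one dict per annotation from an iterable of TSV lines."""
    # Assumption: quotes inside descriptions are literal, not CSV quoting.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if not fields or fields == COLUMNS:
            continue  # skip blank lines and a header row, if one is present
        if len(fields) != len(COLUMNS):
            raise ValueError(
                f"expected {len(COLUMNS)} fields, got {len(fields)}"
            )
        row = dict(zip(COLUMNS, fields))
        row["annN"] = int(row["annN"])  # annotation number, 1 through 5
        yield row


def group_by_pair(rows):
    """Map (split, img1ImgURL, img2ImgURL) to descriptions ordered by annN."""
    grouped = defaultdict(list)
    for row in rows:
        key = (row["split"], row["img1ImgURL"], row["img2ImgURL"])
        grouped[key].append((row["annN"], row["description"]))
    return {key: [text for _, text in sorted(anns)]
            for key, anns in grouped.items()}
```

To try it on the released file, open `birds-to-words-v1.0.tsv` with `newline=""` and `encoding="utf-8"`, then pass the file object to `read_rows` and the result to `group_by_pair`. Keying on the two image URLs is a design choice of this sketch; the observation URLs would serve equally well if they identify a pair uniquely.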


The Birds-to-Words dataset is released under the Creative Commons Attribution-ShareAlike 4.0 International License. For the full license, see LICENSE.txt.