simpledataset

Utility tools for working with a simple image dataset format for vision tasks.

Features

See the summary of a dataset.
Convert from/to various dataset formats.
Command-line tools to split and concatenate datasets.
Command-line tools to modify labels.

simpledataset-labeler provides a web UI to view and edit a dataset.

Install

pip install simpledataset

Usage

# Show summary
dataset_summary <input_dataset>

# For Classification dataset, extract only the images that have the specified labels.
# For Detection dataset, extract only the boxes that have the specified labels.
dataset_filter <input_dataset> <output_dataset> [--include_class <class_id> [<class_id> ...]] [--exclude_class <class_id> [<class_id> ...]]

# Update class labels
dataset_map <input_dataset> <output_dataset> --map <src_class_id> <dst_class_id> [--map <src_class_id> <dst_class_id> [--map...]]

dataset_split # NYI

# Concatenate multiple datasets into one dataset.
dataset_concat <input_txt_filepath1> <input_txt_filepath2> [<input_txt_filepath3> ...] <output_txt_filepath>

dataset_shuffle # NYI

# Extract a small subset from the dataset randomly.
dataset_sample <input_txt_filepath> <output_filepath> [-n <num_images>]

# Re-package images and labels into new zip files.
dataset_pack <input_txt_filepath> <output_filepath> [--images_directory=<images_directory>] [--keep_empty_images]

# Remove labels with no actual data.
dataset_defrag <input_txt_filepath> <output_txt_filepath>

# Draw bounding boxes into images.
dataset_draw <input_txt_filepath> <output_dir>

# Convert from/to other dataset types. 
dataset_convert_from {coco|openimages_od|openimages_vr} ... <output_filepath>
dataset_convert_to <input_dataset> {coco|image_classification|object_detection} <output_filepath>

# Create an image-only dataset. Useful for labeling with simpledatasetlabeler.
dataset_create <image_filepaths> -o <output_filepath>

Examples

Please see CONVERT.md for the dataset conversion examples.

Change class ids

For example, to turn MNIST into an odd/even classification dataset, use the dataset_map command. In this example, class_id=0 stands for even numbers and class_id=1 for odd numbers. Classes 0 and 1 already have the desired ids, so only classes 2 through 9 are mapped.

dataset_map mnist.txt new_dataset.txt --map 2 0 --map 3 1 --map 4 0 --map 5 1 --map 6 0 --map 7 1 --map 8 0 --map 9 1
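Typing a long list of --map pairs by hand is error-prone, so it can help to generate them. The following is a minimal sketch, not part of simpledataset: build_map_args is a hypothetical helper, and it assumes (as the command above suggests) that class ids without a --map entry are left unchanged, so identity mappings are skipped.

```python
from typing import Dict, List


def build_map_args(mapping: Dict[int, int]) -> List[str]:
    """Turn {src_class_id: dst_class_id} into '--map src dst' arguments.

    Hypothetical helper for illustration; it only formats arguments.
    """
    args: List[str] = []
    for src, dst in sorted(mapping.items()):
        if src != dst:  # assumption: unmapped ids keep their value
            args += ['--map', str(src), str(dst)]
    return args


# Even digits go to class 0, odd digits go to class 1.
odd_even = {digit: digit % 2 for digit in range(10)}
command = ['dataset_map', 'mnist.txt', 'new_dataset.txt'] + build_map_args(odd_even)
print(' '.join(command))
```

The printed line is the same dataset_map command as above. The resulting list can also be passed to subprocess.run if you prefer to launch the tool from Python.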

Concatenate two datasets

For example, if you have two datasets (mnist_subset and mnist_subset2) and want to combine them, use the dataset_concat command.

dataset_concat mnist_subset/images.txt mnist_subset2/images.txt new_combined.txt

# new_combined.txt has 20 classes at this point. Let's merge them into 10 classes.
dataset_map new_combined.txt new_mapped_10.txt --map 10 0 --map 11 1 --map 12 2 --map 13 3 --map 14 4 --map 15 5 --map 16 6 --map 17 7 --map 18 8 --map 19 9

Python APIs

import pathlib

from simpledataset import SimpleDatasetFactory, DatasetWriter

# Load a dataset
images_filepath = pathlib.Path('images.txt')
dataset = SimpleDatasetFactory().load(images_filepath)

# Save a dataset
output_filepath = pathlib.Path('output.txt')
DatasetWriter().write(dataset, output_filepath)

SIMPLE Dataset format

Currently there are 2 dataset formats, Image Classification and Object Detection. Both formats consist of a single main txt file, image files, and an optional list of label names (labels.txt). In addition, Object Detection datasets have label files that contain bounding box information.

Image Classification

The main txt format is:

<file> ::= <txt_line> ('\n' <txt_line>)*
<txt_line> ::= <image_filepath> ' ' <labels>
<image_filepath> ::= <filepath> | <zip_filepath> '@' <entry_name>
<labels> ::= <class_id> (',' <class_id>)*

Here is an example txt file.

train_images.zip@0.jpg 0
train_images2.zip@1.jpg 1
image.png 0,1
image2.bmp 0,1,2,3
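To make the grammar concrete, here is a minimal sketch of a parser for the main txt file of an Image Classification dataset. It is independent of the simpledataset package: ClassificationEntry and parse_classification_line are hypothetical names, and the sketch assumes that every line carries at least one class id and that '@' appears only as the zip separator.

```python
from typing import List, NamedTuple, Optional


class ClassificationEntry(NamedTuple):
    path: str                 # plain file path, or the zip file path
    zip_entry: Optional[str]  # entry name inside the zip, None for plain files
    class_ids: List[int]


def parse_classification_line(line: str) -> ClassificationEntry:
    """Parse one '<image_filepath> <labels>' line (hypothetical helper)."""
    # The labels are the last space-separated field on the line.
    image_field, labels_field = line.strip().rsplit(' ', 1)
    # '<zip_filepath>@<entry_name>' points at a file stored inside a zip.
    path, separator, entry = image_field.partition('@')
    class_ids = [int(class_id) for class_id in labels_field.split(',')]
    return ClassificationEntry(path, entry if separator else None, class_ids)


example = """train_images.zip@0.jpg 0
train_images2.zip@1.jpg 1
image.png 0,1
image2.bmp 0,1,2,3"""

entries = [parse_classification_line(line) for line in example.splitlines()]
for entry in entries:
    print(entry)
```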

Object Detection

The main txt format is:

<file> ::= <txt_line> ('\n' <txt_line>)*
<txt_line> ::= <image_filepath> ' ' <label_filepath>
<image_filepath> ::= <filepath> | <zip_filepath> '@' <entry_name>
<label_filepath> ::= <filepath> | <zip_filepath> '@' <entry_name>
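Both the image path and the label path may point into a zip archive. The sketch below shows one way to read either form using only the standard library. read_dataset_file is a hypothetical helper, and resolving paths relative to a base directory (for example, the directory of the main txt file) is an assumption, not something this document specifies.

```python
import pathlib
import zipfile


def read_dataset_file(path_spec: str, base_dir: pathlib.Path) -> bytes:
    """Read '<filepath>' or '<zip_filepath>@<entry_name>' as raw bytes.

    Hypothetical helper; base_dir is an assumed resolution root.
    """
    zip_name, separator, entry_name = path_spec.partition('@')
    if separator:
        # The part before '@' is the zip file, the part after is the entry.
        with zipfile.ZipFile(base_dir / zip_name) as zip_file:
            return zip_file.read(entry_name)
    return (base_dir / path_spec).read_bytes()
```

For instance, read_dataset_file('labels.zip@0.txt', pathlib.Path('.')) would return the bytes of the entry 0.txt stored in labels.zip in the current directory.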

The format of a label file is:

<file> ::= <label_line> ('\n' <label_line>)*
<label_line> ::= <class_id> ' ' <bbox_x_min> ' ' <bbox_y_min> ' ' <bbox_x_max> ' ' <bbox_y_max>
<class_id> ::= <int>
<bbox_x_min> ::= <int>      ; 0 <= <bbox_x_min> < <bbox_x_max> <= <image_width>
<bbox_y_min> ::= <int>      ; 0 <= <bbox_y_min> < <bbox_y_max> <= <image_height>
<bbox_x_max> ::= <int>
<bbox_y_max> ::= <int>
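The constraints in the comments above are easy to check while parsing. Here is a minimal sketch of a label file parser that enforces them. Box and parse_label_file are hypothetical names and not part of the simpledataset API; the image size has to be supplied by the caller, and skipping blank lines is an assumption made for convenience.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    class_id: int
    x_min: int
    y_min: int
    x_max: int
    y_max: int


def parse_label_file(text: str, image_width: int, image_height: int) -> List[Box]:
    """Parse an Object Detection label file and validate each box."""
    boxes = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: tolerate blank lines
        fields = line.split()
        if len(fields) != 5:
            raise ValueError(f'line {line_number}: expected 5 fields, got {len(fields)}')
        box = Box(*(int(field) for field in fields))
        # 0 <= min < max <= image size, as stated in the grammar comments.
        if not 0 <= box.x_min < box.x_max <= image_width:
            raise ValueError(f'line {line_number}: invalid x range')
        if not 0 <= box.y_min < box.y_max <= image_height:
            raise ValueError(f'line {line_number}: invalid y range')
        boxes.append(box)
    return boxes


boxes = parse_label_file('0 10 20 110 220\n1 0 0 640 480', 640, 480)
```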

Visual Relationship

The main txt format is the same as for Object Detection.

The format of a label file is:

<file> ::= <label_line> ('\n' <label_line>)*
<label_line> ::= <subject_id> ' ' <subject_bbox_x_min> ' ' <subject_bbox_y_min> ' ' <subject_bbox_x_max> ' ' <subject_bbox_y_max> ' ' <object_id> ' ' <object_bbox_x_min> ' ' <object_bbox_y_min> ' ' <object_bbox_x_max> ' ' <object_bbox_y_max> ' ' <predicate_id>
<subject_id> ::= <int>
<object_id> ::= <int>
<predicate_id> ::= <int>
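Each label line therefore holds 11 integers: a subject id with its box, an object id with its box, and a predicate id. A minimal sketch of a parser follows. Relationship and parse_relationship_line are hypothetical names, and the sketch assumes the box fields are integers as in the Object Detection format.

```python
from typing import NamedTuple, Tuple

BBox = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


class Relationship(NamedTuple):
    subject_id: int
    subject_bbox: BBox
    object_id: int
    object_bbox: BBox
    predicate_id: int


def parse_relationship_line(line: str) -> Relationship:
    """Parse one Visual Relationship label line (hypothetical helper)."""
    values = [int(value) for value in line.split()]
    if len(values) != 11:
        raise ValueError(f'expected 11 fields, got {len(values)}')
    return Relationship(
        subject_id=values[0],
        subject_bbox=(values[1], values[2], values[3], values[4]),
        object_id=values[5],
        object_bbox=(values[6], values[7], values[8], values[9]),
        predicate_id=values[10],
    )


relationship = parse_relationship_line('3 10 20 110 220 7 50 60 150 260 2')
```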

Usage for remote datasets

Not yet implemented (NYI). This tool will allow you to use datasets on Azure Blob Storage and to update a dataset on the storage efficiently.

# To download a dataset from Azure Blob Storage.
dataset_download <url_with_container_sas> <output_dir>
