This file presents two examples of how to add new datasets into armory-library.

## Torchvision

The [SAMPLE (Synthetic and Measured Paired Labeled Experiment)](https://github.com/benjaminlewis-afrl/SAMPLE_dataset_public) dataset consists of measured SAR imagery from the MSTAR collection (Moving and Stationary Target Acquisition and Recognition) paired with synthetic SAR imagery.

The MSTAR dataset contains SAR imagery of 10 types of military vehicles illustrated in the figure below.

![MSTAR classes](./assets/MSTAR-classes.png)

[Anas, H., Majdoulayne, H., Chaimae, A., & Nabil, S. M. (2020). Deep learning for sar image classification. In Intelligent Systems and Applications: Proceedings of the 2019 Intelligent Systems Conference (IntelliSys) Volume 1 (pp. 890-898). Springer International Publishing.](https://link.springer.com/chapter/10.1007/978-3-030-29516-5_67)

The SAMPLE dataset is organized according to the `ImageFolder` pattern. The imagery is provided in two normalizations, decibel and quarter power magnitude (QPM).
For each normalization, the real and synthetic grayscale SAR imagery is partitioned into folders by vehicle type.
```
|-SAMPLE_dataset_public
| |-png_images
| | |-qpm
| | | |-real
| | | | |-m1
| | | | |-t72
| | | | |-btr70
| | | | |-m548
| | | | |-zsu23
| | | | |-bmp2
| | | | |-m35
| | | | |-m2
| | | | |-m60
| | | | |-2s1
```
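
Before building the dataset, the expected layout can be confirmed with a quick directory listing. This is a minimal check, assuming the SAMPLE repository has been cloned (or its PNG images copied) under `/tmp`:

```python
from pathlib import Path

data_dir = Path('/tmp') / 'SAMPLE_dataset_public' / 'png_images' / 'qpm' / 'real'

# each subdirectory holds one vehicle class; ImageFolder will use these names as labels
print(sorted(p.name for p in data_dir.iterdir() if p.is_dir()))
```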

For a Torchvision dataset, we load the dataset using the `ImageFolder` dataset class, which automatically infers
the class labels from the directory names. The `transform` parameter applies a chain of transformations
that resize, normalize, and output the images as numpy arrays.

```python
from pathlib import Path

import numpy as np
import torchvision as tv
from torchvision import transforms as T

tmp_dir = Path('/tmp')
sample_dir = tmp_dir / Path('SAMPLE_dataset_public')
data_dir = sample_dir / Path("png_images", "qpm", "real")

tv_dataset = tv.datasets.ImageFolder(
    root=data_dir,
    transform=T.Compose(
        [
            T.Resize(size=(224, 224)),
            T.ToTensor(),  # HWC -> CHW and scales pixel values to [0, 1]
            T.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)),
            T.Lambda(np.asarray),
        ]
    ),
)
```
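
As a quick sanity check (not part of the original walkthrough), an individual sample can be inspected directly; with the transform chain above, each image comes back as a float numpy array of shape `(3, 224, 224)` scaled to roughly [-1, 1]:

```python
image, label = tv_dataset[0]

print(tv_dataset.classes)               # class names inferred from the folder names
print(image.shape, image.dtype, label)  # e.g. (3, 224, 224), float32, and an integer label
```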

Next, we use scikit-learn's [`train_test_split`](https://scikit-learn.org/dev/modules/generated/sklearn.model_selection.train_test_split.html)
function to generate stratified train and test splits based on the dataset target classes.
```python
from sklearn.model_selection import train_test_split
from torch.utils.data import Subset

# generate stratified indices: we split integer indices rather than the data itself
train_indices, test_indices, _, _ = train_test_split(
    range(len(tv_dataset)),
    tv_dataset.targets,
    stratify=tv_dataset.targets,
    test_size=0.25,
)

# build the train and test subsets from the indices
train_split = Subset(tv_dataset, train_indices)
test_split = Subset(tv_dataset, test_indices)
```
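
To confirm that the split is stratified, the class frequencies of the two subsets can be compared; this quick check is not part of the original example:

```python
from collections import Counter

# class proportions should be approximately equal across the two splits
train_counts = Counter(tv_dataset.targets[i] for i in train_indices)
test_counts = Counter(tv_dataset.targets[i] for i in test_indices)
print(train_counts)
print(test_counts)
```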

Next, we wrap the training split into an armory-library dataset with the `TupleDataset` class.
```python
import armory.dataset

armory_dataset = armory.dataset.TupleDataset(train_split, ("image", "label"))
```
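
Assuming `TupleDataset` simply maps the tuple entries onto the given keys (an assumption here; consult the armory-library API for the exact return type), a single sample can be spot-checked:

```python
# hypothetical spot check; assumes each sample is returned as a key/value mapping
sample = armory_dataset[0]
print(sample["image"].shape, sample["label"])
```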

Finally, we use the tuple dataset above to define an `ImageClassificationDataLoader` and an evaluation dataset.
Note that the armory-library `normalized_scale` must match the normalization transform applied in the
Torchvision dataset's `transform` chain.
```python
import armory.data
import armory.evaluation

# must match the T.Normalize(mean, std) applied by the Torchvision dataset above
normalized_scale = armory.data.Scale(
    dtype=armory.data.DataType.FLOAT,
    max=1.0,
    mean=(0.5, 0.5, 0.5),
    std=(0.5, 0.5, 0.5),
)

batch_size = 16
shuffle = False

dataloader = armory.dataset.ImageClassificationDataLoader(
    armory_dataset,
    dim=armory.data.ImageDimensions.CHW,
    scale=normalized_scale,
    image_key="image",
    label_key="label",
    batch_size=batch_size,
    shuffle=shuffle,
)

evaluation_dataset = armory.evaluation.Dataset(
    name="MSTAR-qpm-real",
    dataloader=dataloader,
)
```
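
As a final sanity check, assuming `ImageClassificationDataLoader` follows the usual PyTorch `DataLoader` protocol (an assumption, not something stated above), the number of batches can be inspected:

```python
# with shuffle disabled, expect roughly len(train_split) / batch_size batches
print(f"{len(train_split)} samples in {len(dataloader)} batches of size {batch_size}")
```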

## Hugging Face

To demonstrate a new Hugging Face dataset, we load the [VisDrone2019 dataset](https://github.com/VisDrone/VisDrone-Dataset) object detection dataset.
The VisDrone2019 dataset, created by the AISKYEYE team at Tianjin University, China, includes 288 video clips and 10,209 images from various drones,
providing a comprehensive benchmark with over 2.6 million manually annotated bounding boxes for objects like pedestrians and vehicles across diverse
conditions and locations.

As a first step, we download the [validation split](https://drive.google.com/file/d/1bxK5zgLn0_L8x276eKkuYA_FzwCIjb59/view?usp=sharing) to a temporary directory.
Note that we do not need to unzip the archive for processing as a Hugging Face dataset.
```python
tmp_dir = Path('/tmp')
visdrone_dir = tmp_dir / Path('visdrone_2019')
visdrone_dir.mkdir(exist_ok=True)

visdrone_val_zip = visdrone_dir / Path('VisDrone2019-DET-val.zip')
```
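
The archive itself can be fetched in any convenient way; one option, shown here purely as an illustration, is the `gdown` package, which downloads Google Drive files by ID (the ID below is taken from the share link above):

```python
import gdown

# download the archive only if it is not already present
if not visdrone_val_zip.exists():
    gdown.download(id='1bxK5zgLn0_L8x276eKkuYA_FzwCIjb59', output=str(visdrone_val_zip))
```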
The VisDrone 2019 Task 1 dataset is organized as parallel folders of images and annotations, containing paired image and annotation files.
We then need to designate the object categories and name the fields in the annotation files.
```python
CATEGORIES = [
    'ignored',
    # ...
]
```
