Sample order and ID #2516

wangjiawen2013 · 2024-11-20T09:05:22Z

Hi,
When training or testing on a custom dataset, how does Burn determine the sample order? For example, here is the directory tree of cifar10:

When testing a model with this custom dataset (PNG format, for teaching and tutorials), I can get the predicted results, but how do I pair the results with the samples? Can Burn output both the sample path and the predicted result together?
For a Hugging Face dataset such as MNIST, we don't even know the sample IDs and cannot see the images, so it is also necessary to output the sample names!

laggui · 2024-11-20T19:20:52Z

The ImageFolderDataset items are collected by globbing the directory to find the files that match the image extension pattern. As indicated by the globwalk doc, the order of elements yielded is unspecified. This means the order is not guaranteed.

The current implementation also discards the original image path for an ImageDatasetItem (mapper implementation here). The item only contains the parsed data (i.e., image data and label for the image classification case).

If you'd like to keep this information for the use case you described, the ImageDatasetItem would have to be modified to keep the image path in an additional field. And then to keep this information within the batch, the batcher implementation would have to be adapted as well.

Perhaps the changes to ImageDatasetItem could be incorporated in Burn, not sure how important that is though.

wangjiawen2013 · 2024-11-21T01:39:47Z

What does "the order is not guaranteed" mean? Does it mean that the order of elements yielded by globwalk is random and will differ when I re-run it?
I think it is important to output the item paths. As the modification is complicated, is there a workaround so that I can output them myself (for example, by running globwalk again and printing the results)?

laggui · 2024-11-21T15:18:54Z

Ahhh sorry, I totally forgot one implementation detail: we actually sort the paths, so the items will always come out in sorted order (as long as you don't shuffle with the dataloader).

You could double check that the data is always the same to validate. But the paths will still not be accessible.
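As a workaround along those lines, here is a minimal sketch that lists the image files yourself, sorts them, and pairs them with predictions by index. It uses only the standard library. The function names, the extension list, and the idea that `PathBuf` ordering matches the order Burn ends up with are all assumptions, so verify the pairing on a few samples before relying on it.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Image extensions to keep (an assumption; adjust to match your dataset).
const EXTENSIONS: [&str; 3] = ["png", "jpg", "jpeg"];

/// Returns true if the file extension matches one of `EXTENSIONS`, ignoring case.
fn has_image_extension(path: &Path) -> bool {
    path.extension()
        .and_then(|ext| ext.to_str())
        .map(|ext| EXTENSIONS.iter().any(|e| ext.eq_ignore_ascii_case(e)))
        .unwrap_or(false)
}

/// Recursively collects image files under `root`, then sorts them so the
/// order is the same on every run (directory listing order is not).
fn sorted_image_paths(root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut paths = Vec::new();
    let mut stack = vec![root.to_path_buf()];
    while let Some(dir) = stack.pop() {
        for entry in fs::read_dir(&dir)? {
            let path = entry?.path();
            if path.is_dir() {
                stack.push(path);
            } else if has_image_extension(&path) {
                paths.push(path);
            }
        }
    }
    paths.sort();
    Ok(paths)
}

/// Pairs each prediction with the path at the same index.
/// Only meaningful if the predictions were produced without shuffling.
fn pair_with_paths<T: Clone>(paths: &[PathBuf], predictions: &[T]) -> Vec<(PathBuf, T)> {
    assert_eq!(paths.len(), predictions.len(), "expected one prediction per image");
    paths.iter().cloned().zip(predictions.iter().cloned()).collect()
}
```

Calling `sorted_image_paths` on the dataset root and passing the result to `pair_with_paths` together with the predictions (collected in dataset order) gives one `(path, prediction)` tuple per sample.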

wangjiawen2013 · 2024-11-22T10:23:49Z

I copied the globbing code and output the image paths. Don't you need the image paths? If not, how do you pair your predictions with the images?

laggui · 2024-11-22T19:31:17Z

Yeah, the image paths are used to get the corresponding image data and ground-truth label! The list of image/annotation pairs is stored in a vec of ImageDatasetItemRaw items. And as I mentioned before, each item is mapped to an ImageDatasetItem that contains the image data (read from the path) and annotation. So the original image path is not preserved beyond that point.

wangjiawen2013 · 2024-11-23T03:59:56Z

I mean that when performing inference, we need to know the image ID or path. For example, in Burn's guide example, infer.rs gives the prediction result, but how do we get the image ID for each prediction? ![image](https://github.com/user-attachments/assets/22e4ede5-71f8-4f60-b164-359fb289cc03)

wangjiawen2013 · 2024-11-23T08:35:11Z

Especially when we use customized datasets for inference, we need to know the image id corresponding to the prediction result. ![image](https://github.com/user-attachments/assets/f75eeebc-fd18-4426-8479-342aee6c77ef)

laggui · 2024-11-25T13:07:40Z

As I said earlier, the image id (i.e., path to the source) is discarded when reading the image data.

This happens specifically within the mapper that transforms an ImageDatasetItemRaw, which contains the image_path and "raw" annotation fields, to an ImageDatasetItem containing the image and annotation data.

There is no way with the current implementation to preserve that information because it is not currently kept as a dataset item field. But you could easily adapt the code to simply have something like this in your implementation:

/// Modified image dataset item that preserves the image source field.
#[derive(Debug, Clone, PartialEq)]
pub struct ImageDatasetItem {
    /// Image as a vector with a valid image type.
    pub image: Vec<PixelDepth>,

    /// Annotation for the image.
    pub annotation: Annotation,

    /// Original image source.
    pub image_path: String,
}

impl Mapper<ImageDatasetItemRaw, ImageDatasetItem> for PathToImageDatasetItem {
    /// Convert a raw image dataset item (path-like) to a 3D image array with a target label.
    fn map(&self, item: &ImageDatasetItemRaw) -> ImageDatasetItem {
        let annotation = parse_image_annotation(&item.annotation, &self.classes);

        // Load image from disk
        let image = image::open(&item.image_path).unwrap();

        // Image as Vec<PixelDepth>
        let img_vec = match image.color() {
           // ...
        };

        ImageDatasetItem {
            image: img_vec,
            annotation,
            // Keep the image source as a field
            image_path: item.image_path.display().to_string(),
        }
    }
}
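
To carry the path all the way to the prediction, the batch has to keep it as well. Below is a rough, dependency-free sketch of that idea. `Item`, `Batch`, `batch`, and `report` are hypothetical stand-ins rather than Burn's actual types or its batcher trait, and plain vectors take the place of tensors.

```rust
/// Hypothetical stand-in for a dataset item that keeps its source path.
#[derive(Debug, Clone)]
struct Item {
    image: Vec<f32>,
    label: usize,
    image_path: String,
}

/// Hypothetical batch: flattened pixel data plus one label and one path per sample.
#[derive(Debug)]
struct Batch {
    images: Vec<f32>,
    targets: Vec<usize>,
    image_paths: Vec<String>,
}

/// Builds a batch while preserving item order, so index `i` of
/// `image_paths` always refers to sample `i` of the batch.
fn batch(items: Vec<Item>) -> Batch {
    let mut out = Batch {
        images: Vec::new(),
        targets: Vec::with_capacity(items.len()),
        image_paths: Vec::with_capacity(items.len()),
    };
    for item in items {
        out.images.extend(item.image);
        out.targets.push(item.label);
        out.image_paths.push(item.image_path);
    }
    out
}

/// Formats one line per sample, pairing each path with its predicted class.
fn report(batch: &Batch, predictions: &[usize]) -> Vec<String> {
    batch
        .image_paths
        .iter()
        .zip(predictions)
        .map(|(path, pred)| format!("{path}: predicted {pred}"))
        .collect()
}
```

Because the paths travel inside the batch, the pairing stays correct even when the dataloader shuffles, which is not true of pairing by position against a separately sorted file list.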

wangjiawen2013 · 2024-11-27T02:32:37Z

Yes, you're right. I forked Burn and modified ImageDatasetItem and the batch. The output looks good, and this is exactly what I want!
Why not add this field to Burn? It would be a great convenience for users.

laggui · 2024-11-27T14:13:47Z

I don't have any opposition to this addition.

If you already made the changes you could make a PR from your fork 🙂

laggui mentioned this issue Nov 28, 2024

images source #2558

Merged

2 tasks

laggui closed this as completed Nov 28, 2024
