Feat/2361 segmentation mask #2426
Conversation
Codecov Report

Attention: Patch coverage is

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #2426      +/-   ##
==========================================
- Coverage   84.88%   82.81%   -2.08%
==========================================
  Files         769      809      +40
  Lines       98557   104320    +5763
==========================================
+ Hits        83660    86391    +2731
- Misses      14897    17929    +3032
```

☔ View full report in Codecov by Sentry.
Thanks for taking the lead on this 😄 I've been busy with the latest release, but I should be able to review the changes this week!
I'm happy to get it started and become a regular contributor in the future, but I don't think this PR is ready to merge. I'd be happy to take any suggestions and keep working on it, including:
No worries! In this case, I converted this to a draft PR and when you're ready just mark it as ready for review and/or ping me 🙂
That would be awesome! If you want to keep it simple for the current PR, I would start with the dataset implementation. Then, the examples can be added in follow-up PRs 👍
For completeness, the Python code used to generate the new images and masks is below:

```python
"""Synthetic image segmentation data generation."""
from typing import Tuple
from pathlib import Path

import numpy as np
import cv2


def save_image_to_text_file(image: np.ndarray, filename: Path) -> None:
    """Saves a 2D image as a text file with space-delimited pixel values."""
    with open(filename, 'w') as file:
        for row in image:
            file.write(' '.join(str(pixel) for pixel in row) + '\n')


def rgb_to_grayscale(rgb: Tuple[int, int, int]) -> int:
    """Converts an RGB triple to its grayscale equivalent.

    Args:
        rgb: RGB values in the range 0 - 255; (R, G, B).

    Returns:
        An integer representing the grayscale intensity.
    """
    return int(np.round(0.299 * rgb[0] + 0.587 * rgb[1] + 0.114 * rgb[2]))


def checkerboard_pattern(
        height: int,
        width: int,
        color1: Tuple[int, int, int],
        color2: Tuple[int, int, int]) -> Tuple[np.ndarray, np.ndarray]:
    """Creates a two-color checkerboard image and its per-pixel class mask."""
    source_image = np.zeros((height, width, 3), dtype=np.uint8)
    mask = np.zeros((height, width, 3), dtype=np.uint8)
    m1 = 1  # class id for color1 (rgb_to_grayscale(color1) is an alternative)
    m2 = 2  # class id for color2
    for i in range(height):
        for j in range(width):
            if (i + j) % 2 == 0:
                source_image[i, j] = color1
                mask[i, j] = (m1, m1, m1)
            else:
                source_image[i, j] = color2
                mask[i, j] = (m2, m2, m2)
    return source_image, mask


def random_distribution_2colors(
        height: int,
        width: int,
        color1: Tuple[int, int, int],
        color2: Tuple[int, int, int],
        random_seed: int = 42) -> Tuple[np.ndarray, np.ndarray]:
    """Creates an image whose pixels are randomly one of two colors, plus its mask."""
    source_image = np.zeros((height, width, 3), dtype=np.uint8)
    m1 = 1  # class id for color1
    m2 = 2  # class id for color2
    np.random.seed(random_seed)  # for reproducibility
    random_mask = np.random.choice([m1, m2], size=(height, width))
    for i in range(height):
        for j in range(width):
            source_image[i, j] = color1 if random_mask[i, j] == m1 else color2
    mask = np.empty_like(source_image)
    mask[:, :, 0] = random_mask
    mask[:, :, 1] = random_mask
    mask[:, :, 2] = random_mask
    return source_image, mask


def random_distribution_3colors(
        height: int,
        width: int,
        color1: Tuple[int, int, int],
        color2: Tuple[int, int, int],
        color3: Tuple[int, int, int],
        random_seed: int = 42) -> Tuple[np.ndarray, np.ndarray]:
    """Creates an image whose pixels are randomly one of three colors, plus its mask."""
    source_image = np.zeros((height, width, 3), dtype=np.uint8)
    np.random.seed(random_seed)  # for reproducibility
    m1 = 1  # class id for color1
    m2 = 2  # class id for color2
    m3 = 3  # class id for color3
    random_mask = np.random.choice([m1, m2, m3], size=(height, width))
    for i in range(height):
        for j in range(width):
            if random_mask[i, j] == m1:
                source_image[i, j] = color1
            elif random_mask[i, j] == m2:
                source_image[i, j] = color2
            else:
                source_image[i, j] = color3
    mask = np.empty_like(source_image)
    mask[:, :, 0] = random_mask
    mask[:, :, 1] = random_mask
    mask[:, :, 2] = random_mask
    return source_image, mask


if __name__ == "__main__":
    # Image dimensions
    IMAGE_HEIGHT, IMAGE_WIDTH = 8, 8

    # Colors in RGB
    CRIMSON = (220, 20, 60)
    TEAL = (0, 128, 128)
    AQUA = (0, 255, 255)
    TURQUOISE = (64, 224, 208)
    MAGENTA = (255, 0, 255)
    ORCHID = (218, 112, 214)
    BURLY_WOOD = (222, 184, 135)

    image_chkr, mask_chkr = checkerboard_pattern(
        height=IMAGE_HEIGHT,
        width=IMAGE_WIDTH,
        color1=CRIMSON,
        color2=AQUA,
    )
    image_rnd2, mask_rnd2 = random_distribution_2colors(
        height=IMAGE_HEIGHT,
        width=IMAGE_WIDTH,
        color1=MAGENTA,
        color2=TEAL,
        random_seed=42,
    )
    image_rnd3, mask_rnd3 = random_distribution_3colors(
        height=IMAGE_HEIGHT,
        width=IMAGE_WIDTH,
        color1=TURQUOISE,
        color2=ORCHID,
        color3=BURLY_WOOD,
        random_seed=42,
    )

    # ----- Save the results to disk -----
    results_path = Path(__file__).parent.parent.joinpath('results8x8')
    assert results_path.exists(), "The results directory does not exist. Please create it."

    # NOTE: OpenCV uses the Blue, Green, Red channel order on reads and writes,
    # so convert the RGB arrays to BGR before writing; the PNGs on disk then
    # show the intended colors. cv2.imwrite also expects a str filename, so
    # the Path objects are converted explicitly.
    image_chkr = cv2.cvtColor(image_chkr, cv2.COLOR_RGB2BGR)
    cv2.imwrite(str(results_path.joinpath("image_checkerboard.png")), image_chkr)
    cv2.imwrite(str(results_path.joinpath("mask_checkerboard.png")), mask_chkr)
    image_rnd2 = cv2.cvtColor(image_rnd2, cv2.COLOR_RGB2BGR)
    cv2.imwrite(str(results_path.joinpath("image_random_2colors.png")), image_rnd2)
    cv2.imwrite(str(results_path.joinpath("mask_random_2colors.png")), mask_rnd2)
    image_rnd3 = cv2.cvtColor(image_rnd3, cv2.COLOR_RGB2BGR)
    cv2.imwrite(str(results_path.joinpath("image_random_3colors.png")), image_rnd3)
    cv2.imwrite(str(results_path.joinpath("mask_random_3colors.png")), mask_rnd3)

    # Save the mask array data to a space-delimited text file
    mask_chkr = mask_chkr[..., 0]  # 3-channel mask to 2D
    save_image_to_text_file(mask_chkr, results_path.joinpath("mask_checkerboard.txt"))
    mask_rnd2 = mask_rnd2[..., 0]
    save_image_to_text_file(mask_rnd2, results_path.joinpath("mask_random_2colors.txt"))
    mask_rnd3 = mask_rnd3[..., 0]
    save_image_to_text_file(mask_rnd3, results_path.joinpath("mask_random_3colors.txt"))
```
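Since the Rust tests presumably compare against these text masks, a small stdlib-only Rust sketch of reading one back may be useful. The helper name `load_mask_text` is hypothetical and not part of the PR; it just assumes the whitespace-delimited integer layout written above.

```rust
use std::fs;
use std::path::Path;

// Illustrative only: parse a whitespace-delimited integer mask file
// (as written by save_image_to_text_file above) into rows of class ids.
fn load_mask_text(path: &Path) -> std::io::Result<Vec<Vec<usize>>> {
    let content = fs::read_to_string(path)?;
    Ok(content
        .lines()
        .filter(|l| !l.trim().is_empty())
        .map(|l| l.split_whitespace().map(|t| t.parse().unwrap()).collect())
        .collect())
}
```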
Sorry for the delayed review!
On the right path 🙂 But I have a couple of comments on the current implementation.
I don't think it is desirable to have to provide the segmentation masks directly when loading a dataset. For image segmentation datasets with a lot of items (and larger image sizes), this won't fit into memory. I think we should instead take paths to the masks and parse them when accessing an item.
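As a rough, framework-free sketch of that idea (these names are illustrative, not the actual burn-dataset types): the raw annotation would hold only the mask's path, and decoding would be deferred to item access.

```rust
use std::path::PathBuf;

// Illustrative only: the raw annotation stores a path instead of pixel data,
// so mask decoding can happen lazily when an item is fetched.
#[allow(dead_code)]
enum AnnotationRaw {
    Label(String),
    MultiLabel(Vec<String>),
    SegmentationMaskPath(PathBuf),
}

impl AnnotationRaw {
    // Hypothetical accessor: image decoding would be triggered from here,
    // on item access, rather than at dataset construction time.
    fn mask_path(&self) -> Option<&PathBuf> {
        match self {
            AnnotationRaw::SegmentationMaskPath(p) => Some(p),
            _ => None,
        }
    }
}
```

This keeps the dataset's memory footprint proportional to the number of paths, not the number of mask pixels.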
```diff
@@ -104,7 +104,8 @@ pub struct ImageDatasetItem
 enum AnnotationRaw {
     Label(String),
     MultiLabel(Vec<String>),
     // TODO: bounding boxes and segmentation mask
+    SegmentationMask(Vec<String>),
```
I think it would be preferable to have the raw form of a segmentation mask point to the mask image path instead. That way, the masks would only be loaded when an item is fetched from the dataset.
```rust
/// Create an image segmentation dataset with the specified items.
///
/// # Arguments
///
/// * `items` - List of dataset items, each item represented by a tuple `(image path, labels)`.
/// * `classes` - Dataset class names.
///
/// # Returns
/// A new dataset instance.
pub fn new_segmentation_with_items<P: AsRef<Path>, S: AsRef<str>>(
    items: Vec<(P, SegmentationMask)>,
    classes: &[S],
) -> Result<Self, ImageLoaderError> {
```
With the suggested changes from my previous comment, it would make more sense (and probably be more practical from a user standpoint) to have a `new_segmentation` method that instead takes a list of `(image path, mask path)` pairs (so `Vec<(P, P)>`).

We still need the method to accept the class names, which can be used to map to a class id, similar to what is already done. These identifiers would be used to map a pixel value to a class.
For added flexibility we could provide a way to have specific pixel values map to a class.
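A minimal sketch of that flexible mapping (the function name and the override table are assumptions, not burn APIs): by default a pixel value maps to itself as a class id, and a user-supplied table overrides specific values.

```rust
use std::collections::HashMap;

// Illustrative only: map raw mask pixel values to class ids.
// Default: a pixel value maps to itself; the table overrides specific values.
fn map_pixels_to_classes(pixels: &[u8], table: &HashMap<u8, usize>) -> Vec<usize> {
    pixels
        .iter()
        .map(|p| table.get(p).copied().unwrap_or(*p as usize))
        .collect()
}
```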
Suggested changes made, along with a few comments and questions:
That's great! Most of my comments have been addressed; what's left is basically changes relevant to the questions you asked.
Also, to answer:

> The last line of the method `pub fn new_segmentation_with_items()` calls `Self::with_items()` which creates an `InMemoryDataset`. As @laggui pointed out, this could be problematic for large images or large datasets. I'm not sure what the solution is here.

I don't think this is problematic now since the raw annotations just point to a path. So your `InMemoryDataset` will hold (image, annotation) path pairs only, which should not be much of an issue.
```rust
// Assume that each channel in the mask image is the same and
// each pixel in the first channel corresponds to a class.
// Multi-channel image segmentation is not supported at this time.
Annotation::SegmentationMask(SegmentationMask {
    mask: mask_image
        .into_iter()
        .enumerate()
        .filter(|(i, _)| i % 3 == 0)
        .map(|(_, pixel)| pixel)
        .collect(),
})
```
The filtering here will probably not be required given the suggested changes in the previous comment.
The filtering was removed and moved into `segmentation_mask_to_vec_usize`. Fixed in the next commit.
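For illustration, such a helper might look like the sketch below. The name matches the comment above, but the body is an assumption: it takes interleaved 3-channel pixel data where all channels carry the same class id, keeps the first channel, and widens it to `usize`.

```rust
// Illustrative only: keep every third byte (the first channel of interleaved
// RGB data) and widen it to usize, since all three channels carry the same
// class id per pixel.
fn segmentation_mask_to_vec_usize(rgb_bytes: &[u8]) -> Vec<usize> {
    rgb_bytes.iter().step_by(3).map(|&p| p as usize).collect()
}
```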
```rust
pub fn new_segmentation_with_items<P: AsRef<Path>, S: AsRef<str>>(
    items: Vec<(P, P)>,
    classes: &[S],
) -> Result<Self, ImageLoaderError> {
```
Awesome! That's exactly what I meant, so there should not be any memory issues and users are not forced to pre-load all of their segmentation masks into memory to create a dataset 👍
It's still unclear to me at which point in the training workflow the image and annotation will be transformed to `burn::tensor::Tensor<B, 4, Float>` with shape `[batch_size, 3, height, width]` and `burn::tensor::Tensor<B, 4, Int>` with shape `[batch_size, 1, height, width]`, respectively.
That is usually implemented in the

/edit: you can also check out this post which details the workflow
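To make the shape expectations concrete, here is a framework-free sketch of what the mask side of a batching step might do. The name `batch_masks` and the flat-buffer representation are illustrative assumptions, not burn APIs: it stacks `batch_size` single-channel `h × w` class masks into one buffer with logical shape `[batch_size, 1, h, w]`, which is the layout the Int tensor above would be built from.

```rust
/// Illustrative only: stacks per-item class masks (each h*w entries, row-major)
/// into a single flat buffer with logical shape [batch_size, 1, h, w].
fn batch_masks(masks: &[Vec<i64>], h: usize, w: usize) -> (Vec<i64>, [usize; 4]) {
    let mut data = Vec::with_capacity(masks.len() * h * w);
    for m in masks {
        assert_eq!(m.len(), h * w, "each mask must hold h*w class ids");
        data.extend_from_slice(m);
    }
    (data, [masks.len(), 1, h, w])
}
```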
Thanks for addressing all my comments 🙏
Pull Request Template

Checklist

- `run-checks all` script has been executed.
- TODO: Should add a new example to dataset illustrating the `new_segmentation_with_items()` method for the `ImageFolderDataset`.

Related Issues/PRs

Changes

Implemented necessary components for `SegmentationMask` to be used with `ImageFolderDataset`.

Testing

Tests mimic the tests of the multilabel classification. New images have been added to the `tests/data` directory. These images are small 8 x 8 pixel images created with a Python script.