
Feat/2361 segmentation mask #2426

Merged
merged 6 commits into tracel-ai:main from feat/2361-SegmentationMask
Nov 11, 2024

Conversation

anthonytorlucci
Contributor

@anthonytorlucci anthonytorlucci commented Oct 26, 2024

Pull Request Template

Checklist

  • Confirmed that the run-checks all script has been executed.
  • Made sure the book is up to date with changes in this PR.

TODO: Should add a new example illustrating the new_segmentation_with_items() method for the ImageFolderDataset.

Related Issues/PRs

Changes

Implemented necessary components for SegmentationMask to be used with ImageFolderDataset.

Testing

Tests mimic those of the multilabel classification. New images have been added to the tests/data directory; these are small 8×8 pixel images created with a Python script.


codecov bot commented Oct 26, 2024

Codecov Report

Attention: Patch coverage is 93.12977% with 9 lines in your changes missing coverage. Please review.

Project coverage is 82.81%. Comparing base (6fe3ff5) to head (e9cb8d3).
Report is 65 commits behind head on main.

Files with missing lines Patch % Lines
crates/burn-dataset/src/vision/image_folder.rs 93.12% 9 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2426      +/-   ##
==========================================
- Coverage   84.88%   82.81%   -2.08%     
==========================================
  Files         769      809      +40     
  Lines       98557   104320    +5763     
==========================================
+ Hits        83660    86391    +2731     
- Misses      14897    17929    +3032     


@laggui
Member

laggui commented Oct 29, 2024

Thanks for taking the lead on this 😄

I've been busy with the latest release but I should be able to review the changes this week!

@laggui laggui self-requested a review October 29, 2024 12:12
@anthonytorlucci
Contributor Author

> Thanks for taking the lead on this 😄
>
> I've been busy with the latest release but I should be able to review the changes this week!

I'm happy to get it started and become a regular contributor in the future, but I don't think this PR is ready to merge. I'd be happy to take any suggestions and keep working on it including:

  • additional example in burn book
  • as part of a larger UNet example (model is the easy part; data is notoriously difficult!)

@laggui laggui marked this pull request as draft October 29, 2024 13:09
@laggui
Member

laggui commented Oct 29, 2024

> I'm happy to get it started and become a regular contributor in the future, but I don't think this PR is ready to merge.

No worries! In this case, I converted this to a draft PR and when you're ready just mark it as ready for review and/or ping me 🙂

> I'd be happy to take any suggestions and keep working on it including:
>
>   • additional example in burn book
>   • as part of a larger UNet example (model is the easy part; data is notoriously difficult!)

That would be awesome! If you want to keep it simple for the current PR, I would start with the dataset implementation. Then, the examples can be added in follow-up PRs 👍

@anthonytorlucci anthonytorlucci marked this pull request as ready for review October 30, 2024 22:24
@anthonytorlucci
Contributor Author

anthonytorlucci commented Oct 30, 2024

For completeness, the Python code used to generate the new images and masks is below:

"""
Synthetic Image Segmentation Generation
"""
from typing import List, Tuple
from pathlib import Path
import numpy as np
import cv2

def save_image_to_text_file(image: np.ndarray, filename: Path) -> None:
    """Saves an image as a text file with pixel values."""
    with open(filename, 'w') as file:
        for row in image:
            file.write(' '.join(str(pixel) for pixel in row) + '\n')

def rgb_to_grayscale(rgb: Tuple[int, int, int]) -> int:
    """Converts a tuple of RGB values to its grayscale equivalent.

    Args:
      rgb: A tuple of RGB values in the range 0 - 255; (R, G, B).

    Returns:
      An integer representing the grayscale intensity.
    """
    return int(np.round(0.299 * rgb[0] + 0.587 * rgb[1] + 0.114 * rgb[2]))

def checkerboard_pattern(
    height:int,
    width:int,
    color1:Tuple[int, int, int],
    color2:Tuple[int, int, int]) -> tuple[np.ndarray, np.ndarray]:
    # Initialize the source image with zeros
    source_image = np.zeros((height, width, 3), dtype=np.uint8)

    # Initialize the mask with zeros
    mask = np.zeros((height, width, 3), dtype=np.uint8)
    m1 = 1  #rgb_to_grayscale(color1)
    m2 = 2  #rgb_to_grayscale(color2)
    # Let's create a checkerboard pattern
    for i in range(height):
        for j in range(width):
            if (i + j) % 2 == 0:
                source_image[i, j] = color1
                mask[i, j] = (m1, m1, m1)
            else:
                source_image[i, j] = color2
                mask[i, j] = (m2, m2, m2)
    return source_image, mask

def random_distribution_2colors(
    height:int,
    width:int,
    color1:Tuple[int, int, int],
    color2:Tuple[int, int, int],
    random_seed:int = 42) -> tuple[np.ndarray, np.ndarray]:
    # Initialize the source image with zeros
    source_image = np.zeros((height, width, 3), dtype=np.uint8)

    m1 = 1  #rgb_to_grayscale(color1)
    m2 = 2  #rgb_to_grayscale(color2)
    np.random.seed(random_seed)  # For reproducibility
    random_mask = np.random.choice([m1, m2], size=(height, width))
    for i in range(height):
        for j in range(width):
            if random_mask[i, j] == m1:
                source_image[i, j] = color1
            else:
                source_image[i, j] = color2
    mask = np.empty_like(source_image)
    mask[:, :, 0] = random_mask
    mask[:, :, 1] = random_mask
    mask[:, :, 2] = random_mask

    return source_image, mask

def random_distribution_3colors(
    height:int,
    width:int,
    color1:Tuple[int, int, int],
    color2:Tuple[int, int, int],
    color3:Tuple[int, int, int],
    random_seed:int = 42) -> tuple[np.ndarray, np.ndarray]:
    # Initialize the source image with zeros
    source_image = np.zeros((height, width, 3), dtype=np.uint8)

    np.random.seed(random_seed)  # For reproducibility
    m1 = 1  #rgb_to_grayscale(color1)
    m2 = 2  #rgb_to_grayscale(color2)
    m3 = 3  #rgb_to_grayscale(color3)
    random_mask = np.random.choice([m1, m2, m3], size=(height, width))
    for i in range(height):
        for j in range(width):
            if random_mask[i, j] == m1:
                source_image[i, j] = color1
            elif random_mask[i, j] == m2:
                source_image[i, j] = color2
            else:
                source_image[i, j] = color3
    mask = np.empty_like(source_image)
    mask[:, :, 0] = random_mask
    mask[:, :, 1] = random_mask
    mask[:, :, 2] = random_mask

    return source_image, mask


if __name__ == "__main__":
    # Define image dimensions
    IMAGE_HEIGHT, IMAGE_WIDTH = 8, 8

    # Define colors in RGB
    CRIMSON = (220,20,60)
    TEAL = (0,128,128)
    AQUA = (0,255,255)
    TURQUOISE = (64,224,208)
    MAGENTA = (255,0,255)
    ORCHID = (218,112,214)
    BURLY_WOOD = (222,184,135)

    # Generate checkerboard pattern
    image_chkr, mask_chkr = checkerboard_pattern(
        height=IMAGE_HEIGHT,
        width=IMAGE_WIDTH,
        color1=CRIMSON,
        color2=AQUA
    )
    image_rnd2, mask_rnd2 = random_distribution_2colors(
        height=IMAGE_HEIGHT,
        width=IMAGE_WIDTH,
        color1=MAGENTA,
        color2=TEAL,
        random_seed=42  # For reproducibility
    )
    image_rnd3, mask_rnd3 = random_distribution_3colors(
        height=IMAGE_HEIGHT,
        width=IMAGE_WIDTH,
        color1=TURQUOISE,
        color2=ORCHID,
        color3=BURLY_WOOD,
        random_seed=42  # For reproducibility
    )

    # ----- Save the results to disk ---
    results_path = Path(__file__).parent.parent.joinpath('results8x8')
    assert results_path.exists(), "The results directory does not exist. Please create it."
    # Save the image and mask to disk as PNG files
    # NOTE: OpenCV expects arrays in Blue,Green,Red channel order on reads
    #       and writes, so convert the RGB arrays to BGR before writing so
    #       the PNGs store the intended colors.
    image_chkr = cv2.cvtColor(image_chkr, cv2.COLOR_RGB2BGR)
    cv2.imwrite(str(results_path.joinpath("image_checkerboard.png")), image_chkr)
    cv2.imwrite(str(results_path.joinpath("mask_checkerboard.png")), mask_chkr)
    image_rnd2 = cv2.cvtColor(image_rnd2, cv2.COLOR_RGB2BGR)
    cv2.imwrite(str(results_path.joinpath("image_random_2colors.png")), image_rnd2)
    cv2.imwrite(str(results_path.joinpath("mask_random_2colors.png")), mask_rnd2)
    image_rnd3 = cv2.cvtColor(image_rnd3, cv2.COLOR_RGB2BGR)
    cv2.imwrite(str(results_path.joinpath("image_random_3colors.png")), image_rnd3)
    cv2.imwrite(str(results_path.joinpath("mask_random_3colors.png")), mask_rnd3)

    # Save the mask array data to a column-delimited text file
    mask_chkr = mask_chkr[..., 0]  # Convert 3D mask to 2D
    save_image_to_text_file(mask_chkr, results_path.joinpath("mask_checkerboard.txt"))

    mask_rnd2 = mask_rnd2[..., 0]  # Convert 3D mask to 2D
    save_image_to_text_file(mask_rnd2, results_path.joinpath("mask_random_2colors.txt"))

    mask_rnd3 = mask_rnd3[..., 0]  # Convert 3D mask to 2D
    save_image_to_text_file(mask_rnd3, results_path.joinpath("mask_random_3colors.txt"))

@laggui laggui left a comment

Sorry for the delayed review!

On the right path 🙂 But I have a couple of comments on the current implementation.

I don't think it is desirable to have to provide the segmentation masks directly when loading a dataset. For image segmentation datasets with a lot of items (and larger image sizes), this won't fit into memory. I think we should instead take paths to the masks and parse them when accessing an item.

@@ -104,7 +104,8 @@ pub struct ImageDatasetItem {
enum AnnotationRaw {
Label(String),
MultiLabel(Vec<String>),
// TODO: bounding boxes and segmentation mask
SegmentationMask(Vec<String>),

I think it would be preferable to have the raw form of a segmentation mask point to the mask image path on this. That way, the masks would only be loaded when an item is fetched from the dataset.
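The suggestion above could be sketched roughly like this (hypothetical standalone types, not the actual burn-dataset definitions): the raw annotation stores only the mask *path*, and decoding is deferred until the item is fetched.

```rust
use std::path::PathBuf;

// Rough sketch of the lazy-loading idea: the raw annotation holds a path,
// not pixel data, so the dataset itself stays small in memory.
#[derive(Debug, Clone)]
enum AnnotationRaw {
    Label(String),
    MultiLabel(Vec<String>),
    SegmentationMask(PathBuf), // mask image path, decoded on access
}

// Hypothetical deferred decode: only runs when an item is fetched.
fn mask_pixels(annotation: &AnnotationRaw) -> Option<Vec<u8>> {
    match annotation {
        AnnotationRaw::SegmentationMask(path) => {
            // A real implementation would decode the image file at `path` here.
            println!("decoding mask from {}", path.display());
            Some(Vec::new()) // placeholder pixel buffer
        }
        _ => None,
    }
}

fn main() {
    // Creating the raw annotation is cheap; decoding happens only on access.
    let raw = AnnotationRaw::SegmentationMask(PathBuf::from("masks/mask_checkerboard.png"));
    assert!(mask_pixels(&raw).is_some());
    assert!(mask_pixels(&AnnotationRaw::Label("cat".into())).is_none());
}
```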

Comment on lines 411 to 423
/// Create an image segmentation dataset with the specified items.
///
/// # Arguments
///
/// * `items` - List of dataset items, each item represented by a tuple `(image path, labels)`.
/// * `classes` - Dataset class names.
///
/// # Returns
/// A new dataset instance.
pub fn new_segmentation_with_items<P: AsRef<Path>, S: AsRef<str>>(
items: Vec<(P, SegmentationMask)>,
classes: &[S],
) -> Result<Self, ImageLoaderError> {

With the suggested changes from my previous comment, it would make more sense (and probably be more practical from a user standpoint) to have a new_segmentation method that instead takes a list of (image path, mask path) pairs (so Vec<(P, P)>).

We still need the method to accept the class names, which can be used to map to a class id similar to what is already done. These identifiers would be used to map a pixel value to a class.

For added flexibility we could provide a way to have specific pixel values map to a class.
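To illustrate the mapping idea (class names, pixel values, and the lookup table below are all made up for the example):

```rust
use std::collections::HashMap;

fn main() {
    // Class names map to sequential ids, similar to the existing label handling.
    let classes = ["background", "cat", "dog"];
    let class_ids: HashMap<&str, u8> = classes
        .iter()
        .enumerate()
        .map(|(i, name)| (*name, i as u8))
        .collect();

    // Optional extra flexibility: an explicit pixel-value -> class-id table,
    // e.g. for a mask that encodes "dog" as pixel value 255.
    let pixel_to_class: HashMap<u8, u8> = HashMap::from([(0, 0), (128, 1), (255, 2)]);

    // Remap raw mask pixels to class ids.
    let mask_pixels: Vec<u8> = vec![0, 128, 255, 128];
    let class_mask: Vec<u8> = mask_pixels.iter().map(|p| pixel_to_class[p]).collect();

    assert_eq!(class_ids["dog"], 2);
    assert_eq!(class_mask, vec![0u8, 1, 2, 1]);
}
```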

@anthonytorlucci
Contributor Author

anthonytorlucci commented Nov 3, 2024

Suggested changes made with a few comments and questions:

  • For the current implementation, we will only consider segmentation masks that are images and assume all channels are the same, i.e. a greyscale image where each pixel value corresponds to a single class.
  • A new function, fn image_path_to_pixel_depth, was created by copying from the PathToImageDatasetItem::map() function for input images. This has led to duplicated code, which I haven't addressed (not wanting to modify anything outside the segmentation scope).
  • The last line of the method pub fn new_segmentation_with_items(_) calls Self::with_items(_), which creates an InMemoryDataset. As @laggui pointed out, this could be problematic for large images or large datasets. I'm not sure what the solution is here.

@laggui laggui left a comment

That's great! Most of my comments have been addressed, what's left is basically changes that are relevant to the questions you asked.

Also, to answer

> The last line of the method pub fn new_segmentation_with_items() calls Self::with_items() which creates an InMemoryDataset. As @laggui pointed out, this could be problematic for large images or large datasets. I'm not sure what the solution is here.

I don't think this is problematic now since the raw annotations just point to a path. So your InMemoryDataset will hold (image, annotation) path pairs only, which should not be much of an issue.

Comment on lines 216 to 226
// assume that each channel in the mask image is the same and
// each pixel in the first channel corresponds to a class.
// multi-channel image segmentation is not supported at this time.
Annotation::SegmentationMask(SegmentationMask {
mask: mask_image
.into_iter()
.enumerate()
.filter(|(i, _)| i % 3 == 0)
.map(|(_, pixel)| pixel)
.collect(),
})

The filtering here will probably not be required given the suggested changes in the previous comment.

anthonytorlucci (Contributor Author)

The filtering was removed and moved into segmentation_mask_to_vec_usize; fixed in the next commit.

Comment on lines +491 to +494
pub fn new_segmentation_with_items<P: AsRef<Path>, S: AsRef<str>>(
items: Vec<(P, P)>,
classes: &[S],
) -> Result<Self, ImageLoaderError> {

Awesome! That's exactly what I meant, so there should not be any memory issues and users are not forced to pre-load all of their segmentation masks into memory to create a dataset 👍

@anthonytorlucci
Contributor Author

It's still unclear to me at which point in the training workflow the image and annotation will be transformed to burn::tensor::Tensor<B, 4, Float> with shape [batch_size, 3, height, width] and burn::tensor::Tensor<B, 4, Int> with shape [batch_size, 1, height, width], respectively.

@laggui
Member

laggui commented Nov 11, 2024

> It's still unclear to me at which point in the training workflow the image and annotation will be transformed to burn::tensor::Tensor<B, 4, Float> with shape [batch_size, 3, height, width] and burn::tensor::Tensor<B, 4, Int> with shape [batch_size, 1, height, width], respectively.

That is usually implemented in the Batcher (see this example as reference). Though now that tensors are Sync it could also be done when retrieving a single item in the dataset, and the batcher would only concatenate the tensors.

/edit: you can also check out this post which details the workflow
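Conceptually, batching just stacks the per-item tensors along a new leading dimension. A plain-Rust sketch of the shape bookkeeping (flat Vecs standing in for tensors; this is not burn's Batcher trait or Tensor API):

```rust
fn main() {
    const H: usize = 8;
    const W: usize = 8;
    let batch_size = 4;

    // Per-item data: normalized pixels [3, H, W] and integer class ids [1, H, W],
    // both flattened.
    let image: Vec<f32> = vec![0.5; 3 * H * W];
    let mask: Vec<i64> = vec![1; H * W];

    // "Batching" = concatenating items along a new leading dimension.
    let images: Vec<f32> = (0..batch_size).flat_map(|_| image.clone()).collect();
    let masks: Vec<i64> = (0..batch_size).flat_map(|_| mask.clone()).collect();

    assert_eq!(images.len(), batch_size * 3 * H * W); // [4, 3, 8, 8]
    assert_eq!(masks.len(), batch_size * H * W);      // [4, 1, 8, 8]
}
```

In burn, the same stacking would typically happen inside a Batcher implementation (or per-item, with the batcher only concatenating), as laggui's linked example shows.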

@laggui laggui left a comment

Thanks for addressing all my comments 🙏

@laggui laggui merged commit 6e71aaf into tracel-ai:main Nov 11, 2024
11 checks passed
@anthonytorlucci anthonytorlucci deleted the feat/2361-SegmentationMask branch November 11, 2024 23:03