Feat/2361 segmentation mask #2426
Conversation
Codecov Report

Attention: Patch coverage is

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #2426      +/-   ##
==========================================
- Coverage   84.88%   82.81%   -2.08%
==========================================
  Files         769      809      +40
  Lines       98557   104320    +5763
==========================================
+ Hits        83660    86391    +2731
- Misses      14897    17929    +3032
```

☔ View full report in Codecov by Sentry.
Thanks for taking the lead on this 😄 I've been busy with the latest release, but I should be able to review the changes this week!
I'm happy to get it started and become a regular contributor in the future, but I don't think this PR is ready to merge. I'd be happy to take any suggestions and keep working on it, including:
No worries! In this case, I converted this to a draft PR and when you're ready just mark it as ready for review and/or ping me 🙂
That would be awesome! If you want to keep it simple for the current PR, I would start with the dataset implementation. Then, the examples can be added in follow-up PRs 👍
For completeness, the Python code used to generate the new images and masks is below:

```python
"""Synthetic image segmentation data generation."""
from typing import Tuple
from pathlib import Path

import numpy as np
import cv2


def save_image_to_text_file(image: np.ndarray, filename: Path) -> None:
    """Saves a 2D image as a text file with space-delimited pixel values."""
    with open(filename, 'w') as file:
        for row in image:
            file.write(' '.join(str(pixel) for pixel in row) + '\n')


def rgb_to_grayscale(rgb: Tuple[int, int, int]) -> int:
    """Converts an RGB triple to its grayscale equivalent.

    Args:
        rgb: RGB values in the range 0 - 255; (R, G, B).

    Returns:
        An integer representing the grayscale intensity.
    """
    return int(np.round(0.299 * rgb[0] + 0.587 * rgb[1] + 0.114 * rgb[2]))


def checkerboard_pattern(
        height: int,
        width: int,
        color1: Tuple[int, int, int],
        color2: Tuple[int, int, int]) -> Tuple[np.ndarray, np.ndarray]:
    """Creates a two-color checkerboard image and its per-pixel class mask."""
    source_image = np.zeros((height, width, 3), dtype=np.uint8)
    mask = np.zeros((height, width, 3), dtype=np.uint8)
    m1 = 1  # class id for color1 (rgb_to_grayscale(color1) is an alternative)
    m2 = 2  # class id for color2
    for i in range(height):
        for j in range(width):
            if (i + j) % 2 == 0:
                source_image[i, j] = color1
                mask[i, j] = (m1, m1, m1)
            else:
                source_image[i, j] = color2
                mask[i, j] = (m2, m2, m2)
    return source_image, mask


def random_distribution_2colors(
        height: int,
        width: int,
        color1: Tuple[int, int, int],
        color2: Tuple[int, int, int],
        random_seed: int = 42) -> Tuple[np.ndarray, np.ndarray]:
    """Creates an image whose pixels are randomly one of two colors, plus its mask."""
    source_image = np.zeros((height, width, 3), dtype=np.uint8)
    m1 = 1  # class id for color1
    m2 = 2  # class id for color2
    np.random.seed(random_seed)  # for reproducibility
    random_mask = np.random.choice([m1, m2], size=(height, width))
    for i in range(height):
        for j in range(width):
            source_image[i, j] = color1 if random_mask[i, j] == m1 else color2
    mask = np.empty_like(source_image)
    mask[:, :, 0] = random_mask
    mask[:, :, 1] = random_mask
    mask[:, :, 2] = random_mask
    return source_image, mask


def random_distribution_3colors(
        height: int,
        width: int,
        color1: Tuple[int, int, int],
        color2: Tuple[int, int, int],
        color3: Tuple[int, int, int],
        random_seed: int = 42) -> Tuple[np.ndarray, np.ndarray]:
    """Creates an image whose pixels are randomly one of three colors, plus its mask."""
    source_image = np.zeros((height, width, 3), dtype=np.uint8)
    np.random.seed(random_seed)  # for reproducibility
    m1 = 1  # class id for color1
    m2 = 2  # class id for color2
    m3 = 3  # class id for color3
    random_mask = np.random.choice([m1, m2, m3], size=(height, width))
    for i in range(height):
        for j in range(width):
            if random_mask[i, j] == m1:
                source_image[i, j] = color1
            elif random_mask[i, j] == m2:
                source_image[i, j] = color2
            else:
                source_image[i, j] = color3
    mask = np.empty_like(source_image)
    mask[:, :, 0] = random_mask
    mask[:, :, 1] = random_mask
    mask[:, :, 2] = random_mask
    return source_image, mask


if __name__ == "__main__":
    # Image dimensions
    IMAGE_HEIGHT, IMAGE_WIDTH = 8, 8

    # Colors in RGB
    CRIMSON = (220, 20, 60)
    TEAL = (0, 128, 128)
    AQUA = (0, 255, 255)
    TURQUOISE = (64, 224, 208)
    MAGENTA = (255, 0, 255)
    ORCHID = (218, 112, 214)
    BURLY_WOOD = (222, 184, 135)

    image_chkr, mask_chkr = checkerboard_pattern(
        height=IMAGE_HEIGHT,
        width=IMAGE_WIDTH,
        color1=CRIMSON,
        color2=AQUA,
    )
    image_rnd2, mask_rnd2 = random_distribution_2colors(
        height=IMAGE_HEIGHT,
        width=IMAGE_WIDTH,
        color1=MAGENTA,
        color2=TEAL,
        random_seed=42,
    )
    image_rnd3, mask_rnd3 = random_distribution_3colors(
        height=IMAGE_HEIGHT,
        width=IMAGE_WIDTH,
        color1=TURQUOISE,
        color2=ORCHID,
        color3=BURLY_WOOD,
        random_seed=42,
    )

    # ----- Save the results to disk -----
    results_path = Path(__file__).parent.parent.joinpath('results8x8')
    assert results_path.exists(), "The results directory does not exist. Please create it."

    # NOTE: OpenCV uses the Blue, Green, Red channel order on reads and writes,
    # so convert the RGB arrays to BGR before writing; the PNGs on disk then
    # show the intended colors. cv2.imwrite also expects a str filename, so
    # the Path objects are converted explicitly.
    image_chkr = cv2.cvtColor(image_chkr, cv2.COLOR_RGB2BGR)
    cv2.imwrite(str(results_path.joinpath("image_checkerboard.png")), image_chkr)
    cv2.imwrite(str(results_path.joinpath("mask_checkerboard.png")), mask_chkr)
    image_rnd2 = cv2.cvtColor(image_rnd2, cv2.COLOR_RGB2BGR)
    cv2.imwrite(str(results_path.joinpath("image_random_2colors.png")), image_rnd2)
    cv2.imwrite(str(results_path.joinpath("mask_random_2colors.png")), mask_rnd2)
    image_rnd3 = cv2.cvtColor(image_rnd3, cv2.COLOR_RGB2BGR)
    cv2.imwrite(str(results_path.joinpath("image_random_3colors.png")), image_rnd3)
    cv2.imwrite(str(results_path.joinpath("mask_random_3colors.png")), mask_rnd3)

    # Save the mask array data to a space-delimited text file
    mask_chkr = mask_chkr[..., 0]  # 3-channel mask to 2D
    save_image_to_text_file(mask_chkr, results_path.joinpath("mask_checkerboard.txt"))
    mask_rnd2 = mask_rnd2[..., 0]
    save_image_to_text_file(mask_rnd2, results_path.joinpath("mask_random_2colors.txt"))
    mask_rnd3 = mask_rnd3[..., 0]
    save_image_to_text_file(mask_rnd3, results_path.joinpath("mask_random_3colors.txt"))
```
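Since the Rust tests presumably compare against these text masks, a small stdlib-only Rust sketch of reading one back may be useful. The helper name `load_mask_text` is hypothetical and not part of the PR; it just assumes the whitespace-delimited integer layout written above.

```rust
use std::fs;
use std::path::Path;

// Illustrative only: parse a whitespace-delimited integer mask file
// (as written by save_image_to_text_file above) into rows of class ids.
fn load_mask_text(path: &Path) -> std::io::Result<Vec<Vec<usize>>> {
    let content = fs::read_to_string(path)?;
    Ok(content
        .lines()
        .filter(|l| !l.trim().is_empty())
        .map(|l| l.split_whitespace().map(|t| t.parse().unwrap()).collect())
        .collect())
}
```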
Sorry for the delayed review!
On the right path 🙂 But I have a couple of comments on the current implementation.
I don't think it is desirable to have to provide the segmentation masks directly when loading a dataset. For image segmentation datasets with a lot of items (and larger image sizes), this won't fit into memory. I think we should instead take paths to the masks and parse them when accessing an item.
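As a rough, framework-free sketch of that idea (these names are illustrative, not the actual burn-dataset types): the raw annotation would hold only the mask's path, and decoding would be deferred to item access.

```rust
use std::path::PathBuf;

// Illustrative only: the raw annotation stores a path instead of pixel data,
// so mask decoding can happen lazily when an item is fetched.
#[allow(dead_code)]
enum AnnotationRaw {
    Label(String),
    MultiLabel(Vec<String>),
    SegmentationMaskPath(PathBuf),
}

impl AnnotationRaw {
    // Hypothetical accessor: image decoding would be triggered from here,
    // on item access, rather than at dataset construction time.
    fn mask_path(&self) -> Option<&PathBuf> {
        match self {
            AnnotationRaw::SegmentationMaskPath(p) => Some(p),
            _ => None,
        }
    }
}
```

This keeps the dataset's memory footprint proportional to the number of paths, not the number of mask pixels.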
```diff
@@ -104,7 +104,8 @@ pub struct ImageDatasetItem
 enum AnnotationRaw {
     Label(String),
     MultiLabel(Vec<String>),
     // TODO: bounding boxes and segmentation mask
+    SegmentationMask(Vec<String>),
```
I think it would be preferable to have the raw form of a segmentation mask point to the mask image path instead. That way, the masks would only be loaded when an item is fetched from the dataset.
```rust
/// Create an image segmentation dataset with the specified items.
///
/// # Arguments
///
/// * `items` - List of dataset items, each item represented by a tuple `(image path, labels)`.
/// * `classes` - Dataset class names.
///
/// # Returns
/// A new dataset instance.
pub fn new_segmentation_with_items<P: AsRef<Path>, S: AsRef<str>>(
    items: Vec<(P, SegmentationMask)>,
    classes: &[S],
) -> Result<Self, ImageLoaderError> {
```
With the suggested changes from my previous comment, it would make more sense (and probably be more practical from a user standpoint) to have a `new_segmentation` method that instead takes a list of `(image path, mask path)` pairs (so `Vec<(P, P)>`).

We still need the method to accept the class names, which can be used to map to a class id, similar to what is already done. These identifiers would be used to map a pixel value to a class.
For added flexibility we could provide a way to have specific pixel values map to a class.
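A minimal sketch of that flexible mapping (the function name and the override table are assumptions, not burn APIs): by default a pixel value maps to itself as a class id, and a user-supplied table overrides specific values.

```rust
use std::collections::HashMap;

// Illustrative only: map raw mask pixel values to class ids.
// Default: a pixel value maps to itself; the table overrides specific values.
fn map_pixels_to_classes(pixels: &[u8], table: &HashMap<u8, usize>) -> Vec<usize> {
    pixels
        .iter()
        .map(|p| table.get(p).copied().unwrap_or(*p as usize))
        .collect()
}
```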
Suggested changes made, along with a few comments and questions:
That's great! Most of my comments have been addressed; what's left is basically changes relevant to the questions you asked.
Also, to answer:

> The last line of the method `pub fn new_segmentation_with_items()` calls `Self::with_items()` which creates an `InMemoryDataset`. As @laggui pointed out, this could be problematic for large images or large datasets. I'm not sure what the solution is here.

I don't think this is problematic now since the raw annotations just point to a path. So your `InMemoryDataset` will hold (image, annotation) path pairs only, which should not be much of an issue.
```rust
// Assume that each channel in the mask image is the same and
// each pixel in the first channel corresponds to a class.
// Multi-channel image segmentation is not supported at this time.
Annotation::SegmentationMask(SegmentationMask {
    mask: mask_image
        .into_iter()
        .enumerate()
        .filter(|(i, _)| i % 3 == 0)
        .map(|(_, pixel)| pixel)
        .collect(),
})
```
The filtering here will probably not be required given the suggested changes in the previous comment.
The filtering was removed and moved into `segmentation_mask_to_vec_usize`. Fixed in the next commit.
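For illustration, such a helper might look like the sketch below. The name matches the comment above, but the body is an assumption: it takes interleaved 3-channel pixel data where all channels carry the same class id, keeps the first channel, and widens it to `usize`.

```rust
// Illustrative only: keep every third byte (the first channel of interleaved
// RGB data) and widen it to usize, since all three channels carry the same
// class id per pixel.
fn segmentation_mask_to_vec_usize(rgb_bytes: &[u8]) -> Vec<usize> {
    rgb_bytes.iter().step_by(3).map(|&p| p as usize).collect()
}
```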
```rust
pub fn new_segmentation_with_items<P: AsRef<Path>, S: AsRef<str>>(
    items: Vec<(P, P)>,
    classes: &[S],
) -> Result<Self, ImageLoaderError> {
```
Awesome! That's exactly what I meant, so there should not be any memory issues and users are not forced to pre-load all of their segmentation masks into memory to create a dataset 👍
It's still unclear to me at which point in the training workflow the image and annotation will be transformed to `burn::tensor::Tensor<B, 4, Float>` with shape `[batch_size, 3, height, width]` and `burn::tensor::Tensor<B, 4, Int>` with shape `[batch_size, 1, height, width]`, respectively.
That is usually implemented in the

/edit: you can also check out this post which details the workflow
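To make the shape expectations concrete, here is a framework-free sketch of what the mask side of a batching step might do. The name `batch_masks` and the flat-buffer representation are illustrative assumptions, not burn APIs: it stacks `batch_size` single-channel `h × w` class masks into one buffer with logical shape `[batch_size, 1, h, w]`, which is the layout the Int tensor above would be built from.

```rust
/// Illustrative only: stacks per-item class masks (each h*w entries, row-major)
/// into a single flat buffer with logical shape [batch_size, 1, h, w].
fn batch_masks(masks: &[Vec<i64>], h: usize, w: usize) -> (Vec<i64>, [usize; 4]) {
    let mut data = Vec::with_capacity(masks.len() * h * w);
    for m in masks {
        assert_eq!(m.len(), h * w, "each mask must hold h*w class ids");
        data.extend_from_slice(m);
    }
    (data, [masks.len(), 1, h, w])
}
```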
Thanks for addressing all my comments 🙏
Pull Request Template

Checklist

- `run-checks all` script has been executed.
- TODO: Should add a new example to dataset illustrating the `new_segmentation_with_items()` method for the `ImageFolderDataset`.

Related Issues/PRs

Changes

Implemented necessary components for `SegmentationMask` to be used with `ImageFolderDataset`.

Testing

Tests mimic the tests of the multilabel classification. New images have been added to the `tests/data` directory. These images are small 8 x 8 pixel images created with a Python script.