Skip to content
This repository has been archived by the owner on Oct 19, 2024. It is now read-only.

Add color profile to training #22

Merged
merged 11 commits into from
Jan 3, 2024
Merged

Add color profile to training #22

merged 11 commits into from
Jan 3, 2024

Conversation

jonasteuwen
Copy link
Contributor

@jonasteuwen jonasteuwen commented Nov 22, 2023

This is the PR that allows ICC profiles to be used during training.

  • ICC profile is applied in the dataset
  • H5ImageFileWriter needs to be able to embed icc_profile
    • Add possibility to embed the icc_profile

Furthermore it also fixes #43.

@jonasteuwen jonasteuwen marked this pull request as draft November 22, 2023 12:11
@jonasteuwen jonasteuwen marked this pull request as ready for review December 8, 2023 22:52
ahcore/writers.py Outdated Show resolved Hide resolved
# This only works when the mode is 'overflow' and in 'C' order.
metadata = {
"mpp": self._mpp,
"dtype": str(batch_dtype),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing dtype from metadata causes the reader to fail in the _open_file function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two further comments:

  • Applying the ICC profiles during training works (tested with TCGA images and Openslide backend).
  • When using the CLI functionality of tiling.py, a type error in data.py is raised. Simply ignoring the pylinting with a comment is not enough. I believe adding from __future__ import annotations at the top would solve the problem.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also observe this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be fixed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thanks!

Copy link
Member

@EricMarcus-ai EricMarcus-ai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. Some minor comments added.

@@ -33,18 +57,23 @@ def __init__(
tile_size: tuple[int, int],
tile_overlap: tuple[int, int],
num_samples: int,
is_binary: bool = False,
is_compressed_image: bool = False,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a bool, i.e., is the behavior completely the same irrespective of the compression type? Or should we make an Enum somewhere with the known compression types?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be independent as the compression type is included in the binary blob.

# This only works when the mode is 'overflow' and in 'C' order.
metadata = {
"mpp": self._mpp,
"dtype": str(batch_dtype),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this removed? It is being used in readers, for example here:

self._dtype = self._metadata["dtype"]

We could add it with first_batch.dtype if necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added it back

else:
_mode = "ARRAY"
_format = "RAW"
_num_channels = first_batch.shape[-1]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is a typo here, should be first_batch.shape[1], right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's correct as it's the conversion of a PIL Image to an array. If it would be a tensor it would indeed be channels first:

import PIL.Image
import numpy as np
print(np.asarray(PIL.Image.open("image.jpg")).shape)

(1635, 1966, 3)

@AjeyPaiK
Copy link
Member

@jonasteuwen.. this PR also fixes {#16}. It would be nice to add it in the PR description.

AjeyPaiK and others added 2 commits December 21, 2023 18:24
Made this commit to make ahcore compatible with upstream changes made to dlup.
Copy link
Member

@AjeyPaiK AjeyPaiK left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jonasteuwen jonasteuwen merged commit 172384a into main Jan 3, 2024
2 checks passed
@AjeyPaiK AjeyPaiK deleted the feature/use-icc-profile branch May 24, 2024 13:05
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

H5FileImageWriter metadata is wrong
4 participants