-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Randomize instance order and crop #38
Conversation
add functionality for randomizing crop anchor
WalkthroughThe updates across various modules in the BioGTR project focus on enhancing functionality related to data structure manipulation, especially concerning centroids and anchors. Key changes include the introduction of new properties and methods for handling centroids and anchors, conditional augmentation based on training mode, and improved bounding box calculations. These enhancements aim to streamline data handling and augment data processing capabilities in training scenarios. Changes
Recent Review DetailsConfiguration used: CodeRabbit UI Files selected for processing (5)
Additional Context UsedRuff (2)
Additional comments not posted (11)
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
Out of diff range and nitpick comments (3)
biogtr/datasets/microscopy_dataset.py (1)
Line range hint
86-86
: Replace lambda expression with a function definition for better readability and maintainability.- parser = lambda x: data_utils.parse_synthetic(x, source=source) + def parser(x): + return data_utils.parse_synthetic(x, source=source)biogtr/datasets/data_utils.py (1)
Line range hint
334-334
: Avoid using bareexcept
to handle exceptions explicitly.- except: + except Exception as e: + print(f"Error encountered: {e}")biogtr/data_structures.py (1)
122-122
: Validate the representation ofcentroid
in the__repr__
method.The representation of
centroid
in the__repr__
method might expose internal structure which could be sensitive. Consider overriding the__str__
method in the dictionary to provide a more controlled output.
@@ -16,6 +16,7 @@ def __init__( | |||
pred_track_id: int = -1, | |||
bbox: ArrayLike = torch.empty((0, 4)), | |||
crop: ArrayLike = torch.tensor([]), | |||
centroid: dict[str, ArrayLike] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ensure the default value for centroid
is consistent with its usage.
The default value for centroid
is set to None
, which might lead to issues if methods assume it always contains a dictionary. Consider initializing it as an empty dictionary {}
for safer handling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bad! As is initializing with the torch.empty
and torch.tensor
above since these are all mutable!
Just change the type to centroid: Optional[dict[str, ArrayLike]] = None
or centroid: dict[str, ArrayLike] | None = None
(newer python syntax)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@talmo, you're absolutely right. Using Optional[dict[str, ArrayLike]] = None
or centroid: dict[str, ArrayLike] | None = None
(for Python 3.10 and newer) in the type hint clearly indicates that the centroid
can be None
. This approach is safer and more explicit, especially in a dynamic setting where the centroid
might not always be provided. This change also aligns with best practices in Python to avoid mutable default arguments which can lead to less predictable behaviors. Thank you for pointing this out!
New Learnings Added
User: talmo
PR: talmolab/biogtr#38
File: biogtr/data_structures.py:19-19
Timestamp: 2024-05-06T19:13:29.738Z
Learning: The `centroid` attribute in the `Instance` class should be initialized with `None` and explicitly allowed to be `None` through type hints using `Optional` or the union operator `|`. This avoids issues associated with mutable default arguments.
Note: Learnings are effective only in the context of specific code segments. To apply general coding guidelines, please configure review instructions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I opened an issue to go through and remove all the mutable defaults.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
if centroid is not None: | ||
self._centroid = centroid | ||
elif self.bbox.shape[0]: | ||
y1, x1, y2, x2 = self.bbox.squeeze() | ||
self._centroid = {"centroid": np.array([(x1 + x2) / 2, (y1 + y2) / 2])} | ||
else: | ||
self._centroid = {} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactor the centroid calculation to a separate method.
The logic for calculating the centroid based on the bounding box is repeated in the constructor and the pose
setter. Consider extracting this into a separate method to adhere to the DRY principle.
+ def calculate_centroid(self):
+ y1, x1, y2, x2 = self.bbox.squeeze()
+ return {"centroid": np.array([(x1 + x2) / 2, (y1 + y2) / 2])}
- y1, x1, y2, x2 = self.bbox.squeeze()
- self._centroid = {"centroid": np.array([(x1 + x2) / 2, (y1 + y2) / 2])}
+ self._centroid = self.calculate_centroid()
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
if centroid is not None: | |
self._centroid = centroid | |
elif self.bbox.shape[0]: | |
y1, x1, y2, x2 = self.bbox.squeeze() | |
self._centroid = {"centroid": np.array([(x1 + x2) / 2, (y1 + y2) / 2])} | |
else: | |
self._centroid = {} | |
if centroid is not None: | |
self._centroid = centroid | |
elif self.bbox.shape[0]: | |
self._centroid = self.calculate_centroid() | |
else: | |
self._centroid = {} | |
def calculate_centroid(self): | |
y1, x1, y2, x2 = self.bbox.squeeze() | |
return {"centroid": np.array([(x1 + x2) / 2, (y1 + y2) / 2])} |
@@ -69,16 +69,16 @@ def __init__( | |||
self.clip_length = clip_length | |||
self.crop_size = crop_size | |||
self.padding = padding | |||
self.mode = mode | |||
self.mode = mode.lower() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ensure consistent use of string literals for mode comparison.
- if augmentations and self.mode == "train":
+ if augmentations and self.mode == 'train':
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
self.mode = mode.lower() | |
self.mode = mode.lower() |
@@ -74,16 +74,17 @@ def __init__( | |||
self.clip_length = clip_length | |||
self.crop_size = crop_size | |||
self.padding = padding | |||
self.mode = mode | |||
self.mode = mode.lower() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ensure consistent use of string literals for mode comparison.
- if augmentations and self.mode == "train":
+ if augmentations and self.mode == 'train':
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
self.mode = mode.lower() | |
self.mode = mode.lower() |
@@ -75,18 +75,18 @@ def __init__( | |||
self.crop_size = crop_size | |||
self.chunk = chunk | |||
self.clip_length = clip_length | |||
self.mode = mode | |||
self.mode = mode.lower() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ensure consistent use of string literals for mode comparison.
- if augmentations and self.mode == "train":
+ if augmentations and self.mode == 'train':
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
self.mode = mode.lower() | |
self.mode = mode.lower() |
As noted in #31, the model may be overfitting to the order in which the instance reid_features are fed into the model. Especially with our dataset logic in microscopy where we just iterate over the track ids this could be especially problematic. Thus in this PR we:
We also add a functionality for random anchor selection when we crop to test if we can generalize the model to missing anchors
Summary by CodeRabbit
New Features
Enhancements
Bug Fixes
mode
attribute handling by converting it to lowercase across multiple dataset classes to ensure consistent behavior.