
Safer video loading from SLP #119

Merged
merged 8 commits into from
Sep 29, 2024



@talmo talmo commented Sep 28, 2024

This PR adds more flexibility and a safer path for loading SLP files in cases where the video files might not be accessible.

  • Catches more I/O errors when checking if the video is accessible (addresses #116: PermissionError when loading a .slp file pointing to a video in /root).
  • Explicit control over whether video files should be opened when loading labels with sio.load_slp(..., open_videos=False).
  • Explicit control over whether backend is auto-initialized when creating or using Video objects with Video(..., open_backend=False).
  • More thorough sanitization of filenames into POSIX-style (forward-slash) forms when reading and writing SLP files.
  • Added sio.io.utils.is_file_accessible to check for readability by actually reading a byte. This catches permission and other esoteric filesystem errors.
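A minimal sketch of the "read one byte" accessibility check described above, assuming the helper only needs to return a boolean. The exception list is an assumption for illustration; the actual sio.io.utils.is_file_accessible may catch a different set.

```python
from pathlib import Path


def is_file_accessible(filename) -> bool:
    """Return True if `filename` can actually be opened and read.

    Sketch only: reading a single byte surfaces permission errors and other
    filesystem failures, not just missing files.
    """
    try:
        with open(Path(filename), "rb") as f:
            f.read(1)  # Forces an actual read, not just a metadata lookup.
        return True
    except (OSError, ValueError):
        # OSError covers FileNotFoundError, PermissionError, IsADirectoryError, etc.
        # ValueError covers paths with embedded null bytes.
        return False
```

Unlike Path.exists(), this returns False for a file that exists but cannot be read by the current user. Note that root can usually read files regardless of their mode bits, so a chmod-based test behaves differently when run as root.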

coderabbitai bot commented Sep 28, 2024

Warning

Rate limit exceeded

@talmo has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 11 minutes and 42 seconds before requesting another review.

📥 Commits

Files that changed from the base of the PR and between 8e179ca and 0487ebc.

Walkthrough

The changes introduce additional parameters to several functions in the SLEAP library for controlling when the video backend is opened. The make_video and read_videos functions now accept an open_backend parameter, read_labels accepts an open_videos parameter that it forwards to read_videos, and the Video class has been updated to support this behavior. These modifications let users control whether the video backend is opened during input/output operations, making video handling within the library more flexible.
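To make the flag plumbing concrete, here is a hedged sketch of how a read_labels-level open_videos flag can be forwarded down to a per-video open_backend flag. The function names mirror the walkthrough, but the bodies are hypothetical stand-ins: they take plain dicts instead of an SLP file path and build a FakeVideo instead of a real sleap_io.Video.

```python
from dataclasses import dataclass


@dataclass
class FakeVideo:
    """Hypothetical stand-in for sleap_io.Video; no real backend is created."""

    filename: str
    open_backend: bool = True
    backend_opened: bool = False

    def __post_init__(self):
        # Only "open" the backend when the caller allows it.
        if self.open_backend:
            self.backend_opened = True


def make_video(video_json: dict, open_backend: bool) -> FakeVideo:
    return FakeVideo(filename=video_json["filename"], open_backend=open_backend)


def read_videos(videos_json: list, open_backend: bool) -> list:
    # Forward the flag unchanged to every video record.
    return [make_video(v, open_backend=open_backend) for v in videos_json]


def read_labels(videos_json: list, open_videos: bool = True) -> list:
    # User-facing name is open_videos; it is passed down as open_backend.
    return read_videos(videos_json, open_backend=open_videos)
```

With open_videos=False, every returned video keeps its filename but never attempts to open a backend, which is what makes metadata-only loading safe when the video files are missing.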

Changes

Files and change summary:

  • sleap_io/io/slp.py: Modified make_video and read_videos to accept an open_backend parameter, and read_labels to accept an open_videos parameter. Logic restructured to conditionally open the video backend based on these parameters. Added a sanitize_filename helper function.
  • sleap_io/model/video.py: Updated the Video class to include an open_backend parameter. Adjusted logic in the __attrs_post_init__ and open methods to handle backend opening based on this parameter.
  • tests/io/test_slp.py: Introduced test_lazy_video_read and test_video_path_resolution to test how read_labels handles video opening and path resolution.
  • sleap_io/io/nwb.py: Added a convert_predictions_to_dataframe function to convert prediction data from a Labels object into a structured pandas DataFrame.
  • sleap_io/io/utils.py: Introduced an is_file_accessible function to check file accessibility. Removed the convert_predictions_to_dataframe function.
  • tests/model/test_video.py: Updated test_video_open_close to include the new parameters and added test_open_backend_preference to check behavior with open_backend set to False.

Possibly related PRs

  • Add support for embedding images in .pkg.slp #91: The changes in sleap_io/io/slp.py regarding the make_video and read_videos functions are related to the modifications in the same file, which also involve video handling and embedding functionalities.
  • Fix suggestions deserialization #95: The updates to the read_labels function in sleap_io/io/slp.py to handle exceptions are relevant as they align with the changes made to the read_labels function in the main PR, which also involves video backend handling.
  • Labels.split and Labels.make_training_splits #98: The modifications in sleap_io/io/slp.py related to embedding frames and the read_labels function are connected to the changes in the main PR, which also enhances video handling and metadata management.

Poem

🐇 In a world of hops and playful cheer,
New options bloom, oh so near!
With videos open or closed with care,
Flexibility dances in the air.
SLEAP's magic grows, bright and bold,
A rabbit's joy in changes told! 🌼


Base automatically changed from talmo/drop-av to main September 28, 2024 23:24
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (3)
sleap_io/io/main.py (2)

10-22: LGTM! Consider a minor docstring improvement.

The changes to the load_slp function look good and align well with the PR objectives. The new open_videos parameter provides the desired flexibility for safer video loading.

Consider adding a brief example in the docstring to illustrate how to use the open_videos parameter:

def load_slp(filename: str, open_videos: bool = True) -> Labels:
    """Load a SLEAP dataset.

    Args:
        filename: Path to a SLEAP labels file (`.slp`).
        open_videos: If `True` (the default), attempt to open the video backend for
            I/O. If `False`, the backend will not be opened (useful for reading metadata
            when the video files are not available).

    Returns:
        The dataset as a `Labels` object.

    Example:
        # Load dataset without opening video backend
        labels = load_slp("path/to/dataset.slp", open_videos=False)
    """
    return slp.read_labels(filename, open_videos=open_videos)

This addition would provide users with a clear example of how to use the new parameter.


Line range hint 168-203: Update load_file function to support open_videos parameter

The load_file function should be updated to support the new open_videos parameter when loading SLP files. This ensures consistency and provides the same functionality when using this generic loading function.

Please update the load_file function as follows:

 def load_file(
     filename: str | Path, format: Optional[str] = None, **kwargs
 ) -> Union[Labels, Video]:
     """Load a file and return the appropriate object.
 
     Args:
         filename: Path to a file.
         format: Optional format to load as. If not provided, will be inferred from the
             file extension. Available formats are: "slp", "nwb", "labelstudio", "jabs"
             and "video".
+        open_videos: If loading an SLP file, controls whether to open the video backend.
+            Defaults to True. See `load_slp` for more details.
 
     Returns:
         A `Labels` or `Video` object.
     """
     if isinstance(filename, Path):
         filename = filename.as_posix()
 
     if format is None:
         if filename.endswith(".slp"):
             format = "slp"
         elif filename.endswith(".nwb"):
             format = "nwb"
         elif filename.endswith(".json"):
             format = "json"
         elif filename.endswith(".h5"):
             format = "jabs"
         else:
             for vid_ext in Video.EXTS:
                 if filename.endswith(vid_ext):
                     format = "video"
                     break
         if format is None:
             raise ValueError(f"Could not infer format from filename: '{filename}'.")
 
     if filename.endswith(".slp"):
-        return load_slp(filename, **kwargs)
+        open_videos = kwargs.pop('open_videos', True)
+        return load_slp(filename, open_videos=open_videos, **kwargs)
     elif filename.endswith(".nwb"):
         return load_nwb(filename, **kwargs)
     elif filename.endswith(".json"):
         return load_labelstudio(filename, **kwargs)
     elif filename.endswith(".h5"):
         return load_jabs(filename, **kwargs)
     elif format == "video":
         return load_video(filename, **kwargs)

This change ensures that the open_videos parameter is properly handled when loading SLP files through the load_file function, maintaining consistency with the new load_slp functionality.
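Note that load_file already forwards **kwargs, so a caller's open_videos keyword reaches load_slp with or without the explicit pop; the suggestion mainly documents the parameter and its default. The sketch below uses hypothetical stand-ins (this load_slp just returns its arguments instead of loading a file) to show that both variants pass the same values through.

```python
def load_slp(filename: str, open_videos: bool = True) -> tuple:
    """Hypothetical stand-in: returns its arguments instead of loading a file."""
    return (filename, open_videos)


def load_file_forwarding(filename: str, **kwargs) -> tuple:
    # Original behavior: pass all keyword arguments straight through.
    return load_slp(filename, **kwargs)


def load_file_explicit(filename: str, **kwargs) -> tuple:
    # Suggested behavior: pull open_videos out explicitly, then forward the rest.
    open_videos = kwargs.pop("open_videos", True)
    return load_slp(filename, open_videos=open_videos, **kwargs)
```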

sleap_io/io/slp.py (1)

1022-1029: Consider aligning parameter names for consistency

In the read_labels function, the parameter is named open_videos, whereas in read_videos and make_video, it's named open_backend. For consistency and clarity, consider using the same parameter name across these functions.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 7e91d04 and f314fe6.

📒 Files selected for processing (2)
  • sleap_io/io/main.py (1 hunks)
  • sleap_io/io/slp.py (6 hunks)
🧰 Additional context used
🪛 Ruff
sleap_io/io/slp.py

120-120: Do not use bare except

(E722)
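For context on E722: a bare except: also swallows KeyboardInterrupt and SystemExit, which derive from BaseException rather than Exception. A small illustration with a hypothetical helper (try_or_none is not part of sleap-io):

```python
def try_or_none(fn):
    """Call fn(); return None if it raises an ordinary error."""
    try:
        return fn()
    except Exception:
        # Unlike a bare `except:`, this lets KeyboardInterrupt and SystemExit
        # propagate, since they subclass BaseException rather than Exception.
        return None


def raise_exit():
    raise SystemExit(1)
```

try_or_none(lambda: 1 / 0) returns None, while try_or_none(raise_exit) still exits, which is usually the desired behavior in library code.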

Comment on lines 131 to 138
def read_videos(labels_path: str, open_backend: bool = False) -> list[Video]:
    """Read `Video` dataset in a SLEAP labels file.

    Args:
        labels_path: A string path to the SLEAP labels file.
        open_backend: If `True` (the default), attempt to open the video backend for
            I/O. If `False`, the backend will not be opened (useful for reading metadata
            when the video files are not available).

⚠️ Potential issue

Correct the docstring for open_backend default value

In the read_videos function, the parameter open_backend defaults to False, but the docstring states that it defaults to True. Please update the docstring to reflect the correct default value.

Apply this diff to fix the docstring:

         Args:
             labels_path: A string path to the SLEAP labels file.
-            open_backend: If `True` (the default), attempt to open the video backend for
-                I/O. If `False`, the backend will not be opened (useful for reading metadata
-                when the video files are not available).
+            open_backend: If `True`, attempt to open the video backend for
+                I/O. If `False` (the default), the backend will not be opened (useful for reading metadata
+                when the video files are not available).

sleap_io/io/slp.py (outdated)
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (1)
sleap_io/io/slp.py (1)

1042-1055: LGTM with a minor suggestion: Consider renaming parameter for consistency

The changes to the read_labels function are well-implemented and consistent with the updates in the read_videos function. The new open_videos parameter is properly documented and correctly passed to read_videos, enhancing the flexibility of video loading as per the PR objectives.

However, for better consistency across the module, consider renaming the parameter from open_videos to open_backend to match the naming used in other functions like read_videos and make_video.

Here's a suggested change:

-def read_labels(labels_path: str, open_videos: bool = True) -> Labels:
+def read_labels(labels_path: str, open_backend: bool = True) -> Labels:
    """Read a SLEAP labels file.

    Args:
        labels_path: A string path to the SLEAP labels file.
-        open_videos: If `True` (the default), attempt to open the video backend for
+        open_backend: If `True` (the default), attempt to open the video backend for
            I/O. If `False`, the backend will not be opened (useful for reading metadata
            when the video files are not available).

    Returns:
        The processed `Labels` object.
    """
    tracks = read_tracks(labels_path)
-    videos = read_videos(labels_path, open_backend=open_videos)
+    videos = read_videos(labels_path, open_backend=open_backend)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between f314fe6 and d04fea5.

📒 Files selected for processing (3)
  • sleap_io/io/slp.py (9 hunks)
  • sleap_io/model/video.py (5 hunks)
  • tests/io/test_slp.py (1 hunks)
🧰 Additional context used
🪛 Ruff
sleap_io/io/slp.py

187-187: Use is and is not for type comparisons, or isinstance() for isinstance checks

(E721)

tests/io/test_slp.py

361-361: Use is and is not for type comparisons, or isinstance() for isinstance checks

(E721)
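E721 flags comparisons such as type(x) == list. The two accepted alternatives differ on subclasses, as this short illustration with a hypothetical list subclass shows:

```python
class FilenameList(list):
    """Hypothetical list subclass, e.g. a list of image filenames."""


def is_exactly_list(x) -> bool:
    # Identity check on the type: subclasses do NOT match.
    return type(x) is list


def is_list_like(x) -> bool:
    # isinstance: subclasses DO match, which is usually what is intended.
    return isinstance(x, list)
```

For a filename field that may be a single path or a list of image paths, isinstance is normally the safer choice.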

🔇 Additional comments (12)
sleap_io/model/video.py (7)

37-41: LGTM: Clear documentation for the new open_backend parameter.

The added documentation for the open_backend parameter is clear and concise. It effectively explains both the default behavior and when to set it to False, aligning well with the PR objectives of safer video loading.


55-55: LGTM: open_backend parameter correctly added.

The open_backend parameter is correctly added to the class with a default value of True. This implementation matches the documentation and PR objectives while maintaining backward compatibility.


61-62: LGTM: __attrs_post_init__ method correctly updated.

The __attrs_post_init__ method has been appropriately updated to check self.open_backend before attempting to open the backend. This change aligns with the new open_backend functionality and prevents automatic opening of the backend when open_backend is False.


190-196: LGTM: __getitem__ method correctly updated with clear error handling.

The __getitem__ method has been appropriately updated to check self.open_backend before attempting to open the backend. The added ValueError with clear instructions for the user is a good practice when open_backend is False and the backend is not open.


223-232: LGTM: open method signature and documentation updated correctly.

The open method has been appropriately updated with a new filename parameter. The added documentation clearly explains the purpose of this new parameter, enhancing the flexibility of the video loading functionality as per the PR objectives.


249-251: LGTM: New logic in open method correctly handles the filename parameter.

The added logic in the open method correctly handles the new filename parameter by calling self.replace_filename when a new filename is provided. The use of open=False in the replace_filename call is appropriate, as it prevents immediate opening and allows the rest of the open method to handle the opening process.
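The behavior described in these comments can be summarized with a small stand-in class. This is a hedged sketch: it uses dataclasses instead of attrs, a placeholder dict instead of a real video backend, and a simplified replace_filename. Only the open_backend semantics (auto-open on init and on frame access when True, ValueError when False and the backend is closed, optional filename swap in open) follow the review text.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class SketchVideo:
    """Hypothetical stand-in for sleap_io.Video's open_backend handling."""

    filename: str
    open_backend: bool = True
    backend: Optional[Any] = field(default=None, repr=False)

    def __post_init__(self):
        # Mirrors __attrs_post_init__: only auto-open when allowed.
        if self.open_backend:
            self.open()

    def replace_filename(self, new_filename: str, open: bool = True):
        self.filename = new_filename
        self.backend = None  # Any old backend points at the old file.
        if open:
            self.open()

    def open(self, filename: Optional[str] = None):
        if filename is not None:
            # Swap the path without opening, then open below.
            self.replace_filename(filename, open=False)
        # Placeholder for constructing a real reader for self.filename.
        self.backend = {"path": self.filename}

    def __getitem__(self, idx: int):
        if self.backend is None:
            if self.open_backend:
                self.open()
            else:
                raise ValueError(
                    "Video backend is not open. Call video.open() or set "
                    "video.open_backend to True to open it on frame read."
                )
        return (self.backend["path"], idx)  # Placeholder "frame".
```

For example, SketchVideo("missing.mp4", open_backend=False) constructs without touching the file, indexing it raises ValueError, and calling .open("found.mp4") repoints and opens it in one step.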


Line range hint 1-300: Overall assessment: Excellent implementation of safer and more flexible video loading.

The changes introduced in this PR successfully implement the open_backend parameter and enhance the video loading functionality of the Video class. The modifications are consistent, well-documented, and align perfectly with the PR objectives. These changes provide users with more control over when and how video backends are opened, which is particularly useful for scenarios involving metadata reading from potentially unavailable video files.

Key improvements:

  1. Addition of the open_backend parameter with clear documentation.
  2. Updated __attrs_post_init__ and __getitem__ methods to respect the open_backend setting.
  3. Enhanced open method with a new filename parameter for increased flexibility.

These changes collectively contribute to safer video loading practices and offer a more robust user experience when working with video files in the SLEAP library.

sleap_io/io/slp.py (5)

46-60: LGTM: Well-implemented filename sanitization function

The sanitize_filename function is a good addition that ensures consistent POSIX-compatible format for filenames. It handles both single filenames and lists of filenames, which adds flexibility to its usage.
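A minimal sketch of such a helper, assuming it only needs to turn one path or a list of paths into forward-slash strings. This is an illustration, not necessarily the exact implementation merged in this PR.

```python
from pathlib import Path
from typing import List, Union


def sanitize_filename(
    filename: Union[str, Path, List[Union[str, Path]]],
) -> Union[str, List[str]]:
    """Sketch: normalize one path or a list of paths to forward slashes."""
    if isinstance(filename, list):
        return [sanitize_filename(f) for f in filename]
    # as_posix() converts native separators on Windows; the explicit replace
    # also handles Windows-style backslashes when running on POSIX systems,
    # where PurePosixPath treats "\\" as an ordinary character.
    return Path(filename).as_posix().replace("\\", "/")
```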


Line range hint 62-146: Excellent improvements to video handling flexibility and safety

The changes to the make_video function significantly enhance its flexibility and safety:

  1. The new open_backend parameter allows users to control whether the video backend is opened, aligning with the PR objectives.
  2. The function now handles cases where video files might not be available, improving robustness.
  3. The use of sanitize_filename ensures consistent path handling across different platforms.
  4. Error handling has been improved by catching specific exceptions.

These changes make the function more versatile and safer to use in various scenarios.


Line range hint 150-169: LGTM: Consistent implementation of open_backend parameter

The changes to the read_videos function are well-implemented and consistent with the updates in the make_video function. The new open_backend parameter is properly documented and correctly passed to make_video, enhancing the flexibility of video loading as per the PR objectives.


Line range hint 173-229: LGTM: Improved filename consistency in video metadata

The changes to the video_to_dict function enhance the consistency of filename handling in video metadata. The use of sanitize_filename ensures that all filenames are converted to a canonical POSIX-compatible format, which improves cross-platform compatibility and robustness of the serialized video metadata.

🧰 Tools
🪛 Ruff

187-187: Use is and is not for type comparisons, or isinstance() for isinstance checks

(E721)


Line range hint 1-1130: Excellent improvements to video loading safety and flexibility

The changes in this file significantly enhance the safety and flexibility of video loading in the SLEAP library:

  1. The new sanitize_filename function ensures consistent, cross-platform filename handling.
  2. The introduction of the open_backend parameter in various functions allows users to control whether the video backend is opened, which is particularly useful when only metadata is needed.
  3. Error handling has been improved, making the code more robust.
  4. The changes are well-documented and consistently implemented across different functions.

These improvements align perfectly with the PR objectives and provide users with more control over video loading behavior. The code quality is high, and the new functionality is added without breaking existing behavior.

Great job on these enhancements!

codecov bot commented Sep 29, 2024

Codecov Report

Attention: Patch coverage is 92.64706% with 5 lines in your changes missing coverage. Please review.

Project coverage is 96.13%. Comparing base (7e91d04) to head (0487ebc).
Report is 1 commits behind head on main.

Files with missing lines:

  • sleap_io/io/slp.py: 83.33% patch coverage, 5 lines missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #119      +/-   ##
==========================================
+ Coverage   96.05%   96.13%   +0.08%     
==========================================
  Files          15       15              
  Lines        2000     2020      +20     
==========================================
+ Hits         1921     1942      +21     
+ Misses         79       78       -1     
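The project percentages above follow from the hit and line counts, assuming Codecov's default of rounding down to two decimal places (plain rounding would report 96.14% for head). A quick check using integer arithmetic to avoid floating-point edge cases:

```python
def coverage_hundredths(hits: int, lines: int) -> int:
    """Coverage in hundredths of a percent, rounded down (integer math)."""
    return hits * 10_000 // lines


base = coverage_hundredths(1921, 2000)  # main
head = coverage_hundredths(1942, 2020)  # this PR
delta = head - base

naive = round(100 * 1942 / 2020, 2)  # ordinary rounding, for comparison

print(f"base {base / 100:.2f}%  head {head / 100:.2f}%  delta +{delta / 100:.2f}")
# base 96.05%  head 96.13%  delta +0.08
```

Similarly, 5 missing lines at 92.64706% patch coverage implies 68 changed lines (63/68 ≈ 92.647%); that count is inferred, not stated in the report.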


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 11

🧹 Outside diff range and nitpick comments (10)
docs/index.md (1)

140-146: LGTM! The updated comments improve clarity.

The changes to the comments enhance the documentation's clarity:

  1. The updated comment for replace_filenames() more accurately describes the function's purpose.
  2. The new comment about saving labels with updated paths provides useful context.

These improvements align well with good documentation practices.

For consistency, consider adding a comment before the labels.replace_filenames() call to explain its purpose, similar to the comments before the load and save operations. For example:

# Replace video file paths with updated locations
labels.replace_filenames(prefix_map={
    "D:/data/sleap_projects": "/home/user/sleap_projects",
    "C:/Users/sleaper/Desktop/test": "/home/user/sleap_projects",
})
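As an illustration only, a prefix map like the one above could be applied with a helper along these lines. apply_prefix_map is hypothetical, not the library's replace_filenames implementation; it assumes backslashes are normalized to forward slashes before matching and that the first matching prefix wins.

```python
def apply_prefix_map(filename: str, prefix_map: dict) -> str:
    """Hypothetical helper: swap the first matching old prefix for its new one."""
    normalized = filename.replace("\\", "/")
    for old, new in prefix_map.items():
        old = old.replace("\\", "/").rstrip("/")
        # Match the whole path or the prefix followed by a separator, so
        # "D:/data" does not accidentally match "D:/database".
        if normalized == old or normalized.startswith(old + "/"):
            return new.rstrip("/") + normalized[len(old):]
    return filename
```

Requiring a separator after the prefix is the main design choice here: a plain startswith would silently rewrite sibling directories that happen to share a name prefix.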
sleap_io/model/video.py (4)

38-42: LGTM: New open_backend parameter enhances flexibility.

The addition of the open_backend parameter allows users to control whether the backend should be automatically opened, which aligns with the PR objectives. This change provides more flexibility in handling video backends, especially when the video file might not exist.

Consider adding a brief example in the docstring to illustrate how to use open_backend=False:

"""
Example:
    # Create a Video object without opening the backend
    video = Video(filename="path/to/video.mp4", open_backend=False)
    # Manually open the backend when needed
    video.open()
"""

191-197: LGTM: Improved error handling in __getitem__ method.

The new error handling in the __getitem__ method respects the open_backend flag and provides clear instructions to the user when the backend is not open. This change enhances the user experience and aligns with the PR objectives.

Consider slightly rewording the error message for clarity:

raise ValueError(
    "Video backend is not open. Either call video.open() manually "
    "or set video.open_backend to True for automatic opening on frame read."
)

201-218: LGTM: Enhanced file accessibility check in exists method.

The exists method now uses is_file_accessible, which improves safety by ensuring that the file can be read, not just that it exists. This change aligns with the PR objectives of enhancing safety in video loading.

Consider optimizing the loop for checking multiple files using all():

if isinstance(self.filename, list):
    if check_all:
        return all(is_file_accessible(f) for f in self.filename)
    else:
        return is_file_accessible(self.filename[0])
return is_file_accessible(self.filename)

This change would make the code more concise without changing behavior: like the loop's early return, all() over a generator stops at the first inaccessible file.
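A quick check that the loop and the all() form inspect exactly the same files, using a hypothetical stand-in for is_file_accessible that logs each path it is asked about:

```python
def make_checker(accessible: set):
    """Return a fake accessibility check plus a log of the paths it inspected."""
    seen = []

    def check(path: str) -> bool:
        seen.append(path)
        return path in accessible

    return check, seen


def exists_loop(filenames, check) -> bool:
    # Original style: explicit loop with an early return.
    for f in filenames:
        if not check(f):
            return False
    return True


def exists_all(filenames, check) -> bool:
    # Suggested style: all() over a generator expression.
    return all(check(f) for f in filenames)
```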

🧰 Tools
🪛 Ruff

212-215: Use return all(is_file_accessible(f) for f in self.filename) instead of for loop

Replace with return all(is_file_accessible(f) for f in self.filename)

(SIM110)


227-227: LGTM: Enhanced flexibility in open method with new filename parameter.

The addition of the filename parameter to the open method allows for changing the filename when opening the video, providing more flexibility in handling video files. This change aligns with the PR objectives of improving video loading functionality.

Consider updating the method's docstring to include information about the new filename parameter:

def open(
    self,
    filename: Optional[str] = None,
    dataset: Optional[str] = None,
    grayscale: Optional[bool] = None,
    keep_open: bool = True,
):
    """Open the video backend for reading.

    Args:
        filename: Optional new filename to set before opening. If provided,
            this will update the video's filename before opening the backend.
        dataset: Name of dataset in HDF5 file.
        grayscale: Whether to force grayscale. If None, autodetect on first frame
            load.
        keep_open: Whether to keep the video reader open between calls to read
            frames. If False, will close the reader after each call. If True (the
            default), it will keep the reader open and cache it for subsequent calls
            which may enhance the performance of reading multiple frames.

    ...
    """

Also applies to: 253-255

tests/io/test_slp.py (1)

368-399: Comprehensive test for video path resolution

The test_video_path_resolution function effectively covers various scenarios for video path resolution, including:

  1. Resolving when the video is in the same directory as the labels file.
  2. Handling inaccessible video files.

The test is well-structured and uses appropriate file operations to simulate real-world scenarios.

To improve readability, consider adding comments before each test scenario to clearly separate and explain the different cases being tested.

Here's a suggested improvement for better readability:

def test_video_path_resolution(slp_real_data, tmp_path):
    # Initial setup
    labels = read_labels(slp_real_data)
    assert (
        Path(labels.video.filename).as_posix()
        == "tests/data/videos/centered_pair_low_quality.mp4"
    )
    shutil.copyfile(labels.video.filename, tmp_path / "centered_pair_low_quality.mp4")
    labels.video.replace_filename(
        "fake/path/to/centered_pair_low_quality.mp4", open=False
    )
    labels.save(tmp_path / "labels.slp")

    # Scenario 1: Resolve when the same video filename is found in the labels directory
    labels = read_labels(tmp_path / "labels.slp")
    assert (
        Path(labels.video.filename).as_posix()
        == (tmp_path / "centered_pair_low_quality.mp4").as_posix()
    )
    assert labels.video.exists()

    # Scenario 2: Fail to resolve when the video file is inaccessible
    labels.video.replace_filename("new_fake/path/to/inaccessible.mp4", open=False)
    labels.save(tmp_path / "labels2.slp")
    shutil.copyfile(
        tmp_path / "centered_pair_low_quality.mp4", tmp_path / "inaccessible.mp4"
    )
    Path(tmp_path / "inaccessible.mp4").chmod(0o000)

    labels = read_labels(tmp_path / "labels2.slp")
    assert not labels.video.exists()
    assert Path(labels.video.filename).as_posix() == "new_fake/path/to/inaccessible.mp4"

These comments help to clearly delineate the different scenarios being tested, making the test function more readable and easier to understand.

sleap_io/io/slp.py (1)

Line range hint 59-143: LGTM: Improved flexibility and error handling in make_video

The changes to the make_video function are well-implemented. The new open_backend parameter provides more flexibility, and the improved error handling for file accessibility is a great addition. The integration of the sanitize_filename function ensures consistent filename formatting.

One minor suggestion:

Consider adding more specific exception handling for the video backend creation. Instead of catching a general Exception, you could catch specific exceptions that might occur during backend creation, such as OSError (of which IOError is an alias in Python 3) or ValueError. This would provide more informative error messages and avoid masking unexpected exceptions.

-        except Exception:
+        except (OSError, ValueError) as e:
+            print(f"Error creating video backend: {e}")
             backend = None
tests/model/test_video.py (1)

163-163: Remove or utilize the unused variable img

The variable img is assigned but never used. If the sole purpose is to trigger backend loading by accessing a frame, consider omitting the assignment or assigning to _ to indicate the intentional disregard of the value.

Apply one of the following diffs to address the issue:

Option 1:

-    img = video[0]
+    video[0]

Option 2:

-    img = video[0]
+    _ = video[0]
🧰 Tools
🪛 Ruff

163-163: Local variable img is assigned to but never used

Remove assignment to unused variable img

(F841)

sleap_io/io/nwb.py (2)

32-48: Improve docstring for clarity and completeness

Consider enhancing the docstring by providing more detailed descriptions for the parameters and return value. Specifically:

  • In the Args section, clarify that labels should be a Labels object containing predicted instances.
  • In the Returns section, provide more precise information about the structure of the returned pd.DataFrame, including details about the hierarchical columns and index.

89-95: Add comments to explain complex DataFrame transformations

The chained pandas operations for transforming labels_df into labels_tidy_df involve several non-trivial steps. Adding inline comments to describe each operation (e.g., set_index, unstack, swaplevel, sort_index) will improve code readability and help future maintainers understand the data manipulation process.
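To make the chain easier to follow, here is a small sketch that runs the same set_index / unstack / swaplevel / sort_index sequence on a few made-up rows (one skeleton, one untracked track, two nodes, two frames). The data are invented for illustration. Only the column names come from the reviewed function, and the step comments are one possible wording for the suggested inline comments.

```python
import pandas as pd

# Made-up long-format rows: one row per (frame, node), as the reviewed function builds them.
rows = []
for frame_idx in (1, 0):  # deliberately out of order to show the final row sort
    for node_name, offset in (("head", 0.0), ("tail", 10.0)):
        rows.append(
            dict(
                frame_idx=frame_idx,
                x=offset + frame_idx,
                y=offset + 2 * frame_idx,
                score=0.9,
                node_name=node_name,
                skeleton_name="fly",
                track_name="untracked",
                video_path="video.mp4",
            )
        )
labels_df = pd.DataFrame(rows)

index = ["skeleton_name", "track_name", "node_name", "video_path", "frame_idx"]
labels_tidy_df = (
    # 1. Move the identifying columns into a 5-level row MultiIndex;
    #    x, y and score remain as the only data columns.
    labels_df.set_index(index)
    # 2. Pivot the first four index levels into the columns, leaving frame_idx
    #    as the row index. Column levels: (value, skeleton, track, node, video).
    .unstack(level=[0, 1, 2, 3])
    # 3. Swap the outermost (x/y/score) and innermost (video_path) column levels,
    #    giving (video, skeleton, track, node, value).
    .swaplevel(0, -1, axis=1)
    # 4. Sort columns so each node's score/x/y sit together, then sort rows by frame.
    .sort_index(axis=1)
    .sort_index(axis=0)
)

# Dict-like access with a full column tuple returns one time series per coordinate.
head_x = labels_tidy_df[("video.mp4", "fly", "untracked", "head", "x")]
```

Indexing with a full tuple, as in the last line, is the dict-like hierarchical access the function's own comment refers to.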

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between d04fea5 and 8e179ca.

📒 Files selected for processing (7)
  • docs/index.md (1 hunks)
  • sleap_io/io/nwb.py (1 hunks)
  • sleap_io/io/slp.py (10 hunks)
  • sleap_io/io/utils.py (2 hunks)
  • sleap_io/model/video.py (6 hunks)
  • tests/io/test_slp.py (2 hunks)
  • tests/model/test_video.py (3 hunks)
🧰 Additional context used
🪛 Ruff
sleap_io/io/slp.py

184-184: Use is and is not for type comparisons, or isinstance() for isinstance checks

(E721)

sleap_io/model/video.py

212-215: Use return all(is_file_accessible(f) for f in self.filename) instead of for loop

Replace with return all(is_file_accessible(f) for f in self.filename)

(SIM110)

tests/io/test_slp.py

361-361: Use is and is not for type comparisons, or isinstance() for isinstance checks

(E721)

tests/model/test_video.py

96-96: Use is and is not for type comparisons, or isinstance() for isinstance checks

(E721)


154-154: Use is and is not for type comparisons, or isinstance() for isinstance checks

(E721)


163-163: Local variable img is assigned to but never used

Remove assignment to unused variable img

(F841)


165-165: Use is and is not for type comparisons, or isinstance() for isinstance checks

(E721)

🔇 Additional comments (12)
docs/index.md (1)

137-138: LGTM! This change aligns with the PR objectives.

The addition of the open_videos=False parameter to sio.load_file() provides explicit control over whether video files should be opened during label loading, as mentioned in the PR objectives. This enhancement improves flexibility and safety when working with label files, especially in scenarios where video files may not be accessible.

sleap_io/model/video.py (3)

12-12: LGTM: New import enhances file accessibility checks.

The addition of is_file_accessible from sleap_io.io.utils aligns with the PR objectives of improving safety in video loading.


62-62: LGTM: Updated post-init logic respects open_backend flag.

The modification to the __attrs_post_init__ method ensures that the backend is only opened automatically if open_backend is True and the video exists. This change aligns with the new open_backend parameter's functionality.


Line range hint 1-324: Overall assessment: Excellent improvements to video loading safety and flexibility.

The changes in this file significantly enhance the SLEAP library's video loading capabilities:

  1. The new open_backend parameter provides users with explicit control over backend initialization.
  2. Enhanced error handling in the __getitem__ method improves the user experience.
  3. The exists method now uses is_file_accessible, ensuring safer file checks.
  4. The open method's new filename parameter increases flexibility in handling video files.

These changes align well with the PR objectives of improving safety and flexibility in video loading. The minor suggestions provided in the review comments could further enhance the code's clarity and efficiency.

🧰 Tools
🪛 Ruff

212-215: Use return all(is_file_accessible(f) for f in self.filename) instead of for loop

Replace with return all(is_file_accessible(f) for f in self.filename)

(SIM110)
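The body of exists() at lines 212-215 is not reproduced in this review, so the loop below is an assumed shape of the flagged code rather than the actual sleap_io source. It illustrates why Ruff's SIM110 rewrite is behavior-preserving: all() over a generator returns True for an empty sequence and stops at the first falsy result, just like the early-return loop.

```python
from typing import Callable, Iterable


def exists_loop(filenames: Iterable[str], is_accessible: Callable[[str], bool]) -> bool:
    # Assumed shape of the flagged loop: return False at the first inaccessible file.
    for f in filenames:
        if not is_accessible(f):
            return False
    return True


def exists_all(filenames: Iterable[str], is_accessible: Callable[[str], bool]) -> bool:
    # SIM110 rewrite: all() short-circuits on the first False, like the loop.
    return all(is_accessible(f) for f in filenames)


checked = []


def fake_is_accessible(f: str) -> bool:
    # Hypothetical checker that records its calls; "missing.png" is inaccessible.
    checked.append(f)
    return f != "missing.png"


result = exists_all(["a.png", "missing.png", "b.png"], fake_is_accessible)
# result is False, and "b.png" is never checked because all() stopped early.
```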

sleap_io/io/slp.py (5)

43-57: LGTM: Well-implemented filename sanitization function

The sanitize_filename function is a great addition. It handles both single filenames and lists, ensures POSIX-compatible format, and maintains cross-platform compatibility. The implementation is clean and well-documented.
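The reviewed implementation is not quoted here. As a rough sketch of the behavior described (accepts a single path or a list, returns forward-slash strings), the hypothetical helper below simply rewrites backslashes. It is not the sleap_io code, and it assumes a backslash is never a legitimate filename character, which does not hold on POSIX systems in general.

```python
from __future__ import annotations

from pathlib import Path
from typing import Union

PathLike = Union[str, Path]


def sanitize_filename(filename: PathLike | list[PathLike]) -> str | list[str]:
    """Hypothetical sketch: normalize path(s) to forward-slash strings."""
    if isinstance(filename, (list, tuple)):
        # Image-sequence videos store a list of frame paths; sanitize each one.
        return [sanitize_filename(f) for f in filename]
    # Assumption: a backslash is always a Windows separator, never part of a name.
    return str(filename).replace("\\", "/")


stored = sanitize_filename("C:\\data\\video.mp4")  # "C:/data/video.mp4"
```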


Line range hint 147-166: LGTM: Consistent update to read_videos function

The changes to the read_videos function are well-implemented and consistent with the modifications made to make_video. The new open_backend parameter is correctly passed to make_video, and the default value of True maintains backward compatibility. The updated docstring accurately reflects the new functionality.


Line range hint 180-221: LGTM: Consistent filename sanitization in video_to_dict

The changes to the video_to_dict function are well-implemented. The use of the sanitize_filename function ensures consistent filename formatting across different video types. This improvement enhances the robustness of the function and maintains consistency with the rest of the codebase.

🧰 Tools
🪛 Ruff

184-184: Use is and is not for type comparisons, or isinstance() for isinstance checks

(E721)


Line range hint 1039-1080: LGTM: Consistent update to read_labels function

The changes to the read_labels function are well-implemented and consistent with the modifications made to read_videos. The new open_videos parameter is correctly passed to read_videos, and the default value of True maintains backward compatibility. The updated docstring accurately reflects the new functionality.


Line range hint 1-1139: Overall: Excellent improvements to SLEAP I/O functionality

The changes in this file significantly enhance the flexibility and robustness of SLEAP I/O operations. Key improvements include:

  1. Addition of the sanitize_filename function for consistent filename handling.
  2. New open_backend and open_videos parameters in various functions, allowing more control over video backend operations.
  3. Improved error handling and file accessibility checks.

These changes maintain backward compatibility while adding valuable new features. The code style is consistent, and the documentation has been updated appropriately.

Great job on these improvements!

tests/model/test_video.py (1)

162-164: Verify the effect of setting video.open_backend post-initialization

Setting video.open_backend = True after the Video object has been instantiated with open_backend=False may not reopen the backend automatically. Ensure that the Video class supports dynamic changes to the open_backend attribute, or explicitly call video.open() to open the backend before accessing frames.

🧰 Tools
🪛 Ruff

163-163: Local variable img is assigned to but never used

Remove assignment to unused variable img

(F841)
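Whether flipping the flag after construction has any effect depends on where the flag is read. The sketch below uses a hypothetical LazyVideo dataclass (not the sleap_io Video class) whose __getitem__ re-checks open_backend at access time. If the real class only consults the flag during post-init, an explicit video.open() would be needed instead.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class LazyVideo:
    """Hypothetical stand-in for a Video with an open_backend flag."""

    filename: str
    open_backend: bool = True
    backend: Optional[Any] = field(default=None, init=False)

    def __post_init__(self) -> None:
        # Open eagerly only when the flag allows it.
        if self.open_backend:
            self.open()

    def open(self) -> None:
        # Placeholder for constructing a real reader backend.
        self.backend = f"reader for {self.filename}"

    @property
    def is_open(self) -> bool:
        return self.backend is not None

    def __getitem__(self, idx: int) -> Any:
        # The flag is re-read here, so changing it after construction takes effect.
        if not self.is_open:
            if not self.open_backend:
                raise ValueError("Backend is not open and open_backend is False.")
            self.open()
        return (self.backend, idx)


video = LazyVideo("video.mp4", open_backend=False)
video.open_backend = True
frame = video[0]  # opens the backend lazily on first access
```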

sleap_io/io/utils.py (1)

6-7: Imports for type annotations added correctly

The addition of Any, Union, Optional, and Path imports is appropriate for the type annotations used in the code. This enhances code clarity and aids in static type checking.

sleap_io/io/nwb.py (1)

75-77: Good error handling for empty predictions

The check for an empty data_list and raising a ValueError if no predicted instances are found ensures that the function fails gracefully in the absence of data. This is a good practice for robust error handling.

Comment on lines +359 to +365
def test_lazy_video_read(slp_real_data):
    labels = read_labels(slp_real_data)
    assert type(labels.video.backend) == MediaVideo
    assert labels.video.exists()

    labels = read_labels(slp_real_data, open_videos=False)
    assert labels.video.backend is None

🛠️ Refactor suggestion

Enhance test coverage and improve type checking

While the test function effectively verifies the behavior of the open_videos parameter, consider the following improvements:

  1. Use isinstance() for type checking instead of direct type comparison. This adheres to Python best practices and is more flexible.
  2. Add assertions to verify that labels are still correctly loaded when open_videos=False. This ensures that the lazy loading doesn't affect the integrity of the label data.
  3. Keep the plain "assert ... is None" form for the second assertion. assertIsNone() exists only on unittest.TestCase, and this is a plain pytest-style test function.

Here's a suggested improvement:

def test_lazy_video_read(slp_real_data):
    # Test default behavior
    labels = read_labels(slp_real_data)
    assert isinstance(labels.video.backend, MediaVideo)
    assert labels.video.exists()
    
    # Test lazy loading behavior
    lazy_labels = read_labels(slp_real_data, open_videos=False)
    assert lazy_labels.video.backend is None
    
    # Verify that labels are still correctly loaded
    assert len(lazy_labels) == len(labels)
    assert lazy_labels.skeleton == labels.skeleton
    # Add more assertions to verify the integrity of lazy-loaded labels

This refactored version improves type checking, clarifies the assertions, and adds a placeholder for additional verification of lazy-loaded labels.

🧰 Tools
🪛 Ruff

361-361: Use is and is not for type comparisons, or isinstance() for isinstance checks

(E721)

     if video.backend is None:
-        return {"filename": video.filename, "backend": video.backend_metadata}
+        return {"filename": video_filename, "backend": video.backend_metadata}

     if type(video.backend) == MediaVideo:

🛠️ Refactor suggestion

Use isinstance() for type checking

Instead of using type(video.backend) == MediaVideo, it's more pythonic and flexible to use isinstance() for type checking. This allows for subclass comparisons and is the recommended approach in Python.

Apply this change:

-    if type(video.backend) == MediaVideo:
+    if isinstance(video.backend, MediaVideo):

Make similar changes for other type checks in this function (e.g., for HDF5Video and ImageVideo).

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if type(video.backend) == MediaVideo:
if isinstance(video.backend, MediaVideo):
🧰 Tools
🪛 Ruff

184-184: Use is and is not for type comparisons, or isinstance() for isinstance checks

(E721)
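A small illustration of the difference, using hypothetical stand-in classes rather than the real backends: type(x) == T and type(x) is T both reject subclasses, while isinstance() accepts them. Ruff's E721 is satisfied by either of the last two forms, so the choice depends on whether a subclassed backend should count as a match.

```python
class MediaVideo:
    """Hypothetical stand-in for a backend class."""


class CachedMediaVideo(MediaVideo):
    """Hypothetical subclass, e.g. a specialized or mocked backend."""


backend = CachedMediaVideo()

exact_eq = type(backend) == MediaVideo         # flagged by Ruff E721; False here
exact_is = type(backend) is MediaVideo         # E721-clean exact-type check; False here
subclass_ok = isinstance(backend, MediaVideo)  # accepts subclasses; True here
```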

Comment on lines +94 to +97
    video.open(centered_pair_frame_paths)
    assert video.shape == (3, 384, 384, 1)
    assert type(video.backend) == ImageVideo

⚠️ Potential issue

Verify the usage of video.open() with arguments

The video.open() method is being called with centered_pair_frame_paths as an argument. Ensure that the open() method accepts centered_pair_frame_paths as a parameter. If the intention is to replace the video source with new frame paths, consider using video.replace_filename(centered_pair_frame_paths) followed by video.open().

🧰 Tools
🪛 Ruff

96-96: Use is and is not for type comparisons, or isinstance() for isinstance checks

(E721)

def test_open_backend_preference(centered_pair_low_quality_path):
    video = Video(centered_pair_low_quality_path)
    assert video.is_open
    assert type(video.backend) == MediaVideo

⚠️ Potential issue

Use isinstance() for type checking

Consistent with best practices, use isinstance() for type checking rather than direct type comparison with type().

Apply this diff to fix the issue:

-    assert type(video.backend) == MediaVideo
+    assert isinstance(video.backend, MediaVideo)
🧰 Tools
🪛 Ruff

154-154: Use is and is not for type comparisons, or isinstance() for isinstance checks

(E721)

    video.open_backend = True
    img = video[0]
    assert video.is_open
    assert type(video.backend) == MediaVideo

⚠️ Potential issue

Use isinstance() for type checking

As with previous instances, prefer using isinstance() for type checking to handle inheritance correctly and enhance code clarity.

Apply this diff to fix the issue:

-    assert type(video.backend) == MediaVideo
+    assert isinstance(video.backend, MediaVideo)
🧰 Tools
🪛 Ruff

165-165: Use is and is not for type comparisons, or isinstance() for isinstance checks

(E721)

@@ -91,6 +91,10 @@ def test_video_open_close(centered_pair_low_quality_path):
    video.open(grayscale=True)
    assert video.shape == (1100, 384, 384, 1)

    video.open(centered_pair_frame_paths)
    assert video.shape == (3, 384, 384, 1)
    assert type(video.backend) == ImageVideo

⚠️ Potential issue

Use isinstance() for type checking

For type checking, it's recommended to use isinstance() instead of comparing types directly. This provides better support for inheritance and improves code readability.

Apply this diff to fix the issue:

-    assert type(video.backend) == ImageVideo
+    assert isinstance(video.backend, ImageVideo)
🧰 Tools
🪛 Ruff

96-96: Use is and is not for type comparisons, or isinstance() for isinstance checks

(E721)

Comment on lines +177 to +196
def is_file_accessible(filename: str | Path) -> bool:
    """Check if a file is accessible.

    Args:
        filename: Path to a file.

    Returns:
        `True` if the file is accessible, `False` otherwise.

    Notes:
        This checks if the file readable by the current user by reading one byte from
        the file.
    """
    filename = Path(filename)
    try:
        with open(filename, "rb") as f:
            f.read(1)
        return True
    except (FileNotFoundError, PermissionError, OSError, ValueError):
        return False

🛠️ Refactor suggestion

Simplify exception handling by relying on OSError

FileNotFoundError and PermissionError are subclasses of OSError, so listing them next to OSError is redundant. ValueError is not an OSError subclass (open() raises it for a path containing a null byte, for example), so keep it in the tuple if such paths should also return False.

Apply this diff to simplify the exception handling:

 def is_file_accessible(filename: str | Path) -> bool:
     ...
     try:
         with open(filename, "rb") as f:
             f.read(1)
         return True
-    except (FileNotFoundError, PermissionError, OSError, ValueError):
+    except (OSError, ValueError):
         return False
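A quick sketch of the narrowed clause, assuming the same one-byte read as the function above. It also exercises the exception hierarchy the simplification relies on, including the null-byte path that only ValueError catches.

```python
import tempfile
from pathlib import Path
from typing import Union


def is_file_accessible(filename: Union[str, Path]) -> bool:
    """Sketch of the one-byte read check with the narrowed except clause."""
    try:
        with open(Path(filename), "rb") as f:
            f.read(1)
        return True
    # OSError already covers FileNotFoundError, PermissionError, IsADirectoryError, ...
    # ValueError is separate: open() raises it for paths with an embedded null byte.
    except (OSError, ValueError):
        return False


with tempfile.TemporaryDirectory() as d:
    video = Path(d) / "video.mp4"
    video.write_bytes(b"\x00")
    checks = {
        "existing file": is_file_accessible(video),
        "missing file": is_file_accessible(Path(d) / "missing.mp4"),
        "directory": is_file_accessible(d),
        "null byte": is_file_accessible("video\0.mp4"),
    }
```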

🛠️ Refactor suggestion

Suggestion: Use os.access to check file readability

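Consider using os.access(filename, os.R_OK) to check whether the file is readable by the current user without opening it. Be aware of the trade-off: os.access() checks permissions against the real (not effective) user and group IDs, and the Python documentation warns that I/O can still fail when it reports success, for example on network filesystems whose permission semantics go beyond POSIX permission bits. Those read-time failures are what the PR's one-byte read is designed to catch.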

Apply this diff to refactor the function:

+import os

 def is_file_accessible(filename: str | Path) -> bool:
     """Check if a file is accessible.

     Args:
         filename: Path to a file.

     Returns:
         `True` if the file is accessible, `False` otherwise.

     Notes:
-        This checks if the file readable by the current user by reading one byte from
-        the file.
+        This checks if the file is readable by the current user using os.access.
     """
     filename = Path(filename)
-    try:
-        with open(filename, "rb") as f:
-            f.read(1)
-        return True
-    except (FileNotFoundError, PermissionError, OSError, ValueError):
-        return False
+    return os.access(filename, os.R_OK)
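The two approaches are not interchangeable. In the sketch below (the helper names are hypothetical), both agree for an ordinary file and a missing path, but a directory passes os.access(..., os.R_OK) while an actual read of it fails.

```python
import os
import tempfile
from pathlib import Path


def readable_by_access(path) -> bool:
    # Permission check only (real uid/gid); the file is never opened.
    return os.access(path, os.R_OK)


def readable_by_read(path) -> bool:
    # The approach in this PR: actually read one byte.
    try:
        with open(path, "rb") as f:
            f.read(1)
        return True
    except (OSError, ValueError):
        return False


with tempfile.TemporaryDirectory() as d:
    video = Path(d) / "video.mp4"
    video.write_bytes(b"\x00")
    missing = Path(d) / "missing.mp4"
    results = {
        "file": (readable_by_access(video), readable_by_read(video)),
        "missing": (readable_by_access(missing), readable_by_read(missing)),
        # A directory passes the permission check but cannot be read as a file.
        "directory": (readable_by_access(d), readable_by_read(d)),
    }
```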

Comment on lines +52 to +55
        (label_frame, instance)  # type: ignore
        for label_frame in labeled_frames
        for instance in label_frame.predicted_instances
    )

🛠️ Refactor suggestion

Resolve type checking issues instead of using # type: ignore

The use of # type: ignore suggests that the type checker is encountering issues with the generator expression. To improve code quality and maintainability, consider updating the type annotations or refactoring the code to address the underlying type errors instead of suppressing them.
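One way to drop the ignore, assuming the underlying issue is that the checker cannot infer the generator's element type, is to move the pairing into a small generator function with an explicit Iterator return type. The dataclasses here are hypothetical stand-ins, not the sleap_io model classes.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class PredictedInstance:
    score: float


@dataclass
class LabeledFrame:
    frame_idx: int
    predicted_instances: list[PredictedInstance] = field(default_factory=list)


def iter_frame_instances(
    labeled_frames: list[LabeledFrame],
) -> Iterator[tuple[LabeledFrame, PredictedInstance]]:
    # The explicit return type tells the checker what each yielded pair contains,
    # so callers can unpack it without a "# type: ignore".
    for lf in labeled_frames:
        for inst in lf.predicted_instances:
            yield lf, inst


frames = [
    LabeledFrame(0, [PredictedInstance(0.5), PredictedInstance(0.7)]),
    LabeledFrame(1, []),  # frames without predictions contribute no pairs
]
pairs = list(iter_frame_instances(frames))
```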

                frame_idx=labeled_frame.frame_idx,
                x=instance.points[node].x,
                y=instance.points[node].y,
                score=instance.points[node].score,  # type: ignore[attr-defined]

🛠️ Refactor suggestion

Avoid suppressing type checker warnings with # type: ignore[attr-defined]

Using # type: ignore[attr-defined] indicates that the type checker does not recognize the score attribute on instance.points[node]. To enhance type safety and code clarity, consider updating the type annotations for instance.points[node] to include the score attribute, or ensure that the Point class has the score attribute properly defined in its type hints.
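If instance.points[node] is annotated as a base point type, an isinstance() check against a subclass that declares score lets the type checker narrow the type without a suppression comment. The Point/PredictedPoint classes below are hypothetical stand-ins, and the NaN fallback for unscored points is likewise an assumption for illustration.

```python
from dataclasses import dataclass


@dataclass
class Point:
    x: float
    y: float


@dataclass
class PredictedPoint(Point):
    score: float = 1.0


def point_score(p: Point) -> float:
    # isinstance() narrows Point to PredictedPoint for the type checker,
    # so .score is accessed without "# type: ignore[attr-defined]".
    if isinstance(p, PredictedPoint):
        return p.score
    return float("nan")  # assumed fallback for points that carry no score
```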

Comment on lines +31 to +97
def convert_predictions_to_dataframe(labels: Labels) -> pd.DataFrame:
    """Convert predictions data to a Pandas dataframe.

    Args:
        labels: A general label object.

    Returns:
        pd.DataFrame: A pandas data frame with the structured data with
        hierarchical columns. The column hierarchy is:
                "video_path",
                "skeleton_name",
                "track_name",
                "node_name",
        And it is indexed by the frames.

    Raises:
        ValueError: If no frames in the label objects contain predicted instances.
    """
    # Form pairs of labeled_frames and predicted instances
    labeled_frames = labels.labeled_frames
    all_frame_instance_tuples = (
        (label_frame, instance)  # type: ignore
        for label_frame in labeled_frames
        for instance in label_frame.predicted_instances
    )

    # Extract the data
    data_list = list()
    for labeled_frame, instance in all_frame_instance_tuples:
        # Traverse the nodes of the instance's skeleton
        skeleton = instance.skeleton
        for node in skeleton.nodes:
            row_dict = dict(
                frame_idx=labeled_frame.frame_idx,
                x=instance.points[node].x,
                y=instance.points[node].y,
                score=instance.points[node].score,  # type: ignore[attr-defined]
                node_name=node.name,
                skeleton_name=skeleton.name,
                track_name=instance.track.name if instance.track else "untracked",
                video_path=labeled_frame.video.filename,
            )
            data_list.append(row_dict)

    if not data_list:
        raise ValueError("No predicted instances found in labels object")

    labels_df = pd.DataFrame(data_list)

    # Reformat the data with columns for dict-like hierarchical data access.
    index = [
        "skeleton_name",
        "track_name",
        "node_name",
        "video_path",
        "frame_idx",
    ]

    labels_tidy_df = (
        labels_df.set_index(index)
        .unstack(level=[0, 1, 2, 3])
        .swaplevel(0, -1, axis=1)  # video_path on top while x, y, score on bottom
        .sort_index(axis=1)  # Better format for columns
        .sort_index(axis=0)  # Sorts by frames
    )

    return labels_tidy_df

🛠️ Refactor suggestion

Include unit tests for convert_predictions_to_dataframe

To ensure that the new convert_predictions_to_dataframe function behaves as expected, consider adding unit tests that cover various scenarios, including cases with:

  • Multiple videos, skeletons, tracks, and nodes.
  • Missing or empty predicted_instances.
  • Instances without associated tracks (i.e., instance.track is None).

This will help detect any potential issues early and ensure robustness.

talmo linked an issue on Sep 29, 2024 that may be closed by this pull request
talmo merged commit 391df6e into main on Sep 29, 2024
9 checks passed
talmo deleted the talmo/lazier-slp-loading branch on Sep 29, 2024 at 01:53
Development

Successfully merging this pull request may close these issues.

PermissionError when loading a .slp file pointing to a video in /root
1 participant