
Updated migration #8543

Merged 7 commits into develop from bs/updated_migration on Oct 16, 2024
Conversation

@bsekachev (Member) commented Oct 15, 2024

Motivation and context

How has this been tested?

Checklist

  • I submit my changes into the develop branch
  • I have created a changelog fragment
  • I have updated the documentation accordingly
  • I have added tests to cover my changes
  • I have linked related issues (see GitHub docs)
  • I have increased versions of npm packages if it is necessary
    (cvat-canvas,
    cvat-core,
    cvat-data and
    cvat-ui)

License

  • I submit my code changes under the same MIT License that covers the project.
    Feel free to contact the maintainers if that's a concern.

Summary by CodeRabbit

  • New Features
    • Introduced new validation models: ValidationParams, ValidationLayout, and ValidationFrame to enhance data validation capabilities.
    • Implemented a function to clean up redundant ground truth jobs, improving data integrity.
  • Improvements
    • Updated the RelatedFile model to strengthen its relationship with the Image model, enhancing data management.

@bsekachev bsekachev requested a review from Marishka17 as a code owner October 15, 2024 10:04
coderabbitai bot (Contributor) commented Oct 15, 2024

Walkthrough

The changes in the migration file 0084_honeypot_support.py introduce a new function cleanup_invalid_data for managing ground truth jobs, ensuring data integrity by deleting redundant jobs while retaining at least one per task. Additionally, three new models—ValidationParams, ValidationLayout, and ValidationFrame—are created to enhance validation capabilities. The RelatedFile model is modified to establish a ManyToMany relationship with the Image model, removing the previous ForeignKey relationship with primary_image and updating the images field for better reverse lookups.

Changes

File: cvat/apps/engine/migrations/0084_honeypot_support.py
Change summary:
  • Added function cleanup_invalid_data.
  • Created models: ValidationParams, ValidationLayout, ValidationFrame.
  • Modified the RelatedFile model: added a ManyToMany relationship with Image, removed primary_image, and set related_name="related_files" on the images field.

Poem

In the garden where data grows,
A rabbit hops where the clean stream flows.
With valid frames and layouts bright,
We tidy up jobs, making things right.
Hooray for the changes, let’s dance and play,
For a better tomorrow, hip-hip-hooray! 🐇✨



coderabbitai bot (Contributor) left a comment

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (1)
cvat/apps/engine/migrations/0084_honeypot_support.py (1)

Line range hint 118-146: Handle exceptions in reverse migration revert_m2m_for_related_files

The reverse migration raises an exception if any RelatedFile has more than one associated Image. This could prevent rolling back the migration in certain cases.

  • Provide a clear message and guidance on how to resolve the issue if the exception is raised.
  • Consider whether it's feasible to programmatically resolve or merge multiple images into a single primary_image or adjust the data model accordingly.
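The pre-check the reverse migration needs could be sketched in plain Python, without Django. This is an illustrative model only: `related_file_images` maps a RelatedFile id to the ids of its linked Images, and both function names are hypothetical, not the actual migration code.

```python
def check_reversible(related_file_images):
    """Return ids of RelatedFiles that block the rollback (more than one image)."""
    return sorted(
        rf_id for rf_id, image_ids in related_file_images.items()
        if len(image_ids) > 1
    )

def pick_primary_images(related_file_images):
    """Map each RelatedFile to its single image, or raise with guidance."""
    blockers = check_reversible(related_file_images)
    if blockers:
        # Actionable message instead of a bare exception, as suggested above
        raise RuntimeError(
            f"Cannot reverse migration: RelatedFile ids {blockers} reference "
            "multiple images. Detach or merge the extra images before rolling back."
        )
    # Files with no image simply get no primary_image on rollback
    return {rf_id: ids[0] for rf_id, ids in related_file_images.items() if ids}
```

Running the check first lets the rollback fail with a list of every offending row at once, rather than stopping at the first one.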
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed between the base of the PR (ac01fff) and f82de29.

📒 Files selected for processing (1)
  • cvat/apps/engine/migrations/0084_honeypot_support.py (3 hunks)
🧰 Additional context used
🔇 Additional comments (6)
cvat/apps/engine/migrations/0084_honeypot_support.py (6)

248-251: Ensure data integrity during the migration

When performing data cleanup in migrations, it's crucial to handle exceptions and ensure that the database remains in a consistent state if an error occurs.

Wrap the data manipulation code in a transaction to ensure atomicity:

from django.db import transaction

@transaction.atomic
def cleanup_invalid_data(apps):
    # Existing code

Line range hint 98-112: Check ManyToMany initialization for related files and images

The function init_m2m_for_related_files populates the intermediate table for the ManyToMany relationship between RelatedFile and Image. Ensure that:

  • The bulk creation handles all existing RelatedFile instances with a non-null primary_image.
  • Data integrity is maintained, and there are no duplicate entries.

Consider adding logging or progress indicators if the dataset is large to monitor the migration progress.
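The duplicate-avoidance part of that bulk creation can be illustrated with a Django-free sketch. All names here are hypothetical: `related_files` stands in for the old rows as (related_file_id, primary_image_id) tuples, where the image id may be None.

```python
def build_m2m_rows(related_files):
    """Yield unique (related_file_id, image_id) pairs, skipping null images."""
    seen = set()
    rows = []
    for rf_id, image_id in related_files:
        if image_id is None:
            continue  # no primary image: nothing to migrate for this row
        pair = (rf_id, image_id)
        if pair in seen:
            continue  # guard against duplicate entries in the through table
        seen.add(pair)
        rows.append(pair)
    return rows
```

In the real migration the resulting pairs would feed a `bulk_create` on the intermediate model; the sketch only shows the filtering invariant (no nulls, no duplicates).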


Line range hint 188-241: Review field choices and default values in new models

In the ValidationParams and ValidationLayout models:

  • Ensure that the choices for mode and frame_selection_method fields accurately reflect all valid options.
  • Verify that fields like random_seed, frame_count, and frame_share handle null values appropriately.

Line range hint 68-93: Verify correct initialization of validation layouts

The init_validation_layout_in_tasks_with_gt_job function initializes ValidationLayout instances. Ensure that:

  • The frames field is correctly calculated using get_segment_rel_frame_set.
  • All possible db_segment.type values are handled appropriately in get_segment_rel_frame_set.

To confirm, you can run:

#!/bin/bash
# Description: Verify all segment types are accounted for in get_segment_rel_frame_set.

# Expect: No unhandled segment types.
ast-grep --lang python --pattern '
def get_segment_rel_frame_set($_) -> $_:
    $_
    else:
        raise ValueError($_)
' 0084_honeypot_support.py
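To make the "all types handled" expectation concrete, here is a hypothetical model of `get_segment_rel_frame_set`. The type names ("range", "specific_frames") and fields are assumptions for illustration, not CVAT's actual schema; the point is the explicit else-raise on unknown types.

```python
def get_segment_rel_frame_set(segment_type, start_frame=0, stop_frame=0, frames=()):
    """Return the set of frames covered by a segment, by segment type."""
    if segment_type == "range":
        # Inclusive frame range, like a contiguous segment
        return set(range(start_frame, stop_frame + 1))
    elif segment_type == "specific_frames":
        # Explicitly enumerated frames, like a honeypot/validation segment
        return set(frames)
    else:
        # Fail loudly on unknown types instead of silently returning nothing
        raise ValueError(f"Unknown segment type: {segment_type!r}")
```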

Line range hint 252-261: Update references due to changes in RelatedFile model fields

The primary_image field is removed, and an images ManyToMany field with related_name="related_files" is added to the RelatedFile model. Ensure that all code referencing primary_image is updated to use the new relationship.

Run this script to identify potential code that needs updating:

#!/bin/bash
# Description: Find all references to 'primary_image' in the codebase.

# Expect: All references should be reviewed and updated.
rg --type py 'primary_image' cvat/apps/

61-64: Ensure at least one ground truth job remains per task

The loop removes ground truth jobs until only one remains. However, without safeguards, there's a risk of accidentally removing all ground truth jobs for a task if groups[task_id] becomes empty due to unexpected data conditions.

Consider adding a check to ensure that the loop stops when one job remains:

while len(groups[task_id]) > 1:
    # Existing deletion logic

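The keep-one-per-task policy is easy to reason about in isolation. Below is a pure-Python sketch (no Django; names illustrative) where each job is a `(task_id, created_date)` tuple and the input is sorted by creation date, mirroring the queryset's `order_by('created_date')`:

```python
from collections import defaultdict

def jobs_to_delete(gt_jobs):
    """Given jobs sorted by created_date, return those to delete,
    keeping the oldest job per task."""
    groups = defaultdict(list)
    for job in gt_jobs:
        groups[job[0]].append(job)  # job[0] is the task id

    doomed = []
    for task_id in groups:
        while len(groups[task_id]) > 1:  # stops with exactly one job left
            doomed.append(groups[task_id].pop())  # pop() drops the newest first
    return doomed
```

Because the loop condition is `> 1`, the list can never be emptied, and a task with zero or one ground truth job is left untouched.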

Comment on lines 248 to 251

        migrations.RunPython(
            cleanup_invalid_data,
            reverse_code=migrations.RunPython.noop,
        ),

⚠️ Potential issue

Provide a meaningful reverse migration for cleanup_invalid_data

Currently, reverse_code is set to migrations.RunPython.noop, which means there is no operation to reverse the data changes made by cleanup_invalid_data. This could be problematic if a rollback is necessary.

Consider implementing a reverse function that can restore the deleted ground truth jobs if possible or document clearly why a reverse migration is not feasible due to data constraints.
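If restoring the deleted jobs is not feasible, one hedged option (illustrative, not the PR's code) is to replace the silent noop with a reverse callback that fails with an explanation. The signature mirrors Django's `RunPython` callbacks:

```python
def reverse_cleanup_invalid_data(apps, schema_editor):
    """Explicitly refuse to reverse, documenting why, instead of a silent noop."""
    raise RuntimeError(
        "cleanup_invalid_data deleted redundant ground truth jobs; the deleted "
        "rows are not recoverable, so this step cannot be reversed. Restore from "
        "a database backup taken before this migration if a rollback is needed."
    )
```

The operation would then be wired as `migrations.RunPython(cleanup_invalid_data, reverse_code=reverse_cleanup_invalid_data)`, making the rollback behavior explicit to whoever attempts it.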

Comment on lines 44 to 66

    def cleanup_invalid_data(apps):
        Task = apps.get_model("engine", "Task")
        Job = apps.get_model("engine", "Job")

        broken_tasks = Task.objects.annotate(
            ground_truth_jobs_count=Count(
                'segment__job', filter=Q(segment__job__type='ground_truth')
            )
        ).filter(ground_truth_jobs_count__gt=1).values_list('segment__task__id', flat=True)
        gt_jobs = Job.objects.filter(
            segment__task__id__in=broken_tasks
        ).filter(type='ground_truth').order_by('created_date').all()

        groups = defaultdict(list)
        for gt_job in gt_jobs:
            groups[gt_job.segment.task.id].append(gt_job)

        for task_id in groups:
            while len(groups[task_id]) > 1:
                gt_job = groups[task_id].pop()

                assert gt_job.type == 'ground_truth'
                gt_job.delete()

🛠️ Refactor suggestion

Optimize database queries for better performance

The cleanup_invalid_data function retrieves and processes data in a way that may not be optimal for large datasets.

  • Use iterator() with a suitable chunk_size when dealing with large querysets to avoid loading all objects into memory at once.
  • Consider combining queries or using subqueries to minimize database hits.

For example:

gt_jobs = (
    Job.objects.filter(
        segment__task__id__in=broken_tasks,
        type='ground_truth'
    )
    .order_by('created_date')
    .iterator(chunk_size=1000)
)

⚠️ Potential issue

Replace assert with explicit exception handling for reliability

In the cleanup_invalid_data function, using assert statements can be bypassed if Python is run with optimizations (-O flag), as assertions are removed. It's safer to use explicit exception handling to ensure that the check is always performed.

Apply this diff to replace the assert statement with a conditional raise:

             while len(groups[task_id]) > 1:
                 gt_job = groups[task_id].pop()

-                assert gt_job.type == 'ground_truth'
+                if gt_job.type != 'ground_truth':
+                    raise ValueError(f"Expected gt_job.type to be 'ground_truth', got {gt_job.type}")

                 gt_job.delete()

@bsekachev bsekachev changed the title [WIP] Updated migration Updated migration Oct 15, 2024
@bsekachev (Member, Author) commented:

@SpecLad applied proposed changes

@codecov-commenter commented:

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 74.23%. Comparing base (ac01fff) to head (9e4a03e).
Report is 1 commit behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #8543      +/-   ##
===========================================
- Coverage    74.30%   74.23%   -0.07%     
===========================================
  Files          400      400              
  Lines        43218    43218              
  Branches      3909     3909              
===========================================
- Hits         32114    32085      -29     
- Misses       11104    11133      +29     
Components Coverage Δ
cvat-ui 78.66% <ø> (-0.07%) ⬇️
cvat-server 70.47% <ø> (-0.08%) ⬇️

@bsekachev bsekachev merged commit c557f70 into develop Oct 16, 2024
34 of 36 checks passed
@bsekachev bsekachev deleted the bs/updated_migration branch October 24, 2024 05:15