Updated migration #8543
Conversation
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (1)
cvat/apps/engine/migrations/0084_honeypot_support.py (1)
Line range hint 118-146: Handle exceptions in the reverse migration `revert_m2m_for_related_files`

The reverse migration raises an exception if any `RelatedFile` has more than one associated `Image`. This could prevent rolling back the migration in certain cases.

- Provide a clear message and guidance on how to resolve the issue if the exception is raised.
- Consider whether it's feasible to programmatically resolve or merge multiple images into a single `primary_image`, or to adjust the data model accordingly.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
- cvat/apps/engine/migrations/0084_honeypot_support.py (3 hunks)
🧰 Additional context used
🔇 Additional comments (6)
cvat/apps/engine/migrations/0084_honeypot_support.py (6)
248-251: Ensure data integrity during the migration

When performing data cleanup in migrations, it's crucial to handle exceptions and ensure that the database remains in a consistent state if an error occurs.

Wrap the data manipulation code in a transaction to ensure atomicity:

```python
from django.db import transaction

@transaction.atomic
def cleanup_invalid_data(apps):
    # Existing code
    ...
```
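Note that on databases with transactional DDL, such as PostgreSQL, Django already runs each migration inside a single transaction unless the migration sets `atomic = False`, so the decorator mainly makes the intent explicit and protects backends without that guarantee.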
Line range hint 98-112: Check ManyToMany initialization for related files and images

The function `init_m2m_for_related_files` populates the intermediate table for the ManyToMany relationship between `RelatedFile` and `Image`. Ensure that:

- The bulk creation handles all existing `RelatedFile` instances with a non-null `primary_image`.
- Data integrity is maintained, and there are no duplicate entries.

Consider adding logging or progress indicators if the dataset is large to monitor the migration progress.
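A rough sketch of how such an initialization could avoid duplicates and excessive memory use; the through-model access and its column names (`relatedfile_id`, `image_id`) are assumptions based on Django's defaults, not code from the PR:

```python
def init_m2m_for_related_files(apps, schema_editor):
    RelatedFile = apps.get_model("engine", "RelatedFile")
    # Implicit through model of the new `images` ManyToMany field
    Through = RelatedFile.images.through

    links = [
        Through(relatedfile_id=rf_id, image_id=image_id)
        for rf_id, image_id in RelatedFile.objects
            .exclude(primary_image=None)
            .values_list("id", "primary_image_id")
            .iterator(chunk_size=1000)
    ]
    # ignore_conflicts guards against duplicate rows if the migration is re-applied
    Through.objects.bulk_create(links, batch_size=1000, ignore_conflicts=True)
```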
Line range hint 188-241: Review field choices and default values in new models

In the `ValidationParams` and `ValidationLayout` models:

- Ensure that the `choices` for the `mode` and `frame_selection_method` fields accurately reflect all valid options.
- Verify that fields like `random_seed`, `frame_count`, and `frame_share` handle `null` values appropriately.
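For illustration only, a hypothetical shape of such fields; the actual choice values and nullability in the migration may differ:

```python
from django.db import models

class ValidationParams(models.Model):
    # Hypothetical choices; the real migration defines its own set
    frame_selection_method = models.CharField(
        max_length=32,
        choices=[("random_uniform", "RANDOM_UNIFORM"), ("manual", "MANUAL")],
    )
    # Optional parameters: null=True lets rows omit values that do not apply
    random_seed = models.IntegerField(null=True, blank=True)
    frame_count = models.IntegerField(null=True, blank=True)
    frame_share = models.FloatField(null=True, blank=True)
```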
Line range hint 68-93: Verify correct initialization of validation layouts

The `init_validation_layout_in_tasks_with_gt_job` function initializes `ValidationLayout` instances. Ensure that:

- The `frames` field is correctly calculated using `get_segment_rel_frame_set`.
- All possible `db_segment.type` values are handled appropriately in `get_segment_rel_frame_set`.

To confirm, you can run:

```bash
#!/bin/bash
# Description: Verify all segment types are accounted for in get_segment_rel_frame_set.
# Expect: No unhandled segment types.

ast-grep --lang python --pattern '
def get_segment_rel_frame_set($_) -> $_:
  $_
  else:
    raise ValueError($_)
' 0084_honeypot_support.py
```
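As a hedged illustration of the pattern the script looks for (the segment type names and frame fields below are assumptions, not the PR's actual implementation):

```python
def get_segment_rel_frame_set(db_segment) -> set:
    # Hypothetical segment types; exhaustive handling with a fallback error
    if db_segment.type == "range":
        frame_set = range(db_segment.start_frame, db_segment.stop_frame + 1)
    elif db_segment.type == "specific_frames":
        frame_set = db_segment.frames
    else:
        raise ValueError(f"Unknown segment type {db_segment.type!r}")

    return set(frame_set)
```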
Line range hint 252-261: Update references due to changes in `RelatedFile` model fields

The `primary_image` field is removed, and an `images` ManyToMany field with `related_name="related_files"` is added to the `RelatedFile` model. Ensure that all code referencing `primary_image` is updated to use the new relationship.

Run this script to identify potential code that needs updating:

```bash
#!/bin/bash
# Description: Find all references to 'primary_image' in the codebase.
# Expect: All references should be reviewed and updated.

rg --type py 'primary_image' cvat/apps/
```
61-64: Ensure at least one ground truth job remains per task

The loop removes ground truth jobs until only one remains. However, without safeguards, there's a risk of accidentally removing all ground truth jobs for a task if `groups[task_id]` becomes empty due to unexpected data conditions.

Consider adding a check to ensure that the loop stops when one job remains:

```python
while len(groups[task_id]) > 1:
    # Existing deletion logic
    ...
```

To verify that the function behaves correctly, run a post-migration check such as the one sketched below.
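This check is a hypothetical reconstruction, not the review's original script; the query mirrors `cleanup_invalid_data` above, and the model import path is an assumption. Run it in a Django shell after migrating:

```python
from django.db.models import Count, Q

from cvat.apps.engine.models import Task  # assumed import path

extra = Task.objects.annotate(
    gt_count=Count("segment__job", filter=Q(segment__job__type="ground_truth"))
).filter(gt_count__gt=1)

assert not extra.exists(), (
    f"Tasks still having multiple ground truth jobs: "
    f"{list(extra.values_list('id', flat=True))}"
)
```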
```python
        migrations.RunPython(
            cleanup_invalid_data,
            reverse_code=migrations.RunPython.noop,
        ),
```
Provide a meaningful reverse migration for `cleanup_invalid_data`

Currently, `reverse_code` is set to `migrations.RunPython.noop`, which means there is no operation to reverse the data changes made by `cleanup_invalid_data`. This could be problematic if a rollback is necessary.

Consider implementing a reverse function that can restore the deleted ground truth jobs if possible, or document clearly why a reverse migration is not feasible due to data constraints.
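If restoring the deleted jobs is not practical, one option (an assumption about a possible approach, not what the PR does) is to make the rollback fail loudly with an explanation instead of silently succeeding:

```python
from django.db import migrations
from django.db.migrations.exceptions import IrreversibleError

def revert_cleanup_invalid_data(apps, schema_editor):
    # Deleted duplicate ground truth jobs cannot be reconstructed from the
    # remaining data, so refuse to roll back rather than pretend it worked.
    raise IrreversibleError(
        "cleanup_invalid_data removed duplicate ground truth jobs; "
        "restore them from a database backup if a rollback is required."
    )

# in operations:
# migrations.RunPython(cleanup_invalid_data, reverse_code=revert_cleanup_invalid_data),
```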
```python
def cleanup_invalid_data(apps):
    Task = apps.get_model("engine", "Task")
    Job = apps.get_model("engine", "Job")

    broken_tasks = Task.objects.annotate(
        ground_truth_jobs_count=Count(
            'segment__job', filter=Q(segment__job__type='ground_truth')
        )
    ).filter(ground_truth_jobs_count__gt=1).values_list('segment__task__id', flat=True)
    gt_jobs = Job.objects.filter(
        segment__task__id__in=broken_tasks
    ).filter(type='ground_truth').order_by('created_date').all()

    groups = defaultdict(list)
    for gt_job in gt_jobs:
        groups[gt_job.segment.task.id].append(gt_job)

    for task_id in groups:
        while len(groups[task_id]) > 1:
            gt_job = groups[task_id].pop()

            assert gt_job.type == 'ground_truth'
            gt_job.delete()
```
🛠️ Refactor suggestion
Optimize database queries for better performance
The `cleanup_invalid_data` function retrieves and processes data in a way that may not be optimal for large datasets.

- Use `iterator()` with a suitable `chunk_size` when dealing with large querysets to avoid loading all objects into memory at once.
- Consider combining queries or using subqueries to minimize database hits.
For example:
```python
gt_jobs = (
    Job.objects.filter(
        segment__task__id__in=broken_tasks,
        type='ground_truth'
    )
    .order_by('created_date')
    .iterator(chunk_size=1000)
)
```
Replace `assert` with explicit exception handling for reliability

In the `cleanup_invalid_data` function, `assert` statements can be bypassed if Python is run with optimizations (the `-O` flag), as assertions are removed. It's safer to use explicit exception handling to ensure that the check is always performed.
Apply this diff to replace the `assert` statement with a conditional raise:

```diff
         while len(groups[task_id]) > 1:
             gt_job = groups[task_id].pop()
-            assert gt_job.type == 'ground_truth'
+            if gt_job.type != 'ground_truth':
+                raise ValueError(f"Expected gt_job.type to be 'ground_truth', got {gt_job.type}")
             gt_job.delete()
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
def cleanup_invalid_data(apps):
    Task = apps.get_model("engine", "Task")
    Job = apps.get_model("engine", "Job")

    broken_tasks = Task.objects.annotate(
        ground_truth_jobs_count=Count(
            'segment__job', filter=Q(segment__job__type='ground_truth')
        )
    ).filter(ground_truth_jobs_count__gt=1).values_list('segment__task__id', flat=True)
    gt_jobs = Job.objects.filter(
        segment__task__id__in=broken_tasks
    ).filter(type='ground_truth').order_by('created_date').all()

    groups = defaultdict(list)
    for gt_job in gt_jobs:
        groups[gt_job.segment.task.id].append(gt_job)

    for task_id in groups:
        while len(groups[task_id]) > 1:
            gt_job = groups[task_id].pop()

            if gt_job.type != 'ground_truth':
                raise ValueError(f"Expected gt_job.type to be 'ground_truth', got {gt_job.type}")
            gt_job.delete()
```
@SpecLad applied proposed changes
Codecov Report

All modified and coverable lines are covered by tests ✅

Additional details and impacted files

```
@@            Coverage Diff            @@
##           develop    #8543    +/-   ##
===========================================
- Coverage    74.30%   74.23%   -0.07%
===========================================
  Files          400      400
  Lines        43218    43218
  Branches      3909     3909
===========================================
- Hits         32114    32085      -29
- Misses       11104    11133      +29
```
Motivation and context
How has this been tested?
Checklist

- I submit my changes into the develop branch
- I have increased versions of npm packages if it is necessary (cvat-canvas, cvat-core, cvat-data and cvat-ui)
License

- I submit my code changes under the same MIT License that covers the project. Feel free to contact the maintainers if that's a concern.
Summary by CodeRabbit