-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: update in_clinvar flag in smallvariants (#2222) #2224
base: main
Are you sure you want to change the base?
Conversation
WalkthroughA new Django management command file "in_clinvar.py" has been introduced. This command updates the "in_clinvar" field in the SmallVariant model by comparing variant coordinates with entries from the Clinvar database. It supports two update modes – a full update for a specific case and a light update at the project level. The command parses arguments (project UUID and update mode flag) and aggregates update results accordingly. Changes
Sequence Diagram(s)sequenceDiagram
participant User as CLI User
participant CMD as Command.handle()
participant RUN as run()
participant FULL as update_in_clinvar_full()
participant LIGHT as update_in_clinvar_light()
User->>CMD: Invoke command with options
CMD->>RUN: Call run(*args, **options)
alt Full update option provided
RUN->>FULL: Call update_in_clinvar_full(case)
FULL-->>RUN: Return detailed update summary
else Light update
RUN->>LIGHT: Call update_in_clinvar_light(project)
LIGHT-->>RUN: Return light update count
end
RUN-->>CMD: Aggregate update summaries
CMD-->>User: Output final summary and logs
Poem
✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
deps-report 🔍Commit scanned: a8a38cc Vulnerable dependencies3 dependencies have vulnerabilities 😱
Outdated dependencies83 outdated dependencies found (including 33 outdated major versions)😢
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (3)
backend/clinvar/management/commands/in_clinvar.py (3)
11-33
: Consider using bulk operations for better performance.The current implementation updates and saves each variant individually, which could be inefficient for large datasets. Consider using Django's bulk update operations.
Here's an example implementation using bulk updates:
def update_in_clinvar_full(case): """Update the in_clinvar field in SmallVariant with information from Clinvar database.""" variants = SmallVariant.objects.filter(case_id=case.id) result = { "true": 0, "false": 0, "total": 0, } + to_update = [] for var in variants: coords = { "release": var.release, "chromosome": var.chromosome, "start": var.start, "end": var.end, "reference": var.reference, "alternative": var.alternative, } var.in_clinvar = Clinvar.objects.filter(**coords).exists() - var.save() + to_update.append(var) result["true" if var.in_clinvar else "false"] += 1 result["total"] += 1 + + # Perform bulk update in batches of 1000 + batch_size = 1000 + for i in range(0, len(to_update), batch_size): + batch = to_update[i:i + batch_size] + SmallVariant.objects.bulk_update(batch, ['in_clinvar']) return result
97-97
: Remove commented out code.The commented out argument
--all
should be removed if it's not being used. If it's planned for future implementation, consider tracking it in an issue instead.
99-106
: Fix f-string usage and enhance logging.The f-strings on lines 103 and 106 don't contain any placeholders.
Apply this diff to fix the issues and enhance the logging:
def handle(self, *args, **options): self.stdout.write("Call options:") self.stdout.write(f"- {options['project_uuid']=}") self.stdout.write(f"- {options['full']=}") - self.stdout.write(f"Performing ...") + self.stdout.write("Performing update...") updates = run(*args, **options) self.stdout.write(f"{updates=}") - self.stdout.write(f"Done.") + self.stdout.write("Done.")🧰 Tools
🪛 Ruff (0.8.2)
103-103: f-string without any placeholders
Remove extraneous
f
prefix(F541)
106-106: f-string without any placeholders
Remove extraneous
f
prefix(F541)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
backend/clinvar/management/commands/in_clinvar.py
(1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
backend/clinvar/management/commands/in_clinvar.py
103-103: f-string without any placeholders
Remove extraneous f
prefix
(F541)
106-106: f-string without any placeholders
Remove extraneous f
prefix
(F541)
⏰ Context from checks skipped due to timeout of 90000ms (5)
- GitHub Check: Python-Test
- GitHub Check: Python-Lint
- GitHub Check: Analyze (python)
- GitHub Check: Analyze (javascript)
- GitHub Check: build-and-push-image
🔇 Additional comments (1)
backend/clinvar/management/commands/in_clinvar.py (1)
56-77
: LGTM! Good use of transaction management.The function is well-structured with proper transaction handling using
@transaction.atomic
and clear result aggregation.
def update_in_clinvar_light(project, printer=lambda x: None): | ||
result = 0 | ||
for cv in Clinvar.objects.all(): | ||
coords = { | ||
"release": cv.release, | ||
"chromosome": cv.chromosome, | ||
"start": cv.start, | ||
"end": cv.end, | ||
"reference": cv.reference, | ||
"alternative": cv.alternative, | ||
} | ||
for case in project.case_set.all(): | ||
for var in SmallVariant.objects.filter(case_id=case.id, **coords): | ||
if var.in_clinvar: | ||
continue | ||
var.in_clinvar = True | ||
var.save() | ||
result += 1 | ||
return result | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Optimize database queries and remove unused parameter.
The function has performance and code quality issues:
- Triple nested loops with database queries could be inefficient.
- The
printer
parameter is unused.
Consider this optimized implementation:
-def update_in_clinvar_light(project, printer=lambda x: None):
+def update_in_clinvar_light(project):
result = 0
+ batch_size = 1000
+ to_update = []
for cv in Clinvar.objects.all():
coords = {
"release": cv.release,
"chromosome": cv.chromosome,
"start": cv.start,
"end": cv.end,
"reference": cv.reference,
"alternative": cv.alternative,
}
- for case in project.case_set.all():
- for var in SmallVariant.objects.filter(case_id=case.id, **coords):
- if var.in_clinvar:
- continue
- var.in_clinvar = True
- var.save()
- result += 1
+ # Get all cases in one query
+ case_ids = project.case_set.values_list('id', flat=True)
+ # Update variants in bulk
+ variants = SmallVariant.objects.filter(
+ case_id__in=case_ids,
+ in_clinvar=False,
+ **coords
+ )
+ to_update.extend(variants)
+ result += variants.count()
+
+ # Perform bulk update when batch size is reached
+ if len(to_update) >= batch_size:
+ SmallVariant.objects.bulk_update(to_update, ['in_clinvar'])
+ to_update = []
+
+ # Update remaining variants
+ if to_update:
+ SmallVariant.objects.bulk_update(to_update, ['in_clinvar'])
return result
Committable suggestion skipped: line range outside the PR's diff.
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #2224 +/- ##
=====================================
Coverage 91% 91%
=====================================
Files 678 678
Lines 38537 38537
=====================================
Hits 35140 35140
Misses 3397 3397 |
Summary by CodeRabbit