Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor of VariantLookup and related code #115

Open
wants to merge 2 commits into
base: dev
Choose a base branch
from

Conversation

tfenne
Copy link
Member

@tfenne tfenne commented Jan 17, 2025

Tests will continue to fail until bumped to use a version of fgpyo that includes fulcrumgenomics/fgpyo#215

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 11

🔭 Outside diff range comments (1)
prymer/api/coordmath.py (1)

Line range hint 6-25: Add input validation.

Length should be non-negative.

 def get_closed_end(start: int, length: int) -> int:
+    if length < 0:
+        raise ValueError("Length must be non-negative")
     return start + length - 1
🧹 Nitpick comments (13)
tests/primer3/test_primer3_input_tag.py (1)

4-11: Test looks good but add docstring.

Test validates new properties correctly. Add docstring to explain test purpose.

 def test_primer3_input_tag() -> None:
+    """Verify Primer3InputTag properties for sequence and global arguments."""
     for tag in Primer3InputTag:
prymer/primer3/primer3_input_tag.py (1)

32-33: Make developer note more visible.

Important ordering requirement. Move to class docstring.

-    # Developer note: sequence-specific tags must be specified prior to global tags
+    """
+    Enumeration of Primer3 input tags.
+    
+    Please see the Primer3 manual for additional details:
+     https://primer3.org/manual.html#commandLineTags
+    
+    This class represents two categories of Primer3 input tags:
+     * `SEQUENCE_` tags are those that control sequence-specific attributes of Primer3 jobs.
+       These can be modified after each query submitted to Primer3.
+       Note: These tags must be specified prior to global tags.
+     * `PRIMER_` tags are those that describe more general parameters of Primer3 jobs.
+       These attributes persist across queries to Primer3 unless they are explicitly reset.
+       Errors in these "global" input tags are fatal.
+    """
prymer/thermo.py (1)

51-56: Clarify constant names to distinguish Tm methods and salt correction methods.

The constants TM_METHOD_SANTALUCIA and SALT_CORRECTION_METHOD_SANTALUCIA both have the value "santalucia". To avoid confusion, consider making the constant names more explicit or namespacing them to clearly differentiate between Tm methods and salt correction methods.

prymer/model.py (1)

95-335: Implement rich comparison methods for Span.

Adding __lt__, __eq__, and other comparison methods to the Span class would allow instances to be compared using standard operators, enhancing readability and usability.

prymer/__init__.py (1)

1-7: Group related imports.

Organize imports by module.

-from prymer.model import MinOptMax
-from prymer.model import Oligo
-from prymer.model import PrimerPair
-from prymer.model import Span
-from prymer.model import Strand
+from prymer.model import (
+    MinOptMax,
+    Oligo,
+    PrimerPair,
+    Span,
+    Strand,
+)
 from prymer.thermo import Thermo
 from prymer.variant import VariantLookup
tests/conftest.py (1)

31-38: Improve seq_dict fixture.

Add docstring and parameterize values.

 @pytest.fixture
 def seq_dict() -> SequenceDictionary:
+    """Provides a SequenceDictionary with three chromosomes for testing."""
+    chr_length = 1_000_000
     metadatas: list[SequenceMetadata] = [
-        SequenceMetadata(name="chr1", length=1000000, index=0),
-        SequenceMetadata(name="chr2", length=1000000, index=1),
-        SequenceMetadata(name="chr3", length=1000000, index=2),
+        SequenceMetadata(name=f"chr{i}", length=chr_length, index=i)
+        for i in range(1, 4)
     ]
     return SequenceDictionary(metadatas)
tests/test_thermo.py (3)

33-36: Parametrize hairpin tests.

Add multiple known hairpin structures with their expected Tm ranges.

-def test_hairpin_tm() -> None:
-    thermo = Thermo()
-    tm = thermo.hairpin_tm("CCCTTTTTAAAGGG")
-    assert 50 < tm < 70
+@pytest.mark.parametrize("sequence,min_tm,max_tm", [
+    ("CCCTTTTTAAAGGG", 50, 70),
+    ("GCGCTTTTGCGC", 45, 65),
+    ("ATCGATTTCGAT", 40, 60),
+])
+def test_hairpin_tm(sequence: str, min_tm: float, max_tm: float) -> None:
+    thermo = Thermo()
+    tm = thermo.hairpin_tm(sequence)
+    assert min_tm < tm < max_tm

39-45: Add docstring.

Document test purpose and expectations.

 def test_homodimer_tm() -> None:
+    """Test homodimer Tm calculation.
+    
+    Tests:
+    1. Sequence that cannot form homodimer (poly-A)
+    2. Sequence that forms perfect homodimer (A10T10)
+    """
     thermo = Thermo()

48-73: Combine heterodimer tests.

Merge related heterodimer tests for better organization.

-def test_heterodimer_tm() -> None:
-    thermo = Thermo()
-    tm1 = thermo.heterodimer_tm("AAAAAAAAAA", "CCCCCCCCCC")
-    tm2 = thermo.heterodimer_tm("AAAAAAAAAA", "TTTTTTTTTT")
-
-    assert tm1 == 0
-    assert tm2 > 0
-
-def test_heterodimer_3p_anchored_tm() -> None:
-    thermo = Thermo()
-    tm1 = thermo.heterodimer_3p_anchored_tm("AAAAAAAAAA", "CCCCCCCCCC")
-    tm2 = thermo.heterodimer_3p_anchored_tm("AAAAAAAAAA", "TTTTTTTTTT")
-
-    assert tm1 == 0
-    assert tm2 > 0
+@pytest.mark.parametrize("method,seq1,seq2,expected_tm", [
+    ("heterodimer_tm", "AAAAAAAAAA", "CCCCCCCCCC", 0),
+    ("heterodimer_tm", "AAAAAAAAAA", "TTTTTTTTTT", 1),
+    ("heterodimer_3p_anchored_tm", "AAAAAAAAAA", "CCCCCCCCCC", 0),
+    ("heterodimer_3p_anchored_tm", "AAAAAAAAAA", "TTTTTTTTTT", 1),
+])
+def test_heterodimer_calculations(method: str, seq1: str, seq2: str, expected_tm: int) -> None:
+    """Test both regular and 3' anchored heterodimer Tm calculations."""
+    thermo = Thermo()
+    tm = getattr(thermo, method)(seq1, seq2)
+    if expected_tm == 0:
+        assert tm == 0
+    else:
+        assert tm > 0
tests/model/test_span.py (1)

Line range hint 8-18: Add edge cases for invalid spans.

Test missing edge cases:

  • Empty string refname
  • Negative coordinates
  • Maximum coordinate values
     [
         ("chr1", 0, 10, "+"),  # start < 1
         ("chr1", 30, 20, "-"),  # start > end
         (" ", 0, 10, "+"),  # empty refname
+        ("", 1, 10, "+"),  # empty string refname
+        ("chr1", -1, 10, "+"),  # negative start
+        ("chr1", 1, -10, "+"),  # negative end
+        ("chr1", 2**31-1, 2**31, "+"),  # max int coords
     ],
tests/test_variant.py (1)

156-161: Use constants for test data.

Move test sequences to constants for better maintainability.

+    # Constants for test sequences
+    TEST_SEQUENCE = "ACGCTCGATCACCGGACTGAATGACTGACTAACTGATGCTGCGTGTCTGATGATGCTGCT"
+    EXPECTED_SOFT_MASKED = "aCGCTCGATCAccgGACTGAatGACTGACTaaCTGATGCTGCGTGTCTGATGATGCTGCT"
+    EXPECTED_HARD_MASKED = "NCGCTCGATCANNNGACTGANNGACTGACTNNCTGATGCTGCGTGTCTGATGATGCTGCT"
-    bases_fresh = "ACGCTCGATCACCGGACTGAATGACTGACTAACTGATGCTGCGTGTCTGATGATGCTGCT"
-    soft_masked = "aCGCTCGATCAccgGACTGAatGACTGACTaaCTGATGCTGCGTGTCTGATGATGCTGCT"
-    hard_masked = "NCGCTCGATCANNNGACTGANNGACTGACTNNCTGATGCTGCGTGTCTGATGATGCTGCT"
+    bases_fresh = TEST_SEQUENCE
+    soft_masked = EXPECTED_SOFT_MASKED
+    hard_masked = EXPECTED_HARD_MASKED
prymer/api/picking.py (1)

212-214: Consider caching the heterodimer calculation.

The heterodimer calculation is potentially expensive and called in a loop.

+    # Cache heterodimer calculations
+    if max_heterodimer_tm is not None:
+        heterodimer_cache = {}
+
     for i, j, penalty, tm in pairings:
         lp = left_primers[i]
         rp = right_primers[j]
 
         if max_heterodimer_tm is not None:
-            if thermo.heterodimer_tm(lp.bases, rp.bases) > max_heterodimer_tm:
+            pair_key = (lp.bases, rp.bases)
+            if pair_key not in heterodimer_cache:
+                heterodimer_cache[pair_key] = thermo.heterodimer_tm(lp.bases, rp.bases)
+            if heterodimer_cache[pair_key] > max_heterodimer_tm:
                 continue
.github/workflows/tests.yml (1)

60-63: Track documentation build re-enablement.

Temporarily disabled documentation build needs tracking.

Would you like me to create an issue to track re-enabling the documentation build after refactoring?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aa21269 and 34c05b2.

⛔ Files ignored due to path filters (2)
  • poetry.lock is excluded by !**/*.lock
  • tests/primer3/data/miniref.variants.vcf.gz is excluded by !**/*.gz
📒 Files selected for processing (48)
  • .github/workflows/tests.yml (2 hunks)
  • .github/workflows/wheels.yml (1 hunks)
  • prymer/__init__.py (1 hunks)
  • prymer/api/__init__.py (0 hunks)
  • prymer/api/clustering.py (0 hunks)
  • prymer/api/coordmath.py (2 hunks)
  • prymer/api/melting.py (0 hunks)
  • prymer/api/minoptmax.py (0 hunks)
  • prymer/api/oligo.py (0 hunks)
  • prymer/api/oligo_like.py (0 hunks)
  • prymer/api/picking.py (7 hunks)
  • prymer/api/primer.py (0 hunks)
  • prymer/api/primer_pair.py (0 hunks)
  • prymer/api/span.py (0 hunks)
  • prymer/api/variant_lookup.py (0 hunks)
  • prymer/model.py (1 hunks)
  • prymer/ntthal/__init__.py (0 hunks)
  • prymer/offtarget/offtarget_detector.py (2 hunks)
  • prymer/primer3/primer3.py (17 hunks)
  • prymer/primer3/primer3_input.py (3 hunks)
  • prymer/primer3/primer3_input_tag.py (1 hunks)
  • prymer/primer3/primer3_parameters.py (1 hunks)
  • prymer/primer3/primer3_task.py (2 hunks)
  • prymer/primer3/primer3_weights.py (1 hunks)
  • prymer/thermo.py (1 hunks)
  • prymer/variant.py (1 hunks)
  • pyproject.toml (4 hunks)
  • tests/api/conftest.py (0 hunks)
  • tests/api/test_clustering.py (0 hunks)
  • tests/api/test_coordmath.py (1 hunks)
  • tests/api/test_melting.py (0 hunks)
  • tests/api/test_oligo_like.py (0 hunks)
  • tests/api/test_picking.py (4 hunks)
  • tests/api/test_variant_lookup.py (0 hunks)
  • tests/conftest.py (2 hunks)
  • tests/model/test_minoptmax.py (1 hunks)
  • tests/model/test_oligo.py (9 hunks)
  • tests/model/test_primer_pair.py (2 hunks)
  • tests/model/test_span.py (1 hunks)
  • tests/ntthal/test_ntthal.py (0 hunks)
  • tests/offtarget/test_offtarget.py (1 hunks)
  • tests/primer3/test_primer3.py (11 hunks)
  • tests/primer3/test_primer3_input.py (1 hunks)
  • tests/primer3/test_primer3_input_tag.py (1 hunks)
  • tests/primer3/test_primer3_parameters.py (1 hunks)
  • tests/primer3/test_primer3_task.py (1 hunks)
  • tests/test_thermo.py (1 hunks)
  • tests/test_variant.py (1 hunks)
💤 Files with no reviewable changes (17)
  • prymer/api/primer.py
  • tests/api/conftest.py
  • tests/api/test_melting.py
  • prymer/api/oligo.py
  • prymer/api/melting.py
  • tests/ntthal/test_ntthal.py
  • prymer/api/minoptmax.py
  • prymer/api/init.py
  • tests/api/test_variant_lookup.py
  • prymer/ntthal/init.py
  • prymer/api/variant_lookup.py
  • prymer/api/clustering.py
  • tests/api/test_clustering.py
  • prymer/api/primer_pair.py
  • tests/api/test_oligo_like.py
  • prymer/api/oligo_like.py
  • prymer/api/span.py
✅ Files skipped from review due to trivial changes (9)
  • tests/model/test_minoptmax.py
  • prymer/primer3/primer3_parameters.py
  • prymer/primer3/primer3_task.py
  • prymer/primer3/primer3_weights.py
  • tests/primer3/test_primer3_task.py
  • tests/offtarget/test_offtarget.py
  • prymer/offtarget/offtarget_detector.py
  • tests/primer3/test_primer3_input.py
  • tests/primer3/test_primer3_parameters.py
🧰 Additional context used
🪛 GitHub Actions: tests
tests/test_variant.py

[error] 78-78: Test failure in test_simple_variant_build_with_ac_and_an: TypeError: values expected to be 1048575-tuple, given len=1

🔇 Additional comments (22)
prymer/primer3/primer3_input_tag.py (1)

34-42: Properties look good.

Clean implementation of mutually exclusive properties.

prymer/primer3/primer3.py (3)

123-123: Verify Python version compatibility for typing.Self

The Self type hint is available in Python 3.11 and later. Ensure the project requires Python 3.11+ or adjust the code for compatibility.


237-237: Verify use of Self in type hints

The return type Self in __enter__ may not be supported in Python versions before 3.11. Confirm compatibility with the project's Python version.


358-367: Refactored argument separation improves code clarity

Splitting assembled_primer3_tags into seq_args and global_args enhances readability and organization.

prymer/primer3/primer3_input.py (1)

105-105: LGTM!

Import path update aligns with module restructuring.

prymer/api/picking.py (3)

28-32: LGTM! Import restructuring looks good.

Imports consolidated from prymer.api to prymer.model for better organization.


104-104: LGTM! Well-documented parameter addition.

The thermo parameter is properly optional with good documentation.

Also applies to: 119-119, 122-123


149-151: LGTM! Good default handling.

Proper fallback to default Thermo instance when none provided.

tests/model/test_oligo.py (3)

9-11: LGTM! Simplified imports.

Direct imports from prymer improve clarity.


26-26: LGTM! Better test function names.

Renamed for clarity: test_valid_primer_construction and test_invalid_primer_construction_raises.

Also applies to: 43-43


Line range hint 135-135: LGTM! Clearer test data.

Using string multiplication ("A" * 40) improves readability.

tests/model/test_primer_pair.py (1)

9-12: LGTM! Simplified imports.

Direct imports from prymer improve clarity.

tests/api/test_picking.py (2)

10-14: LGTM! Simplified imports.

Direct imports from prymer improve clarity.


100-100: LGTM! Updated temperature calculations.

Properly using Thermo class with adjusted expected values.

Also applies to: 341-342, 344-344, 353-353

tests/primer3/test_primer3.py (4)

9-13: LGTM: Import path changes align with module consolidation.

Also applies to: 25-25


162-162: LGTM: More descriptive error message.


190-191: LGTM: Consistent property access pattern.

Also applies to: 237-238, 448-453, 469-471


411-412: LGTM: Case-insensitive string comparison improves test robustness.

.github/workflows/wheels.yml (1)

14-15: Verify Python version compatibility strategy.

Dropping Python 3.11 support is a breaking change. Ensure this aligns with the version bump to 3.0.0 in pyproject.toml.

.github/workflows/tests.yml (1)

19-20: LGTM: Consistent with wheels.yml Python version changes.

pyproject.toml (2)

3-3: LGTM: Version bump to 3.0.0 follows semver for breaking changes.


42-42: Verify primer3-py compatibility.

New dependency on primer3-py ^2.0.3 added.

Run this to check for known issues:

prymer/primer3/primer3.py Outdated Show resolved Hide resolved
prymer/primer3/primer3.py Outdated Show resolved Hide resolved
@@ -21,10 +14,10 @@

```python
>>> from pathlib import Path
>>> from prymer.api.variant_lookup import VariantLookup, VariantOverlapDetector
>>> from prymer import VariantLookup
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Update import statement to reflect module restructuring

VariantLookup should be imported from prymer.variant instead of prymer.

Apply this diff to fix the import:

-from prymer import VariantLookup
+from prymer.variant import VariantLookup
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
>>> from prymer import VariantLookup
from prymer.variant import VariantLookup

prymer/primer3/primer3.py Outdated Show resolved Hide resolved
A tuple of two sequences where the first is soft-masked and the second is hard masked
"""
if len(bases) != region.length:
raise ValueError("Region and bases have different lengths: {region} vs. {bases}")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix error message formatting in mask method.

The ValueError exception message in the mask method lacks the f prefix needed for f-string formatting. Without it, the variables region and len(bases) will not be interpolated into the string.

Apply this diff to correct the error message:

-                raise ValueError("Region and bases have different lengths: {region} vs. {bases}")
+                raise ValueError(f"Region and bases have different lengths: {region} vs. {len(bases)}")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
raise ValueError("Region and bases have different lengths: {region} vs. {bases}")
raise ValueError(f"Region and bases have different lengths: {region} vs. {len(bases)}")

prymer/model.py Outdated Show resolved Hide resolved
tests/api/test_coordmath.py Outdated Show resolved Hide resolved
tests/test_thermo.py Outdated Show resolved Hide resolved
tests/test_thermo.py Outdated Show resolved Hide resolved
Comment on lines +78 to +81
rec = builder.add("chr1", pos=1000, ref="A", alts="C", info={"AN": 100, "AC": [10]})
variants = SimpleVariant.build(rec)
assert len(variants) == 1
assert variants[0].af == approx(0.1)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix AC field format causing test failure.

Pipeline error shows tuple size mismatch for AC field.

-    rec = builder.add("chr1", pos=1000, ref="A", alts="C", info={"AN": 100, "AC": [10]})
+    rec = builder.add("chr1", pos=1000, ref="A", alts="C", info={"AN": 100, "AC": 10})
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
rec = builder.add("chr1", pos=1000, ref="A", alts="C", info={"AN": 100, "AC": [10]})
variants = SimpleVariant.build(rec)
assert len(variants) == 1
assert variants[0].af == approx(0.1)
rec = builder.add("chr1", pos=1000, ref="A", alts="C", info={"AN": 100, "AC": 10})
variants = SimpleVariant.build(rec)
assert len(variants) == 1
assert variants[0].af == approx(0.1)
🧰 Tools
🪛 GitHub Actions: tests

[error] 78-78: Test failure in test_simple_variant_build_with_ac_and_an: TypeError: values expected to be 1048575-tuple, given len=1

@tfenne tfenne changed the base branch from main to dev January 17, 2025 23:10
@fulcrumgenomics fulcrumgenomics deleted a comment from coderabbitai bot Jan 17, 2025
@tfenne tfenne force-pushed the tf_variants_strike_back branch from 34c05b2 to 7d76a43 Compare January 18, 2025 13:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants