-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor of VariantLookup and related code #115
base: dev
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 11
🔭 Outside diff range comments (1)
prymer/api/coordmath.py (1)
Line range hint
6-25
: Add input validation.Length should be non-negative.
def get_closed_end(start: int, length: int) -> int: + if length < 0: + raise ValueError("Length must be non-negative") return start + length - 1
🧹 Nitpick comments (13)
tests/primer3/test_primer3_input_tag.py (1)
4-11
: Test looks good but add docstring.Test validates new properties correctly. Add docstring to explain test purpose.
def test_primer3_input_tag() -> None: + """Verify Primer3InputTag properties for sequence and global arguments.""" for tag in Primer3InputTag:
prymer/primer3/primer3_input_tag.py (1)
32-33
: Make developer note more visible.Important ordering requirement. Move to class docstring.
- # Developer note: sequence-specific tags must be specified prior to global tags + """ + Enumeration of Primer3 input tags. + + Please see the Primer3 manual for additional details: + https://primer3.org/manual.html#commandLineTags + + This class represents two categories of Primer3 input tags: + * `SEQUENCE_` tags are those that control sequence-specific attributes of Primer3 jobs. + These can be modified after each query submitted to Primer3. + Note: These tags must be specified prior to global tags. + * `PRIMER_` tags are those that describe more general parameters of Primer3 jobs. + These attributes persist across queries to Primer3 unless they are explicitly reset. + Errors in these "global" input tags are fatal. + """prymer/thermo.py (1)
51-56
: Clarify constant names to distinguish Tm methods and salt correction methods.The constants
TM_METHOD_SANTALUCIA
andSALT_CORRECTION_METHOD_SANTALUCIA
both have the value"santalucia"
. To avoid confusion, consider making the constant names more explicit or namespacing them to clearly differentiate between Tm methods and salt correction methods.prymer/model.py (1)
95-335
: Implement rich comparison methods forSpan
.Adding
__lt__
,__eq__
, and other comparison methods to theSpan
class would allow instances to be compared using standard operators, enhancing readability and usability.prymer/__init__.py (1)
1-7
: Group related imports.Organize imports by module.
-from prymer.model import MinOptMax -from prymer.model import Oligo -from prymer.model import PrimerPair -from prymer.model import Span -from prymer.model import Strand +from prymer.model import ( + MinOptMax, + Oligo, + PrimerPair, + Span, + Strand, +) from prymer.thermo import Thermo from prymer.variant import VariantLookuptests/conftest.py (1)
31-38
: Improve seq_dict fixture.Add docstring and parameterize values.
@pytest.fixture def seq_dict() -> SequenceDictionary: + """Provides a SequenceDictionary with three chromosomes for testing.""" + chr_length = 1_000_000 metadatas: list[SequenceMetadata] = [ - SequenceMetadata(name="chr1", length=1000000, index=0), - SequenceMetadata(name="chr2", length=1000000, index=1), - SequenceMetadata(name="chr3", length=1000000, index=2), + SequenceMetadata(name=f"chr{i}", length=chr_length, index=i) + for i in range(1, 4) ] return SequenceDictionary(metadatas)tests/test_thermo.py (3)
33-36
: Parametrize hairpin tests.Add multiple known hairpin structures with their expected Tm ranges.
-def test_hairpin_tm() -> None: - thermo = Thermo() - tm = thermo.hairpin_tm("CCCTTTTTAAAGGG") - assert 50 < tm < 70 +@pytest.mark.parametrize("sequence,min_tm,max_tm", [ + ("CCCTTTTTAAAGGG", 50, 70), + ("GCGCTTTTGCGC", 45, 65), + ("ATCGATTTCGAT", 40, 60), +]) +def test_hairpin_tm(sequence: str, min_tm: float, max_tm: float) -> None: + thermo = Thermo() + tm = thermo.hairpin_tm(sequence) + assert min_tm < tm < max_tm
39-45
: Add docstring.Document test purpose and expectations.
def test_homodimer_tm() -> None: + """Test homodimer Tm calculation. + + Tests: + 1. Sequence that cannot form homodimer (poly-A) + 2. Sequence that forms perfect homodimer (A10T10) + """ thermo = Thermo()
48-73
: Combine heterodimer tests.Merge related heterodimer tests for better organization.
-def test_heterodimer_tm() -> None: - thermo = Thermo() - tm1 = thermo.heterodimer_tm("AAAAAAAAAA", "CCCCCCCCCC") - tm2 = thermo.heterodimer_tm("AAAAAAAAAA", "TTTTTTTTTT") - - assert tm1 == 0 - assert tm2 > 0 - -def test_heterodimer_3p_anchored_tm() -> None: - thermo = Thermo() - tm1 = thermo.heterodimer_3p_anchored_tm("AAAAAAAAAA", "CCCCCCCCCC") - tm2 = thermo.heterodimer_3p_anchored_tm("AAAAAAAAAA", "TTTTTTTTTT") - - assert tm1 == 0 - assert tm2 > 0 +@pytest.mark.parametrize("method,seq1,seq2,expected_tm", [ + ("heterodimer_tm", "AAAAAAAAAA", "CCCCCCCCCC", 0), + ("heterodimer_tm", "AAAAAAAAAA", "TTTTTTTTTT", 1), + ("heterodimer_3p_anchored_tm", "AAAAAAAAAA", "CCCCCCCCCC", 0), + ("heterodimer_3p_anchored_tm", "AAAAAAAAAA", "TTTTTTTTTT", 1), +]) +def test_heterodimer_calculations(method: str, seq1: str, seq2: str, expected_tm: int) -> None: + """Test both regular and 3' anchored heterodimer Tm calculations.""" + thermo = Thermo() + tm = getattr(thermo, method)(seq1, seq2) + if expected_tm == 0: + assert tm == 0 + else: + assert tm > 0tests/model/test_span.py (1)
Line range hint
8-18
: Add edge cases for invalid spans.Test missing edge cases:
- Empty string refname
- Negative coordinates
- Maximum coordinate values
[ ("chr1", 0, 10, "+"), # start < 1 ("chr1", 30, 20, "-"), # start > end (" ", 0, 10, "+"), # empty refname + ("", 1, 10, "+"), # empty string refname + ("chr1", -1, 10, "+"), # negative start + ("chr1", 1, -10, "+"), # negative end + ("chr1", 2**31-1, 2**31, "+"), # max int coords ],tests/test_variant.py (1)
156-161
: Use constants for test data.Move test sequences to constants for better maintainability.
+ # Constants for test sequences + TEST_SEQUENCE = "ACGCTCGATCACCGGACTGAATGACTGACTAACTGATGCTGCGTGTCTGATGATGCTGCT" + EXPECTED_SOFT_MASKED = "aCGCTCGATCAccgGACTGAatGACTGACTaaCTGATGCTGCGTGTCTGATGATGCTGCT" + EXPECTED_HARD_MASKED = "NCGCTCGATCANNNGACTGANNGACTGACTNNCTGATGCTGCGTGTCTGATGATGCTGCT" - bases_fresh = "ACGCTCGATCACCGGACTGAATGACTGACTAACTGATGCTGCGTGTCTGATGATGCTGCT" - soft_masked = "aCGCTCGATCAccgGACTGAatGACTGACTaaCTGATGCTGCGTGTCTGATGATGCTGCT" - hard_masked = "NCGCTCGATCANNNGACTGANNGACTGACTNNCTGATGCTGCGTGTCTGATGATGCTGCT" + bases_fresh = TEST_SEQUENCE + soft_masked = EXPECTED_SOFT_MASKED + hard_masked = EXPECTED_HARD_MASKEDprymer/api/picking.py (1)
212-214
: Consider caching the heterodimer calculation.The heterodimer calculation is potentially expensive and called in a loop.
+ # Cache heterodimer calculations + if max_heterodimer_tm is not None: + heterodimer_cache = {} + for i, j, penalty, tm in pairings: lp = left_primers[i] rp = right_primers[j] if max_heterodimer_tm is not None: - if thermo.heterodimer_tm(lp.bases, rp.bases) > max_heterodimer_tm: + pair_key = (lp.bases, rp.bases) + if pair_key not in heterodimer_cache: + heterodimer_cache[pair_key] = thermo.heterodimer_tm(lp.bases, rp.bases) + if heterodimer_cache[pair_key] > max_heterodimer_tm: continue.github/workflows/tests.yml (1)
60-63
: Track documentation build re-enablement.Temporarily disabled documentation build needs tracking.
Would you like me to create an issue to track re-enabling the documentation build after refactoring?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
poetry.lock
is excluded by!**/*.lock
tests/primer3/data/miniref.variants.vcf.gz
is excluded by!**/*.gz
📒 Files selected for processing (48)
.github/workflows/tests.yml
(2 hunks).github/workflows/wheels.yml
(1 hunks)prymer/__init__.py
(1 hunks)prymer/api/__init__.py
(0 hunks)prymer/api/clustering.py
(0 hunks)prymer/api/coordmath.py
(2 hunks)prymer/api/melting.py
(0 hunks)prymer/api/minoptmax.py
(0 hunks)prymer/api/oligo.py
(0 hunks)prymer/api/oligo_like.py
(0 hunks)prymer/api/picking.py
(7 hunks)prymer/api/primer.py
(0 hunks)prymer/api/primer_pair.py
(0 hunks)prymer/api/span.py
(0 hunks)prymer/api/variant_lookup.py
(0 hunks)prymer/model.py
(1 hunks)prymer/ntthal/__init__.py
(0 hunks)prymer/offtarget/offtarget_detector.py
(2 hunks)prymer/primer3/primer3.py
(17 hunks)prymer/primer3/primer3_input.py
(3 hunks)prymer/primer3/primer3_input_tag.py
(1 hunks)prymer/primer3/primer3_parameters.py
(1 hunks)prymer/primer3/primer3_task.py
(2 hunks)prymer/primer3/primer3_weights.py
(1 hunks)prymer/thermo.py
(1 hunks)prymer/variant.py
(1 hunks)pyproject.toml
(4 hunks)tests/api/conftest.py
(0 hunks)tests/api/test_clustering.py
(0 hunks)tests/api/test_coordmath.py
(1 hunks)tests/api/test_melting.py
(0 hunks)tests/api/test_oligo_like.py
(0 hunks)tests/api/test_picking.py
(4 hunks)tests/api/test_variant_lookup.py
(0 hunks)tests/conftest.py
(2 hunks)tests/model/test_minoptmax.py
(1 hunks)tests/model/test_oligo.py
(9 hunks)tests/model/test_primer_pair.py
(2 hunks)tests/model/test_span.py
(1 hunks)tests/ntthal/test_ntthal.py
(0 hunks)tests/offtarget/test_offtarget.py
(1 hunks)tests/primer3/test_primer3.py
(11 hunks)tests/primer3/test_primer3_input.py
(1 hunks)tests/primer3/test_primer3_input_tag.py
(1 hunks)tests/primer3/test_primer3_parameters.py
(1 hunks)tests/primer3/test_primer3_task.py
(1 hunks)tests/test_thermo.py
(1 hunks)tests/test_variant.py
(1 hunks)
💤 Files with no reviewable changes (17)
- prymer/api/primer.py
- tests/api/conftest.py
- tests/api/test_melting.py
- prymer/api/oligo.py
- prymer/api/melting.py
- tests/ntthal/test_ntthal.py
- prymer/api/minoptmax.py
- prymer/api/init.py
- tests/api/test_variant_lookup.py
- prymer/ntthal/init.py
- prymer/api/variant_lookup.py
- prymer/api/clustering.py
- tests/api/test_clustering.py
- prymer/api/primer_pair.py
- tests/api/test_oligo_like.py
- prymer/api/oligo_like.py
- prymer/api/span.py
✅ Files skipped from review due to trivial changes (9)
- tests/model/test_minoptmax.py
- prymer/primer3/primer3_parameters.py
- prymer/primer3/primer3_task.py
- prymer/primer3/primer3_weights.py
- tests/primer3/test_primer3_task.py
- tests/offtarget/test_offtarget.py
- prymer/offtarget/offtarget_detector.py
- tests/primer3/test_primer3_input.py
- tests/primer3/test_primer3_parameters.py
🧰 Additional context used
🪛 GitHub Actions: tests
tests/test_variant.py
[error] 78-78: Test failure in test_simple_variant_build_with_ac_and_an: TypeError: values expected to be 1048575-tuple, given len=1
🔇 Additional comments (22)
prymer/primer3/primer3_input_tag.py (1)
34-42
: Properties look good.Clean implementation of mutually exclusive properties.
prymer/primer3/primer3.py (3)
123-123
: Verify Python version compatibility fortyping.Self
The
Self
type hint is available in Python 3.11 and later. Ensure the project requires Python 3.11+ or adjust the code for compatibility.
237-237
: Verify use ofSelf
in type hintsThe return type
Self
in__enter__
may not be supported in Python versions before 3.11. Confirm compatibility with the project's Python version.
358-367
: Refactored argument separation improves code claritySplitting
assembled_primer3_tags
intoseq_args
andglobal_args
enhances readability and organization.prymer/primer3/primer3_input.py (1)
105-105
: LGTM!Import path update aligns with module restructuring.
prymer/api/picking.py (3)
28-32
: LGTM! Import restructuring looks good.Imports consolidated from
prymer.api
toprymer.model
for better organization.
104-104
: LGTM! Well-documented parameter addition.The
thermo
parameter is properly optional with good documentation.Also applies to: 119-119, 122-123
149-151
: LGTM! Good default handling.Proper fallback to default Thermo instance when none provided.
tests/model/test_oligo.py (3)
9-11
: LGTM! Simplified imports.Direct imports from
prymer
improve clarity.
26-26
: LGTM! Better test function names.Renamed for clarity:
test_valid_primer_construction
andtest_invalid_primer_construction_raises
.Also applies to: 43-43
Line range hint
135-135
: LGTM! Clearer test data.Using string multiplication (
"A" * 40
) improves readability.tests/model/test_primer_pair.py (1)
9-12
: LGTM! Simplified imports.Direct imports from
prymer
improve clarity.tests/api/test_picking.py (2)
10-14
: LGTM! Simplified imports.Direct imports from
prymer
improve clarity.
100-100
: LGTM! Updated temperature calculations.Properly using Thermo class with adjusted expected values.
Also applies to: 341-342, 344-344, 353-353
tests/primer3/test_primer3.py (4)
9-13
: LGTM: Import path changes align with module consolidation.Also applies to: 25-25
162-162
: LGTM: More descriptive error message.
190-191
: LGTM: Consistent property access pattern.Also applies to: 237-238, 448-453, 469-471
411-412
: LGTM: Case-insensitive string comparison improves test robustness..github/workflows/wheels.yml (1)
14-15
: Verify Python version compatibility strategy.Dropping Python 3.11 support is a breaking change. Ensure this aligns with the version bump to 3.0.0 in pyproject.toml.
.github/workflows/tests.yml (1)
19-20
: LGTM: Consistent with wheels.yml Python version changes.pyproject.toml (2)
3-3
: LGTM: Version bump to 3.0.0 follows semver for breaking changes.
42-42
: Verify primer3-py compatibility.New dependency on primer3-py ^2.0.3 added.
Run this to check for known issues:
@@ -21,10 +14,10 @@ | |||
|
|||
```python | |||
>>> from pathlib import Path | |||
>>> from prymer.api.variant_lookup import VariantLookup, VariantOverlapDetector | |||
>>> from prymer import VariantLookup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update import statement to reflect module restructuring
VariantLookup
should be imported from prymer.variant
instead of prymer
.
Apply this diff to fix the import:
-from prymer import VariantLookup
+from prymer.variant import VariantLookup
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
>>> from prymer import VariantLookup | |
from prymer.variant import VariantLookup |
A tuple of two sequences where the first is soft-masked and the second is hard masked | ||
""" | ||
if len(bases) != region.length: | ||
raise ValueError("Region and bases have different lengths: {region} vs. {bases}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix error message formatting in mask
method.
The ValueError
exception message in the mask
method lacks the f
prefix needed for f-string formatting. Without it, the variables region
and len(bases)
will not be interpolated into the string.
Apply this diff to correct the error message:
- raise ValueError("Region and bases have different lengths: {region} vs. {bases}")
+ raise ValueError(f"Region and bases have different lengths: {region} vs. {len(bases)}")
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
raise ValueError("Region and bases have different lengths: {region} vs. {bases}") | |
raise ValueError(f"Region and bases have different lengths: {region} vs. {len(bases)}") |
rec = builder.add("chr1", pos=1000, ref="A", alts="C", info={"AN": 100, "AC": [10]}) | ||
variants = SimpleVariant.build(rec) | ||
assert len(variants) == 1 | ||
assert variants[0].af == approx(0.1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix AC field format causing test failure.
Pipeline error shows tuple size mismatch for AC field.
- rec = builder.add("chr1", pos=1000, ref="A", alts="C", info={"AN": 100, "AC": [10]})
+ rec = builder.add("chr1", pos=1000, ref="A", alts="C", info={"AN": 100, "AC": 10})
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
rec = builder.add("chr1", pos=1000, ref="A", alts="C", info={"AN": 100, "AC": [10]}) | |
variants = SimpleVariant.build(rec) | |
assert len(variants) == 1 | |
assert variants[0].af == approx(0.1) | |
rec = builder.add("chr1", pos=1000, ref="A", alts="C", info={"AN": 100, "AC": 10}) | |
variants = SimpleVariant.build(rec) | |
assert len(variants) == 1 | |
assert variants[0].af == approx(0.1) |
🧰 Tools
🪛 GitHub Actions: tests
[error] 78-78: Test failure in test_simple_variant_build_with_ac_and_an: TypeError: values expected to be 1048575-tuple, given len=1
34c05b2
to
7d76a43
Compare
Tests will continue to fail until bumped to use a version of fgpyo that includes fulcrumgenomics/fgpyo#215