Replace contig with region in GWSS functions #691
Maybe using a random region instead of a random contig in at least some of the tests would be better. The function random_region_str from tests/anoph/conftest.py exists and there are a few examples of its use in test_snp_data.py, for instance.
Just to note that there are other places in the code unrelated to this PR where a random contig is used for a region in testing, e.g.
There might be a valid reason for choosing a random contig over a random region string, I'm not sure, but we could deal with those cases in another PR, if needs be. |
I'm getting |
…ith_default_sites()
It looks like the error is coming from cases where, e.g. in
    x, g123 = self.g123_gwss(
        region=region,
        [...]
    )
    # determine X axis range
    x_min = x[0]
But |
Noting here that
which is calculated via x = allel.moving_statistic(pos, statistic=np.mean, size=window_size) in |
It looks like the test failures for regions were happening whenever the region was sufficiently small or in the wrong place so that no sites were captured. It looks like this problem is avoided in other places by setting the random region to a fixed size, usually This does not prevent these I'm not sure what a more robust solution to this looks like yet, but it seems less related to this particular issue and more to do with the fact that functions such as |
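To make the failure mode concrete, here is a minimal sketch of why an empty or too-small region breaks the `x[0]` lookup. It deliberately avoids the real library call: `windowed_mean` is a hypothetical pure-Python stand-in that, by assumption, only emits complete non-overlapping windows, and `x_axis_min` is an illustrative guard, not code from this PR.

```python
from statistics import mean


def windowed_mean(values, size):
    """Mean of each complete, non-overlapping window of `size` items.

    Hypothetical stand-in for a moving-window statistic. Trailing items that
    do not fill a whole window are dropped (an assumption of this sketch).
    """
    n_windows = len(values) // size
    return [mean(values[i * size:(i + 1) * size]) for i in range(n_windows)]


def x_axis_min(pos, window_size):
    """Return the first window centre, failing loudly if there are no windows."""
    x = windowed_mean(pos, window_size)
    if len(x) == 0:
        # Without this guard, x[0] would raise a bare IndexError.
        raise ValueError(
            f"No complete windows: {len(pos)} sites, window_size={window_size}"
        )
    return x[0]
```

With fewer sites than `window_size`, the stand-in returns an empty list, so an explicit check gives a clearer message than the `IndexError` from indexing an empty result.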
Noting test failure for (3.12, numpy~=2.0)
There was also another failure for "tests with coverage", of the same kind previously encountered, so this is not yet resolved:
|
Plan to add support and show some form of deprecation warning for the
I suspect a similar approach might be applied in the future if we ever support multiple I reckon the current plan is to drop support for the |
Noting more failures for tests (3.12, numpy==1.26.4), which might necessitate increasing the random region_size from 5000 to 10_000, at least in these tests.
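For reference, a fixed-size random region could be drawn roughly as below. This is a hypothetical helper written for illustration only; it is not the `random_region_str` function from tests/anoph/conftest.py, and the contig length is a caller-supplied value rather than anything read from the data.

```python
import random


def random_region_sketch(contig, contig_length, region_size=10_000, rng=random):
    """Return a 'contig:start-end' string spanning exactly region_size bases.

    Hypothetical helper for illustration; coordinates are 1-based inclusive.
    """
    if region_size > contig_length:
        raise ValueError("region_size exceeds contig_length")
    # Choose a start such that the whole region fits inside the contig.
    start = rng.randint(1, contig_length - region_size + 1)
    end = start + region_size - 1
    return f"{contig}:{start}-{end}"
```

A fixed size only bounds the width of the region; it does not guarantee that any sites fall inside it, which is why the tests can still fail for unlucky draws.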
|
…_sites() and test_h12_gwss_multi_with_default_analysis()
One complication is that many, if not all, of these functions currently use positional arguments, rather than requiring keyword-only arguments. Conveniently, this might not have the usual impact in this case, because |
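As a general Python illustration of the positional-argument point (the function names here are made up, not the library's API): renaming the first parameter leaves positional callers untouched and only affects callers who pass the old name as a keyword.

```python
def gwss_old(contig, window_size):
    """Hypothetical 'before' signature."""
    return (contig, window_size)


def gwss_new(region, window_size):
    """Hypothetical 'after' signature with the first parameter renamed."""
    return (region, window_size)


# Positional calls bind by position, so the rename is invisible to them.
positional_result_matches = gwss_old("2L", 10_000) == gwss_new("2L", 10_000)

# Keyword calls using the old name fail unless the old name is still accepted.
try:
    gwss_new(contig="2L", window_size=10_000)
    keyword_call_failed = False
except TypeError:
    keyword_call_failed = True
```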
@ahernank I don't know yet why some tests relating to cohorts and allele frequencies have randomly started failing, e.g.
|
Thanks @leehart, I believe these are failures related to the randomness of the region selected for tests rather than any changes in cohorts -- I've re-run the tests on this PR, and they have now passed with a different region. |
Thanks @ahernank , that would make sense but I can't see where random regions come into play for I can see random site_mask, transcript, cohorts, country for |
Unfortunately, with regards to providing a deprecation path for the For example, if we kept support for the
    fst_gwss = ag3.fst_gwss(
        contig="2L",
        window_size=10_000,
        cohort1_query="cohort_admin2_year == 'ML-2_Kati_colu_2014'",
        cohort2_query="cohort_admin2_year == 'ML-2_Kati_gamb_2014'",
        site_mask="gamb_colu",
        cohort_size=10,
        sample_sets="3.0",
    )
...then they wouldn't get the We can't solve that by making both the new Perhaps one way forward is to fill in the missing defaults between the first param and the next param that has a default, which in this case would mean setting defaults for window_size, cohort1_query and cohort2_query. Perhaps if no values are specified for those parameters then we should just raise a ValueError. 🤔 |
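A minimal sketch of the constraint being described, under the assumption that the first parameter must become optional: Python rejects a signature in which a parameter without a default follows one with a default, so the parameters in between need placeholder defaults that are then checked by hand. `fst_gwss_sketch` is a cut-down hypothetical signature, not the library's implementation.

```python
# A parameter without a default cannot follow one that has a default,
# so this signature does not even compile.
invalid_source = "def fst_gwss(region=None, window_size): pass"
try:
    compile(invalid_source, "<sketch>", "exec")
    signature_is_valid = True
except SyntaxError:
    signature_is_valid = False


def fst_gwss_sketch(
    region=None,
    window_size=None,
    cohort1_query=None,
    cohort2_query=None,
    contig=None,  # Deprecated alias, kept last so positional calls still work.
):
    """Hypothetical signature showing None defaults plus manual checks."""
    required = {
        "window_size": window_size,
        "cohort1_query": cohort1_query,
        "cohort2_query": cohort2_query,
    }
    missing = [name for name, value in required.items() if value is None]
    if missing:
        raise ValueError(f"Missing required arguments: {missing}")
    if region is None and contig is None:
        raise ValueError("Provide 'region' (or the deprecated 'contig').")
    return region if region is not None else contig
```

The trade-off is that the type annotations no longer say which arguments are required; that information moves into the runtime check.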
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #691      +/-   ##
==========================================
- Coverage   94.93%   93.94%   -0.99%
==========================================
  Files          44       46       +2
  Lines        4541     4627      +86
==========================================
+ Hits         4311     4347      +36
- Misses        230      280      +50
@alimanfoo @jonbrenas Before I apply the same to the other 24 public functions here, is this an agreeable approach to the deprecation of params such as
    def fst_gwss(
        self,
        region: Optional[base_params.region] = None,
        window_size: Optional[fst_params.window_size] = None,
        cohort1_query: Optional[base_params.sample_query] = None,
        cohort2_query: Optional[base_params.sample_query] = None,
        sample_query_options: Optional[base_params.sample_query_options] = None,
        sample_sets: Optional[base_params.sample_sets] = None,
        site_mask: Optional[base_params.site_mask] = base_params.DEFAULT,
        cohort_size: Optional[base_params.cohort_size] = fst_params.cohort_size_default,
        min_cohort_size: Optional[
            base_params.min_cohort_size
        ] = fst_params.min_cohort_size_default,
        max_cohort_size: Optional[
            base_params.max_cohort_size
        ] = fst_params.max_cohort_size_default,
        random_seed: base_params.random_seed = 42,
        inline_array: base_params.inline_array = base_params.inline_array_default,
        chunks: base_params.chunks = base_params.native_chunks,
        clip_min: fst_params.clip_min = 0.0,
        contig: Optional[base_params.region] = None,  # Deprecated
    ) -> Tuple[np.ndarray, np.ndarray]:
        # Change this name if you ever change the behaviour of this function, to
        # invalidate any previously cached data.
        name = "fst_gwss_v3"

        # Specify which quasi-positional args are required.
        required_args = ("window_size", "cohort1_query", "cohort2_query")

        # Raise an error for any missing required args.
        missing_args = []
        for required_arg in required_args:
            if locals().get(required_arg) is None:
                missing_args.append(required_arg)
        if missing_args:
            raise ValueError(f"Missing required arguments: {missing_args}")

        # Specify which sets of alternative args are required.
        required_alternative_arg_sets = (("contig", "region"),)

        # Raise an error for any missing required alternative args.
        missing_alt_args = []
        for args_set in required_alternative_arg_sets:
            # Check if all alternative arguments are missing
            args_set_values = []
            for arg in args_set:
                args_set_values.append(locals().get(arg))
            if not any(args_set_values):
                missing_alt_args.append(args_set)
        if missing_alt_args:
            raise ValueError(
                f"Missing required alternative arguments: {missing_alt_args}"
            )
In this case, when
Since we have to enable that type of warning, I have also included code to switch it off again, to avoid unintended side-effects of warnings showing up where we want them switched off. In the case where the user provides an unnamed value in the first position, they should see no warning. Because these functions accept positional arguments, I needed to give some of the other parameters a default value, which is checked manually, such that missing arguments would raise a For example, if
    ag3.fst_gwss(
        "2L",
        10_000,
        "cohort_admin2_year == 'ML-2_Kati_colu_2014'",
        site_mask="gamb_colu",
        cohort_size=10,
        sample_sets="3.0",
    )
...then the user would see a corresponding
To avoid a cryptic
...which would otherwise be caused by code like this:
    ag3.fst_gwss(
        window_size=10_000,
        cohort1_query="cohort_admin2_year == 'ML-2_Kati_colu_2014'",
        cohort2_query="cohort_admin2_year == 'ML-2_Kati_gamb_2014'",
    )
...the user would instead see a corresponding
Note: the code here uses |
I've changed my mind! I plan to change this code to use |
I've changed my mind again! Using |
Hi @leehart, couple of suggestions...
    window_size: Optional[fst_params.window_size] = None,
    cohort1_query: Optional[base_params.sample_query] = None,
    cohort2_query: Optional[base_params.sample_query] = None,
Not sure why the type annotations of these parameters need to change.
It looks like I tried to explain the reason for this in a comment above but I should revisit this to double-check.
    local_vars = locals().copy()

    # Specify which quasi-positional args are required.
    required_args = ("window_size", "cohort1_query", "cohort2_query")

    # Raise an error for any missing required args.
    missing_args = []
    for required_arg in required_args:
        if local_vars.get(required_arg) is None:
            missing_args.append(required_arg)
    if missing_args:
        raise ValueError(f"Missing required arguments: {missing_args}")
This shouldn't be necessary I don't think, if the type annotations are left the same.
    required_alternative_arg_sets = (("contig", "region"),)

    # Raise an error for any missing required alternative args.
    missing_alt_args = []
    for args_set in required_alternative_arg_sets:
        # Check if all alternative arguments are missing
        args_set_values = []
        for arg in args_set:
            args_set_values.append(local_vars.get(arg))
        if not any(args_set_values):
            missing_alt_args.append(args_set)
    if missing_alt_args:
        raise ValueError(
            f"Missing required alternative arguments: {missing_alt_args}"
        )

    if contig is not None:
        # Get the current warning filters.
        original_warning_filters = warnings.filters[:]

        # Trigger the warning.
        warnings.simplefilter("default", DeprecationWarning)
        warnings.warn(
            "The 'contig' parameter has been deprecated. Please use 'region' instead.",
            DeprecationWarning,
        )

        # Restore the original warning filters.
        warnings.filters = original_warning_filters

    # If contig and region are both given, then prefer region.
    region = contig if region is None else region
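If the filter really does need to be enabled temporarily, one standard-library option is to scope the change with `warnings.catch_warnings()`, which restores the previous filter state on exit, instead of copying and reassigning `warnings.filters` by hand. A minimal sketch, where `warn_contig_deprecated` is a hypothetical name and `stacklevel=2` is my own addition so the warning points at the caller:

```python
import warnings


def warn_contig_deprecated():
    """Emit the deprecation warning with the filter enabled only locally."""
    with warnings.catch_warnings():
        # This filter change is undone automatically when the block exits.
        warnings.simplefilter("default", DeprecationWarning)
        warnings.warn(
            "The 'contig' parameter has been deprecated. Please use 'region' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
```

Note that `catch_warnings` manipulates module-level state, so like the manual approach it is not thread-safe.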
I would suggest handling this within a helper function. E.g., replace all of this with:
    region = _handle_deprecated_contig_param(region=region, contig=contig)
    del contig
The implementation of this helper function could then live in a convenient common location somewhere, and could look something like:
    def _handle_deprecated_contig_param(region, contig):
        if contig is None:
            # User is not using the old 'contig' parameter, all good.
            return region
        elif region is None:
            # User is using the old 'contig' parameter, raise a warning.
            warnings.warn(
                "The 'contig' parameter has been deprecated. Please use 'region' instead.",
                DeprecationWarning,
            )
            # A contig is a valid region, so return the contig as the region.
            return contig
        else:
            # User is using both 'region' and 'contig' parameters, raise an error.
            raise ValueError("Found both 'region' and 'contig' parameters, please provide 'region' parameter only.")
FWIW I would just raise a warning and not try to override any warning filters.
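As a quick sanity check, the suggested helper can be exercised on its three branches. The helper below is copied from the suggestion above so the block runs on its own; `exercise_helper` and the example region strings are hypothetical and only here for illustration.

```python
import warnings


def _handle_deprecated_contig_param(region, contig):
    if contig is None:
        # User is not using the old 'contig' parameter, all good.
        return region
    elif region is None:
        # User is using the old 'contig' parameter, raise a warning.
        warnings.warn(
            "The 'contig' parameter has been deprecated. Please use 'region' instead.",
            DeprecationWarning,
        )
        # A contig is a valid region, so return the contig as the region.
        return contig
    else:
        # User is using both 'region' and 'contig' parameters, raise an error.
        raise ValueError(
            "Found both 'region' and 'contig' parameters, please provide 'region' parameter only."
        )


def exercise_helper():
    """Run the three branches and report what each one did."""
    results = {}
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning so the deprecated path is observable.
        warnings.simplefilter("always")
        results["region_only"] = _handle_deprecated_contig_param("2L:1-100", None)
        results["warnings_after_region_only"] = len(caught)
        results["contig_only"] = _handle_deprecated_contig_param(None, "2L")
        results["warnings_after_contig_only"] = len(caught)
    try:
        _handle_deprecated_contig_param("2L:1-100", "2L")
        results["both_raises"] = False
    except ValueError:
        results["both_raises"] = True
    return results
```

Because Python hides `DeprecationWarning` by default outside `__main__` and test runners, users calling the library from their own modules may not see the warning unless they enable it; whether that is acceptable is a separate decision from the helper itself.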
Resolves #375