Adding check_parameter_positivity() function to seir.py #428

Status: Open. Wants to merge 24 commits into base: dev. Showing changes from 12 commits.

53 changes: 53 additions & 0 deletions flepimop/gempyor_pkg/src/gempyor/seir.py
@@ -16,6 +16,54 @@
logger = logging.getLogger(__name__)


# TO DO: Write documentation for this function
def check_parameter_positivity(
    parsed_parameters: np.ndarray,
    parameter_names: list[str],
    dates: pd.DatetimeIndex,
    subpop_names: list[str],
) -> None:
    """
    Identifies and reports earliest negative values for
    parameters after modifiers have been applied.

    Args:
        parsed_parameters: An array of parameter values.
        parameter_names: A list of the names of parameters.
        dates: A pandas DatetimeIndex containing the dates.
        subpop_names: A list of the names of subpopulations.

    Raises:
        ValueError: Negative parameter values were detected.

    Returns:
        None
    """
    if ((parsed_parameters) < 0).any():
        negative_index_parameters = np.argwhere(parsed_parameters < 0)
        unique_param_sp_combinations = []
        row_index = -1
        redundant_rows = []
        for row in negative_index_parameters:
            row_index += 1
            if (row[0], row[2]) in unique_param_sp_combinations:
                redundant_rows.append(row_index)
            if (row[0], row[2]) not in unique_param_sp_combinations:
                unique_param_sp_combinations.append((row[0], row[2]))
        non_redundant_negative_parameters = np.delete(
            negative_index_parameters, (redundant_rows), axis=0
        )

        error_message = (
            "The earliest date negative for each subpop and unique parameter are:\n"
        )
        for param_idx, day_idx, sp_idx in non_redundant_negative_parameters:
            error_message += f"subpop: {subpop_names[sp_idx]}, parameter {parameter_names[param_idx]}: {dates[day_idx].date()}\n"
        raise ValueError(
            f"There are negative parsed-parameters, which is likely to result in incorrect integration.\n{error_message}"
        )
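
A possible vectorized alternative to the deduplication loop in this diff, offered as a sketch rather than as the PR's implementation. It assumes the same (parameter, day, subpop) axis order that the loop unpacks. The helper name `earliest_negative_days` is hypothetical and not part of gempyor.

```python
import numpy as np


def earliest_negative_days(values: np.ndarray) -> list[tuple[int, int, int]]:
    """Return (param_idx, first_negative_day_idx, subpop_idx) triples.

    Hypothetical helper for illustration; assumes `values` is indexed as
    (parameter, day, subpop).
    """
    negative = values < 0
    # True where a (parameter, subpop) pair is negative on at least one day.
    has_negative = negative.any(axis=1)
    # argmax over a boolean axis gives the index of the first True entry.
    # Pairs with no negatives get 0 here, so they are filtered out below.
    first_day = negative.argmax(axis=1)
    param_idx, subpop_idx = np.nonzero(has_negative)
    return [
        (int(p), int(first_day[p, s]), int(s))
        for p, s in zip(param_idx, subpop_idx)
    ]
```

This yields one triple per affected (parameter, subpop) pair, which is the same information the loop collects, without building a Python list of seen pairs.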
Contributor

I'm concerned that this error message will be quite lengthy and I'm not a fan of multi-line error messages. Is there a way that we could condense this down to one line? Maybe just error on the first negative parameter?

Collaborator Author

Hm, I agree the message is lengthy, and it is sub-optimal for it to span multiple lines. But I'm curious what your thoughts are on the usefulness of an error message that only reports the first negative parameter, even though the function knows where all of the negative parameters are. Isn't there a lot of added value in telling the user all of the columns that have negative values, so they can address the issue more quickly? I'm happy to change it to only show the first negative parameter value, but since the function inherently finds the others, I thought it would be useful to include them.

Contributor

@pearsonca, Jan 13, 2025

I agree that spewing out all the errors is too much noise - multiple errors often arise from single mistakes that need correcting.

I recommend that we limit the detailed part of the error message to the earliest time at which any parameter or subpop has a problem. However, I also think it's worthwhile to indicate the totality of the problem. Something like:

There are negative parameter error(s) for config FFFF: the first at date DD-MM-YY, subpopulation XX, parameter YY.
Affected subpopulations include: {...}. Affected parameters include {...}. There are NNN total negative entries.

(possibly some of those elements can be curtailed if there are not multiple subpops, etc.)
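
One way the condensed message suggested above could be assembled, as a rough sketch only. The helper name `summarize_negative_entries`, its signature, and the exact wording are assumptions for illustration. It leaves out the config name, since the function under review does not receive one, writes the date in ISO format rather than DD-MM-YY, and keeps everything on one line.

```python
from datetime import date


def summarize_negative_entries(entries, parameter_names, dates, subpop_names):
    """Build a one-line summary from (param_idx, day_idx, subpop_idx) tuples.

    Hypothetical helper: the name, signature, and wording are illustrative only.
    """
    if not entries:
        return ""
    # Earliest entry by day index; ties are broken by parameter, then subpop.
    first_param, first_day, first_sp = min(
        entries, key=lambda e: (e[1], e[0], e[2])
    )
    subpops = sorted({subpop_names[sp] for _, _, sp in entries})
    params = sorted({parameter_names[p] for p, _, _ in entries})
    return (
        f"There are negative parameter error(s): the first at date "
        f"{dates[first_day].isoformat()}, subpopulation {subpop_names[first_sp]}, "
        f"parameter {parameter_names[first_param]}. "
        f"Affected subpopulations include: {subpops}. "
        f"Affected parameters include: {params}. "
        f"There are {len(entries)} total negative entries."
    )


example = summarize_negative_entries(
    [(1, 1, 0), (0, 0, 1), (0, 1, 1)],
    ["alpha", "sigma"],
    [date(2023, 3, 19), date(2023, 3, 20)],
    ["56000", "44000"],
)
```

The detailed part names only the earliest entry, while the sets and the count convey the overall extent of the problem.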

Collaborator

Just chiming in that I asked @emprzy for this verbose error message (the first negative date for every negative parameter and subpop), as it helps to debug configs without retrying (which, e.g., for an RSV config takes 6 to 7 minutes); sorry, Emily. I totally understand that we want to keep it light, but I think a good diagnostic (e.g., a graph or something) is useful here, though perhaps not inside the simulate command.

Contributor

I think the clearer the error message is, the more useful it will be in practice, though I understand it's long to print out everything for each subpop, for example. In particular, I think it would be useful to include as much information as possible about the specific parameters. Maybe a simplification/modification of this:

There are negative parameter errors in subpops {...}, starting from date XXXX:
parameters: eta_X0toX3_highIE*1*1*nuage18to64HR, eta_X1toX4_highIE*1*1*nuage0to17, eta_X1toX4_highIE*1*1*nuage18to64LR.... 

Contributor

@saraloo's suggestion seems like a reasonable compromise to me.

Minor: but maybe starting from date YYYY-MM-DD, in parameters: ... instead, notably no newline. Newlines can be annoying to format in unit tests matching on exception, see this prior version of Parameters unit tests as an example:

with pytest.raises(
    ValueError,
    match=(
        rf"^ERROR loading file {tmp_file} for parameter sigma\:\s+the \'date\' "
        rf"entries of the provided file do not include all the days specified "
        rf"to be modeled by\s+the config\. the provided file includes "
        rf"{(timeseries_end_date - timeseries_start_date).days + 1} days "
        rf"between {timeseries_start_date}( 00\:00\:00)? to "
        rf"{timeseries_end_date}( 00\:00\:00)?,\s+while there are "
        rf"{mock_inputs.number_of_days()} days in the config time span of "
        rf"{mock_inputs.ti}->{mock_inputs.tf}\. The file must contain entries "
        rf"for the\s+the exact start and end dates from the config\. $"
    ),
):
    mock_inputs.create_parameters_instance()

Contributor

Re unit test matching, I'd reiterate that it's overkill to match exact messages. For example, here we should only be matching the first bad date string (irrespective of what's around it), the subpop ID (ibid), and the offending parameters (ibid).
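
A small sketch of matching only the key fragments, using the standard library so it runs on its own. `pytest.raises(..., match=...)` applies `re.search` to the exception text, so the same pattern can be passed to `match`. The stand-in function and its message are illustrative and are not the PR's code. Parameter names in this project contain `*`, which is a regex metacharacter, so they are escaped.

```python
import re


def raise_negative_parameter_error():
    """Stand-in that raises a message shaped like the ones proposed in this thread."""
    raise ValueError(
        "There are negative parameter errors in subpops ['56000', '44000'], "
        "starting from date 2023-03-19 in parameters ['alpha*1*1*1']."
    )


def message_has_key_fragments(message: str) -> bool:
    # Check only the subpop ID, the first bad date, and the parameter name,
    # in that order, ignoring whatever wording sits between them.
    pattern = r"56000.*2023-03-19.*" + re.escape("alpha*1*1*1")
    return re.search(pattern, message) is not None


try:
    raise_negative_parameter_error()
except ValueError as exc:
    assert message_has_key_fragments(str(exc))
```

A test written this way keeps passing if the surrounding wording of the message is edited later.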

Collaborator Author

starting from date YYYY-MM-DD, in parameters: ...

@TimothyWillard , does this mean you propose leaving out the subpop information and just including the parameter names that are negative?

Contributor

@TimothyWillard , does this mean you propose leaving out the subpop information and just including the parameter names that are negative?

My bad, no, I just was conveying an edit to a portion of @saraloo's suggestion. The full change with my edit would be:

There are negative parameter errors in subpops {...}, starting from date YYYY-MM-DD in parameters: eta_X0toX3_highIE*1*1*nuage18to64HR, eta_X1toX4_highIE*1*1*nuage0to17, eta_X1toX4_highIE*1*1*nuage18to64LR.... 

Collaborator Author

@emprzy, Jan 16, 2025

@TimothyWillard @saraloo @jcblemai
The error message now reads as follows:

ValueError: There are negative parameter errors in subpops ['56000', '44000', '30000'], starting from date 2023-03-19 in parameters ['alpha*1*1*1', 'sigma_OMICRON*1*1*1', '3*gamma*1*1*1'].

Is this what you had in mind? Happy to change it, just wanted to confirm before pushing.



def build_step_source_arg(
    modinf: ModelInfo,
    parsed_parameters,
@@ -119,6 +167,11 @@ def build_step_source_arg(
        "population": modinf.subpop_pop,
        "stochastic_p": modinf.stoch_traj_flag,
    }

    check_parameter_positivity(
        fnct_args["parameters"], modinf.parameters.pnames, modinf.dates, modinf.subpop_pop
    )

    return fnct_args


73 changes: 72 additions & 1 deletion flepimop/gempyor_pkg/tests/seir/test_seir.py
@@ -3,6 +3,8 @@
import pytest
import warnings
import shutil
from random import randint
import pandas as pd

import pathlib
import pyarrow as pa
@@ -16,8 +18,77 @@
os.chdir(os.path.dirname(__file__))


def test_check_parameter_positivity():

    parameter_names = [
        "alpha*1*1*1",
        "sigma_OMICRON*1*1*1",
        "3*gamma*1*1*1",
        "epsilon+omegaph4*1*1*1",
        "1*zeta*1*1",
        "r0*gamma*theta10*1*chi_OMICRON*1",
        "r0*gamma*theta9*1*chi_OMICRON*1",
        "eta_X0toX3_highIE*1*1*nuage0to17",
        "eta_X0toX3_highIE*1*1*nuage18to64LR",
        "eta_X0toX3_highIE*1*1*nuage18to64HR",
        "eta_X0toX3_highIE*1*1*nuage65to100",
    ]
    dates = pd.date_range("2023-03-19", "2025-04-30", freq="D")
    subpop_names = [
        "56000",
        "50000",
        "11000",
        "02000",
        "38000",
        "46000",
        "10000",
        "30000",
        "44000",
        "23000",
    ]

    # No negative params
    test_array1 = np.zeros(
        (len(parameter_names) - 1, len(dates) - 1, len(subpop_names) - 1)
    )

    # Randomized negative params
    test_array2 = np.zeros(
        (len(parameter_names) - 1, len(dates) - 1, len(subpop_names) - 1)
    )
    for _ in range(5):
        test_array2[randint(0, len(parameter_names) - 1)][randint(0, len(dates) - 1)][
            randint(0, len(subpop_names) - 1)
        ] = -1

    # Set negative params with intentional redundancy
    test_array3 = np.zeros((len(parameter_names), len(dates), len(subpop_names)))
    randint_first_dim = randint(0, len(parameter_names) - 1)
    randint_second_dim = randint(0, len(dates) - 2)
    randint_third_dim = randint(0, len(subpop_names) - 1)
    test_array3[0][0][0] = -1
    test_array3[
        randint(0, len(parameter_names) - 1),
        randint(0, len(dates) - 1) :,
        randint(0, len(subpop_names) - 1),
    ] = -1
    test_array3[randint_first_dim][randint_second_dim][randint_third_dim] = -1
    test_array3[randint_first_dim][randint_second_dim + 1][randint_third_dim] = -1

    seir.check_parameter_positivity(
        test_array1, parameter_names, dates, subpop_names
    )  # NoError

    with pytest.raises(ValueError):
        assert seir.check_parameter_positivity(
            test_array2, parameter_names, dates, subpop_names
        )  # ValueError
        assert seir.check_parameter_positivity(
            test_array3, parameter_names, dates, subpop_names
        )  # ValueError
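
Two details of the test above may be worth a sketch. First, `test_array1` and `test_array2` are allocated with each axis one shorter than the corresponding list, while `random.randint` includes both endpoints, so `randint(0, len(...) - 1)` can produce an index that is out of bounds for those arrays. Second, if both failing calls share a single `pytest.raises` block, the second call is never reached once the first one raises. The version below uses fixed indices and one `pytest.raises` block per case. The one-argument `check_parameter_positivity` here is a simplified stand-in, not the function from the diff, and the array sizes are arbitrary.

```python
import numpy as np
import pytest


def check_parameter_positivity(values: np.ndarray) -> None:
    """Simplified stand-in for the function under review (an assumption)."""
    if (values < 0).any():
        raise ValueError("negative parameter values detected")


# Small, arbitrary sizes for the sketch: (parameter, day, subpop).
N_PARAMS, N_DAYS, N_SUBPOPS = 3, 5, 4


def test_no_negative_values_passes():
    check_parameter_positivity(np.zeros((N_PARAMS, N_DAYS, N_SUBPOPS)))


@pytest.mark.parametrize(
    "index",
    [
        (0, 0, 0),  # first corner
        (N_PARAMS - 1, N_DAYS - 1, N_SUBPOPS - 1),  # last valid corner
        (1, slice(2, None), 3),  # a run of negative days (redundant entries)
    ],
)
def test_negative_values_raise(index):
    values = np.zeros((N_PARAMS, N_DAYS, N_SUBPOPS))
    values[index] = -1
    # One raises block per case, so every case is actually exercised.
    with pytest.raises(ValueError, match="negative"):
        check_parameter_positivity(values)
```

Fixed indices also make a failure reproducible, which random indices do not unless the seed is pinned.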


def test_check_values():
    os.chdir(os.path.dirname(__file__))
    config.set_file(f"{DATA_DIR}/config.yml")

    modinf = model_info.ModelInfo(