compute_summary_statistics could have option to use standard-error #1555

Open
tbhallett opened this issue Dec 12, 2024 · 7 comments · May be fixed by #1558
@tbhallett
Collaborator

Following discussion with @andrew-phillips-1 and @joehcollins, it seems that some analysts are summarising uncertainty in model results using the 'standard error'.
We could implement this in the utility method `compute_summary_statistics` to provide ease of access. An implementation could be as follows (building on #1457):

    if not use_standard_error:  # <--- `use_standard_error` is a new boolean argument, defaulting to False
        lower_quantile = (1. - width_of_range) / 2.
        stats["lower"] = grouped_results.quantile(lower_quantile)
        stats["upper"] = grouped_results.quantile(1 - lower_quantile)
    else:
        # Use the standard-error concept, whereby the interval expresses a 95% CI on the
        # value of the mean. The width of the interval narrows as the number of runs grows.
        std_deviation = grouped_results.std()
        std_error = std_deviation / np.sqrt(len(grouped_results))
        z_value = st.norm.ppf(1 - (1. - width_of_range) / 2.)  # requires `import scipy.stats as st`
        stats["lower"] = stats['central'] - z_value * std_error
        stats["upper"] = stats['central'] + z_value * std_error

A question for @andrew-phillips-1 -- I presume it's only appropriate to use this concept in certain circumstances -- what would these be? And also, I presume we should raise an error if someone tries to use a summary measure other than the mean?
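To make the proposal above concrete, here is a minimal, self-contained sketch of how the whole function might look. The function name `summarise`, the assumption that `grouped_results` is a `pandas.DataFrame` with one column per run, and the signature are all hypothetical -- the real implementation would live in `compute_summary_statistics` and follow whatever conventions #1457 settles on:

```python
# Hypothetical sketch only: names and layout (columns = runs for one draw) are
# assumptions, not the actual TLO method signature.
import numpy as np
import pandas as pd
import scipy.stats as st


def summarise(grouped_results: pd.DataFrame,
              width_of_range: float = 0.95,
              use_standard_error: bool = False) -> pd.DataFrame:
    """Summarise run-level results (columns = runs) for a single draw."""
    stats = pd.DataFrame()
    stats["central"] = grouped_results.mean(axis=1)

    if not use_standard_error:
        # Empirical quantiles across runs: width does not shrink with more runs.
        lower_quantile = (1.0 - width_of_range) / 2.0
        stats["lower"] = grouped_results.quantile(lower_quantile, axis=1)
        stats["upper"] = grouped_results.quantile(1 - lower_quantile, axis=1)
    else:
        # CI on the mean: half-width is z * s / sqrt(n), so it narrows with more runs.
        n_runs = grouped_results.shape[1]
        std_error = grouped_results.std(axis=1) / np.sqrt(n_runs)
        z_value = st.norm.ppf(1 - (1.0 - width_of_range) / 2.0)
        stats["lower"] = stats["central"] - z_value * std_error
        stats["upper"] = stats["central"] + z_value * std_error

    return stats
```

For example, five runs with values 1..5 give a mean of 3.0 and, with `use_standard_error=True`, a 95% CI of roughly (1.61, 4.39).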

@andrew-phillips-1
Collaborator

Yes, we would want the summary measure to be the mean, or else raise an error. Since this is based on a mean over multiple runs with the same parameter values, I can't think of exceptions where the mean and 95% CI would not be of value, but the analyst should obviously also think for themselves about whether what they are presenting makes sense.

@tbhallett
Collaborator Author

Ok, when #1457 is merged, I'll add this in.

@BinglingICL
Collaborator

Thanks very much all. This is very clear and helpful.

May I ask
(1) for each run of the same draw (i.e., the same parameter values), is the population independently sampled from the whole population? Or are the symptoms/conditions independently assigned to the same population sample?
(2) for each run, we have a result measure, such as the DALYs (which is scaled up for the whole population). Are we treating it as a single estimate of the whole population's health burden, so that the mean of results from multiple runs represents the estimated mean health burden of the whole population, and the standard error measures the variability of the estimated mean around the true mean of the population?

@tbhallett
Collaborator Author

tbhallett commented Dec 13, 2024

> May I ask (1) for each run of the same draw (i.e., the same parameter values), the population is independently sampled from the whole population? or, the symptoms/conditions are independently assigned to the same population sample?

Yes, different random-number-generator seeds for each run, so an "independent" draw of the same 'model'. This includes the properties of the population at the start of the simulation.

> (2) for each run, we have a result measure, such as the DALYs (which is scaled up for the whole population), are we treating it a single estimate of the whole population's health burden so that the mean of results from multiple runs represent the estimated mean health burden of the whole population and the standard error measure the variability of the estimated mean to the true mean of the population?

Yes, I believe so. (check with @andrew-phillips-1)

@andrew-phillips-1
Collaborator

Yes, this seems right. If you can imagine doing a million runs with a given set of parameter values, and doing this, say, 3 times, the mean for the output should be essentially identical over those three times. Call this the true mean. You can think of the 95% confidence interval for the mean based on a limited number of runs as the interval in which there is a 95% chance that the true mean lies.
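This intuition is easy to check numerically. The snippet below is a hypothetical illustration (the distribution, seed, and stand-in numbers are made up, not model output): the half-width of the 95% CI on the mean shrinks in proportion to 1/sqrt(n) as the number of runs grows, while an empirical quantile range would not:

```python
# Hypothetical illustration: per-run outputs drawn from a normal distribution
# stand in for per-run model results (e.g. total DALYs for one draw).
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(seed=0)
z = st.norm.ppf(0.975)  # two-sided 95% CI

for n_runs in (10, 100, 1000):
    runs = rng.normal(loc=100.0, scale=20.0, size=n_runs)
    half_width = z * runs.std(ddof=1) / np.sqrt(n_runs)
    print(f"{n_runs:>5} runs: mean ~ {runs.mean():.1f}, CI half-width ~ {half_width:.2f}")
```

With 100x more runs the CI on the mean is roughly 10x narrower, which is exactly why this summary is only meaningful when the quantity of interest is the mean itself.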

@BinglingICL
Collaborator

> Yes this seems right. if you can imagine doing a million runs with a given set of parameter values and doing this say 3 times, the mean for the output should be essentially identical over those three times. Call this the true mean. You can think of the 95% confidence interval for the mean based on a limited number of runs as the interval in which there is a 95% chance that the true mean lies.

Thanks very much Andrew. This is super clear and helpful!! And thanks Tim, too.

@joehcollins
Collaborator

Thanks everyone - think this will be really helpful!
