Ratio comparison (different populations) MC sampling duplication #1086

Closed
Yuri05 opened this issue Jul 20, 2023 · 5 comments

@Yuri05
Member

Yuri05 commented Jul 20, 2023

At the moment, the Monte Carlo (MC) sampling is repeated for each PK parameter.
This should be optimized: sample only once, then calculate the PK ratios for each parameter from the sampled populations.
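The proposed optimization can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the ReportingEngine's R implementation. The sampling scheme (each comparison individual paired with a reference individual drawn with replacement), the summary statistics, and the function names `draw_pairings`, `ratio_statistics` and `ratio_statistics_all` are all hypothetical. What it shows is that the random draws are made once and then reused for every PK parameter instead of being redrawn per parameter.

```python
import math
import random
import statistics


def draw_pairings(n_comparison, n_reference, n_repetitions, seed):
    """Draw all Monte Carlo index samples ONCE.

    Hypothetical scheme: in each repetition, every individual of the comparison
    population is paired with a reference individual drawn with replacement.
    """
    rng = random.Random(seed)
    return [
        [rng.randrange(n_reference) for _ in range(n_comparison)]
        for _ in range(n_repetitions)
    ]


def ratio_statistics(comparison, reference, pairings):
    """Summarise the PK ratios of ONE parameter, reusing precomputed pairings."""
    means, geo_means = [], []
    for ref_idx in pairings:
        ratios = [c / reference[j] for c, j in zip(comparison, ref_idx)]
        means.append(statistics.fmean(ratios))
        geo_means.append(math.exp(statistics.fmean([math.log(r) for r in ratios])))
    # Aggregate the per-repetition statistics (here, hypothetically, their median).
    return statistics.median(means), statistics.median(geo_means)


def ratio_statistics_all(comparison_pk, reference_pk, n_repetitions, seed):
    """comparison_pk / reference_pk map a PK parameter name to per-individual values."""
    first = next(iter(comparison_pk))
    pairings = draw_pairings(
        len(comparison_pk[first]), len(reference_pk[first]), n_repetitions, seed
    )
    # The same sampled pairings are reused for every PK parameter.
    return {
        name: ratio_statistics(comparison_pk[name], reference_pk[name], pairings)
        for name in comparison_pk
    }
```

Under this scheme the pairings depend only on the population sizes, the number of repetitions and the seed, so reusing them gives each parameter the same result as redrawing with the same seed, while paying the sampling cost once.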

pchelle added a commit to pchelle/OSPSuite.ReportingEngine that referenced this issue Oct 12, 2023
Yuri05 closed this as completed in bec9879 Oct 17, 2023
@Yuri05
Member Author

Yuri05 commented Oct 23, 2023

After the change, the computation became much slower.
For instance, in my example workflow (simulation results are precalculated; only the CalculatePK and PlotPK tasks are active):

Previous run (before the change): 4.7 minutes
OSP Suite Package version: 11.2.305
OSP Reporting Engine version: 2.2.283
tlf version: 1.5.150
20/07/2023 - 13:45:53
Starting run of Population Workflow for ratioComparison
20/07/2023 - 13:45:53
Starting run of Plot PK Parameters task
20/07/2023 - 13:46:05
Simulation Set 'Midazolam Treatment' was identified with population different from reference 'Midazolam Control'. Ratio comparison analyzed statistics from Monte Carlo Sampling
20/07/2023 - 13:46:35
Analysis of PK Ratios between 'Midazolam Treatment' (n=1000) against reference 'Midazolam Control' (n=1000)
Analytical solution for mean, standard deviation, geo mean, geo standard deviation resulted in 0.401, 0.279, 0.39, 2.016 respectively
Monte Carlo solution for mean, standard deviation, geo mean, geo standard deviation resulted in 0.497, 0.385, 0.39, 2.016 respectively.
Number of repetitions was set to: 10000
Random seed number was set to: 123456
20/07/2023 - 13:46:37
Simulation Set 'Midazolam Treatment' was identified with population different from reference 'Midazolam Control'. Ratio comparison analyzed statistics from Monte Carlo Sampling
20/07/2023 - 13:47:07
Analysis of PK Ratios between 'Midazolam Treatment' (n=1000) against reference 'Midazolam Control' (n=1000)
Analytical solution for mean, standard deviation, geo mean, geo standard deviation resulted in 0.401, 0.279, 0.39, 2.016 respectively
Monte Carlo solution for mean, standard deviation, geo mean, geo standard deviation resulted in 0.497, 0.385, 0.39, 2.016 respectively.
Number of repetitions was set to: 10000
Random seed number was set to: 123456
20/07/2023 - 13:47:09
Simulation Set 'Midazolam Treatment' was identified with population different from reference 'Midazolam Control'. Ratio comparison analyzed statistics from Monte Carlo Sampling
20/07/2023 - 13:47:39
Analysis of PK Ratios between 'Midazolam Treatment' (n=1000) against reference 'Midazolam Control' (n=1000)
Analytical solution for mean, standard deviation, geo mean, geo standard deviation resulted in 492.301, 134.785, 510.166, 1.305 respectively
Monte Carlo solution for mean, standard deviation, geo mean, geo standard deviation resulted in 528.437, 141.72, 510.166, 1.305 respectively.
Number of repetitions was set to: 10000
Random seed number was set to: 123456
20/07/2023 - 13:47:41
Simulation Set 'Midazolam Treatment' was identified with population different from reference 'Midazolam Control'. Ratio comparison analyzed statistics from Monte Carlo Sampling
20/07/2023 - 13:48:11
Analysis of PK Ratios between 'Midazolam Treatment' (n=1000) against reference 'Midazolam Control' (n=1000)
Analytical solution for mean, standard deviation, geo mean, geo standard deviation resulted in 0, 0, 0, 2.669 respectively
Monte Carlo solution for mean, standard deviation, geo mean, geo standard deviation resulted in 0, 0.001, 0, 2.669 respectively.
Number of repetitions was set to: 10000
Random seed number was set to: 123456
20/07/2023 - 13:48:13
Simulation Set 'Midazolam Treatment' was identified with population different from reference 'Midazolam Control'. Ratio comparison analyzed statistics from Monte Carlo Sampling
20/07/2023 - 13:48:43
Analysis of PK Ratios between 'Midazolam Treatment' (n=1000) against reference 'Midazolam Control' (n=1000)
Analytical solution for mean, standard deviation, geo mean, geo standard deviation resulted in 0.253, 0.19, 0.255, 2.003 respectively
Monte Carlo solution for mean, standard deviation, geo mean, geo standard deviation resulted in 0.324, 0.25, 0.255, 2.003 respectively.
Number of repetitions was set to: 10000
Random seed number was set to: 123456
20/07/2023 - 13:48:45
Simulation Set 'Midazolam Treatment' was identified with population different from reference 'Midazolam Control'. Ratio comparison analyzed statistics from Monte Carlo Sampling
20/07/2023 - 13:49:16
Analysis of PK Ratios between 'Midazolam Treatment' (n=1000) against reference 'Midazolam Control' (n=1000)
Analytical solution for mean, standard deviation, geo mean, geo standard deviation resulted in 0.253, 0.19, 0.255, 2.003 respectively
Monte Carlo solution for mean, standard deviation, geo mean, geo standard deviation resulted in 0.324, 0.25, 0.255, 2.003 respectively.
Number of repetitions was set to: 10000
Random seed number was set to: 123456
20/07/2023 - 13:49:18
Simulation Set 'Midazolam Treatment' was identified with population different from reference 'Midazolam Control'. Ratio comparison analyzed statistics from Monte Carlo Sampling
20/07/2023 - 13:49:48
Analysis of PK Ratios between 'Midazolam Treatment' (n=1000) against reference 'Midazolam Control' (n=1000)
Analytical solution for mean, standard deviation, geo mean, geo standard deviation resulted in 0.247, 0.198, 0.25, 2.012 respectively
Monte Carlo solution for mean, standard deviation, geo mean, geo standard deviation resulted in 0.318, 0.247, 0.25, 2.012 respectively.
Number of repetitions was set to: 10000
Random seed number was set to: 123456
20/07/2023 - 13:49:50
Simulation Set 'Midazolam Treatment' was identified with population different from reference 'Midazolam Control'. Ratio comparison analyzed statistics from Monte Carlo Sampling
20/07/2023 - 13:50:20
Analysis of PK Ratios between 'Midazolam Treatment' (n=1000) against reference 'Midazolam Control' (n=1000)
Analytical solution for mean, standard deviation, geo mean, geo standard deviation resulted in 0.247, 0.198, 0.25, 2.012 respectively
Monte Carlo solution for mean, standard deviation, geo mean, geo standard deviation resulted in 0.318, 0.247, 0.25, 2.012 respectively.
Number of repetitions was set to: 10000
Random seed number was set to: 123456
20/07/2023 - 13:50:34
Plot PK Parameters task completed in 4.7 min
20/07/2023 - 13:50:37
Population Workflow for ratioComparison completed in 4.7 min
Current run (after the change): 13.5 minutes
i Info	Reporting Engine Information:
	Date: 23/10/2023 - 12:30:47
	System versions:
	R version: R version 4.2.2 (2022-10-31 ucrt)
	OSP Suite Package version: 12.0.391
	OSP Reporting Engine version: 2.2.302
	tlf version: 1.5.156
23/10/2023 - 12:30:48
i Info	Starting run of Population Workflow for ratioComparison
23/10/2023 - 12:30:48
i Info	Starting run of Calculate PK Parameters task
23/10/2023 - 12:30:48
i Info	Starting run of Calculate PK Parameters task for Midazolam Control
23/10/2023 - 12:30:50
i Info	Starting run of Calculate PK Parameters task for Midazolam Treatment
23/10/2023 - 12:30:55
i Info	Simulation Set 'Midazolam Treatment' was identified with population different from reference 'Midazolam Control'.
Ratio comparison will use Monte Carlo Sampling for analyzing statistics.
23/10/2023 - 12:30:59
i Info	Monte Carlo Sampling for Midazolam Treatment will use 10000 repetitions with Random Seed 123456
23/10/2023 - 12:42:31
i Info	Monte Carlo Sampling completed in 11.5 min
23/10/2023 - 12:43:13
i Info	Calculate PK Parameters task completed in 12.4 min
23/10/2023 - 12:43:13
i Info	Starting run of Plot PK Parameters task
23/10/2023 - 12:43:48
! Warning	font family not found in Windows font database
23/10/2023 - 12:44:07
i Info	Plot PK Parameters task completed in 0.9 min
Executing: pandoc --embed-resources --standalone --wrap=none --toc --from=markdown+tex_math_dollars+superscript+subscript+raw_attribute --reference-doc="C:/Dev/R-4.2.2/library/ospsuite.reportingengine/extdata/reference.docx" --resource-path="Report" -t docx -o "Report/Report-word.docx" "Report/Report-word.md"
23/10/2023 - 12:44:20
i Info	Population Workflow for ratioComparison completed in 13.5 min

Yuri05 reopened this Oct 23, 2023
@pchelle
Collaborator

pchelle commented Oct 23, 2023

I have started investigating why the process has become much slower.
The main culprit is the group_by function, which I used in a few places to summarize each statistic by path and parameter.
Interestingly, summarise also has a .by argument that does the same grouping; because the grouping is then applied internally within the summarise call, it cuts the computation time almost in half.
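dplyr's `group_by()` and `summarise(.by = ...)` are R functions, so the following is only a language-neutral illustration of the idea: per-group summaries keyed by (path, parameter) computed in a single call, with the grouping built inside that call rather than attached to the data beforehand. The helper `summarise_by` and the column names `path`, `parameter` and `value` are hypothetical, and the sketch makes no claim about dplyr's internal performance.

```python
import statistics
from collections import defaultdict


def summarise_by(rows, keys, value):
    """Summarise `value` per group of `keys` in one call (hypothetical helper).

    The grouping exists only inside this call, loosely analogous to
    dplyr's summarise(..., .by = c(path, parameter)).
    """
    groups = defaultdict(list)
    for row in rows:
        # Collect values under a tuple key such as (path, parameter).
        groups[tuple(row[k] for k in keys)].append(row[value])
    # statistics.stdev needs at least two values per group.
    return {
        group: {"mean": statistics.fmean(vals), "sd": statistics.stdev(vals)}
        for group, vals in groups.items()
    }
```

Usage: `summarise_by(rows, ("path", "parameter"), "value")` returns one dictionary of statistics per (path, parameter) pair.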

@pchelle
Collaborator

pchelle commented Oct 23, 2023

After removing group_by, the Monte Carlo test with a population size of 1000 and 10000 repetitions completes in 3.4 minutes.
With parallel computation, I may be able to bring this down much further.

@pchelle
Collaborator

pchelle commented Oct 23, 2023

If I include parallel computation I may be able to make it drop a lot more.

With 7 cores, the same test went down to 59 seconds.
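One common way to parallelize Monte Carlo repetitions while keeping results reproducible is to split them into a fixed number of chunks, give each chunk its own seed derived from the master seed, and map the chunks onto a pool of workers. The Python sketch below illustrates that technique under assumptions; it is not how the ReportingEngine does it in R. `run_chunk`, `parallel_monte_carlo`, the `seed + i` seeding rule and the per-repetition statistic (the mean of a resample with replacement) are all hypothetical.

```python
import random
import statistics
# ThreadPoolExecutor is imported only for lightweight correctness checks.
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def run_chunk(task):
    """Run one chunk of Monte Carlo repetitions with its own deterministic seed."""
    seed, n_repetitions, values = task
    rng = random.Random(seed)
    n = len(values)
    # Hypothetical per-repetition statistic: mean of a resample with replacement.
    return [statistics.fmean(rng.choices(values, k=n)) for _ in range(n_repetitions)]


def parallel_monte_carlo(values, n_repetitions, seed, n_chunks=7,
                         n_workers=7, executor_cls=ProcessPoolExecutor):
    """Split repetitions into `n_chunks` chunks and run them on `n_workers` workers.

    Results depend only on `seed` and `n_chunks`, not on `n_workers`.
    """
    base, extra = divmod(n_repetitions, n_chunks)
    sizes = [base + (1 if i < extra else 0) for i in range(n_chunks)]
    tasks = [(seed + i, size, list(values)) for i, size in enumerate(sizes)]
    with executor_cls(max_workers=n_workers) as pool:
        # pool.map yields results in task order, so the output order is stable.
        chunks = list(pool.map(run_chunk, tasks))
    return [stat for chunk in chunks for stat in chunk]
```

Because the chunk count and per-chunk seeds are fixed, the output does not change with the number of workers. With `ProcessPoolExecutor`, call it under an `if __name__ == "__main__":` guard on platforms that start workers by spawning; threads are fine for checking correctness but give little speed-up for CPU-bound pure-Python work.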


pchelle added a commit to pchelle/OSPSuite.ReportingEngine that referenced this issue Oct 23, 2023
…utation and suggest parallel

- If parallel is not installed, the regular code is used.
- Regular code avoids group_by, which slows down the computation considerably
- The number of cores is taken from the SimulationSettings already set up for the PK parameter calculation task
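The dispatch rule described in this commit message can be illustrated with a small Python sketch. `SimulationSettings` here is a hypothetical stand-in with a single `number_of_cores` field, not the package's R class, and checking for the backend with `importlib.util.find_spec` is an assumed Python analogue of testing whether the parallel package is installed.

```python
import importlib.util
from dataclasses import dataclass


@dataclass
class SimulationSettings:
    """Hypothetical stand-in for the settings used by the PK parameter task."""
    number_of_cores: int = 1


def choose_backend(settings, backend_module="multiprocessing"):
    """Return "parallel" only if more than one core is configured and the
    backend module can be found; otherwise fall back to the regular (serial) code."""
    if settings.number_of_cores <= 1:
        return "serial"
    if importlib.util.find_spec(backend_module) is None:
        return "serial"
    return "parallel"
```

Reusing the core count from settings that already drive the PK parameter calculation keeps a single place to configure parallelism, and the serial path remains the fallback whenever parallel execution is unavailable.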
pchelle added a commit to pchelle/OSPSuite.ReportingEngine that referenced this issue Oct 25, 2023
Yuri05 closed this as completed in c862b2c Oct 25, 2023
Yuri05 moved this to Verified in Version 2.1 / 2.2 Aug 20, 2024
Projects
Status: Verified
Development

No branches or pull requests

2 participants