Introduction
Through Pull Request #1168, the HEC-FDA development team redesigned the way that HEC-FDA generates random numbers for Monte Carlo simulation. The team discovered two issues with the original design for random number generation:
1. Sample space appeared to be inadequately covered.
2. The assignment of random numbers within a compute was not independent of the addition or removal of variables in that compute.
Background
The software was originally designed to use a unique random number generator for each iteration of the Monte Carlo simulation. Each of these generators produced only a handful of random numbers, one for each summary relationship sampled. The iteration-specific random number generators were seeded from an array of seeds, itself created by a seeded random number generator and sized to the maximum number of iterations. In other words, the software created a very large number of random number generators for a given compute and used each one only a small number of times. This approach can be examined in the image below, where threadlocalRandomProvider is the iteration-specific random number generator. The threadlocalRandomProvider would provide fewer than 10 random numbers for a given iteration, and on the next iteration a new threadlocalRandomProvider would be created.
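As a rough illustration of that pattern, a minimal sketch might look like the following. This is hypothetical code, not the HEC-FDA source shown in the image; the seed values, relationship names, and loop structure are assumptions.

```csharp
using System;

// Sketch of the original per-iteration pattern (hypothetical names and seeds).
int maxIterations = 10000;
int masterSeed = 1234;

// One seeded generator builds an array of seeds, one per possible iteration.
Random seedGenerator = new Random(masterSeed);
int[] iterationSeeds = new int[maxIterations];
for (int i = 0; i < maxIterations; i++)
{
    iterationSeeds[i] = seedGenerator.Next();
}

for (int iteration = 0; iteration < maxIterations; iteration++)
{
    // A new generator is created for every iteration...
    Random threadlocalRandomProvider = new Random(iterationSeeds[iteration]);

    // ...and used only a handful of times, one draw per sampled summary relationship.
    double frequencyFlowProbability = threadlocalRandomProvider.NextDouble();
    double stageDischargeProbability = threadlocalRandomProvider.NextDouble();
    double stageDamageProbability = threadlocalRandomProvider.NextDouble();
    // ...sample each summary relationship at its assigned probability.
}
```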
With this approach, adding summary relationships to the sampling order of a compute configuration broke the sequence of random numbers distributed to the summary relationships for a given iteration. Breaking the sequence of random numbers passed to an uncertain variable caused a replicability issue for the inclusion of uncertainty about that variable in the Monte Carlo simulation. The replicability issue was the catalyst for the redesign. See issue #1156 for details. In short, a flow regulation variable was added to an expected annual damage compute and resulted in an increase in expected annual damage, because for each iteration the iteration-specific random number generator assigned different random numbers to the variables sampled after flow regulation than it would have assigned without the flow regulation variable.
The replicability problem can be viewed in the image below. There are two code blocks in the image: the bottom code block applies flow regulation, and the top code block does not. If the regulation variable reflects zero regulation, so that regulated flows are equivalent to unregulated flows, then the two code blocks should produce the same result. However, the additional sampling on line 295 for flow regulation means that the random number passed into the stage-discharge relationship on line 297 is different from the random number that would be passed into the same relationship on line 289 for the same iteration, breaking replicability. The fix to this problem is to generate random numbers for a given variable in a way that is independent of the generation of random numbers for any other variable.
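The shift can be sketched as follows. This is a hypothetical demonstration, not the code in the image; the variable names and seed are assumptions. Because every relationship draws from the one shared per-iteration sequence, inserting a single extra draw for flow regulation changes every draw that follows it.

```csharp
using System;

// Hypothetical demonstration of the replicability break.
int iterationSeed = 42;

// Compute without flow regulation: the stage-discharge relationship gets the second draw.
Random withoutRegulation = new Random(iterationSeed);
double flowDrawA = withoutRegulation.NextDouble();            // frequency-flow
double stageDischargeDrawA = withoutRegulation.NextDouble();  // stage-discharge

// Same iteration with a zero-effect flow regulation variable added to the sampling order.
Random withRegulation = new Random(iterationSeed);
double flowDrawB = withRegulation.NextDouble();               // frequency-flow
double regulationDraw = withRegulation.NextDouble();          // flow regulation (extra draw)
double stageDischargeDrawB = withRegulation.NextDouble();     // stage-discharge now gets the third draw

// stageDischargeDrawA != stageDischargeDrawB, so the result changes even though
// regulation left the flows unchanged.
Console.WriteLine($"{stageDischargeDrawA} vs {stageDischargeDrawB}");
```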
In considering a redesign, a choice needed to be made between keeping the current approach, which used many random number generators a small number of times each, and moving to a small number of random number generators used many times each. The development team pursued a more traditional approach in line with Microsoft's recommendation for random number generation with .NET tools: "Both to improve performance and to avoid inadvertently creating separate random number generators that generate identical numeric sequences, we recommend that you create one Random object to generate many random numbers over time, instead of creating new Random objects to generate one random number."
The new approach, which creates one random number generator per object that can be sampled, can be examined below. Each object that can be sampled, whether a summary relationship with uncertainty (represented as an uncertain paired data object in the software) or a content-to-structure value ratio, is given its own random number generator, and that generator produces as many random numbers for that object as there are iterations.
HEC-FDA/HEC.FDA.Model/paireddata/GraphicalUncertainPairedData.cs, line 59 in dc91469
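A rough sketch of that pattern follows. The class and member names here are assumptions for illustration, not the actual HEC-FDA types; the real implementation is at the reference above.

```csharp
using System;

// Sketch of the per-object pattern (hypothetical names, not the actual HEC-FDA classes).
public class UncertainRelationship
{
    private readonly Random _randomProvider;

    public UncertainRelationship(int seed)
    {
        // One generator per sampleable object, created once for the whole compute.
        _randomProvider = new Random(seed);
    }

    // The same generator supplies every iteration's draw, so this object's random
    // sequence is independent of any other variable added to or removed from the compute.
    public double SampleProbability() => _randomProvider.NextDouble();
}

public static class Demo
{
    public static void Main()
    {
        int maxIterations = 10000;
        var frequencyFlow = new UncertainRelationship(seed: 1);
        var stageDischarge = new UncertainRelationship(seed: 2);

        for (int iteration = 0; iteration < maxIterations; iteration++)
        {
            double flowDraw = frequencyFlow.SampleProbability();
            double stageDraw = stageDischarge.SampleProbability();
            // Each relationship is sampled from its own generator for this iteration...
        }
    }
}
```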
The move to a more traditional approach in line with Microsoft's recommendation fixed the first issue, poor sample space coverage. The corresponding change in the results produced by the Monte Carlo simulation is stark. The two images below reflect the distribution of annual exceedance probability for the exact same compute configuration and input study data: the top with the original approach, which used very many random number generators a small number of times each, and the bottom with the new approach, which uses a small number of random number generators very many times each. It is clear from the result distribution that sample space coverage is much more complete.
Lessons Learned
There are two important lessons learned from this change. First, the design of random number generation needs to allow random numbers to be assigned in a way that is independent of the addition or removal of variables in a simulation, so that the inclusion of uncertainty about a given variable can be perfectly replicated. Second, using many random number generators a small number of times each results in imperfect sample space coverage. The design of random number generation should comply with Microsoft's recommendation whenever possible, using a small number of random number generators many times over.