Optimize physics load balancing twin algorithm initialization #1347
When load balancing physics computation on a latlon mesh, there is an advantage to assigning columns to chunks in pairs, pairing each column with its 'twin' in the other hemisphere: latitude reflected around the equator and longitude offset by 180 degrees. A similar approach should be advantageous, or at least no worse than not doing it, with a non-latlon mesh. However, the cost of initializing this 'twin' algorithm was too high to be acceptable for the HOMME spectral element mesh, even though it is only one part of the initialization cost. Because of this, the twin algorithm is disabled by default when using a non-latlon mesh. (For a latlon mesh, the twin algorithm is enabled by default.) Note that a runtime parameter can be used to override the default.

A simple change to the initialization algorithm drops the cost to an acceptable level: less than 2 minutes even on one core of Mira with an ne120 HOMME mesh. In performance experiments, using the twin algorithm has not improved the load balance compared to not using it for current ACME production-like cases, so the default of disabling the twin algorithm for non-latlon meshes is not being changed. However, this optimization to the initialization algorithm will make it feasible to re-examine this option in the future.

This change will not be tested by the ACME test suites, as the twin algorithm is disabled, and it is also irrelevant unless atmospheric physics load balancing is enabled. Experiments with F compsets and the CAM-SE dycore have been conducted for both ne30 and ne120 meshes, indicating that the twin algorithm computes the same load balancing redistribution of columns with and without this optimization. Tests with the FV and Spectral Eulerian dycores have not been conducted.

[BFB]
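For readers unfamiliar with the pairing rule, the following is a minimal Python sketch of how a column's twin could be located from its coordinates. It is illustrative only and is not the Fortran code changed by this PR; the names `twin_coords` and `pair_twins`, the rounding tolerance `ndigits`, and the use of a dictionary lookup are all assumptions made for the example. The PR description does not say what the actual initialization change was, so this sketch should not be read as a description of it.

```python
def twin_coords(lat_deg, lon_deg):
    """Return the (lat, lon) of a column's twin, in degrees.

    The twin has its latitude reflected around the equator and its
    longitude offset by 180 degrees (wrapped into [0, 360)).
    """
    return (-lat_deg, (lon_deg + 180.0) % 360.0)


def _key(lat_deg, lon_deg, ndigits):
    # Hypothetical matching tolerance: round coordinates so that nearly
    # equal floating-point values map to the same dictionary key.
    return (round(lat_deg, ndigits), round(lon_deg, ndigits) % 360.0)


def pair_twins(columns, ndigits=6):
    """Map each column index to the index of its twin, or None.

    `columns` is a list of (lat, lon) pairs in degrees. A column has no
    twin if no column sits at the reflected/offset location, which can
    happen on a mesh that is not a regular latlon grid.
    """
    index = {_key(lat, lon, ndigits): i for i, (lat, lon) in enumerate(columns)}
    return [index.get(_key(*twin_coords(lat, lon), ndigits)) for lat, lon in columns]


if __name__ == "__main__":
    cols = [(45.0, 10.0), (-45.0, 190.0), (0.0, 0.0), (0.0, 180.0), (30.0, 350.0)]
    print(pair_twins(cols))
```

In this sketch the last column has no twin because no column lies at (-30, 170); how unmatched columns are handled on a non-latlon mesh is outside what the sketch covers.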
@mt5555, assigned this to you since you might be the only one interested ... Am embarrassed that I did not look more carefully into the reason that the twin algorithm was so expensive when you brought it to my attention way back when. Added timers this time, and the reason became immediately obvious. Not sure that this will ever be useful, but eliminating the high initialization cost still seems like something worth doing.
Moved new code to come after high-level explanatory comments, for improved readability.