Reduce memory usage for timeseries jac computation #1001

Merged
robfalck merged 5 commits into OpenMDAO:master from johnjasa:jac_sparse_fix
Oct 16, 2023


@johnjasa johnjasa commented Oct 12, 2023

Summary

After a good amount of digging into cases that scaled poorly as the number of procs increased, I found a toarray() call that converted a sparse array to dense before converting it back to sparse. In @kanekosh's run script, which has relatively high num_segments and a memory-expensive parallel ODE, this caused a large increase in memory usage during setup().

This new implementation keeps the jac in sparse format throughout. I've changed it in the two places where it happened -- and maybe those files could be combined into one? Or @robfalck were they two separate files on purpose?
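
For readers following along, here is a minimal sketch of the general idea, not the actual Dymos code. It assumes the Jacobian of a timeseries output with respect to its input has the block structure `kron(mat, I)`, where `mat` is the sparse interpolation (or differentiation) matrix and `I` is an identity of the variable's flattened size. The function names `jac_pattern_dense` and `jac_pattern_sparse` are hypothetical and only illustrate the difference between the two routes.

```python
import numpy as np
import scipy.sparse as sp


def jac_pattern_dense(mat, size):
    """Dense route: expand to a full 4-D array, then search it for nonzeros."""
    n_out, n_in = mat.shape
    dense = np.zeros((n_out, size, n_in, size))
    block = mat.toarray()  # the memory-hungry step
    for i in range(size):
        dense[:, i, :, i] = block
    jac = dense.reshape((n_out * size, n_in * size))
    rows, cols = np.nonzero(jac)
    return rows, cols, jac[rows, cols]


def jac_pattern_sparse(mat, size):
    """Sparse route: same pattern from a Kronecker product, never densified."""
    jac = sp.kron(mat, sp.eye(size), format='csr')
    jac.eliminate_zeros()  # drop explicitly stored zeros, as np.nonzero would
    jac.sort_indices()     # row-major ordering, matching np.nonzero
    coo = jac.tocoo()
    return coo.row, coo.col, coo.data
```

Both functions return the same rows, columns, and values, but the dense route allocates `n_out * size * n_in * size` floats along the way, while the sparse route only stores the nonzero entries.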

Prior setup() mem usage:
[Image: without_fix]

Mem usage for setup() with the fix:
[Image: with_fix]

This PR does not address potentially large memory usage in final_setup() as that is a separate but related issue regarding how PETScVectors are created and used. We should further discuss if we have action items there.

Related Issues

Backwards incompatibilities

None

New Dependencies

None

@johnjasa johnjasa requested a review from robfalck October 12, 2023 17:23
@kanekosh

Thank you @johnjasa ! This fixes the issue I was facing.

Here is a summary of the memory usage for my Dymos+OAS case. The memory usage is shown in % of the total memory I have on my machine (64GB).

Before this fix (Dymos 1.9.0)

n_procs = 1
setup:          12.1%
run_model:       5.5%
compute_totals: 10.6%

n_procs = 2
setup:          20.6%
run_model:       5.2%
compute_totals: 11.0%

n_procs = 4
setup:          37.6%
run_model:       5.6%
compute_totals: 12.0%

n_procs = 8
setup:          72.0%
run_model:       7.2%
compute_totals: 13.6%

n_procs = 16
runs out of memory during setup

And with this fix:

n_procs = 1
setup:           3.8% 
run_model:       4.4%
compute_totals: 10.6% 

n_procs = 2
setup:           4.2%
run_model:       5.2%
compute_totals: 11.0%

n_procs = 4
setup:           4.8%
run_model:       5.6%
compute_totals: 12.0%

n_procs = 8   
setup:           6.3%
run_model:       7.2%
compute_totals: 13.6%

n_procs = 16
setup:           9.6%
run_model:      11.2%
compute_totals: 16.0%

As far as I can observe, final_setup() uses approximately the same amount of memory as run_model and is not a bottleneck. However, I only monitored memory from top at 0.1 s intervals, so I may have missed a short spike during final_setup.
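
One way to catch short spikes that a sampled `top` reading could miss is to read the process high-water mark before and after each phase. The sketch below is only an illustration under a few assumptions: it uses the standard-library `resource` module (Unix only), the helper names `peak_rss_mb` and `track_peak` are hypothetical, and under MPI each rank would report its own value.

```python
import resource
import sys
from contextlib import contextmanager


def peak_rss_mb():
    """Process-lifetime peak resident set size, in MB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 ** 2 if sys.platform == "darwin" else 1024
    return peak / divisor


@contextmanager
def track_peak(label, log):
    """Record how much a phase raised the process high-water mark."""
    before = peak_rss_mb()
    try:
        yield
    finally:
        log[label] = peak_rss_mb() - before


if __name__ == "__main__":
    log = {}
    with track_peak("demo", log):
        buf = bytearray(50 * 1024 ** 2)  # stand-in for setup() / final_setup()
    print(log)
```

Because the high-water mark never decreases, a phase that stays below an earlier peak reports zero. Wrapping the phases in the order they run (setup, final_setup, run_model, compute_totals) shows which one first pushes the peak up.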


if rate:
    mat = self.differentiation_matrix
else:
    mat = self.interpolation_matrix

for i in range(size):
    if _USE_SPARSE:

The _USE_SPARSE setting was only there to quickly toggle between the sparse and dense implementations and compare performance. Let's remove the _USE_SPARSE variable and always use the sparse path. Any performance benefits of dense are typically only present in smaller problems.
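
To give a rough sense of why the dense path only makes sense for small problems, the sketch below compares storage footprints. It is an illustration under assumptions: the banded matrix from `sp.diags` is a stand-in for an interpolation matrix (not what Dymos builds), `footprint_bytes` is a hypothetical helper, and the dense size is computed arithmetically rather than allocated.

```python
import scipy.sparse as sp


def footprint_bytes(n_out, n_in, size):
    """Return (sparse_bytes, dense_bytes) for a kron(mat, I_size) Jacobian."""
    # Stand-in for an interpolation matrix: two nonzero diagonals.
    mat = sp.diags([1.0, 0.5], [0, 1], shape=(n_out, n_in), format='csr')
    jac = sp.kron(mat, sp.eye(size), format='csr')
    # CSR storage is the values plus two index arrays.
    sparse_bytes = jac.data.nbytes + jac.indices.nbytes + jac.indptr.nbytes
    # Equivalent dense float64 array; computed, never allocated.
    dense_bytes = (n_out * size) * (n_in * size) * 8
    return sparse_bytes, dense_bytes


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, footprint_bytes(2 * n, n, 3))
```

The dense footprint grows with the product of the row and column counts, while the sparse one grows with the number of nonzeros, so the gap widens quickly as num_segments increases.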

@coveralls

Coverage Status

coverage: 92.55% (+0.02%) from 92.53% when pulling fba061d on johnjasa:jac_sparse_fix into 9e02030 on OpenMDAO:master.

@robfalck robfalck merged commit 2953f09 into OpenMDAO:master Oct 16, 2023
9 checks passed
Development

Successfully merging this pull request may close these issues.

Running out of memory with MPI (parallel trajectory optimization)