Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

resampling during outlier_detection generates unused products #8638

Closed
stscijgbot-jp opened this issue Jul 9, 2024 · 7 comments · Fixed by #8866
Closed

resampling during outlier_detection generates unused products #8638

stscijgbot-jp opened this issue Jul 9, 2024 · 7 comments · Fixed by #8866

Comments

@stscijgbot-jp
Copy link
Collaborator

Issue JP-3685 was created on JIRA by Brett Graham:

When resampling is performed during outlier detection, the underlying GWCSDrizzle generates a context array which is unused. This array can be quite large, for N 2d inputs of size H x W the context array will be of size H x W x ((N // 32) + 1). Investigate disabling this (and any other unnecessary intermediate products) to save on memory and computation during outlier detection.

@stscijgbot-jp
Copy link
Collaborator Author

Comment by Mihai Cara on JIRA:

The code that would allow this is pending in a couple of PRs in drizzle and jwst packages. However, there is an issue with models that creates context arrays regardless. Specifically, here is an example:

 

In [1]: from stdatamodels.jwst.datamodels import ImageModel
In [2]: m = ImageModel((5, 5))
In [3]: m.data
Out[3]:
array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]], dtype=float32)
In [4]: m.con
Out[4]:
array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]], dtype=int32)
In [5]: m.con = None
In [6]: m.con
Out[6]:
array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]], dtype=int32)
In [7]: del m.con
In [8]: m.con
Out[8]:
array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]], dtype=int32)

So, until data model are not modified in a way as to allow some arrays to be None, I am afraid these arrays will always re-created.

 

@braingram
Copy link
Collaborator

The con array is created when it accessed. This can be seen by looking at the instance attribute of the datamodel:

>> from stdatamodels.jwst.datamodels import ImageModel
>> m = ImageModel((5, 5))
>> 'con' in m.instance
False
>> m.con
>> 'con' in m.instance
True

There are more details in the documentation about the datamodels:
https://stdatamodels.readthedocs.io/en/latest/jwst/datamodels/structure.html#the-structure-of-datamodels

@mcara
Copy link
Member

mcara commented Oct 18, 2024

Correct. But that means we cannot completely get rid of this memory: H x W x 1 is the minimum.

@braingram
Copy link
Collaborator

If we never assign to con it will never be allocated (making the minimum 0).

@stscijgbot-jp
Copy link
Collaborator Author

Comment by Ned Molter on JIRA:

I'm attaching memory profiling before (outlier_mihai.html) and after (outlier_mihai_2.html) the change that removed allocation of the context array.  It does appear that the context array is no longer allocated, so I think this ticket can be resolved once that PR is merged.

The new peak memory usage shows that sigma_clip is using a factor of ~4 more memory than the weight array input into it.  This does beg the question of whether that, too, can be optimized.  But I think this should be its own ticket, especially given that the offending line lives in stcal. https://github.com/spacetelescope/stcal/blob/dfe1d6d51f0c2a400dff918a52c779328ae19b7d/src/stcal/outlier_detection/utils.py#L80 

@stscijgbot-jp
Copy link
Collaborator Author

Comment by Ned Molter on JIRA:

Attaching two more flamegraphs here, showing the effect of my fix to AL-875 along with Mihai's PR.  The memory usage for on-disk=True decreased by 25% from ~4 GB to ~3 GB.

The file names are outlier_before_ondisk.html and outlier_AL875_mihai_ondisk.html for the before and after cases, respectively.

 

@stscijgbot-jp
Copy link
Collaborator Author

Comment by Melanie Clarke on JIRA:

Fixed by #8866

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants