Chunking in frequency may cause excessive memory use #163

Closed
JSKenyon opened this issue Jun 8, 2021 · 2 comments

Comments

@JSKenyon
Collaborator

JSKenyon commented Jun 8, 2021

  • dask-ms version: master
  • Python version: 3.8.8
  • Operating System: Pop!_OS 18.04

Description

While running my small example case, chunking in frequency causes bizarre memory behaviour.

What I Did

I don't have a reproducer, but it boils down to reading a measurement set while chunking in frequency.

What happened?

[Image: Screenshot from 2021-06-08 11-14-10]

The memory usage grows, seemingly without bound, over time. Contrast this with reading from zarr:

[Image: Screenshot from 2021-06-08 11-18-11]

Why?

My best guess is that this is a problem with the way casacore caches data for tiled columns. Dask does not process chunks in a consistent order, and I believe this leads to all the tiled columns being cached in their entirety. This may only happen when the frequency chunk is smaller than the frequency tiling of the MS.
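
To make the guess above concrete, here is a toy model in plain Python. It is only a sketch under stated assumptions: it does not call casacore or dask-ms, and the function names, tile shapes and chunk shapes are all hypothetical. It counts how many tiles a cache would have to keep resident at once in order to never re-read a tile, given the order in which chunks are visited.

```python
import random


def tile_ids(row0, nrow, chan0, nchan, tile_rows, tile_chans):
    """Return the (row_tile, chan_tile) indices overlapped by one chunk."""
    rows = range(row0 // tile_rows, (row0 + nrow - 1) // tile_rows + 1)
    chans = range(chan0 // tile_chans, (chan0 + nchan - 1) // tile_chans + 1)
    return {(r, c) for r in rows for c in chans}


def peak_resident_tiles(n_rows, n_chans, tile, chunk, shuffle, seed=0):
    """Peak number of tiles held if no tile is ever read twice.

    Hypothetical model: a tile is loaded when the first chunk touching it
    is read and released once the last chunk touching it has been read.
    This is NOT casacore's actual caching policy.
    """
    tile_rows, tile_chans = tile
    chunk_rows, chunk_chans = chunk

    # Enumerate chunks in row-major order (frequency varies fastest).
    chunks = [
        (r, min(chunk_rows, n_rows - r), c, min(chunk_chans, n_chans - c))
        for r in range(0, n_rows, chunk_rows)
        for c in range(0, n_chans, chunk_chans)
    ]
    if shuffle:
        # Stand-in for a scheduler that visits chunks in no fixed order.
        random.Random(seed).shuffle(chunks)

    # How many chunks still need each tile.
    pending = {}
    for r, nr, c, nc in chunks:
        for t in tile_ids(r, nr, c, nc, tile_rows, tile_chans):
            pending[t] = pending.get(t, 0) + 1

    resident, peak = set(), 0
    for r, nr, c, nc in chunks:
        needed = tile_ids(r, nr, c, nc, tile_rows, tile_chans)
        resident |= needed
        peak = max(peak, len(resident))
        for t in needed:
            pending[t] -= 1
            if pending[t] == 0:
                resident.discard(t)
    return peak


if __name__ == "__main__":
    shape = dict(n_rows=1000, n_chans=64, tile=(100, 64))
    # Frequency chunk (8) smaller than the frequency tiling (64).
    print(peak_resident_tiles(chunk=(100, 8), shuffle=False, **shape))
    print(peak_resident_tiles(chunk=(100, 8), shuffle=True, **shape))
    # Frequency chunk equal to the frequency tiling.
    print(peak_resident_tiles(chunk=(100, 64), shuffle=True, **shape))
```

In this model, chunks that match the tiling need only one tile at a time regardless of order, while small frequency chunks visited out of order keep many partially consumed tiles alive. That is consistent with the behaviour described above, but it does not confirm that this is what casacore is doing.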

@sjperkins
Member

Related to:

Also, recalling our conversation, changing the number of frequency chunks did not increase or decrease the amount of memory used in the CASA Table case.

@JSKenyon
Collaborator Author

Closing as this is mainly an edge case. This should only happen when the data is poorly tiled or very small.
