Evaluate benefits of pre-allocation of arrays #2412
Comments
My email to yt-dev (archive link) is reproduced here:
I think now that #2416 is merged, we can close this.
At present, the frontends do an indexing check to identify the size of arrays to allocate before conducting any IO. For grid-based frontends, this is not terribly onerous (it does cost floating point operations, but a best effort is made to cache those operations) but for particle frontends, it is quite taxing as it requires an IO pass.
The reason this decision was made was to avoid having to do large-scale concatenation of arrays, which results in a memory doubling at the finalization step. However, this finalization step is often not even required, as nearly all of the operations are chunked anyway.
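To make the trade-off above concrete, here is a minimal sketch (hypothetical illustration, not actual yt code) contrasting the two strategies: pre-allocating after a counting pass and filling with a running index, versus growing a list of per-chunk arrays and concatenating once at the end, which briefly holds both the chunks and the result in memory:

```python
import numpy as np

# Strategy 1: pre-allocate after a counting pass, then fill with a running index.
def read_preallocated(chunks, count):
    out = np.empty(count, dtype="float64")
    ind = 0
    for chunk in chunks:
        out[ind:ind + chunk.size] = chunk
        ind += chunk.size
    return out

# Strategy 2: grow a list of per-chunk arrays, then concatenate once.
# The final np.concatenate transiently holds both the chunk list and the
# result -- the "memory doubling at the finalization step" described above.
def read_concatenated(chunks):
    values = []
    for chunk in chunks:
        values.append(chunk)
    return np.concatenate(values)

chunks = [np.arange(3, dtype="float64"), np.arange(3, 6, dtype="float64")]
a = read_preallocated(chunks, 6)
b = read_concatenated(chunks)
assert np.array_equal(a, b)
```

Strategy 1 trades the extra counting (or indexing) pass for a single allocation; Strategy 2 skips the counting pass at the cost of the transient peak during concatenation.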
It is my suspicion that we could considerably speed up the operations in yt that use particles if we dropped the pre-allocation requirement and moved instead to concatenating arrays (or reallocating them), providing only upper bounds on the size of the arrays we expect, rather than exact bounds. But this is just intuition -- which is how the decision was made initially to do the pre-allocation!
This issue is a placeholder for evaluating this. I believe this could be tested at small scale by changing how `_count_particles` operates, to have it return `None`, and in the case of `None`, to have the IO operations (which are mostly consolidated in `yt/utilities/io_handler.py:BaseIOHandler._read_particle_selection`) grow a list of values rather than filling as they go with a running index.
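The proposed test could be sketched roughly as follows. This is a hypothetical simplification, not the real `BaseIOHandler._read_particle_selection` signature: the idea is only that a `None` count switches the reader from index-based filling to growing a list of values:

```python
import numpy as np

def read_particle_selection(chunks, count):
    """Hypothetical sketch: `count is None` means no counting pass was done."""
    if count is None:
        # Proposed path: accumulate per-chunk results, concatenate once.
        values = []
        for chunk in chunks:
            values.append(np.asarray(chunk, dtype="float64"))
        if not values:
            return np.empty(0, dtype="float64")
        return np.concatenate(values)
    # Existing behavior: fill a pre-allocated buffer with a running index.
    out = np.empty(count, dtype="float64")
    ind = 0
    for chunk in chunks:
        arr = np.asarray(chunk, dtype="float64")
        out[ind:ind + arr.size] = arr
        ind += arr.size
    return out

chunks = [[0.0, 1.0], [2.0]]
grown = read_particle_selection(chunks, None)
filled = read_particle_selection(chunks, 3)
assert np.array_equal(grown, filled)
```

Both paths produce identical results on the same chunks, so the evaluation reduces to comparing their time and peak-memory profiles on real particle datasets.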