Evaluate benefits of pre-allocation of arrays #2412

Closed
matthewturk opened this issue Jan 7, 2020 · 2 comments

@matthewturk
Member

At present, the frontends do an indexing check to identify the size of the arrays to allocate before conducting any IO. For grid-based frontends this is not terribly onerous (it does cost floating point operations, but a best effort is made to cache them), but for particle frontends it is quite taxing, as it requires an IO pass.

This decision was originally made to avoid large-scale concatenation of arrays, which results in a momentary memory doubling at the finalization step. However, that finalization step is often not even required, since nearly all of the operations are chunked anyway.

My suspicion is that we could considerably speed up the operations in yt that use particles if we dropped the pre-allocation requirement and instead moved to concatenating arrays (or reallocating them), providing only upper bounds on the sizes of the arrays we expect rather than exact counts. But this is just intuition -- which is how the decision to do the pre-allocation was made in the first place!

This issue is a placeholder for evaluating this. I believe it could be tested at a small scale by changing _count_particles to return None and, in that case, having the IO operations (which are mostly consolidated in yt/utilities/io_handler.py:BaseIOHandler._read_particle_selection) grow a list of values rather than fill a pre-allocated buffer with a running index.
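
As a rough illustration of the growing-list approach, here is a minimal sketch; `chunks` and `read_chunk_values` are hypothetical stand-ins, not the actual BaseIOHandler internals:

```python
import numpy as np

def read_particle_field_growing(chunks, read_chunk_values):
    # Hypothetical sketch: skip the pre-counting pass entirely. Each chunk
    # contributes whatever it selects, and we pay a single concatenation at
    # the end instead of maintaining a running index into a pre-sized buffer.
    pieces = [read_chunk_values(chunk) for chunk in chunks]
    if not pieces:
        return np.empty(0, dtype="float64")
    return np.concatenate(pieces)
```

The only finalization cost here is the single concatenation at the end, and for chunked operations that consume one chunk at a time even that step may not be needed.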

matthewturk added the performance, demeshening (Removing the global mesh for particles), and index: particle labels on Jan 7, 2020
matthewturk self-assigned this on Jan 7, 2020
@matthewturk
Member Author

My email to yt-dev (archive link) is reproduced here:

Hi folks,

Over the last little while, I've been looking at this issue and having some thoughts:

#2383

Back when yt-3.0 was first set up, it was baked in that the arrays would all be pre-allocated before being read.  (An important note: this pre-dates the demeshening.)  What this means is that the process of reading data from any type of frontend typically goes something like this:

 * _identify_base_chunk -> figure out how big the whole thing is
 * either read, or subdivide into chunks

When a chunk is read -- which includes the chunk style "all" that reads all the data in a given data object in a single go -- the destination buffer is preallocated.  For grid objects, this can be done without reading any data off of disk.  The process is still expensive, but we cache the most-recently-used grid mask every time we count a given grid.
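
In sketch form, the grid path looks roughly like this (hypothetical `selector_mask` and `read_grid_field` helpers; the real code caches only the most-recently-used grid mask rather than keeping all of them):

```python
import numpy as np

def read_grid_field_preallocated(grids, selector_mask, read_grid_field):
    # The selection mask depends only on each grid's geometry, so the exact
    # size of the destination buffer is known before any field data is read.
    masks = {grid: selector_mask(grid) for grid in grids}   # no disk IO here
    total = sum(int(mask.sum()) for mask in masks.values())
    out = np.empty(total, dtype="float64")                  # pre-allocated destination
    ind = 0
    for grid in grids:
        values = read_grid_field(grid)[masks[grid]]         # the actual IO
        out[ind:ind + values.size] = values                 # fill with a running index
        ind += values.size
    return out
```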

For particles, however, the case is different.  Determining selections of particles requires IO (our bitmap indices can only provide an upper bound) and so calling identify_base_chunk, if it needs to size the thing, will read data from disk.  But, by the time we had implemented a nice indexing scheme for particles, we had wedded ourselves to this, and so it got implemented this way.  It was one unrelated design decision that was applied out of context.
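
To make that cost concrete, here is a hedged sketch of why exact sizing amounts to an IO pass for particles (`read_positions` and `selector_contains` are hypothetical helpers):

```python
def count_selected_particles(data_files, read_positions, selector_contains):
    # The bitmap index only narrows down which files *might* contribute (an
    # upper bound), so an exact count has to read particle positions from
    # disk and apply the selector -- a full IO pass just to size the buffers.
    total = 0
    for data_file in data_files:
        positions = read_positions(data_file)          # disk IO
        total += int(selector_contains(positions).sum())
    return total
```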

Now, you might ask yourself, why do we do that anyway?  Why know how big something is?  Well, it's because our old method (pre-3.0) was along these lines:

 * read each grid, one at a time (remember we only had grid support) and then mask them
 * at the end, concatenate the arrays all together

For reasonably sized data, this isn't so bad.  The biggest problem is that of fragmentation and copying -- we're making lots of little-ish arrays in step 1, and in step 2, we make a single big array and copy each one in, then de-allocate.  This was the most painful when we read a gigantic data object in all at once.  We want to avoid the issue of reading some 1024^3 dataset and then having a moment when the memory doubles.
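
Back-of-the-envelope, assuming a single float64 field (an illustration, not a measurement):

```python
import numpy as np

# At the finalization step the per-grid pieces (~8 GiB in total) and the
# freshly concatenated array (~8 GiB) briefly coexist in memory.
n_cells = 1024 ** 3
bytes_per_value = np.dtype("float64").itemsize     # 8 bytes
field_gib = n_cells * bytes_per_value / 2**30      # ~8 GiB for one field
peak_gib = 2 * field_gib                           # ~16 GiB while np.concatenate copies
print(f"one field: {field_gib:.0f} GiB, peak during finalization: {peak_gib:.0f} GiB")
```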

But, in contrast to before, now almost all of our operations are chunked (what we used to call "lazy") and so don't need to be able to read a gigantic array for most things.  Mind you, if you do ds.r[:]["something"] it'll still read it in, but that is much, much less common than it was before.

So what I have explored a bit is what happens if, for particle datasets, we get rid of the notion that we need to know how big something is (or rather, exactly how big) before we do any IO.  And it turns out it does make a difference -- a non-negligible one, in fact, whenever the cost of reallocating and copying is not that big.

(There are some numbers on the issue I linked above.)

So that's a pretty long, rambly dive into things, but here's the thing I wanted to bring up: I'd like to explore not pre-allocating memory or pre-counting items in the chunking and IO systems.  But it's exploratory, and I don't know if it will pan out.  So I'd like to invite anyone who is interested to try it out with me, and if you have objections, now would be a perfect time to raise them.

I'll post this email here as well: #2412 so if you're interested, subscribe to that issue.
If this turns out to be a worthwhile change, I will propose these steps, all of which would be in the yt-4.0 branch:
 1) Turn it off one frontend at a time and examine performance and memory use in a few different cases
 2) Once all frontends have been disabled, disable support for it in the base chunking system itself, or at least make it optional and consolidate references to it in the codebase
 3) Update YTEP-0001 to reflect this new state of affairs
yt's performance needs to be improved, and this could be a good first step toward finding ways to do so that don't require a ton of surgery and would overall reduce the complexity of yt.
-Matt

@matthewturk
Member Author

I think now that #2416 is merged, we can close this.
