Slow data retrieval for some YTRay objects #2383
Hi, and welcome to yt! Thanks for opening your first issue. We have an issue template that helps us gather the relevant information to diagnose and fix the issue.
@matthewturk, got a chance to open the issue, continuing from our conversation on Slack. Thanks for your help so far!
Hi @zhafen, I looked at this, and it looked to me like the data was being read in its entirety every single time -- which definitely does not make sense! So I dug in a bit more and I think it's a bug, which I took a shot at fixing in #2391. But it did not speed things up as much as I'd like; there are two additional sets of parameters that might help. The first is to set …. I am looking at other possible speedups as well.
Thanks @matthewturk! PR #2391 provided a ~7.5x speedup. I also played around with …; between PR #2391 and modifying ….
Hi @matthewturk, the tests I applied did indeed indicate a huge speed-up for the sightline I used. Unfortunately, it appears that for other sightlines PR #2391 only gives a ~2x speed-up (e.g. this is the case for sightlines that pass through the center of the zoom-in region).
Any chance you could share a starting and ending point that show superduper slowness?
I'm finding that, under the default settings in the uploaded example notebook (which sends a ray straight through the center of a halo), generating a Trident ray can take ~4-6 minutes depending on …. The YTRay itself only takes 10-15 seconds to retrieve data for a given field; however, it repeats this process ~20 times to get all the data needed for a Trident ray, so Trident takes 200-300 seconds.
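For concreteness, a minimal sketch of the kind of per-field timing described above might look like the following. The snapshot path, sightline endpoints, and field names here are placeholders, not the ones from the attached notebook.

```python
import time

import yt

# Placeholder snapshot path and sightline endpoints; the real ones are in
# the notebook attached to this issue.
ds = yt.load("snapshot_600.hdf5")
ray = ds.ray([0.48, 0.49, 0.50], [0.52, 0.51, 0.50])

# Time each field access separately; per the discussion above, each field
# took roughly 10-15 seconds and ~20 fields are needed for a Trident ray.
for field in [("gas", "density"), ("gas", "temperature"), ("gas", "metallicity")]:
    t0 = time.time()
    values = ray[field]
    print(f"{field}: {time.time() - t0:.1f} s for {values.size} elements")
```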
Could this be because the ray you're making extends outside of the domain boundaries? Ray objects in yt do not work with periodic boundary conditions, so Trident gets around this by making a series of non-periodic rays and appending their data. Each separate ray object will do those data queries all over again. If this is the case, then perhaps the ray selection can be rewritten to take periodicity into account and do the selection for all sub-segments at once.
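As a rough illustration of the workaround described above (not Trident's actual implementation): each wrapped segment becomes its own ray object, and each of those objects repeats the selection and I/O. The snapshot path and segment endpoints below are placeholders.

```python
import numpy as np
import yt

ds = yt.load("snapshot_600.hdf5")  # placeholder path
field = ("gas", "density")

# Hypothetical sub-segments of a sightline that wraps through the periodic
# boundary at x = 1 (code units); values are placeholders.
sub_segments = [
    ([0.90, 0.50, 0.50], [1.00, 0.50, 0.50]),
    ([0.00, 0.50, 0.50], [0.10, 0.50, 0.50]),
]

# Each ds.ray() is an independent data object, so chunk selection and file
# reads happen again for every segment -- which multiplies the cost.
pieces = [ds.ray(start, end)[field] for start, end in sub_segments]
combined = np.concatenate(pieces)
print(combined.size)
```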
The length of the above ray is of order the halo's virial radius, so extending beyond the domain boundaries should not be an issue. I did experience the expected slowness you're describing when I tested a periodic ray previously. |
I'm going through the profiling results, and what I'm seeing is two specific things that are taking lots of time: …
I believe the problem is the first -- the second is a side-effect, and just shows that we are reading data too many times. In fact, it seems we are reading data ... way, way too many times (I get 838 calls to …). Many of the decisions about how to read data came from the idea that we should minimize over-allocation, when in reality that is not the problem if we have subchunking. I am going to spend a little bit of time looking over how we could potentially eliminate that decision.
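As a toy illustration of the trade-off being described (this is not yt's actual I/O handler code): pre-allocating the output requires knowing the selected size up front, which costs an extra counting pass over the data, whereas accumulating per-chunk arrays and concatenating once reads each chunk a single time at the price of a brief ~2x memory peak.

```python
import numpy as np

def read_preallocated(chunks, total_size):
    # Requires knowing total_size up front, which in practice means an
    # extra pass over the data just to count the selected elements.
    out = np.empty(total_size, dtype="float64")
    offset = 0
    for chunk in chunks:
        out[offset:offset + chunk.size] = chunk
        offset += chunk.size
    return out

def read_appended(chunks):
    # No sizes needed: accumulate per-chunk arrays and concatenate once.
    # Peak memory is briefly ~2x, but each chunk is only read once.
    pieces = list(chunks)
    return np.concatenate(pieces) if pieces else np.empty(0, dtype="float64")

# Toy usage with random "chunks" standing in for per-file reads.
chunks = [np.random.random(n) for n in (1000, 2500, 400)]
assert np.array_equal(read_preallocated(chunks, 3900), read_appended(chunks))
```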
Actually, I just issued #2410 that speeds it up by a fair bit on mine.
Looks like that wasn't quite it, although I do have an idea about how to apply the ideas more broadly. It did get me a 2x speedup, though, and I will issue it in another PR (and likely retract #2410); if you could check it, that would be great.
Looks like a nice, simple change to reduce the number of times the data is accessed. I can confirm that I got a 2x speed-up, and that the results are consistent before and after the change.
I implemented a simple version of this, where we don't do as much (if any? I think I may have missed something somewhere, as this is just a first pass) pre-allocation. I get more speedups. Can you try it out? There may be errors in it, especially with vector fields, so it's not ready for a PR yet.
Nice! This produced another solid speed-up. With this change and the previous one combined I'm seeing speed-ups of ~2.5-4x. I did find a minor bug, with the following suggested fix:

```diff
@@ -199,6 +199,8 @@ class BaseIOHandler(metaclass = RegisteredIOHandler):
             shape = (fsize[field], )
             rv[field] = np.empty(shape, dtype="float64")
             ind[field] = 0
-        rv[field] = []
-        ind[field] = 0
+        for field in fields:
+            rv[field] = []
+            ind[field] = 0
         # Now we read.
         for field_r, vals in self._read_particle_fields(chunks, ptf, selector):
             # Note that we now need to check the mappings
```

The pre-allocation removal changes make sense, because for most simulation datasets reading the data will take longer than even typically-slow functions like ….
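A stripped-down illustration of why the loop matters (simplified, not the actual BaseIOHandler code): without it, only whichever field happens to be left in the loop variable from earlier code gets switched over to list accumulation, and the other fields never get an entry.

```python
# Simplified illustration of the bug addressed by the fix above.
fields = ["density", "temperature", "metallicity"]
rv, ind = {}, {}

# Buggy version: these assignments sat outside a loop, so only the *last*
# field (the leftover loop variable) was initialized.
field = fields[-1]
rv[field] = []
ind[field] = 0
print(sorted(rv))   # ['metallicity'] -- the other fields are missing

# Fixed version: initialize every field explicitly.
rv, ind = {}, {}
for field in fields:
    rv[field] = []
    ind[field] = 0
print(sorted(rv))   # ['density', 'metallicity', 'temperature']
```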
Oh, thanks for the fix! Yes, I think there are likely cases where pre-allocation could help -- specifically, those cases where you are accessing a large collection of data like a dictionary (i.e., a big sim, then doing …). I'm looking into other potential issues with this before proceeding on a PR or proposal, but I am really glad that it too provided a good speedup.
With #2416 merged, this issue may be ready to close. I think it'd be good to re-run the initial profiling just to verify that the version on main does improve performance.
Thanks so much for your work on this! I'm very excited for the related PR. I'll try re-running the initial profiling.
Profiling just finished. With #2416 merged this is a 3x speed-up! Thank you @chrishavlin, @neutrinoceros, @matthewturk, @jzuhone!
awesome! thanks for re-running!!
Bug report
Bug summary
yt appears to be very slow at reading data from HDF5 files (via h5py) for use in YTRay objects, especially for particle-based simulations with snapshot size ≳ 7 GB.
Code for reproduction
A notebook for reproducing the issue is available here, including the ewah file, the cProfile results, and data download links.
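The linked notebook has the full reproduction; a minimal version might look roughly like the sketch below. The snapshot path, sightline endpoints, and line list are placeholders, and the exact Trident call may differ from the notebook's.

```python
import cProfile
import pstats

import trident
import yt

# Placeholder snapshot path and sightline endpoints.
ds = yt.load("snapshot_600.hdf5")
start, end = [0.48, 0.49, 0.50], [0.52, 0.51, 0.50]

# Profile the Trident ray generation, which internally queries the YTRay
# once per required field.
profiler = cProfile.Profile()
profiler.enable()
ray = trident.make_simple_ray(
    ds,
    start_position=start,
    end_position=end,
    data_filename="ray.h5",
    lines=["H", "C", "O"],
)
profiler.disable()
pstats.Stats(profiler).sort_stats("cumulative").print_stats(20)
```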
Actual outcome
Ray generation is ~30 s per field, for a total of ~10 min when enough fields are retrieved.
Expected outcome
Using h5py to open the full data directly and perform simple calculations takes <5 seconds, so a speed-up of at minimum a factor of ~6 should be possible. If caching is possible, the speed-up could be much greater.
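For comparison, the kind of direct h5py access referred to above might look roughly like this; the file name and dataset paths are placeholders and depend on the snapshot's HDF5 layout (e.g. GADGET/GIZMO-style PartType groups).

```python
import h5py
import numpy as np

# Placeholder file and dataset names.
with h5py.File("snapshot_600.hdf5", "r") as f:
    density = f["PartType0/Density"][:]
    coords = f["PartType0/Coordinates"][:]

# A simple calculation over the full arrays, of the sort that finishes in
# a few seconds even for a multi-GB snapshot.
print(density.mean(), np.linalg.norm(coords - coords.mean(axis=0), axis=1).max())
```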
Notes
The cProfile results point to yt/utilities/io_handler.py:158(_read_particle_selection).
Version Information
Installation Method