
Slow data retrieval for some YTRay objects #2383

Closed · zhafen opened this issue Dec 8, 2019 · 20 comments

Assignees: matthewturk
Labels: demeshening (Removing the global mesh for particles), performance

Comments

zhafen commented Dec 8, 2019

Bug report

Bug summary

yt appears to be very slow at reading data from HDF5 files (via h5py) for use in YTRay objects, especially for particle-based simulations with snapshot sizes >~ 7 GB.

Code for reproduction

A notebook for reproducing the issue is available here, including the .ewah file, cProfile results, and data download links.
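
For orientation, the access pattern being timed looks roughly like the following sketch; the snapshot name, ray endpoints, and field are placeholders, not the notebook's actual values:

import yt

# Placeholder snapshot name; the real data is a ~8 GB particle snapshot
# split across four files.
ds = yt.load("snapshot_600.0.hdf5")

# Placeholder endpoints in code units; the notebook sends the ray through
# a halo center.
ray = ds.ray(ds.domain_left_edge, ds.domain_right_edge)

# Each field access like this goes through _read_particle_selection.
density = ray[("gas", "density")]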

Actual outcome
Ray generation is ~30 s per field, for a total of ~10 min when enough fields are retrieved.

Expected outcome
Using h5py to open the full data directly and perform simple calculations takes <5 seconds, so data retrieval should be at minimum a factor of ~6 faster. If caching is possible, the speed-up could be much greater.
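
For comparison, the direct h5py read is along these lines (a sketch assuming the Gadget/GIZMO HDF5 layout with a PartType0/Density dataset; the file pattern is a placeholder):

import glob
import h5py
import numpy as np

# Read the gas density from all four snapshot files directly.
densities = []
for fname in sorted(glob.glob("snapshot_600.*.hdf5")):
    with h5py.File(fname, "r") as f:
        densities.append(f["PartType0/Density"][:])
density = np.concatenate(densities)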

Notes

  • Most of the time is spent in yt/utilities/io_handler.py:158(_read_particle_selection).
  • The simulation capable of creating this problem has high dynamic range: Particle mass resolution ~7000 Msun, for a total gas mass of a few times 1e12 Msun (i.e. out to a few times the virial radius of a L* galaxy).
  • The raw data is a Lagrangian cosmological zoom-in simulation with data separated into four files. Total file size ~8GB.
  • Intersecting the box edge doesn't significantly change the computation time. (Note that this is true for the yt ray, but the Trident ray will take twice as long, because Trident breaks rays that go beyond the box into smaller rays, each of which is equally expensive.)

Version Information

  • Operating System: Linux
  • Python Version: 3.7.1
  • yt version: 4.0.dev0 changeset 947ebdc
  • Other Libraries (if applicable): h5py 2.8.0, Trident 1.3.dev1

Installation Method

$ # install yt
$ git clone https://github.com/yt-project/yt --branch=yt-4.0
$ cd yt
$ pip install -e .
$ cd ..
$ # install trident
$ git clone https://github.com/trident-project/trident
$ cd trident
$ pip install -e .

welcome bot commented Dec 8, 2019

Hi, and welcome to yt! Thanks for opening your first issue. We have an issue template that helps us gather the relevant information needed to diagnose and fix the issue.

triage-new-issues bot added the triage label Dec 8, 2019

zhafen (Author) commented Dec 8, 2019

@matthewturk, I got a chance to open the issue, continuing from our conversation on Slack. Thanks for your help so far!

matthewturk (Member) commented:

Hi @zhafen, I looked at this, and it looked to me like yt was reading the entirety of the dataset every single time -- which definitely does not make sense! So I dug in a bit more and I think it's a bug, which I took a shot at fixing in #2391. But that did not speed it up as much as I'd like; there are two additional sets of parameters that might help. The first is to set index_order to a tuple of values that better divides up the domain when you instantiate ds (the default is (5, 7)). The other is to check the CHUNKSIZE value in particle_geometry_handler.py.
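
A sketch of how those two knobs could be tried (the specific values are illustrative, and treating CHUNKSIZE as a module-level constant that can be patched before the index is built is an assumption):

import yt
import yt.geometry.particle_geometry_handler as pgh

# Assumed location of the constant mentioned above; try a smaller value
# before the particle index is built.
pgh.CHUNKSIZE = 32**3

# index_order is a (coarse, refined) pair; the default is (5, 7), and
# (3, 5) here is just an illustrative alternative.
ds = yt.load("snapshot_600.0.hdf5", index_order=(3, 5))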

I am looking at other possible speedups as well.

zhafen (Author) commented Dec 13, 2019

Thanks @matthewturk! PR #2391 provided a ~7.5x speedup.

I played around with index_order but didn't find any choices that provided a further speed-up. Where's the documentation for index_order? I unfortunately wasn't able to find it.

Decreasing CHUNKSIZE to 32**3 can result in a speed-up of up to ~4x. However, the speed-up depends on what other actions have already been done with yt, on when the CHUNKSIZE variable is modified, and on the .ewah file. In addition, it can cause a slowdown for other actions, e.g. centering the data.

Between PR #2391 and modifying CHUNKSIZE, YTRays can take ~2 seconds to retrieve data and Trident rays can take ~20 seconds, which is excellent! Thanks so much!

zhafen (Author) commented Dec 17, 2019

Hi @matthewturk, the tests I applied did indeed indicate a huge speed-up for the sightline I used. Unfortunately it appears that for other sightlines PR #2391 only gives a ~2x speed-up (e.g. this is the case for sightlines that pass through the center of the zoom-in region).

matthewturk (Member) commented Dec 18, 2019 via email

zhafen (Author) commented Dec 18, 2019

I'm finding that, under the default settings in the uploaded example notebook (which sends a ray straight through the center of a halo), generating a Trident ray can take ~4-6 minutes, depending on CHUNKSIZE.

YTRay itself only takes 10-15 seconds to retrieve data for a given field. However, it repeats this process ~20 times to get all the data needed for a Trident ray, so Trident takes 200-300 seconds in total.

brittonsmith (Member) commented:

Could this be because the ray you're making extends outside of the domain boundaries? Ray objects in yt do not work with periodic boundary conditions, so Trident gets around this by making a series of non-periodic rays and appending their data. Each separate ray object will do those data queries all over again. If this is the case, then perhaps the ray selection can be rewritten to take periodicity into account and do the selection for all sub-segments at once.
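
For reference, a Trident ray is created roughly like this (a sketch; the endpoints, line list, and output filename are placeholders):

import trident

ray = trident.make_simple_ray(
    ds,
    # If start/end extend past the domain, Trident builds several
    # non-periodic yt rays and appends their data, as described above.
    start_position=ds.domain_left_edge,
    end_position=ds.domain_right_edge,
    lines=["H", "O VI"],      # species whose ion fields get added
    data_filename="ray.h5",   # placeholder output file
)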

zhafen (Author) commented Jan 5, 2020

The length of the above ray is of order the halo's virial radius, so extending beyond the domain boundaries should not be an issue. I did experience the expected slowness you're describing when I tested a periodic ray previously.

matthewturk (Member) commented:

I'm going through the profiling results, and what I'm seeing is two specific things that are taking lots of time:

  1. count_particles_chunks which is used to pre-allocate arrays
  2. dataset.py:__getitem__ which is in h5py, and used to read data

I believe the problem is the first -- the second is a side-effect, and just showing that we are reading data too many times. In fact, it seems we are reading data ... way, way too many times. (I get 838 calls to __getitem__ in h5py.)

Many of the decisions about how to read data came from the idea that we should minimize over-allocation, when in reality that is not the problem if we have subchunking. I am going to spend a little bit of time looking over how we could potentially eliminate that decision.
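
For anyone re-running these numbers, the profile can be regenerated along these lines (a sketch; the field is a placeholder and ray is assumed to be defined at the top level of the session, as in the notebook):

import cProfile
import pstats

# Profile a single field retrieval on an existing YTRay object.
cProfile.run('ray[("gas", "temperature")]', "ray_field.prof")

# Show the 20 most expensive calls by cumulative time; this is where
# _read_particle_selection and h5py's __getitem__ show up.
pstats.Stats("ray_field.prof").sort_stats("cumulative").print_stats(20)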

matthewturk added the demeshening and performance labels Jan 6, 2020
matthewturk self-assigned this Jan 6, 2020
triage-new-issues bot removed the triage label Jan 6, 2020

matthewturk (Member) commented:

Actually, I just issued #2410, which speeds it up by a fair bit on my machine.

matthewturk (Member) commented:

Looks like that wasn't quite it, although I do have an idea about how to apply the ideas more broadly. This change did get me a 2x speedup, though; I will issue it in another PR (and likely retract #2410). If you could check it, that would be great.

diff --git a/yt/frontends/gadget/io.py b/yt/frontends/gadget/io.py
index 7c03f5cf0..336935f13 100644                          
--- a/yt/frontends/gadget/io.py                             
+++ b/yt/frontends/gadget/io.py                    
@@ -46,12 +46,13 @@ class IOHandlerGadgetHDF5(IOHandlerSPH):    
             for ptype, field_list in sorted(ptf.items()):
                 if data_file.total_particles[ptype] == 0:
                     continue
-                x = f["/%s/Coordinates" % ptype][si:ei, 0].astype("float64")
-                y = f["/%s/Coordinates" % ptype][si:ei, 1].astype("float64")
-                z = f["/%s/Coordinates" % ptype][si:ei, 2].astype("float64")
+                c = f["/%s/Coordinates" % ptype][si:ei, :].astype("float64")
+                x = c[:, 0]
+                y = c[:, 1]
+                z = c[:, 2]
                 if ptype == self.ds._sph_ptypes[0]:
-                    pdtype = f["/%s/Coordinates" % ptype].dtype
-                    pshape = f["/%s/Coordinates" % ptype].shape
+                    pdtype = c.dtype
+                    pshape = c.shape
                     hsml = self._get_smoothing_length(data_file, pdtype, pshape)
                 else:
                     hsml = 0.0

zhafen (Author) commented Jan 6, 2020

Looks like a nice, simple change to reduce the number of times the data is accessed. I can confirm that I got a 2x speed-up, and that the results are consistent before and after the change.

matthewturk (Member) commented:

I implemented a simple version of this, where we skip most (if not all -- this is just a first pass, so I may have missed something somewhere) of the pre-allocation. I get further speedups. Can you try it out? There may be errors in it, especially with vector fields, so it's not ready for a PR yet.

diff --git a/yt/utilities/io_handler.py b/yt/utilities/io_handler.py
index c79b5cf4b..d919dfd44 100644
--- a/yt/utilities/io_handler.py
+++ b/yt/utilities/io_handler.py
@@ -181,16 +181,16 @@ class BaseIOHandler(metaclass = RegisteredIOHandler):
 
         # psize maps the names of particle types to the number of
         # particles of each type
-        self._count_particles_chunks(psize, chunks, ptf, selector)
+        #self._count_particles_chunks(psize, chunks, ptf, selector)
 
         # Now we allocate
-        for field in fields:
+        for field in []:#fields:
             if field[0] in unions:
                 for pt in unions[field[0]]:
                     fsize[field] += psize.get(pt, 0)
             else:
                 fsize[field] += psize.get(field[0], 0)
-        for field in fields:
+        for field in []:#fields:
             if field[1] in self._vector_fields:
                 shape = (fsize[field], self._vector_fields[field[1]])
             elif field[1] in self._array_fields:
@@ -199,6 +199,8 @@ class BaseIOHandler(metaclass = RegisteredIOHandler):
                 shape = (fsize[field], )
             rv[field] = np.empty(shape, dtype="float64")
             ind[field] = 0
+        rv[field] = []
+        ind[field] = 0
         # Now we read.
         for field_r, vals in self._read_particle_fields(chunks, ptf, selector):
             # Note that we now need to check the mappings
@@ -206,12 +208,17 @@ class BaseIOHandler(metaclass = RegisteredIOHandler):
                 my_ind = ind[field_f]
                 #mylog.debug("Filling %s from %s to %s with %s",
                 #    field_f, my_ind, my_ind+vals.shape[0], field_r)
-                rv[field_f][my_ind:my_ind + vals.shape[0],...] = vals
+                #rv[field_f][my_ind:my_ind + vals.shape[0],...] = vals
+                rv[field_f].append(vals)
                 ind[field_f] += vals.shape[0]
         # Now we need to truncate all our fields, since we allow for
         # over-estimating.
         for field_f in ind:
-            rv[field_f] = rv[field_f][:ind[field_f]]
+            #rv[field_f] = rv[field_f][:ind[field_f]]
+            if ind[field_f] == 0:
+                rv[field_f] = np.empty(0, dtype="float64")
+            else:
+                rv[field_f] = np.concatenate(rv[field_f], axis=0)
         return rv
 
 class IOHandlerExtracted(BaseIOHandler):

zhafen (Author) commented Jan 7, 2020

Nice! This produced another solid speed-up. With this change and the previous one combined I'm seeing speed-ups of ~2.5-4x.

I did find a minor bug, with the following suggested fix:

@@ -199,6 +199,8 @@ class BaseIOHandler(metaclass = RegisteredIOHandler):
                 shape = (fsize[field], )
             rv[field] = np.empty(shape, dtype="float64")
             ind[field] = 0
-        rv[field] = []
-        ind[field] = 0
+        for field in fields:
+          rv[field] = []
+          ind[field] = 0
         # Now we read.
         for field_r, vals in self._read_particle_fields(chunks, ptf, selector):
             # Note that we now need to check the mappings

Removing the pre-allocation makes sense: for most simulation datasets, reading the data will take longer than even typically slow functions like np.concatenate. That said, are there datasets where pre-allocation would still be important? Maybe datasets that have a large amount of data divided into only a few categories?
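
To make the trade-off concrete, the two accumulation strategies look roughly like this (toy sizes, not the real chunk counts):

import numpy as np

chunks = [np.random.random(100_000) for _ in range(50)]

# (a) Count first, pre-allocate, then fill -- needs an extra counting pass
#     over the selection (what _count_particles_chunks provided).
total = sum(c.size for c in chunks)
out = np.empty(total, dtype="float64")
i = 0
for c in chunks:
    out[i:i + c.size] = c
    i += c.size

# (b) Collect into a list and concatenate once at the end -- no counting
#     pass, at the cost of briefly holding the chunks and the result at once.
out_b = np.concatenate(chunks)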

matthewturk (Member) commented:

Oh, thanks for the fix!

Yes, I think there are likely cases where pre-allocation could help -- specifically, cases where you are accessing a large collection of data dictionary-style (i.e., a big sim, then doing ds.r[:]["something"]) -- but I think most operations will not need it now. One thing we could do, I suppose, is add a heuristic and switch behavior, but that would be another tunable and I'm not sure how productive it would be.

I'm looking into other potential issues with this before proceeding on a PR or proposal, but I am really glad that it too provided a good speedup.

chrishavlin (Contributor) commented:

With #2416 merged, this issue may be ready to close. I think it'd be good to re-run the initial profiling just to verify that the version on main does improve performance.

zhafen (Author) commented Jun 2, 2022

Thanks so much for your work on this! I'm very excited for the related PR. I'll try re-running the initial profiling.

zhafen (Author) commented Jun 2, 2022

Profiling just finished. With #2416 merged this is a 3x speed-up! Thank you @chrishavlin, @neutrinoceros, @matthewturk, @jzuhone!

zhafen closed this as completed Jun 2, 2022

chrishavlin (Contributor) commented:

awesome! thanks for re-running!!
