
[yt-4.0] optimize accessing data through ds.all_data() for SPH #2146

Merged: 1 commit into yt-project:yt-4.0 on Feb 22, 2019

Conversation

ngoldbaum (Member)

This PR improves the performance of scripts like this one:

import time

import yt

# Load the sample SPH snapshot and build the particle index up front
ds = yt.load('GadgetDiskGalaxy/snapshot_200.hdf5')
ds.index
data = ds.all_data()

# Time a full read of the gas particle coordinates through ds.all_data()
start = time.time()
data[('PartType0', 'Coordinates')]
print(time.time() - start)

Before this PR, on my Mac laptop with an SSD, this script prints 1.6224758625030518. After, it prints 0.17824316024780273, roughly a 10x improvement, gained by skipping many unnecessary calls to the selection machinery. We're still not as fast as calling h5py directly, because the chunking system needs to first load the data and then copy it into a result array, so there are extra copies and array allocations that don't happen in the pure h5py case.
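
For comparison, a direct read of the same array with h5py (skipping yt entirely) looks roughly like the sketch below; this is only an illustration, assuming the standard Gadget HDF5 layout for the dataset path and the same snapshot file used above:

import time
import h5py

# Read the gas coordinates straight from the HDF5 file, with no
# selection machinery and no intermediate result-array copy
start = time.time()
with h5py.File('GadgetDiskGalaxy/snapshot_200.hdf5', 'r') as f:
    coords = f['/PartType0/Coordinates'][()]
print(time.time() - start)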

This optimization was requested by @saethlin on Slack a few days ago. It would be great if we could get some testing on this from users of yt-4.0 besides Ben, perhaps @qobilidop or @chummels? I would also appreciate review by @matthewturk, particularly of my change to the RegionSelector. It might also be nice to refactor things to avoid some of the copy/paste across the frontends.

chummels (Member) commented Feb 7, 2019

Running this on a compute node on Stampede2 using the z=0 Latte resolution FIRE2 run (m12i) yields these results:

yt-4.0 tip: 135.2613799571991 s
This PR: 20.31523370742798 s

qobilidop (Member)

This is a very nice improvement. Thanks for your work @ngoldbaum!

Running on my laptop using the same snapshot as @chummels:

| Version   | Time (s)           |
|-----------|--------------------|
| Before    | 30.579400062561035 |
| After     | 4.861728191375732  |
| Pure h5py | 4.127467155456543  |

The pure h5py time is measured like this (for a multi-part snapshot):

snap = [h5py.File(snapdir + f'/snapshot_600.{i}.hdf5', 'r') for i in range(4)]

start = time.time()
for i in range(4):
    snap[i]['/PartType0/Coordinates'][()]  # read the full dataset from each file
print(time.time() - start)

It looks like the chunking overhead is small.

I'm also interested in the indexing time, which might be unrelated to this PR. Currently it takes 97 s = 26 s (coarse) + 71 s (refined). Could this time be reduced, through caching for example? (I feel this has been discussed somewhere; sorry if this is a repeat.)

ngoldbaum (Member, Author)

The index should get stored in a sidecar file. I noticed today that loading the index is broken because of hash mismatches, which I’m planning to look at tomorrow.
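
A simple sanity check for the caching, assuming the sidecar index file is written next to the snapshot on the first pass, is to time ds.index and compare a cold run against a second run of the same script:

import time
import yt

# On the first run this builds the coarse and refined particle index and
# (if caching works) writes a sidecar file; on a second run of the same
# script it should instead load the cached index and finish much faster.
ds = yt.load('GadgetDiskGalaxy/snapshot_200.hdf5')
start = time.time()
ds.index
print('index time:', time.time() - start)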

chummels (Member) commented Feb 8, 2019

Thanks for doing this! Any optimizations on yt-4.0 are definitely beneficial for big datasets.

ngoldbaum merged commit 14ec989 into yt-project:yt-4.0 on Feb 22, 2019
saethlin added a commit to saethlin/yt that referenced this pull request on Jun 11, 2019:

This halves the runtime of scripts like the one in yt-project#2146 for MassiveFIRE snapshots when run on HiPerGator's Lustre filesystem. On such snapshots, yt used to redundantly scan the backing HDF5 files 243 times to find out which Datasets they contain.