
[yt-4.0] optimize accessing data through ds.all_data() for SPH #2146

Merged: 1 commit into yt-project:yt-4.0 on Feb 22, 2019

Conversation

ngoldbaum (Member)

This PR improves the performance of scripts like this one:

import time

import yt

# Load the sample SPH snapshot and build the particle index up front
ds = yt.load('GadgetDiskGalaxy/snapshot_200.hdf5')
ds.index
data = ds.all_data()

# Time a full read of the gas particle coordinates through ds.all_data()
start = time.time()
data[('PartType0', 'Coordinates')]
print(time.time() - start)

Before this PR, on my Mac laptop with an SSD, this script prints 1.6224758625030518. After, it prints 0.17824316024780273, roughly a 10x improvement, gained by skipping many unnecessary calls to the selection machinery. We're still not as fast as calling h5py directly, because the chunking system needs to first load the data and then copy it into a result array, so there are extra copies and array allocations that don't happen in the pure h5py case.
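
For comparison, a direct read of the same array with h5py (skipping yt entirely) looks roughly like the sketch below; this is only an illustration, assuming the standard Gadget HDF5 layout for the dataset path and the same snapshot file used above:

import time
import h5py

# Read the gas coordinates straight from the HDF5 file, with no
# selection machinery and no intermediate result-array copy
start = time.time()
with h5py.File('GadgetDiskGalaxy/snapshot_200.hdf5', 'r') as f:
    coords = f['/PartType0/Coordinates'][()]
print(time.time() - start)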

This optimization was requested by @saethlin on Slack a few days ago. It would be great if we could get some testing on this from users of yt-4.0 besides Ben, perhaps @qobilidop or @chummels? I would also appreciate review by @matthewturk, particularly of my change to the RegionSelector. It might also be nice to refactor things to avoid some of the copy/paste across the frontends.

chummels (Member) commented Feb 7, 2019

Running this on a compute node on Stampede2 using the z=0 Latte resolution FIRE2 run (m12i) yields these results:

yt-4.0 tip: 135.2613799571991 s
This PR: 20.31523370742798 s

qobilidop (Member)

This is a very nice improvement. Thanks for your work @ngoldbaum!

Running on my laptop using the same snapshot as @chummels:

| Version   | Time (s)           |
|-----------|--------------------|
| Before    | 30.579400062561035 |
| After     | 4.861728191375732  |
| Pure h5py | 4.127467155456543  |

The pure h5py time is measured like this (for a multi-part snapshot):

snap = [h5py.File(snapdir + f'/snapshot_600.{i}.hdf5', 'r') for i in range(4)]

start = time.time()
for i in range(4):
    snap[i]['/PartType0/Coordinates'][()]  # read the full dataset from each file
print(time.time() - start)

It looks like the chunking overhead is small.

I'm also interested in the indexing time, which might be unrelated to this PR. Currently it takes 97 s = 26 s (coarse) + 71 s (refined). Could this time be reduced, through caching for example? (I feel this has been discussed somewhere; sorry if this is a repeat.)

ngoldbaum (Member, Author)

The index should get stored in a sidecar file. I noticed today that loading the index is broken because of hash mismatches, which I’m planning to look at tomorrow.
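
A simple sanity check for the caching, assuming the sidecar index file is written next to the snapshot on the first pass, is to time ds.index and compare a cold run against a second run of the same script:

import time
import yt

# On the first run this builds the coarse and refined particle index and
# (if caching works) writes a sidecar file; on a second run of the same
# script it should instead load the cached index and finish much faster.
ds = yt.load('GadgetDiskGalaxy/snapshot_200.hdf5')
start = time.time()
ds.index
print('index time:', time.time() - start)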

chummels (Member) commented Feb 8, 2019

Thanks for doing this! Any optimizations on yt-4.0 are definitely beneficial for big datasets.

ngoldbaum merged commit 14ec989 into yt-project:yt-4.0 on Feb 22, 2019
saethlin added a commit to saethlin/yt that referenced this pull request on Jun 11, 2019:

This halves the runtime of scripts like the one in yt-project#2146 for MassiveFIRE snapshots when run on HiPerGator's Lustre filesystem. On such snapshots, yt used to redundantly scan the backing HDF5 files 243 times to find out which Datasets they contain.