ENH: reading speed optimisation for AMRVAC frontend #3508
Conversation
switching to draft for now. I've realised that the reshaping doesn't work in the general case if blocks don't have nx=ny(=nz).
3d52ad1 to 846e250
should be ok now (results are still consistent). One can convince themselves that the reshaping is equivalent to the existing version:

```python
import numpy as np

# simulate on-disk data as a 1D array
SHAPE = (5, 6, 7)
data = np.arange(np.prod(SHAPE))

# reference method
a = np.reshape(data, SHAPE, order="F")

# proposed refactor
b = data.copy()
b.shape = SHAPE[::-1]

# actual check
np.testing.assert_array_equal(b.T, a)
```

note that
```python
# Fortran ordering
block_field_data = np.reshape(d, field_shape, order="F")
return block_field_data
istream.seek(byte_offset + byte_size_field * field_idx)
```
Reducing the number of seeks is really helpful. It might be possible to sort the calls to `get_single_block_of_data` by `offset` and `field_idx` (even sorting by that tuple should work) to make sure we're reading them in the right order, too.
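For illustration, a minimal sketch of sorting by that tuple; the request list and the `read_one_block` name are invented for the example, not taken from the frontend:

```python
# Hypothetical pending reads as (byte_offset, field_idx) pairs, in the
# order they would naively be issued.
pending_reads = [(8192, 2), (0, 1), (8192, 0), (0, 0), (4096, 1)]

def read_one_block(byte_offset, field_idx):
    # placeholder for the actual seek + read
    print(f"reading offset={byte_offset}, field={field_idx}")

# Sorting by the (offset, field) tuple walks the file strictly forward.
for byte_offset, field_idx in sorted(pending_reads):
    read_one_block(byte_offset, field_idx)
```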
sounds like a good idea, I'll try that!
So I was able to sort by field but it may not have a huge impact because of how AMRVAC dump files represent data (fields are on the inner loop of the writing routine, individual fields are never contiguous). I've tried sorting grids too and got no significant gain. The dataset I'm using to benchmark this also happens to use a single "data chunk", so there's no way to measure if sorting chunks would help. In conclusion I don't think there's anything I can do here.
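Roughly, the write order described above looks like this (block and field names are made up for the illustration):

```python
# Blocks are the outer loop and fields the inner loop, so any given field is
# interleaved with the others throughout the file and never contiguous.
blocks = ["block0", "block1", "block2"]
fields = ["field_a", "field_b", "field_c"]

write_order = [(block, field) for block in blocks for field in fields]
print(write_order)
# [('block0', 'field_a'), ('block0', 'field_b'), ('block0', 'field_c'),
#  ('block1', 'field_a'), ...]
```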
```python
block_field_data = np.reshape(d, field_shape, order="F")
return block_field_data
istream.seek(byte_offset + byte_size_field * field_idx)
data = np.fromfile(istream, "=f8", count=np.prod(field_shape))
```
Are you sure you want `=` here? Any chance endianness will be different on generating/accepting systems?
This module was adapted from a python script that was meant to be manually edited by users, so the endianness is actually hardcoded globally as `ALIGN = "="` (see the top lines of this module), and apparently no one ever complained about that since the frontend was released 2 yrs ago, but I agree that there's a chance of failure here. If you have any advice on how to make this more robust, I'll gladly hear it.
Good point. `ALIGN = "="` has been hardcoded in literally every script that reads in amrvac datfile data over the last few years and nobody ever complained about it, so I don't foresee any issues with this :)
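For reference, should portability ever become a concern, numpy dtypes can carry an explicit byte order instead of the native `=`; a small self-contained sketch (file name and values are invented for the example):

```python
import numpy as np

# "<f8" is explicitly little-endian and means the same thing on every
# machine, whereas "=f8" follows whatever the local CPU uses.
little_endian_f8 = np.dtype("<f8")

data = np.arange(4.0).astype(little_endian_f8)
data.tofile("sample.dat")

# Reading back with the same explicit dtype is portable across systems.
back = np.fromfile("sample.dat", dtype=little_endian_f8)
np.testing.assert_array_equal(back, data)
```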
```python
return block_field_data
istream.seek(byte_offset + byte_size_field * field_idx)
data = np.fromfile(istream, "=f8", count=np.prod(field_shape))
data.shape = field_shape[::-1]
```
One trick we've used for hdf5 files has been to create a destination array and read directly into that. To be honest, this did make me stop and think about what I was seeing.
I'm also not super happy with how this looks. `[::-1]` may be a common(ish) idiom but it doesn't feel right to me. I don't think `numpy.fromfile` has a "dest" (or similar) argument; where should I look for the technique you're mentioning?
I also couldn't find a good one. All I could come up with was doing some kind of `destination_buffer[:] = np.fromfile(...)`, but that's not really very helpful. Let me keep looking. I think one possibility would be to do a `view` on the result of `np.fromfile`.
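One way to get the "read into a destination array" behaviour without an intermediate copy is `readinto` on the file object; a self-contained sketch under assumed names and shapes (this is only one possible approach, not necessarily what the frontend should adopt):

```python
import numpy as np

field_shape = (5, 6, 7)

# Simulate the on-disk layout: a flat run of doubles that the current code
# would pass to np.reshape(..., order="F").
flat = np.arange(np.prod(field_shape), dtype="=f8")
flat.tofile("block.dat")
reference = flat.reshape(field_shape, order="F")

# Preallocate the destination with the reversed (C-order) shape and let the
# file object fill its buffer directly, with no intermediate array.
dest = np.empty(field_shape[::-1], dtype="=f8")
with open("block.dat", "rb") as istream:
    istream.readinto(dest)

# The transposed view recovers the Fortran-ordered block.
np.testing.assert_array_equal(dest.T, reference)
```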
switching to draft again, I want to try out Matt's ideas, but won't be able to for a couple of days.
…e operations and sort fields by byte offset order
Now at
the days just fly by
figured out some meetings are perfect to sit and run quiet runs for 10 mins at a time 🙈
…(reduce call stack) and avoid generating potentially large strings just to compute a byte count with struct
I did everything I could think of to improve performance here. I only kept changes that visibly increased speed for my benchmark.
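As an aside on the byte-count point in the commit message above, the idea can be illustrated like this (the element count is arbitrary):

```python
import struct

n_values = 64**3  # illustrative block size

# Building one giant format string just to measure it:
slow = struct.calcsize("=" + "d" * n_values)

# The same number from a single-element format:
fast = n_values * struct.calcsize("=d")

assert slow == fast == 8 * n_values
```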
…ffering no measurable perf gain
@matthewturk you be the judge. Anything more I can do with this PR?
PR Summary
Performance improvement for the AMRVAC frontend. Unpack data straight from disk into numpy arrays instead of unpacking as tuples and then converting to arrays.
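A minimal, self-contained illustration of that change (file name, sizes, and values are invented for the example; the real reader works on AMRVAC .dat blocks):

```python
import struct
import numpy as np

n_values = 1000
rng = np.random.default_rng(0)
rng.random(n_values).tofile("block.dat")  # fake on-disk block of doubles

# Before: read bytes, unpack into a Python tuple, then convert to an array.
with open("block.dat", "rb") as f:
    raw = f.read(8 * n_values)
old = np.array(struct.unpack("=" + "d" * n_values, raw))

# After: read straight from disk into a numpy array.
new = np.fromfile("block.dat", dtype="=f8", count=n_values)

np.testing.assert_array_equal(old, new)
```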
I ran the following benchmark to test this (using https://yt-project.org/data/solarprom2d.tar.gz)
Results
main
this branch
So the estimated gain for this particular example is about 8% with a 450MiB dataset. Not game-changing, but still worth a +4/-9 patch :-)