
ENH: reading speed optimisation for AMRVAC frontend #3508

Merged
merged 4 commits into from
Sep 16, 2021

Conversation

neutrinoceros
Member

PR Summary

Performance improvement for the AMRVAC frontend: unpack data straight from disk into numpy arrays
instead of unpacking as tuples and then converting to arrays.
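For illustration, the difference between the two strategies can be sketched as follows (a toy example, not the actual yt frontend code; the format string and shape are made up):

```python
import struct

import numpy as np

# Toy "on-disk" block: 6 float64 values packed with native byte order.
field_shape = (2, 3)
raw = struct.pack("=6d", *range(6))

# Old approach: unpack into a Python tuple, then convert to an array.
fmt = "=" + str(np.prod(field_shape)) + "d"
as_tuple = struct.unpack(fmt, raw)
a = np.reshape(as_tuple, field_shape, order="F")

# New approach: interpret the bytes directly as a numpy array, skipping
# the intermediate tuple (np.fromfile does the same for real files).
b = np.frombuffer(raw, dtype="=f8").reshape(field_shape, order="F")

assert np.array_equal(a, b)
```

Skipping the tuple avoids allocating one Python float object per value, which is where the time goes for large blocks.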

I ran the following benchmark to test this (using https://yt-project.org/data/solarprom2d.tar.gz)

# benchmark.py
import yt
import time
import sys

def main(n: int = 100) -> int:
    yt.set_log_level("error")
    start = time.monotonic()

    for _ in range(n):
        ds = yt.load("solar_prom2d/output0001.dat")
        pp = yt.ProjectionPlot(ds, "z", ("gas", "density"))
    stop = time.monotonic()
    delta = stop - start
    print(f"Ran {n} loops in {delta:.1f} s")
    return 0

if __name__ == "__main__":
    sys.exit(main())

Results

main

Ran 100 loops in 483.0 s

this branch

Ran 100 loops in 444.4 s

So the estimated gain for this particular example is about 8% with a 450MiB dataset. Not game-changing, but still worth a +4/-9 patch :-)

@neutrinoceros neutrinoceros added enhancement Making something better code frontends Things related to specific frontends labels Sep 13, 2021
@neutrinoceros
Member Author

I've checked that the produced plot is identical after the change. Here's what it looks like:
[image: output0001 dat_Projection_z_density]

@neutrinoceros neutrinoceros marked this pull request as draft September 13, 2021 16:23
@neutrinoceros
Member Author

Switching to draft for now. I've realised that the reshaping doesn't work in the general case when blocks don't have nx = ny (= nz).

@neutrinoceros
Member Author

neutrinoceros commented Sep 13, 2021

Should be OK now (results are still consistent). One can convince oneself that the reshaping is equivalent to the existing version:

import numpy as np

# simulate on disk data as a 1D array
SHAPE = (5, 6, 7)
data = np.arange(np.prod(SHAPE))

# reference method
a = np.reshape(data, SHAPE, order="F") 

# proposed refactor
b = data.copy()
b.shape = SHAPE[::-1]

# actual check
np.testing.assert_array_equal(b.T, a)

Note that a and b.T are both F-contiguous, which is likely suboptimal (maybe neutral) when the data is processed downstream. Optimizing for this would certainly impact reading speed, so there would be a tradeoff to discuss, but maybe there's a clear answer to that? In any case, it's out of scope for this PR: I'm doing the exact same thing, just faster, by leveraging numpy better.
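Building on the snippet above, the contiguity claim can be checked directly with the array flags (a quick sanity check, not part of the patch):

```python
import numpy as np

SHAPE = (5, 6, 7)
data = np.arange(np.prod(SHAPE))

a = np.reshape(data, SHAPE, order="F")
b = data.copy()
b.shape = SHAPE[::-1]

# Both a and b.T are laid out in Fortran order...
assert a.flags["F_CONTIGUOUS"]
assert b.T.flags["F_CONTIGUOUS"]
# ...and neither is C-contiguous for a 3D shape like this one.
assert not a.flags["C_CONTIGUOUS"]
assert not b.T.flags["C_CONTIGUOUS"]
```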

@neutrinoceros neutrinoceros marked this pull request as ready for review September 13, 2021 18:35
# Fortran ordering
block_field_data = np.reshape(d, field_shape, order="F")
return block_field_data
istream.seek(byte_offset + byte_size_field * field_idx)
Member

Reducing the number of seeks is really helpful. It might be possible to sort the calls to get_single_block_of_data by offset and field_idx (even sorting by that tuple should work) to make sure we're reading them in the right order, too.
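The sorting idea relies on Python's lexicographic tuple comparison; a minimal sketch with made-up offsets (not the actual get_single_block_of_data call signature):

```python
# Pending reads as (byte_offset, field_idx) pairs, in arbitrary order.
requests = [(4096, 2), (0, 1), (4096, 0), (0, 0)]

# Tuples sort lexicographically, so a plain sort orders reads by file
# position first and field index second.
ordered = sorted(requests)
assert ordered == [(0, 0), (0, 1), (4096, 0), (4096, 2)]
```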

Member Author

@neutrinoceros neutrinoceros Sep 13, 2021


Sounds like a good idea, I'll try that!

Member Author

So I was able to sort by field but it may not have a huge impact because of how AMRVAC dump files represent data (fields are on the inner loop of the writing routine, individual fields are never contiguous). I've tried sorting grids too and got no significant gain. The dataset I'm using to benchmark this also happens to use a single "data chunk", so there's no way to measure if sorting chunks would help. In conclusion I don't think there's anything I can do here.

block_field_data = np.reshape(d, field_shape, order="F")
return block_field_data
istream.seek(byte_offset + byte_size_field * field_idx)
data = np.fromfile(istream, "=f8", count=np.prod(field_shape))
Member

Are you sure you want = here? Any chance endianness will be different on generating/accepting systems?

Member Author

This module was adapted from a python script that was meant to be manually edited by users, so the endianness is actually hardcoded globally as ALIGN = "=" (see the top lines of this module), and apparently no one has ever complained about it since the frontend was released 2 years ago. But I agree that there's a chance of failure here. If you have any advice on how to make this more robust, I'll gladly hear it.
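One generic way to make this more robust (just a sketch, not AMRVAC-specific: it assumes the header starts with a small integer such as a format version, which may not match the real datfile layout) is to try both byte orders on a known header field and keep the one that yields a plausible value:

```python
import io

import numpy as np

EXPECTED_MAX_VERSION = 100  # made-up plausibility bound


def detect_endianness(stream):
    """Guess '<' or '>' from a leading int32 assumed to be small."""
    raw = stream.read(4)
    for endian in ("<", ">"):
        (version,) = np.frombuffer(raw, dtype=endian + "i4")
        if 0 < version <= EXPECTED_MAX_VERSION:
            return endian
    raise ValueError("could not determine byte order")


# A fake header whose first field is the little-endian integer 5.
buf = io.BytesIO((5).to_bytes(4, "little"))
assert detect_endianness(buf) == "<"
```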

Member

Good point. ALIGN = "=" has been hardcoded in literally every script that read in amrvac datfile data over the last few years and nobody ever complained about it, so I don't foresee any issues with this :)

return block_field_data
istream.seek(byte_offset + byte_size_field * field_idx)
data = np.fromfile(istream, "=f8", count=np.prod(field_shape))
data.shape = field_shape[::-1]
Member

One trick we've used for hdf5 files has been to create a destination array and read directly into that. To be honest this did make me stop and think what I was seeing.

Member Author

I'm also not super happy with how this looks. [::-1] may be a common(ish) idiom, but it doesn't feel right to me.
I don't think numpy.fromfile has a "dest" (or similar) argument; where should I look for the technique you're mentioning?

Member

I also couldn't find a good one. All I could come up with was doing some kind of destination_buffer[:] = np.fromfile(...), but that's not really very helpful. Let me keep looking. I think one possibility would be to do a view on the result of np.fromfile.
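For reference, one way to fill a preallocated destination without an extra allocation is file.readinto, since a contiguous ndarray exposes a writable buffer (a sketch using a fake in-memory stream, not the frontend's actual reader):

```python
import io

import numpy as np

field_shape = (2, 3, 4)
payload = np.arange(np.prod(field_shape), dtype="=f8")
istream = io.BytesIO(payload.tobytes())

# Preallocate the destination and read the bytes straight into it.
dest = np.empty(field_shape[::-1], dtype="=f8")
istream.readinto(dest)  # ndarray supports the writable buffer protocol

# dest.T then gives the Fortran-ordered view, as in the patch.
assert np.array_equal(dest.ravel(), payload)
assert dest.T.flags["F_CONTIGUOUS"]
```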

@neutrinoceros
Member Author

switching to draft again, I want to try out Matt's ideas, but won't be able to for a couple days.

@neutrinoceros neutrinoceros marked this pull request as draft September 13, 2021 21:18
…e operations and sort fields by byte offset order
@neutrinoceros
Member Author

Now at

Ran 100 loops in 429.6 s

@matthewturk
Member

switching to draft again, I want to try out Matt's ideas, but won't be able to for a couple days.

the days just fly by

@neutrinoceros
Member Author

the days just fly by

figured out some meetings are perfect to sit and run quiet runs for 10mins at a time 🙈

…(reduce call stack) and avoid generating potentially large strings just to compute a byte count with struct
@neutrinoceros
Member Author

I did everything I could think of to improve performance here. I only kept changes that visibly increased speed in my benchmark.
The failing tests are due to an instability upstream (Windows, conda), so I'll try rerunning the jobs tomorrow and hope for the best.

@neutrinoceros neutrinoceros marked this pull request as ready for review September 15, 2021 12:40
@neutrinoceros
Member Author

@matthewturk you be the judge. Anything more I can do with this PR?

@matthewturk matthewturk merged commit b016e8e into yt-project:main Sep 16, 2021
@neutrinoceros neutrinoceros deleted the ioperf_amrvac branch September 16, 2021 14:21