ENH: reading speed optimisation for AMRVAC frontend #3508
Conversation
switching to draft for now. I've realised that the reshaping doesn't work in the general case if blocks don't have nx=ny(=nz).
3d52ad1 to 846e250
should be ok now (results are still consistent). One can convince themselves that the reshaping is equivalent to the existing version:

```python
import numpy as np

# simulate on-disk data as a 1D array
SHAPE = (5, 6, 7)
data = np.arange(np.prod(SHAPE))

# reference method
a = np.reshape(data, SHAPE, order="F")

# proposed refactor
b = data.copy()
b.shape = SHAPE[::-1]

# actual check
np.testing.assert_array_equal(b.T, a)
```

note that
```python
# Fortran ordering
block_field_data = np.reshape(d, field_shape, order="F")
return block_field_data
istream.seek(byte_offset + byte_size_field * field_idx)
```
Reducing the number of seeks is really helpful. It might be possible to sort the calls to `get_single_block_of_data` by `offset` and `field_idx` (even sorting by that tuple should work) to make sure we're reading them in the right order, too.
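For illustration, a minimal sketch of sorting by that tuple; the request list and the `read_one_block` name are invented for the example, not taken from the frontend:

```python
# Hypothetical pending reads as (byte_offset, field_idx) pairs, in the
# order they would naively be issued.
pending_reads = [(8192, 2), (0, 1), (8192, 0), (0, 0), (4096, 1)]

def read_one_block(byte_offset, field_idx):
    # placeholder for the actual seek + read
    print(f"reading offset={byte_offset}, field={field_idx}")

# Sorting by the (offset, field) tuple walks the file strictly forward.
for byte_offset, field_idx in sorted(pending_reads):
    read_one_block(byte_offset, field_idx)
```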
sounds like a good idea, I'll try that!
So I was able to sort by field but it may not have a huge impact because of how AMRVAC dump files represent data (fields are on the inner loop of the writing routine, individual fields are never contiguous). I've tried sorting grids too and got no significant gain. The dataset I'm using to benchmark this also happens to use a single "data chunk", so there's no way to measure if sorting chunks would help. In conclusion I don't think there's anything I can do here.
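Roughly, the write order described above looks like this (block and field names are made up for the illustration):

```python
# Blocks are the outer loop and fields the inner loop, so any given field is
# interleaved with the others throughout the file and never contiguous.
blocks = ["block0", "block1", "block2"]
fields = ["field_a", "field_b", "field_c"]

write_order = [(block, field) for block in blocks for field in fields]
print(write_order)
# [('block0', 'field_a'), ('block0', 'field_b'), ('block0', 'field_c'),
#  ('block1', 'field_a'), ...]
```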
```python
block_field_data = np.reshape(d, field_shape, order="F")
return block_field_data
istream.seek(byte_offset + byte_size_field * field_idx)
data = np.fromfile(istream, "=f8", count=np.prod(field_shape))
```
Are you sure you want `=` here? Any chance endianness will be different on generating/accepting systems?
This module was adapted from a python script that was meant to be manually edited by users, so the endianness is actually hardcoded globally as `ALIGN = "="` (see the top lines of this module), and apparently no one ever complained about that since the frontend was released 2 yrs ago, but I agree that there's a chance of failure here. If you have any advice on how to make this more robust, I'll gladly hear it.
Good point. `ALIGN = "="` has been hardcoded in literally every script that reads in amrvac datfile data over the last few years and nobody ever complained about it, so I don't foresee any issues with this :)
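For reference, should portability ever become a concern, numpy dtypes can carry an explicit byte order instead of the native `=`; a small self-contained sketch (file name and values are invented for the example):

```python
import numpy as np

# "<f8" is explicitly little-endian and means the same thing on every
# machine, whereas "=f8" follows whatever the local CPU uses.
little_endian_f8 = np.dtype("<f8")

data = np.arange(4.0).astype(little_endian_f8)
data.tofile("sample.dat")

# Reading back with the same explicit dtype is portable across systems.
back = np.fromfile("sample.dat", dtype=little_endian_f8)
np.testing.assert_array_equal(back, data)
```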
```python
return block_field_data
istream.seek(byte_offset + byte_size_field * field_idx)
data = np.fromfile(istream, "=f8", count=np.prod(field_shape))
data.shape = field_shape[::-1]
```
One trick we've used for hdf5 files has been to create a destination array and read directly into that. To be honest, this did make me stop and think about what I was seeing.
I'm also not super happy with how this looks. `[::-1]` may be a common(ish) idiom but it doesn't feel right to me. I don't think `numpy.fromfile` has a "dest" (or similar) argument; where should I look for the technique you're mentioning?
I also couldn't find a good one. All I could come up with was doing some kind of `destination_buffer[:] = np.fromfile(...)`, but that's not really very helpful. Let me keep looking. I think one possibility would be to do a `view` on the result of `np.fromfile`.
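One way to get the "read into a destination array" behaviour without an intermediate copy is `readinto` on the file object; a self-contained sketch under assumed names and shapes (this is only one possible approach, not necessarily what the frontend should adopt):

```python
import numpy as np

field_shape = (5, 6, 7)

# Simulate the on-disk layout: a flat run of doubles that the current code
# would pass to np.reshape(..., order="F").
flat = np.arange(np.prod(field_shape), dtype="=f8")
flat.tofile("block.dat")
reference = flat.reshape(field_shape, order="F")

# Preallocate the destination with the reversed (C-order) shape and let the
# file object fill its buffer directly, with no intermediate array.
dest = np.empty(field_shape[::-1], dtype="=f8")
with open("block.dat", "rb") as istream:
    istream.readinto(dest)

# The transposed view recovers the Fortran-ordered block.
np.testing.assert_array_equal(dest.T, reference)
```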
switching to draft again, I want to try out Matt's ideas, but won't be able to for a couple of days.
…e operations and sort fields by byte offset order
Now at
the days just fly by
figured out some meetings are perfect to sit and run quiet runs for 10 mins at a time 🙈
…(reduce call stack) and avoid generating potentially large strings just to compute a byte count with struct
I did everything I could think of to improve performance here. I only kept changes that visibly increased speed for my benchmark.
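As an aside on the byte-count point in the commit message above, the idea can be illustrated like this (the element count is arbitrary):

```python
import struct

n_values = 64**3  # illustrative block size

# Building one giant format string just to measure it:
slow = struct.calcsize("=" + "d" * n_values)

# The same number from a single-element format:
fast = n_values * struct.calcsize("=d")

assert slow == fast == 8 * n_values
```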
…ffering no measurable perf gain
@matthewturk you be the judge. Anything more I can do with this PR?
PR Summary
Performance improvement for the AMRVAC frontend. Unpack data straight from disk into numpy arrays instead of unpacking as tuples and then converting to arrays.
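A minimal, self-contained illustration of that change (file name, sizes, and values are invented for the example; the real reader works on AMRVAC .dat blocks):

```python
import struct
import numpy as np

n_values = 1000
rng = np.random.default_rng(0)
rng.random(n_values).tofile("block.dat")  # fake on-disk block of doubles

# Before: read bytes, unpack into a Python tuple, then convert to an array.
with open("block.dat", "rb") as f:
    raw = f.read(8 * n_values)
old = np.array(struct.unpack("=" + "d" * n_values, raw))

# After: read straight from disk into a numpy array.
new = np.fromfile("block.dat", dtype="=f8", count=n_values)

np.testing.assert_array_equal(old, new)
```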
I ran the following benchmark to test this (using https://yt-project.org/data/solarprom2d.tar.gz)
Results
main
this branch
So the estimated gain for this particular example is about 8% with a 450MiB dataset. Not game-changing, but still worth a +4/-9 patch :-)