Improve order_by speed for GRIB data #261

Open
sandorkertesz opened this issue Nov 16, 2023 · 0 comments
Assignees
Labels
enhancement New feature or request

sandorkertesz commented Nov 16, 2023

order_by() is slow primarily for "file" sources. The cause is that every field metadata access made by the sorting algorithm has to load the GRIB message from the GRIB file and decode it again, so the same message is read over and over.

For example, this code runs in 10.43 s:

import earthkit.data

# Fields stay on disk; each metadata access re-reads the GRIB message
ds = earthkit.data.from_source("file", "docs/examples/tuv_pl.grib")
for _ in range(200):
    ds.order_by("shortName")

If we store all the messages in memory, the running time drops to 4.65 s, roughly 2.2 times faster:

import earthkit.data

ds = earthkit.data.from_source("file", "docs/examples/tuv_pl.grib")
# Convert to an in-memory fieldlist so sorting no longer touches the file
x = ds.to_fieldlist("numpy")
for _ in range(200):
    x.order_by("shortName")
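
One possible direction, sketched below without any earthkit-data code, is to read each sort key once per field before sorting rather than inside the sort. The sketch assumes the sort re-reads metadata on every comparison, which the description above suggests but which is not checked here against the actual implementation. `FakeField`, `order_by_naive` and `order_by_cached` are hypothetical names invented for illustration; `loads` simply counts how often a "message" would be read from disk.

```python
from functools import cmp_to_key


class FakeField:
    """Hypothetical stand-in for a file-backed GRIB field (not the earthkit-data API)."""

    def __init__(self, short_name, level):
        self._meta = {"shortName": short_name, "level": level}
        self.loads = 0  # number of simulated message loads

    def metadata(self, key):
        # Assumption: every access reloads and decodes the message from the file.
        self.loads += 1
        return self._meta[key]


def _compare(a, b, keys):
    # Comparator that fetches metadata from both fields on every call.
    for k in keys:
        va, vb = a.metadata(k), b.metadata(k)
        if va != vb:
            return -1 if va < vb else 1
    return 0


def order_by_naive(fields, keys):
    # Metadata is re-read for each comparison made by the sort.
    return sorted(fields, key=cmp_to_key(lambda a, b: _compare(a, b, keys)))


def order_by_cached(fields, keys):
    # Read every key exactly once per field, then sort on the cached tuples.
    decorated = [(tuple(f.metadata(k) for k in keys), f) for f in fields]
    decorated.sort(key=lambda item: item[0])  # stable; fields are never compared
    return [f for _, f in decorated]
```

With the cached variant the number of loads is the number of fields times the number of keys, independent of how many comparisons the sort performs. Whether this helps as much as keeping the whole message in memory would need to be measured on real GRIB data.
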
@sandorkertesz sandorkertesz added the enhancement New feature or request label Nov 16, 2023
@sandorkertesz sandorkertesz self-assigned this Nov 16, 2023