
Refactor the stream API #364

Closed
sandorkertesz opened this issue Apr 16, 2024 · 0 comments · Fixed by #371


sandorkertesz commented Apr 16, 2024

Is your feature request related to a problem? Please describe.

The proposal is to change the usage of streams in the following way.

When stream=True, the returned object would be a Fieldlist (for GRIB data), which can be iterated only once:

ds = from_source("url", "http://..../my_data.grib", stream=True)

for f in ds:
    # f is now a Field
    print(f)

# at this point ds has consumed the stream

Iterating in batches would be a generic option (not only stream specific):

ds1 = from_source("file", "my_local_data.grib")
ds2 = from_source("url", "http://..../my_data.grib", stream=True)

for batch in ds1.batched(2):
    # batch is now a Fieldlist with 2 Fields
    print(len(batch))

for batch in ds2.batched(2):
    # batch is now a Fieldlist with 2 Fields
    print(len(batch))
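
To illustrate why batching can be a generic option rather than a stream-specific one, here is a minimal sketch in plain Python. The `batched` function below is a hypothetical stand-in, not the earthkit-data implementation, and strings stand in for Fields. It also assumes that a trailing partial batch is returned, which the proposal does not specify.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], n: int) -> Iterator[List[T]]:
    """Yield lists of up to n items (hypothetical helper, not the real API)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    it = iter(items)
    while True:
        # islice pulls at most n items, so a stream is never read ahead
        batch = list(islice(it, n))
        if not batch:
            return
        yield batch


in_memory = ["f0", "f1", "f2", "f3", "f4"]   # stands in for a file-based Fieldlist
stream = (name for name in in_memory)        # one-shot generator, stands in for a stream

memory_batches = list(batched(in_memory, 2))
stream_batches = list(batched(stream, 2))
```

Because the helper only relies on the iterator protocol, the same code path serves both sources.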

group_by would behave in a similar way:

ds1 = from_source("file", "my_local_data.grib")
ds2 = from_source("url", "http://..../my_data.grib", stream=True)

for group in ds1.group_by("level"):
    # group is now a Fieldlist
    print(len(group))

for group in ds2.group_by("level"):
    # group is now a Fieldlist
    print(len(group))

Please note that for non-stream data, group_by would build the groups from the metadata of the full dataset. For a stream, however, each group would simply be built by consuming GRIB messages from the stream until the values of the metadata keys specified in group_by change.
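
The stream behaviour described here, grouping consecutive messages until the key values change, could be sketched with `itertools.groupby`. This is only an illustration: `group_stream` is a hypothetical name, and plain dicts stand in for GRIB messages.

```python
from itertools import groupby
from operator import itemgetter


def group_stream(messages, *keys):
    """Yield lists of consecutive messages sharing the same values for keys.

    Hypothetical helper: a run ends as soon as any key value changes.
    """
    get_key = itemgetter(*keys)
    for _, run in groupby(messages, key=get_key):
        # materialise the run before the underlying iterator advances
        yield list(run)


# dicts stand in for GRIB messages arriving one by one
messages = iter([
    {"param": "t", "level": 500},
    {"param": "u", "level": 500},
    {"param": "t", "level": 850},
    {"param": "u", "level": 850},
    {"param": "t", "level": 500},
])

groups = list(group_stream(messages, "level"))
group_levels = [[m["level"] for m in g] for g in groups]
```

One consequence of this approach: if a key value recurs later in the stream (level 500 in the last message above), it starts a new group rather than joining the earlier one, since the stream is never inspected as a whole.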

We could read the whole stream into memory with the read_all option:

ds = from_source("url", "http://..../my_data.grib", stream=True, read_all=True)

# ds is now a Fieldlist in memory, so all these work
len(ds)
r = ds.sel(param="t")

for f in ds:
    # f is now a Field
    print(f)

for batch in ds.batched(2):
    # batch is now a Fieldlist with 2 Fields
    print(len(batch))
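
The difference between a consumed stream and one read fully into memory can be shown with a plain Python generator. This is a rough analogy under stated assumptions, not the library's behaviour: `fake_stream` is a hypothetical stand-in for messages arriving over the network, and `list(...)` plays the role of read_all=True.

```python
def fake_stream():
    """Hypothetical stand-in for messages arriving over a network stream."""
    for name in ("f0", "f1", "f2"):
        yield name


# Streaming: a single pass, after which nothing is left to read
stream = fake_stream()
first_pass = list(stream)
second_pass = list(stream)

# Reading everything up front: length and repeated iteration become possible
in_memory = list(fake_stream())
count = len(in_memory)
replay = [list(in_memory), list(in_memory)]
```

The trade-off is memory: the in-memory variant holds every message at once, which is why it would be opt-in.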
@sandorkertesz sandorkertesz added the enhancement New feature or request label Apr 16, 2024
@sandorkertesz sandorkertesz self-assigned this Apr 18, 2024