Refactor the stream API #364

sandorkertesz · 2024-04-16T09:42:30Z

Is your feature request related to a problem? Please describe.

The proposal is to change the usage of streams in the following way.

When stream=True, the returned object would be a Fieldlist (for GRIB data):

ds = from_source("url", "http://..../my_data.grib", stream=True)

for f in ds:
    # f is now a Field
    pass

# at this point ds has consumed the stream

Iterating in batches would be a generic option (not stream-specific):

ds1 = from_source("file", "my_local_data.grib")
ds2 = from_source("url", "http://..../my_data.grib", stream=True)

for f in ds1.batched(2):
    # f is now a Fieldlist with 2 Fields
    pass

for f in ds2.batched(2):
    # f is now a Fieldlist with 2 Fields
    pass
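
To illustrate why batching can be generic, here is a minimal sketch of the idea on plain Python iterables. It is not the library implementation: the helper `batched` below and the string "fields" are stand-ins assumed for illustration only. The same loop serves an in-memory list and a one-shot generator (used here as a substitute for a stream).

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], n: int) -> Iterator[List[T]]:
    """Yield successive lists of up to n items from any iterable (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    it = iter(items)
    while True:
        # Pull at most n items; works for lists and one-shot iterators alike
        batch = list(islice(it, n))
        if not batch:
            return
        yield batch


# Stand-ins for fields: plain strings instead of real Field objects
in_memory = ["f1", "f2", "f3", "f4", "f5"]
stream_like = (name for name in in_memory)  # can only be iterated once

batches_from_list = list(batched(in_memory, 2))
batches_from_stream = list(batched(stream_like, 2))
```

In this sketch the last batch is shorter when the number of items is not a multiple of the batch size. Whether the proposed API would behave the same way is not stated above.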

group_by would behave in a similar way.

ds1 = from_source("file", "my_local_data.grib")
ds2 = from_source("url", "http://..../my_data.grib", stream=True)

for f in ds1.group_by("level"):
    # f is now a Fieldlist
    pass

for f in ds2.group_by("level"):
    # f is now a Fieldlist
    pass

Please note that for non-stream data group_by would be based on the metadata of the full dataset. For a stream, however, each group would simply be built by consuming GRIB messages from the stream until the values of the metadata keys specified in group_by change.
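
The stream behaviour described here (start a new group whenever the key values change) can be sketched with `itertools.groupby`, which groups consecutive items only. This is an illustration under assumptions, not the library code: `group_by_stream` is a hypothetical helper and the dictionaries stand in for GRIB message metadata.

```python
from itertools import groupby
from typing import Any, Dict, Iterable, Iterator, List

Message = Dict[str, Any]


def group_by_stream(messages: Iterable[Message], *keys: str) -> Iterator[List[Message]]:
    """Yield lists of consecutive messages sharing the same values for the given keys."""
    # groupby closes a group as soon as the key tuple changes,
    # so only one group has to be held in memory at a time
    for _, group in groupby(messages, key=lambda m: tuple(m.get(k) for k in keys)):
        yield list(group)


# Hypothetical metadata records standing in for GRIB messages
messages = [
    {"param": "t", "level": 500},
    {"param": "u", "level": 500},
    {"param": "t", "level": 850},
    {"param": "u", "level": 850},
    {"param": "v", "level": 500},
]

groups = list(group_by_stream(iter(messages), "level"))
levels = [[m["level"] for m in g] for g in groups]
```

Here level 500 ends up in two separate groups because its messages are not contiguous in the stream. Grouping based on the metadata of the full dataset would not be affected by message order in this way.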

We could read the whole stream into memory with the read_all option:

ds = from_source("url", "http://..../my_data.grib", stream=True, read_all=True)

# ds is now a Fieldlist in memory, so all these work
len(ds)
r = ds.sel(param="t")

for f in ds:
    # f is now a Field
    pass

for f in ds.batched(2):
    # f is now a Fieldlist with 2 Fields
    pass
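
The difference between a consumed stream and a fully read one can be shown with plain Python objects. This is only an analogy, assuming a generator as a stand-in for the stream and a list as a stand-in for the in-memory Fieldlist; `fake_stream` and the string names are hypothetical.

```python
from typing import Iterator, List


def fake_stream() -> Iterator[str]:
    """Hypothetical stand-in for a stream of fields, yielding names only."""
    yield from ["t_500", "t_850", "u_500"]


# Streaming: a single pass, and no len() without reading everything
s = fake_stream()
first_pass = list(s)
second_pass = list(s)  # empty, because the stream was consumed by the first pass

# Analogue of read_all=True: materialise everything up front
in_memory: List[str] = list(fake_stream())
count = len(in_memory)                                    # length is known
selected = [f for f in in_memory if f.startswith("t_")]   # selection works
again = list(in_memory)                                   # repeated iteration works
```

The trade-off is memory: the materialised form holds every item at once, while the streaming form holds only what the current iteration step needs.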

sandorkertesz added the enhancement New feature or request label Apr 16, 2024

sandorkertesz self-assigned this Apr 18, 2024

sandorkertesz mentioned this issue Apr 22, 2024

Feature/refactor streaming #371

Merged

sandorkertesz closed this as completed in #371 May 15, 2024
