Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor array fieldlists #471

Merged
merged 15 commits into from
Oct 17, 2024
Merged

Conversation

sandorkertesz
Copy link
Collaborator

@sandorkertesz sandorkertesz commented Sep 22, 2024

This PR introduces the following changes:

Array fieldlists

(array fieldlist = fieldlist containing in-memory fields each with a values array and a metadata object)

  • Adds SimpleFieldList, which is a fieldlist containing a list of arbitrary fields. Part of it was already implemented under the name FieldArray. FieldArray was kept as an alias because anemoi-datasets uses it. SimpleFieldList is a mutable object because fields can be appended to it with append(). It is a top level object and can be directly imported from earthkit-data. Can be used like this:
from earthkit.data import SimpleFieldList

ds = SimpleFieldList()
for ...
    f = .... # make a field
    ds.append(f)

my_fields = [...]
ds = SimpleFieldList(my_fields)
  • @staticmethod FieldList.from_fields() creates a SimpleFieldList
  • Removes ArrayFieldList. Its functionality is implemented by SimpleFieldList
  • ArrayField becomes a top level object and can be directly imported from earthkit-data
  • @staticmethod FieldList.from_array() returns a SimpleFieldList containing ArrayFields. Previously it returned an ArrayFieldList.
ds = FieldList.from_array(array_list, metadata_list)
  • The "list-of-dicts" source returns a SimpleFieldList containing ArrayFields.

Array backends

  • The array-api-compat package becomes a mandatory dependency. It provides the array namespaces.
  • Simplifies the array backend implementation and Field/FieldList now does not contain any array backend related objects.
  • Removes the array_backend option from the FieldList constructor. It means we cannot load a GRIB fieldlist from file/stream which mimics as if its data was stored in a given array backend format. Most probably this feature was not used at all.
  • We can still create an array fieldlist with the given array backend by using the to_fieldlist() method on a FieldList or by using from_array()
# create an array fieldlist with numpy arrays
ds = from_source("file", "my.grib").to_fieldlist()
ds = from_source("file", "my.grib").to_fieldlist(array_backend="numpy")

# create an array fieldlist with torch tensors
ds = from_source("file", "my.grib").to_fieldlist(array_backend="pytorch")

# create an array fieldlist with torch tensors
array =  ... # torch tensor
md = ... # list of metadata objects
FieldList.from_array(array, md)

Questions/Major changes

  • can array-api-compat be a mandatory dependency? Or only try to rely on it when other than numpy arrays are used?
  • mutable SimpleFieldList (append() method)
  • exposing SimpleFieldList (from earthkit.data import SimpleFieldList)
  • exposing ArrayField (from earthkit.data import ArrayField)
  • removal of array_backend from FieldList/from_source()
  • what should be the preferred way to build an array fieldlist during the computations? The current recommendation is this:
# create an empty fieldlist
ds_r = FieldList()

for f in fs:
    p = f.metadata("level")*100. # hPa -> Pa
    t_new = potential_temperature(f.values, p)
    md_new = f.metadata().override(shortName="pt")
    
    # create new numpy fieldlist with a single field
    ds_new = FieldList.from_array(t_new, md_new)

    # add it to the resulting fieldlist
    ds_r += ds_new

With SimpleFieldList and ArrayField we can rewrite it as:

# create an empty fieldlist
ds_r = SimpleFieldList()

for f in fs:
    p = f.metadata("level")*100. # hPa -> Pa
    t_new = potential_temperature(f.values, p)
    md_new = f.metadata().override(shortName="pt")
    
    # create new numpy field and add it to the resulting fieldlist
    ds_r.append(ArrayField(t_new, md_new))

Or (needs to be implemented) in the previous code we could even use a FieldList instead of SimpleFieldList as a starting point.

# create an empty fieldlist
ds_r = FieldList()

Further questions about ArrayField

Let us suppose f is an ArrayField containing a torch tensor and GRIB metadata. The return type of the following methods are straightforward:

f.to_numpy() -> ndarray
f.to_array() -> torch tensor
f.values -> torch tensor

However, not sure if the following methods should return a torch tensor or an ndarray:

f.to_latlon()
f.to_points()
f.data()
f.grid_points()
f.grid_points_unrotated()

@codecov-commenter
Copy link

codecov-commenter commented Sep 23, 2024

Codecov Report

Attention: Patch coverage is 95.82367% with 18 lines in your changes missing coverage. Please review.

Project coverage is 90.59%. Comparing base (daeef71) to head (f72325f).
Report is 3 commits behind head on develop.

Files with missing lines Patch % Lines
tests/grib/test_grib_backend.py 69.56% 7 Missing ⚠️
tests/utils/test_array.py 76.19% 5 Missing ⚠️
tests/grib/grib_fixtures.py 85.00% 2 Missing and 1 partial ⚠️
tests/grib/test_grib_serialise.py 94.59% 1 Missing and 1 partial ⚠️
tests/grib/test_grib_slice.py 94.44% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #471      +/-   ##
===========================================
- Coverage    90.62%   90.59%   -0.03%     
===========================================
  Files          142      143       +1     
  Lines         9768     9755      -13     
  Branches       461      468       +7     
===========================================
- Hits          8852     8838      -14     
  Misses         761      761              
- Partials       155      156       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@sandorkertesz sandorkertesz self-assigned this Oct 8, 2024
@sandorkertesz
Copy link
Collaborator Author

I will merge this PR to make progress (and required by other features).

@sandorkertesz sandorkertesz merged commit 2656f54 into develop Oct 17, 2024
100 checks passed
@sandorkertesz sandorkertesz deleted the feature/simplify-field-array-backend branch October 17, 2024 09:43
chpolste added a commit that referenced this pull request Oct 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants