
ASDF handling of numpy scalar values (and scalar precision) #1519

Open
braingram opened this issue Apr 11, 2023 · 4 comments

@braingram
Contributor

Based on discussions in recent PRs:
#1474 (comment)
#1475 (comment)

ASDF (and the ASDF standard) should likely clarify how to handle scalar values.

As a short example of what should be clarified: suppose a user adds a scalar (non-array) 'numpy.uint8' to the tree, then saves, then loads the tree. Should the loaded value be a 'numpy.uint8' or an int?
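A minimal illustration of the ambiguity, using numpy directly rather than asdf itself: the dtype carried by a numpy scalar is lost once the value is converted to a built-in int, so a loader cannot recover it from the serialized value alone.

```python
import numpy as np

# A numpy scalar carries dtype information that a built-in int does not.
value = np.uint8(200)
assert value.dtype == np.uint8

# Serializing a tree scalar to YAML typically goes through a plain
# Python int, discarding the dtype.
as_builtin = int(value)
assert type(as_builtin) is int

# Nothing in the plain integer records that it started as a uint8,
# so a loader cannot restore the original type without extra metadata.
```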

To summarize the discussion so far, based on contributions by @perrygreenfield:

What should the standard require?

Perhaps support for "The greatest common types for integers, floats and complex values across standard platforms... int64, float64 and complex128". With larger types possibly supported with associated metadata.

What is required for round-tripping?

"Up to the library", so for python:

  • require permission to write out larger than standard values
  • raise a warning when the library doesn't support the size and return a special object
  • require round-tripping the special objects
  • raise an exception if the standard requirement is not met
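One of these options (raising an exception when the standard requirement is not met) can be sketched as follows; `check_scalar` and its error message are hypothetical, not part of the asdf API:

```python
import numpy as np

# Bounds of the largest standard integer type.
INT64_MIN = np.iinfo(np.int64).min
INT64_MAX = np.iinfo(np.int64).max

def check_scalar(value):
    """Raise if an integer scalar cannot be stored as int64."""
    if isinstance(value, (int, np.integer)):
        if not INT64_MIN <= int(value) <= INT64_MAX:
            raise ValueError(
                f"{value!r} does not fit in int64; "
                "store it as an ndarray instead"
            )
    return value
```

For example, `check_scalar(np.uint64(2**63))` would raise, while `check_scalar(5)` passes the value through unchanged.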

How should the library handle numpy scalars that fit within the standard scalar types?

The library will return the largest (standard) type, which may violate round-tripping. Any user that expects a 'numpy.uint8' to stay a 'numpy.uint8' (and not be promoted to an int) should convert the value separately.
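For example, a user relying on the original dtype would re-cast after loading; a small sketch, assuming the promoted value comes back as a plain int:

```python
import numpy as np

loaded = 200                  # a numpy.uint8 promoted to int on load
restored = np.uint8(loaded)   # the explicit conversion is the user's job
assert restored.dtype == np.uint8
```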

@braingram
Contributor Author

Some guidance is already provided in the standard for integers:
https://asdf-standard.readthedocs.io/en/1.0.3/known_limits.html#literal-integer-values-in-the-tree

One thought, which I think is in partial agreement with what @perrygreenfield described, would be to say that scalars in the tree can be:

  • int64
  • float64
  • complex128

By default, ASDF (the python implementation) will convert any numpy scalar that fits into these types (violating round-tripping; this means numpy.uint8 will become int). For any value outside this range, I'm thinking we should throw an error mentioning the limitation and directing the user to a page with more information and a description of how to save the scalar as an ndarray (which will round-trip without the same scalar issues).

We could add an option to disable the automatic conversion from numpy scalar to builtin type (and instead throw an error).

@braingram
Contributor Author

braingram commented Jul 27, 2023

Copying from: #1605 (comment)

"""
Now that light has been shone on the issue of numpy scalars, I think ASDF should work towards serializing numpy scalars differently than the built-in python scalars. NEP 41 is working towards more complicated dtypes (which would apply to scalars too). One of the main motivations for this effort is to encode units into the dtype itself (for performance). Looking ahead to this suggests that ASDF might want to start considering how to encode the dtype for scalars.
"""

@WilliamJamieson
Contributor

WilliamJamieson commented Jul 27, 2023

Linking this to asdf-format/asdf-standard#396, which is a proposal to add a scalar type to the standard.

@perrygreenfield
Contributor

perrygreenfield commented Jul 27, 2023

I wonder if this is another instance of trying to do too much with ASDF, similar to the argument I made about tuples previously. What sense is there in storing float16 scalars in an ASDF file? I am leaning towards thinking that if people want to deal with these in a consistent way, it is for them to address it specifically rather than add things like dtypes to scalar floats. Can we just assume they are doubles and leave it at that?
