Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

With NumPy 2.0, text/plain + float formatting results in SphinxWarning #610

Closed
bryanwweber opened this issue Jun 16, 2024 · 8 comments · Fixed by #615
Closed

With NumPy 2.0, text/plain + float formatting results in SphinxWarning #610

bryanwweber opened this issue Jun 16, 2024 · 8 comments · Fixed by #615
Labels
bug Something isn't working

Comments

@bryanwweber
Copy link
Contributor

bryanwweber commented Jun 16, 2024

Describe the bug

context
When I do glue a NumPy float into a page with the {glue:text}`variable:<float format>`, I get a SphinxWarning:

sphinx.errors.SphinxWarning: <filename>:257:Failed to format text/plain data: could not convert string to float: 'np.float64(0.9643970836113095)' [mystnb.glue]

expectation
I expected no warning to occur.

problem
This is a problem for people doing formatting floating point NumPy objects because if they turn warnings to errors, their pages won't build.

The cause is described here: https://numpy.org/devdocs/release/2.0.0-notes.html#representation-of-numpy-scalars-changed

Basically, the repr of NumPy types changed to include the NumPy type. As such, the IPython display formatter outputs the literal 'numpy.float64(1.2345)' which obviously can't be formatted as a float. I'm not sure if this should be filed here or with IPython... I landed on here because it affects the glue functionality in particular and because it seems like the display formatter for NumPy types in IPython is probably doing the right thing.

The immediate workaround is as suggested in the NumPy release notes, setting np.set_printoptions(legacy="1.25") fixes the warning.

Reproduce the bug

  1. Install NumPy 2.0
  2. Glue and print a variable:
    ```{code-cell}
    from myst_nb import glue
    import numpy as np
    a = np.float64(1.234)
    glue("var-a", a)
    ```
    {glue:text}:`var-a:.2F`
  3. Build a book/the page and see the warning/error if -W is on.

List your environment

❯ jupyter-book --version
Jupyter Book      : 1.0.0
External ToC      : 1.0.1
MyST-Parser       : 2.0.0
MyST-NB           : 1.1.0
Sphinx Book Theme : 1.1.3
Jupyter-Cache     : 1.0.0
NbClient          : 0.10.0

and most importantly, NumPy >=2.0

@bryanwweber bryanwweber added the bug Something isn't working label Jun 16, 2024
@bryanwweber
Copy link
Contributor Author

Also want to recognize that as of this comment, NumPy 2.0 was released approximately 5 hours ago, so I'm way ahead of the curve. Happy to help contribute a fix as needed!

@flying-sheep
Copy link
Contributor

Also want to recognize that as of this comment, NumPy 2.0 was released approximately 5 hours ago, so I'm way ahead of the curve. Happy to help contribute a fix as needed!

numpy has pre-releases, so it’s very possible to catch and fix issues with it before the official release 😛

@agoose77
Copy link
Collaborator

I don't think this is something that we should try to solve. glue(name, object) is assuming that we can use eval for the text/plain formatting of floats. This is true for built-in floats, but no-longer true (without importing NumPy into the execution context) for NumPy floats.

I don't think we should special-case NumPy here, instead the user should either set the print options, or store the glue object as a Python float. Glue should be given something that's serialisable.

I'm not 100% firm on this personally; I can see merit in a less cumbersome UX. But, with my numeric-hat on; NumPy floats are not Python floats, and this distinction matters.

@bryanwweber
Copy link
Contributor Author

But, with my numeric-hat on; NumPy floats are not Python floats, and this distinction matters.

Agreed in all the contexts where we care what types are being used! But, in this specific case of text formatting for prose, I'd argue the distinction doesn't matter; as a user, I simply want something that I'm using as a number to show up as a specific format.

I don't think we should special-case NumPy here, instead the user should either set the print options, or store the glue object as a Python float... I can see merit in a less cumbersome UX.

I understand that special-casing NumPy here is quite fraught. Where does it stop, should we also do this for SciPy objects? etc. I personally think NumPy floats are common enough that having a special case will be a huge benefit to avoid questions - and especially inscrutable errors/warnings! That said, although I think a fix is warranted, I don't whether it belongs here or in IPython or even in NumPy. Ideally, NumPy objects would know how to format themselves; on the other hand, glue is using the text/plain mimetype here, which should also be displayed in the REPL where clearly indicating numpy.float is important. Adding a new mimetype might be an option, but then that would have to go into IPython and support added here in MyST-NB, and it would be even more confusing that the current situation. Another option would be to avoid serializing to text before actually pasting, but that's a total reversal of how the gluing system works now where all the mimetypes available for an object are stored as metadata on a cell.

The only reasonable solution I can see in the short term is to do something like (this is pseudo-code):

def glue(...):
    try:
        import numpy
    except ImportError:
        numpy = None
    if numpy is not None and isinstance(variable, numpy.float):
        variable = float(variable)
    mimebundle, metadata = IPython.core.formatters.format_display_data(variable)
    ...

but then that takes away options from end users as far as how they want to do the conversion.

So, after all that, I've convinced myself that the status quo is the best option, with the exception that I think this behavior should be documented (I had to find it in the NumPy release notes!). I'll write up a PR for a doc improvement, probably over the weekend or early next week if no one beats me to it 😄

@agoose77
Copy link
Collaborator

Yes, I went back and forth on this too. Ultimately, doing any kind of float conversion is "magic" and changes what is actually "stored". We should do a docs pass, I think.

@flying-sheep
Copy link
Contributor

flying-sheep commented Jun 28, 2024

glue:text seems to be fundamentally flawed if ends up taking a round-trip through repr before applying formatting specifiers.

If you want to serialize things for later, you need to use a serialization format, not repr, which is documented to “often” be able to work as one, but not reliably:

Return a string containing a printable representation of an object. For many types, this function makes an attempt to return a string that would yield an object with the same value when passed to eval();

The only standard Python serialization format is pickle, which would mean that you’d have to have the same packages installed in the environment that unpickles the glued values as there were when pickling them.

Alternatively you could make the serializer/deserializer configurable, and provide a default that can handle basic Python types and throws an error when it can’t handle something (e.g. json.dumps/loads).

@agoose77
Copy link
Collaborator

@flying-sheep it would seem to me that the intention of glue has been muddied. At its core, it provides a way to name MIME-bundles. The glue:text functionality is an addition which implies interpretation of the contents, i.e. glue:text assumes the content is a number if you pass in the proper format specifiers. To do that properly, one should perform the MIME-bundle generation at document-rendering time, rather than at execution time. But that would introduce peculiar bugs; the execution environment would no longer have any control over the representation of the object, etc., and moreover, it would presume a specific kernel type.

For now, I think the best path is what we have. Long-term, things like the {eval} role are a better way to do this - shifting the specific formatting work to the kernel.

@bryanwweber
Copy link
Contributor Author

I was just looking at the formatting code:

def format_plain_text(text: str, fmt_spec: str) -> str:
"""Format plain text for display in a docutils node."""
# literal eval, to remove surrounding quotes
try:
value = literal_eval(text)
except (SyntaxError, ValueError):
value = text
if fmt_spec == "":
return str(value)
type_char = fmt_spec[-1]
if type_char == "s":
value = str(value)
elif type_char in ("b", "c", "d", "o", "x", "X"):
value = int(value)
elif type_char in ("e", "E", "f", "F", "g", "G", "n", "%"):
value = float(value)
return format(value, fmt_spec)

I wonder if it makes sense to check whether the string numpy or np exists in the text and extract the float from it? Something like:

def format_plain_text(text: str, fmt_spec: str) -> str:
    ...
    elif type_char in ("e", "E", "f", "F", "g", "G", "n", "%"):
        import re
        is_np = re.match(text, r"(np)|(numpy)\.float\d+\((\d+\.\d*)\))
        if is_np is not None:
             value = is_np.group(1)
        value = float(value)
    return format(value, fmt_spec)

Other than the problem with special-casing NumPy, the only problems I see here are numpy not imported as np or numpy and that the precision in the repr is limited. This would at least resolve most of the most common cases, I expect...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants