Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-81022: Supporting customization of float encoding in JSON #13233

Closed
wants to merge 9 commits into from

Conversation

Lee-W
Copy link

@Lee-W Lee-W commented May 10, 2019

Add an encode_float argument to JSONEncoder for supporting customization float encoding

@the-knights-who-say-ni
Copy link

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Our records indicate we have not received your CLA. For legal reasons we need you to sign this before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

If you have recently signed the CLA, please wait at least one business day
before our records are updated.

You can check yourself to see if the CLA has been received.

Thanks again for your contribution, we look forward to reviewing it!

@Lee-W Lee-W changed the title Bpo 36841: Supporting customization of float encoding in JSON bpo-36841: Supporting customization of float encoding in JSON May 10, 2019
Copy link

@mitar mitar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think also some tests would be useful?

Lib/json/__init__.py Outdated Show resolved Hide resolved
@Lee-W
Copy link
Author

Lee-W commented Jul 17, 2019

@mitar I've added a test for the newly added encode_float argument.

@cmnord
Copy link

cmnord commented Jul 23, 2019

Rather than add another argument to these functions, why not un-nest floatstr from iterencode so that it can be overridden?

@mitar
Copy link

mitar commented Jul 23, 2019

I think the idea is that one can also use C code-path for this? Why it is an argument for decoding and not a method?

@cmnord
Copy link

cmnord commented Jul 23, 2019

@mitar I don't fully understand your comment. Could you clarify what you mean?

What I meant is that it would be nice to be able to do the following:

import json
import numpy as np


class MyJSONEncoder(json.JSONEncoder):
    def floatstr(self, o, _repr=float.__repr__, _inf=json.INFINITY, _neginf=-json.INFINITY):
        if o != o:
            text = "None"
        elif o == _inf:
            text = "Infinity"
        elif o == _neginf:
            text = "-Infinity"
        else:
            return _repr(o)

        if not self.allow_nan:
            raise ValueError(
                "Out of range float values are not JSON compliant: " + repr(o)
            )

        return text


assert json.dumps(np.nan) == "None"

So rather than adding a new argument to many functions, we could achieve the same goal of custom float encoding this way. What do you mean by "C code-path" and "decoding and not a method"?

@mitar
Copy link

mitar commented Jul 23, 2019

  • In load we already have parse_float and parse_int. Why do you think this is like that, and not simply a method on cls?
  • I think the reason why is because allows a fast code path in optimized C version of the parser.
  • So, we should probably do a similar thing in encoding as well, no? So while your suggestion works, I think the idea is that it is faster if you just have to change those two. But we should probably measure that and see if truly is. But currently, the API here tries to match decoding. It might be confusing that for decoding you have function callbacks, but for encoding you have methods?

@cmnord
Copy link

cmnord commented Jul 23, 2019

@mitar thanks for clarifying, I think I understand now. 🙂

@mitar
Copy link

mitar commented Jul 23, 2019

If you do have time, you could try to measure how much does this really benefit. You could try moving parsing of floats to a class in decoding. And see if that really slows down things. Personally I also do not like extra functions, especially if we already have a nice class to put methods on.

@Lee-W
Copy link
Author

Lee-W commented Nov 10, 2019

@mitar Could you please explain more on what should I measure? It seems you already explain it. Thanks 🙂

@mitar
Copy link

mitar commented Nov 11, 2019

I would propose that you measure and compare two cases a) having methods on the class b) having functions directly provided.

@Lee-W
Copy link
Author

Lee-W commented Nov 12, 2019

I've tested on Python 3.8.0a4+ using the code below.
The result is as follows.

Encode
Default:  0.19853970199999998
Argument:  0.31803161799999996
Class:  0.282146192
import json
import timeit


INFINITY = json.encoder.INFINITY


class MyJSONEncoder(json.JSONEncoder):
    def floatstr(self, obj,  _repr=float.__repr__, _inf=INFINITY, _neginf=-INFINITY):
        if obj != obj:
            text = "None"
        elif obj == json.encoder.INFINITY:
            text = "Infinity"
        elif obj == -json.encoder.INFINITY:
            text = "-Infinity"
        else:
            return float.__repr__(obj)

        if not self.allow_nan:
            raise ValueError(
                "Out of range float values are not JSON compliant: " + repr(obj)
            )

        return text


def floatstr(obj, allow_nan=True, _repr=float.__repr__, _inf=INFINITY, _neginf=-INFINITY):
    if obj != obj:
        text = "None"
    elif obj == json.encoder.INFINITY:
        text = "Infinity"
    elif obj == -json.encoder.INFINITY:
        text = "-Infinity"
    else:
        return float.__repr__(obj)

    if not allow_nan:
        raise ValueError(
            "Out of range float values are not JSON compliant: " + repr(obj)
        )
    return text


obj = {
    'inf': float('inf'),
    '-inf': -float('-inf'),
    'nan': float('nan'),
    '2': 2.23,
}

print('Encode')
print('Default: ', timeit.timeit(lambda: json.dumps(obj), number=10000))
print('Argument: ', timeit.timeit(lambda: json.dumps(obj, encode_float=floatstr), number=10000))
print('Class: ', timeit.timeit(lambda: json.dumps(obj, cls=MyJSONEncoder), number=10000))

@Lee-W
Copy link
Author

Lee-W commented Jan 11, 2020

@mitar Do I need to do other experiments on it? Or, would the above one be sufficient? 🙂

@mitar
Copy link

mitar commented Jan 11, 2020

I will leave to somebody else from the Python team to way in here.

@mitar
Copy link

mitar commented Dec 15, 2020

@Lee-W Please update the PR, there is now a merge conflict.

I really like this PR, could somebody from Python core team review/merge this?

@Lee-W
Copy link
Author

Lee-W commented Dec 16, 2020

@mitar Thanks for reminding. I just fix the conflict.

"Out of range float values are not JSON compliant: " +
repr(o))

return text
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comment: why move the floatstr function definition inline here? it could stay as a regular method in the class, then the line below would be self.encode_float = self.floatstr and still work.

@mRemyLynceus
Copy link

@python's team : What's going on ?

Modules/_json.c Outdated Show resolved Hide resolved
@merwok
Copy link
Member

merwok commented Jun 4, 2023

As I noted on the ticket, this should be discussed to reach agreement on the need and the shape of the feature first.

@AlexWaygood AlexWaygood changed the title bpo-36841: Supporting customization of float encoding in JSON gh-81022: Supporting customization of float encoding in JSON Jun 4, 2023
@Lee-W
Copy link
Author

Lee-W commented Aug 6, 2023

Got it. I just wanted to resolve the previous conflict. As this is not yet agreed, let me close this PR

@Lee-W Lee-W closed this Aug 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
DO-NOT-MERGE type-feature A feature request or enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants