Add config option to specify json float serialization precision #905

Gatsik · 2022-05-08T21:12:55Z

Closes #875

Gatsik · 2022-05-12T21:58:38Z

test_ladder_game_draw_bug and test_game_ended_broadcasts_rating_update are failing, probably because of the customized precision.
Do you want this custom precision only for the queue_pop_time_delta field in the matchmaker_info message?

Askaholic · 2022-05-13T04:45:04Z

Hey, nice work! I think it makes sense to use the same precision for everything, so those tests probably need to be adjusted to have higher tolerance for error.

The second one is interesting because it means there will be some edge cases where the rating change is so small that it doesn’t appear to change at all. I think that’s ok though since the full change history will still be available through the replay details page and the rating graph.
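As a small illustration of that edge case, here is a minimal sketch. The precision of 2 and the rating values are made up for the example and are not taken from the server:

```python
import math

PRECISION = 2  # hypothetical precision, chosen only for this illustration

old_rating = 1500.1234
new_rating = 1500.1249  # a very small rating change

# After rounding, both values serialize identically, so the change is invisible.
print(round(old_rating, PRECISION) == round(new_rating, PRECISION))

# A test that compares against an unrounded expectation needs a tolerance
# of about one unit in the last kept decimal place.
expected = 1500.1249
received = round(expected, PRECISION)
print(math.isclose(received, expected, abs_tol=10 ** -PRECISION))
```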

Askaholic · 2022-05-16T23:27:04Z

One thing we have to keep in mind is that if we’re messing with the json encoding, we need to be very careful about the performance impact, since this is one of the hottest functions in the entire code base. It would be amazing if we could just override the float serialization code and leave the rest of the implementation untouched. I found a little snippet that does this, although it uses mock.patch to override private functions from the json library. Still, it may be a good source of inspiration:

https://gist.github.com/Sukonnik-Illia/ed9b2bec1821cad437d1b8adb17406a3

Gatsik · 2022-05-22T17:19:22Z

It looks like we can't do much without creating a custom Python package that uses compiled C code.
I compared the different methods using a slightly modified script taken from https://developpaper.com/speed-comparison-of-five-json-libraries-in-python/ and here are the encoding results on my machine:

patching (example given above): 107.800
current (this PR's code): 28.629

Default serialization

simplejson: 6.981
ujson: 3.503
stdlib json: 4.500

Round floats approach (https://stackoverflow.com/a/53798633):

simplejson: 14.196
ujson: 10.481
stdlib json: 11.109

Overriding the iterencode function, which contains the floatstr method, directly without patching:

15.991

Compiling custom mix of json and simplejson (most of the code is a copy from python/cpython#13233, built with setup.py from simplejson):

6.589

Times are in seconds. The data being encoded is:

{
    "command": "game_info",
    "visibility": "public",
    "password_protected": True,
    "uid": 13,
    "title": "someone's game",
    "state": "playing",
    "game_type": "custom",
    "featured_mod": "faf",
    "sim_mods": {},
    "mapname": "scmp_009",
    "map_file_path": "maps/scmp_009.zip",
    "host": "Foo",
    "num_players": 2,
    "launched_at": 1111111111.1112312312312,
    "rating_type": "faf",
    "rating_min": None,
    "rating_max": None,
    "enforce_rating_range": False,
    "team_ids": [
        {
            "team_id": 1,
            "player_ids": [1],
        },
        {
            "team_id": 2,
            "player_ids": [2],
        },
    ],
    "teams": {
        1: ["Foo"],
        2: ["Bar"],
    },
}

There is also a package called orjson, which is very fast but requires dict keys to be strings. With the initial data modified so that all keys are strings, the result is:

0.624

Otherwise, using the "round floats" approach in which we also convert dict keys to str:

8.457
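A comparison along these lines could be reproduced roughly as follows. This is only a sketch, not the script used for the numbers above: it covers stdlib json only, uses a shortened stand-in for the game_info message, and the repetition count and the `ndigits` default are arbitrary assumptions.

```python
import json
import timeit


def round_floats(o, ndigits=2):
    """Return a copy of `o` with floats rounded and dict keys converted to str."""
    if isinstance(o, float):
        return round(o, ndigits)
    if isinstance(o, dict):
        return {str(k): round_floats(v, ndigits) for k, v in o.items()}
    if isinstance(o, (list, tuple)):
        return [round_floats(x, ndigits) for x in o]
    return o


# Shortened stand-in for the game_info message (an assumption for brevity).
data = {
    "command": "game_info",
    "launched_at": 1111111111.1112312312312,
    "rating_min": None,
    "teams": {1: ["Foo"], 2: ["Bar"]},
}

# Time plain encoding against encoding a rounded copy of the same data.
plain = timeit.timeit(lambda: json.dumps(data), number=10_000)
rounded = timeit.timeit(lambda: json.dumps(round_floats(data)), number=10_000)
print(f"stdlib json: {plain:.3f}s, stdlib json + round floats: {rounded:.3f}s")
```

The rounded variant copies the whole structure on every call, which is where the extra time in the "round floats" rows would be expected to come from.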

Askaholic · 2022-05-27T05:57:03Z

server/protocol/protocol.py

+
+class CustomJSONEncoder(json.JSONEncoder):
+    # taken from https://stackoverflow.com/a/53798633
+    def encode(self, o):


I like this way the best out of all the ones proposed so far. Have you measured the performance difference for this one? I imagine it would be noticeable especially for dicts since it makes a copy of the entire data structure, but maybe it's not too bad.

I think the other way that would have merit would be to make a wrapper type around float and use default to format it. That would require us to explicitly set the float rounding everywhere, but it would also give more control if we wanted to round rating to 3 or 4 places but timestamps only to 2.

Oh sorry, I see this is the 'round floats' approach you mentioned in your other comment.

Gatsik · May 28, 2022

I don't quite understand what you mean by a wrapper, because if it requires us to call this wrapper every time we do some math, then why not just use round?
Rewriting the representation of the float won't help: the C code operates on the value and encodes all the digits.

Askaholic · May 28, 2022

Something similar to this, but using our own class instead of Decimal: https://stackoverflow.com/a/3885198

class PFloat:
    def __init__(self, value: float, precision: int):
        self.value = value
        self.precision = precision

And then in the to_dict method you'd have to wrap the floats in this class:

def to_dict(self):
    return {
        "some_float": PFloat(self.some_float, 2)
    }

Gatsik · May 28, 2022

Yeah, I can make a class, but I think it is an unnecessary complication, because we already have the built-in round function, which effectively does the same thing.

Like, we have 2 options:

Rewrite json's encode method to change the encoding of all floats

Rewrite every to_dict method so that each float is rounded with its own precision (the "more control" mentioned above)
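For reference, the wrapper-type idea from this thread could look roughly like the sketch below when combined with the `default` hook of stdlib json. The `encode_pfloat` helper, the dataclass form of `PFloat`, and the sample values are assumptions for illustration, not code from this PR.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class PFloat:
    """A float tagged with the precision it should be serialized with."""
    value: float
    precision: int


def encode_pfloat(o):
    # json only calls `default` for objects it cannot serialize natively,
    # so plain floats never reach this function.
    if isinstance(o, PFloat):
        return round(o.value, o.precision)
    raise TypeError(f"Object of type {type(o).__name__} is not JSON serializable")


# Each field chooses its own precision, e.g. 3 places for a rating, 2 for a timestamp.
message = {
    "rating": PFloat(1523.456789, 3),
    "launched_at": PFloat(1111111111.1112312, 2),
}
encoded = json.dumps(message, default=encode_pfloat)
print(encoded)
```

Functionally this matches calling round inside each to_dict method. The difference is only where the precision is declared.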

server/servercontext.py

tests/unit_tests/test_protocol.py

Askaholic · 2022-05-27T06:08:58Z

server/protocol/protocol.py

+            if isinstance(o, (list, tuple)):
+                return [round_floats(x) for x in o]
+            return o
+        return super().encode(round_floats(o))


Maybe we should add a config option to disable this feature just in case it turns out to be too costly in production. Although, I think even if it doubles the json encoding time that will still be fine judging by the profiling information I've collected from the server in the past.

Gatsik · May 28, 2022

Yes, sure, what would you call it?

Askaholic · May 28, 2022

I think JSON_ROUND_FLOATS would be good. And then rename the other one to JSON_ROUND_FLOATS_MAX_DIGITS or JSON_ROUND_FLOATS_PRECISION.

Gatsik · May 28, 2022

Should mocking also be conditional in tests?

Askaholic · May 28, 2022

I don't think it's too important to have tests for the config values. They're generally simple enough that it doesn't matter.
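Putting the pieces of this thread together, a config-gated version of the encoder might look like the sketch below. The two module-level constants stand in for the server's config object and their default values are assumptions. Only the option names and the rounding logic come from the discussion above.

```python
import json

# Stand-ins for the server's config values; the names follow the suggestion
# above, the default values are assumptions.
JSON_ROUND_FLOATS = True
JSON_ROUND_FLOATS_PRECISION = 2


class CustomJSONEncoder(json.JSONEncoder):
    # Rounding approach from https://stackoverflow.com/a/53798633

    def encode(self, o):
        def round_floats(o):
            if isinstance(o, float):
                return round(o, JSON_ROUND_FLOATS_PRECISION)
            if isinstance(o, dict):
                return {k: round_floats(v) for k, v in o.items()}
            if isinstance(o, (list, tuple)):
                return [round_floats(x) for x in o]
            return o

        if not JSON_ROUND_FLOATS:
            # Feature disabled: fall back to the stock encoder untouched.
            return super().encode(o)
        return super().encode(round_floats(o))


print(json.dumps({"queue_pop_time_delta": 12.3456789}, cls=CustomJSONEncoder))
```

With the flag off, the only extra cost is one attribute lookup and a branch per call, so the option is cheap to keep around as a safety switch.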

Gatsik force-pushed the issue/#875-maximum-precision-for-float-serialization branch 2 times, most recently from bf7c3aa to 86d642a Compare May 12, 2022 21:42

Askaholic reviewed May 27, 2022

Gatsik added 3 commits May 28, 2022 22:28

Call logger.exception with required argument

d6da481

pre-commit autoupdate

16c70ce

Round floats before encoding

c61d227

Gatsik force-pushed the issue/#875-maximum-precision-for-float-serialization branch from 2a356ce to c61d227 Compare May 28, 2022 19:33

Add option to disable float rounding for json encoding

e4173b6
