Use float and possibly half in json::to_cbor #1719

misos1 · 2019-08-22T17:53:44Z

Describe the feature in as much detail as possible.

As with integer types, why not encode floating-point values as compactly as possible? For example, 0.5 is currently encoded as FB 3F E0 00 00 00 00 00 00, but it could be F9 38 00. So if half precision is enough to store the value without loss, store it as half. If not, try single precision, and if that fails too, store it as double precision. One way would be to convert the double to the lower precision and check whether the two values compare equal. Or maybe this can be done by simply testing the least significant bits of the mantissa and the exponent. If half would be too expensive due to scarce native hardware support, then at least support single-precision floats.

nlohmann · 2019-11-05T19:10:31Z

Any idea how to realize this?

rvjr · 2019-11-18T13:54:56Z

We actually noticed this too as we needed very small binaries; we fixed it internally as in the attached patch. So far we have not found any problems with this variant. As misos1 suggested we simply test if the value stays the same if we truncate it to float precision.

0001-SDK-341-Implemented-CBOR-serialization-with-32bit-fl.zip
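
The cast-and-compare test described above can be sketched roughly as follows. This is an illustration only and is not taken from the attached patch; the helper name `fits_in_float` is hypothetical. It assumes IEEE 754 `float` and `double`, and it deliberately rejects NaN and infinity so that those can be handled separately.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper (not from the attached patch): returns true if `value`
// survives a round trip through float unchanged.
bool fits_in_float(double value)
{
    // Range check first: narrowing a finite double that lies outside the
    // float range is undefined behavior in C++. The negated comparison also
    // rejects NaN (every comparison with NaN is false) and infinity.
    if (!(std::fabs(value) <= std::numeric_limits<float>::max()))
    {
        return false;
    }
    // Narrow, widen again, and compare with the original.
    return static_cast<double>(static_cast<float>(value)) == value;
}
```

With this check, 0.5 would be written as a 5-byte float (0xFA), while 0.1 would stay a 9-byte double (0xFB).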

nlohmann · 2019-11-18T14:40:59Z

Thanks for sharing! I'll have a look.

misos1 · 2019-11-18T14:55:28Z

It is done similarly also in other libraries:

https://github.com/pyfisch/cbor/blob/master/src/ser.rs#L311-L342
https://github.com/hildjj/node-cbor/blob/master/lib/encoder.js#L203-L229

There are edge cases: NaN does not compare equal to itself.

But is this really the best way? What about the other suggestion of testing bits?

Would be great to also have half.

misos1 · 2019-11-18T17:19:18Z

I suppose that if the hardware supports such a conversion, it will be cheaper to convert the number to lower precision and compare it with the original. Testing bits would always require more instructions than that. If the hardware does not support it, converting back and forth would be overhead, and whether the conversion is lossless can be decided during the conversion itself.

I was probably wrong about scarce native support for half in hardware. It seems to be rather ubiquitous.

So, after checking edge cases like infinities and NaN (which can simply be stored as half): if the double value stays the same after conversion to float, try to serialise it as float or half (else store it as double). If there is hardware support for half and the float value stays the same after conversion to half and back, store it as half (else store it as float). If there is no hardware support (and we are willing to do this), convert it manually, and if the conversion looks lossless, store it as half (else store it as float). So support half at least when there is hardware support. I also noticed that some libraries store integral floating-point numbers as integers where possible, but this would mean loss of type information.
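
The decision procedure outlined above might be sketched as follows, using a purely software half conversion. All names here (`float_to_half_exact`, `encode_smallest`) are hypothetical and this is not the library's implementation; it assumes IEEE 754 `float` and `double` and writes NaN as the half pattern 0x7E00, which discards any NaN payload.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>
#include <vector>

// Hypothetical helper: if finite `f` is exactly representable as an IEEE 754
// half, store the half bit pattern in `half` and return true.
bool float_to_half_exact(float f, std::uint16_t& half)
{
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    const std::uint32_t sign = (bits >> 16) & 0x8000u;
    const int biased_exp = static_cast<int>((bits >> 23) & 0xFFu);
    const std::uint32_t mantissa = bits & 0x007FFFFFu;

    if (biased_exp == 0)
    {
        // +/-0 fits; float subnormals are smaller than any nonzero half.
        half = static_cast<std::uint16_t>(sign);
        return mantissa == 0;
    }
    const int exp = biased_exp - 127;
    if (exp >= -14 && exp <= 15)
    {
        // Normal half: the 13 low mantissa bits must be zero.
        if ((mantissa & 0x1FFFu) != 0) { return false; }
        half = static_cast<std::uint16_t>(
            sign | (static_cast<std::uint32_t>(exp + 15) << 10) | (mantissa >> 13));
        return true;
    }
    if (exp >= -24 && exp <= -15)
    {
        // Subnormal half: value = m * 2^-24 with m in [1, 1023].
        const std::uint32_t significand = 0x00800000u | mantissa;
        const int shift = -exp - 1; // between 14 and 23
        if ((significand & ((1u << shift) - 1u)) != 0) { return false; }
        half = static_cast<std::uint16_t>(sign | (significand >> shift));
        return true;
    }
    return false; // out of half range (also rejects infinity and NaN)
}

// Hypothetical encoder: pick the shortest CBOR float that keeps the value.
std::vector<std::uint8_t> encode_smallest(double value)
{
    // Edge cases first: NaN and infinities are written as half.
    if (std::isnan(value)) { return {0xF9, 0x7E, 0x00}; }
    if (std::isinf(value))
    {
        return {0xF9, static_cast<std::uint8_t>(value < 0 ? 0xFC : 0x7C), 0x00};
    }

    std::vector<std::uint8_t> out;
    // Append the tag byte followed by the payload in big-endian order.
    auto append = [&out](std::uint8_t tag, std::uint64_t bits, int bytes) {
        out.push_back(tag);
        for (int shift = 8 * (bytes - 1); shift >= 0; shift -= 8)
        {
            out.push_back(static_cast<std::uint8_t>(bits >> shift));
        }
    };

    // The range check avoids an out-of-range narrowing conversion.
    if (std::fabs(value) <= std::numeric_limits<float>::max())
    {
        const float f = static_cast<float>(value);
        if (static_cast<double>(f) == value)
        {
            std::uint16_t h = 0;
            if (float_to_half_exact(f, h)) { append(0xF9, h, 2); return out; }
            std::uint32_t fbits;
            std::memcpy(&fbits, &f, sizeof(fbits));
            append(0xFA, fbits, 4);
            return out;
        }
    }
    std::uint64_t dbits;
    std::memcpy(&dbits, &value, sizeof(dbits));
    append(0xFB, dbits, 8);
    return out;
}
```

Under these assumptions, `encode_smallest(0.5)` yields the three bytes F9 38 00 from the original request, while 0.1 still needs the full nine-byte double encoding.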

rvjr · 2019-11-18T20:25:42Z

You're right, special handling of Inf and NaN would be nice, because these can be written as floats without any loss of information. But I don't think supporting half makes sense at this point, because at least on x86_64 there are not even scalar instructions for converting half to float. Where do you see the benefit of half, and on which architecture?

misos1 · 2019-11-18T20:50:38Z

It would help produce even smaller CBOR binaries. It is not about architecture but about conciseness. On x86_64 the instruction converts 4 numbers at once, but this does not mean it is 4x slower than a conversion from double to float; it is SIMD. ARM also seems to support half. In any case, conversion to half can also be done in software, just as conversion from half is done now (there could be a switch controlling whether floats are encoded as halves when there is no hardware support).

rvjr · 2019-11-18T21:09:08Z

Yes of course this is not a question of conversion performance. For CBOR the resulting binary size is all that matters. I just meant it's maybe a lot of work to do this correctly including detection of supported instruction set, fall back to a correct manual conversion, etc., for some very rare cases in which you gain significant size reduction...
So, feel free to implement it :-)

nlohmann · 2019-11-19T18:50:33Z

Let me try to summarize before I misunderstand anything:

We do have a patch that tries to serialize floating-point numbers to float (0xFA) rather than double (0xFB) if possible.
We could use a similar approach to also try half-precision floats (0xF9), but this would only reduce the size of the generated binary value while hurting performance, as generating and parsing half-precision floats means some overhead.
Both approaches are similar to the way we treat integers in the sense that we try to use the most compact representation. However, for integers this (a) does not involve round-tripping, and (b) all integer types enjoy good hardware support.

Is this right?

If so, maybe adding parameters to the to_cbor function could control such conversions just like the use_size and use_type parameters in to_ubjson. What do you think?

misos1 · 2019-11-19T19:53:46Z

As mentioned above, that patch would probably need small modifications to handle edge cases; for example, a == a is false when a is a float or double NaN.
It could have an impact when there is no hardware support for half. Hardware half conversion could also be used in CBOR deserialisation instead of the current software approach. Why not use a third-party C++11 header-only library for half manipulation, for example http://half.sourceforge.net/? It uses hardware half conversions when possible, with a software fallback. It seems that enabling it with clang and gcc on x86-64 requires passing the option -mf16c or similar.
Yes, integers are easier to check.

Yes, parameters to to_cbor would be a good idea for better control over half behaviour. But they are not needed for double-to-float conversions. Other formats such as MessagePack could also store numbers as 32-bit floats where possible.

nlohmann · 2019-12-02T20:34:20Z

Thanks! I like the idea of using float where possible, but I am very hesitant to pull in a 200 KB header for half support...

misos1 · 2019-12-02T21:10:27Z

Maybe only the part of it that does the conversions could be extracted, or some other library could be used.

stale · 2020-01-01T21:32:32Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

dota17 · 2020-03-25T08:39:59Z

Should we discuss this issue again to prevent it from being ignored?

rvjr · 2020-03-25T10:53:23Z

Well I have a strong preference for the simple cast-and-compare approach. Maybe with some special handling of NaN and Inf, which could be stored as float and would otherwise be stored as double. Including another library just for this tiny feature really bloats the code footprint. And I think keeping the json library compact should be preferred over hardware support of half conversions, which is still rare and the benefit is probably negligible on almost all platforms.

dota17 · 2020-03-26T07:17:45Z

Regarding the NaN and infinity you mentioned, I looked at the code and the corresponding documentation and found the following:
NaN and Inf are encoded as half-precision floats; the from_cbor function can handle this type, but the to_cbor function doesn't support half- and single-precision floats. That is to say, we can deserialize 0xf97c00 (which means Infinity) to null, but we can't serialize Infinity to 0xf97c00 directly.
So, how about adding NaN/Inf support to the to_cbor function first?
Hope I didn't miss something important.
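
For reference, decoding the two payload bytes of a half-precision value such as 0xf97c00 can be sketched as below, modelled on the decoding procedure given in RFC 7049, Appendix D. This is an illustration and not the library's actual from_cbor code; the name `decode_half` is an assumption.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper: turn a 16-bit IEEE 754 half bit pattern into a double.
double decode_half(std::uint16_t half)
{
    const int exp = (half >> 10) & 0x1F;  // 5 exponent bits
    const int mant = half & 0x3FF;        // 10 mantissa bits
    double val;
    if (exp == 0)
    {
        // Zero or subnormal: mant * 2^-24
        val = std::ldexp(static_cast<double>(mant), -24);
    }
    else if (exp != 31)
    {
        // Normal: (1024 + mant) * 2^(exp - 25)
        val = std::ldexp(static_cast<double>(mant + 1024), exp - 25);
    }
    else
    {
        // All exponent bits set: infinity or NaN
        val = (mant == 0) ? std::numeric_limits<double>::infinity()
                          : std::numeric_limits<double>::quiet_NaN();
    }
    return (half & 0x8000) ? -val : val;
}
```

Here `decode_half(0x7C00)` gives positive infinity, which matches the 0xf97c00 example once the 0xF9 tag byte is stripped.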

rvjr · 2020-03-26T11:39:29Z

I would simply test the input value with std::isnan() and std::isinf() and serialize std::numeric_limits<float>::quiet_NaN() or std::numeric_limits<float>::infinity() into the stream. All other values can be handled as in my proposed patch; this has worked fine for us for quite some time.
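
A minimal sketch of that special-case handling, assuming IEEE 754 floats and the CBOR float32 tag 0xFA, might look like this. The function name `encode_nan_or_inf` and its signature are hypothetical and do not come from the patch.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>
#include <vector>

// Hypothetical helper: if `value` is NaN or infinite, write it to `out` as a
// 5-byte CBOR float32 and return true; otherwise leave `out` untouched and
// return false so the caller can fall back to the cast-and-compare path.
bool encode_nan_or_inf(double value, std::vector<std::uint8_t>& out)
{
    float special;
    if (std::isnan(value))
    {
        special = std::numeric_limits<float>::quiet_NaN();
    }
    else if (std::isinf(value))
    {
        special = (value < 0) ? -std::numeric_limits<float>::infinity()
                              : std::numeric_limits<float>::infinity();
    }
    else
    {
        return false;
    }

    std::uint32_t bits;
    std::memcpy(&bits, &special, sizeof(bits));
    out.clear();
    out.push_back(0xFA); // CBOR single-precision float
    for (int shift = 24; shift >= 0; shift -= 8)
    {
        out.push_back(static_cast<std::uint8_t>(bits >> shift)); // big-endian
    }
    return true;
}
```

Checking for NaN explicitly sidesteps the problem that a NaN never compares equal to itself, so a plain cast-and-compare would always fall through to the double encoding.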

dota17 · 2020-03-26T12:20:46Z

Yes, that's good. Do you mind if I make some changes based on your patch?

rvjr · 2020-03-26T13:47:39Z

Feel free, that's why I attached it to this thread...

stale · 2020-05-10T12:14:59Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

dota17 · 2020-05-12T07:25:02Z

@rvjr we can't simply handle the type conversion as in your patch, because some float numbers in the test cases are not handled correctly; you can see them in this comment.
Could you please take a look at this?

nlohmann · 2020-05-12T10:50:25Z

I think the patch is fine - the failing tests just assume that the library always returns doubles.

misos1 added the kind: enhancement/improvement label Aug 22, 2019

stale bot added the state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated label Sep 21, 2019

nlohmann removed the state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated label Sep 26, 2019

stale bot added the state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated label Oct 26, 2019

stale bot closed this as completed Nov 2, 2019

nlohmann added the aspect: binary formats BSON, CBOR, MessagePack, UBJSON label Nov 2, 2019

nlohmann reopened this Nov 2, 2019

stale bot removed the state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated label Nov 2, 2019

nlohmann added the state: help needed the issue needs help to proceed label Nov 5, 2019

nlohmann added solution: proposed fix a fix for the issue has been proposed and waits for confirmation and removed state: help needed the issue needs help to proceed labels Nov 18, 2019

stale bot added the state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated label Jan 1, 2020

stale bot closed this as completed Jan 8, 2020

nlohmann reopened this Apr 10, 2020

stale bot removed the state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated label Apr 10, 2020

dota17 mentioned this issue Apr 16, 2020

Fix issue#1719 #2044

Merged

4 tasks

nlohmann linked a pull request Apr 18, 2020 that will close this issue

Fix issue#1719 #2044

Merged

4 tasks

stale bot added the state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated label May 10, 2020

stale bot removed the state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated label May 12, 2020

nlohmann added this to the Release 3.8.0 milestone May 13, 2020

nlohmann self-assigned this May 13, 2020

nlohmann closed this as completed in #2044 May 16, 2020
