sjson.encode should use more digits in floating point numbers #3213

pjsg · 2020-07-16T01:55:32Z

Expected behavior

floating point numbers are represented in JSON as a value that is equivalent to the number

Actual behavior

Numbers like 1594863744.123456 appear as 1.594864e+09 -- which loses a lot of information.

Test code

> =sjson.encode({1594863744.123456})
[1.594864e+09]

NodeMCU startup banner

NodeMCU 3.0.0.0 built with Docker provided by frightanic.com
	branch: fix-spiffs-rw
	commit: 92cb1fb08fbbc9f478c13625e520b4a941805ad3
	release: 3.0-master_20200610 +13
	release DTS: 202007130331
	SSL: false
	build type: float
	LFS: 0x20000 bytes total capacity
	modules: adc,bit,crypto,encoder,enduser_setup,file,gpio,i2c,mdns,mqtt,net,node,rtcmem,rtctime,sjson,sntp,spi,tmr,uart,wifi
 build 2020-07-16 01:33 powered by Lua 5.3.5 on SDK 3.0.1-dev(fce080e)

The text was updated successfully, but these errors were encountered:

pjsg · 2020-07-16T02:17:31Z

It turns out that the actual conversion is done by this line of code (from luaxlib.c):

lua_pushfstring(L, "%f", (LUAI_UACNUMBER)lua_tonumber(L, idx));

which is doomed to failure. It turns out (from doing a quick google search) that this is a tough problem to get right -- when you want value -> string -> value to end up with the same value that you started with. In my case, I'd be happy if it was right to within a few bits of precision -- getting the last few bits always appears to be a problem.

Actually, this may not be a problem -- we are only using floats (rather than doubles) which have terrible accuracy anyway, so it probably doesn't matter.

pjsg · 2020-07-16T17:48:24Z

I'm now more confused -- the docs seem to indicate that we are using double for numbers and consequently integers under 2^53 should be represented exactly. However, this isn't true:

aa = 1234567890
bb = aa + 9
print ('int', bb-aa)

aa = 1234567890.0
bb = aa + 9
print ('float', bb-aa)

returns

int	9
float	0.0

TerryE · 2020-07-16T21:03:56Z

@pjsg, it depends on whether you are using Lua51 or Lua53. Lua 51 uses a 12 byte Tvalue where all numerics are stored as double which allows precise representation of 53 bit integers. The latter uses an 8 byte TValue where Numeric values are either the subtype integer (32 bit) or single float (32 bit with 24 bit mantissa).

It's basically a trade-off. The default Lua53 build has the RAM efficiency of old integer build but with the convenience of being able to use FP when needed. I have yet to come across a real world DAC or ADC that gives more than about 12 bits accuracy, so worrying about rounding errors after 12 d.p. seems a little excessive to me. 32-bit numerics are good enough for real-world IoT applications, IMO.

I did design the Lua53 make so that for those that wanted to drop their effective RAM by a third, and slow runtime accordingly, then they could still configure the build to do 64bit float subtypes, but I haven't had a chance to try this out in anger.

I did bounce this off the other committers in the Lua53 requirements discussions.

pjsg · 2020-07-16T21:18:08Z

I think that we need to update the documentation quite significantly to point out this change. We also need to review all uses of lua_pushnumber in our code to see if they should be changed to lua_pushinteger calls. In the lua51 world, it didn't matter because an integer had an exact float representation. This is no longer true. For example the current unix time is not usefully representable as a float .....

I'll file some tickets for the modules as I search through them.

TerryE · 2020-07-17T00:58:03Z

Philip, if you read the latest draft of the Lua 5.3 documentation you will see this covered in some depth, and it has been discussed in some depth as well. Everyone has their favourite point that they would like to see at the top of the documentation, but I can't put everyone's favourite as the first point. I have to assume that developers actually bother to read the documentation provided start to end.

If you have specific changes that you want to suggest, that is what the review processes are for.

TerryE · 2020-07-17T01:02:41Z

Have you got a real world example of an IoT application that really requires 14 sig. fig. floating point calculation instead of 6?

pjsg · 2020-07-17T01:56:32Z

I would rephrase the question to "are there no iot apps that need integers of more than 23 bits?" Part of the problem is that some of the modules were cavalier about numbers vs integers. This used to be fine as you could represent an integer (signed or unsigned) exactly as a number. This is no longer true. This means that, for example, a timestamp is not representable as a number. It is true that, if we fixed all the firmware to use integers where appropriate, then with careful Lua coding, you can probably get by with using floats.

…

On Thu, Jul 16, 2020, 21:03 Terry Ellison ***@***.***> wrote: Have you got a real world example of an IoT application that really requires 14 sig. fig. floating point calculation instead of 6? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#3213 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AALQLTOQ4EA6VMNNRXRQGQ3R36PMBANCNFSM4O3KIR2A> .

pjsg · 2020-07-17T02:11:59Z

Part of the problem is that in some of the modules the wrong functions were used. For example, the struct module returns integer fields as numbers (which doesn't work any more). Of course, with floats/ints, there is actually no way of representing an unsigned 32-bit integer field. Many of these problems can be fixed in the modules (and I filed some tickets for those). I'm not sure about whether the same issue is present where the C code is trying to read integers but using number routines.

For example, rtcmem.write32 coerces the values to numbers and then back to integers -- thus

rtcmem.write32(1, 1234567890)
print (rtcmem.read32(1))

will not return the expected value. This may, or may not, be serious. It is certainly surprising.

TerryE · 2020-07-17T04:20:16Z

Philip, none of this surprises me as I raised this as a risk when we had the debate six months ago. Going from a 64-bit Number type to a 32-bit (signed) integer + a 32-bit FP type increases the number of variable that can be stored in a fixed head size by perhaps roughly 20%. No free lunch, so there are some edge cases where we need to fix code. I did all of the Lua53 subsystem work, I did a lot of back porting to make the Lua51 and Lua53 C APIs sufficiently compatible that all the Modules compile clean. I also emphasised the there may be a few edge cases where we need to sort out number vs integer. I had hoped that at least one other committer would take on this task, or at least work with me on this. It sometimes gets pretty lonely and frustrating doing everything on your own.

Just checking standard Lua on the PC, the conversion of a numeric string and numeric strings larger than LONG_MAX get converted to the float (double) subtype, so I guess we should do likewise larger than INT_MAX on the ESPs.

We just need to define what the determinate rules are and do a review to enforce these.

TerryE · 2020-07-17T11:06:22Z

Philip, IMO sjson is one of the few modules that may need a Lua version check in the source and have version variant code. The reason for this that the Lua type number has separate subtypes of float and integer and these align to the JSON schema types number and integer. That being said I also think that Lua 5.1 numbers that have an exact integer value should also map onto the JSON schema type integer.

TerryE · 2020-08-22T18:14:50Z

@pjsg, I've just gone through the potential to number vs integer changes in the modules and in fact all bar sjson.c should be integer and coerced to the appropriate type. The one exception is the sjson module. Which JSON schema do you implement? AFAIK, the latest recognised different number and integer subtypes and represents them accordingly.

The code which uses the call

  lua_pushlstring(data->L, get_state_buffer(data, state), state->pos_cur - state->pos_begin);
  LUA_NUMBER r = lua_tonumber(data->L, -1);
  lua_pop(data->L, 1);
  lua_pushnumber(data->L, r);

the last three lines could just be coded as:

#if LUA_VERSION_NUM == 501
  lua_pushlstring(data->L, get_state_buffer(data, state), state->pos_cur - state->pos_begin);
  LUA_NUMBER r = lua_tonumber(data->L, -1);
  lua_pop(data->L, 1);
  lua_pushnumber(data->L, r);
#else /* LUA_VERSION_NUM == 503 */
  if (lua_stringtonumber (L, get_state_buffer(data, state)) == 0)
    luaL_error(L, "JSON parse error: invalid number type"); /* or push 0 to maintain existing behaviour */ 
#endif /* LUA_VERSION_NUM */

This will convert whole integers to int subtype. Note that the lua_tolstring() call in 5.3 does this subtyping. But I think that we need to go through the code because it may need extra logic for decoding count fields and in particular how we handle invalid non-integer counts.

stale · 2021-08-28T17:22:51Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

HHHartmann · 2022-04-17T12:51:02Z

This seems to be handled at least partially in #3222

TerryE mentioned this issue Jul 17, 2020

RFI: Integer vs Float in Lua 5.3 #3220

Open

TerryE mentioned this issue Aug 23, 2020

Change struct & sjson to use integers. This is slightly more complex #3222

Merged

4 tasks

pjsg mentioned this issue Aug 24, 2020

Review all uses of lua numbers to determine if they ought to be integers #3219

Open

stale bot added the stale label Aug 28, 2021

stale bot closed this as completed Apr 16, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sjson.encode should use more digits in floating point numbers #3213

sjson.encode should use more digits in floating point numbers #3213

pjsg commented Jul 16, 2020

pjsg commented Jul 16, 2020

pjsg commented Jul 16, 2020

TerryE commented Jul 16, 2020 •

edited

Loading

pjsg commented Jul 16, 2020

TerryE commented Jul 17, 2020 •

edited

Loading

TerryE commented Jul 17, 2020

pjsg commented Jul 17, 2020 via email

pjsg commented Jul 17, 2020

TerryE commented Jul 17, 2020 •

edited

Loading

TerryE commented Jul 17, 2020

TerryE commented Aug 22, 2020 •

edited

Loading

stale bot commented Aug 28, 2021

HHHartmann commented Apr 17, 2022

sjson.encode should use more digits in floating point numbers #3213

sjson.encode should use more digits in floating point numbers #3213

Comments

pjsg commented Jul 16, 2020

Expected behavior

Actual behavior

Test code

NodeMCU startup banner

pjsg commented Jul 16, 2020

pjsg commented Jul 16, 2020

TerryE commented Jul 16, 2020 • edited Loading

pjsg commented Jul 16, 2020

TerryE commented Jul 17, 2020 • edited Loading

TerryE commented Jul 17, 2020

pjsg commented Jul 17, 2020 via email

pjsg commented Jul 17, 2020

TerryE commented Jul 17, 2020 • edited Loading

TerryE commented Jul 17, 2020

TerryE commented Aug 22, 2020 • edited Loading

stale bot commented Aug 28, 2021

HHHartmann commented Apr 17, 2022

TerryE commented Jul 16, 2020 •

edited

Loading

TerryE commented Jul 17, 2020 •

edited

Loading

TerryE commented Jul 17, 2020 •

edited

Loading

TerryE commented Aug 22, 2020 •

edited

Loading