Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

An off-the-wall idea: Drop the end filler word and reduce TValues from 16 to 12 bytes. #2333

Closed
TerryE opened this issue Mar 30, 2018 · 8 comments

Comments

@TerryE
Copy link
Collaborator

TerryE commented Mar 30, 2018

This doesn't fit into the usual bugfix / feature request but is an off-the-wall idea arising from #2332.

Perhaps the main advantage of the integer builds over the FP ones is that integer Tvalues are 2 words long:

word word
value type

whereas floating point ones are 4 words long:

word word word word
value LSW value MSW type filler word

and in fact the Most Significant Word (MSW) of the value is only used for Number values. The filler word is added because the compiler assumes that doubles must be doubleword aligned and so the TV has to be a multiple of double long. This is the case on the x86 architecture, but on the ESPs all FP is done by S/W routines so only word-alignment is needed. So why not just force Tvalue to be word aligned:

#pragma pack(4)     // Added but would need to be conditional so only added on xtensa compiles
typedef struct lua_TValue {
  TValuefields;
} TValue;
#pragma pack()

and hey-presto! Tvalues are now 12 bytes instead of 16. At -O2 the few instances of *sizeof(TValue) have o be a *12 instead of <<4 , but otherwise the generated code is the same.. I'd also need to change my pack logic in lflashimg.c slightly.

In the medium term, Lua 5.3 TValues are going to be 8 bytes anyway, but I can see a lot of developers being reluctant to switch to the 5.3 platform until there is a body of confidence in the engine, and with LFS moving most code use and constant string use of out of RAM then dropping the size of TV values by 25% is going to have a material impact of memory free.

So I will try this out and get back to you girls. There might some gotcha that I haven't though of.

@TerryE
Copy link
Collaborator Author

TerryE commented Mar 31, 2018

Well that seems to work fine and TValues are now 12 bytes instead of 16. Why didn't we do this years ago?

@TerryE TerryE changed the title An off-the-wall idea for a quick win An off-the-wall idea: Dropy the end filler word and reduce TValues from 16 to 12 bytes. Mar 31, 2018
@TerryE
Copy link
Collaborator Author

TerryE commented Mar 31, 2018

This all seems solid and there's bugger-all difference in the executable size. What I suggest is that we add LUA_USE_PACKED_TVALUES as an option to user_config.h in the LFS patch and I make lflashimg.c aware of this option in its record packing Then people have the option of leaving TValues as-is or reducing their size by 25% If the default for now is to comment this out then it's a low-risk mod to add..

@TerryE TerryE changed the title An off-the-wall idea: Dropy the end filler word and reduce TValues from 16 to 12 bytes. An off-the-wall idea: Drop the end filler word and reduce TValues from 16 to 12 bytes. Mar 31, 2018
@TerryE
Copy link
Collaborator Author

TerryE commented Mar 31, 2018

I was just doing some sizing benchmarks and discovered an anomaly which isn't present on standard Lua on the PC -- another eLua bug ;-( An array has two types of entry: a hashed vector used for string keys, etc, and a dense vector of TValues that is used for things like a={1,2,3,4} or for i=1,n do a[i]=0 end

The for loop version works fine on the PC but used the hash entries on the eLua / NodeMCU version. No wonder arrays give memory problems!

But at least the new version uses 25⅜ less of too much 😞

I need to dump this eLua 5.1 engine ASAP.

@TerryE
Copy link
Collaborator Author

TerryE commented Mar 31, 2018

I was about to raise a separate issue on this But unfortunately my latest LFS patch gives a different result to the current master and to the current PC 5.1.5 version. For a dense 1..N vector

 a = {unpack(a)}

should not free up memory after GC, but it does. Essentially the SETLIST LVM opccode uses a lua_rawseti() loop internally since the comiple knows that it is dealing with integer keys. Whereas a[i] = ... generates a SETTABLE LVM opcode which uses a lua_settable() call and this internally includes an optimisation path to handle the case where i is an integer. I only came across this because I was trying to create a unit test which would show garbagecollect('cont') increasing by well defined TValue multiples. The test was correct, but the results weren't.

More investigation needed.

@pjsg
Copy link
Member

pjsg commented Mar 31, 2018

This sounds similar to the issue in #1434. Did that bug sneak back in?

@TerryE
Copy link
Collaborator Author

TerryE commented Mar 31, 2018

Sorry, don't know, but I do know that I need to hose this area of the core to work out why standard Lua out performs our builds, and why my patch is different again. AFAIK, the only changes that I made here where in GC guards, but I need to analyse the diffs on the variants, and work through them with the remote debugger

@TerryE
Copy link
Collaborator Author

TerryE commented Apr 2, 2018

OK this one was 1 bug down to me and one down to elua. So with:

$ cat nodemcu-firmware/local/fs/init.lua
local function bm()
  local gc, print, concat = collectgarbage, print, table.concat
  gc() gc()
  local a = {0}; gc() gc()
  local base = gc 'count'*1024
  for i = 2,9 do
    a[i] = 0;    gc() gc()
    a[i] = gc 'count'*1024 - base
  end
  print ('Results are '..concat(a, ', '))
end
bm()
$
$ lua nodemcu-firmware/local/fs/init.lua   # with standard lua this gives:
Results are 0, 16, 48, 48, 112, 112, 112, 112, 240
$
$ nodemcu-firmware/luac.cross -e  nodemcu-firmware/local/fs/init.lua    #diito with luac.cross -e option
Results are 0, 16, 48, 48, 112, 112, 112, 112, 240

$ screen -L -l /dev/ttyUSB0 115200                            #default TValues
 NodeMCU 2.1.0 build unspecified powered by Lua 5.1.4 on SDK 2.1.0(116b762)
Results are 0, 16, 48, 48, 112, 112, 112, 112, 240
>

$ screen -L -l /dev/ttyUSB0 115200                         LUA_PACK_TVALUES_ENABLE in user_config.h
 NodeMCU 2.1.0 build unspecified powered by Lua 5.1.4 on SDK 2.1.0(116b762)
Results are 0, 12, 36, 36, 84, 84, 84, 84, 180
>

The ability be to able to debug a whole class of bugs on the PC using lauc.cross -e is brilliant.

@TerryE
Copy link
Collaborator Author

TerryE commented Apr 27, 2018

I am closing this because the LFS PR implements this.

@TerryE TerryE closed this as completed Apr 27, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants