Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some more ROTable preparation / optimisations pre Lua 5.3 #2858

Closed
TerryE opened this issue Jul 26, 2019 · 10 comments
Closed

Some more ROTable preparation / optimisations pre Lua 5.3 #2858

TerryE opened this issue Jul 26, 2019 · 10 comments
Labels

Comments

@TerryE
Copy link
Collaborator

TerryE commented Jul 26, 2019

ROTable performance changes

  • ROTable accesses use a lookaside cache (currently 32×4) to prime any ROTable lookup. The (4) entries in the appropriate cache line are checked for a table/key hit and if so then this specific table entry in Flash is matched against the requested key. In practice this achieves a better than 99% hit rate enabling direct reads of the ROTable entry in Flash without any ROTable scan.
  • In the case of a miss, a scan of the ROTable is required, and because of the high cache hit rate, nearly all such ROTable scans are for a missing key. The highest source of key-misses (over 90%) are the Lua VM probes for various metafields such as __metatable or __index. We have therefore added an ordering constraint on all ROTables: fields with keys starting with the character "_" must be located at the head of the table in ascending order.
  • A separate scan algo is used for fields starting with "_": this breaks if strcmp(pentry->key, strkey)>=0 which means that typically only one or two fields are checked before the scan is aborted.
  • A key-miss for all other key will result in a full scan of the table. However we only support string keys for rotables, the luaR_entry key field now direclty points to the CString key name which are now word-aligned in Flash. The scan now does a word-read from flash of the pentry->key and *pentry->key and does a masked compare against the 1st word of the search key. (The mask is to accommodate keys shorter than 3 chars in length.) This means that in practice the scan is a small inline loop with no unaligned exceptions or calls to the ROM strcmp() (other than the last hit validation if there is a key match), and this executes fast.

In practice this means that ROTable access is now perhaps 3× slower than equivalent RAM Table accesses, rather eLua case of perhaps 100× slower.

Preparing for lua53

In the Lua53 implementation, all tables will have a Table header, (though this is a reduced variant which omits unused fields in the case of ROTables). The LROT_ macros have been designed to hide these implementation differences at a source level in our C modules.

  • How a table is bound to its metatable is different in the two version. This is a minor issue in our code, since with one exception we declare ROTables for use as UData metatables not ROTable metatables.
    • In the case of Lua51, the eLua patch overloads the __metatable field. The core Lua VM only uses the __* metafields in a metatable and ignores them in a normal table, but here any normal table is scanned for the __metatable entry to determine if it uses a metatable. This entry has been encapsulated in the LROT_METATABENTRY( mt ) entry which expands to a LROT_TABENTRY( __metatable, mt ).
    • In the case of Lua53, The mt value is in the second parameter of the LRO_END() macro and this is used in the generation of the ROTable's header. Note that this field is ignored in lua51 builds and similarly the LROT_METATABENTRY() declarations are treated as empty expansons in lua53 builds. This enable the same module source to be used for both targets.
  • The default Lua VM uses the table header flags as an existence check for the common meta fields such as __index to avoid unnecessary table accesses. This optimisation is bypassed for the lua51 eLua patch. However it is honoured in the lua53 implementation and the 3rd parameter of the LRO_END() macro contains this static flag setting.

Suggestions for reviewers reviewing these meta fields

Try using the command:

grep -RIPn '\bLROT_(\w+?ENTRY\s*\(\s*_|(PUBLIC_)?BEGIN|END)' app | grep _ -C 2 | vi -
  • The __* fields must be in ascending order and consecutive immediately following the LROT_BEGIN() statement.
  • The second field of the LRO_END() macro such be NULL unless this is a ROTable that uses another ROTable as its metatable.
  • The third field of the LRO_END() macro should be 0 unless the table is used as a metatable and in this cae should match the corresponding metafield entries.

Comment:

I realise that this change denomalises the source which is a bad thing, and if anyone can think of a better approach to us keeping this ROTable code base common to lua51 and lu53 then I am open to suggestions.

@TerryE
Copy link
Collaborator Author

TerryE commented Jul 26, 2019

Re the last comment if it looks worthwhile performance-wise then post lua53, we can always back engineer the new ROtable format into lua51 once we have it stable in Lua 53. I could also write a perl or python script to validate the Table declarations, rather than doing it manually.

@TerryE
Copy link
Collaborator Author

TerryE commented Jul 31, 2019

@jmattsson , @nwf et al, feedback requested.

Now that my local branch has both the make LUA=53 working so I can compile the firmware with either Lua51 (app/lua) or Lua53 (app/lua53), I am now working through the practical aspects of making our Lua modules interoperable between the two Lua versions. I will be documenting the conclusions in my next update to the Lua 5.3 port to NodeMCU Firmware paper (some of which will end up in our FAQs / official documentation), but here is a summary of the relevant issue for comment / confirmation.

This exercise of fixing up the modules for Lua 5.3 compatibility has really underline for me a conclusion: that we've been lax about how we have allowed our modules to be implemented when interfacing to the Lua runtime.

Both the Lua Reference Manual and PiL make it clear that the complete public API for C modules is as documented in lua.h and that all its definitions are prefixed by lua_. This API aims at economy and orthogonality.

There is also set of supplementary functions provided by an auxiliary library (auxlib) as documented in lauxlib.h with its definitions prefixed by with luaL_. The auxiliary library provides practical implementations of common tasks, but does this entirely through the core public API without other reference to the internals of Lua.

So all of our modules and helper libraries should only include lua.h and lauxlib.h.  The other l_xxxx_.h headers are really internal to the Lua runtime and so if any or our module needs them to compile, then their implementation is wrong and needs changing.

A quick analysis of all of the 8266 modules shows these issues:

  • Memory allocation. 17 modules include use lmem.h functions.  They are not documented and clearly confuse our developers.  We need a cleaner approach if we want to encourage use of the Lua memory allocator.  BTW, the documented vehicle is Userdata and this is properly integrated into the Lua GC, etc.  However, this separate discussion that merits its own issue.
  • Node Library.  This is a bit of a mess (10 other l_xxxx_.h includes) but this is in effect our API window into the Lua internals, and I do need to review this.
  • LWiP makes a few minor uses; this is really internal to the firmware, so as long as these compile clean, I am not too bothered.
  • Other fixable exceptions. We have 4 modules (pipe, sjson, websocket, wifi_monitor) and one package library (coap) with the odd include which need recoding to avoid.

We have had the previous discussion about letter numbering (luaF_, luaN_, etc.) but thinking about this I thing that we should stick to:

  • We add an addendum to lua.h for NodeMCU features that we truly want to add to the core API, and we should be frugal in doing do.
  • We should be clear and consistent in how we expose our helper API, and the real hotspot here is our new declarative mechanism for creating ROTable resources at compile time. I suggest that everything that we want our modules to use should go in lnodemcu.h and continue to use the LROT_ prefix (Lua Read-Only Table) macros for ROTable declaration. The modules.h may as will just include lnodemcu.has well as the current lua.h so modules should only need to include modules.h and optionally lauxlib.h to use the full public Lua C API.
  • At the moment, I can't think of any cases whee we might want to expose the luaF_ or the luaN_ prefixes as a public API.

@jmattsson
Copy link
Member

Agreed. The big question mark is how we deal with lmem.h, considering we've been using it more and more for things that are not subject to GC. At this point that still seems preferable to plain old malloc() (well, whatever that ends up resolving to) in order to tap into GC-on-demand on alloc.

@TerryE
Copy link
Collaborator Author

TerryE commented Jul 31, 2019

To be honest IMO the best way to allocate resources is to use the Userdata mechanism. The overhead of doing so is pretty minimal. and the nice thing is that it is fully integrated into the Lua GC so if you need a temporary resource then just create it using userdata and put a reference on the stack. It doesn't matter whether the routine return normally or throws an error: the stack entry gets dereferenced and the resource will be recovered on the next GC run -- this makes coding leak-free routines a lot simpler with minimal overhead.

The problem with the lmem.h interface is that it is complicated and confusing.

Failing this we should just have a simple set of malloc / calloc / realloc / free macros/wrappers which follow the posix API and do " a normal malloc. if NULL return then do a Lua GC and retry the malloc before returning NULL".

KISS

@jmattsson
Copy link
Member

I could certainly get behind a set of wrapper functions for the allocs. I'd have to dig into the IDF to work out what the best approach there would be, but memory is far less of an issue there in the first place.

@HHHartmann
Copy link
Member

I would clearly prefer the userdata approach here. We have a higher level memory management available which frees us of the sometimes tedious task of keeping track of memory. It also includes GC before fail already.
So why not use it.

KISS

@TerryE
Copy link
Collaborator Author

TerryE commented Jul 31, 2019

Gregor, what I see is a multiple pronged approach.

  • Convert a few of the core modules to using the UData approach and document this in the modules developer's FAQ (yes another job TBD).
  • Do our our own malloc, ... wrappers so that any app using <stdlib.h> and doing a malloc(), etc. uses this. (BTW LWIP compiled with this would make a heap threshold for the ECG far more effective.)
  • Deprecate and remove use of the luaM_ calls in our modules.

@nwf
Copy link
Member

nwf commented Jul 31, 2019

Could we convert LwIP and mbedTLS to using the lua heap / udata objects? I think it's plausible, if not actually true, that tying mbedTLS's allocations to the Lua heap would help during the surges in heap requirement seen for TLS handshaking.

@TerryE
Copy link
Collaborator Author

TerryE commented Aug 1, 2019

I've decided that it's going to be a lot easier to maintain the modules if the public Table / ROTable interface is the same for both versions, so I'm regressing the Lua53 table handling into our app/lua codebase: faster and closer to standard Lua.

All ROtables have a extra row at the start (generated by the LROT_TABLE macro) which contains a cut-down Table record, so apart from the ltable.c internals, the rest of Lua handles both variants the same.

@TerryE
Copy link
Collaborator Author

TerryE commented Sep 17, 2019

Now implemented in both Lua 5.1 and 5.3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants