Skip to content

[8pt] Describe LuaJIT memory profiler interface and parser #1753

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Buristan opened this issue Jan 14, 2021 · 4 comments · Fixed by #2085
Closed

[8pt] Describe LuaJIT memory profiler interface and parser #1753

Buristan opened this issue Jan 14, 2021 · 4 comments · Fixed by #2085
Assignees
Labels
feature A new functionality server [area] Task relates to Tarantool's server (core) functionality

Comments

@Buristan
Copy link

Buristan commented Jan 14, 2021

LuaJIT memory profiler interface and parser

Product: Tarantool
Since: 2.7.1
Audience/target: developers, operations: what's making my Tarantool slow
Root document: where to add or update documentation
SME: @igormunkin @Buristan
Peer reviewer: @

Details

Tarantool 2.7.1 release introduced the LuaJIT memory profiler (gh-5442) and the profile parser (gh-5490).

This version of LuaJIT memory profiler does not support verbose reporting allocations from traces. All allocation from traces are reported as internal. But trace code semantics should be totally the same as for the Lua interpreter (excluding sink optimizations). Also all deallocations reported as internal too.

Tail call optimization does not create a new call frame, so all allocations inside the function called via CALLT/CALLMT bytecodes are attributed to its caller.

Usually developers are not interested in information about allocations inside builtins. So if builtin function was called from a Lua function all allocations are attributed to this Lua function. Otherwise attribute this event to a C function.

Assume we have the following Lua chunk named <test.lua>:

1  jit.off()
2  misc.memprof.start("memprof_new.bin")
3  -- Lua does not create a new frame to call string.rep and all allocations are
4  -- attributed not to `append()` function but to the parent scope.
5  local function append(str, rep)
6    return string.rep(str, rep)
7  end
8
9  local t = {}
10 for _ = 1, 1e5 do
11   -- table.insert is a builtin and all corresponding allocations
12   -- are reported in the scope of main chunk.
13   table.insert(t,
14     append('q', _)
15   )
16 end
17 misc.memprof.stop()

Binary data can be read via Tarantool itself via the following command (NB: mind the dash prior to the dump filename):

$ tarantool -e 'require("memprof")(arg[1])' - memprof_new.bin

It parses the binary format provided by memory profiler and render it on human-readable format.

If one run the chunk above the profiler reports approximately the following:

ALLOCATIONS
@test.lua:0, line 14: 1002      531818  0
@test.lua:0, line 13: 1 24      0
@test.lua:0, line 9: 1  32      0
@test.lua:0, line 7: 1  20      0

REALLOCATIONS
@test.lua:0, line 13: 9 16424   8248
        Overrides:
                @test.lua:0, line 13

@test.lua:0, line 14: 5 1984    992
        Overrides:
                @test.lua:0, line 14


DEALLOCATIONS
INTERNAL: 20    0       1481
@test.lua:0, line 14: 3 0       7168
        Overrides:
                @test.lua:0, line 14

Plain text of profiled info has the following format:

@<filename>:<function_line>, line <line where event was detected>: <number of events>	<allocated>	<freed>

INTERNAL means that these allocations are caused by internal LuaJIT structures. Note that events are sorted from the most often to the least.

Overrides means what allocation this reallocation overrides.

Starting profiler from Lua is quite simple:

local started, err, errno = misc.memprof.start(fname)

where fname is name of the file where profile events are written. Writer for this function perform fwrite() for each call retrying in case of EINTR. When the profiling is stopped the fclose() is called. If it is impossible to open a file for writing or profiler fails to start, returns nil on failure (plus an error message as a second result and a system-dependent error code as a third result). Otherwise returns true value.

Stopping profiler from Lua is simple too:

local stopped, err, errno = misc.memprof.stop()

If there is any error occurred at profiling stopping (an error when file descriptor was closed) or during reporting memprof.stop() returns nil (plus an error message as a second result and a system-dependent error code as a third result). Returns true otherwise.

Also, see some possible FAQs in the issue comments

@veod32
Possibly, we can firstly create a short MVP document that can be enough for our developers to have a try. Then publish it and collect developers' feedback on the functionality.

@igormunkin igormunkin added 2.7 server [area] Task relates to Tarantool's server (core) functionality labels Jan 15, 2021
@veod32 veod32 changed the title Describe LuaJIT memory profiler interface and parser [80pt] Describe LuaJIT memory profiler interface and parser Jan 29, 2021
@veod32 veod32 added the feature A new functionality label Jan 29, 2021
@Buristan
Copy link
Author

Buristan commented Feb 1, 2021

Some of FAQ:
Q: Is it profiler suitable for C allocations or allocation inside C code?
A: Profiler reports only allocation events caused by Lua allocation function. All "Lua-related" allocation like table or string creation are reported. But it does not report allocations made by malloc or other non Lua allocator, use valgring for debug them.

Q: Why is there so much INTERNAL allocation in my profiler's report , what does it mean?
A: INTERNAL means that these allocations/reallocations are related to internal LuaJIT's structures or are made on traces. Memory profiler doesn't report verbosely allocations of objects that are made during trace execution. Try to add jit.off() before profiler start.

Q: Why there is some reallocations/deallocations without OVERRIDED section?
A: These objects may be created before profiler starting. Adding collectgarbage() before profiler's start allow to collect all previously allocated objects, that are dead when the profiler starts.

Q: Why some objects are not collected during profiler recording? Is it a memory leak?
A: LuaJIT uses incremental Garbage Collector. Cycle of Garbage collector may be not finished at the moment of the profiler's stop. Add collectgarbage() before stopping memory profiler to collect all dead objects for sure.

Q: Can I profile not a current chunk, but the whole running application? Can I start profiler when the application is already running?
A: Yes. Here is the example of code that can be inserted in the Tarantool console on a running instance.

local fiber = require "fiber"
local log = require "log"

fiber.create(function()
  fiber.name("memprof")

  collectgarbage() -- Collect all already dead objects.
  log.warn("start of profile")

  local st, err = misc.memprof.start(FILENAME)
  if not st then
    log.error("failed to start profiler: %s", err)
  end

  fiber.sleep(TIME)

  collectgarbage()
  st, err = misc.memprof.stop()

  if not st then
    log.error("profiler on stop error: %s", err)
  end

  log.warn("end of profile")
end)

Where FILENAME -- filename to output, TIME -- duration of profiling. Also, you can directly call misc.memprof.start() and misc.memprof.stop() from console.

@veod32 veod32 changed the title [80pt] Describe LuaJIT memory profiler interface and parser [TBD] Describe LuaJIT memory profiler interface and parser Feb 12, 2021
@veod32 veod32 changed the title [TBD] Describe LuaJIT memory profiler interface and parser [8pt] Describe LuaJIT memory profiler interface and parser Feb 26, 2021
@veod32 veod32 added 2.7.1 and removed 2.7 labels Mar 18, 2021
@veod32 veod32 added this to the Tarantool releases of Dec 2020 milestone Mar 22, 2021
@tarantool tarantool deleted a comment from NickVolynkin Mar 28, 2021
@tarantool tarantool deleted a comment from igormunkin Mar 28, 2021
@Buristan
Copy link
Author

Buristan commented Apr 3, 2021

The example of investigation of the memory profiler's report.

Assume we have the following Lua code named <format_concat.lua>:

 1 jit.off() -- More verbose reports.
 2
 3 local function concat(a)
 4   local nstr = a.."a"
 5   return nstr
 6 end
 7
 8 local function format(a)
 9   local nstr = string.format("%sa", a)
10   return nstr
11 end
12
13 collectgarbage() -- Clean up.
14
15 local binfile = "/tmp/memprof_"..(arg[0]):match("([^/]*).lua")..".bin"
16
17 local st, err = misc.memprof.start(binfile)
18 assert(st, err)
19
20 -- Payload.
21 for i = 1, 10000 do
22   local f = format(i)
23   local c = concat(i)
24 end
25 collectgarbage() -- Clean up.
26
27 local st, err = misc.memprof.stop()
28 assert(st, err)
29
30 os.exit()

If one runs this code via Tarantool and parses the memory profiler's report he sees the following output:

ALLOCATIONS
@format_concat.lua:8, line 9: 19998     624322  0
INTERNAL: 1     65536   0

REALLOCATIONS

DEALLOCATIONS
INTERNAL: 19998 0       558816
        Overrides:
                @format_concat.lua:8, line 9

@format_concat.lua:8, line 9: 2 0       98304
        Overrides:
                @format_concat.lua:8, line 9

The reasonable questions are:
Q: Why there are no allocations related to concat() function?
Q: Why the amount of allocation is not a nice round number?
Q: Why there are approximately 20 K allocations instead 10 K?

First things first.
LuaJIT doesn't create a new string if the string with the same payload exists. It is called string interning. So, when the string creates via format() function, there is no need to create the same string via concat() function, LuaJIT just use the previous one. This is the reason unpretty amount of allocation -- Tarantool creates some strings for internal needments and built-in modules, so some strings are just already exist.

But why there are so many allocations? Almost double of the expected amount. This is because string.format() built-in creates another string necessary for %s identifier, so there are two allocations for each iteration: for tostring(i) and for string.format("%sa", string_i_value). You can see the difference in the behaviour by adding local _ = tostring(i) line between 21st and 22nd lines.

OK, let's comment the 22nd line (by adding -- at the start), to take a look for concat() function.

The profiler's report is the following:

ALLOCATIONS
@format_concat.lua:3, line 4: 10000     284411  0

REALLOCATIONS

DEALLOCATIONS
INTERNAL: 10000 0       218905
        Overrides:
                @format_concat.lua:3, line 4

@format_concat.lua:3, line 4: 1 0       32768

Q: But what will change if we enable JIT compilation?
Let's comment the first line of the code to see what happens. Now we can see only 56 allocations in the report, and all other allocations are JIT-related (see also tarantool/tarantool#5679):

ALLOCATIONS
@format_concat.lua:3, line 4: 56        1112    0
@format_concat.lua:0, line 0: 4 640     0
INTERNAL: 2     382     0

REALLOCATIONS

DEALLOCATIONS
INTERNAL: 58    0       1164
        Overrides:
                @format_concat.lua:3, line 4
                INTERNAL

This is because after 56 iterations trace is compiled and JIT-compiler removed unused variable c from trace, and, therefore, dead code of concat() function is eliminated.

Lets now to profile only format() function with enabled JIT. The report is the following:

ALLOCATIONS
@format_concat.lua:8, line 9: 19998     624322  0
INTERNAL: 4     66824   0
@format_concat.lua:0, line 0: 4 640     0

REALLOCATIONS

DEALLOCATIONS
INTERNAL: 19999 0       559072
        Overrides:
                @format_concat.lua:0, line 0
                @format_concat.lua:8, line 9

@format_concat.lua:8, line 9: 2 0       98304
        Overrides:
                @format_concat.lua:8, line 9

Q: Why there is so many allocation against concat() function?
The answer is easy: string.format() function with %s identifier is not yet compiled via LuaJIT. So, trace can't be recorded and the compiler doesn't perform corresponding optimizations.

If we change the format() function like the following:

 8 local function format(a)
 9   local nstr = string.format("%sa", tostring(a))
10   return nstr
11 end

the results become much prettier:

ALLOCATIONS
@format_concat.lua:8, line 9: 110       2131    0
@format_concat.lua:0, line 0: 4 640     0
INTERNAL: 3     1148    0

REALLOCATIONS

DEALLOCATIONS
INTERNAL: 113   0       2469
        Overrides:
                @format_concat.lua:0, line 0
                @format_concat.lua:8, line 9
                INTERNAL

@sharonovd
Copy link

The most common case: I have an Tarantool that fails due to Lua OOM (for example, "normal" memory consumption is slightly lower than 1Gb, and load spikes quickly lead to OOM). I want to know how ate all the memory - what should I do now?
Obvious steps:

  • upgrade to T 2.7
  • enable memory profiling
  • what's next? can I patiently wait for the next coredump? will the coredump contain necessary information?

@Buristan these are questions for you, I assume

@Buristan
Copy link
Author

@sharonovd
If you want to find the reason of growing Lua-memory at the production server, I suggest the following:

  • Update Tarantool to 2.7.2/2.8.1 (2.7.1 contains bug in memprof, that may lead to core dump of instance)
  • Run profiler for the significant time to collect information about objects that created and unreleased
  • Analize the binary output of the profiler with memprof parser and --leak-only option (2.8.1+) for easier reading

NB: taking into account that this profiling takes place with JIT enabled, many allocations will be reported as INTERNAL without a location in source code.

Regarding OOM error -- Mike solves (if tests will not show the opposite) this problem for all architectures except 32-bit ARM and Windows on x86, so it looks like the user should update Tarantool to version with exceptions on traces support. I hope that this patch will be backported in 2.9.1.

@veod32 veod32 linked a pull request Apr 20, 2021 that will close this issue
veod32 added a commit that referenced this issue Apr 20, 2021
veod32 added a commit that referenced this issue Apr 26, 2021
veod32 added a commit that referenced this issue Apr 28, 2021
veod32 added a commit that referenced this issue Apr 28, 2021
NickVolynkin pushed a commit that referenced this issue Apr 28, 2021
NickVolynkin pushed a commit that referenced this issue Apr 28, 2021
NickVolynkin pushed a commit that referenced this issue Apr 28, 2021
veod32 added a commit that referenced this issue Apr 29, 2021
Introduce a new document on LuaJIT memory profiler

* LuaJIT memory profiler is a new feature implemented in Tarantool 2.7.1. The document describes the profiler's behavior as of this Tarantool version.
* The document is placed in the Application server chapter. 

Closes #1753
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature A new functionality server [area] Task relates to Tarantool's server (core) functionality
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants