irmin-pack: GC suffix copy in child process appears slow #1975

tomjridge · 2022-07-08T15:14:09Z

In the child process, the copy to the new suffix should be iterated a number of times (or until there are few bytes left to copy).

There is a branch that does this here: https://github.com/zshipko/irmin/tree/transfer-loop

When running the "tree.exe" benchmarks: dune exec ./bench/irmin-pack/tree.exe -- ./data4_100066commits.repr --artefacts=latest the following was observed:

+044us [INFO] index [root] sync
+044us [INFO] application XXX: transfer_loop new length 152483
+044us [INFO] index [root] sync
+044us [INFO] application XXX: transfer_loop new length 36526
+044us [INFO] index [root] sync
+044us [INFO] application XXX: transfer_loop new length 7620
+044us [INFO] index [root] sync
+044us [INFO] application XXX: transfer_loop new length 15160
+044us [INFO] index [root] sync
+044us [INFO] application XXX: transfer_loop new length 9326
+044us [INFO] index [root] sync
+044us [INFO] application XXX: transfer_loop new length 11704
+044us [INFO] index [root] sync
+044us [INFO] application XXX: transfer_loop new length 20629
+044us [INFO] application XXX: transfer_loop ran to end

What this means is that the parent is writing to the suffix faster than the child can copy. I don't think this was observed in the old code: there, my memory is that the raw copy in the child was much faster than the writes to the suffix in the parent (after all, one is a raw copy of bytes, whereas the other involves marshalling objects etc etc). It may point to poor performance in the "copy to suffix" code (or in the IO layer generally).

The text was updated successfully, but these errors were encountered:

tomjridge · 2022-07-08T15:16:54Z

Related to #1959

tomjridge · 2022-07-12T13:41:23Z

Attached is a detailed log from running Zach's transfer-loop branch
pr1975.log

tomjridge · 2022-07-12T16:26:29Z

Zach reported that changing the gap_tolerance seemed to limit the number of transfer_loop iterations to a small number (2 or 3). It is not clear why gap_tolerance would affect the speed of the suffix copy though. Tom is currently investigating this.

zshipko · 2022-07-12T16:32:59Z

Here is a log with gap_tolerance set to 1000, it looks like in this case transfer_loop runs 1 iteration each time: out.log

tomjridge · 2022-07-14T11:06:10Z

Within the transfer loop, Zach's code executes Fm.reload. This is likely to be slow. However, in order to pick up a possible file length change, with the current code, this is needed. (In the original code, working with raw files, we would just fstat to get the length each time, which is very quick.)

This is the reason why the copy loop is "slow" compared to the main process.

This may be the reason why (in main branch) there is no loop when copying the suffix. However, a loop is surely needed (if the suffix is large, the copy takes a long time, the suffix grows a lot in the meantime, and if we don't copy again in the GC process, we have to do it in the main process, which will take some time).

So the question is: what to do now?

One thing is to confirm with Victor (or our own testing) that the main process blocks for a long time, when the suffix is large (as in the Tezos usecase) and when the copy loop is not iterated. It should be clearly observable that when the GC process finishes and the main process takes over, there is time spent in the additional suffix copy.

[Aside: for these performance bugs, we need some way of recording the time spent on particular code paths, so that we can later analyse; at the moment the use of a few printf's is ad-hoc and doesn't currently give good coverage of the paths we are interested in.]

Then one "fix" would be to accept that the Fm.reload takes time, so that the copy loop in the child may not be faster than the main process (when reading/updating the suffix). So we want to avoid the situation where the copy loop calls Fm.reload to find a small number of bytes have been written, copies these bytes, and then repeats; we don't mind if a small number of bytes - say 4096*5 - needs to be copied in the main process; we just want to avoid the "bad" interaction where we repeatedly copy a small number of bytes. I think Zach's current code does set a limit on the min number of bytes necessary to cause a copy. So something like Zach's code looks like the correct fix.

metanivek · 2022-08-18T18:36:43Z

This has been addressed with mmap and transfer loop.

tomjridge added the feature/layered-store Related to the Layered Store label Jul 8, 2022

tomjridge added the status/likely-fix-known Likely fix is known label Jul 14, 2022

zshipko mentioned this issue Jul 15, 2022

Run transfer_exn in loop in GC child process #1986

Merged

metanivek closed this as completed Aug 18, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

irmin-pack: GC suffix copy in child process appears slow #1975

irmin-pack: GC suffix copy in child process appears slow #1975

tomjridge commented Jul 8, 2022

tomjridge commented Jul 8, 2022

tomjridge commented Jul 12, 2022

tomjridge commented Jul 12, 2022

zshipko commented Jul 12, 2022

tomjridge commented Jul 14, 2022 •

edited

Loading

metanivek commented Aug 18, 2022

irmin-pack: GC suffix copy in child process appears slow #1975

irmin-pack: GC suffix copy in child process appears slow #1975

Comments

tomjridge commented Jul 8, 2022

tomjridge commented Jul 8, 2022

tomjridge commented Jul 12, 2022

tomjridge commented Jul 12, 2022

zshipko commented Jul 12, 2022

tomjridge commented Jul 14, 2022 • edited Loading

metanivek commented Aug 18, 2022

tomjridge commented Jul 14, 2022 •

edited

Loading