Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

irmin-pack: GC suffix copy in child process appears slow #1975

Closed
tomjridge opened this issue Jul 8, 2022 · 6 comments
Closed

irmin-pack: GC suffix copy in child process appears slow #1975

tomjridge opened this issue Jul 8, 2022 · 6 comments
Labels
feature/layered-store Related to the Layered Store status/likely-fix-known Likely fix is known

Comments

@tomjridge
Copy link
Contributor

In the child process, the copy to the new suffix should be iterated a number of times (or until there are few bytes left to copy).

There is a branch that does this here: https://github.com/zshipko/irmin/tree/transfer-loop

When running the "tree.exe" benchmarks: dune exec ./bench/irmin-pack/tree.exe -- ./data4_100066commits.repr --artefacts=latest the following was observed:

+044us [INFO] index [root] sync
+044us [INFO] application XXX: transfer_loop new length 152483
+044us [INFO] index [root] sync
+044us [INFO] application XXX: transfer_loop new length 36526
+044us [INFO] index [root] sync
+044us [INFO] application XXX: transfer_loop new length 7620
+044us [INFO] index [root] sync
+044us [INFO] application XXX: transfer_loop new length 15160
+044us [INFO] index [root] sync
+044us [INFO] application XXX: transfer_loop new length 9326
+044us [INFO] index [root] sync
+044us [INFO] application XXX: transfer_loop new length 11704
+044us [INFO] index [root] sync
+044us [INFO] application XXX: transfer_loop new length 20629
+044us [INFO] application XXX: transfer_loop ran to end

What this means is that the parent is writing to the suffix faster than the child can copy. I don't think this was observed in the old code: there, my memory is that the raw copy in the child was much faster than the writes to the suffix in the parent (after all, one is a raw copy of bytes, whereas the other involves marshalling objects etc etc). It may point to poor performance in the "copy to suffix" code (or in the IO layer generally).

@tomjridge tomjridge added the feature/layered-store Related to the Layered Store label Jul 8, 2022
@tomjridge
Copy link
Contributor Author

Related to #1959

@tomjridge
Copy link
Contributor Author

Attached is a detailed log from running Zach's transfer-loop branch
pr1975.log

@tomjridge
Copy link
Contributor Author

Zach reported that changing the gap_tolerance seemed to limit the number of transfer_loop iterations to a small number (2 or 3). It is not clear why gap_tolerance would affect the speed of the suffix copy though. Tom is currently investigating this.

@zshipko
Copy link
Contributor

zshipko commented Jul 12, 2022

Here is a log with gap_tolerance set to 1000, it looks like in this case transfer_loop runs 1 iteration each time: out.log

@tomjridge
Copy link
Contributor Author

tomjridge commented Jul 14, 2022

Within the transfer loop, Zach's code executes Fm.reload. This is likely to be slow. However, in order to pick up a possible file length change, with the current code, this is needed. (In the original code, working with raw files, we would just fstat to get the length each time, which is very quick.)

This is the reason why the copy loop is "slow" compared to the main process.

This may be the reason why (in main branch) there is no loop when copying the suffix. However, a loop is surely needed (if the suffix is large, the copy takes a long time, the suffix grows a lot in the meantime, and if we don't copy again in the GC process, we have to do it in the main process, which will take some time).

So the question is: what to do now?

One thing is to confirm with Victor (or our own testing) that the main process blocks for a long time, when the suffix is large (as in the Tezos usecase) and when the copy loop is not iterated. It should be clearly observable that when the GC process finishes and the main process takes over, there is time spent in the additional suffix copy.

[Aside: for these performance bugs, we need some way of recording the time spent on particular code paths, so that we can later analyse; at the moment the use of a few printf's is ad-hoc and doesn't currently give good coverage of the paths we are interested in.]

Then one "fix" would be to accept that the Fm.reload takes time, so that the copy loop in the child may not be faster than the main process (when reading/updating the suffix). So we want to avoid the situation where the copy loop calls Fm.reload to find a small number of bytes have been written, copies these bytes, and then repeats; we don't mind if a small number of bytes - say 4096*5 - needs to be copied in the main process; we just want to avoid the "bad" interaction where we repeatedly copy a small number of bytes. I think Zach's current code does set a limit on the min number of bytes necessary to cause a copy. So something like Zach's code looks like the correct fix.

@metanivek
Copy link
Member

This has been addressed with mmap and transfer loop.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature/layered-store Related to the Layered Store status/likely-fix-known Likely fix is known
Projects
None yet
Development

No branches or pull requests

3 participants