slow "Saving files cache" with many files #4869
borg create slows down terribly during "Saving files cache"; tracing the process shows tons of madvise() and very few write() syscalls.
where did you get your borg 1.1.10 and how did you install it?
notes:
This is the relevant code from 1.1.10 (NOTEs added):
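The following is only a simplified sketch of what that loop does in 1.1.x, not the verbatim source: names like FileCacheEntry approximate the real code, and details such as the integrity wrapper and the mtime/age checks are left out. The NOTEs are in the comments:

```python
# Simplified sketch (NOT the verbatim 1.1.10 source) of the files cache
# save loop in Cache.commit(); field names only approximate the real code.
import os
from collections import namedtuple

import msgpack

FileCacheEntry = namedtuple('FileCacheEntry', 'age inode size mtime chunk_ids')

def save_files_cache(files, cache_path):
    ttl = int(os.environ.get('BORG_FILES_CACHE_TTL', 20))
    with open(os.path.join(cache_path, 'files'), 'wb') as fd:
        for path_hash, item in files.items():   # ~64M iterations in this report
            # NOTE: one fresh FileCacheEntry (plus unpacked ints/bytes) is
            # created per item -- lots of allocation/GC work happening here ...
            entry = FileCacheEntry(*msgpack.unpackb(item))
            # NOTE: ... and entries beyond the TTL are simply skipped, i.e. NOT
            # written out, which is why you see CPU time but few write() calls.
            if entry.age < ttl:
                msgpack.pack((path_hash, entry), fd)
```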
So, I'd say the time spent within "Saving files cache" but without I/O is borg NOT writing to disk the files cache items that are beyond the TTL. It creates/garbage-collects python objects all the time during that, in your case likely for ~64M items.
Hmm, you said this is the first subsequent backup after the initial full backup.
In that loop, it dynamically creates a lot of python objects (and then needs to garbage-collect them again at some time). I'll see what I can do about that. The problem is that I can't test it for performance with as many files as you have, so for testing you would need to run borg from a git checkout (see our install docs). Can you do that testing? For that source-based install, it would be interesting to first just try the 1.1.10 code with different python versions; maybe some weirdness or bad perf in 3.5.x got fixed in 3.6.x or 3.7.x. Especially interesting would be 3.6.9, which is the version I currently use for the master branch (and future borg 1.2 alpha binary builds).
btw, thanks to @DurvalMenezes and @textshell for doing lots of analysis, the irc log and pastebins were quite informative.
From the Homebrew package repository, and using its package manager:
Agreed. My guess (I'm far from being a MacOSX or Python specialist) is that the madvise() calls come from somewhere inside MacOSX's run-time libraries, and are being indirectly called and/or triggered by the Python interpreter, which in turn is being indirectly 'called' and/or triggered by borg's operations during the "Saving files cache" stage.
Exactly what I was going to say :-) Glad you agree. From the segment of code you pasted, I see no other place for that "indirect calling/triggering" to happen except inside that loop.
Ditto here, please count on me to help diagnose and fix that, as long as it can be done without fouling up the machine (as it is already in production as the main backup 'server' -- see below).
By "run borg from a git checkout", you mean this, right? So, this is my main limitation: this machine is already in production as the main backup 'server here', By 'server', I mean it's a dedicated machine The ideal would be to use some kind of "container" technology to install everything needed for running borg from the git source in a separate manner and with minimal performance impact, like docker (or better, LXC) on Linux, both of which uses cgroup functionality to insulate containers from each other (and more importantly for my case, the 'host' from the containers). I'm trying to determine whether such a thing exists for MacOSX, so far to no avail (just found out that docker on OSX uses VirtualBox and so would surely impact performance, for example the "container" would only be able to access the host's files via VB's virtual filesystem).
Agreed: as mentioned in the IRC log, there was at least one bug in Python 2.6 (fixed in 2.7.0) that fit the description of the problem. Given we're long past 2.6, this is not the case -- but I've stumbled on a lot of other bugs and issues in subsequent Python versions, some of them only fixed in later 3.y.x releases.
Agreed.
You are more than welcome, @ThomasWaldmann, and thank you very much for all the time and effort you spent to make borg what it is. Please give me a few days while I try and figure out how to go ahead with this. Cheers,
https://borgbackup.readthedocs.io/en/stable/installation.html#from-source you need to start reading there and later skip forward to the git section. virtualenv / venv is a lightweight "container", so you do not mess up your system python's site-packages. installing an additional python should also do no harm as long as you do not change your system python[3] (on linux it would be /usr/bin/python[3] or anything on the PATH called python[3]). i would rather not recommend using vboxsf, that could introduce other issues.
Thanks for the tip. But that would not isolate the system from the out-of-Python deps (like OpenSSL, libacl/libattr, etc), right?
My thinking exactly. I'm right now exploring with our local MacOS specialist the possibility of just doing a "bare-metal backup" of this entire machine, so we can go ahead full-steam with the tests, installing/replacing anything necessary, and then restore the machine entirely from this backup when we are done with it (or if our local ZFS send/recv backups are in any way impacted during the borg tests). I will keep this issue posted on our progress on that.
no, venv doesn't affect the non-python stuff.
That's what I expected. I'm now moving ahead with that "bare metal backup" idea, will let you know as soon as I get it implemented and update/install everything needed on the machine and proceed to "run from source and with Python 3.6.9" as you indicated.
Folks, bad news: just heard back from our Mac specialist that there is no way that he knows of to make a "bare metal backup" of a MacOSX machine. TimeMachine, which is the tool he uses most, doesn't make a bit-by-bit copy of the machine; it just copies applications, data and configurations, which are then restored on top of a new OS install. Furthermore, he also strongly advised against changing the machine's installation, as backup of this data is critical for the company. And even without borg, we already have backups being made in a "local" capacity (ie, from the NAS server to the disks attached to the backup machine via zfs send|recv, which are working perfectly), and changing the machine's installation could jeopardize that. So we are moving ahead with borg as-is, working around the present issue by setting a very large --checkpoint-interval. When/if I get an unused MacOS machine with enough RAM to play with, I will try and get back to testing this (none is probable in the short term, unfortunately). Again, thanks @ThomasWaldmann and everyone for the help.
We know one source of madvise() from my fumbling around with dtrace:
So that looks like cython. If we want to make this easier to debug, it might be worth the effort to see if it reproduces with a small test program that just loads and saves the files cache. That would also likely make tests easier to run without doing actual backups.
@textshell, good points. And if we have this test program to load and save the cache and it just needs a new Python interpreter and not all the libraries etc that the full borg depends on, I can run it on my machine with virtualenv/venv as @ThomasWaldmann suggested, as it very probably won't disturb its current installation.
Good idea to use a test program. It could even include generating a large files cache and then benchmarking the processing of it.
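Something along these lines could serve as a starting point -- a rough, borg-independent sketch where the entry layout, sizes and counts are made up for illustration only:

```python
#!/usr/bin/env python3
"""Rough standalone sketch: build a big fake 'files cache' dict, then time a
borg-style filter-and-serialize pass over it. Not borg code; entry layout,
sizes and counts are illustrative only."""
import io
import os
import time
from collections import namedtuple

import msgpack

FileCacheEntry = namedtuple('FileCacheEntry', 'age inode size mtime chunk_ids')

def build_cache(num_files):
    files = {}
    for i in range(num_files):
        entry = FileCacheEntry(age=i % 25, inode=i, size=4096 * (i % 1000),
                               mtime=1_575_000_000_000_000_000 + i,
                               chunk_ids=[os.urandom(32)])
        files[os.urandom(16)] = msgpack.packb(tuple(entry))
    return files

def save_cache(files, ttl=20):
    out = io.BytesIO()              # swap for a real file to include disk I/O
    kept = 0
    for path_hash, item in files.items():
        entry = FileCacheEntry(*msgpack.unpackb(item))
        if entry.age < ttl:
            msgpack.pack((path_hash, tuple(entry)), out)
            kept += 1
    return kept, out.tell()

if __name__ == '__main__':
    n = 1_000_000                   # raise towards 64M to approximate this report
    t0 = time.monotonic()
    files = build_cache(n)
    t1 = time.monotonic()
    kept, nbytes = save_cache(files)
    t2 = time.monotonic()
    print(f'build: {t1 - t0:.1f}s  save: {t2 - t1:.1f}s  '
          f'kept {kept}/{n} entries, {nbytes} bytes')
```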
Hello everyone, My faith about working around the issue by using a huge --checkpoint-interval turned out to be misplaced.
This is exactly what I observed in the terminal window where the borg create was running. Curiously enough, the next steps of cache-saving after "Saving files cache" were quite fast:
Of course this was face-palm time for me. But not the next one:
ie, it spent over 5 hours after printing the stats screen and before printing the final return-code message. So, in total it spent ~25h actually working (doing I/O, updating the progress line, etc) and then over 15 hours in the above two episodes. Some further data: I ran borg under a resource-usage monitoring tool, and its output showed an enormous number of "page reclaims".
I did some research on what "page reclaims" means, and what I was able to cobble together is that these are "soft" page faults that are satisfied by claiming a page from the list of free pages in the system (ie, without doing any swapping/paging I/O), so not too bad. But I wonder whether they are somehow connected to the ton of madvise() calls we saw, either as a cause, a side-effect or a co-effect. I have a hunch that perhaps the ZFS ARC cache (which by default tries to claim most of the idle memory, up to half the machine's memory, leaving around 1M free for immediate allocation and then freeing more automatically as needed when that 1M gets used) has something to do with it. Perhaps the Python interpreter and/or system runtime are seeing the system short on totally free RAM (because of this ZFS ARC effect) and are engaging in dramatic action to try and free some of it up?

I will go ahead and make another subsequent backup, but this time I will restrict the ZFS arc_max parameter (which controls the maximum RAM it ever grabs for its ARC cache) to a mere 4GB; on this 48GB machine that would leave ~40GB free for borg. For this, please wait for my next report in a day or so. Cheers,
We already have something like this: #1871
Agreed, with due regard for the differences:
Anyway, the similarities are tantalizing: in that other issue, the problem was the subsequent backups taking over 24h (so they could not be done "daily"), and borg was also spending a lot of time in the "Saving files cache" step (but I don't think that @Phuzzy112, the OP, went as deep as I did on this. Perhaps he would have seen madvise() or equivalent in his/her case too?). Also, it involved ZFS. Anyway, it was over 3 years and a major version of borg ago, but perhaps it goes to show that the problem is more widespread (across major borg versions and very different OSs, and very probably Python versions too).
So, my second subsequent backup finished during the night. It seems that reducing the maximum memory ZFS grabs for its ARC to 4G (from the default of half the RAM, ie 24G) had a HUGE effect:
This was very surprising: ZFS is designed from the ground up to "feel" memory pressure and proactively release memory from its ARC back to the OS, in order to avoid exactly this. Perhaps Python and/or the system run-time is also trying to be proactive and so instead of putting the pressure on ZFS and getting the memory it needs, it spends those hours going through its memory regions desperately trying to release memory it can ill afford? Anyway, I now see the main problem is with memory: borg is using far more than the 15GB my initial calculations (based on the formula from the borg docs) indicated. More on this:
I'm right now starting my 3rd subsequent backup and will keep ZFS arc_max at 4GB; let's see how it goes.
Based on the data, I have a possible explanation. Before I write it down, here are two caveats/assumptions:
Now here’s what I think might be going on. Based on the observation of the massively increased resident memory use with the reduced ARC size, it seems to me that borg on @DurvalMenezes' system and run needs ~30 GB to operate. A lot, but let’s treat that as a given for now. Also, as mentioned above, I assume that much of this is cold memory pages from memory mappings or python objects which are rarely touched. I suggest that during the previous run (with the large ARC size), the cold pages of borg lost against the ARC memory in priority and got swapped out. Then whenever it tried to write the cache, it had to iterate over all its pages, swapping them in painfully, which is why writing the cache took ages. At the same time, when swapping in pages, the OS had to decide between releasing ARC memory (potentially causing disk accesses) and swapping out other pages (of borg, most likely, since anything else swappable would already be swapped out at this point), also causing disk accesses. Which is why everything was terribly slow here. Then you reduced the ARC size and did a re-run. In this run, there was no pressure, since the ARC was small enough to leave room for the full borg process (~30 GiB) in RAM. Everything went smoothly without swapping. This explains all of the observations:
btw, iirc python has its own internal memory management: it gets bigger memory chunks from the OS when needed and then manages that memory internally to satisfy smaller allocations. iirc, it is rather lazy with giving memory back to the OS (if ever, not sure).
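For what it's worth, CPython exposes a couple of introspection hooks that could be used to watch this from inside a process -- just a diagnostic sketch, not anything borg itself does; sys._debugmallocstats() is a CPython-specific, semi-private call:

```python
import gc
import sys

# Dump pymalloc arena/pool/block statistics to stderr (CPython-specific).
sys._debugmallocstats()

# Per-generation garbage-collection counters; a very high collection count
# during "Saving files cache" would support the object-churn theory.
print(gc.get_stats())
print(gc.get_count())
```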
@ThomasWaldmann, thanks for the very detailed response! Much appreciated. So, in my case (client only), I have just the chunks cache and the files cache (as I expected from my conversations on IRC). Re: "going over the threshold for resizing": as the total size and the number of files hardly changed (and especially since the chunks file on disk, which I understand reflects the chunks cache size, remains at the exact same size since the initial backup), this should not be the case, right?
This is as I expected from my IRC conversations. It doesn't explain, though, how the RSS used by the process is apparently increasing by ~9GB each time I run it (ie, for each "generation"). Let's see how this 3rd subsequent backup (running right now) ends -- I will keep this issue posted.
Another strategy that might work for @DurvalMenezes would be to break up the tree being backed up into a number of sub-trees that are backed up separately, maybe to separate repos. And you could use different home directories for the backups of each sub-tree, so the file caches would be independent. (That would also allow you to run the backups in parallel, which might or might not work well, depending on what the bottleneck is.) There's still the mystery of why the RSS changed so much when the ARC was changed. I have never heard of python adjusting its memory usage based on memory pressure, so I suspect some swapping, but @DurvalMenezes thinks that wasn't happening. Also, about borg memory usage: could there be unnecessary object creation causing garbage collection? Could there be a memory leak? Not sure how to test for these in python.
@DurvalMenezes for non-client-server mode, the repo index is loaded into the client. Splitting the files into multiple subsets and using multiple repos would help most. That keeps the number of files per files cache (which is per repo and/or per user, if different users are used) and the number of chunks in the repo (which is also per repo, obviously) lower, and causes correspondingly lower memory usage. BTW, I also had strange/fatal effects with ZFS on linux and memory allocation - in my case not with borg, but somehow ZFS competing against kvm (on proxmox ve). Thus, I usually set a max ARC cache limit to a not-too-high value (a few GBs).
@ThomasWaldmann and @jdchristensen, thanks for the suggestions re: splitting the data between multiple repos. I've seen something like that in other postings here on github and briefly considered it, but would rather not do it, as it would require me to start over from the beginning again and that would be rather painful (especially as I would be throwing away 1 month of my parallel rclone process struggling to copy the borg repo up to the cloud -- which is now almost finished). I'm trying to get more memory for that machine (it can go all the way to 128GB with 3rd-party modules -- almost 3x the current 48GB configuration), so that will gain me some time (and later, if we can't 'fix' it in borg, I think I can make it work by keeping a smaller number of backups in the repo before pruning). But finding out why/how borg is using up all this memory would really be best for everyone, IMHO.
The hashindex entries are already quite dense: a 256-bit key plus some 32-bit ints - no fat to get rid of here. The python dict's internal behaviour / python and kernel memory management is not under our control, so we have to live with that. The files cache size is proportional to the number of files and also a bit to their size; not much we can do about that. What one can do as an emergency measure, if the memory needed is beyond the available RAM, is to switch off the files cache - but that has a huge performance impact, as every file will need to be read/chunked/hashed at backup time. So better than nothing, but not really good for huge amounts of data.
Did somebody write that "out of borg" test program yet? I have an idea how maybe that loop code could be optimized, but I need a way to test / measure the effect.
@ThomasWaldmann, I think that, given the apparent effect of memory pressure on this issue, the program will need some way to simulate that too (perhaps by allocating memory along with the records read/written), or it will probably not be able to reproduce the problem. Perhaps you have that in mind already... I'm just making sure it doesn't go by unnoticed.
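A crude way to add such memory pressure to the standalone benchmark sketched earlier in this thread would be to pin some ballast memory before timing the save loop (a hypothetical addition; the size below is purely illustrative):

```python
def allocate_ballast(gib, page=4096):
    """Hold `gib` GiB of memory and touch every page so it is really resident."""
    ballast = [bytearray(1 << 20) for _ in range(gib * 1024)]   # 1 MiB chunks
    for chunk in ballast:
        for off in range(0, len(chunk), page):
            chunk[off] = 1
    return ballast          # keep the reference alive while pressure is wanted

ballast = allocate_ballast(8)   # e.g. 8 GiB on the 48GB machine discussed here
```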
Hello everyone, especially @ThomasWaldmann, @textshell and @horazont: It seems I just found a surprising workaround for the issue: hold 2GB of extra memory in the ZFS ARC before starting the borg create, then release it if/when the slowdown shows up. For more details, see the paste of my announcing it on borg's IRC channel: http://rn0.me/show/tnEllC18pOCm27Npt6C9/ If it keeps working during the next couple of backups, I will post more details here.
Aaaand.... it worked! AGAIN! yay! ;-) Here are more details on what I did:
My hypothesis to explain the above is that, when "unused" memory (or rather, "unused + buffers", which does not include the ZFS ARC) is too small, either the Python interpreter or the system runtime (or both, acting in concert) try really hard to free as much of their memory as possible (perhaps triggering Python GC, resulting in multiple madvise() calls to the kernel, etc), greatly slowing down the system. But when there's enough (eg, 2G+) in "unused + buffers", it stops happening (perhaps after the current lets-desperately-try-and-free-some-memory-and-or-GC-and-or-whatever cycle finishes, which would explain the delay I observed in 4) above). To try and help clarify this, I will do as suggested by @textshell on IRC, and tonight I will try to use py-spy to capture some profiles. If all goes well, I shall be posting them here tomorrow morning. Cheers,
Hello everyone, A new day, a new successful backup ;-) Exactly the same thing happened this time as in the last two runs. As stated previously, I tried to profile the process using py-spy, but without success so far. If/when the py-spy developers (or anyone else; please feel free to chime in) tell me how to work around its inability to find borg's Python interpreter, I will try and take a few profiles; meanwhile, unfortunately I have my hands tied. Cheers,
Now try again without reducing ZFS ARC memory while borg is running and check if you see the same speed fluctuations (which could also be caused by different file sizes and hashtable state).
@ThomasWaldmann I was expecting the issue to turn up again this weekend (when I would have a large enough window to wait out a 30h backup), but it did not happen: the speed in files/sec never dipped under 50%, so I didn't intervene by releasing the 2GB of cache, and the backup finished in a record 15.01h. Coincidentally or not, this Monday (when I would not have had the time to run it once more without intervening, in case it needed intervention in the first run) the speed dip again failed to happen. Not sure what could have changed; the volume of changed files/bytes is about the same from day to day, absolutely no new software has been installed nor new programs started on the machine, etc. I hate it when problems just disappear all by themselves, but this seems to be what has happened so far. I'm pretty sure I can still reproduce the original problem (ie, have borg create slow down terribly during "Saving files cache" again).
When the hashtable gets fuller, speed drops. When it gets over the resize threshold, it will be enlarged and then be quicker again. The threshold is some sort of compromise of speed vs. memory needs.
I checked for this with @textshell's help and it didn't seem to be the case: the file that would have doubled in size (named "chunks" IIRC) has kept the exact same size since my first backup.
When it resizes, it does not double in size, but uses a smaller factor depending on the size it already has. Not sure how the python dict resizes (used for the files cache).
@ThomasWaldmann thanks for the explanation. But the "chunks" file size hasn't changed at all... this eliminates the hypothesis it has been resized, right?
If it has the exact same bytesize, yeah, then it likely has not been resized (there is some processing involved, therefore the on-disk size is only an indirect indicator).
py-spy's author @benfred kindly produced a working py-spy binary for me (see benfred/py-spy#207 (comment) for more details), and I managed to capture a 15-minute profile of a working borg create. When I see one of the big slowdowns/lapses again, I will capture another profile; please verify whether the details/layout/etc of the one below are sufficient, or if you need any other options/modes/etc for me to call py-spy with.
Can't read all of it, but it seems to mostly do xattr stuff. If you have a lot of unchanged files, it does not deal with content data (as it'll know they're unchanged), but it reads the metadata from the fs to create the archive's metadata stream. So: stat data, xattr metadata, acl metadata (also in xattrs on linux).
I'm not sure if the svg is better (e.g. tooltips/titles) but the png seems mostly clipped.
Hello @ThomasWaldmann, @textshell:
Yeah, I noticed that too. But the SVG was not any better (at least not with the default visualizer in my system). Will post the next SVG somewhere for you guys to check.
Just captured them all (flamegraph, speedscope and raw) during the latest borg create run:

http://durval.com/xfer-only/borg_create_20191218L210000_at_20191219L105904_-_FLAMEGRAPH.svg

Please confirm which of them is adequate, and if the parameters etc. are OK or need changing, so I can be ready the next time the slowdown occurs. Thanks,
Just as a side note: the borg available via brew recently had a fundamental change: it is not using the pyinstaller-made fat binary any more, but now installs borg in a more conventional way.
Thanks for the heads-up, @ThomasWaldmann. Just checked, and I already have a brew-installed python3.7. So, the next time the issue turns up, I just need to make sure I have a new(er) brew-installed borg, and debugging will be much easier. It hasn't happened since I last reported, possibly because we upgraded the machine to 96GB RAM. Will keep this issue posted.
@DurvalMenezes Is this still an issue with a current macOS, zfs, python and borg version (like 1.2.x)?
@ThomasWaldmann, thanks for inquiring back. After the machine was upgraded to 96GB RAM, the issue never repeated itself, despite running borg-backup every single day under otherwise (as far as we could tell) exactly the same conditions. Alas, the machine (along with the entire network it ran on) was decommissioned last October when the company that owned it moved its entire production process 'to the cloud', so I'm no longer in a position to do any further tests :-/ Therefore, I'm closing this issue. Thank you again, and the whole borg-backup community, for the excellent software and support.
On MacOSX, borg create slows down terribly during "Saving files cache"; tracing the process shows tons of madvise() and very few write() syscalls.

Have you checked borgbackup docs, FAQ, and open Github issues?
Yes
Is this a BUG / ISSUE report or a QUESTION?
BUG/ISSUE
System information. For client/server mode post info for both machines.
Client only (no server).
Your borg version (borg -V).
borg 1.1.10
Operating system (distribution) and version.
Mac OSX High Sierra; output of the sw_vers command:

Hardware / network configuration, and filesystems used.
Power Mac 2013 with Xeon 4c/8t CPU and 48GB RAM. Filesystem is OpenZFS On MacOSX (O3X).
How much data is handled by borg?
~35T bytes on ~64M files
Full borg commandline that lead to the problem (leave away excludes and passwords)
borg create --checkpoint-interval=10800 -v --show-rc --progress --stats ::20191204L210000
Describe the problem you're observing.
This is my first 'subsequent' archive, ie the first one after the initial one that took over 10 days to complete.
Initial estimates (based on the number of bytes (O) and files (N) displayed by borg) pointed to a 22-hour ETA, so I left it to run and came back after 12 hours to have a look. The last line output was "Saving files cache". I then waited for it to resume backing up files... and waited... and waited some more.
ps auwx showed the borg process using up CPU, but iostats showed negligible I/O; there were over 18GB of totally free RAM at the time (ie, not even used by the OS/ZFS disk caches). Only after about 4 more hours(!) did it finish with the cache-saving and get back to backing up files.
After 3 more hours (as indicated by --checkpoint-interval=10800), sure enough it displayed "Saving files cache" again... and again spent hours and hours there. Meanwhile @textshell and I on IRC tried everything we could think of to diagnose the issue, arriving at no definite conclusion; dtruss showed the borg process doing a LOT (tens of thousands per minute) of madvise() syscalls, and very few (fewer than 2) write() syscalls; that led us to speculate it might have something to do with Python's garbage collection or other memory management. We then tried some syscall profiling and dumping stack frames (both using dtrace and kill -USR2 on the borg process), but arrived at no definite conclusion.

Please see the entire IRC conversation, with pastes etc, at http://rn0.me/show/fz07NsoKpH9eVb71H28u/
Can you reproduce the problem? If so, describe how. If not, describe troubleshooting steps you took before opening the issue.
Yes, I can reproduce it anytime; all I need to do is to rollback the ZFS dataset containing the borg repo and cache to the state I snapshotted before running the above command, and then run it again.
Include any warning/errors/backtraces from the system logs
No errors were produced nor shown. We pried out some backtraces with both dtrace (OS task level) and kill -USR2 (Python level); see the above URL.

Not including the CRUD benchmark results, as I'm currently running a new borg create on the machine (after killing the previous one with kill -9 -- a simple ^C didn't terminate it, even after waiting for over 5 minutes), this time with --checkpoint-interval=3600000 so as to try and avoid the problem by doing no checkpoints at all (the expected run time is much, much lower than the 1000-hour checkpoint period indicated by that option). But as contained in the above IRC log, I tested a cp -rp of the entire borg cache directory to a new directory on the same filesystem, and it took less than 3m23s to complete (so there's definitely no I/O bottleneck here that could explain this issue).