azure sync memory leak #802
Is it likely that many of the files already exist at the destination? If so, I believe it's a known bug that's on our backlog. Thanks for the clear charts and details.
Hello,
Actually, there were no files at the destination; it never gets past scanning and doesn't upload anything.
I tried “copy” but that's not going to work either. It's been running for a week and is still “scanning”; it's up to 9 million files out of 53 million. It would take 6 weeks just to scan, and who knows how long the copy itself would take.
Going to have to figure something else out to get this data into the cloud. I could export/import to a Microsoft device, I guess, but we would not be able to sync it.
Regards
-mjd
[email chain old messages deleted for clarity]
Hi @mikejdunphy, sorry for the slow reply; I was away over Christmas/New Year. My guess is that this is not actually a memory leak, but just relatively high memory usage. We've heard of a few other cases where it plateaus in the low GBs, i.e. a bit higher than the 2.x GB you are seeing. Also, I should mention that there are two known issues with high file counts:
From your description, I'm not sure whether you're actually hitting the perf issue where breaking the work up into separate jobs would help (there's a rough sketch of that at the end of this comment); it might just be a memory issue. I'd suggest that the following things might help:
Finally, I'm puzzled by this:
When you use copy, it starts copying files as soon as the first 10,000 have been scanned. Did it report any throughput? Did it report any files completed?
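As a rough sketch of the “separate jobs” idea, assuming a layout where the share splits naturally by top-level directory (the local path, storage account, container, and SAS token below are placeholders, not values from this thread):

```bash
# Hypothetical sketch: run one sync job per top-level directory, so each job
# scans and tracks far fewer files at a time than one 53-million-file job.
# SRC_ROOT, the storage account, container, and SAS token are placeholders.
SRC_ROOT="/data/bigshare"
DST_ROOT="https://<account>.blob.core.windows.net/<container>"
SAS="<sas-token>"

for dir in "$SRC_ROOT"/*/; do
  name=$(basename "$dir")
  # Each invocation is an independent AzCopy job with its own scan and job plan.
  azcopy sync "$dir" "$DST_ROOT/$name?$SAS" --recursive
done
```

Running the jobs sequentially or a few at a time keeps peak memory closer to the size of the largest single directory rather than the whole share.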
(BTW, I edited the above reply. The first draft mistakenly said that the first known perf issue applies to sync, but it actually applies to copy with overwrite=false.)
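To make that scenario concrete, the known perf issue concerns copy invocations with overwrite disabled, along these lines (the source path and destination URL are placeholders):

```bash
# Placeholder source and destination; the relevant part is --overwrite=false,
# which tells copy not to overwrite blobs that already exist at the destination.
azcopy copy "/data/bigshare" \
  "https://<account>.blob.core.windows.net/<container>?<sas-token>" \
  --recursive --overwrite=false
```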
Thanks for all the hints. I can try those, I suppose.
After further review, I had a firewall/internet issue on my end. Yes, after the first 10k it started to try to copy, but it received an error. The error wasn't straightforward, so I didn't recognize it as an internet/firewall issue; I mistakenly assumed that in that case it would not even be able to start scanning and appear to be working.
I verified it does work on a VM that has full internet access.
In any event, I am going to start with shares of fewer than 1 million files until I get familiar and comfortable with it all. It's still going to take weeks to get the 50+ million files up there, and even once they are up there I really have no way to “sync” them in a reasonable time. I'll have to come up with something later.
I have plenty of other shares with fewer files, so I will start with those.
Regards
-mjd
[prior email messages deleted for clarity]
A few other general perf tips for tiny files include:
In some small-files cases, the first three tips combined can give you something like a doubling of throughput. The last one, premium block blobs, can double that again or better, but check the pricing, since premium block blobs are priced differently. Performance-wise, they are a very good choice for small blobs.
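As an illustration of the kind of tuning involved (not necessarily one of the specific tips referred to above), AzCopy's request concurrency can be raised via an environment variable, which often helps with many tiny files; the source path and destination URL here are placeholders:

```bash
# Illustration only: raise AzCopy's concurrent-request count before transferring
# many tiny files. The value 512, the source path, and the destination URL are
# placeholders, not recommendations from this thread.
export AZCOPY_CONCURRENCY_VALUE=512
azcopy copy "/data/smallfiles" \
  "https://<account>.blob.core.windows.net/<container>?<sas-token>" \
  --recursive --log-level=WARNING
```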
Closing due to inactivity. Please open a new issue if you are still experiencing issues with AzCopy.
azcopy 10.3.3 on Linux crashes the VM with a memory leak.
Running: azcopy sync source destination --recursive
It runs for a few hours, scans about 3.4 million files, and then crashes the system, consuming all memory and swap.
Repeated 2x.
The source has 53 million tiny files.