Latest git nightly freezes under load with no log messages (could be xattr & rsync) #2725
Comments
Since you have xattr=on, it's different, but your stack trace is helpful nonetheless. I'll be analyzing it later today. |
Are you sure xattr=sa isn't set? |
Sorry, I posted this late after a trip last night. I was wrong, it's xattr=sa. |
@angstymeat Could you please tell me what your |
So far I've been able to make it crash in about 30 minutes or less anytime I run my backup script, so it appears to be easily reproducible. I haven't seen any corruption. I took a look at the directories after I read #2717, but didn't see a problem. I'm currently running |
The |
@angstymeat Could you please |
Ok, I'm running the jobs right now. I'll post when I've got the info. |
It crashed again, and here are the results:
There was no output when I ran the
I'm rebooting and I'm going to run with |
The backup completed without error with |
@angstymeat Thanks for the report. This could be a simple out-of-memory condition or it could be something deeper. Could you please try it again with the slab limit set to 16KiB and boot your kernel with the |
I think it crashed again, but I got no message in the logs. Instead, rsync is either hanging or getting IO errors. Also, I lost snmpd and I can't open new ssh sessions or shells because it seems like some IO is hanging. I just rebooted it with the session I still had opened. I've got these, now:
@angstymeat Those numbers are pretty much as I'd expect and confirm heavy use of the I think I may have found something. When using the Linux slab and after lots of sa_cache entries are allocated, an |
@angstymeat Regarding the drop caches issue, see #2753 and the related spl issue. If you can reproduce this problem, it might be worth trying those patches given that you're running a pretty new kernel. |
I'll get this tested out tomorrow or Wednesday. Thanks! |
I applied the patch in af59f0e to the latest git from this afternoon, and I've run the backup job twice without a problem. I'm going to let it run at its normal nightly time and keep an eye on it. |
So far, so good. It looks like the patch has solved the problems. |
@angstymeat Thanks for the update. |
Just closing up my old issues that have been solved. |
I don't know if this is related to the other recently posted issues concerning xattr & rsync, but I thought I would add my problem.
On Thursday I installed the latest git development version of SPL (0.6.3-9_gf9bde4f) and ZFS (0.6.3-77_g6d9036f) on a Dell 515 server that we use for a backup, replacing the 0.6.3 release version that had been on it. It's not vital so I thought it would make a good real-world test, and it has been running for over a year without issue.
Since then the machine will consistently freeze up a few minutes into running our nightly rsyncs. The freeze appears to be I/O related. If I'm logged in I can continue to work with it until I need disk access, then that session hangs forever.
This happens on a Dell 515 server. The OS is Fedora 20 running the 3.16.2-200 kernel (3.16.2-200.fc20.x86_64). The pool is fully upgraded with all of the new features enabled. The pool is compressed, and there is no dedup running. XATTR=on and ACLTYPE=posixacl.
I tried turning xattr off, but it didn't appear to make a difference. I don't see any errors when trying to access files, and scrub appears to run without crashing.
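Since the exact xattr value matters here (on versus sa), one way to confirm what a dataset is actually using is to read the properties in scripted form rather than from memory. The sketch below is only an illustration: it assumes `zfs get -H -o name,property,value` is available on the box, and the dataset name `tank/backup` and the sample output are hypothetical, not taken from this system.

```python
import subprocess


def parse_zfs_get(output):
    """Parse tab-separated `zfs get -H -o name,property,value` output.

    Returns a dict of the form {dataset: {property: value}}.
    """
    props = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        # -H prints one record per line with fields separated by a single tab.
        name, prop, value = line.split("\t")
        props.setdefault(name, {})[prop] = value
    return props


def get_props(dataset, names=("xattr", "acltype")):
    """Run zfs get for the given dataset and return the parsed properties."""
    result = subprocess.run(
        ["zfs", "get", "-H", "-o", "name,property,value", ",".join(names), dataset],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_zfs_get(result.stdout)


# Hypothetical sample output, for illustration only (not from this report).
SAMPLE_OUTPUT = "tank/backup\txattr\tsa\ntank/backup\tacltype\tposixacl\n"

if __name__ == "__main__":
    print(parse_zfs_get(SAMPLE_OUTPUT))
```

On a real system, `get_props("tank/backup")` (with your own dataset name) would run the command directly; the parser is kept separate so it can be checked without ZFS installed.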
I get no panic or crash dump concerning ZFS or SPL written to the logs. However, I do get the page allocation failure that I've added at the end of this post.
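Because the only trace in the logs is a page allocation failure, it can help to pull just those lines out of a saved kernel log and see which process and allocation order were involved. This is a rough sketch, assuming the usual kernel message form `<comm>: page allocation failure: order:N, mode:0x...`; the sample log text below is made up for illustration and is not the failure from this report.

```python
import re

# Matches the kernel's allocation-failure banner, e.g.
#   rsync: page allocation failure: order:2, mode:0x104050
# Assumption: the process name contains no spaces.
ALLOC_FAILURE = re.compile(
    r"(?P<comm>\S+): page allocation failure: "
    r"order:(?P<order>\d+), mode:(?P<mode>0x[0-9a-fA-F]+)"
)


def find_alloc_failures(log_text):
    """Return a list of (process name, order, mode) tuples found in log_text."""
    hits = []
    for line in log_text.splitlines():
        match = ALLOC_FAILURE.search(line)
        if match:
            hits.append(
                (match.group("comm"), int(match.group("order")), match.group("mode"))
            )
    return hits


# Made-up sample lines for illustration only.
SAMPLE_LOG = (
    "Sep 29 02:14:07 backup kernel: rsync: page allocation failure: "
    "order:2, mode:0x104050\n"
    "Sep 29 02:14:07 backup kernel: CPU: 3 PID: 1234 Comm: rsync\n"
)

if __name__ == "__main__":
    for comm, order, mode in find_alloc_failures(SAMPLE_LOG):
        print(f"{comm} failed an order-{order} allocation (mode {mode})")
```

An order-N failure means the kernel could not find 2^N physically contiguous pages, so higher orders are more likely to fail when memory is fragmented.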