Multiple parallel rm get deadlocked #4895
@tuomari I'd start by checking
Unfortunately I have already rebooted the system. I will report back if the problem comes up again. I also noticed that the deadlocked directories were created at roughly the same time. The video surveillance software creates a directory per camera for every 10 minutes of data, in the format camid/yy/mm/dd/hh/MM. The stuck processes were removing directories belonging to different cameras, but from the same 10-minute slot:
@tuomari If you're running zed (the ZFS Event Daemon), it should record some basic information about events in your syslog. Maybe something interesting ended up there. The problem you're seeing may be exacerbated by, but shouldn't be caused by, the new dnode shedding logic. Some of the
Unfortunately I was not running zed at the time. I will report back as soon as I have another incident, which may take a few days.
Hmm. I tried to reproduce the problem by removing an hour's worth of directories at once (88 in total) and got this. Not sure whether it is related or not:
@dweeezil You were right. There were a lot of delayed I/Os in the pool. Am I correct to assume that the pool gets stuck because of vdev 0x6de336ca30fa5d9d? Is there any reason why ZFS does not kick it out of the pool after some reasonable timeout period?
After a day or so of being stuck, the system hit a general protection fault.
@tuomari did things improve after you identified a problematic drive?
Haven't hit this in a while. Closing, as this is most likely fixed.
I have multiple rm processes running in parallel on directories (~1 GB and ~6000 files per directory). After a day or so, some of the rm processes get stuck with "INFO: task rm:14138 blocked for more than 120 seconds." The stuck rm processes cannot be killed and have not finished after ~6 hours. This behaviour has now occurred 4-5 times. The filesystem can still be written to and deleted from, except for the directories the stuck rm processes are working on.
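The workload described above (several directories of many small files removed concurrently) can be approximated with a small script. This is a hypothetical reproduction sketch, not the reporter's tooling: it assumes Python 3, uses `shutil.rmtree` in worker threads as a stand-in for separate `rm` processes, and the helper names `make_slot_dirs` and `remove_in_parallel`, the `cam{i}` directory names and the file counts are all illustrative. The 120-second timeout simply mirrors the hung-task warning quoted in the report. To exercise ZFS, `root` would have to point at a directory on the affected dataset; the default below uses a temporary directory.

```python
import os
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor, wait


def make_slot_dirs(root, n_dirs, files_per_dir, file_size=4096):
    """Hypothetical helper: create n_dirs 'camera' directories of small files."""
    payload = b"\0" * file_size
    dirs = []
    for i in range(n_dirs):
        d = os.path.join(root, f"cam{i}")  # illustrative name, one per camera
        os.makedirs(d)
        for j in range(files_per_dir):
            with open(os.path.join(d, f"chunk{j:05d}"), "wb") as fh:
                fh.write(payload)
        dirs.append(d)
    return dirs


def remove_in_parallel(dirs, timeout_s=120.0):
    """Remove all dirs concurrently; return the ones still running after timeout_s."""
    pool = ThreadPoolExecutor(max_workers=max(1, len(dirs)))
    futures = {pool.submit(shutil.rmtree, d): d for d in dirs}
    done, pending = wait(futures, timeout=timeout_s)
    for f in done:
        f.result()  # re-raise any error hit by rmtree
    # Only block on the workers if every removal finished in time.
    pool.shutdown(wait=not pending)
    return sorted(futures[f] for f in pending)


if __name__ == "__main__":
    root = tempfile.mkdtemp()  # assumption: replace with a path on the ZFS dataset
    dirs = make_slot_dirs(root, n_dirs=8, files_per_dir=200)
    stuck = remove_in_parallel(dirs)
    print("still running after timeout:", stuck)
    if not stuck:
        os.rmdir(root)
```

If a removal really is blocked in uninterruptible sleep (D state), the worker thread cannot be cancelled either, so the script prints its report but will not exit cleanly, much like the unkillable rm processes described above.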
I am running ZFS HEAD 25458cb with kernel version 4.5.5. The files being removed are on a filesystem, and the pool layout is as follows:
Here are the stack traces of the D-state processes.