hung task on zpool import #65
I've got a hunch this is related to the l2arc devices we've added in and have started testing with. In particular I think we're likely stuck on the l2arc_dev_mtx mutex, but we'd need to resolve l2arc_feed_thread+0xf3 to be sure. We might also be stuck on the l2arc_feed_thr_lock mutex, but I think that's less likely.

This might be related to the pool being misconfigured. If you look above you'll see that disk B23 is part of both the 'logs' and 'cache' vdevs. I'm very surprised the zpool command allowed you to do that.
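For context, a simplified sketch of the locking pattern being suspected; this is paraphrased from memory of the L2ARC feed thread and is not verbatim ZFS source:

```c
#include <sys/zfs_context.h>

/* Declared extern here only for the sketch; in arc.c these are internal. */
extern kmutex_t l2arc_feed_thr_lock;	/* protects the feed thread's cv */
extern kcondvar_t l2arc_feed_thr_cv;
extern kmutex_t l2arc_dev_mtx;		/* protects the list of cache devices */

/*
 * Rough shape of the feed loop: the thread sleeps on l2arc_feed_thr_lock
 * between passes and holds l2arc_dev_mtx while it walks the cache devices,
 * so a stack ending in l2arc_feed_thread could be blocked on either mutex.
 */
static void
l2arc_feed_thread_sketch(void)
{
	for (;;) {
		mutex_enter(&l2arc_feed_thr_lock);
		/* Sleep until the next feed interval (or an explicit wakeup). */
		(void) cv_timedwait(&l2arc_feed_thr_cv, &l2arc_feed_thr_lock,
		    ddi_get_lbolt() + hz);
		mutex_exit(&l2arc_feed_thr_lock);

		/* Held while choosing the next cache device to feed. */
		mutex_enter(&l2arc_dev_mtx);
		/* ... pick a device and write out eligible ARC buffers ... */
		mutex_exit(&l2arc_dev_mtx);
	}
}
```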
# zeno5 /root > zpool create -f lustre-zeno5 raidz2 A8 B8 C8 D8 E8 F8 G8 H8 I8 J8 raidz2 A9 B9 C9 D9 E9 F9 G9 H9 I9 J9 raidz2 A10 B10 C10 D10 E10 F10 G10 H10 I10 J10 raidz2 A11 B11 C11 D11 E11 F11 G11 H11 I11 J11 raidz2 A12 B12 C12 D12 E12 F12 G12 H12 I12 J12 raidz2 A13 B13 C13 D13 E13 F13 G13 H13 I13 J13 raidz2 A14 B14 C14 D14 E14 F14 G14 H14 I14 J14 log G15 H15 cache G15 I15 J15
cannot create 'lustre-zeno5': one or more vdevs refer to the same device, or one of the devices is part of an active md or lvm device

Indeed it does not allow it. Perhaps something got corrupted or there was an error importing the zpool. The above layout was not from one of the nodes that hung. I'm not sure whether those nodes also had the duplicate disk problem, as their zpools are no longer accessible. I'll do more testing to see if this can be reproduced.
I can't reproduce this, and we haven't seen it since. I'm closing this bug due to a lack of information; we'll open a new one if we see this failure again.
The spl_task structure was renamed to taskq_ent, and all of its fields were renamed to have a prefix of 'tqent' rather than 't'. This was to align with the naming convention which the ZFS code assumes. Previously these fields were private so the name never mattered. Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #65
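For readers skimming the thread, a rough before/after sketch of the rename; the field lists below are abbreviated and illustrative, not the complete definitions from the SPL headers:

```c
/* Illustrative fragments only -- abbreviated, not the full definitions. */

/* Before: private SPL type with 't_'-prefixed fields. */
typedef struct spl_task {
	struct list_head	t_list;		/* linkage on the work list */
	taskqid_t		t_id;		/* dispatch id */
	task_func_t		*t_func;	/* function to run */
	void			*t_arg;		/* argument passed to t_func */
} spl_task_t;

/* After: renamed to match the convention the ZFS code expects. */
typedef struct taskq_ent {
	struct list_head	tqent_list;
	taskqid_t		tqent_id;
	task_func_t		*tqent_func;
	void			*tqent_arg;
} taskq_ent_t;
```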
To lay the ground work for introducing the taskq_dispatch_prealloc() interface, the tq_work_list and tq_threads fields had to be replaced with new alternatives in the taskq_t structure. The tq_threads field was replaced with tq_thread_list. Rather than storing the pointers to the taskq's kernel threads in an array, they are now stored as a list. In addition to laying the ground work for the taskq_dispatch_prealloc() interface, this change could also enable taskq threads to be dynamically created and destroyed, as threads can now be added to and removed from this list relatively easily. The tq_work_list field was replaced with tq_active_list. Instead of keeping a list of taskq_ent_t's which are currently being serviced, a list of the taskq threads currently servicing a taskq_ent_t is kept. This frees up the taskq_ent_t's tqent_list field while it is being serviced (i.e. now when a taskq_ent_t is being serviced, its tqent_list field will be empty). Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #65
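A similarly abbreviated sketch of the taskq_t change; only the two renamed fields are taken from the commit above, and the remaining members shown are placeholders for illustration:

```c
/* Illustrative fragment of taskq_t, not the complete structure. */
typedef struct taskq {
	spinlock_t		tq_lock;	/* protects the lists below */

	/* Replaced tq_threads (an array of thread pointers): */
	struct list_head	tq_thread_list;	/* all threads servicing this taskq */

	/* Replaced tq_work_list (entries being serviced): */
	struct list_head	tq_active_list;	/* threads currently running an entry */

	struct list_head	tq_pend_list;	/* entries waiting to be serviced */
} taskq_t;
```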
Added another splat taskq test to ensure tasks can be recursively submitted to a single task queue without issue. When the taskq_dispatch_prealloc() interface is introduced, this use case can potentially cause a deadlock if a taskq_ent_t is dispatched while its tqent_list field is not empty. This _should_ never be a problem with the existing taskq_dispatch() interface. Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #65
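The kind of recursive submission being tested looks roughly like this; the taskq and function names here are made up for illustration and are not the actual splat test code:

```c
#include <sys/taskq.h>

static taskq_t *recurse_tq;		/* hypothetical taskq under test */

/* A task that re-dispatches itself into the same taskq. */
static void
recurse_task(void *arg)
{
	int *depth = arg;

	if (--(*depth) > 0) {
		/*
		 * With the classic taskq_dispatch() interface this is safe:
		 * each dispatch allocates a fresh entry.  With a preallocated
		 * entry it could deadlock if the entry's tqent_list is still
		 * in use when the entry is re-dispatched.
		 */
		(void) taskq_dispatch(recurse_tq, recurse_task, depth, TQ_SLEEP);
	}
}
```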
This patch implements the taskq_dispatch_prealloc() interface which was introduced by the following illumos-gate commit. It allows for a preallocated taskq_ent_t to be used when dispatching items to a taskq. This eliminates a memory allocation which helps minimize lock contention in the taskq when dispatching functions.

    commit 5aeb94743e3be0c51e86f73096334611ae3a058e
    Author: Garrett D'Amore <garrett@nexenta.com>
    Date: Wed Jul 27 07:13:44 2011 -0700

    734 taskq_dispatch_prealloc() desired
    943 zio_interrupt ends up calling taskq_dispatch with TQ_SLEEP

Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #65
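A minimal usage sketch of the preallocated-entry dispatch path, assuming the entry is prepared with taskq_init_ent() and dispatched with taskq_dispatch_ent() as in the SPL/illumos code; the my_io structure and functions are hypothetical:

```c
#include <sys/taskq.h>

/* Hypothetical structure with an embedded, preallocated taskq entry. */
typedef struct my_io {
	int		io_error;
	taskq_ent_t	io_tqent;	/* no allocation needed at dispatch time */
} my_io_t;

static void
my_io_done(void *arg)
{
	my_io_t *io = arg;
	/* ... completion processing ... */
	(void) io;
}

/*
 * Dispatch using the preallocated entry.  Because no memory is allocated
 * here, this path stays out of the allocator and reduces taskq lock
 * contention, which is what the interrupt-context zio dispatch wanted.
 */
static void
my_io_dispatch(taskq_t *tq, my_io_t *io)
{
	taskq_init_ent(&io->io_tqent);
	taskq_dispatch_ent(tq, my_io_done, io, TQ_SLEEP, &io->io_tqent);
}
```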
The splat-taskq test functions were slightly modified to exercise the new taskq interface in addition to the old interface. If the old interface passes each of its tests, the new interface is exercised. Both sub tests (old interface and new interface) must pass for each test as a whole to pass. Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #65
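The resulting test shape is roughly the following; the function names are illustrative, not the actual splat symbols:

```c
/* Illustrative shape of the combined test, not the actual splat code. */
static int
splat_taskq_testN(struct file *file, void *arg)
{
	int rc;

	/* First exercise the original taskq_dispatch() interface. */
	rc = splat_taskq_testN_impl(file, arg, B_FALSE /* !prealloc */);
	if (rc)
		return (rc);

	/* Only if that passed, repeat with the preallocated-entry interface. */
	rc = splat_taskq_testN_impl(file, arg, B_TRUE /* prealloc */);

	return (rc);
}
```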
As of the removal of the taskq work list made in commit:

    commit 2c02b71
    Author: Prakash Surya <surya1@llnl.gov>
    Date: Mon Dec 5 17:32:48 2011 -0800

    Replace tq_work_list and tq_threads in taskq_t

    To lay the ground work for introducing the taskq_dispatch_prealloc() interface, the tq_work_list and tq_threads fields had to be replaced with new alternatives in the taskq_t structure.

the comment above taskq_wait_check has been incorrect. This change is an attempt at bringing that description more in line with the current implementation. Essentially, references to the old task work list had to be updated to reference the new taskq thread active list. Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #65
Added call to hide Plymouth when error shell is launched.
[cstor#21] thread to update txg every 10 mins
Signed-off-by: Jan Kryl <jan.kryl@cloudbyte.com>
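For what it's worth, a periodic txg-update thread of the sort the [cstor#21] commit above describes might look roughly like this; this is a guess at the shape of the change using generic ZFS primitives, not the actual cstor code, and shutdown handling is omitted:

```c
#include <sys/zfs_context.h>
#include <sys/dsl_pool.h>
#include <sys/txg.h>

/*
 * Hypothetical sketch: wake up every 10 minutes and force the currently
 * open transaction group to sync, so the pool's txg advances even when
 * the pool is otherwise idle.
 */
static void
txg_update_thread(void *arg)
{
	dsl_pool_t *dp = arg;

	for (;;) {
		delay(600 * hz);		/* sleep for 10 minutes */
		txg_wait_synced(dp, 0);		/* sync the open txg */
	}
}
```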
Saw this after a failed attempt to destroy a dataset and a reboot (see #66). After reboot, tried to import the volume with "zpool import lustre-zeno1 -d /dev/disk/zpool",
which hung. Had built a Lustre filesystem on the dataset and filled it to 100%.
The zpool was laid out like this.