Add support for rw semaphore changes under PREEMPT_RT_FULL #589

clefru · 2016-12-11T09:50:37Z

The main complication from the RT patch set is that the RW semaphore locks change in that read locks on an rwsem can be taken only by a single thread. All other threads are locked out. This single thread can take a read lock multiple times though. The underlying implementation changes to a mutex with an additional read_depth count.
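To make the changed semantics concrete, here is a minimal userspace sketch of the behaviour described above: one owner, with nested read locks counted in read_depth. The names `model_rwsem`, `model_down_read_trylock` and `model_up_read` are hypothetical and exist only for illustration; this is not the kernel's `struct rw_semaphore`, and threads are represented by plain integer ids.

```c
#include <assert.h>

/* Hypothetical userspace model, NOT the RT kernel's struct rw_semaphore. */
struct model_rwsem {
	int owner;	/* id of the thread holding the lock, 0 if free */
	int read_depth;	/* nested read locks taken by the owner */
};

/*
 * Try to take a read lock for thread tid. Succeeds if the lock is
 * free, or if tid already holds it for read (nested read lock).
 * Any other thread is locked out. Returns 1 on success, 0 otherwise.
 */
static int
model_down_read_trylock(struct model_rwsem *sem, int tid)
{
	if (sem->owner == 0) {
		sem->owner = tid;
		sem->read_depth = 1;
		return (1);
	}
	if (sem->owner == tid && sem->read_depth > 0) {
		sem->read_depth++;
		return (1);
	}
	return (0);
}

/* Drop one read lock; the lock becomes free when the depth reaches 0. */
static void
model_up_read(struct model_rwsem *sem, int tid)
{
	assert(sem->owner == tid && sem->read_depth > 0);
	if (--sem->read_depth == 0)
		sem->owner = 0;
}
```

In this model a second reader thread fails to get the lock until the first thread has released every nested read lock, which is the behaviour that makes a multi-reader test unsuitable under RT.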

The implementation can be best understood by inspecting the RT patch. rwsem_rt.h and rt.c give the best insight into how RT rwsem works. My implementation for rwsem_tryupgrade is basically an inversion of rt_downgrade_write found in rt.c. Please see the comments in the code.

Unfortunately, I have to drop SPLAT rwlock test4 completely as this test tries to take multiple locks from different threads, which RT rwsems do not support. Otherwise SPLAT, zconfig.sh, zpios-sanity.sh and zfs-tests.sh pass on my Debian-testing VM with linux-image-4.8.0-1-rt-amd64.

Assuming this PR is the right direction, I'll add a test to execute "rw_enter(rwp, RW_READER); rw_enter(rwp, RW_READER); ASSERT(rwsem_tryupgrade(rwp) == -EBUSY);" in the same thread.
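The proposed nested-read test can be sketched in userspace as follows. This is only a model under stated assumptions: `model_rwsem`, `model_down_read` and `model_tryupgrade` are hypothetical names, threads are plain integer ids, and the return convention (0 on success, -EBUSY on failure) is assumed from the test line above rather than taken from the patch. The upgrade rule itself (allowed only when read_depth <= 1, write lock represented as read_depth == 0) follows the comments in the reviewed diff.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical userspace model, NOT the RT kernel's struct rw_semaphore. */
struct model_rwsem {
	int owner;	/* id of the thread holding the lock, 0 if free */
	int read_depth;	/* nested read locks; 0 while held for write */
};

/* Take a (possibly nested) read lock for thread tid. */
static void
model_down_read(struct model_rwsem *sem, int tid)
{
	assert(sem->owner == 0 ||
	    (sem->owner == tid && sem->read_depth > 0));
	sem->owner = tid;
	sem->read_depth++;
}

/*
 * Try to upgrade a read lock held by tid to a write lock.
 * Assumed convention: 0 on success, -EBUSY if the lock is nested.
 */
static int
model_tryupgrade(struct model_rwsem *sem, int tid)
{
	/* The caller must already own the lock. */
	assert(sem->owner == tid);

	if (sem->read_depth <= 1) {
		/* Not nested: mark the lock as write-held. */
		sem->read_depth = 0;
		return (0);
	}
	/* Nested read locks are outstanding: refuse the upgrade. */
	return (-EBUSY);
}
```

With a single read lock the upgrade succeeds; after two read locks from the same thread it is refused and the depth is left untouched.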

kernelOfTruth · 2016-12-15T14:35:52Z

Testing right now with 4.8.14-rt9, runs fine so far, thanks 👍

behlendorf

The key idea here looks good to me. I particularly like that this is a low-risk change in the sense that all non-RT kernels will be entirely unaffected.

Just some style issues and a question about the ASSERT.

behlendorf · 2016-12-16T01:48:04Z

include/linux/rwsem_compat.h

-#ifdef CONFIG_RWSEM_GENERIC_SPINLOCK
+#if defined(CONFIG_PREEMPT_RT_FULL)
+#define SPL_RWSEM_SINGLE_READER_VALUE   (1)
+#define SPL_RWSEM_SINGLE_WRITER_VALUE   (0)


[cstyle] #define followed by space instead of tab

behlendorf · 2016-12-16T01:48:38Z

include/linux/rwsem_compat.h

@@ -36,7 +39,9 @@
 #endif

 /* Linux 3.16 changed activity to count for rwsem-spinlock */
-#if defined(HAVE_RWSEM_ACTIVITY)
+#if defined(CONFIG_PREEMPT_RT_FULL)
+#define RWSEM_COUNT(sem)	sem->read_depth


[cstyle] #define followed by space instead of tab

behlendorf · 2016-12-16T01:50:42Z

include/sys/rwlock.h

- * lock is held.  On other platforms the lock is never released during
- * the upgrade process.  This is necessary under Linux because the kernel
- * does not provide an upgrade function.
- */


We really should have dropped this comment in f58040c. I'm OK with removing it in this patch; just make sure to update the commit message to mention the above commit, where it should have been dropped.

clefru · 2016-12-17

I've split the commit in two.

behlendorf · 2016-12-16T01:52:16Z

module/spl/spl-rwlock.c

+	// the second attempt. Therefore the implementation allows a
+	// single thread to take a rwsem as read lock multiple times
+	// tracking that nesting as read_depth counter.
+	if(rwsem->read_depth <= 1) {


[cstyle] Missing space after if. Should be if (rwsem...).

behlendorf · 2016-12-16T01:55:04Z

module/spl/spl-rwlock.c

+	// read lock twice, as the mutex would already be locked on
+	// the second attempt. Therefore the implementation allows a
+	// single thread to take a rwsem as read lock multiple times
+	// tracking that nesting as read_depth counter.


[cstyle] This should be a C89 style block comment.

/*
 * Under the realtime patch series...
 * ...
 * tracking that nesting as read_depth counter.
 */

behlendorf · 2016-12-16T01:55:39Z

module/spl/spl-rwlock.c

+	if(rwsem->read_depth <= 1) {
+		// In case, the current thread has not taken the lock more
+		// than once as read lock, we can allow an upgrade to a write
+		// lock. rwsem_rt.h implements write locks as read_depth == 0.


[cstyle] C89 comments should be used here and elsewhere.

behlendorf · 2016-12-16T02:05:02Z

module/spl/spl-rwlock.c

+static int
+__rwsem_tryupgrade(struct rw_semaphore *rwsem)
+{
+	ASSERT(rt_mutex_owner(&rwsem->lock) != current);


Don't you mean ==? The lock can only be upgraded when you're holding it as a reader, which would be exclusive in this case.

ASSERT(rt_mutex_owner(&rwsem->lock) == current);

clefru · 2016-12-17

Yes, this is absolutely inverted. Fixed. The owner must be the current thread, otherwise it's not legal to call the upgrade method. (Same condition as in the downgrade code.)

ironMann · 2016-12-17 (edited)

@clefru this puzzled me too, initially. But wasn't the first statement right? If you are trying to upgrade, you are only a reader, hence not an owner.

clefru · 2016-12-17

I just reinspected 0124-rt-Add*.patch linked above. I am pretty sure the ASSERT is correct as is. Just search for rt_mutex_owner in that file and you will see a lot of comments that explain the semantics of that field, in particular that rt_mutex_owner(lock) == current when current owns the lock.

I also found why my code was wrong. My first try had the same guard BUG_ON(rt_mutex_owner(&rwsem->lock) != current); as the rt_downgrade_write method, but BUG_ON isn't available in spl, so I changed it to ASSERT(..), forgetting to invert the conditional.

behlendorf · 2016-12-19

This is somewhat confusing because with the RT kernel there really is no such thing as a read lock. They're all exclusive mutexes and thus have owners even when only held for read. Offhand I can't think of any place in the ZFS code where this change in semantics will result in a problem, but it's something we should keep in mind.

@clefru thanks for updating this, the BUG_ON -> ASSERT inversion makes perfect sense.
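The inversion discussed in this thread comes down to the two macro families having opposite polarity: a BUG_ON-style check trips when its condition is true, an ASSERT-style check trips when its condition is false. The following userspace sketch illustrates that under simplifying assumptions: all function names are hypothetical, the macros are modelled as functions that report whether the check would trip, and thread identities are plain integers.

```c
#include <assert.h>
#include <stdbool.h>

/* BUG_ON(cond) style: the check trips when cond is true. */
static bool
model_bug_on_trips(bool cond)
{
	return (cond);
}

/* ASSERT(cond) style: the check trips when cond is false. */
static bool
model_assert_trips(bool cond)
{
	return (!cond);
}

/* Original guard, as quoted from rt_downgrade_write: trips for non-owners. */
static bool
guard_bug_on(int owner, int cur)
{
	return (model_bug_on_trips(owner != cur));
}

/* Macro swapped but condition kept: now trips for the legitimate owner. */
static bool
guard_assert_uninverted(int owner, int cur)
{
	return (model_assert_trips(owner != cur));
}

/* Macro swapped and condition negated: behaves like the BUG_ON guard. */
static bool
guard_assert_fixed(int owner, int cur)
{
	return (model_assert_trips(owner == cur));
}
```

Only the negated form agrees with the BUG_ON guard for both the owning and the non-owning caller, which matches the `==` form of the ASSERT settled on above.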

behlendorf · 2016-12-16T02:07:26Z

module/splat/splat-rwlock.c

@@ -207,6 +207,10 @@ splat_rwlock_test1(struct file *file, void *arg)
 	rw_thr_t rwt[SPLAT_RWLOCK_TEST_COUNT];
 	rw_priv_t *rwp;

+#if defined(CONFIG_PREEMPT_RT_FULL)
+	// This test will never succeed on PREEMPT_RT_FULL because locks can only be held by a single thread.


[cstyle] 80 character limit

behlendorf · 2016-12-16T02:07:47Z

module/splat/splat-rwlock.c

+#if defined(CONFIG_PREEMPT_RT_FULL)
+	// This test will never succeed on PREEMPT_RT_FULL because locks can only be held by a single thread.
+	return 0;
+#endif


I'd suggest turning this into an #else clause for readability.
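One way to read this suggestion is sketched below. It is a userspace illustration only: `model_rwlock_test1`, `model_multi_reader_body` and `model_body_ran` are hypothetical stand-ins, not the SPLAT code, and the real test body is assumed to sit where the stand-in is called.

```c
#include <assert.h>

/* Set to 1 when the stand-in test body has executed. */
static int model_body_ran = 0;

/* Hypothetical stand-in for the multi-reader test body. */
static int
model_multi_reader_body(void)
{
	model_body_ran = 1;
	return (0);
}

static int
model_rwlock_test1(void)
{
#if defined(CONFIG_PREEMPT_RT_FULL)
	/*
	 * This test will never succeed on PREEMPT_RT_FULL because
	 * locks can only be held by a single thread.
	 */
	return (0);
#else
	return (model_multi_reader_body());
#endif
}
```

With the #else form, exactly one of the two branches is compiled in, so the test body no longer appears as unreachable code after an early return on RT kernels.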


clefru · 2016-12-17T17:00:08Z

I haven't found a way to run cstyle quickly, so I just visually inspected the patch. Pardon me if I have not spotted a mistake. Offtopic: Is there a good way to get emacs to do spl/zfs style?

clefru · 2016-12-17T19:06:36Z

FYI on retesting the patch with zpios-sanity.sh, I hit an occasional deadlock. I haven't looked into that yet.

[  726.461501] INFO: task zpios_io/0:3844 blocked for more than 120 seconds.
[  726.461525]       Tainted: P           OE   4.8.0-1-rt-amd64 #1
[  726.461558] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  726.461583] zpios_io/0      D ffff97f3bfce9100     0  3844      2 0x00000000
[  726.461587]  ffff97f3a6a1af40 ffff97f3ba7d8000 ffffffff83ea94b5 ffff97f3a62dc000
[  726.461591]  ffff97f3a6a1af40 ffff97f3ad48daa8 ffff97f3ad48dad0 ffff97f3a1ea6000
[  726.461594]  ffff97f3ad48da00 ffffffff8440db23 ffff97f3a6a1af40 ffff97f3a62dbec8
[  726.461598] Call Trace:
[  726.461601]  [<ffffffff83ea94b5>] ? preempt_count_add+0x5/0xa0
[  726.461604]  [<ffffffff8440db23>] ? schedule+0x43/0xd0
[  726.461606]  [<ffffffffc09b0df3>] ? zpios_thread_main+0x533/0xa20 [zpios]
[  726.461610]  [<ffffffffc09b08c0>] ? zpios_dmu_object_create+0x120/0x120 [zpios]
[  726.461613]  [<ffffffff83ea2bcd>] ? kthread+0xcd/0xf0
[  726.461616]  [<ffffffff844118ef>] ? ret_from_fork+0x1f/0x40
[  726.461618]  [<ffffffff83ea2b00>] ? kthread_worker_fn+0x150/0x150

behlendorf

LGTM. There are still a few remaining style issues, but I can fix those in the merge. We haven't yet had a chance to clean up the SPL for cstyle, and without automated checking it's easy to get wrong.

Commit f58040c should have removed this comment which is no longer relevant.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Clemens Fruhwirth <clemens@endorphin.org>
Issue #589


The main complication from the RT patch set is that the RW semaphore locks change such that read locks on an rwsem can be taken only by a single thread. All other threads are locked out. This single thread can take a read lock multiple times though. The underlying implementation changes to a mutex with an additional read_depth count.

The implementation can be best understood by inspecting the RT patch. rwsem_rt.h and rt.c give the best insight into how RT rwsem works. My implementation for rwsem_tryupgrade is basically an inversion of rt_downgrade_write found in rt.c. Please see the comments in the code.

Unfortunately, I have to drop SPLAT rwlock test4 completely as this test tries to take multiple locks from different threads, which RT rwsems do not support. Otherwise SPLAT, zconfig.sh, zpios-sanity.sh and zfs-tests.sh pass on my Debian-testing VM with the kernel linux-image-4.8.0-1-rt-amd64.

Tested-by: kernelOfTruth <kerneloftruth@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Clemens Fruhwirth <clemens@endorphin.org>
Closes openzfs/zfs#5491
Closes openzfs#589
Closes openzfs#308

clefru mentioned this pull request Dec 11, 2016

Unable to compile on 3.10-3-rt-amd64 #308

Closed

behlendorf mentioned this pull request Dec 15, 2016

Dummy commit to test SPL #589 openzfs/zfs#5491

Closed

behlendorf suggested changes Dec 16, 2016


clefru force-pushed the rt-support branch 3 times, most recently from ee7fb7e to 610207d Compare December 17, 2016 16:15

Remove stale comment that should have been dropped with f58040c.

0636e96

clefru force-pushed the rt-support branch from 610207d to df9dfda Compare December 17, 2016 16:28

Add support for rw semaphore changes under PREEMPT_RT_FULL

3d96ef4

clefru force-pushed the rt-support branch from df9dfda to 3d96ef4 Compare December 17, 2016 16:29

behlendorf approved these changes Dec 19, 2016


behlendorf closed this in 8e99d66 Dec 19, 2016
