This repository has been archived by the owner on Feb 26, 2020. It is now read-only.

Add support for rw semaphore changes under PREEMPT_RT_FULL #589

Closed
wants to merge 2 commits into from

clefru
Contributor

@clefru clefru commented Dec 11, 2016

The main complication from the RT patch set is that it changes the semantics of RW semaphores: a read lock on an rwsem can be held by only a single thread, and all other threads are locked out. That single thread can, however, take the read lock multiple times. Under the hood, the rwsem becomes a mutex with an additional read_depth count.
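To make these semantics concrete, here is a minimal userland model, not kernel code. The hypothetical `struct rt_rwsem_model` uses an `owner` field to stand in for `rt_mutex_owner(&rwsem->lock)` and a `read_depth` field mirroring the RT counter; thread ids are plain integers, and blocking is replaced by a trylock that simply fails.

```c
#include <assert.h>

#define	NO_OWNER	(-1)

/*
 * Hypothetical userland model of a PREEMPT_RT_FULL rwsem: a single
 * owner (standing in for rt_mutex_owner(&rwsem->lock)) plus a
 * read_depth nesting counter.
 */
struct rt_rwsem_model {
	int owner;		/* thread id holding the mutex, or NO_OWNER */
	int read_depth;		/* number of nested read acquisitions */
};

static void
model_init(struct rt_rwsem_model *sem)
{
	sem->owner = NO_OWNER;
	sem->read_depth = 0;
}

/*
 * Recursive reads succeed for the thread that already owns the mutex;
 * any other thread must take the mutex first and is locked out while
 * someone else holds it.
 */
static int
model_down_read_trylock(struct rt_rwsem_model *sem, int tid)
{
	if (sem->owner != tid) {
		if (sem->owner != NO_OWNER)
			return (0);
		sem->owner = tid;
	}
	sem->read_depth++;
	return (1);
}

/* Dropping the last nested read lock releases the mutex. */
static void
model_up_read(struct rt_rwsem_model *sem, int tid)
{
	assert(sem->owner == tid && sem->read_depth > 0);
	if (--sem->read_depth == 0)
		sem->owner = NO_OWNER;
}
```

In this model thread 1 can nest two read locks (read_depth becomes 2), while a read attempt by thread 2 fails until thread 1 has released both.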

The implementation is best understood by inspecting the RT patch; rwsem_rt.h and rt.c give the most insight into how RT rwsems work. My implementation of rwsem_tryupgrade is essentially the inverse of rt_downgrade_write in rt.c. Please see the comments in the code.
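The inversion can be sketched on a hypothetical userland model (not the RT patch's code): downgrading a write lock sets read_depth to 1, and a try-upgrade goes the other way, succeeding only when the caller owns the lock and holds at most one read level. Returning 1 on success and 0 on refusal is an assumption of this sketch, not necessarily the SPL's return convention.

```c
#include <assert.h>

/* Hypothetical model: owner stands in for rt_mutex_owner(&rwsem->lock). */
struct rt_rwsem_model {
	int owner;
	int read_depth;		/* 0 while write-held, >= 1 while read-held */
};

/* Mirrors the shape of rt_downgrade_write: the writer becomes one reader. */
static void
model_downgrade_write(struct rt_rwsem_model *sem, int tid)
{
	assert(sem->owner == tid);	/* only the owner may downgrade */
	sem->read_depth = 1;
}

/* The inverse: a single, non-nested reader becomes the writer. */
static int
model_tryupgrade(struct rt_rwsem_model *sem, int tid)
{
	assert(sem->owner == tid);	/* only the owner may upgrade */
	if (sem->read_depth <= 1) {
		sem->read_depth = 0;	/* write-held means read_depth == 0 */
		return (1);
	}
	return (0);			/* nested read locks: refuse */
}
```

A thread that has taken the read lock twice (read_depth == 2) is refused, which is the same-thread case the proposed test targets.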

Unfortunately, I have to drop SPLAT rwlock test4 completely as this test tries to take multiple locks from different threads, which RT rwsems do not support. Otherwise SPLAT, zconfig.sh, zpios-sanity.sh and zfs-tests.sh pass on my Debian-testing VM with linux-image-4.8.0-1-rt-amd64.

Assuming this PR is heading in the right direction, I'll add a test that executes "rw_enter(rwp, RW_READER); rw_enter(rwp, RW_READER); ASSERT(rwsem_tryupgrade(rwp) == -EBUSY);" in a single thread.

@kernelOfTruth

Testing right now with 4.8.14-rt9, runs fine so far, thanks 👍

Contributor

@behlendorf behlendorf left a comment

The key idea here looks good to me. I particularly like that this is a low-risk change, in the sense that all non-RT kernels will be entirely unaffected.

Just some style issues and a question about the ASSERT.

#ifdef CONFIG_RWSEM_GENERIC_SPINLOCK
#if defined(CONFIG_PREEMPT_RT_FULL)
#define SPL_RWSEM_SINGLE_READER_VALUE (1)
#define SPL_RWSEM_SINGLE_WRITER_VALUE (0)
Contributor

[cstyle] #define followed by space instead of tab

@@ -36,7 +39,9 @@
#endif

/* Linux 3.16 changed activity to count for rwsem-spinlock */
#if defined(HAVE_RWSEM_ACTIVITY)
#if defined(CONFIG_PREEMPT_RT_FULL)
#define RWSEM_COUNT(sem) sem->read_depth
Contributor

[cstyle] #define followed by space instead of tab
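For reference, a cstyle-conformant form of these definitions puts a tab after `#define`. The sketch below also parenthesizes the macro argument, which the quoted `RWSEM_COUNT` does not; `struct rwsem_model` is a hypothetical stand-in for the RT `rw_semaphore`, while the two RT values are the ones from the diff.

```c
#include <assert.h>

/* Hypothetical stand-in for the RT rw_semaphore's relevant field. */
struct rwsem_model {
	int read_depth;
};

/* Under PREEMPT_RT_FULL, one reader is read_depth 1 and a writer is 0. */
#define	SPL_RWSEM_SINGLE_READER_VALUE	(1)
#define	SPL_RWSEM_SINGLE_WRITER_VALUE	(0)

/*
 * Parenthesizing the argument lets callers pass an expression such as
 * "sems + 1"; the unparenthesized sem->read_depth would expand to
 * "sems + 1->read_depth" and fail to compile.
 */
#define	RWSEM_COUNT(sem)	((sem)->read_depth)
```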

* lock is held. On other platforms the lock is never released during
* the upgrade process. This is necessary under Linux because the kernel
* does not provide an upgrade function.
*/
Contributor

We really should have dropped this comment in f58040c. I'm OK with removing it in this patch; just make sure to update the commit message to mention the above commit, where it should have been dropped.

Contributor Author

I've split the commit in two.

// the second attempt. Therefore the implementation allows a
// single thread to take a rwsem as read lock multiple times
// tracking that nesting as read_depth counter.
if(rwsem->read_depth <= 1) {
Contributor

[cstyle] Missing space after if. Should be if (rwsem...).

// read lock twice, as the mutex would already be locked on
// the second attempt. Therefore the implementation allows a
// single thread to take a rwsem as read lock multiple times
// tracking that nesting as read_depth counter.
Contributor

[cstyle] This should be a C89 style block comment.

/*
 * Under the realtime patch series...
 * ...
 * tracking that nesting as read_depth counter.
 */

if(rwsem->read_depth <= 1) {
// In case, the current thread has not taken the lock more
// than once as read lock, we can allow an upgrade to a write
// lock. rwsem_rt.h implements write locks as read_depth == 0.
Contributor

[cstyle] C89 comments should be used here and elsewhere.

static int
__rwsem_tryupgrade(struct rw_semaphore *rwsem)
{
ASSERT(rt_mutex_owner(&rwsem->lock) != current);
Contributor

@behlendorf behlendorf Dec 16, 2016

Don't you mean ==? The lock can only be upgraded when you're holding it as a reader, which would be exclusive in this case.

ASSERT(rt_mutex_owner(&rwsem->lock) == current);

Contributor Author

Yes, this is absolutely inverted. Fixed. The owner must be the current thread; otherwise it's not legal to call the upgrade method (the same condition as in the downgrade code).

Contributor

@ironMann ironMann Dec 17, 2016

@clefru this puzzled me too, initially. But wasn't the first statement right? If you are trying to upgrade, you are only a reader, hence not an owner.

Contributor Author

I just reinspected 0124-rt-Add*.patch linked above, and I am pretty sure the ASSERT is correct as is. Search for rt_mutex_owner in that file and you will see a lot of comments explaining the semantics of that field, in particular that rt_mutex_owner(lock) == current when current owns the lock.

I also found out why my code was wrong. My first try used the same guard as the rt_downgrade_write method, BUG_ON(rt_mutex_owner(&rwsem->lock) != current), but BUG_ON isn't available in the SPL, so I changed it to ASSERT(...), forgetting to invert the condition.
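The pitfall is easy to reproduce with userland stand-ins. `MODEL_BUG_ON` and `MODEL_ASSERT` below are hypothetical macros that only set a flag instead of crashing: BUG_ON fires when its condition is true, ASSERT fires when its condition is false, so `BUG_ON(owner != current)` must become `ASSERT(owner == current)`. Plain integers stand in for the owner and current task.

```c
#include <assert.h>
#include <stdbool.h>

static bool fired;

/* Hypothetical stand-ins: record a failure instead of crashing. */
#define	MODEL_BUG_ON(cond)	do { if (cond) fired = true; } while (0)
#define	MODEL_ASSERT(cond)	do { if (!(cond)) fired = true; } while (0)

/* Guard in the style of rt_downgrade_write. */
static bool
bug_on_guard(int owner, int cur)
{
	fired = false;
	MODEL_BUG_ON(owner != cur);
	return (fired);
}

/* Correct translation: the condition is inverted. */
static bool
assert_guard(int owner, int cur)
{
	fired = false;
	MODEL_ASSERT(owner == cur);
	return (fired);
}

/* The original mistake: condition copied without inversion. */
static bool
copied_assert_guard(int owner, int cur)
{
	fired = false;
	MODEL_ASSERT(owner != cur);
	return (fired);
}
```

The first two guards agree in every case, while the copied one fires exactly when the caller legitimately owns the lock.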

Contributor

This is somewhat confusing because with the RT kernel there really is no such thing as a read lock. They're all exclusive mutexes and thus have owners even when only held for read. Offhand I can't think of any place in the ZFS code where this change in semantics will result in a problem, but it's something we should keep in mind.

@clefru thanks for updating this, the BUG_ON -> ASSERT inversion makes perfect sense.

@@ -207,6 +207,10 @@ splat_rwlock_test1(struct file *file, void *arg)
rw_thr_t rwt[SPLAT_RWLOCK_TEST_COUNT];
rw_priv_t *rwp;

#if defined(CONFIG_PREEMPT_RT_FULL)
// This test will never succeed on PREEMPT_RT_FULL because locks can only be held by a single thread.
Contributor

[cstyle] 80 character limit

#if defined(CONFIG_PREEMPT_RT_FULL)
// This test will never succeed on PREEMPT_RT_FULL because locks can only be held by a single thread.
return 0;
#endif
Contributor

I'd suggest turning this into an #else clause for readability.
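A sketch of the suggested restructuring, with the comment wrapped to stay under 80 columns. `splat_rwlock_test_sketch`, `run_multi_reader_test` and the `ran_body` flag are hypothetical placeholders for the real SPLAT test body; in a userland build CONFIG_PREEMPT_RT_FULL is not defined, so the #else branch runs.

```c
#include <assert.h>

static int ran_body;	/* hypothetical: records that the body executed */

/* Hypothetical placeholder for the multi-threaded test body. */
static int
run_multi_reader_test(void)
{
	ran_body = 1;
	return (0);
}

static int
splat_rwlock_test_sketch(void)
{
#if defined(CONFIG_PREEMPT_RT_FULL)
	/*
	 * This test will never succeed on PREEMPT_RT_FULL because an
	 * rwsem can only be held by a single thread at a time.
	 */
	return (0);
#else
	return (run_multi_reader_test());
#endif
}
```

With the #else form, each configuration's path reads top to bottom without an early return followed by dead code.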

@clefru clefru force-pushed the rt-support branch 3 times, most recently from ee7fb7e to 610207d December 17, 2016 16:15
@clefru
Contributor Author

clefru commented Dec 17, 2016

I haven't found a way to run cstyle quickly, so I just inspected the patch visually. Pardon me if I missed something. Off-topic: is there a good way to get Emacs to follow the SPL/ZFS style?

@clefru
Contributor Author

clefru commented Dec 17, 2016

FYI, while retesting the patch with zpios-sanity.sh I hit an occasional deadlock. I haven't looked into that yet.

[  726.461501] INFO: task zpios_io/0:3844 blocked for more than 120 seconds.
[  726.461525]       Tainted: P           OE   4.8.0-1-rt-amd64 #1
[  726.461558] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  726.461583] zpios_io/0      D ffff97f3bfce9100     0  3844      2 0x00000000
[  726.461587]  ffff97f3a6a1af40 ffff97f3ba7d8000 ffffffff83ea94b5 ffff97f3a62dc000
[  726.461591]  ffff97f3a6a1af40 ffff97f3ad48daa8 ffff97f3ad48dad0 ffff97f3a1ea6000
[  726.461594]  ffff97f3ad48da00 ffffffff8440db23 ffff97f3a6a1af40 ffff97f3a62dbec8
[  726.461598] Call Trace:
[  726.461601]  [<ffffffff83ea94b5>] ? preempt_count_add+0x5/0xa0
[  726.461604]  [<ffffffff8440db23>] ? schedule+0x43/0xd0
[  726.461606]  [<ffffffffc09b0df3>] ? zpios_thread_main+0x533/0xa20 [zpios]
[  726.461610]  [<ffffffffc09b08c0>] ? zpios_dmu_object_create+0x120/0x120 [zpios]
[  726.461613]  [<ffffffff83ea2bcd>] ? kthread+0xcd/0xf0
[  726.461616]  [<ffffffff844118ef>] ? ret_from_fork+0x1f/0x40
[  726.461618]  [<ffffffff83ea2b00>] ? kthread_worker_fn+0x150/0x150

Contributor

@behlendorf behlendorf left a comment

LGTM. There are still a few remaining style issues, but I can fix those in the merge. We haven't yet had a chance to clean up the SPL for cstyle, and without automated checking it's easy to get wrong.

behlendorf pushed a commit that referenced this pull request Dec 19, 2016
Commit f58040c should have removed
this comment which is no longer relevant.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Clemens Fruhwirth <clemens@endorphin.org>
Issue #589
kernelOfTruth pushed a commit to kernelOfTruth/spl that referenced this pull request Dec 28, 2016
kernelOfTruth pushed a commit to kernelOfTruth/spl that referenced this pull request Dec 28, 2016
The main complication from the RT patch set is that the RW semaphore
locks change such that read locks on an rwsem can be taken only by
a single thread.  All other threads are locked out. This single
thread can take a read lock multiple times though. The underlying
implementation changes to a mutex with an additional read_depth
count.

The implementation can be best understood by inspecting the RT
patch.  rwsem_rt.h and rt.c give the best insight into how RT
rwsem works. My implementation for rwsem_tryupgrade is basically
an inversion of rt_downgrade_write found in rt.c. Please see the
comments in the code.

Unfortunately, I have to drop SPLAT rwlock test4 completely as this
test tries to take multiple locks from different threads, which RT
rwsems do not support.  Otherwise SPLAT, zconfig.sh, zpios-sanity.sh
and zfs-tests.sh pass on my Debian-testing VM with the kernel
linux-image-4.8.0-1-rt-amd64.

Tested-by: kernelOfTruth <kerneloftruth@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Clemens Fruhwirth <clemens@endorphin.org>
Closes openzfs/zfs#5491
Closes openzfs#589
Closes openzfs#308