-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sem: change sem wait to atomic operation #14465
Conversation
Add sem_wait fast operations, use atomic to ensure atomicity of semcount operations, and do not depend on critical section. Test with robot: before modify: nxmutex_lock cost: 78 ns nxmutex_unlock cost: 82 ns after modify: nxmutex_lock cost: 28 ns nxmutex_unlock cost: 14 ns Signed-off-by: zhangyuan29 <zhangyuan29@xiaomi.com>
[Experimental Bot, please feedback here] The PR summary is somewhat informative, but the rest of the information is insufficient to determine if it meets NuttX requirements. Here's a breakdown: What's Good:
What's Missing:
Conclusion: The PR needs significant improvements to meet NuttX requirements. It needs to be more descriptive, particularly in the "Impact" section and provide specific evidence of testing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just some minor comments
if (sem->flags & SEM_TYPE_MUTEX) | ||
{ | ||
short old = 0; | ||
if (atomic_compare_exchange_weak_explicit(NXSEM_COUNT(sem), &old, 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let us wrapper inline function named nxsem_post_fast
https://github.com/torvalds/linux/blob/master/kernel/locking/mutex.c#L177-L182
if (sem->flags & SEM_TYPE_MUTEX) | ||
{ | ||
short old = 1; | ||
if (atomic_compare_exchange_weak_explicit(NXSEM_COUNT(sem), &old, 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
*/ | ||
|
||
#if !defined(CONFIG_PRIORITY_INHERITANCE) && !defined(CONFIG_PRIORITY_PROTECT) | ||
if (sem->flags & SEM_TYPE_MUTEX) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not move SEM_TYPE_MUTEX related logic inside mutex?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the atomic operation for count is implemented in the lib mutex, when mutex_lock is called, it may exit early due to fast_lock, which can result in the sem->count value not being updated. This, in turn, can cause mutex_unlock to fail.
{ | ||
sem->semcount = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this PR should be split to two, one is replace semaphore count type to atomic, another one is fast mutex
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If fast mutexes are implemented the implementation should be in user space (i.e. libc) for it to benefit memory protected builds as well. There taking the lock (in our case, semaphore) is a very expensive operation via syscall.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this patch only optimizes the mutex performance of the kernel, while the performance optimization of the userspace will be implemented in subsequent patches.
* Pre-processor Definitions | ||
****************************************************************************/ | ||
|
||
#define NXSEM_COUNT(s) ((FAR atomic_short *)&(s)->semcount) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should semcount be atomic_short to begin with ? What happens if the compiler / architecture cannot handle datatypes that are not of natural width atomically without locking ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In c code, the type of atomic_short is always short.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes but my point is, is short guaranteed to be atomic without locking ? atomic_short can be forwarded to stdatomic which can use locks to implement the read-modify-write. On some architectures only the natural datawidth can be handled atomically by the hardware.
So should we use u64 as semcount for 64-bit architectures, u32 for 32-bit architectures etc?
* else try to get it in slow mode. | ||
*/ | ||
|
||
#if !defined(CONFIG_PRIORITY_INHERITANCE) && !defined(CONFIG_PRIORITY_PROTECT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the fast path is usable only when priority inheritance is not used? For mutexes you could actually set the holder atomically without locking (by setting PID) although for semaphores this is not the case (semaphores can have several holders).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, fast path cannot used when priority inheritance enabled, because holder function need enter critical section
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For semaphores this is true, as semaphores can have multiple holders, but for mutexes you can set PID atomically without locking / critical section, as mutexes will only have 1 holder.
@pussuw should we merge this patch and optimize priority inheritance and semaphore later? |
Fine by me |
Hi @zyfeier , this PR broke Steps to reproduce
Output:
The crash's stack call is not useful. Can you please take a look? |
@tmedicci we don't have an ESP32 board. can we replicate it using QEMU? |
@tmedicci We cannot reproduce this issue with other boards and qemu which use xtensa arch, and we also do not have an ESP32 board. Could you provide the ELF file, or could you help debug using J-Link to gather more information? Thanks. |
Hi! Of course, check the backtrace using the J-Link:
It fails to run |
Regressions caused by signedness issues in "sem: change sem wait to atomic operation". (apache#14465) An alternative would be to make these atomic macros propagate signedness using the typeof() GCC/clang extension. I'm not inclined to do so because typeof is not so portable though. As we can unlikely require "real" C11 atomics in the foreseeable future, maybe we should use a different set of names from C11 to avoid confusions.
/* The following operations must be performed with interrupts | ||
* disabled because sem_post() may be called from an interrupt | ||
* handler. | ||
*/ | ||
|
||
flags = enter_critical_section(); | ||
|
||
sem_count = sem->semcount; | ||
sem_count = atomic_fetch_add(NXSEM_COUNT(sem), 1); | ||
|
||
/* Check the maximum allowable value */ | ||
|
||
if (sem_count >= SEM_VALUE_MAX) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the overflown value is already visible to other threads at this point, isn't it?
isn't it a problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i guess it's safer to use compare xchg.
Using JTAG to single step through the
At this point, the register values are as expection:
The sem value is But the problem might be that the iram is not compatiable with It this is the case, one solution might be:
|
@tmedicci Found this statement in "Xtensa ® LX7 MicroprocessorData Book"
|
depending on how the instruction is actually used, it might or might not easy to |
Summary: - In apache#14465, atomic_compare_exchange_weak_explicit() was newly introduced in semaphore. Howerver, cxd56xx has an issue with the API if SMP is enabled (see up_testset2 in cxd56_testset.c). - This commit fixes the issue by using LIBC_ARCH_ATOMIC. Impact: - Only cxd56xx SoCs in SMP mode. Testing: - Tested with spresense:smp, spresense:wifi_smp - NOTE: If DEBUG_ASSERTIONS is enabled assert would be happend. I think this might be another issue. Signed-off-by: Masayuki Ishikawa <Masayuki.Ishikawa@jp.sony.com>
Summary: - In apache#14465, atomic_compare_exchange_weak_explicit() was newly introduced in semaphore. However, cxd56xx has an issue with the API if SMP is enabled (see up_testset2 in cxd56_testset.c). - This commit fixes the issue by using LIBC_ARCH_ATOMIC. Impact: - Only cxd56xx SoCs in SMP mode. Testing: - Tested with spresense:smp, spresense:wifi_smp - NOTE: If DEBUG_ASSERTIONS is enabled assert would be happend. I think this might be another issue. Signed-off-by: Masayuki Ishikawa <Masayuki.Ishikawa@jp.sony.com>
Summary: - In apache#14465, atomic_compare_exchange_weak_explicit() was newly introduced in semaphore. However, cxd56xx has an issue with the API if SMP is enabled (see up_testset2 in cxd56_testset.c). - This commit fixes the issue by using LIBC_ARCH_ATOMIC. Impact: - Only cxd56xx SoCs in SMP mode. Testing: - Tested with spresense:smp, spresense:wifi_smp - NOTE: If DEBUG_ASSERTIONS is enabled assert would be happend. I think this might be another issue. Signed-off-by: Masayuki Ishikawa <Masayuki.Ishikawa@jp.sony.com>
Summary: - In #14465, atomic_compare_exchange_weak_explicit() was newly introduced in semaphore. However, cxd56xx has an issue with the API if SMP is enabled (see up_testset2 in cxd56_testset.c). - This commit fixes the issue by using LIBC_ARCH_ATOMIC. Impact: - Only cxd56xx SoCs in SMP mode. Testing: - Tested with spresense:smp, spresense:wifi_smp - NOTE: If DEBUG_ASSERTIONS is enabled assert would be happend. I think this might be another issue. Signed-off-by: Masayuki Ishikawa <Masayuki.Ishikawa@jp.sony.com>
Summary
Add sem_wait fast operations, use atomic to ensure atomicity of semcount operations, and do not depend on critical section.
Test with robot board:
before modify:
nxmutex_lock cost: 78 ns
nxmutex_unlock cost: 82 ns
after modify:
nxmutex_lock cost: 28 ns
nxmutex_unlock cost: 14 ns
Impact
semaphore
Testing
bes board with monkey test pass
sabre-6quad:smp with ostest pass