sem: change sem wait to atomic operation #14465

zyfeier · 2024-10-23T01:54:29Z

Summary

Add a fast path to sem_wait: use atomic operations to guarantee the atomicity of semcount updates, so that the uncontended case no longer depends on a critical section.

Tested on a robot board:
Before the change:
nxmutex_lock cost: 78 ns
nxmutex_unlock cost: 82 ns

After the change:
nxmutex_lock cost: 28 ns
nxmutex_unlock cost: 14 ns

Impact

semaphore

Testing

bes board with monkey test pass
sabre-6quad:smp with ostest pass

Add sem_wait fast operations, use atomic to ensure atomicity of semcount operations, and do not depend on critical section.

Test with robot:
before modify:
nxmutex_lock cost: 78 ns
nxmutex_unlock cost: 82 ns

after modify:
nxmutex_lock cost: 28 ns
nxmutex_unlock cost: 14 ns

Signed-off-by: zhangyuan29 <zhangyuan29@xiaomi.com>

nuttxpr · 2024-10-23T01:55:39Z

[Experimental Bot, please feedback here]

The PR summary is somewhat informative, but the rest of the information is insufficient to determine if it meets NuttX requirements. Here's a breakdown:

What's Good:

Summary: Clearly states the technical changes made. The performance improvement mentioned is a good addition.
Testing: Mentions specific boards used for testing.

What's Missing:

Summary:
- Why? No explanation of the problem the PR solves. Why were the original semaphore operations slow? Why is atomicity important in this context?
- Issue reference: Is there a related NuttX issue?
Impact:
- User Impact: While the summary mentions semaphore changes, it doesn't explain if users need to adapt their code. Will existing semaphore usage still work correctly?
- Other Impacts: The PR lacks details about potential impacts on build systems, hardware, documentation, security, and compatibility.
Testing:
- Incomplete Information: The PR needs to specify the host operating system, CPU architecture, and compiler used for testing.
- Vague Descriptions: "bes board with monkey test pass" and "sabre-6quad:smp with ostest pass" are too vague. Provide specific details about the tests performed and what "pass" means in each context.
- Missing Logs: The PR should include actual testing logs from before and after the changes, as the template requests.

Conclusion:

The PR needs significant improvements to meet NuttX requirements. It needs to be more descriptive, particularly in the "Impact" section and provide specific evidence of testing.

anchao

LGTM, just some minor comments

anchao · 2024-10-23T04:03:37Z

sched/semaphore/sem_post.c

+  if (sem->flags & SEM_TYPE_MUTEX)
+    {
+      short old = 0;
+      if (atomic_compare_exchange_weak_explicit(NXSEM_COUNT(sem), &old, 1,


Let's wrap this in an inline function named nxsem_post_fast. For reference:

https://github.com/torvalds/linux/blob/master/kernel/locking/mutex.c#L177-L182
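The suggestion above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from the PR: `struct demo_sem`, `demo_sem_wait_fast` and `demo_sem_post_fast` are hypothetical names, and the sketch uses the strong compare-exchange so that its behaviour is deterministic, whereas the diff uses the weak variant (where a spurious failure would simply fall through to the slow path).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a mutex-type semaphore; not the real sem_t. */

struct demo_sem
{
  atomic_short semcount;   /* 1 = unlocked, 0 = locked, no waiters */
};

/* Fast path for wait: succeeds only for the uncontended 1 -> 0 transition.
 * On failure the caller is expected to take the slow (locked) path.
 */

static inline bool demo_sem_wait_fast(struct demo_sem *sem)
{
  short old = 1;

  assert(sem != NULL);
  return atomic_compare_exchange_strong_explicit(&sem->semcount, &old, 0,
                                                 memory_order_acquire,
                                                 memory_order_relaxed);
}

/* Fast path for post: succeeds only for 0 -> 1, i.e. when nobody waits. */

static inline bool demo_sem_post_fast(struct demo_sem *sem)
{
  short old = 0;

  assert(sem != NULL);
  return atomic_compare_exchange_strong_explicit(&sem->semcount, &old, 1,
                                                 memory_order_release,
                                                 memory_order_relaxed);
}
```

With helpers like these, the call sites in sem_wait.c and sem_post.c would reduce to a single `if` that returns early when the fast path succeeds.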

anchao · 2024-10-23T04:03:56Z

sched/semaphore/sem_wait.c

+  if (sem->flags & SEM_TYPE_MUTEX)
+    {
+      short old = 1;
+      if (atomic_compare_exchange_weak_explicit(NXSEM_COUNT(sem), &old, 0,


anchao · 2024-10-23T04:10:49Z

sched/semaphore/sem_post.c

+   */
+
+#if !defined(CONFIG_PRIORITY_INHERITANCE) && !defined(CONFIG_PRIORITY_PROTECT)
+  if (sem->flags & SEM_TYPE_MUTEX)


Why not move the SEM_TYPE_MUTEX-related logic into the mutex implementation?

If the atomic operation for count is implemented in the lib mutex, when mutex_lock is called, it may exit early due to fast_lock, which can result in the sem->count value not being updated. This, in turn, can cause mutex_unlock to fail.

anchao · 2024-10-23T04:12:37Z

sched/semaphore/sem_destroy.c

    {
-      sem->semcount = 1;


I think this PR should be split into two: one that changes the semaphore count type to atomic, and another that adds the fast mutex path.

If fast mutexes are implemented, the implementation should be in user space (i.e. libc) so that memory-protected builds benefit as well. There, taking the lock (in our case, a semaphore) goes through a syscall, which is a very expensive operation.

Yes, this patch only optimizes the mutex performance of the kernel, while the performance optimization of the userspace will be implemented in subsequent patches.

pussuw · 2024-10-23T08:20:43Z

sched/semaphore/semaphore.h

+ * Pre-processor Definitions
+ ****************************************************************************/
+
+#define NXSEM_COUNT(s) ((FAR atomic_short *)&(s)->semcount)


Should semcount be atomic_short to begin with? What happens if the compiler / architecture cannot handle datatypes that are not of natural width atomically without locking?

In C code, the type of atomic_short is always short.

Yes, but my point is: is short guaranteed to be atomic without locking? atomic_short can be forwarded to stdatomic, which can use locks to implement the read-modify-write. On some architectures only the natural data width can be handled atomically by the hardware.

So should we use u64 as semcount for 64-bit architectures, u32 for 32-bit architectures etc?
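One way to make this question concrete is to ask the toolchain. The sketch below is an assumption-laden illustration, not part of the PR: it uses the standard C11 macro `ATOMIC_SHORT_LOCK_FREE` (0 = never lock-free, 1 = sometimes, 2 = always) to pick a counter type, and `demo_semcount_t` and `demo_short_lock_free_level` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>

/* Pick the counter type at compile time: use the 16-bit atomic only when
 * the toolchain promises it is always lock-free, otherwise fall back to
 * the natural-width atomic_int.
 */

#if ATOMIC_SHORT_LOCK_FREE == 2
typedef atomic_short demo_semcount_t;
#else
typedef atomic_int demo_semcount_t;
#endif

static_assert(sizeof(demo_semcount_t) >= sizeof(short),
              "counter must hold at least a short");

/* Report what the toolchain claims about 16-bit atomics:
 * 0 = never lock-free, 1 = sometimes, 2 = always.
 */

static inline int demo_short_lock_free_level(void)
{
  return ATOMIC_SHORT_LOCK_FREE;
}
```

Note that this only answers the compiler-level question. It says nothing about memory regions in which the underlying atomic instruction is not permitted.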

pussuw · 2024-10-23T08:30:00Z

sched/semaphore/sem_wait.c

+   * else try to get it in slow mode.
+   */
+
+#if !defined(CONFIG_PRIORITY_INHERITANCE) && !defined(CONFIG_PRIORITY_PROTECT)


So the fast path is usable only when priority inheritance is not used? For mutexes you could actually set the holder atomically without locking (by setting PID) although for semaphores this is not the case (semaphores can have several holders).

Yes, the fast path cannot be used when priority inheritance is enabled, because the holder functions need to enter a critical section.

For semaphores this is true, as semaphores can have multiple holders, but for mutexes you can set PID atomically without locking / critical section, as mutexes will only have 1 holder.
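The single-holder idea can be sketched as below. This is only an illustration of recording one holder with a compare-exchange; it does not implement priority boosting or waiter handling, and `struct demo_mutex`, `DEMO_NO_HOLDER`, `demo_mutex_trylock` and `demo_mutex_unlock` are hypothetical names rather than NuttX APIs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NO_HOLDER (-1)   /* Assumed sentinel: no task holds the mutex */

/* Hypothetical mutex that records its single holder in one atomic word. */

struct demo_mutex
{
  atomic_int holder;          /* PID of the holder, or DEMO_NO_HOLDER */
};

/* Take the mutex and record the holder in one atomic step. */

static inline bool demo_mutex_trylock(struct demo_mutex *m, int pid)
{
  int expected = DEMO_NO_HOLDER;

  assert(m != NULL && pid != DEMO_NO_HOLDER);
  return atomic_compare_exchange_strong_explicit(&m->holder, &expected, pid,
                                                 memory_order_acquire,
                                                 memory_order_relaxed);
}

/* Release the mutex; fails if the caller is not the recorded holder. */

static inline bool demo_mutex_unlock(struct demo_mutex *m, int pid)
{
  int expected = pid;

  assert(m != NULL);
  return atomic_compare_exchange_strong_explicit(&m->holder, &expected,
                                                 DEMO_NO_HOLDER,
                                                 memory_order_release,
                                                 memory_order_relaxed);
}
```

A counting semaphore cannot use this trick directly, because a single word cannot name several holders at once.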

xiaoxiang781216 · 2024-10-29T15:42:48Z

@pussuw should we merge this patch and optimize priority inheritance and semaphore later?

pussuw · 2024-10-29T16:49:47Z

@pussuw should we merge this patch and optimize priority inheritance and semaphore later?

Fine by me

tmedicci · 2024-10-31T20:58:10Z

Hi @zyfeier, this PR broke esp32-devkitc:sotest. The device can't boot anymore:

Steps to reproduce

make -j distclean && ./tools/configure.sh esp32-devkitc:sotest &&  make -j bootloader &&  make flash EXTRAFLAGS="-Wno-cpp -Werror" ESPTOOL_PORT=/dev/ttyUSB0 ESPTOOL_BINDIR=./ -s -j$(nproc) && minicom -D /dev/ttyUSB0

Output:

rst:0x1 (POWERON_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3ffb20a0,len:3604
load:0x40080000,len:24576
entry 0x400827ac
*** Booting NuttX ***
I (27) boot: chip revision: v3.0
I (28) boot.esp32: SPI Speed      : 40MHz
I (28) boot.esp32: SPI Mode       : DIO
I (29) boot.esp32: SPI Flash Size : 8MB
I (33) boot: Enabling RNG early entropy source...
dram: lma 0x00001020 vma 0x3ffb20a0 len 0xe14    (3604)
iram: lma 0x00001e3c vma 0x40080000 len 0x6000   (24576)
padd: lma 0x00007e48 vma 0x00000000 len 0x81b0   (33200)
imap: lma 0x00010000 vma 0x400e0000 len 0x17804  (96260)
padd: lma 0x0002780c vma 0x00000000 len 0x880c   (34828)
dmap: lma 0x00030020 vma 0x3f400020 len 0x7f20   (32544)
total segments stored 6
A__esp32_start: ESP32 chip revision is v3.0
Bxtensa_user_panic: User Exception: EXCCAUSE=0003 task: Idle_Task
dump_assert_info: Current Version: NuttX  10.4.0 befe29801f Oct 31 2024 17:56:57 xtensa
dump_assert_info: Assertion failed user panic: at file: common/xtensa_assert.c:180 task: Idle_Task process: Kernel 0x400e15d8
up_dump_register:    PC: 400f0a59    PS: 00060d30
up_dump_register:    A0: 800e392d    A1: 3ffb0af0    A2: 40086000    A3: 00040000
up_dump_register:    A4: 00040000    A5: 40086000    A6: 00040001    A7: 00040001
up_dump_register:    A8: 00000000    A9: ffff0000   A10: 00000001   A11: 00000000
up_dump_register:   A12: 3ffb0cc4   A13: 00000000   A14: 00000000   A15: 00000001
up_dump_register:   SAR: 00000020 CAUSE: 00000003 VADDR: 40086000
up_dump_register:  LBEG: 400e4f50  LEND: 400e4f5d  LCNT: 00000000
dump_stacks: ERROR: Stack pointer is not within the stack
dump_stackinfo: User Stack:
dump_stackinfo:   base: 0
dump_stackinfo:   size: 00000000
stack_dump: 0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
dump_tasks:    PID GROUP PRI POLICY   TYPE    NPX STATE   EVENT      SIGMASK          STACKBASE  STACKSIZE   COMMAND

The crash's call stack is not useful. Can you please take a look?

zyfeier · 2024-11-01T04:41:13Z

@tmedicci We don't have an ESP32 board. Can we reproduce it using QEMU?

zyfeier · 2024-11-01T08:06:39Z

@tmedicci We cannot reproduce this issue on other boards or in QEMU using the Xtensa architecture, and we do not have an ESP32 board. Could you provide the ELF file, or could you help debug using J-Link to gather more information? Thanks.

tmedicci · 2024-11-01T15:40:52Z

Hi! Of course, check the backtrace using the J-Link:

#0  nxsem_wait (sem=0x40086140) at semaphore/sem_wait.c:263
#1  0x400e3e0a in nxmutex_lock (mutex=0x40086140) at misc/lib_mutex.c:253
#2  0x400e54b4 in mm_lock (heap=0x40086140) at mm_heap/mm_lock.c:97
#3  0x400e5330 in mm_addregion (heap=0x40086140, heapstart=0x400862b8, heapsize=105800)
    at mm_heap/mm_initialize.c:112
#4  0x400e5471 in mm_initialize (name=<optimized out>, heapstart=0x400862b8, heapsize=105800)
    at mm_heap/mm_initialize.c:287
#5  0x400e64b9 in esp32_iramheap_initialize () at chip/esp32_iramheap.c:62
#6  0x400e64a1 in up_extraheaps_init () at chip/esp32_extraheaps.c:66
#7  0x400e1b6a in nx_start () at init/nx_start.c:603
#8  0x4008291a in __esp32_start () at chip/esp32_start.c:293
#9  __start () at chip/esp32_start.c:358

It fails while running atomic_compare_exchange_weak_explicit when adding the IRAM heap. Note that this heap region is accessed through the instruction bus, and any non-word-aligned access triggers an exception. When the exception triggers, it is handled here and execution is supposed to resume from the point where it was first triggered. This function somehow breaks that mechanism.

zyfeier · 2024-11-04T07:15:53Z

@tmedicci Could you please help test if this patch #14625 can fix the issue? Thanks.

Regressions caused by signedness issues in "sem: change sem wait to atomic operation". (apache#14465) An alternative would be to make these atomic macros propagate signedness using the typeof() GCC/clang extension. I'm not inclined to do so, though, because typeof is not very portable. As we are unlikely to be able to require "real" C11 atomics in the foreseeable future, maybe we should use a set of names different from C11 to avoid confusion.

yamt · 2024-11-13T06:06:29Z

sched/semaphore/sem_post.c

  /* The following operations must be performed with interrupts
   * disabled because sem_post() may be called from an interrupt
   * handler.
   */

  flags = enter_critical_section();

-  sem_count = sem->semcount;
+  sem_count = atomic_fetch_add(NXSEM_COUNT(sem), 1);

  /* Check the maximum allowable value */

  if (sem_count >= SEM_VALUE_MAX)
    {


The overflowed value is already visible to other threads at this point, isn't it?
Isn't that a problem?

I guess it's safer to use compare-exchange.
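A compare-exchange loop along the lines suggested here could look like the sketch below. It checks the limit before publishing the new value, so other threads never observe an out-of-range count. `DEMO_SEM_VALUE_MAX` and `demo_sem_count_inc` are hypothetical names, and the limit value is an assumption made for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_SEM_VALUE_MAX 0x7fff   /* Assumed limit for this sketch */

/* Increment the count only if it stays within the limit.  Unlike an
 * unconditional atomic_fetch_add(), nothing is stored when the limit
 * would be exceeded.
 */

static inline bool demo_sem_count_inc(atomic_short *count)
{
  short old;

  assert(count != NULL);
  old = atomic_load_explicit(count, memory_order_relaxed);

  do
    {
      if (old >= DEMO_SEM_VALUE_MAX)
        {
          return false;             /* Would overflow: store nothing */
        }
    }
  while (!atomic_compare_exchange_weak_explicit(count, &old,
                                                (short)(old + 1),
                                                memory_order_release,
                                                memory_order_relaxed));

  return true;
}
```

On a failed compare-exchange, `old` is refreshed with the current value, so the limit check is repeated against up-to-date data on every iteration.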

chirping78 · 2024-11-14T13:48:50Z

Single-stepping through the nxsem_wait function over JTAG shows that this is not an alignment issue; the s32c1i instruction itself causes the exception.

   0x400f0468 <+196>:   or      a14, a13, a9
   0x400f046b <+199>:   or      a8, a9, a9
   0x400f046e <+202>:   wsr.scompare1   a14
=> 0x400f0471 <+205>:   s32c1i  a8, a12, 0
   0x400f0474 <+208>:   beq     a8, a14, 0x400f0480 <nxsem_wait+220>
   0x400f0477 <+211>:   or      a14, a9, a9
   0x400f047a <+214>:   and     a9, a8, a11
   0x400f047d <+217>:   bne     a14, a9, 0x400f0468 <nxsem_wait+196>

At this point, the register values are as expected:

(gdb) p/x $a14
$3 = 0x40001
(gdb) p/x $a12
$4 = 0x40085fd0
(gdb) p/x $a8
$5 = 0x40000

The sem value is 1, and here the code wants to write 0 to it.
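The register values are consistent with the 16-bit compare-exchange being lowered to a 32-bit compare-and-swap on the containing word (0x40001 expected, 0x40000 written). A rough C sketch of that kind of lowering is shown below. It assumes the 16-bit value occupies the low half of the word, and `demo_cas16_low` is a hypothetical helper, not what the compiler or NuttX actually emits.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Compare-and-swap the low 16 bits of a 32-bit word while preserving the
 * upper 16 bits.  On mismatch, *expected is updated with the value seen.
 */

static inline bool demo_cas16_low(_Atomic uint32_t *word,
                                  uint16_t *expected, uint16_t desired)
{
  uint32_t cur;
  uint32_t next;

  assert(word != NULL && expected != NULL);
  cur = atomic_load_explicit(word, memory_order_relaxed);

  for (; ; )
    {
      if ((uint16_t)(cur & 0xffffu) != *expected)
        {
          *expected = (uint16_t)(cur & 0xffffu);
          return false;
        }

      /* Keep the neighbouring half-word, replace only the low half */

      next = (cur & 0xffff0000u) | desired;
      if (atomic_compare_exchange_weak_explicit(word, &cur, next,
                                                memory_order_acq_rel,
                                                memory_order_relaxed))
        {
          return true;
        }

      /* cur now holds the latest word value; re-check and retry */
    }
}
```

Whatever the exact lowering, the word-sized compare-and-swap is the step that maps to s32c1i on Xtensa, which is why the target address matters.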

But the problem might be that IRAM is not compatible with the s32c1i instruction.
@tmedicci you may need to check this with the IC designer.

If this is the case, one solution might be:

In esp32_iramheap_initialize, do not call mm_initialize directly, since mm_initialize places the memory manager struct at the head of the managed memory, i.e. a semaphore ends up in IRAM.
Instead, allocate the memory manager struct from another data heap, such as Umem, and then use that struct to manage the IRAM heap.

chirping78 · 2024-11-15T02:30:14Z

@tmedicci I found this statement in the "Xtensa® LX7 Microprocessor Data Book":

S32C1I instructions may target cached, cache-bypass, and data RAM memory locations. 
S32C1I instructions are not permitted to access memory addresses in data ROM,
instruction memory or the address region allocated to the XLMI port. Attempts to direct
the S32C1I at these addresses will cause an exception.

yamt · 2024-11-15T06:14:36Z

Depending on how the instruction is actually used, it might or might not be easy to
implement enough emulation in the trap handler.
It might be difficult if the same memory can also be modified with a non-trapping instruction (s32i).
Maybe someone needs to do a feasibility study by reading the disassembly of the relevant code.

Summary:
- In apache#14465, atomic_compare_exchange_weak_explicit() was newly introduced in semaphore. However, cxd56xx has an issue with the API if SMP is enabled (see up_testset2 in cxd56_testset.c).
- This commit fixes the issue by using LIBC_ARCH_ATOMIC.
Impact:
- Only cxd56xx SoCs in SMP mode.
Testing:
- Tested with spresense:smp, spresense:wifi_smp
- NOTE: If DEBUG_ASSERTIONS is enabled, an assertion is triggered. I think this might be a separate issue.
Signed-off-by: Masayuki Ishikawa <Masayuki.Ishikawa@jp.sony.com>

github-actions bot added the "Area: OS Components" and "Size: M" labels Oct 23, 2024

xiaoxiang781216 approved these changes Oct 23, 2024

xiaoxiang781216 requested a review from anchao October 23, 2024 02:24

anchao reviewed Oct 23, 2024

pussuw reviewed Oct 23, 2024

pussuw approved these changes Oct 29, 2024

xiaoxiang781216 merged commit befe298 into apache:master Oct 29, 2024 (26 checks passed)

tmedicci mentioned this pull request Nov 1, 2024: [BUG] ELF loader on ESP32-S3 broken after #14100 #14487 (Closed)

zyfeier mentioned this pull request Nov 4, 2024: sched/semaphore: change semcount type to int #14625 (Closed)

yamt mentioned this pull request Nov 13, 2024: semaphore: Fix a few regressions #14755 (Closed)

yamt reviewed Nov 13, 2024

zyfeier mentioned this pull request Nov 15, 2024: arch/xtensa: use arch atomic when enable iram heap #14805 (Open)

masayuki2009 mentioned this pull request Nov 19, 2024: arch: cxd56xx: Fix cxd56xx for SMP #14842 (Merged)

pussuw mentioned this pull request Nov 19, 2024: arch_atomic: support nx atomic function #14827 (Draft)
