
Removed the 'configASSERT( xInheritanceOccurred == pdFALSE )' assertion from xQueueSemaphoreTake (IDFGH-8766) #10199

Merged
1 commit merged into espressif:master from timoxd7:queue-send-fix
Nov 29, 2022

Conversation

timoxd7
Contributor

@timoxd7 timoxd7 commented Nov 17, 2022

FreeRTOS References:

Commit: FreeRTOS/FreeRTOS-Kernel@4e2bf2c
Pull-Request: FreeRTOS/FreeRTOS-Kernel#576
Problem Description: https://forums.freertos.org/t/15967

This pull request copies the bugfix from the FreeRTOS repo into esp-idf.
All credit goes to @Erlkoenig90.

Original FreeRTOS Pull Request Description

Removed the 'configASSERT( xInheritanceOccurred == pdFALSE )' assertion from xQueueSemaphoreTake, as the reasoning behind it is wrong; it can trigger wrongly on highly contested semaphores on multicore systems. See the forum for details.

Fixes a rare deadlock on heavily loaded multicore systems.
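
For reference, here is a paraphrased sketch of the zero-timeout path in xQueueSemaphoreTake() where the removed assertion sat (not the verbatim upstream source; the exact surrounding code varies between FreeRTOS-Kernel versions):

/* Inside xQueueSemaphoreTake(), still within taskENTER_CRITICAL(),
 * on the path taken when the semaphore count is zero. */
if( xTicksToWait == ( TickType_t ) 0 )
{
    #if ( configUSE_MUTEXES == 1 )
    {
        /* Removed by this fix: the original reasoning was that an adjusted
         * timeout can never reach zero once priority inheritance has
         * occurred. On multicore systems a highly contested mutex can make
         * this assertion fire wrongly (see the linked forum thread). */
        configASSERT( xInheritanceOccurred == pdFALSE );
    }
    #endif /* configUSE_MUTEXES */

    /* The semaphore count was 0 and no block time is specified
     * (or the block time has expired), so exit now. */
    taskEXIT_CRITICAL();
    traceQUEUE_RECEIVE_FAILED( pxQueue );
    return errQUEUE_EMPTY;
}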
@CLAassistant

CLAassistant commented Nov 17, 2022

CLA assistant check
All committers have signed the CLA.

@espressif-bot espressif-bot added the Status: Opened (Issue is new) label Nov 17, 2022
@github-actions github-actions bot changed the title Removed the 'configASSERT( xInheritanceOccurred == pdFALSE )' assertion from xQueueSemaphoreTake Removed the 'configASSERT( xInheritanceOccurred == pdFALSE )' assertion from xQueueSemaphoreTake (IDFGH-8766) Nov 17, 2022
@timoxd7 timoxd7 marked this pull request as ready for review November 17, 2022 19:42
@Alvin1Zhang
Collaborator

Thanks for your contribution.

Contributor

@Dazza0 Dazza0 left a comment


@timoxd7 Changes LGTM.

I think the issue described in the post is a manifestation of a more general "spinlock acquire race condition" that can occur on SMP systems under the current FreeRTOS queue/semaphore/mutex implementation.

Basically, because there is a gap between the critical sections in each loop iteration (see below), race conditions can occur when multiple cores are running concurrently.

BaseType_t xQueueSend/Receive/SemaphoreTake(...)
{
    // Main function loop
    for(;;) {
        taskENTER_CRITICAL();
        // Check for empty/full queue/semaphore/mutex
        taskEXIT_CRITICAL();

        // Handle timeout bookkeeping
        taskENTER_CRITICAL();
        if (timed_out) {
            portYIELD_WITHIN_API();
        }
        taskEXIT_CRITICAL();
        /*
        - If yielding, it occurs after we exit the critical section.
        - Once we are scheduled back here, we need to enter the critical
          section again, which can lead to race conditions if the other CPU
          enters it faster than we do.
        */
    }
}

We've run into a similar issue with xQueueReceive() before, where a lower-priority task can "steal" a message intended for a higher-priority task that was blocked on the queue. The flow would look like the following (a minimal setup sketch follows the list):

  • Task A (CPU 0, Priority 5) calls xQueueReceive() on an empty queue, so it blocks.
  • Task B (CPU 0, Priority 4) calls xQueueSend() to send a message to the queue, unblocking Task A.
  • Task A (on CPU 0) is now scheduled back in and needs to enter the critical section again to read the message.
  • At the same time, Task C (CPU 1, Priority 3) also calls xQueueReceive() and is quicker than Task A to enter the critical section.
  • Task C ends up reading the message while Task A spins waiting to enter the critical section. Thus Task C has "stolen" Task A's message, even though Task C has a lower priority.
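
As an illustration only, here is a minimal ESP-IDF-style setup that mirrors the scenario above (task names, priorities, and core pinning are chosen to match the list; this creates the conditions for the race rather than deterministically reproducing it):

#include <stdint.h>
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "freertos/queue.h"

static QueueHandle_t xQueue;

/* Task A: higher-priority receiver pinned to CPU 0, blocks on the empty queue. */
static void task_a(void *arg)
{
    uint32_t msg;
    for (;;) {
        if (xQueueReceive(xQueue, &msg, portMAX_DELAY) == pdTRUE) {
            /* Expected to receive the message sent by Task B. */
        }
    }
}

/* Task B: lower-priority sender pinned to CPU 0, unblocks Task A. */
static void task_b(void *arg)
{
    uint32_t msg = 42;
    for (;;) {
        xQueueSend(xQueue, &msg, 0);
        vTaskDelay(pdMS_TO_TICKS(10));
    }
}

/* Task C: lowest-priority receiver pinned to CPU 1; it may enter the
 * critical section before Task A does and end up "stealing" the message. */
static void task_c(void *arg)
{
    uint32_t msg;
    for (;;) {
        if (xQueueReceive(xQueue, &msg, portMAX_DELAY) == pdTRUE) {
            /* May observe a message that was intended for Task A. */
        }
    }
}

void app_main(void)
{
    xQueue = xQueueCreate(1, sizeof(uint32_t));
    xTaskCreatePinnedToCore(task_a, "task_a", 4096, NULL, 5, NULL, 0); /* CPU 0, prio 5 */
    xTaskCreatePinnedToCore(task_b, "task_b", 4096, NULL, 4, NULL, 0); /* CPU 0, prio 4 */
    xTaskCreatePinnedToCore(task_c, "task_c", 4096, NULL, 3, NULL, 1); /* CPU 1, prio 3 */
}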

I'm not sure there's a clean way to work around this SMP race condition, unless we add some information to the queues/mutexes regarding the last task that was unblocked on the queue.

Regardless, the PR changes LGTM. Thanks for the contribution.

@Dazza0
Contributor

Dazza0 commented Nov 25, 2022

sha=355abfdff6f1039f3b98b598c484f6103ade9749

@Dazza0 Dazza0 added the PR-Sync-Merge (Pull request sync as merge commit) label Nov 25, 2022
@espressif-bot espressif-bot added the Resolution: NA (Issue resolution is unavailable), Status: Done (Issue is done internally), and Resolution: Done (Issue is done internally) labels and removed the Status: Opened (Issue is new) and Resolution: NA (Issue resolution is unavailable) labels Nov 28, 2022
@espressif-bot espressif-bot merged commit 2976703 into espressif:master Nov 29, 2022
@timoxd7 timoxd7 deleted the queue-send-fix branch November 29, 2022 20:24
@AxelLin
Contributor

AxelLin commented Dec 8, 2022

@Dazza0 Do the stable branches also need this fix?

@Erlkoenig90
Contributor

Thanks for merging this into IDF! I would greatly appreciate it if it could also be merged into 4.4.3. I have manually applied this change to 4.4.2 and it works fine for me.

@Dazza0
Contributor

Dazza0 commented Dec 8, 2022

@AxelLin @Erlkoenig90 Backports have already been raised internally and are pending merge.

@AxelLin
Contributor

AxelLin commented Dec 25, 2022

@Dazza0 The fix is in v4.4+, but v4.3 also needs the fix.
