SMP: Set affinity for idle tasks #1264

Open
wants to merge 1 commit into
base: main

ianstcdns
Contributor

If configUSE_CORE_AFFINITY is set, update prvCreateIdleTasks() to restrict each idle task to run only on its intended core.
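
The PR's diff is not shown in this conversation, so the following is only a minimal, self-contained sketch of the per-core pinning idea, assuming an affinity mask with one bit per core. The names `AFF_NUM_CORES`, `AFF_NO_AFFINITY`, `model_idle_affinity_mask()` and `model_may_run_on()` are hypothetical and do not exist in the kernel; `AFF_NO_AFFINITY` stands in for tskNO_AFFINITY, which is assumed here to be an all-bits-set mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define AFF_NUM_CORES   4u           /* hypothetical core count for the model */
#define AFF_NO_AFFINITY UINT32_MAX   /* assumption: "no affinity" means every bit set */

/* Mask that allows an idle task to run only on the given core. */
static uint32_t model_idle_affinity_mask( unsigned int core )
{
    assert( core < AFF_NUM_CORES );
    return ( uint32_t ) 1u << core;
}

/* True if a task with this affinity mask is allowed to run on the given core. */
static bool model_may_run_on( uint32_t mask, unsigned int core )
{
    assert( core < AFF_NUM_CORES );
    return ( mask & ( ( uint32_t ) 1u << core ) ) != 0u;
}
```

With such a mask, the idle task created for core 2 is eligible on core 2 only, whereas a task left at the default mask is eligible on every core. In the kernel, the equivalent step would presumably apply a mask of this shape to each idle task when configUSE_CORE_AFFINITY is 1.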

The default of (configTASK_DEFAULT_CORE_AFFINITY == tskNO_AFFINITY) allows multiple idle tasks to be started on the same core. More importantly, some cores could have no idle task running. This results in the following potential hang:

  1. Core 0 attempts to yield core N by calling prvYieldCore(), which sets pxCurrentTCBs[N-1]->xTaskRunState to -2 (taskTASK_SCHEDULED_TO_YIELD) and calls portYIELD_CORE(N-1).
  2. In our port, portYIELD_CORE triggers an interrupt on core (N-1); the subsequent ISR calls portYIELD_FROM_ISR(1) -> vPortYieldFromInt() -> vTaskSwitchContext() -> prvSelectHighestPriorityTask(), which correctly does not change pxCurrentTCBs[N-1] but incorrectly does not change pxCurrentTCBs[N-1]->xTaskRunState.
  3. The next time core 0 attempts to yield core N, pxCurrentTCBs[N-1]->xTaskRunState is still set to taskTASK_SCHEDULED_TO_YIELD, which prevents any further calls to portYIELD_CORE(N-1).

Pinning the idle task to core N-1 allows prvSelectHighestPriorityTask() in step 2 to correctly update pxCurrentTCBs[N-1]->xTaskRunState to the core ID (N-1), so that the yield request in step 3 can trigger the interrupt again.
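
The sequence above can be illustrated with a small standalone model. This is a sketch under stated assumptions, not kernel code: `ModelTask`, `model_yield_core()`, `model_switch_context()` and `model_two_yield_requests()` are hypothetical names, and the guard in `model_yield_core()` is a simplification of whatever check in the real scheduler skips a core that is already marked as scheduled to yield.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HANG_NUM_CORES          4
#define TASK_NOT_RUNNING        ( -1 )
#define TASK_SCHEDULED_TO_YIELD ( -2 )  /* same value the description gives for taskTASK_SCHEDULED_TO_YIELD */

typedef struct
{
    int run_state; /* core index while running, otherwise one of the negative markers */
} ModelTask;

static ModelTask * current[ HANG_NUM_CORES ]; /* stands in for pxCurrentTCBs */
static int ipi_count[ HANG_NUM_CORES ];       /* counts modelled portYIELD_CORE() interrupts */

/* Steps 1 and 3: request a yield unless one is already marked as pending. */
static void model_yield_core( int core )
{
    assert( ( core >= 0 ) && ( core < HANG_NUM_CORES ) );

    if( current[ core ]->run_state != TASK_SCHEDULED_TO_YIELD )
    {
        current[ core ]->run_state = TASK_SCHEDULED_TO_YIELD;
        ipi_count[ core ]++;
    }
}

/* Step 2: the selection either finds a task (candidate) or finds nothing (NULL). */
static void model_switch_context( int core, ModelTask * candidate )
{
    if( candidate != NULL )
    {
        current[ core ]->run_state = TASK_NOT_RUNNING;
        candidate->run_state = core;
        current[ core ] = candidate;
    }
    /* Nothing selected: both the current task and its stale run state stay as they are. */
}

/* Returns how many interrupts reach the core after two yield requests. */
static int model_two_yield_requests( bool runnable_task_found )
{
    const int core = 3;
    ModelTask blocked = { .run_state = core };
    ModelTask idle = { .run_state = TASK_NOT_RUNNING };

    current[ core ] = &blocked;
    ipi_count[ core ] = 0;

    model_yield_core( core );
    model_switch_context( core, runnable_task_found ? &idle : NULL );
    model_yield_core( core );

    return ipi_count[ core ];
}
```

In this model, when the selection step finds no task, the second yield request is silently dropped because the run state is still marked as scheduled to yield; when an idle task is available for the core, the state is refreshed and the second request goes through.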

  • [ X ] I have tested my changes. No regression in existing tests.
  • [ X ] I have modified and/or added unit-tests to cover the code changes in this Pull Request.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Signed-off-by: Ian Thompson <ianst@cadence.com>
@ianstcdns ianstcdns requested a review from a team as a code owner April 11, 2025 22:32
@ianstcdns
Contributor Author

@aggarg @chinglee-iot please review whenever time permits. Thank you in advance.

@chinglee-iot
Member

@ianstcdns
Thank you for creating this PR.
I'd like your help understanding the purpose of this PR. I am assuming your platform has N cores, indexed from 0 to ( N - 1 ).

> The default of (configTASK_DEFAULT_CORE_AFFINITY == tskNO_AFFINITY) allows multiple idle tasks to be started on the same core. More importantly, some cores could have no idle task running.

By design, a core can run only one task at a time. Could you elaborate on the description above?

The scheduler creates configNUMBER_OF_CORES idle tasks, one per core, so that each core can always find a task to run. When prvSelectHighestPriorityTask() finds a task to run, it marks that task as running by setting its xTaskRunState to the core ID.

One possible explanation for the behavior described in step 2 is:

  1. The task currently running on core ( N - 1 ) is blocked, and core ( N - 1 ) is requested to yield.
  2. The scheduler can't find a task to run, possibly because the idle tasks are blocked. As a result, prvSelectHighestPriorityTask() can't select an idle task to run on the core.

This situation can be detected with the following assertion. It fires if the scheduler fails to find any runnable task, which shouldn't happen if idle tasks are properly managed.

    static void prvSelectHighestPriorityTask( BaseType_t xCoreID )
    {
        ...
        while( xTaskScheduled == pdFALSE )
        {
            ...
            if( uxCurrentPriority > tskIDLE_PRIORITY )
            {
                uxCurrentPriority--;
            }
            else
            {
                /* This function is called when idle task is not created. Break the
                 * loop to prevent uxCurrentPriority overrun. */
                configASSERT( pdFALSE ); // <= Assert here to confirm that the scheduler is not able to find a task to run on the core
                break;
            }
        }
        ...
    }

Could you run this experiment and report what you observe? If we confirm that the scheduler is not able to find a task to run, we can then investigate the run state of the idle tasks.

@ianstcdns
Contributor Author

Hi, @chinglee-iot, thanks for your reply.

> The default of (configTASK_DEFAULT_CORE_AFFINITY == tskNO_AFFINITY) allows multiple idle tasks to be started on the same core. More importantly, some cores could have no idle task running.
>
> By design, a core can run only one task at a time. Could you elaborate on the description above?

Apologies--my description was unclear. I had meant that any of multiple idle tasks could be started on the same core, but that isn't very relevant. The main observation I was trying to convey is that it appears there is a condition (perhaps only in our port?) where some cores do not run any idle task.

> This situation can be detected with the following assertion. It fires if the scheduler fails to find any runnable task, which shouldn't happen if idle tasks are properly managed.
>
>     static void prvSelectHighestPriorityTask( BaseType_t xCoreID )
>     {
>         ...
>         while( xTaskScheduled == pdFALSE )
>         {
>             ...
>             if( uxCurrentPriority > tskIDLE_PRIORITY )
>             {
>                 uxCurrentPriority--;
>             }
>             else
>             {
>                 /* This function is called when idle task is not created. Break the
>                  * loop to prevent uxCurrentPriority overrun. */
>                 configASSERT( pdFALSE ); // <= Assert here to confirm that the scheduler is not able to find a task to run on the core
>                 break;
>             }
>         }
>         ...
>     }
>
> Could you run this experiment and report what you observe? If we confirm that the scheduler is not able to find a task to run, we can then investigate the run state of the idle tasks.

This new assertion does indeed fire, regardless of whether this PR's work-around is present or not. The call trace is: _frxt_dispatch() -> vTaskSwitchContext(xCoreID = 3) -> prvSelectHighestPriorityTask(xCoreID = 3) -> vAssertCalled(). I saw this happening during the first context switch triggered from xPortStartScheduler(), so I modified the assertion to be checked only after 10 clock ticks, and it fires with that modification as well.

I don't know whether it's relevant, but we are currently enabling the following config options for our SMP platforms:

/* Multicore settings */
#define configNUMBER_OF_CORES                           XCHAL_SUBSYS_NUM_CORES
#define configUSE_PASSIVE_IDLE_HOOK                     1
#define configRUN_MULTIPLE_PRIORITIES                   1
#define configUSE_CORE_AFFINITY                         1
#define configUSE_TASK_PREEMPTION_DISABLE               1

What further experiments would be helpful in narrowing this down? Are there any idle task management details that our port needs to be aware of that I may have missed? Thanks in advance.

@chinglee-iot
Member

@ianstcdns
Thank you for sharing further information.

From your description, the assertion fires during xPortStartScheduler(). The scheduler creates the idle tasks in vTaskStartScheduler() before it calls xPortStartScheduler().

If you have a debugger attached, we can check the idle task status when the assertion fires.
Example gdb commands:

print *xIdleTaskHandles[0]
print *xIdleTaskHandles[1]
....

Alternatively, we can use your platform's print function to print the idle task information:

    static void prvSelectHighestPriorityTask( BaseType_t xCoreID )
    {
        ...
        while( xTaskScheduled == pdFALSE )
        {
            ...
            if( uxCurrentPriority > tskIDLE_PRIORITY )
            {
                uxCurrentPriority--;
            }
            else
            {
                /* This function is called when idle task is not created. Break the
                 * loop to prevent uxCurrentPriority overrun. */
                /* Dump each idle task's run state and the list it is currently in.
                 * The casts match the arguments to the %u, %d and %p specifiers. */
                for( UBaseType_t uxIdleTaskIndex = 0; uxIdleTaskIndex < configNUMBER_OF_CORES; uxIdleTaskIndex++ )
                {
                    configPRINTF( "xIdleTaskHandles[%u]->xTaskRunState = %d\r\n", ( unsigned int ) uxIdleTaskIndex, ( int ) xIdleTaskHandles[ uxIdleTaskIndex ]->xTaskRunState );
                    configPRINTF( "xIdleTaskHandles[%u] state list = %p\r\n", ( unsigned int ) uxIdleTaskIndex, ( void * ) listLIST_ITEM_CONTAINER( &( xIdleTaskHandles[ uxIdleTaskIndex ]->xStateListItem ) ) );
                }
                configASSERT( pdFALSE ); // <= Assert here to confirm that the scheduler is not able to find a task to run on the core
                break;
            }
        }
        ...
    }

The goal is to use the debug information above to find out why the idle tasks cannot be selected to run on the core.

One possible reason is that an idle task has been accidentally blocked. In that case it can't be selected in prvSelectHighestPriorityTask(), and the core has no task to run. I also noticed that you enabled configUSE_PASSIVE_IDLE_HOOK. The idle hook must not call any function that could block. Could you also check your idle hook function?
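
As an illustration of the non-blocking requirement, here is a minimal sketch of a passive idle hook that only increments a per-core counter and returns. It assumes the hook name vApplicationPassiveIdleHook() that the kernel is expected to call when configUSE_PASSIVE_IDLE_HOOK is 1; `HOOK_NUM_CORES`, `model_current_core` and `passive_idle_counts` are hypothetical names introduced so the example compiles on its own, with `model_current_core` standing in for portGET_CORE_ID().

```c
#include <assert.h>
#include <stdint.h>

#define HOOK_NUM_CORES 4 /* hypothetical core count for the model */

/* Hypothetical stand-in for portGET_CORE_ID(); a real port supplies the core ID. */
static int model_current_core = 0;

/* One counter per core, so no locking (and therefore no blocking) is needed. */
static volatile uint32_t passive_idle_counts[ HOOK_NUM_CORES ];

void vApplicationPassiveIdleHook( void );

/* Does a bounded amount of work and returns. It never delays, waits on a
 * queue or semaphore, or suspends the calling idle task. */
void vApplicationPassiveIdleHook( void )
{
    assert( ( model_current_core >= 0 ) && ( model_current_core < HOOK_NUM_CORES ) );
    passive_idle_counts[ model_current_core ]++;
}
```

A hook of this shape keeps every idle task permanently ready, so the scheduler can always fall back to one of them. A hook that calls a blocking API would instead move the idle task out of the ready list, which matches the failure mode suspected above.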

@ianstcdns
Contributor Author

Hi @chinglee-iot, thanks for the debugging tips. We've found and fixed a separate issue, and that fix has also removed the need for this patch. If you don't mind, I'd like to leave this PR open a little longer to confirm, at which point I plan to close it out.
