Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cpu/atmega_common: return to non-interrupt context swaps #8904

Merged
merged 5 commits into from
May 4, 2018

Conversation

ZetaR60
Copy link
Contributor

@ZetaR60 ZetaR60 commented Apr 9, 2018

Second attempt. See also, first attempt: #8898

This removes the ISR based context swaps on the ATmegas. They were originally introduced in #5745. ISR swaps were introduced to handle the problems with context swaps while inside an ISR. However, there aren't really any differences in the execution environment within an ISR on the ATmega, which means that there are simpler ways to handle this problem.

The behavior is as follows:

  • If thread_yield_higher is called outside of an ISR, it changes context and then changes the program counter using the 'ret' (subroutine return) instruction. This is the same behavior as prior to AVR thread yield #5745.
  • If thread_yield_higher is called within an ISR, it sets sched_context_switch_request and does nothing else.
  • The function thread_yield_isr is introduced, which is meant to be called within ISRs to yield. It changes the context, tells RIOT it is no longer within an ISR (using __exit_isr) and then changes the program counter using the 'reti' (interrupt return) instruction.

tests/thread_* on mega-xplained all passed (including tests/thread_race added by #8897). Also tested with xtimer_* on mega-xplained and all passed except for xtimer_longterm (did not run this test) and xtimer_now64_continuity which showed some inaccuracy. This may be due to the low xtimer accuracy on mega-xplained however.

References:

@ZetaR60
Copy link
Contributor Author

ZetaR60 commented Apr 10, 2018

When #8904 is applied to #8842 it fixes one race condition, but reveals another. Requires more testing to see if it is caused by #8904.

@ZetaR60
Copy link
Contributor Author

ZetaR60 commented Apr 10, 2018

Issue in #8842 is unrelated to #8904. See #8914

@kYc0o kYc0o self-assigned this Apr 13, 2018
@kYc0o kYc0o added Platform: AVR Platform: This PR/issue effects AVR-based platforms quality defect labels Apr 13, 2018
@kYc0o kYc0o added this to the Release 2018.04 milestone Apr 13, 2018
@ZetaR60
Copy link
Contributor Author

ZetaR60 commented Apr 16, 2018

Results of a large number of tests with trunk and fix applied to trunk side by side. I did not examine what each test was doing in detail, as long as the output looked the same and didn't look like a failure.

Installed compiler toolchains 
-----------------------------
             native gcc: missing
      arm-none-eabi-gcc: missing
                avr-gcc: avr-gcc (Fedora 7.2.0-1.fc27) 7.2.0
       mips-mti-elf-gcc: missing
             msp430-gcc: missing
   riscv-none-embed-gcc: missing
                  clang: missing

Installed compiler libs
-----------------------
   arm-none-eabi-newlib: missing
    mips-mti-elf-newlib: missing
riscv-none-embed-newlib: missing
               avr-libc: "2.0.0" ("20150208")

Installed development tools
---------------------------
                  cmake: cmake version 3.11.0
               cppcheck: missing
                doxygen: missing
                 flake8: missing
                    git: git version 2.14.2
             coccinelle: missing
Test				Trunk				Fix
-------------------------------------------------------------------------------------------------------------------
events				reports success			identical output
evtimer_msg			received expected output	identical, except slightly different times reported
evtimer_underflow		received expected output	identical output
fault_handler			received expected output	identical output
irq				reports success			identical output
isr_yield_higher		thread done after success	ordering different
memarray			received expected output	identical output, except slightly different
								base and current addresses
msg_avail			reports success			identical output
msg_send_receive		reports success			identical output
msg_try_receive			received expected output	identical output
mutex_order			hangs after T7 locked		correct according to README
mutex_unlock_and_sleep		received expected output	identical output
posix_semaphore			threads do not report waking	received expected output
				in TEST2, TEST3
posix_time			received expected output	identical output
ps_schedstatistics		hangs after threads,		identical output
				and before shell
rmutex				hangs after T4 (prio 4,		T7 and T4 switched from README
				depth 0): unlocked mutex
sched_testing			received expected output	identical output
sizeof_tcb			reports success			identical output
thread_basic			received expected output	identical output
thread_cooperation		reports success			identical output
thread_exit			received expected output	identical output
thread_flags			reports success			identical, except trunk reports 100ms timeout
								as 160us and fix reports as 34648us
thread_flags_xtimer		reports success			identical output
thread_flood			reports success			identical output
thread_msg			reports success			identical output
thread_msg_block_wo_queue	received expected output	identical output
thread_msg_block_w_queue	received expected output	identical output
thread_msg_seq			reports success, but did not	reports success, did receive
				receive "THREAD nr3 (pid:5) end."
thread_priority_inversion	received expected output	identical output
trickle				TRICKLE_RESET			FAIL
warn_conflict			nothing to do			identical
xtimer_drift			received expected output	identical, except slightly different times reported
xtimer_hang			reports success			identical output
xtimer_msg			received expected output	identical, except slightly different times reported
xtimer_msg_receive_timeout	reports success			identical output
xtimer_now64_continuity		reports FAILURE,		identical output
				min=10, avg=10, max=1040
xtimer_periodic_wakeup		received expected output	identical, except slightly different times reported
								(identical min/max error reported)
xtimer_remove			reports success			identical output
xtimer_reset			received expected output	identical, except slightly different times reported
xtimer_usleep			received expected output	identical, except slightly different times reported
xtimer_usleep_short		reports success			identical output

tests of interest (significant divergences):

  • isr_yield_higher
  • mutex_order
  • posix_semaphore
  • rmutex
  • thread_flags
  • thread_msg_seq
  • trickle

I will not be able to follow up on this right away. I will be out of town for a week.

@kYc0o kYc0o added the CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR label Apr 17, 2018
@kYc0o
Copy link
Contributor

kYc0o commented Apr 17, 2018

Testing on arduino-mega2560:

  • isr_yield_higher Almost OK, it prints the success message a bit before the timer gets triggered, but it's OK IMHO.

  • mutex_order OK

  • posix_semaphore Fails, but has better results than master. Here I think it has something to do with xtimer or UART speed.

  • rmutex OK

  • thread_flags Almost OK. Again timer issues.

  • thread_msg_seq OK

  • trickle Fail. Timing issues. Although it makes use of random which is not really supported on AVR.

I will try to investigate the timing issues, but I'd say this PR is a step forward.

Can you just confirm that this is getting to the exact previous implementation? I mean, before the context switching was implemented with interrupts.

@kYc0o
Copy link
Contributor

kYc0o commented Apr 17, 2018

Can you just confirm that this is getting to the exact previous implementation? I mean, before the context switching was implemented with interrupts.

I just looked by myself and see that this is somehow a new approach, thus is not exactly coming back to a previous state.

@ZetaR60
Copy link
Contributor Author

ZetaR60 commented Apr 18, 2018

@kYc0o The old approach did not handle context swaps inside ISRs correctly. The context swap would run inside the ISR and the other thread would be executing in ISR mode. That was why the old approach was changed to use the interrupts. Non-ISR handling should be the same with this fix.

Differences within an ISR from the old approach:

  • thread_yield and thread_yield_higher do not save / restore context or yield, but set sched_context_switch_request instead.
  • thread_yield_isr is used at the end of ISRs so that if a yield was attempted (and sched_context_switch_request was set) then it: saves/restores contexts, exits from the ISR (which really only means it tells RIOT it is no longer in an ISR by using __exit_isr), and then issues a return from interrupt instruction (which actually switches the program counter).

This fixes the problem the old code had with ISRs.

@jnohlgard
Copy link
Member

@ZetaR60 from your textual description it sounds like the avr with this PR behaves like the Cortex-M port does

@kYc0o
Copy link
Contributor

kYc0o commented Apr 25, 2018

@Josar can you test this in your jiminy board?

@Josar
Copy link
Contributor

Josar commented Apr 26, 2018

@kYc0o test results.

  1. mutex_order OK
  2. posix_semaphore FAILED: test 4
  3. rmutex
    {THREAD_PRIORITY_MAIN - 1, 4, 5, 2, 4}; thread 4 and 7 have same prio, different than in readme.
    Thus, T7 comes after T4. Besides that OK
  4. thread_flags
    thread(): received flags: 0x0006 means flag 2 and 4 simultanously.
    time passed: 34672us thus timeout error triggered.
    FAILED?
  5. thread_msg_seq seems OK
  6. trickle FAILED

@ZetaR60
Copy link
Contributor Author

ZetaR60 commented Apr 27, 2018

Rebased on master and merge conflicts solved.

@ZetaR60
Copy link
Contributor Author

ZetaR60 commented Apr 29, 2018

thread_flags looks okay to me after a closer look. 01-run.py expects thread(): received flags: 0x0006 and it is intentionally waited for by _thread (because 0x2 | 0x4 == 0x6). The timeout is intentional (not an error); testing the timeout functionality I guess? The nominal value for the timeout should be 100ms (or 100,000us), so it is about 60-70ms off. But this is better than the previous behavior (160us for me), which is almost an order of magnitude off. I suggest we consider this one to be okay.

EDIT: in core/include/thread_flags.h the define THREAD_FLAG_TIMEOUT is set to (0x1<<14), or 2^15 = 32768, which I think means that the test message is wrong? The behavior of the fix is close to perfect while the original method is wildly off.

Still looking into posix_semaphore and trickle.

@ZetaR60
Copy link
Contributor Author

ZetaR60 commented Apr 29, 2018

The issues with posix_semaphore seem to be related to a xtimer problem and not to this fix. There is a separate issue for that problem: #9049. I think posix_semaphore would pass with #8904 if #9049 was fixed.

This is probably also the cause of the trickle failure.

@ZetaR60
Copy link
Contributor Author

ZetaR60 commented Apr 30, 2018

The problems with trickle is not caused by the same underlying issue as posix_semaphore. There seems to be some problems with how tests/trickle works, at least on ATmegas, and is probably unrelated to this fix. Issue for tests/trickle: #9052

I have investigated the three main problem tests (thread_flags, posix_semaphore, and trickle). If the rest of you think that the other tests are okay or have only insignificant differences, then I think that all the currently outstanding problems with this fix have been solved.

@kYc0o
Copy link
Contributor

kYc0o commented May 3, 2018

Ok, I understand. Let's tackle the other problems separately.

@kYc0o kYc0o added this to the Release 2018.07 milestone May 3, 2018
Copy link
Contributor

@kYc0o kYc0o left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK. Please squash only the last fixup.

@ZetaR60
Copy link
Contributor Author

ZetaR60 commented May 3, 2018

@kYc0o Squashed. Thanks for your help!

@kYc0o kYc0o merged commit 53c3f83 into RIOT-OS:master May 4, 2018
@ZetaR60 ZetaR60 mentioned this pull request May 16, 2018
24 tasks
@miri64 miri64 added Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) and removed quality defect Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation labels Oct 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Platform: AVR Platform: This PR/issue effects AVR-based platforms Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants