Skip to content

Patch for an Uart issue in ISR #5446

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from
Closed

Patch for an Uart issue in ISR #5446

wants to merge 3 commits into from

Conversation

soleilplanet
Copy link
Contributor

@soleilplanet soleilplanet commented Nov 7, 2017

Description

Changes:
1. SerialBase.cpp: the change of separating initialization of 'ret' is to avoid the compiler optimizing it. It is observed that on ublox EVK NINA-B1, 'ret' is always returning false as it has been optimized.
2. Serial.cpp/Serial.h: the change is to avoid calling mutex lock/unlock and accessing file in ISR context.

Note: there's risk that with this patch putc/getc will behave differently in ISR and non-ISR context. Further discussions required.

note:

       observed in ublox EVK NINA-B1: 
           **Mutex 0x20005690 error -6: Not allowed in ISR context** (occurred in Serial::lock() and Serial::unlock())
           **Error - reading from a file in an ISR or critical section** (occurred in Stream::getc() and Stream::putc() )

reference issue : #4686

Status

IN DEVELOPMENT

Migrations

Serial class will have its own instance of putc() and getc() implementations.

Todos

  • [v] Tests

Deploy notes

No changes required.

@0xc0170 0xc0170 requested a review from hasnainvirk November 7, 2017 09:26
@soleilplanet soleilplanet changed the title Sensize uart issue 4686 Patch for an Uart issue in ISR Nov 7, 2017
@0xc0170 0xc0170 requested a review from SenRamakri November 7, 2017 09:27
@@ -55,15 +55,17 @@ void SerialBase::format(int bits, Parity parity, int stop_bits) {

int SerialBase::readable() {
lock();
int ret = serial_readable(&_serial);
int ret = 0;
ret = serial_readable(&_serial);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. SerialBase.cpp: the change of separating initialization of 'ret' is to avoid the compiler optimizing it. It is observed that on ublox EVK NINA-B1, 'ret' is always returning false as it has been optimized.

I fail to see how this is different - how this can produce a different output by a compiler. Are there any optimization that you are seeing with debug/release here? I recall from your report that return value is set to false in the release build.

The output should be the same as in the current form. Isn't the problem in the below layer - HAL ? How serial_readable is implemented for your target that you are using? @

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the serial_readable() on this platform actually references a certain register, which may look 'constant' to the compiler in O3 optimization profile in my experience.
I've tried three ways: (all with release profile)

  1. original -> serial_readable() always returns false
  2. separating initialization (ret = 0; ret = serial_readable(), this is trying to tell the compiler that the value of this variable may change) -> works
  3. adding volatile to ret in the original code -> works
    I didn't disassembly the binary to see the actual assembly code, but the three binaries actually differs: the (2) and (3) are pretty close, but (2) and (3) are far different from (1)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which compiler is this? I wasn't aware we had any compiler profile in use that would do cross-compilation-unit optimisations - so as serial_readable itself isn't inline, if it had a problem with a missing volatile or something, I wouldn't have expected anything you do in a caller to affect such a problem. But apparently it is...?

And if there was a platform problem, the same bug would be affecting other code - this class isn't the only place serial_readable is used.

Looking at the platform code (Nina B1->NRF52->SDK11), I don't see an obvious flaw:

__STATIC_INLINE bool nrf_uart_event_check(NRF_UART_Type * p_reg, nrf_uart_event_t event)
{
    return (bool)*(volatile uint32_t *)((uint8_t *)p_reg + (uint32_t)event);
}

int serial_readable(serial_t *obj)
{
(void)obj;
#if DEVICE_SERIAL_ASYNCH
    if (UART_CB.rx_active) {
        return 0;
    }
#endif
    return (nrf_uart_event_check(UART_INSTANCE, NRF_UART_EVENT_RXDRDY));
}

NRF_UART_Type has is a struct with a bunch of volatile fields, it gets cast to non-volatile uint8_t *, but back to volatile uint32_t * before dereferencing. Should be okay.

Only thing I'd say is "hmm, only one serial port supported at once. Really?" Any chance you've somehow got some interference - two serial ports being accessed? Maybe stdin/stdout messing with it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, I'm using GCC ARM 6-2017-q2 update, and I don't see other messages running. I doubted the same thing before I get the ublox board as you commented that there are a bunch of volatile fields in the low level implementation which looked okay, but when I get the board and gave it a try, adding volatile or touching the ret variable in the caller actually helps.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this really were to be a compiler bug, which I don't think we've concluded yet, I would want to see it fixed by a pragma lowering optimisation in serial_readable or maybe nrf_uart_event_check - or maybe even a global toolchain option change.

This sort of "spelling change that really shouldn't cause a difference" in the caller doesn't really cut it as a workaround - it's too inexplicable why it should work, and could easily be accidentally "unfixed" later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as syntax of the pragma to change the optimization level varies among compilers, is there a suggested way to do that?
btw, confirming if this is actually a compiler bug requires a look at the assembly code, is there a way to do that without hardware debugger?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@soleilplanet -O3 is not used in any of our profiles.

I've isolated the readable member function and the code generated with GCC 6.3 seems fine.

  1. original -> serial_readable() always returns false
  2. separating initialization (ret = 0; ret = serial_readable(), this is trying to tell the compiler that the value of this variable may change) -> works
  3. adding volatile to ret in the original code -> works

(1) and (2) should not be different; ret and is address is not used while 0 is assigned to it therefore, the 0 assignment can be skipped. Of course (3) is different.

btw, confirming if this is actually a compiler bug requires a look at the assembly code, is there a way to do that without hardware debugger?

You can use Objdump to dump the code of the function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot to precise gcc profiles since it is what @soleilplanet use apparently.

@soleilplanet Have you test with other compilers and reproduced the issue ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pan- I don't see this problem with GCC ARM 5.4.1 on ubuntu either, this happens on Windows environment.
For the disassembly, I just realized that there is a symbol map file generated, I'll check that with objdump.

@@ -98,6 +98,8 @@ class Serial : public SerialBase, public Stream, private NonCopyable<Serial> {
{
return SerialBase::writeable();
}
int getc();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we removing virtual for these methods? (Hint: use commit message to contain details for changes, in this case it would be a problem with having virtual getc/putc and how this change fixes the problem)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the getc/putc in Serial class inherits from Stream class, and in Stream class, putc/getc are not virtual functions, so that's why I removed the virtual here. in Serial class.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@0xc0170 - he's not removing virtual from the existing code, he's changing his earlier commit, which added a getc() to Serial to override the non-virtual method it inherits from Stream.

But that lack of virtual in Stream is why this won't work properly. The resulting FileHandle/Stream will not use this getc() unless you're accessing it through a Serial pointer.

And on top of that, even if it is called, you've created a mutex-locked class that doesn't lock reliably - it has a mutex it will claim when called from thread, but that won't stop it being called from interrupt. It's effectively "claim the lock, but only in certain contexts" - not really a lock any more.

If you really want to create an "non-mutex, callable from interrupt" version of any of these HAL classes, I believe the standard pattern is to derive from them, and override their virtual lock/unlock methods to no-ops (or enter critical section, maybe). So derive an UnlockedSerial from Serial, if that's what you really want - I think that should work in your case.

In the case of Serial though, you can instead use RawSerial, which has null lock by default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deriving an UnlockedSerial form Serial sounds just another version of switching to RawSerial and UARTSerial. @sg- Any suggestions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deriving an UnlockedSerial form Serial sounds just another version of switching to RawSerial

Well, yes. And actually it would be trickier than I was thinking, as it's not just the lock in the HAL class, like the other HAL classes, it's the fact that all the calls use the C library. Serial is designed as a C stream-based class.

Serial::putc is supposed to be "call fputc on my FileHandle", and as the C library is not callable from interrupt, Serial::putc cannot be used from interrupt. Similarly for Serial::scanf, etc.

If you want something that gives you simple getc and putc that are interrupt-safe, not using the C library, then that is RawSerial. As its header says:

/** A serial port (UART) for communication with other serial devices
 * This is a variation of the Serial class that doesn't use streams,
 * thus making it safe to use in interrupt handlers with the RTOS.
 *

Your patch is an incomplete attempt to make Serial act like RawSerial if called from interrupt - "don't claim the mutex, and don't use the C library", but it only covers the API that RawSerial has. If that's all you're using, you may as well just use RawSerial, rather than attempt to bend Serial.

An UnlockedSerial would sort of work - it would let you access the FileHandle API, which would be useable from interrupt. You'd just have to avoid the C library-based API.

class UnlockedSerial : public Serial {
      // override lock/unlock with empty functions
}

UnlockedSerial serial;
// get past Stream's bizarre protection of its FileHandle base -
// we can't call serial.read() directly
FileHandle &fh =serial;
n = fh.read(buffer, 1);  // only reading 1 is ever safe, if from IRQ - have to precheck that 1 with serial.readable().

In general FileHandles are not expected to be useable from interrupt, but it happens that Stream's implementation is, given an appropriate lock.

Not sure what that would really gain you though - it's just a very complicated way to keep using two horrible classes instead of the sensible RawSerial and UARTSerial though. Stream's broken FileHandle isn't very useful - eg the fact you have to read 1 above.

@@ -36,12 +37,42 @@ int Serial::_putc(int c) {
return _base_putc(c);
}

int Serial::getc() {
// Mutex is already held
if(!core_util_is_isr_active())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this codeblock achieving - getting getc/putc to be accesible from interrupts? Can you provide a bit more details (a reason to change this) - how Stream::getc() helps here with invokation if it's interrupt active?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the risk of behavior inconsistency I mentioned in the PR description - I'm adding these because in the original code, when it is in ISR context, calling putc()/getc() in Stream class raises this error:
Error - reading from a file in an ISR or critical section**
So my change here is to call _base_putc/_base_getc in ISR context.

@0xc0170 0xc0170 requested a review from kjbracey November 7, 2017 09:41
@kjbracey
Copy link
Contributor

kjbracey commented Nov 7, 2017

You're not allowed to call Serial from interrupt - if you do want to access the serial port from interrupt, you can use RawSerial.

Serial is on the way to being deprecated - it cannot be used from interrupt, yet effectively needs to be called from interrupt due to lack of buffering, at least for RX. Hence it is not terribly useful.

If you want to access the serial port from a normal application, I suggest UARTSerial - this is superceding Serial - it has its own interrupt handling and buffering, reducing system load on TX and avoiding the need to have your own RX interrupt handler.

As to the "ret" thing - if that code change does anything you have a serious system/compiler problem. Are you sure you're even getting a different binary with that?

@sg-
Copy link
Contributor

sg- commented Nov 7, 2017

Serial is on the way to being deprecated - it cannot be used from interrupt, yet effectively needs to be called from interrupt due to lack of buffering, at least for RX. Hence it is not terribly useful.

Whoa! Why would you say that? I'd bet its the most used class. Fix the problem, dont deprecate.

@kjbracey
Copy link
Contributor

kjbracey commented Nov 8, 2017

Fix the problem, dont deprecate.

UARTSerial is fixing the problem - that's why it was created.

If Serial was fixable, we could have fixed it, but any fix would be breaking backwards-compatibility; it's broken by design - it provides a FileHandle that can't really be used due to its read/write API flaws and the lack of buffering. (And that's before you start on the CPU load.) You were involved in those design discussions from the start.

The intent was always to deprecate Serial and Stream, extend FileHandle, and create UARTSerial to be a working FileHandle serial.

People wanting just direct HAL serial access, useable from interrupt, can use RawSerial; applications wanting a stream with buffering they can read and write to from a normal thread, eg with fprintf or fread, can use UARTSerial.

@sg-
Copy link
Contributor

sg- commented Nov 8, 2017

The only thing preventing Serial from being "fixed" was public inheritance from Stream while read/write and _putc/_getc was protected. Is that why we have a UARTSerial class?!?

@kjbracey
Copy link
Contributor

kjbracey commented Nov 9, 2017

Serial is a non-buffered FileHandle, which can only be used from application space. But a non-buffered serial port cannot be safely used from application space, as you can't won't normally meet the necessary RX latency requirements to avoid data loss. A non-buffered serial FileHandle is a paradoxical API.

If you want to resolve the paradox by having something just giving you a thin layer on the HAL you can use from interrupt, then you can use RawSerial. If you want to resolve the paradox by adding a buffer to use it from threads or event queues, then there's UARTSerial, which is the out-of-tree BufferedSerial combined with a working FileHandle implementation.

The reason we didn't touch Serial is it's too widely used, and due to its oddities users are very often intimately relying on details of its implementation - they tend to maybe use it for output and bypass it with HAL serial_ calls and an interrupt pump for input. There's no way we could add a buffer to it.

Then there's the fact that Stream's FileHandle implementation is broken for read and the way it was based on _getc() and _putc() could not be straightforwardly extended to cover non-blocking - there's no backwards-compatible way to make Stream handle non-blocking, as we required.

Serial does kind of work, but only for TX, and with massive CPU load. Some thought was put to minor tweaks, but in the end review decided to not risk backwards compatibility, and not touch Stream or Serial at all, and plan to deprecate them, leaving RawSerial and UARTSerial I suggest you re-read the discussions.

@adbridge
Copy link
Contributor

@sg- This PR needs to be considered alongside the other Serial ones that are currently ongoing. I'm going to mark this as Do not merge until the whole serial issues is addressed.

@MarceloSalazar
Copy link

Internal reference: IOTMORF-2319

@cmonr
Copy link
Contributor

cmonr commented May 29, 2018

As an FYI, the internal issue is still being tracked, but since this PR has become stale we're going to close this PR.

It can either be reopened in the future, or a new PR re-addressing the issues can be made.

@cmonr cmonr closed this May 29, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants