-
-
Notifications
You must be signed in to change notification settings - Fork 7k
HardwareSerial improvements #1711
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HardwareSerial improvements #1711
Conversation
I just noticed these lines at the start of the HardwareSerial.cpp:
These should be using the USE_HWSERIALn constants as well. |
On the mailing list, there is some in-depth discussion regarding automatically including or excluding HardwareSerial instances, instead of using explicit DISABLE_HWSERIALn defines. For some reason, two of my posts with a big piece of analysis were silently dropped by the list, so I'll just post them here instead. You might want to skip to the second post directly, which contains the relevant conclusions.
And here's the second post:
In the meantime, I've whipped up an implementation that seems to work. |
I (force) updated this pullrequest again, with the following changes:
Note that github orders the commit in weird way, the atomic commit is really the topmost commit and it wasn't changed (except being rebased). |
I just did another force push. I accidentally prepended some unrelated commits in the previous push, which are removed now. Nothing else changed since yesterday. |
There are two begin methods, one which accepts just a baud rate and uses the default bit settings and one which accepts both a baudrate and a bit config. Previously, both of these contained a complete implementation, but now the former just calls the latter, explicitely passing the default 8N1 configuration. Technically, this causes a small change: Before the UCSRC register was untouched when calling begin(baud), now it is explicitely initialized with 8N1. However, since this is the default configuration for at least the Uno and the Mega (didn't check any others), probably for all avrs, this shouldn't effectively change anything. Given that the Arduino documentation also documents this as the default when none is passed, explicitly setting it is probably a good idea in any case.
This simplifies the baud rate calculation, removing the need for a goto and shortening the code a bit. Other than that, this code should not use any different settings than before. Code was suggested by Rob Tillaart on github. Closes: arduino#1262
Recent avr-libc releases define one, but this allows using it also on older avr-libc releases.
Previously, the constants to use for the bit positions of the various UARTs were passed to the HardwareSerial constructor. However, this meant that whenever these values were used, the had to be indirectly loaded, resulting in extra code overhead. Additionally, since there is no instruction to shift a value by a variable amount, the 1 << x expressions (inside _BV and sbi() / cbi()) would be compiled as a loop instead of being evaluated at compiletime. Now, the HardwareSerial class always uses the constants for the bit positions of UART 0 (and some code is present to make sure these constants exist, even for targets that only have a single unnumbered UART or start at UART1). This was already done for the TXC0 constant, for some reason. For the actual register addresses, this approach does not work, since these are of course different between the different UARTs on a single chip. Of course, always using the UART 0 constants is only correct when the constants are actually identical for the different UARTs. It has been verified that this is currently the case for all targets supported by avr-gcc 4.7.2, and the code contains compile-time checks to verify this for the current target, in case a new target is added for which this does not hold. This verification was done using: for i in TXC RXEN TXEN RXCIE UDRIE U2X UPE; do echo $i; grep --no-filename -r "#define $i[0-9]\? " /usr/lib/avr/include/avr/io* | sed "s/#define $i[0-9]\?\s*\(\S\)\+\s*\(\/\*.*\*\/\)\?$/\1/" | sort | uniq ; done This command shows that the above constants are identical for all uarts on all platforms, except for TXC, which is sometimes 6 and sometimes 0. Further investigation shows that it is always 6, except in io90scr100.h, but that file defines TXC0 with value 6 for the UART and uses TXC with value 0 for some USB-related register. This commit reduces program size on the uno by around 120 bytes.
The actual interrupt vectors are of course defined as before, but they let new methods in the HardwareSerial class do the actual work. This greatly reduces code duplication and prepares for one of my next commits which requires the tx interrupt handler to be called from another context as well. The actual content of the interrupts handlers was pretty much identical, so that remains unchanged (except that store_char was now only needed once, so it was inlined). Now all access to the buffers are inside the HardwareSerial class, the buffer variables can be made private. One would expect a program size reduction from this change (at least with multiple UARTs), but due to the fact that the interrupt handlers now only have indirect access to a few registers (which previously were just hardcoded in the handlers) and because there is some extra function call overhead, the code size on the uno actually increases by around 70 bytes. On the mega, which has four UARTs, the code size decreases by around 70 bytes.
This is slightly more clear than the previous explicit comparison.
The flush() method blocks until all characters in the serial buffer have been written to the uart _and_ transmitted. This is checked by waiting until the "TXC" (TX Complete) bit is set by the UART, signalling completion. This bit is cleared by write() when adding a new byte to the buffer and set by the hardware after tranmission ends, so it is always guaranteed to be zero from the moment the first byte in a sequence is queued until the moment the last byte is transmitted, and it is one from the moment the last byte in the buffer is transmitted until the first byte in the next sequence is queued. However, the TXC bit is also zero from initialization to the moment the first byte ever is queued (and then continues to be zero until the first sequence of bytes completes transmission). Unfortunately we cannot manually set the TXC bit during initialization, we can only clear it. To make sure that flush() would not (indefinitely) block when it is called _before_ anything was written to the serial device, the "transmitting" variable was introduced. This variable suggests that it is only true when something is transmitting, which isn't currently the case (it remains true after transmission is complete until flush() is called, for example). Furthermore, there is no need to keep the status of transmission, the only thing needed is to remember if anything has ever been written, so the corner case described above can be detected. This commit improves the code by: - Renaming the "transmitting" variable to _written (making it more clear and following the leading underscore naming convention). - Not resetting the value of _written at the end of flush(), there is no point to this. - Only checking the "_written" value once in flush(), since it can never be toggled off anyway. - Initializing the value of _written in both versions of _begin (though it probably gets initialized to 0 by default anyway, better to be explicit).
…hile It turns out there is an additional corner case. The analysis in the previous commit wrt to flush() assumes that the data register is always kept filled by the interrupt handler, so the TXC bit won't get set until all the queued bytes have been transmitted. But, when interrupts are disabled for a longer period (for example when an interrupt handler for another device is running for longer than 1-2 byte times), it could happen that the UART stops transmitting while there are still more bytes queued (but these are in the buffer, not in the UDR register, so the UART can't know about them). In this case, the TXC bit would get set, but the transmission is not complete yet. We can easily detect this case by looking at the head and tail pointers, but it seems easier to instead look at the UDRIE bit (the TX interrupt is enabled if and only if there are bytes in the queue). To fix this corner case, this commit: - Checks the UDRIE bit and only if it is unset, looks at the TXC bit. - Moves the clearing of TXC from write() to the tx interrupt handler. This (still) causes the TXC bit to be cleared whenever a byte is queued when the buffer is empty (in this case the tx interrupt will trigger directly after write() is called). It also causes the TXC bit to be cleared whenever transmission is resumed after it halted because interrupts have been disabled for too long. As a side effect, another race condition is prevented. This could occur at very high bitrates, where the transmission would be completed before the code got time to clear the TXC0 register, making the clear happen _after_ the transmission was already complete. With the new code, the clearing of TXC happens directly after writing to the UDR register, while interrupts are disabled, and we can be certain the data transmission needs more time than one instruction to complete. This fixes arduino#1463 and replaces arduino#1456.
When interrupts are disabled, writing to HardwareSerial could cause a lockup. When the tx buffer is full, a busy-wait loop is used to wait for the interrupt handler to free up a byte in the buffer. However, when interrupts are disabled, this will of course never happen and the Arduino will lock up. This often caused lockups when doing (big) debug printing from an interrupt handler. Additionally, calling flush() with interrupts disabled while transmission was in progress would also cause a lockup. When interrupts are disabled, the code now actively checks the UDRE (UART Data Register Empty) and calls the interrupt handler to free up room if the bit is set. This can lead to delays in interrupt handlers when the serial buffer is full, but a delay is of course always preferred to a lockup. Closes: arduino#672 References: arduino#1147
Before, the interrupt was disabled when it was triggered and it turned out there was no data to send. However, the interrupt can be disabled already when the last byte is written to the UART, since write() will always re-enable the interrupt when it adds new data to the buffer. Closes: arduino#1008
Before, this decision was made in few different places, based on sometimes different register defines. Now, HardwareSerial.h decides wich UARTS are available, defines USE_HWSERIALn macros and HardwareSerial.cpp simply checks these macros (together with some #ifs to decide which registers to use for UART 0). For consistency, USBAPI.h also defines a HAVE_CDCSERIAL macro when applicable. For supported targets, this should change any behaviour. For unsupported targets, the error messages might subtly change because some checks are moved or changed. Additionally, this moves the USBAPI.h include form HardareSerial.h into Arduino.h and raises an error when both CDC serial and UART0 are available (previously this would silently use UART0 instead of CDC, but there is not currently any Atmel chip available for which this would occur).
By putting the ISRs and HardwareSerial instance for each instance in a separate compilation unit, the compile will only consider them for linking when the instance is actually used. The ISR is always referenced by the compiler runtime and the Serialx_available() function is always referenced by SerialEventRun(), but both references are weak and thus do not cause the compilation to be included in the link by themselves. The effect of this is that when multiple HardwareSerial ports are available, but not all are used, buffers are only allocated and ISRs are only included for the serial ports that are used. On the mega, this lowers memory usage from 653 bytes to just 182 when only using the first serial port. On boards with just a single port, there is no change, since the code and memory was already left out when no serial port was used at all. This fixes arduino#1425 and fixes arduino#1259.
This prevents a potential race condition where for example write() would detect that there is room in the tx buffer, an interrupt would trigger where the interrupt handler writes to Serial filling up that room, and write() would then reset the head pointer, effectively throwing away the bytes written by the interrupt handler. References: arduino#1147
This helps improve the effective datarate on high (>500kbit/s) bitrates, by skipping the interrupt and associated overhead. At 1 Mbit/s the implementation previously got up to about 600-700 kbit/s, but now it actually gets up to the 1Mbit/s (values are rough estimates, though).
And another update:
|
Moreover, declaring pointers-to-registers as const and using initializer list in class constructor allows the compiler to further improve inlining performance. This change recovers about 50 bytes of program space on single-UART devices. See arduino#1711
I'm staging this pull request for merging here: I picked all your commits except the last two: There is an additional commit 275c0a0 that inlines the RX ISR. I previously tried to inline also the TX ISR, but as you pointed out in your devlist comment the commit: |
Your extra commit looks good. Smart to move code like this to help the compiler. I haven't tested it, but I see what you're doing and it looks good. Not including the atomicity commit sounds good, it seems there is no consensus about whether that is a good idea. I'll see if I can figure out what problems could potentially occur when printing from an ISR now. If it's just messing up output, I guess it's ok. If it's also possible to deadlock, we might want to see if we can fix that still. How does that sound? What's the reason not to include the bypass commit? |
Sounds good. I've just added the "bypass" commit too... I don't know why but for some reasons I thought it was dependent on the "atomicity" commit. |
Merged, finally :-) |
Thanks! :-D |
This is a new version of #1369, but now against ide-1.5.x instead of master.
In addition, I changed a few things:
The commits were reordered a bit, to put the "easy" ones first. In particular, I think that all of them can be merged, except perhaps the latter two, which might still need some discussion.