Skip to content

HardwareSerial improvements #1711

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

matthijskooijman
Copy link
Collaborator

This is a new version of #1369, but now against ide-1.5.x instead of master.

In addition, I changed a few things:

  • There is a commit defining _NOP now
  • It is now possible to disable UART instances using a define.
  • I added a commit renaming Serial to Serial0 or CDCSerial (with a #define Serial to select either one in the variant header files).
  • I dropped the patch wrt to blocking behaviour, since I'm not sure what the proper API should be (and I'm not particularly interested in this myself...).

The commits were reordered a bit, to put the "easy" ones first. In particular, I think that all of them can be merged, except perhaps the latter two, which might still need some discussion.

@matthijskooijman
Copy link
Collaborator Author

I just noticed these lines at the start of the HardwareSerial.cpp:

// this next line disables the entire HardwareSerial.cpp, 
// this is so I can support Attiny series and any other chip without a uart
#if defined(UBRRH) || defined(UBRR0H) || defined(UBRR1H) || defined(UBRR2H) || defined(UBRR3H)

These should be using the USE_HWSERIALn constants as well.

@matthijskooijman
Copy link
Collaborator Author

On the mailing list, there is some in-depth discussion regarding automatically including or excluding HardwareSerial instances, instead of using explicit DISABLE_HWSERIALn defines. For some reason, two of my posts with a big piece of analysis were silently dropped by the list, so I'll just post them here instead. You might want to skip to the second post directly, which contains the relevant conclusions.

On Mon, Dec 02, 2013 at 07:45:05PM -0600, Brian Cook wrote:

If the code and data for each Serial object was in its own source
file (each of which would get compiled into separate object files)
I believe the linker would automatically do what you want for all
users.
I don't think so. The linker will omit code/data that the user
doesn't reference, but ISRs are by definition always referenced, and
THEY reference the other serial code and data.

But the references to ISRs are weak (by declaration in the run-time
library). Consider this... If ISRs are always referenced, why aren't
they included in an empty sketch?

In theory, I'd say William is right. The ISR macro sets the following
attributes:

attribute ((signal,used,externally_visible))

(this is from avr-libc 1.8.0, I didn't check older versions).

Note that there is no weak in there, nor does the Arduino code seem to
declare the ISRs weak. It does declare the serialEvent() function
weak, perhaps that's were you got the idea? Also, todo.txt says:

    Add weak attribute to signal handlers: http://www.arduino.cc/cgi-bin/yabb2/YaBB.pl?num=1203798214

so it was an idea at some point (but the topic is from 2008...).

The avr docs say:

    externally_visible
        This attribute, attached to a global variable or function,
        nullifies the effect of the -fwhole-program command-line
        option, so the object remains visible outside the current
        compilation unit.

    signal
        Use this attribute on the AVR to indicate that the specified
        function is an interrupt handler. The compiler generates
        function entry and exit sequences suitable for use in an
        interrupt handler when this attribute is present.

    used
        This attribute, attached to a function, means that code must
        be emitted for the function even if it appears that the
        function is not referenced. This is useful, for example,
        when the function is referenced only in inline assembly.
    weak
        The weak attribute causes the declaration to be emitted as a
        weak symbol rather than a global. This is primarily useful
        in defining library functions that can be overridden in user
        code, though it can also be used with non-function
        declarations. Weak symbols are supported for ELF targets,
        and also for a.out targets when using the GNU assembler and
        linker.

Of these, it seems that "used" should prevent an ISR from being silently
discarded when it is unused. From the docs, it doesn't look like "weak"
should affect this at all.

However, in practice Brian seems to be right: If the HardwareSerial
objects are not referenced, the memory and ISRs for them are not
emitted. This currently seems like a all-or-nothing deal: if you
reference any of the Serial ports, you get them all.

I don't quite understand why this happens, though. I would actually
expect that the compiler (because of the "used" attribute) would always
keep the ISRs and since the ISRs reference the buffers, it would keep
those as well. Alternatively, the serialEventRun function refers to all
available Serial objects as well, so that should also keep all of them
around (even when none of them are referenced). serialEventRun is
declared as weak, so perhaps that is somehow influencing things (in an
undocumented manner).

If the ISRs would somehow not be marked as used after all (which sounds
dangerous to me, wouldn't the compiler than magically remove ISRs which
it thinks are unused?), then I would expect each HardwareSerial instance
to be treated separately. The compilation process uses
"-ffunction-sections -fdata-sections", which should cause all
HardwareSerial instances to be put into their own sections. The gcc
manpage says:

    -ffunction-sections
    -fdata-sections
            Place each function or data item into its own section in
            the output file if the target supports arbitrary
            sections.  The name of the function or the name of the
            data item determines the section's name in the output
            file.

Furthermore, linking happens with -Wl,--gc-sections,--relax which should
cause unused sections to be left out. The ld manpage says:

   --gc-sections
   --no-gc-sections
       Enable garbage collection of unused input sections. [...]

       --gc-sections decides which input sections are used by
       examining symbols and relocations.  The section containing
       the entry symbol and all sections containing symbols
       undefined on the command-line will be kept, as will sections
       containing symbols referenced by dynamic objects.  Note that
       when building shared libraries, the linker must assume that
       any visible symbol is referenced.  Once this initial set of
       sections has been determined, the linker recursively marks as
       used any section referenced by their relocations.  See
       --entry and --undefined.

--relax apparently causes using RJMP instead of JMP, but seems
unrelated.

As for the original suggestion: Separating the different HardwareSerial
instances into different files. After seeing a comment by Paul
yesterday, I already tried moving the definition of Serial1 into a
separate file, which didn't help. Just now I realized I forgot the ISRs,
so I also moved those to the new file, which didn't help either.
However, when I also removed the following line form HardwareSerial.cpp,
things did work as we'd like (memory for Serial1 was only allocated when
Serial1 was actually used from the sketch).

    if (Serial1.available()) serialEvent1();

Note that I couldn't move this line to the new file, since it's inside a
function. Perhaps we could do some other weak function magic to work
around this somehow. Also note that I had to move both the ISRs and the
Serial1 definition to the new file, either of them alone didn't work.

This suggests that the compiler is somehow doing a file-by-file garbage
collection, and does not properly listen to the "used" function
attribute.

Anyone else has some idea about what is happening exactly here? I'll dig
around a bit further, but I think this is long enough as it is now
already ;-p

And here's the second post:

Hey folks,

I did some more testing and I think I found out what happens wrt
automatically including and excluding the Serial stuff. This is what
happens:

  1. All core .cpp files are compiled. This produces a single .o file.
    Because of -fdata-sections and -ffunction-sections, each function and
    variable within the .cpp file ends up in its own section.
  2. The .cpp file is included into core.a. core.a is an archive, which
    is simply a collection of object files (i.e. this is not a flat
    collection of sections or symbols, the contents of the archive are
    still grouped by the original .o files).
  3. The sketch is compiled. This produces a single .o file.
  4. The sketch and core.a are linked together to produce the final
    program. The linker always includes all .o files in the link. For
    each undefined reference, it looks in core.a. If it finds the
    missing symbol, it takes the entire object file from the archive
    and includes it in the link.
    This mechanism is what was historically
    used to discard unused functions (before -ffunction-sections,
    -fdata-sections and --gc-sections were available), by putting each
    function into its own .c file).
  5. Because --gc-sections is specified, the linker now figures out if it
    can omit any sections out of the object files it selected for
    linking in the previous step.

See also the avr-libc documentation tat documents part of what
I've found out. Apparently it's mostly "just how the linker works",
but it's also good to have an avr-specific doc confirm that :-)

Weak functions can mean two things:

  • A weak function definition can be replaced with a strong function
    definition.
  • A weak function reference means the compiler won't mind leaving it
    unfulfilled during link time. In particular, it won't include an
    object file from core.a in step 4 just to fulfill a weak reference.

This seems to be properly documented at 5, and even though it's for
ARM, it seems to apply to AVR as well.

In our case, main() contains a weak reference to serialEventRun. This
means that if the main Sketch doesn't reference any HardwareSerial, the
linker won't pull in HardwareSerial.o and the serialEventRun symbol is
left undefined, which is checked at runtime by main():

    if (serialEventRun) serialEventRun();

Furthermore, an ISR is declared with the used attribute. However, this
attribute only takes effect in step 5 above. If the linker pulls in the
ISR, then gc-sections won't be able to remove it. However, if the ISR is
never pulled in to satisfy another dependency, then it's "used"
attribute won't be seen.

Using the above, we should be able to make the "allocate only when
needed" approach work. We should:

  • Create a HardwareSerialn.cpp for each of the four serial ports.
  • Move the ISRs and Serialn definition to the new files
  • Include a new, Serialn_available() function (that just calls
    Serialn.available()) and let SerialEventRun() weakly refer to those
    instead of the direct Serialn.available() that it has now.

This should make sure that if any method on a HardwareSerial object is
used, all of its methods, its buffers and its ISRs are pulled in by the
linker. Additionally, because of --gc-sections any methods on Serial
that are unused should still be discarded for the final link.

I might have a bit of time to try and implement this tonight, not sure
yet :-)

In the meantime, I've whipped up an implementation that seems to work.
I need to clean up a few things and do some reordering, I'll push the
changes ASAP.

@matthijskooijman
Copy link
Collaborator Author

I (force) updated this pullrequest again, with the following changes:

  • I dropped the rename Serial to Serial0 patch. I wasn't sure if it's was a good idea and I realized that for consistency, the serialEvent() function should also be renamed. I couldn't think of any way to do this, without using a macro to sneakily change the name of the function defined by the user, and I didn't like that one bit. Better just leave things as-is.
  • I separated all HardwareSerial instances into different .cpp files, so they now automatically get enabled only when used.
  • I dropped the DISABLE_HWSERIALx macro patch, since autodetecting is obviously better.
  • I renamed USE_HWSERIALx to HAVE_HWSERIALx, since it now always reflects the presence of the port, not if its used or not.
  • I added HAVE_CDCSERIAL and added an error message for (currently non-existing) targets with both UART 0 and CDC serial.

Note that github orders the commit in weird way, the atomic commit is really the topmost commit and it wasn't changed (except being rebased).

@matthijskooijman
Copy link
Collaborator Author

I just did another force push. I accidentally prepended some unrelated commits in the previous push, which are removed now. Nothing else changed since yesterday.

There are two begin methods, one which accepts just a baud rate and
uses the default bit settings and one which accepts both a baudrate and
a bit config. Previously, both of these contained a complete
implementation, but now the former just calls the latter, explicitely
passing the default 8N1 configuration.

Technically, this causes a small change: Before the UCSRC register was
untouched when calling begin(baud), now it is explicitely initialized
with 8N1. However, since this is the default configuration for at least
the Uno and the Mega (didn't check any others), probably for all avrs,
this shouldn't effectively change anything. Given that the Arduino
documentation also documents this as the default when none is passed,
explicitly setting it is probably a good idea in any case.
This simplifies the baud rate calculation, removing the need for a goto
and shortening the code a bit. Other than that, this code should not use
any different settings than before.

Code was suggested by Rob Tillaart on github.

Closes: arduino#1262
Recent avr-libc releases define one, but this allows using it also on
older avr-libc releases.
Previously, the constants to use for the bit positions of the various
UARTs were passed to the HardwareSerial constructor. However, this
meant that whenever these values were used, the had to be indirectly
loaded, resulting in extra code overhead. Additionally, since there is
no instruction to shift a value by a variable amount, the 1 << x
expressions (inside _BV and sbi() / cbi()) would be compiled as a loop
instead of being evaluated at compiletime.

Now, the HardwareSerial class always uses the constants for the bit
positions of UART 0 (and some code is present to make sure these
constants exist, even for targets that only have a single unnumbered
UART or start at UART1).

This was already done for the TXC0 constant, for some reason. For the
actual register addresses, this approach does not work, since these are
of course different between the different UARTs on a single chip.

Of course, always using the UART 0 constants is only correct when the
constants are actually identical for the different UARTs. It has been
verified that this is currently the case for all targets supported by
avr-gcc 4.7.2, and the code contains compile-time checks to verify this
for the current target, in case a new target is added for which this
does not hold. This verification was done using:

for i in TXC RXEN TXEN RXCIE UDRIE U2X UPE; do echo $i; grep --no-filename -r "#define $i[0-9]\? " /usr/lib/avr/include/avr/io* | sed "s/#define $i[0-9]\?\s*\(\S\)\+\s*\(\/\*.*\*\/\)\?$/\1/" | sort | uniq ; done

This command shows that the above constants are identical for all uarts
on all platforms, except for TXC, which is sometimes 6 and sometimes 0.
Further investigation shows that it is always 6, except in io90scr100.h,
but that file defines TXC0 with value 6 for the UART and uses TXC with
value 0 for some USB-related register.

This commit reduces program size on the uno by around 120 bytes.
The actual interrupt vectors are of course defined as before, but they
let new methods in the HardwareSerial class do the actual work. This
greatly reduces code duplication and prepares for one of my next commits
which requires the tx interrupt handler to be called from another
context as well.

The actual content of the interrupts handlers was pretty much identical,
so that remains unchanged (except that store_char was now only needed
once, so it was inlined).

Now all access to the buffers are inside the HardwareSerial class, the
buffer variables can be made private.

One would expect a program size reduction from this change (at least
with multiple UARTs), but due to the fact that the interrupt handlers
now only have indirect access to a few registers (which previously were
just hardcoded in the handlers) and because there is some extra function
call overhead, the code size on the uno actually increases by around
70 bytes. On the mega, which has four UARTs, the code size decreases by
around 70 bytes.
This is slightly more clear than the previous explicit comparison.
The flush() method blocks until all characters in the serial buffer have
been written to the uart _and_ transmitted. This is checked by waiting
until the "TXC" (TX Complete) bit is set by the UART, signalling
completion. This bit is cleared by write() when adding a new byte to the
buffer and set by the hardware after tranmission ends, so it is always
guaranteed to be zero from the moment the first byte in a sequence is
queued until the moment the last byte is transmitted, and it is one from
the moment the last byte in the buffer is transmitted until the first
byte in the next sequence is queued.

However, the TXC bit is also zero from initialization to the moment the
first byte ever is queued (and then continues to be zero until the first
sequence of bytes completes transmission). Unfortunately we cannot
manually set the TXC bit during initialization, we can only clear it. To
make sure that flush() would not (indefinitely) block when it is called
_before_ anything was written to the serial device, the "transmitting"
variable was introduced.

This variable suggests that it is only true when something is
transmitting, which isn't currently the case (it remains true after
transmission is complete until flush() is called, for example).
Furthermore, there is no need to keep the status of transmission, the
only thing needed is to remember if anything has ever been written, so
the corner case described above can be detected.

This commit improves the code by:
 - Renaming the "transmitting" variable to _written (making it more
   clear and following the leading underscore naming convention).
 - Not resetting the value of _written at the end of flush(), there is
   no point to this.
 - Only checking the "_written" value once in flush(), since it can
   never be toggled off anyway.
 - Initializing the value of _written in both versions of _begin (though
   it probably gets initialized to 0 by default anyway, better to be
   explicit).
…hile

It turns out there is an additional corner case. The analysis in the
previous commit wrt to flush() assumes that the data register is always
kept filled by the interrupt handler, so the TXC bit won't get set until
all the queued bytes have been transmitted. But, when interrupts are
disabled for a longer period (for example when an interrupt handler for
another device is running for longer than 1-2 byte times), it could
happen that the UART stops transmitting while there are still more bytes
queued (but these are in the buffer, not in the UDR register, so the
UART can't know about them).

In this case, the TXC bit would get set, but the transmission is not
complete yet. We can easily detect this case by looking at the head and
tail pointers, but it seems easier to instead look at the UDRIE bit
(the TX interrupt is enabled if and only if there are bytes in the
queue). To fix this corner case, this commit:
 - Checks the UDRIE bit and only if it is unset, looks at the TXC bit.
 - Moves the clearing of TXC from write() to the tx interrupt handler.
   This (still) causes the TXC bit to be cleared whenever a byte is
   queued when the buffer is empty (in this case the tx interrupt will
   trigger directly after write() is called). It also causes the TXC bit
   to be cleared whenever transmission is resumed after it halted
   because interrupts have been disabled for too long.

As a side effect, another race condition is prevented. This could occur
at very high bitrates, where the transmission would be completed before
the code got time to clear the TXC0 register, making the clear happen
_after_ the transmission was already complete. With the new code, the
clearing of TXC happens directly after writing to the UDR register,
while interrupts are disabled, and we can be certain the data
transmission needs more time than one instruction to complete. This
fixes arduino#1463 and replaces arduino#1456.
When interrupts are disabled, writing to HardwareSerial could cause a
lockup. When the tx buffer is full, a busy-wait loop is used to wait for
the interrupt handler to free up a byte in the buffer. However, when
interrupts are disabled, this will of course never happen and the
Arduino will lock up. This often caused lockups when doing (big) debug
printing from an interrupt handler.

Additionally, calling flush() with interrupts disabled while
transmission was in progress would also cause a lockup.

When interrupts are disabled, the code now actively checks the UDRE
(UART Data Register Empty) and calls the interrupt handler to free up
room if the bit is set.

This can lead to delays in interrupt handlers when the serial buffer is
full, but a delay is of course always preferred to a lockup.

Closes: arduino#672
References: arduino#1147
Before, the interrupt was disabled when it was triggered and it turned
out there was no data to send. However, the interrupt can be disabled
already when the last byte is written to the UART, since write() will
always re-enable the interrupt when it adds new data to the buffer.

Closes: arduino#1008
Before, this decision was made in few different places, based on
sometimes different register defines.

Now, HardwareSerial.h decides wich UARTS are available, defines
USE_HWSERIALn macros and HardwareSerial.cpp simply checks these macros
(together with some #ifs to decide which registers to use for UART 0).
For consistency, USBAPI.h also defines a HAVE_CDCSERIAL macro when
applicable.

For supported targets, this should change any behaviour. For unsupported
targets, the error messages might subtly change because some checks are
moved or changed.

Additionally, this moves the USBAPI.h include form HardareSerial.h into
Arduino.h and raises an error when both CDC serial and UART0 are
available (previously this would silently use UART0 instead of CDC, but
there is not currently any Atmel chip available for which this would
occur).
By putting the ISRs and HardwareSerial instance for each instance in a
separate compilation unit, the compile will only consider them for
linking when the instance is actually used. The ISR is always referenced
by the compiler runtime and the Serialx_available() function is always
referenced by SerialEventRun(), but both references are weak and thus do
not cause the compilation to be included in the link by themselves.

The effect of this is that when multiple HardwareSerial ports are
available, but not all are used, buffers are only allocated and ISRs are
only included for the serial ports that are used. On the mega, this
lowers memory usage from 653 bytes to just 182 when only using the first
serial port.

On boards with just a single port, there is no change, since the code
and memory was already left out when no serial port was used at all.

This fixes arduino#1425 and fixes arduino#1259.
This prevents a potential race condition where for example write() would
detect that there is room in the tx buffer, an interrupt would trigger
where the interrupt handler writes to Serial filling up that room, and
write() would then reset the head pointer, effectively throwing away the
bytes written by the interrupt handler.

References: arduino#1147
This helps improve the effective datarate on high (>500kbit/s) bitrates,
by skipping the interrupt and associated overhead. At 1 Mbit/s the
implementation previously got up to about 600-700 kbit/s, but now it
actually gets up to the 1Mbit/s (values are rough estimates, though).
@matthijskooijman
Copy link
Collaborator Author

And another update:

cmaglie added a commit to cmaglie/Arduino that referenced this pull request Jan 22, 2014
Moreover, declaring pointers-to-registers as const and using initializer
list in class constructor allows the compiler to further improve inlining
performance.

This change recovers about 50 bytes of program space on single-UART devices.

See arduino#1711
@cmaglie
Copy link
Member

cmaglie commented Jan 22, 2014

hi @matthijskooijman

I'm staging this pull request for merging here:
https://github.com/cmaglie/Arduino/commits/serial-patch-2

I picked all your commits except the last two:
Make all access to the HardwareSerial ringbuffers atomic matthijskooijman@e1599f3
In HardwareSerial::write, bypass the queue when it's empty matthijskooijman@6e03352

There is an additional commit 275c0a0 that inlines the RX ISR. I previously tried to inline also the TX ISR, but as you pointed out in your devlist comment the commit:
Fix lockup when writing to HardwareSerial with interrupts disabled matthijskooijman@eb86af7
adds extra calls to the TX ISR making the code much more bigger.

@matthijskooijman
Copy link
Collaborator Author

Your extra commit looks good. Smart to move code like this to help the compiler. I haven't tested it, but I see what you're doing and it looks good.

Not including the atomicity commit sounds good, it seems there is no consensus about whether that is a good idea. I'll see if I can figure out what problems could potentially occur when printing from an ISR now. If it's just messing up output, I guess it's ok. If it's also possible to deadlock, we might want to see if we can fix that still. How does that sound?

What's the reason not to include the bypass commit?

@cmaglie
Copy link
Member

cmaglie commented Jan 22, 2014

Sounds good.

I've just added the "bypass" commit too... I don't know why but for some reasons I thought it was dependent on the "atomicity" commit.

@cmaglie
Copy link
Member

cmaglie commented Jan 28, 2014

Merged, finally :-)

@cmaglie cmaglie closed this Jan 28, 2014
@matthijskooijman
Copy link
Collaborator Author

Thanks! :-D

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants