Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HardwareSerial library sometimes sends the same data twice #120

Closed
johnholman opened this issue Aug 30, 2015 · 54 comments
Closed

HardwareSerial library sometimes sends the same data twice #120

johnholman opened this issue Aug 30, 2015 · 54 comments
Assignees

Comments

@johnholman
Copy link
Contributor

Running a Arduino Mega 2560 program that makes heavy use of interrupts and that reports by sending lines of text (about 150 characters) in the main loop via the serial port at 115200 baud, I noiticed that additional spurious characters were sent on average every 2000 lines or so. Looking at the problem in more detail, what was transmitted was a line of exactly 66 characters consisting of the first few characters of the new line followed by last part of the line that had already been sent. The next line missed the first two characters, and then following ones were normal until the next occurrence.

These symptoms seemed consistent with void HardwareSerial::_tx_udr_empty_irq(void) occasionally being invoked at a time when the transmit buffer is in fact empty (i.e. _tx_buffer_head == _tx_buffer_tail), even though this should never happen. So I added some instrumentation:

void HardwareSerial::_tx_udr_empty_irq(void)
{
  // the buffer shouldn't be empty if we get here, but ...
  if (_tx_buffer_head == _tx_buffer_tail) {
    serial_errors++;

where serial_errors is defined in HardwareSerial.h as a private member

volatile uint8_t serial_errors;

and added a method to read and print the error count as part of the main loop.

This confirmed that the problem is indeed associated with the ISR being entered at a time when the buffer is empty, as the error count increments by one each time the problem occurs.

My original code is quite complicated, but I've been able to replicate the problem with a simple testcase in which the loop prints as before while a timer interrupt runs every 0.5ms with an associated ISR running for 0.3ms. The error seems to occur more often at higher baud rates - at 500,000 baud it occurs every few seconds. I can provide the testcase if required.

As a workaround, I've added this section to the start of _tx_udr_empty_irq:

void HardwareSerial::_tx_udr_empty_irq(void)
{
  // the buffer shouldn't be empty if we get here, but ...
  if (_tx_buffer_head == _tx_buffer_tail) {
    serial_errors++;
    // empty buffer so the interrupt should be disabled. Make it so and return.
    cbi(*_ucsrb, UDRIE0);
    return;
  }
...

This fix seems to work for me, but I'm certainly not convinced that it is correct in all circumstances.

I've not been able to figure out why the ISR is ever entered in this situation. Most but not all of the errors disappear if you comment out the part in the busy wait loop in write() that calls the ISR directly when interrupts are disabled globally. (No, I'm not printing to the serial port from an ISR!) Almost all disappear if in addition you comment out the optimisation that writes directly into the output register when the buffer is empty. However I've seen the problem very occasionally (but not with the test case) even when both optimisations are taken out.

@PaulStoffregen
Copy link
Contributor

I can provide the testcase if required.

Yes, please do post a ready-to-run test case!

@johnholman
Copy link
Contributor Author

OK .. not sure about github conventions and I can't see how to attach files so just pasting here:

Arduino code - serial_test.ino:

unsigned long nlines = 0;
#define MLEN 100
char diagmsg[MLEN+1];

volatile unsigned long cnt = 0;
volatile unsigned long duration = 0;

void setup() {
  Serial.begin(500000);
  delay(100);

 cli();
  TCCR4A = 0;
  TCCR4B = (1 << WGM42) | (1 << CS40) | (1 << CS42); // prescale by 1024 and set CTC mode 
  TCNT4  = 0;
  OCR4A = 8;  // compare with 8 to get interrupt every 0.5 ms
  TIMSK4 = (1 << OCIE4A);  // enable timer overflow interrupt (and no others)
  sei(); 
}

void loop() {
  delay(20);
  nlines++;  
  char c = '0' + nlines % 10;
  int i=0;
  while(i<MLEN) {
    diagmsg[i] = c;
    i++;
  }
  diagmsg[i] = '\0';
  unsigned long start = micros();
  Serial.print(diagmsg);
  Serial.print(" ");
  Serial.print(micros()-start);
  Serial.print(" ");
  Serial.print(cnt);
  Serial.print(" ");
  Serial.print(duration);
  Serial.print(" ");
  Serial.print(Serial.error_count());
  Serial.print(" ");
  Serial.println(nlines);
}

#define NLOOP 200
void test_isr() {
  unsigned long start = micros();
  for (int i=0; i<NLOOP; i++) {
      cnt++;    
  }
  duration = micros() - start;
}

ISR(TIMER4_COMPA_vect) {
  test_isr();
}

Python client, serial_test.py

from __future__ import print_function

import serial
import time

if __name__ == "__main__":

    link = serial.Serial('COM7', 500000)
    print(link)

    try:
        n = 0
        while True:
            n += 1
            start = time.clock()            
            response = link.readline()
            stop = time.clock()
            latency = (stop - start)*1000
            nchars = len(response)
            print("%d %5.2f ms nchars: %d: response: %s"  %(n, latency, nchars, response))
    finally:
        print("closing serial link")
        link.close()

Note that the test case assumes that HardwareSerial.h and .cpp have been amended to add the
instrumentation I mentioned above. Just comment out the line
Serial.print(Serial.error_count());
to see the symptoms but not the diagnostic.

@mattskimi
Copy link

Hi, I have the same issue. Sometimes the logic in the hardwareserial::write fails when the tx buffer is empty. I commented out the speedup code when the buffer is empty and made a flush() before reading ack:s from my device, then it started to work. My assumption is that the pointer overruns and the whole buffer is resent. I did not have this problem with version 1.0.x of IDE.
/*******
if (_tx_buffer_head == _tx_buffer_tail && bit_is_set(__ucsra, UDRE0)) {
*_udr = c;
sbi(__ucsra, TXC0);
return 1;
}
******/

@NicoHood
Copy link
Contributor

I also had a serial error. Different than this but it makes sense that this is the same reason.

I know its a lot but you can read this up here:
arduino/Arduino#3207 (comment)

as @mattskimi said its caused by this code. gcc seems (from what ive tested) to maybe optimize the code wrong. Then the data is a) sent twice and sbi(*_ucsrb, UDRIE0); is executed even though its not needed. this causes flush to stuck forever.

The code above seems to not return. For now I dont know why this happens. I have some workaround to fix my issue, but after reading this here it seems that I then still have doubled bytes (havent tested the output, since I send the same data byte).

@NicoHood
Copy link
Contributor

Replacing the udr pointer with the register fixes the issue for me:
screenshot from 2015-09-15 17 18 34

It seems that (on the left side) it loads the main class pointer, adds the value for the udr pointer (16) and subtracts it back (17), but the subtraction is wrong. And because of this the offset is -1 for _ucsra.

Files:
https://gist.github.com/NicoHood/288e6ee609954d60ab85

@NicoHood
Copy link
Contributor

Maybe some inline asm can help? I tried but got a compile error:

  if (_tx_buffer_head == _tx_buffer_tail && bit_is_set(*_ucsra, UDRE0)) {
    //*_udr = c;
    asm (
        "st %a[_udr], %[c]\n\t"     // (2) Save byte in buffer
        // Outputs
        : [_udr] "+e" (_udr) // Input and output
        // Inputs
        :  [c] "r" (c)
    );
    sbi(*_ucsra, TXC0);
    return 1;
  }
cores/arduino/HardwareSerial.cpp: In member function 'virtual size_t HardwareSerial::write(uint8_t)':
cores/arduino/HardwareSerial.cpp:229:9: error: read-only member 'HardwareSerial::_udr' used as 'asm' output
        );
         ^
Error compiling.

@matthijskooijman
Copy link
Collaborator

So effectively, the sbi(*_ucsra, TXC0) fails because the _ucsra is loaded from the wrong place. When constructing the address of _ucsra, this address is 1 too small. Instead of loading _ucsra, it loads the last byte of _ubrrl and the first byte of _ucsra. On a 328p, _ubrrl is 0x00C4 and _ucsra is 0x00C0. Note that AVR is little endian, so this loads the MSB of _ubrrl (0x00) and the LSB of _ucsra (0xC0) and interprets the result as 0xC000. The two-byte value at that address is loaded, which is then used as the value of _ucsra, so it is dereferenced and the result is written to.

Since 0xC000 is out-of-bounds for a 328p, we need to wonder how AVR handles this. By dumping the entire memory, as well as a bunch of bytes after it using this sketch, you can see that any reads after the end of the memory (0x900 and onward) return the same thing as reads from 0x100 and onward. Since the SRAM is mapped from 0x100 to 0x8ff, it seems that the SRAM region wraps around. This means that 0xC00 really reads from address hex((0xc000 - 0x100) % 0x800 + 0x100) = 0x800, which is some address well on the stack. This does not seem to be any particular value, so the address of _ucsra that is used (and written to) is just random, which doesn't really explain the behaviour we're seeing.

However, trying to print *((uint16_t)0xc000) gives me 0x0000, which is different from what's on 0x800, so perhaps the SRAM only wraps once? If reading 0xc000 would always return 0x0, then sbi(*_ucsra, TXC0) would set a bit in r0, would that explain this behaviour? No time to look more closely at this right now.

Of course, the TXC0 bit is not cleared due to this bug, but I'm not sure how this would cause the described behaviour (since this bit is pretty much only used inside flush()).
'0x800'

@NicoHood
Copy link
Contributor

Have you looked at the other assembly? Maybe this isnt the only place that is wrong. It solves my issue/test case. But maybe there is more wrong with the code. Maybe also in other functions.

One could try the newer 5.1 and see if that fixes the issue. If not, this is a very poor issue to fix. I am afraid that this could occur on other places as well.
arduino/Arduino#660 (comment)

@NicoHood
Copy link
Contributor

5.1 with LTO doesnt fix the problem. It lacks some debug information in the output, but now I know the place to search for:

 2e4:   0c c0           rjmp    .+24        ; 0x2fe <_ZN14HardwareSerial5writeEh+0xac>
 2e6:   d6 01           movw    r26, r12
 2e8:   56 96           adiw    r26, 0x16   ; 22
 2ea:   ed 91           ld  r30, X+
 2ec:   fc 91           ld  r31, X
 2ee:   57 97           sbiw    r26, 0x17   ; 23

The sketch was also tested on a real board!

output:
https://gist.github.com/NicoHood/10a15742a3c76dda3755

tested sketch:
https://gist.github.com/NicoHood/e3d47e5e994a10793688

@NicoHood
Copy link
Contributor

virtual size_t write(uint8_t) __attribute__((optimize("O3")));
Seems to not work. It was worth a try.

O3 output:
https://gist.github.com/NicoHood/1e35660adb92686321d1

Also tested O0 which seems to work \o/ (adds a lot more code)
https://gist.github.com/NicoHood/e664f39e4142a5f834a4

The test were made via uploading and trying. I did not look at the asm this time since it changed too much and I dont want to do that now.

Rearranging the class members did not help with a (single) quick test.

Should we report this upstream? This is a very bad testcase I think. At least one doesnt need an arduino to see the asm is incorrect.

@matthijskooijman
Copy link
Collaborator

I guess that reducing the code to a smaller testcase would be the next step, so we can report a bug at gcc. Here's the steps I'd take:

  • Take just Print.h, Stream.h, HardewareSerial.h and HardwareSerial.cpp, and make that compile (possibly adding some includes and type definitions if needed).
    • Remove all but the write() method from HardwareSerial.cpp. Perhaps this can be done in the first step already, if that makes it easier to get it compiling.
  • No longer let HardwareSerial be a subclass of Stream (I suspect this will stop the bug from occuring, though I'm not sure).
  • Remove all unused members from HardwareSerial
  • Remove as much code as possible from write()

After every modification, check the generated assembly to see if the bug still exists. If the bug disappears, undo the last change and try to make the change less invasive, or skip to the next change.

I don't have time to dive in right now, perhaps @NicoHood can?

@matthijskooijman
Copy link
Collaborator

Also, disabling optimization for a single function using an attribute was smart to try - that might very well be a useful workaround until we can get a fixed compiler. However, reducing the example code should be a first step, since this will also offer insight into what the essential parts are that cause this bug, which might help to figure out a less invasive workaround, and perhaps allows searching the source code for other instances of this bug.

@NicoHood
Copy link
Contributor

If you tell me how to compile those functions outside the IDE I can do this. I have linux with avr-gcc 4.8.1 and also 5.1 available. I just need to know how to manually compile them and get an asm output. Then I can try to debug it.

@matthijskooijman
Copy link
Collaborator

Basically just take the avr-gcc command that Arduino uses for HardwareSerial.cpp (from the verbose output) and use that. That should produce a .o file (not linked into .elf yet), but that should already show this bug and can be disassembled using avr-objdump -S just like the .elf. Hope that helps?

@NicoHood
Copy link
Contributor

I setup a repo for this:
https://github.com/NicoHood/SerialIssueTest

I still use the IDE for easy debugging (for now).

It seems that you need the tx function and that it needs to also contain the sbi instruction.
https://github.com/NicoHood/SerialIssueTest/blob/master/cores/arduino/HardwareSerial.cpp#L15

Also notice it has nothing to do with inheritance nor with a virtual function.

ASM can be found here:
https://github.com/NicoHood/SerialIssueTest/blob/master/build7dadea4ec5c47a9be4550e81267f0286.tmp/asm.txt#L210-L213

I think it would make sense to analyze the asm, maybe the asm is correct, even though I dont think so. A compiler error would be very bad.

@NicoHood
Copy link
Contributor

Now a reduced asm analysis:
This code was compiled with the current IDE hourly build and no own modifications under linux.

The code was reduced to focus on the gcc bug, not the serial functions. The code looks like this:

//.h file
    volatile uint8_t * const _ucsra; // 0,1
    volatile uint8_t * const _ucsrb; // 2,3
    volatile uint8_t * const _udr; // 4,5

    volatile tx_buffer_index_t _tx_buffer_head; // 6
    volatile tx_buffer_index_t _tx_buffer_tail; // 7

//.cpp
void HardwareSerial::_tx_udr_empty_irq(void)
{
 sbi(*_ucsra, TXC0);
}

void HardwareSerial::write(uint8_t c)
{
  // If the buffer and the data register is empty, just write the byte
  // to the data register and be done. This shortcut helps
  // significantly improve the effective datarate at high (>
  // 500kbit/s) bitrates, where interrupt overhead becomes a slowdown.
  if (_tx_buffer_head == _tx_buffer_tail && bit_is_set(*_ucsra, UDRE0)) {
    *_udr = c;
    sbi(*_ucsra, TXC0);
    return;
  }

  while (0== _tx_buffer_tail) {
    _tx_udr_empty_irq();
  }
  _tx_buffer[_tx_buffer_head] = c;

  return;
}

The asm looks like this with comments:

void HardwareSerial::_tx_udr_empty_irq(void)
{
 sbi(*_ucsra, TXC0);
 // sets bit as intended. is a real function, so not inlined
  b4:   dc 01           movw    r26, r24
  b6:   ed 91           ld  r30, X+
  b8:   fc 91           ld  r31, X
  ba:   80 81           ld  r24, Z
  bc:   80 64           ori r24, 0x40   ; 64
  be:   80 83           st  Z, r24
  c0:   08 95           ret

000000c2 <_ZN14HardwareSerial5writeEh>:
}

void HardwareSerial::write(uint8_t c)
{

// function setup
  c2:   0f 93           push    r16
  c4:   1f 93           push    r17
  c6:   cf 93           push    r28
  c8:   df 93           push    r29
  ca:   1f 92           push    r1

// load a pointer into Y
  cc:   cd b7           in  r28, 0x3d   ; 61
  ce:   de b7           in  r29, 0x3e   ; 62


  // If the buffer and the data register is empty, just write the byte
  // to the data register and be done. This shortcut helps
  // significantly improve the effective datarate at high (>
  // 500kbit/s) bitrates, where interrupt overhead becomes a slowdown.
  if (_tx_buffer_head == _tx_buffer_tail && bit_is_set(*_ucsra, UDRE0)) {

  // load udr pointer address
  // copy r24,25 into r16,17
  // copy the same into 26,27 (Xpointer)
  d0:   8c 01           movw    r16, r24
  d2:   dc 01           movw    r26, r24

  // load head into r25
  d4:   16 96           adiw    r26, 0x06   ; 6
  d6:   9c 91           ld  r25, X
  d8:   16 97           sbiw    r26, 0x06   ; 6

  // load tail into r24
  da:   17 96           adiw    r26, 0x07   ; 7
  dc:   8c 91           ld  r24, X
  de:   17 97           sbiw    r26, 0x07   ; 7

  // compare head and tail, skip if equal
  // if head == tail, jump to
  e0:   98 13           cpse    r25, r24
  e2:   05 c0           rjmp    .+10        ; 0xee <_ZN14HardwareSerial5writeEh+0x2c>

  // load ucsra lo byte + high byte
  // X is now at 1
  e4:   ed 91           ld  r30, X+
  e6:   fc 91           ld  r31, X
  // get the value of ucsra
  e8:   80 81           ld  r24, Z

  // if bit in ucsra is set (if is true), jump +30 0x10c
  ea:   85 fd           sbrc    r24, 5
  ec:   0f c0           rjmp    .+30        ; 0x10c <_ZN14HardwareSerial5writeEh+0x4a>
    *_udr = c;
    sbi(*_ucsra, TXC0);
    return;
  }

// while loop
  while (0== _tx_buffer_tail) {
  ee:   f8 01           movw    r30, r16
  f0:   87 81           ldd r24, Z+7    ; 0x07
  f2:   81 11           cpse    r24, r1
  f4:   06 c0           rjmp    .+12        ; 0x102 <_ZN14HardwareSerial5writeEh+0x40>
    _tx_udr_empty_irq();
  f6:   c8 01           movw    r24, r16
  f8:   69 83           std Y+1, r22    ; 0x01
  fa:   0e 94 5a 00     call    0xb4    ; 0xb4 <_ZN14HardwareSerial17_tx_udr_empty_irqEv>
  fe:   69 81           ldd r22, Y+1    ; 0x01
 100:   f6 cf           rjmp    .-20        ; 0xee <_ZN14HardwareSerial5writeEh+0x2c>
  }
  _tx_buffer[_tx_buffer_head] = c;
 102:   86 81           ldd r24, Z+6    ; 0x06
 104:   e8 0f           add r30, r24
 106:   f1 1d           adc r31, r1
 108:   60 87           std Z+8, r22    ; 0x08

  return;
 10a:   0b c0           rjmp    .+22        ; 0x122 <_ZN14HardwareSerial5writeEh+0x60>


  // If the buffer and the data register is empty, just write the byte
  // to the data register and be done. This shortcut helps
  // significantly improve the effective datarate at high (>
  // 500kbit/s) bitrates, where interrupt overhead becomes a slowdown.
  if (_tx_buffer_head == _tx_buffer_tail && bit_is_set(*_ucsra, UDRE0)) {
    *_udr = c;

    // function jumps here
    // reset X pointer from backup (beginning of the function)
 10c:   d8 01           movw    r26, r16


 10e:   14 96           adiw    r26, 0x04   ; 4
 // load udr lo byte
 110:   ed 91           ld  r30, X+
 // pointer is now at +5
 // load udr hi byte
 112:   fc 91           ld  r31, X

 // the pointer now points back at the low byte of ucsra, as intended
 // the increment above confused us.
 114:   15 97           sbiw    r26, 0x05   ; 5

 // save byte to udr
 116:   60 83           st  Z, r22

    sbi(*_ucsra, TXC0);
    // load the ucsra address
 118:   ed 91           ld  r30, X+
 11a:   fc 91           ld  r31, X

 // load byte into Z, set bit, write it back
 11c:   80 81           ld  r24, Z
 11e:   80 64           ori r24, 0x40   ; 64
 120:   80 83           st  Z, r24

    _tx_udr_empty_irq();
  }
  _tx_buffer[_tx_buffer_head] = c;

  return;
}

//function shutdown
 122:   0f 90           pop r0
 124:   df 91           pop r29
 126:   cf 91           pop r28
 128:   1f 91           pop r17
 12a:   0f 91           pop r16
 12c:   08 95           ret

This is not a compiler bug. If you look closer at this, the pointer increment is correct.
Also see this:

void HardwareSerial::write(uint8_t c)
{
    *_udr = c;
}

void HardwareSerial::write(uint8_t c)
{
    *_udr = c;
  b4:   dc 01           movw    r26, r24
  b6:   14 96           adiw    r26, 0x04   ; 4
  b8:   ed 91           ld  r30, X+
  ba:   fc 91           ld  r31, X
  bc:   15 97           sbiw    r26, 0x05   ; 5
  be:   60 83           st  Z, r22
  c0:   08 95           ret

@NicoHood
Copy link
Contributor

I also tried

  if (_tx_buffer_head == _tx_buffer_tail && bit_is_set(*_ucsra, UDRE0)) {
    UDR0 = c;
    _NOP();
    _NOP();
    _NOP();
    _NOP();
    _NOP();
    _NOP();
    _NOP();
    _NOP();
    _NOP();
    _NOP();
    sbi(*_ucsra, TXC0);
    return 1;
  }

But this still works. So I have no idea whats really wrong here!??!

The only thing I can imagine of is that st takes 2 cycles instead of sts (as 16 bit).
But since I also tested this (doesnt work), I have no idea what else to do:

  if (_tx_buffer_head == _tx_buffer_tail && bit_is_set(*_ucsra, UDRE0)) {
    cli();
    *_udr = c;
    sei();
    sbi(*_ucsra, TXC0);
    return 1;
  }

@NicoHood
Copy link
Contributor

It must be a race condition or ISR problem. I still search for the reason, but if you take this example sketch, it works if you put cli() and sei() around write():

// arduino uno, 64 byte serial tx rx buffers, default

const int dledPin =  12;
const int ledPin =  13;
bool ledState = LOW;
unsigned long previousMillis = 0;

void setup() {
  pinMode(ledPin, OUTPUT);
  pinMode(dledPin, OUTPUT);
  pinMode(11, OUTPUT);
  pinMode(10, OUTPUT);
  pinMode(9, OUTPUT);
  digitalWrite(dledPin, 0);
  Serial.begin(2000000);
}

void loop() {
  unsigned long currentMillis = millis();

  if ((currentMillis - previousMillis) >= 1000) {

    ledState = !ledState;

    digitalWrite(dledPin, 1);
    for (int i = 0; i < 500; i++) {
      cli();
      Serial.write('a');
      sei();
      //digitalWrite(dledPin, 0);
      Serial.flush();
      //digitalWrite(dledPin, 1);
      delay(6);
    }
    digitalWrite(dledPin, 1);

    previousMillis = millis();

    digitalWrite(ledPin, ledState);

  }
}

@NicoHood
Copy link
Contributor

Possible fix, please try this. Sorry for the many posts, but before I forget something, I better write it down here.

  cli();
  bool emptyBuffer = _tx_buffer_head == _tx_buffer_tail;
  sei();
  if (emptyBuffer && bit_is_set(*_ucsra, UDRE0)) {
    *_udr = c;
    sbi(*_ucsra, TXC0);
    return 1;
  }

It seems to work for my testcase, even though it does not make sense to me (right now) why this should fix anything. The ISR only modifies the tail, not the head. And if we read the head first and then the tail that should be fine!?

@PaulStoffregen
Copy link
Contributor

Could it be related to sbi() not being atomic for SFRs not in the low 32 byte I/O address space?

@NicoHood
Copy link
Contributor

Comparison cli() sei() versus normal version. (avr-gcc 5.1 lto, custom, older IDE, but shouldnt matter)
screenshot from 2015-09-18 21 20 34

The weird thing is why several workarounds fix this issue. Replacing the pointer with a real register also works. I have no idea. Maybe there is a race condition which is avoided with these additional instructions, but the nop test also failed here. Maybe I placed the nops wrong (before instead of after the sbi). I have no idea. But the logic seems correct to me.

@NicoHood
Copy link
Contributor

Paul that might be a good idea. This also works for my testcase:

size_t HardwareSerial::write(uint8_t c)
{
  _written = true;
  // If the buffer and the data register is empty, just write the byte
  // to the data register and be done. This shortcut helps
  // significantly improve the effective datarate at high (>
  // 500kbit/s) bitrates, where interrupt overhead becomes a slowdown.
  //cli();
  bool emptyBuffer = _tx_buffer_head == _tx_buffer_tail;
  //sei();
  if (emptyBuffer && bit_is_set(*_ucsra, UDRE0)) {
    cli();
    *_udr = c;
    sbi(*_ucsra, TXC0);
    sei();
    return 1;
  }
  tx_buffer_index_t i = (_tx_buffer_head + 1) % SERIAL_TX_BUFFER_SIZE;

  // If the output buffer is full, there's nothing for it other than to 
  // wait for the interrupt handler to empty it a bit
  while (i == _tx_buffer_tail) {
    if (bit_is_clear(SREG, SREG_I)) {
      // Interrupts are disabled, so we'll have to poll the data
      // register empty flag ourselves. If it is set, pretend an
      // interrupt has happened and call the handler to free up
      // space for us.
      if(bit_is_set(*_ucsra, UDRE0))
    _tx_udr_empty_irq();
    } else {
      // nop, the interrupt handler will free up space for us
    }
  }

  _tx_buffer[_tx_buffer_head] = c;
  _tx_buffer_head = i;
    cli();
  sbi(*_ucsrb, UDRIE0);
  sei();
  return 1;
}

I think we also have this problem in the loop. At least there I could explain what happens. But since we never disable ISRs it doesnt solve our problem, but should be considered still:

Pretending:
The ucsra register is loaded. The bit is set. Now the ISRS occurs, the last byte got flushed. If is executed, byte is sent again. While loop will break, head will be incremented once while the tail got incremented twice. We would miss the whole buffer than because head is now the tail. Does this sound correct? However this could cause an issue for sure, but not in my testcase since I do not disable interrupts in the loop().

  // If the output buffer is full, there's nothing for it other than to 
  // wait for the interrupt handler to empty it a bit
  while (i == _tx_buffer_tail) {
    if (bit_is_clear(SREG, SREG_I)) {
      // Interrupts are disabled, so we'll have to poll the data
      // register empty flag ourselves. If it is set, pretend an
      // interrupt has happened and call the handler to free up
      // space for us.
      cli();
      bool flag = bit_is_set(*_ucsra, UDRE0);
      sei();
      if(flag){
    _tx_udr_empty_irq();
    }
    } else {
      // nop, the interrupt handler will free up space for us
    }
  }

@matthijskooijman
Copy link
Collaborator

Ok, so this is not a miscompilation after all. The difference between add and subtract in the assembly is correct, because X is incremented while it is being used. Good catch.

I'm surprised that disabling interrupts during sbi(*_ucsra, TXC0); helps, since the TXC0 flag is only used in flush(), so even if that flag is incorrect it shouldn't mess up transmission like it does. However, the UCSRA register contains more interrupt flags. Typically, interrupt flags are cleared by writing 1, so you would write _ucsra = (1 << TXC0) to clear that flag instead of using sbi(). Since sbi() preserves all other values, that would clear all currently pending flags (which are 1 and will be written 1) instead of just the TXC0 flag. This would apply to most interrupt flag registers, but UCSRA is a bit special because:

  • The register also contains 2 configuration bits (U2X and MPCM), so if you do _ucsra = (1 << TXC0) you would clear them.
  • All other interrupt flags in the register are automatically cleared, not by writing a 1. However, the datasheet says, for some (error) flags like frame error "Always set this bit to zero when writing to UCSRnA.". If those error flags ever get set, this would be violated.

Perhaps some erratic behaviour is caused because we are writing 1 to a (read-only) flag register? Perhaps doing this will actually cause the flag to be set? If so, and the interrupt flag is set just before sbi() loads UCSR0A, and the interrupt runs before writing UCSR0A, this could cause the flag to get set and the ISR to run a second time, which could explain what happens?

When you use the direct register name, I suspect that the actual sbi instruction will be used, which sets only a single bit at the hardware level, so this probably does not trigger this problem.

To fix this, you would have to make sure to only set selected bits in the UCSR0A register. Could you try replacing that sbi() line with:

_ucsra = _ucsra & ((1 << U2X0) | (1 << MPCM0)) | (1 << TXC0);

This should write UCSRA with the current value of the 2 configuration bits, and TXC0 to clear that flag, while writing 0 to all other bits, hopefully preventing this problem.

Your last point about the same problem occuring in the loop does not seem to hold, since the code in the loop only runs if interrupts are disabled (bit_is_clear(SREG, SREG_I)), so an interrupt cannot mess that up AFAICS.

@mikaelpatel
Copy link

I was inspired by this issue and did some work on the Cosa UART. I implemented a fast track and then a synchronization delay to achieve 99% effective baud-rate at 1 Mbps and 97% at 2 Mbps (Pro-micro Clone with Sparkfun FTDI, Linux). https://github.com/mikaelpatel/Cosa/blob/master/cores/cosa/Cosa/IOStream/Driver/UART.cpp#L88

There are several issues with the code above. One is actually the order of clearing the TXC and writing to UDR. They need to be in the correct order or in an atomic block as above.

To force race conditions I modified the Cosa UART Benchmark and added a Periodic Job with a period of as low as 100 us (10.000 interrupts per second). This really helped debug the code. https://github.com/mikaelpatel/Cosa/blob/master/examples/Benchmarks/CosaBenchmarkUART/CosaBenchmarkUART.ino

I also used a Logic Analyzer to study the behavior. Something that really took some time to get working correctly at higher baud-rates, 5000+ interrupts per second, and the fast track was flush(). https://github.com/mikaelpatel/Cosa/blob/master/cores/cosa/Cosa/IOStream/Driver/UART.cpp#L118

There is a spreadsheet with the results from the optimizations. https://github.com/mikaelpatel/Cosa/tree/master/examples/Benchmarks/CosaBenchmarkUART

Cheers!

@NicoHood
Copy link
Contributor

@matthijskooijman You are right, we should not use sbi to clear a flag:
http://www.nongnu.org/avr-libc/user-manual/FAQ.html#faq_intbits

My "fixes" just seem like a workaround to fix this specific testcase. It might be random or just a race condition. Adding/changing the code a bit "fixes" the issue, because the timing then might be correct again. since we are in a very small loop with a high Serial speed avery clock cycle change could lead the ISR to trigger on the correct or wrong place. Maybe even NOPs could "fix" this testcase, but not the issue itself. I have not that knowledge yet about the serial and its registers.

As @mikaelpatel has stated the order might be totally wrong, but I dont know. It would be interesting to know if you have got a fix for this issue after your long testing experiments?

For the last part of your answer:
Yes, as I said its not a problem HERE in this case, but it WILL be a problem IF we would disable interrupts. So we should keep this in mind as well ;)

@matthijskooijman
Copy link
Collaborator

@mikaelpatel, I believe that this same issue is also present in the UART.cpp you linked, but you were probably aware of that (you're using |= instead of sbi(), but that's essentially the same).

@NicoHood, Did you get a chance to try the fix I suggested in my previous comment? AFAICS that should be a proper fix, if we analyzed the problem correctly now.

Yes, as I said its not a problem HERE in this case, but it WILL be a problem IF we would disable interrupts. So we should keep this in mind as well

Not entirely sure what you mean here. Do you mean that if write() was modified to disable interrupts, some problem will surface? Or when write() is called with interrupts disabled? And what problem then exactly?

One is actually the order of clearing the TXC and writing to UDR. They need to be in the correct order or in an atomic block as above.

First off, this seems to be the cause of the issue at hand, but it is very much related to this code, so let's discuss it here anyway. I actually think both the order needs to be correct and an atomic block is needed (I don't see an atomic block in the UART.cpp you linked, what did you mean with "as above" exactly?). Consider:

  1. First clearing TXC and then writing UDR. It seems you're using this approach, but I think this leaves a race condition. Assume that a byte is currently being transmitted, but UDR is empty. Now, if you first clear TXC, but the byte finishes transmission directly after clearing TXC, but before UDR is written, TXC will be set again, even though transmission is not actually complete. I don't believe disabling interrupts will help to fix this, since TXC isn't set by an ISR, but by hardware which will happen always anyway.
  2. First writing UDR and then clearing TXC. This does not have the issue above, but now there is a chance that the byte written to UDR completes (setting TXC) before TXC is cleared, resulting in TXC being clear while transmission is complete. Since TXC is being set at least one byte time after writing UDR, you can assume that TXC is always cleared in time. However, when using high UART speeds, combined with an interrupt triggering between the UDR write and TXC clear (delaying the TXC clear) can trigger this. Fixing this is easy - just disable interrupts between UDR write and TXC clear. I believe this is a change needed to the HardwareSerial.cpp code, though it is unrelated to this particular issue (it could cause flush() to hang indefinitely, no transmit bytes twice).

So, first UDR then TXC, with interrupts disabled, seems to be the only solution that is fully reliable.

@NicoHood
Copy link
Contributor

I was referring to the 2nd part of this comment (when called write() with interrupts disabled):
https://github.com/arduino/Arduino/issues/3745#issuecomment-141546878

No I did not test your fix yet. in a few hours I will try.

I also think that the order itself is correct (from what I read) (2nd option inside an atomic block)

I suggest to put every pointer that is accessed via interrupt into an atomic block, since the pointer loading and writing is not a single instruction (at least for cbi and sbi, setting (=) would be okay I think.)

@matthijskooijman
Copy link
Collaborator

I was referring to the 2nd part of this comment (when called write() with interrupts disabled):

Ok, so you are saying that it might be needed to disable interrupts during bool flag = bit_is_set(*_ucsra, UDRE0);? If so, then no, that's not needed since:

  • This just loads a single byte, which is atomic already (it doesn't do read-modify-write which is not atomic).
  • This code is only run when interrupts are disabled, so disabling interrupts again won't make a difference.

No I did not test your fix yet. in a few hours I will try.

Awesome, thanks.

I suggest to put every pointer that is accessed via interrupt into an atomic block, since the pointer loading and writing is not a single instruction (at least for cbi and sbi, setting (=) would be okay I think.)

I think that all access are already atomic when needed (assuming a 1-byte head/tail index). Atomic blocks are only needed for read-modify-write cycles of variables that are also written by an ISR, just reading the variable from an ISR won't cause any problems.

The problem at hand is caused by a hardware register behaving somewhat unexpectedly, so it's not a normal atomicity problem caused by a read-modify-write being interrupted.

@NicoHood
Copy link
Contributor

My fault, you are right. How obvious.

Well will test this in a few hours :)

@matthijskooijman
Copy link
Collaborator

Does work. Not said that the Serial communication works properly, my led just blinks and flush will not hang. The extra line IS compiled, it adds 2 extra bytes.

Huh? You're testing if flush() hangs? I thought you were testing for a double transmission of a byte? Flush hanging could occur without an atomic block around the UDR write and TXC clear as described before, I didn't think that was the issue we were trying to fix... To fix that, just put an atomic block around the UDR write and TXC clear.

@NicoHood
Copy link
Contributor

Yes I was trying to fix the flush, because both issues are related. And if the flush doesnt work, this issue is still there. Because the flush only hangs if a byte was sent twice.

Why should an atomic block fix the issue? If the interrupt occurs after setting udr it adds the flag the the a register. the isr ends and the code does the same again. why should this fix anything?

I mean it does fix my flush testcase, but I still dont see the why it should. I know that my testcase is not good and not realted to the issue. but if this fails, the issue also fails. I could test things for sure, but you never know if its just a race condition that luckly happens/doesnt happen. Idk i have no idea. I also have not enough knowledge, please someone else should test this and not rely on me for this.

@matthijskooijman
Copy link
Collaborator

I think there are two issues here:

  1. sbi(*_ucsra, TXC0); might write a "1" to the UDRE0 bit, when the bit is 1 when sbi loads the register value. I suspect (though we have not yet verified this), that this causes the UDRE0 bit to be set if it was not set when the write happens (because the ISR runs in between the load and write of sbi and clears the flag), causing the ISR to run a second time. This issue can be fixed by making sbi atomic (the UDR write does need to be atomic here), so that the UDRE0 flag cannot be cleared between the read and write done by sbi(). An alternative (and, IMHO better) is to never write a 1 to UDRE0 and other interrupt flags, as the datasheet suggests.
  2. When a byte is written to UDR, and transmission completes (setting TXC0) before sbi(*_ucsra, TXC0); clears it, the end result will be that transmission is complete, but TXC0 is cleared. If flush() is now called, it will wait forever for TXC0 to be set. This can really only occur when an interrupt happens between the UDR write and TXC0 clear, since otherwise transmitting a full byte will take long enough to guarantee this race condition never triggers. So issue can be fixed by putting the UDR write and TXC0 clear in a single atomic block, so that the TXC0 clear always happens soon enough after the UDR write.

Since both of these issue can be fixed using an atomic block, they might seem like the same issue, but really are two issues. Also, I don't think the atomic block is the right fix for issue 1, since it still violates the datasheet recommendations. So I would propose to apply both the atomic block and the changes to the *_ucsra assignment I proposed. It might be useful to verify if the latter without the former actually fixes the double transmission bug (even though it will not fix the flush hang).

Does this clarify?

@NicoHood
Copy link
Contributor

  1. Why should it be a problem to write to a bit thats already set? You are not clearing this bit in the ISR, you are setting it. And then you are setting it twice. If the really causes the problem and an atomic block could solve this, an atomic block with twice this instruction should break it again.
    https://github.com/arduino/Arduino/blob/master/hardware/arduino/avr/cores/arduino/HardwareSerial.cpp#L93
  2. Well that could make sense. At least adding this block wont hurt. Its worth a try.

Do you think this issue is fixed then but the flush issue not?

@matthijskooijman
Copy link
Collaborator

Why should it be a problem to write to a bit thats already set? You are not clearing this bit in the ISR, you are setting it. And then you are setting it twice. If the really causes the problem and an atomic block could solve this, an atomic block with twice this instruction should break it again. https://github.com/arduino/Arduino/blob/master/hardware/arduino/avr/cores/arduino/HardwareSerial.cpp#L93

Woops, I meant the UDRE0 flag in point 1, which is the flat for the TX ISR. For clarity, I've updated my previous comment, please read point 1 in there again to see if this is more clear.

Do you think this issue is fixed then but the flush issue not?

Not sure what you mean here. I've suggested fixes for both issues?

@mikaelpatel
Copy link

@matthijskooijman @NicoHood
Did you try this?

  if (_tx_buffer_head == _tx_buffer_tail && bit_is_set(*_ucsra, UDRE0)) {
    *_ucsra |= (1 << TXC0);    
    *_udr = c;
    return 1;
   }

@matthijskooijman
Copy link
Collaborator

@mikaelpatel I believe that approach suffers from problem 1 described in https://github.com/arduino/Arduino/issues/3745#issuecomment-142244433 and problem 1 described in https://github.com/arduino/Arduino/issues/3745#issuecomment-142365575j. I haven't tested any of this though, so far I've just been theorizing based on @NicoHood's testing. Could you check these comments to see if you think my reasoning is sound?

@mikaelpatel
Copy link

@matthijskooijman
I went back to the drawing-board with this issue and forced interrupts and delays around the fast track. The code I ended up with is:

int
UART::putchar(char c)
{
  ...
  // Fast track when transmitter is idle; put directly into the pipeline
  if (((*UCSRnB() & _BV(UDRIE0)) == 0) && ((*UCSRnA() & _BV(UDRE0)) != 0)) {
    synchronized {
      *UDRn() = c;
      *UCSRnA() |= _BV(TXC0);
    }
    ...
    return (c & 0xff);
  }

  // Check if the buffer is full
  while (m_obuf->putchar(c) == IOStream::EOF)
    yield();

  // Enable the transmitter
  *UCSRnB() |= _BV(UDRIE0);
  return (c & 0xff);
}

Putting the clear of the TXC flag before writing data to UDR could give a flush error, i.e. miss a TXC flag and continue. Putting the clear after could give another type of flush error, i.e., lock up. The clear must be atomic together with the write of UDR.

I had an interrupt handler for TX completion to support RS485 (bus disconnect). This interrupt handler was enabled by the UDR empty interrupt handler when the output buffer was empty. This obviously did not work anymore with the fast track and higher baud-rates. After deciding to refactor this code and moving it to the RS485 driver the Cosa UART::flush() code became much simpler.

I have run the Cosa UART benchmark with different background interrupt loads and baudrates and it is stable with the correct end-of-transmission synchronization (i.e. flush). My focus has now moved to optimizing itoa/utoa (integer to string conversion). The standard AVR library implementation is not so fast and has become a bottle-neck. I found a bug-report and implementation that was useful. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=627899

Cheers!

@NicoHood
Copy link
Contributor

I will test this today.

Dont we also need an atomic block around this?
https://github.com/arduino/Arduino/blob/master/hardware/arduino/avr/cores/arduino/HardwareSerial.cpp#L243-L245
What if after line 243 the bit was flushed out (because the ISR was already enabled) and then the ISR is enabled again. Wouldnt this cause the whole buffer to be flushed out?

@matthijskooijman
Copy link
Collaborator

Putting the clear of the TXC flag before writing data to UDR could give a flush error, i.e. miss a TXC flag and continue. Putting the clear after could give another type of flush error, i.e., lock up. The clear must be atomic together with the write of UDR.

Yes, that is what I was trying to convey before :-)

I still believe this statement is wrong:

      *UCSRnA() |= _BV(TXC0);

See point 1 in this comment. This might write 1 to various error / interrupt flags, which are supposed to be written as 0 always according to the datasheet. However, I also suspect that because this happens atomically, this will not actually cause problems, since the bits will always be written 1 when they are already 1 (without the atomic block, they could be cleared between reading and writing the register).

I have tried to verify if writing a 1 to the UDRE0 bit when it is 0 actually causes it to become 1, but this didn't appear to be the case. So perhaps this only happens sometimes, or something more intricate is causing this double transmission?

@mikaelpatel
Copy link

@NicoHood
I believe that the HardwareSerial code could fail and resend a buffer if the interrupt handler is already enabled and the buffer buffered is emptied. https://github.com/arduino/Arduino/blob/master/hardware/arduino/avr/cores/arduino/HardwareSerial.cpp#L243-L245. This would cause the interrupt to be enabled when the buffer is empty.

The Cosa UART interrupt handler always checks for empty buffer. https://github.com/mikaelpatel/Cosa/blob/master/cores/cosa/Cosa/IOStream/Driver/UART.cpp#L204

I believe that there must be either an atomic block in write() as you suggest or a check for empty buffer first in https://github.com/arduino/Arduino/blob/master/hardware/arduino/avr/cores/arduino/HardwareSerial.cpp#L81.

Cheers!

@NicoHood
Copy link
Contributor

Wouldnt it be also an idea to add the check at the beginning of the ISR as you suggested?
https://github.com/arduino/Arduino/blob/master/hardware/arduino/avr/cores/arduino/HardwareSerial.cpp#L95-L98

If we check this first and then return (and disable the ISR flag) that would work around the problem for sure and solve it maybe. It might not be the best solution but a clear fix?

@matthijskooijman
Copy link
Collaborator

I believe that the HardwareSerial code could fail and resend a buffer if the interrupt handler is already enabled and the buffer buffered is emptied. https://github.com/arduino/Arduino/blob/master/hardware/arduino/avr/cores/arduino/HardwareSerial.cpp#L243-L245. This would cause the interrupt to be enabled when the buffer is empty.

Ah, so then the new bytes is written to the buffer just before the ISR triggers, which reads the byte from the buffer and disables itself (then being re-enabled by write())? However, this can only occur if the buffer is currently empty, in which case the ISR would already have disabled itself? The only way I can think of how this could happen is if there is one byte in the buffer already, and then the ISR triggers twice between adding a second byte to the buffer and enabling the interrupt flag (which is not entirely impossible, if another interrupt also triggers at the right moment and takes up >= 1 byte time, I think). Does this analysis look sound?

To fix this, I'd prefer an atomic block in write over a check in the ISR - best to keep a consistent state than to mop it up later?

@NicoHood
Copy link
Contributor

Ahm one thing I want to mention (has nothing to do with your previous comment):
If you take this example (which is fairly not perfect):
https://github.com/arduino/Arduino/issues/3745#issuecomment-141474605

You can see that it always sends 1 byte and then should flush. So in this case the ISR should never ever be enabled right? Because it uses the first shortcut and then flushes the data.

However the sketch crashes. This can only be cause if the ISR bit was set (which it should never ever to) or the TXC0 bit is cleared and stays cleared. Maybe it happens like this:
Byte is written:
https://github.com/arduino/Arduino/blob/master/hardware/arduino/avr/cores/arduino/HardwareSerial.cpp#L221
Flag in TXC0 is set because at baud 2M it transmits very fast.
Flag in TXC0 is cleared manually again and wont be set again because the byte is already transmitted.
Maybe thatswhy then flush() hangs:
https://github.com/arduino/Arduino/blob/master/hardware/arduino/avr/cores/arduino/HardwareSerial.cpp#L201

This might not fix the sending the data twice problem.

@matthijskooijman
Copy link
Collaborator

@NicoHood, everything you say in your last comment seems correct to me. This is what I described under point 2 in https://github.com/arduino/Arduino/issues/3745#issuecomment-142365575 and indeed, it would explain a flush lockup, but not a double transmit.

@mikaelpatel
Copy link

@NicoHood @matthijskooijman
I have written an additional benchmark where approx. 1 Mbyte is written in a burst. https://github.com/mikaelpatel/Cosa/blob/master/examples/Benchmarks/CosaBenchmarkUARTburst/CosaBenchmarkUARTburst.ino. The benchmark 1) measures actual performance and 2) amount of processing needed. Below are some number (snippet from the source code):

 * @section Description
 * Benchmarking IOStream and UART functions; measure time to print
 * approx. 1 Mbyte (10.000 lines (97 characters) with line number).
 * Default format: 11 bits = 1 start bit, 8 data bits, and 2 stop bits.
 *
 * The print of the line number is an integer to string conversion
 * with 5 digits. Up to 500 Kbps the conversion is in parallel with
 * the print and full effective baudrate is achieved.
 *
 * The sketch will also capture idle time. At 115.200 bps the idle
 * time is almost 80%. Only 20% of the processing power is needed to
 * handle the transmission and number to string conversion. At 250
 * Kbps the idle time drops to 62% and at 500 Kbps it is down to
 * 12%. Higher baudrates require all processing power and there is no
 * idle time.
 *
 * @section Warning
 * At 2 Mbps the execution time is about 7 seconds, at 500 Kbps
 * about 22 seconds, at 115.200 bps over 95 seconds. Running at
 * 9.600 bps will take nearly 20 minutes!

I have run this successfully up to 2 Mbps using the Cosa build support (with LTO) and serial monitor (miniterm.py). No transfer errors. The effect of LTO shows up as increased idle time.

Please note that idle time is due to buffer full and in Cosa is a yield with sleep. Arduino is burning amps instead. Might be something to consider if you want to support low-power design?

Cheers!

@NicoHood
Copy link
Contributor

Sorry its hard for me to follow sometimes. May you please try out arduino/Arduino#3865 yourself? It fixes my testcase at least. And from theory all bugs should now be removed. If not we are definitely setting some wrong registers.

Maybe the _tx_buffer_head == _tx_buffer_tail comparison for 16bit bytes should be atomic too. But this is another issue.

@johnholman @mattskimi may you please try this out with your testcases as well?

@johnholman
Copy link
Contributor Author

serial_test.txt

I'm sorry to come back to this after such a long time - I've been using my workaround, which has proved reliable. However I finally upgraded to 1.8.4 and find that the problem is still there.

I can now confirm that NicoHood's pull request arduino/Arduino#3865 also seems to correct the error.

To demonstrate the original issue here I've attached a simplified test program which doesn't require prior changes to the library code. Targeted at the Mega, it uses timer interrupt 4 to generate a heavy ISR load.

When connected to serial monitor at 500K baud the output should look like this:

1 111111111111111111111111111111111111111111111111111111111111 14350
2 222222222222222222222222222222222222222222222222222222222222 30100
3 333333333333333333333333333333333333333333333333333333333333 45500
4 444444444444444444444444444444444444444444444444444444444444 61600
5 555555555555555555555555555555555555555555555555555555555555 77350
6 666666666666666666666666666666666666666666666666666666666666 93100
7 777777777777777777777777777777777777777777777777777777777777 108500
8 888888888888888888888888888888888888888888888888888888888888 124600
9 999999999999999999999999999999999999999999999999999999999999 140700
10 000000000000000000000000000000000000000000000000000000000000 157500
11 111111111111111111111111111111111111111111111111111111111111 173950

(the number at the end of each is the number of ISR calls so far)

This is sample output when the error is present, here appearing as repeated data at lines 7 and 12.

1 111111111111111111111111111111111111111111111111111111111111 7700
2 222222222222222222222222222222222222222222222222222222222222 16800
3 333333333333333333333333333333333333333333333333333333333333 26250
4 444444444444444444444444444444444444444444444444444444444444 35700
5 555555555555555555555555555555555555555555555555555555555555 45500
6 666666666666666666666666666666666666666666666666666666666666 54950
7 77777777777777555555555066666666666666666666666666666666666666666660
7777777777777777777777777777777777777777777777777777777 66500
8 888888888888888888888888888888888888888888888888888888888888 78750
9 999999999999999999999999999999999999999999999999999999999999 88200
10 000000000000000000000000000000000000000000000000000000000000 98000
11 111111111111111111111111111111111111111111111111111111111111 107800
12 2222222222222222222222222222222222111111111111111111111111111111111111 78022222222222222222222222222222222222222222222222222 119700
13 333333333333333333333333333333333333333333333333333333333333 133350

I do apologise for the delay on getting to this, but hope it receives attention as in less extreme situations the underlying problem may occur very rarely and remain undetected but still have potentially serious consequences.

@matthijskooijman
Copy link
Collaborator

Good to re-activate this issue, it still seems relevant to me (even though it completely dropped off my radar). Looking back at the issue and @NicoHood's PR, I think there's two symptoms here: Double transmits, and flush lockups. Looking at the code, we can identify three separate issues here:

  1. sbi(*_ucsra, TXC0); violates the datasheet recommendations by potentially writing ones to read-only bits. https://github.com/arduino/Arduino/issues/3745#issuecomment-141931449 describes this problem. It also suggestst that writing to the read-only UDRE0 might set it, causing the double-transmit problems, but I'm not convinced that this is indeed the case (since Remove regex library oro.jar Arduino#3 below gives a simpler cause for the double-transmit).
  2. Writing UDR0 and clearing TXC0 should be done atomically, otherwise flush could lock up. See (the last part of) https://github.com/arduino/Arduino/issues/3745#issuecomment-142244433 for a description of this problem. This was reported again as Hardwareserial: Disable interrupts while setting TXC bit Arduino#6685.
  3. Writing _tx_buffer_head and setting UDRIE0 should happen atomically. This issue was briefly discussed at https://github.com/arduino/Arduino/issues/3745#issuecomment-143145084. What happens with the current code, is that there is a single byte left in the buffer (so the UDRIE is set). tx_buffer_head is advanced, but before UDRIE0 is set, the ISR triggers twice. I believe this can happen if a second interrupt eats up sufficient time. Because UDRIE0 is set using sbi, which involves a read-modify-write cycle, there seems to be sufficient instructions to trigger the three ISRs needed for this (two times data empty, one other ISR to fill up some time).

@NicoHood's PR fixes 2 and 3 and I think fixes both symptoms.

Issue 1 is not fixed by the PR, but it would be good to also fix it (suggested fix here). The fix had some testing, and IIUC my suggested fix didn't actually fix either the double transmit, or the flush lockup, but looking at it now I think it shouldn't (it should just fix datasheet compatibility).

One interesting point is that clearing TXC0 using *_ucsra = (1 << TXC0); does fix the flush lockup. Looking back, I suspect this is because simply writing a 1 is shorter than a sbi read-modify-write cycle, so there is a smaller window for the race condition to trigger, making it seem fixed even though it is not.

I'll add a comment to the PR too, to see what can be improved there.

pyokagan referenced this issue in pyokagan/DanceDance Oct 2, 2017
A race condition (likely exacerbated by FreeRTOS which allows task
switching inside the uart1_write() routine) causes the arduino to
retransmit old data.

One reason could be the "TX empty" ISR is called even though the
transmit buffer is empty (i.e. _tx_buffer_head == _tx_buffer_tail). This
causes the _tx_buffer_tail to be incremented past _tx_buffer_head,
resulting in the arduino transmitting old data.

Alternatively, what might be happening is that the "TX empty" ISR might
be enabled BEFORE _tx_buffer_head is incremented, which results in
_tx_buffer_head being incremented past _tx_buffer_tail. (Ref: The
HardwareSerial::write() code, a memory barrier might need to be added in
between the `_tx_buffer_head = i` and `sbi(*_ucsrb, UDRIE0)`)

The raspi code (particularly ucomm_SampleAssembler) is written with the
assumption that old packets won't be received again. However, the above
problem results in behavior which breaks this assumption, and bad things
happen as a result.

Fix this by wrapping uart1_write() with a critical section, so that no
task switching or ISRs will be called when it is in progress. This
should hopefully fix the race condition occurring with
_tx_buffer_head/_tx_buffer_tail.

Ref: https://github.com/arduino/Arduino/issues/3745
@johnholman
Copy link
Contributor Author

PR arduino/Arduino#6855 fixes both issues discussed here and has been approved by @matthijskooijman. It is based on @NicoHood's PR arduino/Arduino#3865 (now closed) but incorporates suggestions for improvement made by @matthijskooijman.

@sandeepmistry sandeepmistry transferred this issue from arduino/Arduino Sep 16, 2019
@cmaglie
Copy link
Member

cmaglie commented Sep 18, 2019

Closing as fixed by arduino/Arduino#6855

@cmaglie cmaglie closed this as completed Sep 18, 2019
MCUdude added a commit to MCUdude/MCUdude_corefiles that referenced this issue Jan 21, 2023
Preserve values of configuration bits MPCMn and U2Xn.
Avoid setting other read-only bits for datasheet conformance.
See arduino/ArduinoCore-avr#120 for details
MCUdude added a commit to MCUdude/MightyCore that referenced this issue Jan 21, 2023
…dc14

55de1dc14 ISO c++ doesn't support binary contants however GNU c++ does
c3890dc86 Fix weak attributions on operator new and delete
22d364ecc Mute unused variable warnings
2ffc04d4f Update abi and new It's now up to date with the one ArduinoCore-avr uses
133dcfd3d Fix unused variable and guard warning
7827dd8b3 Improve how TXCn bit is cleared in USCRnA register Preserve values of configuration bits MPCMn and U2Xn. Avoid setting other read-only bits for datasheet conformance. See arduino/ArduinoCore-avr#120 for details
6103f33f3 Formatting
c8bef0bf5 Improvements to HardwareSerial.cpp
45437133d Add null pointer check to String destructor
811b51677 Adding parenthesis around 'bitvalue' allowing correct macro expansion when using with ternary operator such as bitWrite(value, bit, some_computed_value == 5 ? 1: 0);
e16f14c78 Add bitToggle macro
221900dc9 Removes unnecessary if branch(because length is checked in while statement below the if-clause).
13d1f7607 Improve wiring_shift function
64786bc16 Use ADC instead of combining ADCL and ADCH

git-subtree-dir: avr/cores/MCUdude_corefiles
git-subtree-split: 55de1dc14dc8c530388cb162564b6e1c95eb7a39
MCUdude added a commit to MCUdude/MiniCore that referenced this issue Jan 21, 2023
…dc14

55de1dc14 ISO c++ doesn't support binary contants however GNU c++ does
c3890dc86 Fix weak attributions on operator new and delete
22d364ecc Mute unused variable warnings
2ffc04d4f Update abi and new It's now up to date with the one ArduinoCore-avr uses
133dcfd3d Fix unused variable and guard warning
7827dd8b3 Improve how TXCn bit is cleared in USCRnA register Preserve values of configuration bits MPCMn and U2Xn. Avoid setting other read-only bits for datasheet conformance. See arduino/ArduinoCore-avr#120 for details
6103f33f3 Formatting
c8bef0bf5 Improvements to HardwareSerial.cpp
45437133d Add null pointer check to String destructor
811b51677 Adding parenthesis around 'bitvalue' allowing correct macro expansion when using with ternary operator such as bitWrite(value, bit, some_computed_value == 5 ? 1: 0);
e16f14c78 Add bitToggle macro
221900dc9 Removes unnecessary if branch(because length is checked in while statement below the if-clause).
13d1f7607 Improve wiring_shift function
64786bc16 Use ADC instead of combining ADCL and ADCH

git-subtree-dir: avr/cores/MCUdude_corefiles
git-subtree-split: 55de1dc14dc8c530388cb162564b6e1c95eb7a39
MCUdude added a commit to MCUdude/MegaCore that referenced this issue Jan 21, 2023
…dc14

55de1dc14 ISO c++ doesn't support binary contants however GNU c++ does
c3890dc86 Fix weak attributions on operator new and delete
22d364ecc Mute unused variable warnings
2ffc04d4f Update abi and new It's now up to date with the one ArduinoCore-avr uses
133dcfd3d Fix unused variable and guard warning
7827dd8b3 Improve how TXCn bit is cleared in USCRnA register Preserve values of configuration bits MPCMn and U2Xn. Avoid setting other read-only bits for datasheet conformance. See arduino/ArduinoCore-avr#120 for details
6103f33f3 Formatting
c8bef0bf5 Improvements to HardwareSerial.cpp
45437133d Add null pointer check to String destructor
811b51677 Adding parenthesis around 'bitvalue' allowing correct macro expansion when using with ternary operator such as bitWrite(value, bit, some_computed_value == 5 ? 1: 0);
e16f14c78 Add bitToggle macro
221900dc9 Removes unnecessary if branch(because length is checked in while statement below the if-clause).
13d1f7607 Improve wiring_shift function
64786bc16 Use ADC instead of combining ADCL and ADCH

git-subtree-dir: avr/cores/MCUdude_corefiles
git-subtree-split: 55de1dc14dc8c530388cb162564b6e1c95eb7a39
MCUdude added a commit to MCUdude/MajorCore that referenced this issue Jan 21, 2023
55de1dc1 ISO c++ doesn't support binary contants however GNU c++ does
c3890dc8 Fix weak attributions on operator new and delete
22d364ec Mute unused variable warnings
2ffc04d4 Update abi and new It's now up to date with the one ArduinoCore-avr uses
133dcfd3 Fix unused variable and guard warning
7827dd8b Improve how TXCn bit is cleared in USCRnA register Preserve values of configuration bits MPCMn and U2Xn. Avoid setting other read-only bits for datasheet conformance. See arduino/ArduinoCore-avr#120 for details
6103f33f Formatting
c8bef0bf Improvements to HardwareSerial.cpp
45437133 Add null pointer check to String destructor
811b5167 Adding parenthesis around 'bitvalue' allowing correct macro expansion when using with ternary operator such as bitWrite(value, bit, some_computed_value == 5 ? 1: 0);
e16f14c7 Add bitToggle macro
221900dc Removes unnecessary if branch(because length is checked in while statement below the if-clause).
13d1f760 Improve wiring_shift function
64786bc1 Use ADC instead of combining ADCL and ADCH

git-subtree-dir: avr/cores/MCUdude_corefiles
git-subtree-split: 55de1dc14dc8c530388cb162564b6e1c95eb7a39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants