Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix buffer retransmission and flush hang problems in hardware serial support #6855

Closed
wants to merge 4 commits into from
Closed

Fix buffer retransmission and flush hang problems in hardware serial support #6855

wants to merge 4 commits into from

Conversation

johnholman
Copy link

@johnholman johnholman commented Oct 25, 2017

Fixes #3745 fixes #6821

Based on #3865 by @NicoHood and suggestions from @matthijskooijman

Fixes two separate issues both discussed in #3745: serial buffer re-transmissions under heavy interrupt load and flush hangs at high baud rates. Tested with testcase for the buffer transmission problem and @NicoHood's testcase for the flush hang problem as posted in #3745 with both default buffer size and large buffer size (512 bytes) on Arduino Mega.

Also incorporates suggestions from @matthijskooijman:

  1. use the ATOMIC_BLOCK macro for guards around critical sections
  2. create macro TX_BUFFER_ATOMIC for guards required only for large buffer
  3. improve how the TXC bit is cleared to preserve configuration bits and avoid
    setting read-only bits in the USCRnA register

@johnholman
Copy link
Author

johnholman commented Oct 25, 2017

Note that I've not included one of the changes in #3865 - making tx_buffer_head == _tx_buffer_tail in write() atomic. This is because

  1. I couldn't see this change discussed anywhere (may have missed it)
  2. Does not compile for large buffers as defines local variable in scope of the for loop body generated
    by the ATOMIC_BLOCK macro
  3. Does not seem necessary to fix either of the problems raised in #3745.

There may of course be other potential problems that this does address but given the slight performance penalty I think it may need further discussion before implementation.

Also I've only tested that the specific issues are fixed, and on the Arduino Mega platform only.

@matthijskooijman
Copy link
Collaborator

Note that I've not included one of the changes in #3865 - making tx_buffer_head == _tx_buffer_tail in write() atomic.

This change is needed when the TX buffer is > 256 bytes. In that case, the head and tail are 2 bytes, so there is potential for reading half of the value before the ISR updates it and half of the value after. In fact, all (read and write) access of tail (which is written by the ISR) should be atomic, and all write access of head and tail (which are read by the ISR) should be atomic (using TX_BUFFER_ATOMIC). It might be even clearer to make all accesses to the head and tail atomic.

This is a tricky problem to reproduce, since it will only happen when:

  • The ISR triggers exactly between the two read instructions
  • The ISR modifies both bytes of _tx_buffer_tail

I had a look around the code and it seems there's a lot of head and tail accesses other than the one in write() that would have to be made atomic. If you want to add a (separate) commit for that, that would be great, but if not that's fine too (it's only for a corner case that requires changes to HardwareSerial.h and is documented there). I don't think this should block merging this PR.

I have two more minor remarks:

  • The commit message of your first commit is a bit confusing, it talks about unconditional atomicity using ATOMIC_BLOCK but then does not apply that. The message should probably just focus on the introduction of the TX_BUFFER_ATOMIC macro and explain why that is useful?
  • The last commit should probably provide a bit more detail about the race condition it protects against (in the commit message, or probably better in a comment in the code). Something like:

If TXC is cleared before writing UDR, and the previous byte completes before writing to UDR, TXC will be set, but a byte is still being transmitted (causing flush() to return too soon). So, UDR must be written before clearing TXC. If this is not done atomically, there is a chance interrupts delay the TXC clear so the byte written to UDR is transmitted (setting TXC) before clearing TXC (especially at high baudrates), so then TXC will be cleared but no bytes are left to transmit (causing flush() to hang).

Other than that, this PR looks great to me!

@johnholman
Copy link
Author

I understood the possibility of the ISR changing the second byte of the buffer tail in between a read of the first and second byte, but wasn't sure why only this particular example was fixed since as you say there are several others. Anyway, to avoid making too many changes in the same pull request and causing possible delay I think I'll create a new PR for making buffer pointer accesses atomic where needed for large buffers.

Now to figure out how to change commit messages ..

@matthijskooijman
Copy link
Collaborator

Agreed! As for changing commit messages, check out interactive rebase (e.g. git rebase -i origin/master) and then use the "reword" action (or "edit" combined with git commit --amend to edit the contents of a previous commit).

New macro TX_BUFFER_ATOMIC makes the following code block atomic
only if the transmit buffer is larger than 256 bytes. SREG is restored
on completion.
The macro is then used to simplify code for availableForWrite()
Moving the head buffer pointer and setting interrupt flag is now
atomic in write(). Previously an intervening ISR could empty the
buffer before the second ISR is triggered causing retransmission.

Fixes: #3745 (original issue only)
Preserve values of configuration bits MPCMn and U2Xn.
Avoid setting other read-only bits for datasheet conformance.

See #3745
@johnholman
Copy link
Author

Changes requested by @matthijskooijman made locally by interactive rebase and forced pushed to the remote - hopefully that's an appropriate workflow?

Make write to UDR and clearing of TXC bit in flush() atomic
to avoid race condition.

Fixes #3745 (second different issue introduced later but discussed
in the same issue)
Copy link
Collaborator

@matthijskooijman matthijskooijman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's perfect. This PR looks good to merge to me. @facchinm, perhaps you have some more hardware to test this on?

@facchinm facchinm self-assigned this Oct 30, 2017
@facchinm
Copy link
Member

@matthijskooijman looks good to me too; I'll test it later tomorrow with a handful of boards. It would probably be a good idea to apply it to https://github.com/arduino/arduinocore-avr so we can start using it proficiently

@johnholman
Copy link
Author

@facchinm how did the testing go? Anything I might need to do?

@facchinm
Copy link
Member

No problem on my side, all tests were just fine 😉
I'm trying to understand how we can handle the transition to https://github.com/arduino/ArduinoCore-avr and if it's better to merge it here (for the fix to appear in nightly) or there (so 1.6.21 will be based on it)
@matthijskooijman any idea?

@matthijskooijman
Copy link
Collaborator

@facchinm, I have no clue about the status of that external repository, so can't really comment on that (other than it appears that we have two places for the Arduino AVR core now, which seems like it could cause confusion. If we're going to migrate to there, we should probably remove the avr core here? Merging PR's can then probably be done using git format-patch, a bit of sed and git am, or perhaps git even has something easier to translate paths?)

@facchinm facchinm added the Architecture: AVR Applies only to the AVR microcontrollers (Uno, etc.) label Nov 13, 2017
@johnholman
Copy link
Author

Hi @facchinm I see this was merged into ArduinoCore-avr a while ago. It isn't appearing in the current beta download yet though (Arduino-PR-beta1.9-BUILD27) even though the AVR core has been removed from the beta branch. Is that how it should be?

@facchinm
Copy link
Member

The beta channel currently tracks the latest official release of AVR core (1.6.20). We are working on the infrastructure but, at the moment, we are severely outnumbered compared to the number of projects we are involved in, so some efforts are slowing down "a bit". Sorry for that.

@facchinm
Copy link
Member

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Architecture: AVR Applies only to the AVR microcontrollers (Uno, etc.)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

HardwareSerial library sometimes sends the same data twice Serial.flush() hangs when using 2Mbps
3 participants