Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix flow of ACKs to ensure we do not hang on final buf write #155

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

shaunrs
Copy link

@shaunrs shaunrs commented Apr 22, 2021

Using Windows 10 and a pyboard, often times a file transfer hangs and the only solution is to unplug the board and kill rshell. I observe this behaviour with 9 out of 10 larger file transfers (4KiB). Smaller transfers do not seem affected.

I'm not familiar enough with the specific mechanism of file transfer to describe the root cause here, but this behaviour appears to happen when send_file_to_remote has returned (no more data to send), but recv_file_from_remote is still executing, causing Device.remote("recv_file_from_host", "send_file_to_remote" .. ) to hang indefinitely. Presumably recv is waiting for more data that is never coming?

It seems the current code would never send/receive the final packet's ack, as they are processed at the start of the while loop. It is therefore possible for a last-packet timeout to go unnoticed by rshell . By ensuring acks are only sent after the remote has received and written the packet, and subsequently verified after local has sent a full packet to the remote we can make this more robust. This resolves my file transfer issues.

Copy link
Collaborator

@davehylands davehylands left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that having an initial ACK is also important. It basically tells the host that the pyboard is running the code, and has opened the file and is ready to receive data.

Usually the reason for one side to hang is because it's waiting for a block of data and one or more bytes get lost. The ACK isn't really a data integrity ACK but rather a flow control ACK. This way the host only sends data to the board when the board is waiting for it. Writing to flash or sdcard can cause interrupts to be disabled, and characters arriving during that time interval can get lost.

@shaunrs
Copy link
Author

shaunrs commented Apr 24, 2021

Good point, that makes sense. I'd like to keep the last-packet ACK in there, whilst it isn't a data integrity ACK there is value in being able to detect last-packet timeouts which otherwise cause rshell to hang indefinitely.

I'm happy to implement this, and get feedback. Which method would you prefer:

  1. Initial ACK to start transmission - as it was before my PR, but including last-packet ACK
  2. Initial XON (DC1) to start transmission - feels a little contrary to ACK for flow control, but is explicit. The sender is 100% sure that receiver is in the correct state, at the start of a fresh recv:while loop. However, I don't see a case where you could end up in a bad state without this, so it might be overkill.

Personally: I'd go for 2 purely because it is much more explicit. It accounts for cases where the code changes in future and/or logic is added to this feature that may introduce other edge cases. But will implement this how you see best fits the project :)

@davehylands
Copy link
Collaborator

The actual character used doesn't really matter. I'd avoid using Control-C, but DC1 (0x11) should be fine.

@shaunrs
Copy link
Author

shaunrs commented Aug 9, 2021

The actual character used doesn't really matter. I'd avoid using Control-C, but DC1 (0x11) should be fine.

Sorry for the long delay, been very busy on other things. This is pushed using DC1.

@shaunrs
Copy link
Author

shaunrs commented Jul 19, 2022

@davehylands I understand you're likely super busy, but I'm keen to get this merged if possible? I'm currently using a local fork in all my projects :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants