Skip to content

tu_fifo_write may leave the fifo locked when full #1278

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
deymo opened this issue Jan 14, 2022 · 16 comments · Fixed by #1286
Closed

tu_fifo_write may leave the fifo locked when full #1278

deymo opened this issue Jan 14, 2022 · 16 comments · Fixed by #1286

Comments

@deymo
Copy link

deymo commented Jan 14, 2022

The following line returns from the tu_fifo_write function when the fifo is full, leaving f->mutex_wr locked, I think we just need to release the mutex before returning false:

if ( _tu_fifo_full(f, w, f->rd_idx) && !f->overwritable ) return false;

@ghost
Copy link

ghost commented Jan 17, 2022

I reckon that could be the bug that was causing this in my testing on the Espressif ESP32S2:

espressif/arduino-esp32#5727 (comment)

Problem for me was, they were not interested in checking and simply closed the issue, and they don't compile from source but instead supply a pre-built library, meaning I was never going to find the problem.

If that's the issue though, thank you! You're worth more than Espressif are, that's for sure!

@deymo
Copy link
Author

deymo commented Jan 17, 2022

@Jasoroony The way I found this problem was that my FreeRTOS task was getting stuck in the lock function after writing a lot of data out while the host was not reading it from the USB serial port. To know whether it is related check with a debug probe and gdb where is your device getting stuck.

Anyway this bug is a really easy fix, just wanted some confirmation from the tinyusb devs that I'm not missing anything.

@ghost
Copy link

ghost commented Jan 18, 2022

I, unfortunately, don't have the tools needed to see where it gets stuck. I also can't easily test if the code fix works (though I'm sure it will - at least, it looks perfect to me). I had considered setting up the needed & completely different build environment for ESP32, but it didn't seem worthwhile once I realised "support" there is so fickle. So I stopped testing these devices over 2 months ago, even though I had mostly worked around the problem - but when you know a problem can still occur, there's no real way forward with any planned use-case that puts end-users at risk.

But my test-case (brought on by actual repeated failure in my real application) does keep my PC busy from reading USB data, seems to be how the Windows priority of write/read is with USB emulated COM ports, while force-writing a large (but never passed the known buffer size) amount of data while the device sends back even more data (but again within the specified limits).

I'm fairly certain, we are doing the same thing to show how the issue presents.

Again, thank you! I just need to wait for any change to trickle through to what I'm building with - no hurry though, these guy's devices can sit in their box for a few more months for all I care.

@hathach
Copy link
Owner

hathach commented Jan 19, 2022

thank you very much for spotting the issue. PR #1286 will fix this, if you have time, please try to test it out and let me know if that works for you.

@ghost
Copy link

ghost commented Jan 25, 2022

I'm sad to say, I don't think the ESP32-S2 usbser library uses the single byte write function that was fixed here. I had zapped the library to rename this function, thinking I could compile my own replacement when the linker complained of the missing import, but it never complained. Then I replaced the entire libarduino_tinyusb.a library with a very small .a library (a copy of another small one in the same directory, but renamed) and I got all the unresolved links I expected... Then I copied source files from here over until it all linked:

directory

And it uploaded and run fine... My test case (source code on the esp32 site) shows it still works perfectly if the PC only sends 64 bytes at a time, before reading 64x16 (1024) bytes back. When I go to send 128 bytes from the PC, it still locks up. This is what I was seeing with their library, and my buffers are set to be larger than 2048 bytes. The only questions is, is it an issue in the TinyUSB library, or is it an issue in their implementation of the CDC serial device (which I can see uses a lot of the CDC code here anyway)?

Maybe someone can test my example code that shows the problem, on their non-Adruino-ESP32 platform?

@hathach
Copy link
Owner

hathach commented Jan 25, 2022

have you tried to narrow down which line of code or function that causes the issue. It is probably another issue than this one.

@ghost
Copy link

ghost commented Jan 25, 2022

Not yet, but it's a public holiday in my country tomorrow... So I will take a look. The thing is though, this test case has been run on a number of micro-controllers, and the Raspberry Pi in gadget mode (it's a really simple test, for each byte the MC gets, it outputs it 16 times) and they all work for as long as the test runs. The ESP32-S2 quickly fails and no longer responds. My request above was to track down which source files (Espressif or yours) were the likely issue.

@hathach
Copy link
Owner

hathach commented Jan 25, 2022

there is no rush, you can look at this at any time. However, this bug is already fixed. If it does not fix your problem, then yours is caused by another one. If you could trace down the issue to the file belongs to this repo, feel free to open a new issue with detailed info on where & why it failed.

@ghost
Copy link

ghost commented Jan 25, 2022

Agreed, I'll enjoy my public holiday tomorrow instead. It's probably a hardware bug in the chip anyway, which would explain the prompt closure of my issue on the Espressif github site.

@ghost
Copy link

ghost commented Jan 26, 2022

The issue I'm having appears to be because there is traffic in both directions within a short time period. PC sends two 64 byte packets, ESP32-S2 starts sending back packets (a multiple of what it received) as soon as it sees the first PC packet. Within a few seconds of it running, there is some sort of clash with the 2nd incoming PC packet (in dcd_esp32sx.c maybe) and one ESP packet (Queue EP 84) does not get a queue entry for tud_task() to process saying it ever finished (USBD Xfer complete on EP 84). This leaves the endpoint both busy and claimed, there is no timeout check, there is no attempt to re-process, it just remains that way (stopping any more writes) until the chip is reset.

The Espressif write logic then ends up in an endless loop, waiting for space to become available, unless the PC disconnects from the COM port... But even then there appears to be no code to reset the "busy" & "claimed" flags - so... only reset works.

Sorry for posting in this thread, but it appears it could be specific to the ESP32-S2 - so I wanted to say don't bother testing with any other chip, and ignore my testing request above.

@ghost
Copy link

ghost commented Jan 26, 2022

From some logging, I can see, if in dcd_esp32sx.c - dcd_edpt_xfer() - it manages to set both an in packet request and out packet request, just a little bit before the PC sends data, then you only get an interrupt for the packet from the PC, the transmitted data complete interrupt doesn't fire. A lot of the time, you will get an interrupt as soon as a TUSB_DIR_OUT request is made (this is why it runs for a few seconds) but eventually the two will be set together, and it all stops working.

This is such an easy thing to test really, just have data going both ways that are unrelated. Most use-cases though are terminals, where a command is sent in full, and a response is generated - which is why this bug in the Espressif hardware hasn't been noticed.

These little buggers are going back in their box.

@hathach
Copy link
Owner

hathach commented Jan 26, 2022

as mentioned above, please open an new issue with detailed information on how to reproduce the issue with one of the stock example or your modified version. Testing with arduino-esp32 won't qualify as bug report since it involves other party code. If you are not sure whether it is a bug or not, your could open a discussion, people with the same issue could comment in.

@ghost
Copy link

ghost commented Jan 26, 2022

Thanks, but as I wrote, it's these devices and I'm not using them anymore.

@hathach
Copy link
Owner

hathach commented Jan 26, 2022

ah sorry, my bad, I haven't read your explanation carefully enough. I guess we are all good then :). If you find any bugs, feel free to file an issue for it.

@ghost
Copy link

ghost commented Jan 29, 2022

No, no actual software bugs, so you're golden, just bad hardware here. But it seems I can reliably detect the Tx FIFO being loaded with 64 bytes of data to send to the PC, but the "clashing" Rx FIFO interrupt reports (with some new scanning of all the registers) that the Tx FIFO has emptied in that next interrupt... And any attempt to reset or retry that transmit also doesn't work. I guess Adafruit's first ESP32-S2 attempt here....... doesn't really work...... But in most other cases, like simple terminal style or mass storage device, where the PC controls how data moves, works fine. It's only when things like in "linux" you might attempt to send a ctrl-c on a process emitting a lot of constant output that you want to stop - this "driver" clashes with the bad hardware.

Kind of poor, I'm glad I had other options.

@ghost
Copy link

ghost commented Jul 28, 2022

I kind of forgot about this bug-thread, but the Arduino-ESP32 people have agreed they see my issue in the library here using this example:

espressif/arduino-esp32#6221 (comment)
and
espressif/arduino-esp32#6221 (comment)

They also say the same issue happens in their IDF using the SuGlider's python test module posted in that thread:

espressif/arduino-esp32#6221 (comment)

I remembered this thread only because I started looking at the Raspberry Pi Zero W gadget mode for bare metal, and noticed these devices seem to have the same USB OTG device... It's quite possible this lock-up affects more than just the ESP32-S2 but I'm not currently anywhere near having the Raspberry code up and running, and I no longer use the ESP32-S2. I am hesitant in moving forward actually, I would hate to run into the same issue on the new device I'm trying to use.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants