Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CDC without DTR being set #506

Merged
merged 16 commits into from
Nov 23, 2020
Merged

CDC without DTR being set #506

merged 16 commits into from
Nov 23, 2020

Conversation

duempel
Copy link
Collaborator

@duempel duempel commented Sep 4, 2020

As previously discussed, I'd like to share ideas how we could solve the DTR bit problem within CDC class. Sorry if you have been waiting for this. I couldn't work on it before since I've been on vacation. 🙈

Describe the PR
This PR is one possibility to solve #482 . The most important change is, that tud_cdc_n_connected(itf) is not checked within tud_cdc_n_write_flush anymore. This will allow us to transmit data, even if the host does not support control line states. In case the DTR bit is not set, we do not know if a host actually drains data from tx fifo or not. To make only the newest data available to a new connected host (e.g. logs), the tx fifo will be modified to become overwritable.
Whenever a new connection with DTR bit is esatblished, we know that connection is valid from now and can optionally remove all old data.

Additional context
This is just a draft PR to get a view on what was discussed in #401 and #482 . Please everyone feel free to bring up new ideas or changes to get the best behaviour for all applications.

A main goal is to not change the behaviour of old applications. Since the current implementation works only with DTR being set, we have the chance to define a new behaviour for all other cases.

@hathach
Copy link
Owner

hathach commented Sep 8, 2020

hihi, sorry for late response, I am scratching my head with another race condition bug, will check this out as soon as I could.

Copy link
Owner

@hathach hathach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you again, great PR, I think this over most of our discussion on this topic before. Strangely the code is actually much less than the discussion itself, which is actually a really good things, since we write less but cover most scenario. I only has a few comments, most importantly to remove the edpt_abort() since that is lots of works and may not needed since the case is not that severe. Most user won't even notice that. I like to add code step by step, it takes lots more time to get thing mature but ensure that we don't spent much time implement feature that 90% user don't need.

src/class/cdc/cdc_device.c Outdated Show resolved Hide resolved
src/class/cdc/cdc_device.c Outdated Show resolved Hide resolved
src/class/cdc/cdc_device.c Outdated Show resolved Hide resolved
src/class/cdc/cdc_device.c Show resolved Hide resolved
examples/device/cdc_msc/src/main.c Outdated Show resolved Hide resolved
@duempel
Copy link
Collaborator Author

duempel commented Sep 28, 2020

@hathach thanks for your review. I kept this draft pending as it is right now, because I hoped someone else could take a look and comment on it. Since I am defining a very new functionality for the CDC, some more ideas might be useful.
This is also the reason why I made this PR a little bit bigger than it will eventually become. Just to show an idea of the code to what we discussed before.

I'm a bit busy at the moment, but I'll work on the changes you recommended and push the update we made because of the race condition bug as soon as possible.

@hathach
Copy link
Owner

hathach commented Sep 30, 2020

@hathach thanks for your review. I kept this draft pending as it is right now, because I hoped someone else could take a look and comment on it. Since I am defining a very new functionality for the CDC, some more ideas might be useful.
This is also the reason why I made this PR a little bit bigger than it will eventually become. Just to show an idea of the code to what we discussed before.

I'm a bit busy at the moment, but I'll work on the changes you recommended and push the update we made because of the race condition bug as soon as possible.

No rush, I am busy at the moment at well :( .

@dobairoland
Copy link
Contributor

Hi @duempel. Thank you for working on this. I suspect that this is what I'm looking for. However, currently there are conflicts with the master branch. I've tried to resolve the conflicts myself because I also need patches from the master branch. Probably I did something wrong because I'm getting the following error with the latest version of ESP-IDF:

  0 panic'ed (IntegerDivideByZero). Exception was unhandled.

Core  0 register dump:
0x4008b01b: dcd_edpt_xfer at /home/dragon/esp/esp-idf/components/tinyusb/tinyusb/src/portable/espressif/esp32s2/dcd_esp32s2.c:324
0x40023b25: _xt_user_exc at /home/dragon/esp/esp-idf/components/freertos/xtensa/xtensa_vectors.S:627


Backtrace:0x4008b018:0x3ffc8020 0x4008a0d1:0x3ffc8040 0x40088285:0x3ffc8060 0x400882fd:0x3ffc8080 0x40087dfd:0x3ffc80a0 0x400288b5:0x3ffc8110
0x4008b018: dcd_edpt_xfer at /home/dragon/esp/esp-idf/components/tinyusb/tinyusb/src/portable/espressif/esp32s2/dcd_esp32s2.c:324

0x4008a0d1: usbd_edpt_xfer at /home/dragon/esp/esp-idf/components/tinyusb/tinyusb/src/device/usbd.c:1179 (discriminator 2)

0x40088285: tud_cdc_n_write_flush at /home/dragon/esp/esp-idf/components/tinyusb/tinyusb/src/class/cdc/cdc_device.c:194

0x400882fd: tud_cdc_n_write at /home/dragon/esp/esp-idf/components/tinyusb/tinyusb/src/class/cdc/cdc_device.c:171

0x40087dfd: tud_cdc_write at /home/dragon/esp/esp-idf/examples/peripherals/usb/bridge/build/../../../../../components/tinyusb/tinyusb/src/class/cdc/cdc_device.h:219
 (inlined by) usb_sender_task at /home/dragon/esp/esp-idf/examples/peripherals/usb/bridge/build/../main/serial.c:135

0x400288b5: vPortTaskWrapper at /home/dragon/esp/esp-idf/components/freertos/xtensa/port.c:170

Are you still working on this? Could you please resolve the conflicts in your branch?

Thank you very much!

@duempel
Copy link
Collaborator Author

duempel commented Nov 3, 2020

Hello @dobairoland , thank you for your interest in this feature. Since TinyUSB's master branch has received a lot of updates in the last months, there are currently conflicts with this PR.

I will try to merge the current master into my branch tomorrow to update the PR draft quickly. The further changes to create the final PR will be done in the following days.

If you want to test the changes, I would be very happy. Keeping you updated.

@duempel
Copy link
Collaborator Author

duempel commented Nov 4, 2020

Alright, I just merged all the changes into the feature branch. @dobairoland you can check if it's working for you now (I sadly don't have an esp32s2 to test here right now - but it's working on other platforms). In a couple of days I also want to finalize the PR with the changes that @hathach requested.

@dobairoland
Copy link
Contributor

@duempel Thank you for resolving the conflicts. Unfortunately, I'm getting the same error as before.

There is no device connected in the beginning when it fails on the first call of tud_cdc_write() for the reason of division by zero here.

I don't have this issue with 075334a.

I believe that it can work on other platforms because the exception occurs in the ESP32-S2-specific code.

I'm ready to help with any further testing and debugging but unfortunately, I don't know the internals of tinyusb well enough to fix it by myself.

@dobairoland
Copy link
Contributor

Ok, I've fixed my previous issue with a tud_cdc_available() check.

@dobairoland
Copy link
Contributor

I'm sorry for posting again. tud_cdc_available() never true for me so tud_cdc_connected() should be used instead. Then I can read with minicom what was sent and works fine for me.

Unfortunately, this doesn't solve my original problem I was hoping to solve: to be able to read and write while DTR is not set.

@duempel
Copy link
Collaborator Author

duempel commented Nov 5, 2020

@dobairoland thanks for your feedback. It's not easy to spot the cause of this issue now. As far as I understand you are using your own user application. Have you tried to run the cdc_msc_freertos example on your esp?

My first thoughts were: It seems that an endpoint transfer (dcd_edpt_xfer) is called before the endpoint was opened correctly. This could happen for example if you write and flush data before calling tusb_init (due to task priority or other reasons). This could also explain why it's working with use of tud_cdc_available() or tud_cdc_connected(). The use could delay your endpoint access calls until it's fully initialized.

I would recommend to 1st try the cdc_msc_freertos exmaple. If you already did this please turn on TinyUSB's logging function for debugging (CFG_TUSB_DEBUG = 2). We could see here if functions were called in wrong order.

Unfortunately, this doesn't solve my original problem I was hoping to solve: to be able to read and write while DTR is not set.

This is exactly what this PR is trying to solve. Let's hope to find the cause as soon as possible.

@duempel
Copy link
Collaborator Author

duempel commented Nov 5, 2020

@hathach based on the explained behaviour from my last comment I think it is useful to provide more stability and block application mistakes. In this PR I deleted the tud_cdc_n_connected() check within the tud_cdc_n_write_flush(). What do you think about adding a tud_ready() call or introducing a ready variable for every cdc interface to check before usbd_edpt_xfer calls?

@dobairoland
Copy link
Contributor

Thank you @duempel for the ideas.

I see in my logs that everything has been initialized.

USBD init
CDC init
MSC init
VENDOR init

It is not possible for my application to write before initialization because TASK A calls tusb_init and after that it starts TASK B which in the end starts TASK C which writes to CDC.

I will try the cdc_msc_freertos example to modify and report back on the results.

@dobairoland
Copy link
Contributor

dobairoland commented Nov 6, 2020

The issue can be reproduced with the cdc_msc_freertos example as well by not checking tud_cdc_available() before attempting to read:

I (336) cpu_start: Starting scheduler on PRO CPU.
Guru Meditation Error: Core  0 panic'ed (IntegerDivideByZero). Exception was unhandled.

Core  0 register dump:
PC      : 0x4008579a  PS      : 0x00060f30  A0      : 0x80082edc  A1      : 0x3ffc35e0  
A2      : 0x00000000  A3      : 0x00000040  A4      : 0x00000000  A5      : 0x00000040  
A6      : 0x00000000  A7      : 0x3ffc35e0  A8      : 0x8008574c  A9      : 0x3ffc35b0  
A10     : 0x00000000  A11     : 0x00000000  A12     : 0x00000000  A13     : 0x00000000  
A14     : 0x00000000  A15     : 0x3ffc35b0  SAR     : 0x00000000  EXCCAUSE: 0x00000006  
EXCVADDR: 0x00000000  LBEG    : 0x00000000  LEND    : 0x00000000  LCOUNT  : 0x40023ded  

Backtrace:0x40085797:0x3ffc35e0 0x40082ed9:0x3ffc3620 0x40083681:0x3ffc3660 0x400837fa:0x3ffc36a0 0x4008b061:0x3ffc36e0 0x400294f1:0x3ffc3750


ELF file SHA256: Guru Meditation Error: Core  0 panic'ed (Unhandled debug exception). 
Debug exception reason: Stack canary watchpoint triggered (cdc) 
Core  0 register dump:
PC      : 0x40022f9a  PS      : 0x00060e36  A0      : 0x80027376  A1      : 0x3ffc3450  
A2      : 0x3ffc3480  A3      : 0x00000000  A4      : 0x00099cf0  A5      : 0x3007ee00  
A6      : 0x3ffbe328  A7      : 0x00000000  A8      : 0x3ffc347f  A9      : 0x00000031  
A10     : 0x00000008  A11     : 0x00000010  A12     : 0x00000009  A13     : 0x00000009  
A14     : 0x00060f23  A15     : 0x00000040  SAR     : 0x00000001  EXCCAUSE: 0x00000001  
EXCVADDR: 0x00000000  LBEG    : 0x00000009  LEND    : 0x00000009  LCOUNT  : 0x40024e30  

Backtrace:0x40022f97:0x3ffc3450 0x40027373:0x3ffc3470 0x40027536:0x3ffc34e0 0x40027a89:0x3ffc3530 0x40023e2e:0x3ffc3550 0x40085797:0x3ffc35e0 0x40082ed9:0x3ffc3620 0x40083681:0x3ffc3660 0x400837fa:0x3ffc36a0 0x4008b061:0x3ffc36e0 0x400294f1:0x3ffc3750


ELF file SHA256: c9b3f39bfc328ec1

Rebooting...

@hathach
Copy link
Owner

hathach commented Nov 8, 2020

My first thoughts were: It seems that an endpoint transfer (dcd_edpt_xfer) is called before the endpoint was opened correctly. This could happen for example if you write and flush data before calling tusb_init (due to task priority or other reasons). This could also explain why it's working with use of tud_cdc_available() or tud_cdc_connected(). The use could delay your endpoint access calls until it's fully initialized.

All other API must be called after tusb_init(), application must ensure this. There is un-initialized state therefore undefined behavior otherwise. I don't think the stack need to safe-guard this scenario at all.

@hathach based on the explained behaviour from my last comment I think it is useful to provide more stability and block application mistakes. In this PR I deleted the tud_cdc_n_connected() check within the tud_cdc_n_write_flush(). What do you think about adding a tud_ready() call or introducing a ready variable for every cdc interface to check before usbd_edpt_xfer calls?

We could add necessary check but first we need to locate the root cause first, if the scenario is tusb_init() isn't called yet, then I don't think it is the stack responsibility at all

@dobairoland please try to post only necessary info in the log, it also help to point out where thing go wrong as well. People have their own preferred hardware for developing the stack, and may not be familiar with esp32s2 your do. When you do modify the example, it is better to post your modification line here in code block as well.

@duempel
Copy link
Collaborator Author

duempel commented Nov 9, 2020

@dobairoland sorry for my late reply. The initialization process you described with task A,B and C seems to be good. But even in this scenario task C should not directly write to a cdc interface. The init call only prepares memory for cdc to work. But it is not opening an endpoint. This is done by the usb host when setting a configuration. You can see when a configuration is set within the log:

USBD Setup Received 00 09 01 00 00 00 00 00 
  Set Configuration
  Open EP 81 with Size = 8
  Open EP 02 with Size = 64
  Open EP 82 with Size = 64
  Queue EP 02 with 64 bytes ... OK
  CDC opened
  Queue EP 80 with zlp Status
USBD Xfer Complete on EP 80 with 0 bytes

This is an example of just one cdc interface with data endpoint 2. Calling the read and write api before this event can result in unexpected behaviour. A more detailed log can be helpful here.

The issue can be reproduced with the cdc_msc_freertos example as well by not checking tud_cdc_available() before attempting to read:

Did you only remove the tud_cdc_available() check from the example and did not change anything else? This is an important information. I think there's a bug here if we want to allow read calls before checking the availability: Calling tud_cdc_n_read will always call _prep_out_transaction. If we do a read call before setting the configuration we will queue on EP0 here which can throw an TU_ASSERT at usbd_edpt_xfer if this endpoint is already busy. I will try to evaluate this. Also writing will lead into EP0 transfers. @hathach do you know if specification allows a cdc data endpoint to be configured as EP0? If not, blocking unopened data endpoints with checking if p_cdc->ep_in or p_cdc->ep_out equals zero is pretty efficient.

@dobairoland
Copy link
Contributor

Did you only remove the tud_cdc_available() check from the example and did not change anything else?

Yes, that was the only change:

diff --git a/examples/device/cdc_msc_freertos/src/main.c b/examples/device/cdc_msc_freertos/src/main.c
index 71225588..8cf013fe 100644
--- a/examples/device/cdc_msc_freertos/src/main.c
+++ b/examples/device/cdc_msc_freertos/src/main.c
@@ -169,7 +169,7 @@ void cdc_task(void* params)
   while ( 1 )
   {
     // is data available to read from rx fifo
-    if ( tud_cdc_available() )
+    if ( 1 )
     {
       uint8_t buf[64];

I'm used to working with tinyusb as a submodule of ESP-IDF so I have a little trouble enabling the log here. These doesn't seem to work for me:

make BOARD=esp32s2_saola_1 PORT=1 DEBUG=1 LOG=2 all
make BOARD=esp32s2_saola_1 CFG_TUSB_DEBUG=2 PORT=1 DEBUG=1 LOG=2 all

@hathach
Copy link
Owner

hathach commented Nov 10, 2020

make BOARD=esp32s2_saola_1 PORT=1 DEBUG=1 LOG=2 all

PORT=0 or just drop PORT=

@dobairoland
Copy link
Contributor

I'm sorry but I tried the following sequence of commands but still I see no tinyusb debug output in the log. The output remains the same as I posted earlier.

rm -r _build/
make BOARD=esp32s2_saola_1 DEBUG=1 LOG=2 all
make BOARD=esp32s2_saola_1 PORT=1 flash

@hathach
Copy link
Owner

hathach commented Nov 10, 2020

ah esp32s2 doesn't get LOG= option support, please try to modify the tusb_config.h directly to enable it https://github.com/hathach/tinyusb/blob/master/examples/device/cdc_msc_freertos/src/tusb_config.h#L72

#define CFG_TUSB_DEBUG 1

@dobairoland
Copy link
Contributor

Thank you for the help. So I see only this:

USBD init
CDC init
MSC init
Guru Meditation Error: Core  0 panic'ed (IntegerDivideByZero). Exception was unhandled.

There is nothing else from tinyusb in the log.

@hathach
Copy link
Owner

hathach commented Nov 12, 2020

There is nothing else from tinyusb in the log.

there is divide by zero exception, the cpu is hanged. You should try to see at where and when this happens.

- auto flush welcome message at connection event
- provide information to the user if the terminal did not set DTR
@duempel
Copy link
Collaborator Author

duempel commented Nov 18, 2020

@hathach I finally got some minutes to make the changes you suggested within your review. I kept this PR as draft, because I just add some changes to cdc example code, which I don't know if you like them. But we can make this to merge very soon. 😉

The updated cdc example uses tud_cdc_rx_cb callback to recognize if there is a connected terminal. If we receive character here we can be pretty sure that a connection is set up. Within the callback we then check the line state and can print information to the user, that his terminal is not setting DTR bit. This way we can keep the example just working as it was before but can tell the user why his example maybe does not work.
Also I add auto flush to the welcome message. I think it's more useful if we see this message just after opening our terminal and not after sending some character first.

If you like the changes I will add them to the freertos example as well.

@duempel
Copy link
Collaborator Author

duempel commented Nov 18, 2020

@dobairoland it's very difficult to help you debugging because I do not have the ESP here. But you could check if the error still occurs if you do the following changes:

  • Add TU_VERIFY( p_cdc->ep_out, ); at the beginning of _prep_out_transaction in cdc_device.c #L85

  • Add TU_VERIFY( p_cdc->ep_in, 0 ); at the beginng of tud_cdc_n_write_flush in cdc_device.c #L180

If you are only using one single interface you could change the TU_VERIFY check to exactly the endpoints you use with your descriptor to make it more precise. If you still get the same error then I do not have more ideas at the moment. But we can excluse the cause by referencing unopened entpoints.

@dobairoland
Copy link
Contributor

dobairoland commented Nov 18, 2020

I'm getting a different behavior with the new commits in this PR.

USBD Xfer Complete on EP 03 with 31 bytes
  MSC xfer callback
  SCSI Command: Test Unit Ready
  Queue EP 83 with 13 bytes ... OK
USBD Xfer Complete on EP 83 with 13 bytes
  MSC xfer callback
  SCSI Status: 0
  Queue EP 03 with 31 bytes ... OK
USBD Xfer Complete on EP 03 with 31 bytes
  MSC xfer callback
  SCSI Command: Test Unit Ready
  Queue EP 83 with 13 bytes ... OK
USBD Xfer Complete on EP 83 with 13 bytes
  MSC xfer callback
  SCSI Status: 0
  Queue EP 03 with 31 bytes ... OK

This continues until I connect to ttyACM0:
1.txt

And in loop it reboots. The exceptions are now different than before. I have the same changes in cdc_msc_freertos as before. Now it can also freeze my whole USB subsystem and I have to hard reset my computer. This haven't occurred before.

@dobairoland
Copy link
Contributor

dobairoland commented Nov 18, 2020

Here are the logs with the requested changes:
2.txt

@hathach
Copy link
Owner

hathach commented Nov 19, 2020

thanks @duempel sorry for being late on response, I am on the run with other issue. And don't worry too much about the ESP if you don't have the mcu to test with. Just focus on the actual MCU you could test with, I will do the test on esp32s2 and fix it if needed later on. This PR will be probably used very often by esp32s2 user. @dobairoland would you mind moving the log on a gist or something, the long log make pr discussion more difficult to follow.

@dobairoland
Copy link
Contributor

would you mind moving the log on a gist or something, the long log make pr discussion more difficult to follow.

I'm sorry. I moved them into files.

@@ -189,7 +189,7 @@ uint32_t tud_cdc_n_write_flush (uint8_t itf)
// Pull data from FIFO
uint16_t const count = tu_fifo_read_n(&p_cdc->tx_ff, p_cdc->epin_buf, sizeof(p_cdc->epin_buf));

if ( count && tud_cdc_n_connected(itf) )
if ( count )
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right, we have to add a check whether usb is enumerated to prevent executing edpt_xfer(). I will do it in a follow up PR

uint8_t const line_state = tud_cdc_get_line_state();

// Provide information that terminal did not set DTR bit
if( !(line_state & 0x01) )
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not necessary for example to send out the warning. I will do a follow up clean up

@hathach hathach marked this pull request as ready for review November 23, 2020 16:16
Copy link
Owner

@hathach hathach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit of testing does show it work rather well as currently. I didn't know any terminal that doesn't set DTR to test with with the overwritable, but the code look good enough. Since it has been going for a while, I will merge it now and do a follow up with tud_ready() check

Update: look like I could do a push to your branch which the PR is based on. Once the PR is done, I think we could finally merge this one. There is probably a bit of mixed behavior using terminal without DTR but we could fix it later on. The terminal with DTR should remain the same.

@duempel
Copy link
Collaborator Author

duempel commented Nov 23, 2020

@hathach thanks for merging your latest changes in here. If you want to test out the functionallity I could recommend HTerm to you. Usually I use this terminal on windows, but it's also available for linux. HTerm has the ability to manually control DTR and RTS bits with buttons.

Since it has been going for a while, I will merge it now and do a follow up with tud_ready() check

Alright, I feel good with this. Thanks for your help and I'm happy we finally get this to master.

@hathach
Copy link
Owner

hathach commented Nov 23, 2020

@dobairoland I couldn't reproduce your issue with my esp32s2 saola board but omitting the tud_cdc_available() in the example. If you still has the issue merged master, please open the issue and provide way to reproduce it including software terminal and os which you are using.

@hathach
Copy link
Owner

hathach commented Nov 23, 2020

@hathach thanks for merging your latest changes in here. If you want to test out the functionallity I could recommend HTerm to you. Usually I use this terminal on windows, but it's also available for linux. HTerm has the ability to manually control DTR and RTS bits with buttons.

Since it has been going for a while, I will merge it now and do a follow up with tud_ready() check

Alright, I feel good with this. Thanks for your help and I'm happy we finally get this to master.

Ah thank you, just downloaded and tested, so far so good.

@hathach
Copy link
Owner

hathach commented Nov 23, 2020

@HiFiPhile and @me-no-dev may be interested in this merged PR, let me know if there is an hiccups

@me-no-dev
Copy link
Collaborator

@HiFiPhile and @me-no-dev may be interested in this merged PR, let me know if there is an hiccups

Will try in a day or two and report back :) thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants