Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix issue with the BLE MTU #39

Merged
merged 2 commits into from
May 11, 2023
Merged

fix issue with the BLE MTU #39

merged 2 commits into from
May 11, 2023

Conversation

scytacki
Copy link
Contributor

@scytacki scytacki commented Apr 23, 2023

Previously the MTU (maximum size of BLE packet) was being ignored.
By default the BLE MTU is 23. Each BLE packet has 3 header bytes.
So unless the client says it can handle a larger MTU the server (WiCAN)
can only send 20 bytes at a time.

On Android, apps have to specify what size MTU they want to use, or
it stays the default. On iOS, apps can't choose. iOS always negotiates
a size between 185 and 247.

This code now pays attention to the MTU from the client.

The old code might pack multiple messages into one BLE packet,
but it would never split a message across two or more BLE packets.
This code now packs as many bytes into each BLE packet as it can.
If the message doesn't fit in one packet, it will be split across
multiple packets.

Notes I left in the code:

  • esp_ble_gatt_set_local_mtu is called, but I don't think it actually makes any difference
  • the size of the notify characteristic is smaller than the max ble_send_buf size.

I have tested this code using "Car Scanner ELM OBD2 on Android". It never sets the MTU size, so I haven't been able to test that part of the code. I modeled the ESP_GATTS_MTU_EVT off of what ble_spp_server_demo.c does, so I think it will work. However, it would definitely be worth testing this with other Android apps. And also some iOS apps. As I wrote above, iOS should always be sending a MTU. There is a logging statement on this event, so you could turn on logging to see if the app/OS you are testing is actually sending it.

I tried to follow the indentation (tabs/spaces) of the neighboring lines of code. I also tried to follow the formatting of braces, but I might have made mistakes on those things.

Without this code, the logs from carscanner get jumbled up. I eventually figured out it was because only 20 bytes of each message were being received by carscanner. This led me to learn about the BLE MTU stuff. With this code the carscanner logs look better, and it also succeeds in parsing the responses and sends many more messages.

Previously the MTU (maximum size of BLE packet) was being ignored.
By default the BLE MTU is 23. Each BLE packet has 3 header bytes.
So unless the client says it can handle a larger MTU the server (WiCAN)
can only send 20 bytes at a time.

On Android apps have to specify what size MTU they want to use, or
it stays the default. On iOS, apps can't choose. iOS always negotiates
a size between 185 and 247.

This code now pays attention to the MTU fro the client.

The old code might pack multiple messages into one BLE packet,
but it would never split a message across two or more BLE packets.
This code now packs as many bytes into each BLE packet as it  can.
If the message doesn't fit in one packet, it will be split across
multiple packets.
@scytacki scytacki marked this pull request as ready for review April 23, 2023 20:41
@scytacki scytacki marked this pull request as draft April 24, 2023 03:09
@scytacki
Copy link
Contributor Author

As I'm reviewing the logs more closely from carscanner I see the responses are getting out of sync with the commands. This wasn't happening when I had a 20ms delay after each ble_send. I'm going to test that theory and see what happens with different amounts of delay.

@meatpiHQ
Copy link
Owner

@scytacki Make sure to keep the UART log output disabled, this also might affect sync.

esp_log_level_set("*", ESP_LOG_NONE);

@scytacki
Copy link
Contributor Author

I ran into a reboot loop, so I haven't been able to resolve the out of sync yet. Looking at the UART log output is really useful so I usually keep it enabled, but I'll remember to test with it on and off.

My current theory is that it is an incompatibility with Carscanner that happens semi-randomly. The ELM327 protocol isn't clear what should happen if two commands are sent before the response to the first command goes out. Carscanner sends an ATZ and ATE0 in a single BLE packet. Currently WiCAN send a response to both commands. My hunch is that Carscanner expects just the version string response from the ATZ and not the OK response from the ATE0.

So instead of sending ELM327 v1.3a..>OK..> (which happens to be 20 bytes).
WiCAN should just send ELM327 v1.3a..>
I'm replacing \r\n with dots in the strings above.

Anyhow hopefully my reboot loop is fixed and I can get back to wrapping this up.

I did find the Python ble-serial utility which lets me set the MTU. So I'm hoping to use that to test out various MTU sizes and make sure the code handles them correctly.

@meatpiHQ
Copy link
Owner

@scytacki As mentioned here the bootloop is caused by USB power.

Carscanner sends an ATZ and ATE0 in a single BLE packet. Currently WiCAN send a response to both commands. My hunch is that Carscanner expects just the version string response from the ATZ and not the OK response from the ATE0.

Carscanner might not be looking at the response, probably just looking for '>'

@scytacki
Copy link
Contributor Author

scytacki commented Apr 28, 2023

The reboot loop is happening even when plugged into the car. The error that causes the reboot is:

W (1692) sleep_mode_init: sleep_volt: 13.10
W (1697) adc_task: 3600000000

E (1701) gdma: gdma_disconnect(282): invalid argument
E (1706) gdma: gdma_del_channel(228): invalid argument
ESP_ERROR_CHECK failed: esp_err_t 0x101 (ESP_ERR_NO_MEM) at 0x4038ba0b
0x4038ba0b: _esp_error_check_failed at /Users/scott/esp/esp-idf/components/esp_system/esp_err.c:42

file: "./main/sleep_mode.c" line 219
func: continuous_adc_init
expression: adc_digi_initialize(&adc_dma_config)

Note the ESP_ERR_NO_MEM.

My theory is that this has to do with me using ESP-IDF version 4.4.4
I was using that version because 4.4 -> 4.4.3 have issues building on my Mac with python version 3.11.3.
The issue with that is documented here:
espressif/esp-idf#10116

The reason I think it is 4.4.4 causing problems is because the logs show: (I'm using v80x for my local builds).

I (355) cpu_start: Application information:
I (358) cpu_start: Project name:     wican-fw_v808_hv300
I (364) cpu_start: App version:      v808_hv300
I (369) cpu_start: Compile time:     Apr 25 2023 15:13:16
I (375) cpu_start: ELF file SHA256:  3525f0befcd8cb2d...
I (381) cpu_start: ESP-IDF:          v4.4.4-dirty
I (387) heap_init: Initializing. RAM available for dynamic allocation:
I (394) heap_init: At 3FCA5180 len 00037590 (221 KiB): DRAM
I (400) heap_init: At 3FCDC710 len 00002950 (10 KiB): STACK/DRAM
I (407) heap_init: At 50000028 len 00001FD8 (7 KiB): RTCRAM

Note the size of the STACK/DRAM

But when I look at the output from your build I see:

I (349) cpu_start: Application information:
I (351) cpu_start: Project name:     wican-fw_v169_hv300
I (357) cpu_start: App version:      v169_hv300
I (362) cpu_start: Compile time:     Apr 10 2023 17:53:14
I (368) cpu_start: ELF file SHA256:  b5d898cb99804fdc...
I (374) cpu_start: ESP-IDF:          v4.4
I (379) heap_init: Initializing. RAM available for dynamic allocation:
I (386) heap_init: At 3FCA1A00 len 0001E600 (121 KiB): DRAM
I (393) heap_init: At 3FCC0000 len 0001F060 (124 KiB): STACK/DRAM
I (399) heap_init: At 50000020 len 00001FE0 (7 KiB): RTCRAM

What doesn't make sense is that I was able to run my custom builds before and then they just stopped working. When the very first boot loop happened I did not save the log so I'm not sure if it was reporting the same out of memory error or perhaps a different one that was related to low USB power. So my theory as to why the boot loop is still happening even with fully power, is that I updated the boot partition when trying to fix the problem. I was using OTA to install my firmware before. So I was using whatever boot partition was shipped on the hardware. If I remember right it was reporting v1.68. After the problem happened I flashed all partitions with a local build of v1.69, that didn't fix it. After flashing all partitions using your build of v1.69, the problem goes away. Then when I flash just my local build of the app, the problem comes back. So perhaps the v1.69 boot partition causes this memory issues with my v4.4.4 app, and the v1.68 boot partition did not. I'm very doubtful of this theory though. Perhaps another clue is that after flashing your build of v1.69 I can't do an OTA update, after reboot it still reports version v1.69. I haven't looked at the logs while doing this to see if there is anything useful. I also haven't tried this using full power, so perhaps it will work when plugged into the car.

In any case I'm planning to get ESP-IDF 4.4 working either with docker or downgrading my python version. Then I'll do a local build of v1.69 and see if this memory error goes away. Or at least if the heap_init messages look the same as your build.

@meatpiHQ
Copy link
Owner

@scytacki Maybe check if there's any changes in file "sdkconfig"? Also try to do a full flash erase see if that helps.

But it's best if you can use version 4.4. At some point I want to move to version 5.0.

@scytacki
Copy link
Contributor Author

I did a build using the esp-idf v4.4 image. And this fixed the memory crash. Yay!

But now I've got a problem where the UART logging stops working right after esp_wifi_start() is called. UART logging used to work fine, and I need it to verify the MTU parts of this PR are working. The software seems to be working fine otherwise. It behaves the same with and without 12V power. I also did a full flash erase.

Besides this, the usb UART seems kind of flakey on my mac, it doesn't show up as a device sometimes. That could be an issue with the mac usb-serial driver though.
Also previously idf_monitor.py would reset the WiCAN when it connected, so I could see the initial boot log. Now I have to do tricks to capture the initial boot log. Either power cycle it or use the web interface to reboot it.

All of this makes me think there is some hardware issue, so I ordered a second one and an esp32-c3 mini-1 dev board to have something to compare with.

@meatpiHQ
Copy link
Owner

@scytacki just remove this line from main file.

esp_log_level_set("*", ESP_LOG_NONE);

@scytacki
Copy link
Contributor Author

I have that commented out.

The logs stop before then. If I comment out the wifi start, just before the ble start is the last thing logged. Perhaps it is related to phy_init which I think was the last thing logged in both cases.

@scytacki
Copy link
Contributor Author

I've learned more about esp32c3 usb setup. My understanding is that the WiCAN-ODB uses the built in USB on the esp32c3 instead of an additional usb to serial chip. Is that correct? it looks like maybe the WiCAN-USB has a usb to serial chip. Is that right? I believe that also means that on the WiCAN-ODB I could use JTAG over the usb for both the logs and to debug the code.

If I'm right about this, then I found this bug report: espressif/esp-idf#8046
From what I can tell v4.4 doesn't have the fix but v4.4.4 does, so that might be why the logs continued working after wifi or ble init before. I see some solutions for sticking with v4.4 and just updating the esp_phy submodule: espressif/esp-idf#7998 (comment) I haven't tried yet though.

Also I figure I could get a usb to serial dongle and connect it to the rx and tx holes on the WiCAN-ODB board, that might be a more reliable way to get logs.

@meatpiHQ now I'm wondering what you change when you build firmware for the WiCAN-ODB? I previously found the lines in CMakeLists.txt and figured out from the code that it is using the variables there. I thought that was the only change I needed to make. If the usb serial configuration is different for the two boards, then I think sdkconfig would also have to be different for both of them. Perhaps you use sdkconfig.old when building for the WiCAN-ODB?

@meatpiHQ
Copy link
Owner

meatpiHQ commented May 1, 2023

I've learned more about esp32c3 usb setup. My understanding is that the WiCAN-ODB uses the built in USB on the esp32c3 instead of an additional usb to serial chip. Is that correct? it looks like maybe the WiCAN-USB has a usb to serial chip. Is that right? I believe that also means that on the WiCAN-ODB I could use JTAG over the usb for both the logs and to debug the code.

Yes, everything you mentioned here is correct.

Also I figure I could get a usb to serial dongle and connect it to the rx and tx holes on the WiCAN-ODB board, that might be a more reliable way to get logs.

I usually use an external serial dongle for testing. If you like I can send you a WiCAN-USB if that helps with your development, and you won't need an external dongle.

now I'm wondering what you change when you build firmware for the WiCAN-ODB? I previously found the lines in CMakeLists.txt and figured out from the code that it is using the variables there. I thought that was the only change I needed to make. If the usb serial configuration is different for the two boards, then I think sdkconfig would also have to be different for both of them. Perhaps you use sdkconfig.old when building for the WiCAN-ODB?

When building for WiCAN-OBD, my CMakelists.txt looks like this:

cmake_minimum_required(VERSION 3.5)


set(PROJECT_PREF "wican-fw")
set(PROJECT_VER "v171_hv300")
#set(PROJECT_VER "v155_hv210")

#set(PROJECT_PREF "wican-fw-usb")
#set(PROJECT_VER "v171_hv150")

set(PROJECT_VERU ${PROJECT_VER})
set(PROJECT_BIN "${PROJECT_PREF}_${PROJECT_VERU}")
include($ENV{IDF_PATH}/tools/cmake/project.cmake)
project("${PROJECT_BIN}")

Actually you might be building for WiCAN-USB which is causing this issue or you're sdkconfig has changed somehow.

@scytacki
Copy link
Contributor Author

scytacki commented May 1, 2023

@meatpiHQ Thank you for the offer of the WiCAN-USB. I think I'm good, I like the challenge.

So the only thing you change when building for the WiCAN-ODB is CMakelists.txt?
You don't change sdkconfig?

I'm currently using the sdkconfig that is in the main branch. Originally when I was using ESP-IDF 4.4.4 with the VSCode IDE, one of the install steps made some changes to the sdkconfig, probably to upgrade any settings that had changed from 4.4 to 4.4.4.

@meatpiHQ
Copy link
Owner

meatpiHQ commented May 1, 2023

@scytacki

So the only thing you change when building for the WiCAN-ODB is CMakelists.txt?
You don't change sdkconfig?

Yes I only change CMakelists.txt, sdkconfig does not change.

The elm327 issue showed up because now the full response was being sent
Previously, the response was cut off at 20 bytes, so the trailing `>`
was not sent.
@scytacki
Copy link
Contributor Author

scytacki commented May 7, 2023

I got it working again and now I think this PR is ready to review/merge.

I used the sdkconfig-old file and with its differences the usb port did not break when the wifi/ble was initialized. sdkconfig-old has CONFIG_ESP_PHY_ENABLE_USB=y, and CONFIG_ESP_CONSOLE_USB_SERIAL_JTAG=y, and does not have CONFIG_CONSOLE_UART=y. All of those things seem like good settings when the usb is the built in one instead of a chip working with the UART.

I updated the comments in the second commit and also added a fix for the elm327 Carscanner out of sync issue that showed up after I implemented the MTU code. I believe the issue showed up because now the full response to Carscanner's initial 2 commands was being sent. This response is 22 bytes so previously the last two bytes were cut off. That meant the final > was not sent.

@scytacki scytacki marked this pull request as ready for review May 7, 2023 00:33
@meatpiHQ
Copy link
Owner

meatpiHQ commented May 9, 2023

Nice work! seems to work well, I still need to do some further testing with Realdash.

@meatpiHQ meatpiHQ merged commit 47a2974 into meatpiHQ:main May 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants