Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bluetooth: GATT lockup on split packets #25561

Closed
JordanYates opened this issue May 23, 2020 · 10 comments
Closed

bluetooth: GATT lockup on split packets #25561

JordanYates opened this issue May 23, 2020 · 10 comments
Assignees
Labels
area: Bluetooth bug The issue is a bug, or the PR is fixing a bug

Comments

@JordanYates
Copy link
Collaborator

JordanYates commented May 23, 2020

Describe the bug
ATT data from a slave device that is split across multiple PHY packets appears to lockup something in the Zephyr master.

The issue occurs on the first packet that is split. In the current case, the Nth Handle Value Indication packet is sent by the Slave device, and the Zephyr master never responds with a value confirmation.
This issue is 100% repeatable on the same packet each time a connection is made (7th indication currently).

A similar issue appears when I attempt discovery before increasing the connection data length through bt_conn_le_data_len_update.
Two 128bit characteristics are present in a service, the Read By Type response is split across two packets. In this case the Bluetooth stack emits an ATT Timeout error.
Again, this issue is consistent on every connection.

In both cases, the connection is not dropped, neither side crashes.

The slave device is based upon the nRF5 S140 v6.1.1 running FreeRTOS.
Phone applications and other devices running the FreeRTOS stack can observe notifications indefinitely from the device, suggesting that the problem is not on that side.

In the zip file are two .pcapng files demonstrating both issues.
BlockingGattPcapng.zip

Expected behavior
Behavior should not be impacted by LL packets being split.
Discovery should complete, Indications should continue indefinitely.

Impact
What impact does this issue have on your progress (e.g., annoyance, showstopper)

Screenshots or console output

Console output for the discovery case:

*** Booting Zephyr OS build v2.3.0-rc1  ***
[00:00:00.020,782] <inf> fs_nvs: 8 Sectors of 4096 bytes
[00:00:00.020,782] <inf> fs_nvs: alloc wra: 1, be0
[00:00:00.020,812] <inf> fs_nvs: data wra: 1, 224
[00:00:00.031,829] <inf> bt_hci_core: HW Platform: Nordic Semiconductor (0x0002)
[00:00:00.031,860] <inf> bt_hci_core: HW Variant: nRF52x (0x0002)
[00:00:00.031,860] <inf> bt_hci_core: Firmware: Standard Bluetooth controller (0x00) Version 2.3 Build 0
[00:00:00.032,226] <inf> bt_hci_core: Identity: c3:3b:7b:e8:c8:de (public)
[00:00:00.032,257] <inf> bt_hci_core: HCI: version 5.2 (0x0b) revision 0x0000, manufacturer 0x05f1
[00:00:00.032,257] <inf> bt_hci_core: LMP: version 5.2 (0x0b) subver 0xffff
[00:00:19.134,826] <inf> uc_bt_gatt: Connection created
[00:00:49.586,791] <err> bt_att: ATT Timeout

Environment (please complete the following information):

  • Zephyr 2.3rc1
CONFIG_BT=y
CONFIG_BT_BROADCASTER=y
CONFIG_BT_OBSERVER=y
CONFIG_BT_PERIPHERAL=y
CONFIG_BT_CENTRAL=y
CONFIG_BT_GATT_CLIENT=y
CONFIG_BT_DEBUG_LOG=y
CONFIG_BT_MAX_CONN=2
CONFIG_BT_HCI_ACL_FLOW_CONTROL=y
CONFIG_BT_L2CAP_RX_MTU=247
CONFIG_BT_L2CAP_TX_MTU=247
CONFIG_BT_CTLR_TX_BUFFER_SIZE=251
CONFIG_BT_CTLR_DATA_LENGTH_MAX=251
CONFIG_BT_RX_BUF_LEN=258
CONFIG_BT_RX_STACK_SIZE=2048
CONFIG_BT_USER_DATA_LEN_UPDATE=y

This behavior was originally observed on Zephyr 2.2, and was not resolved after updating.

@JordanYates JordanYates added the bug The issue is a bug, or the PR is fixing a bug label May 23, 2020
@carlescufi
Copy link
Member

carlescufi commented May 23, 2020

@JordanYates can you update the description with the exact sample you are using as a central?

Actually:

CONFIG_BT_USER_DATA_LEN_UPDATE=y

This is a fairly new option. I assume then you are not using a standard sample. Can you share either the code of your app or a minimal example to reproduce this?

@JordanYates
Copy link
Collaborator Author

JordanYates commented May 24, 2020

CONFIG_BT_USER_DATA_LEN_UPDATE=y

Was only added so that I could increase the data length to avoid the discovery problem.
Isolating the problem further, it appears to be caused by

CONFIG_BT_L2CAP_RX_MTU=247
CONFIG_BT_L2CAP_TX_MTU=247

More specifically, the largest L2CAP packet sent by the slave during discovery is 44 bytes.
Discovery completes with no errors when EITHER of these configs are set to 43 or less, fails when BOTH are 44 or higher.

@JordanYates
Copy link
Collaborator Author

JordanYates commented May 24, 2020

I have reproduced this issue between two Zephyr devices.
Slave application has an RX and TX MTU of 240 and calls bt_gatt_exchange_mtu().
Master application has an RX and TX MTU of 45 in the failure case.
Largest L2CAP packet is again 44 bytes.
Discovery completes when the master MTU's are set to 44.

https://github.com/JordanYates/zephyr/tree/gatt_blocking/samples/bluetooth/blocking_discovery
https://github.com/JordanYates/zephyr/tree/gatt_blocking/samples/bluetooth/blocking_slave

@cvinayak
Copy link
Contributor

Please set CONFIG_BT_ACL_RX_COUNT=10 if you set CONFIG_BT_HCI_ACL_FLOW_CONTROL=y (and thinking aloud, hope that peer does not fragment 255 bytes in 1 byte PDUs, then we will need 255 buffers). This should allow for recombination.

But, why do you need CONFIG_BT_HCI_ACL_FLOW_CONTROL=y in a host+controller combined builds?

@JordanYates
Copy link
Collaborator Author

JordanYates commented May 26, 2020

CONFIG_BT_L2CAP_RX_MTU and CONFIG_BT_L2CAP_TX_MTU both depend on CONFIG_BT_HCI_ACL_FLOW_CONTROL.
CONFIG_BT_CTLR_RX_BUFFERS, which sets the default for CONFIG_BT_ACL_RX_COUNT, looks like what I need, I will give it a shot.
As an aside, CONFIG_BT_CTLR_RX_BUFFERS is not mentioned anywhere in the documentation, or used in any of the sample applications.
Maybe I've been barking up the wrong tree this whole time, but all I want to do is setup a GATT connection that can send 200 byte buffers back and forth in single packets.
Have I missed some crucial KConfig settings that make this easy?
Thanks for the input.

@JordanYates
Copy link
Collaborator Author

Presumably at some point the Bluetooth controller is trying to get a buffer and failing.
Is it feasible to add some form of error output when it fails?

@JordanYates
Copy link
Collaborator Author

Beautiful, problem is resolved by increasing CONFIG_BT_CTLR_RX_BUFFERS.

@cvinayak
Copy link
Contributor

CONFIG_BT_L2CAP_RX_MTU and CONFIG_BT_L2CAP_TX_MTU both depend on CONFIG_BT_HCI_ACL_FLOW_CONTROL.

You should not use CONFIG_BT_HCI_ACL_FLOW_CONTROL to enable setting CONFIG_BT_L2CAP_RX_MTU and CONFIG_BT_L2CAP_RX_MTU

CONFIG_BT_HCI_ACL_FLOW_CONTROL is intended for HCI controller or Host only builds.

Increasing CONFIG_BT_CTLR_RX_BUFFERS would unnecessarily double PDU RAM usage, 1 buffer in controller is sufficient to be recombined by use of CONFIG_BT_ACL_RX_COUNT buffers in host.

@JordanYates
Copy link
Collaborator Author

Alright, that makes sense. I based the original configuration on the NCS GATT throughput example, which uses Nordics proprietary controller.
I am a bit lost in determining exactly which CONFIG's need to be enabled to increase the MTU and get the larger packets.
As far as I can tell there are no Zephyr samples which do this, or at least none that call bt_gatt_exchange_mtu, which I believe is a required part of the setup process.
The GATT documentation https://docs.zephyrproject.org/latest/reference/bluetooth/gatt.html is also not particularly helpful with respect to MTU's.

Is there some other reference that I am missing?

@cvinayak
Copy link
Contributor

As this is no more a bug, I request you to close this issue and open a new enhancement request to have sample/documentation around use of larger packets.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Bluetooth bug The issue is a bug, or the PR is fixing a bug
Projects
None yet
Development

No branches or pull requests

5 participants